Information Hunting: The Many Faces of Recommendations for Data Exploration

Kostas Stefanidis, Eirini Ntoutsi, Haridimos Kondylakis

March 10, 2015


With the growing complexity of the Web, users often find themselves overwhelmed by the mass of choices available. For example, shopping for DVDs or clothes online becomes more and more difficult, as the variety of offers increases rapidly and becomes unmanageable. To help users in their selection process, recommender systems suggest items that are potentially of interest to them. In particular, recommender systems estimate users' preferences for items and recommend the items with the highest predicted preference. The prerequisite for determining such recommendations is information on the users' interests, e.g., the users' purchase history.

Recently, big data has posed several new challenges for computing recommendations, such as the need to quickly understand and consume data, and the need to integrate diverse structured and unstructured data. Motivated by both these new and the traditional challenges of hunting for interesting information, we investigate the different aspects involved in identifying valuable data items to suggest. We organize these aspects as follows:

a) Data enrichment and integration: In general, different types of information can be semantically enhanced to be used for computing recommendations.

At the user level, we distinguish between user-defined information and user information aggregated from external sources (e.g., social networks). That is, instead of using only the plain information that a user provides about himself to a recommender system, we exploit information available in numerous external sources, such as Facebook, LinkedIn, Foursquare and Amazon. The motivation behind this is that a user describes himself differently in different networks, depending on the domain, so we can identify different interests, user activities, information about places he visited, and so forth. A challenge in this direction is to integrate the user's social profiles, as well as to integrate, or expand, the social graph so as to bring together different social networks. Moreover, such a solution can be used to address the cold-start problem.

At the item level, information about items can be enhanced with semantic information. In addition to the item descriptions, in which temporal characteristics of the items, such as popularity and freshness, are updated in real time, we can exploit information retrieved from the Web, such as published results and reports, Web pages, thesauri or ontologies. The plethora of well-organized information on the Web in collectively maintained knowledge repositories, such as Wikipedia and LibraryThing, can be used for correlating items and computing similarities between them. These data sources cannot be considered static, as they continuously evolve. Novel solutions are required to dynamically adapt to such changes [2], [3].
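As a minimal sketch of using such enriched descriptions, the Python snippet below scores item-to-item similarity with the Jaccard coefficient over sets of semantic annotations. The annotation sets, the item names and the choice of Jaccard are assumptions for illustration; they stand in for categories or tags that might be harvested from repositories such as Wikipedia or LibraryThing.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(item: str, annotations: Dict[str, Set[str]], k: int = 2) -> List[Tuple[str, float]]:
    """Rank all other items by the Jaccard similarity of their semantic annotations."""
    scores = [
        (other, jaccard(annotations[item], tags))
        for other, tags in annotations.items()
        if other != item
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical annotations, e.g. categories collected from a knowledge repository.
annotations = {
    "dune_book": {"science_fiction", "desert_planet", "frank_herbert"},
    "dune_movie": {"science_fiction", "desert_planet", "film"},
    "foundation_book": {"science_fiction", "isaac_asimov"},
    "cookbook": {"cooking"},
}
print(most_similar("dune_book", annotations))  # [('dune_movie', 0.5), ('foundation_book', 0.25)]
```

Because the underlying repositories evolve, annotation sets like these (and any similarities derived from them) would have to be refreshed as the sources change.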

At the preference level, we consider context-enhanced rating values and review texts. We distinguish between overall and multi-criteria ratings. The former associates a single rating with an item, while the latter associates a set of ratings with an item, each one with respect to a certain criterion/aspect of the item. Nowadays, mainly due to the abundance of free-text reviews, there are attempts to implicitly extract both the aspects of the items that are of interest to a user and their associated ratings/sentiment [10].
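To make the distinction concrete, the sketch below collapses a multi-criteria rating into a single overall value using per-user criterion weights. The weighted average and the weights themselves are assumptions chosen for illustration; a real system might learn such weights or use a different aggregation function.

```python
from typing import Dict


def overall_rating(criteria: Dict[str, float], weights: Dict[str, float]) -> float:
    """Collapse a multi-criteria rating into one overall value.

    Assumes non-negative importance weights per criterion for this user;
    criteria the user has no weight for are ignored.
    """
    total_weight = sum(weights.get(c, 0.0) for c in criteria)
    if total_weight <= 0:
        raise ValueError("no positively weighted criteria")
    return sum(r * weights.get(c, 0.0) for c, r in criteria.items()) / total_weight


# Hypothetical hotel rating on three criteria (1-5 scale).
hotel = {"location": 5.0, "cleanliness": 3.0, "price": 4.0}

# Two hypothetical users with different priorities obtain different overall values.
location_lover = {"location": 3.0, "cleanliness": 1.0, "price": 1.0}
clean_freak = {"location": 1.0, "cleanliness": 3.0, "price": 1.0}
print(overall_rating(hotel, location_lover))  # 4.4
print(overall_rating(hotel, clean_freak))     # 3.6
```

The same multi-criteria rating thus yields different overall values for users who weigh the criteria differently, which is the extra information a single overall rating cannot carry.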

b) Input information for recommendations: Enriched data for users, items and preferences can be used for different purposes when producing recommendations. A key question is which set of users, or peers, should be used for estimating the preferences of a given user. Let's consider three kinds of peers: close friends, domain experts and similar users.

The close friends of a user are explicitly selected by him, or they can be implicitly extracted from his neighborhoods in social networks. In this case, link prediction methods can be applied to broaden the set of friends.
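One of the simplest link prediction heuristics is the common-neighbours score: a non-friend becomes a candidate friend in proportion to how many friends the two users share. The sketch below applies it to a small hypothetical, undirected friendship graph; it is only an illustration, and practical link prediction methods may use richer signals.

```python
from typing import Dict, List, Set, Tuple


def suggest_friends(graph: Dict[str, Set[str]], user: str, k: int = 3) -> List[Tuple[str, int]]:
    """Rank users who are not yet friends of `user` by their number of common friends."""
    friends = graph[user]
    common: Dict[str, int] = {}
    for friend in friends:
        for candidate in graph[friend]:
            if candidate != user and candidate not in friends:
                common[candidate] = common.get(candidate, 0) + 1
    # Highest count first; ties broken alphabetically for a deterministic order.
    return sorted(common.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical undirected friendship graph (each edge listed in both directions).
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
print(suggest_friends(graph, "alice"))  # [('dave', 2), ('erin', 1)]
```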

Domain experts can be used for producing recommendations for specific queries, since they are considered to be knowledgeable on a specific topic or area.

Alternatively, a user can opt to employ the preferences of the users most similar to him for computing relevance scores for unrated items. Traditionally, similarity between users was evaluated in the full-dimensional item space. Due to the high dimensionality and sparsity of the data space, a subspace-based notion of similarity has recently been used [6]. As a side effect, this approach also diversifies the set of peers, allowing for a wider yet high-quality pool of people to get suggestions from.
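The toy example below contrasts the two notions of similarity. Similarity is computed as 1 / (1 + mean absolute rating difference) over co-rated items, either over all items or only over a hand-picked subspace (here, two comedies). Both the similarity function and the fixed subspace are assumptions for illustration; [6] discovers subspaces automatically through subspace clustering rather than fixing them by hand.

```python
from typing import Dict, Iterable, Optional, Set

Ratings = Dict[str, Dict[str, float]]


def similarity(a: Dict[str, float], b: Dict[str, float],
               dims: Optional[Set[str]] = None) -> float:
    """1 / (1 + mean absolute difference) over co-rated items.

    If `dims` is given, only items in that subspace are compared; otherwise the
    full item space is used. Returns 0.0 when no items are co-rated.
    """
    items = set(a) & set(b)
    if dims is not None:
        items &= dims
    if not items:
        return 0.0
    mad = sum(abs(a[i] - b[i]) for i in items) / len(items)
    return 1.0 / (1.0 + mad)


def top_peer(ratings: Ratings, user: str, dims: Optional[Iterable[str]] = None) -> str:
    """Return the other user most similar to `user` in the chosen (sub)space."""
    subspace = set(dims) if dims is not None else None
    others = [u for u in ratings if u != user]
    return max(others, key=lambda o: similarity(ratings[user], ratings[o], subspace))


# Hypothetical ratings: c1, c2 are comedies; d1, d2 are dramas.
ratings: Ratings = {
    "u": {"c1": 5, "c2": 5, "d1": 1, "d2": 1},
    "p1": {"c1": 5, "c2": 4, "d1": 5, "d2": 5},  # shares u's taste in comedies only
    "p2": {"c1": 3, "c2": 3, "d1": 2, "d2": 2},  # moderately close to u everywhere
}
print(top_peer(ratings, "u"))                     # p2 (full item space)
print(top_peer(ratings, "u", dims={"c1", "c2"}))  # p1 (comedy subspace)
```

Which peer counts as "most similar" depends on the subspace; drawing peers from several subspaces is what diversifies the peer set.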

Regarding items, semantically enhanced descriptions are used in either textual or tabular form. Regarding preferences, in addition to rating values, there are many other sources of information to consider during the recommendation process, such as text reviews, user profile information and item dependencies. Users' fine-grained likes and dislikes are captured either explicitly, through multi-criteria ratings, or implicitly, through sentiment analysis, allowing for a precise delineation of user profiles, whereas preferences are augmented with context and temporal information, allowing users to have different choices under different circumstances.

c) Recommendations output: Traditionally, recommender systems offer suggestions within a single domain, i.e., when asking for movies or job vacancies, the suggestions consist only of movies or jobs. But why should one be limited to movies when similar books exist as well? Such an example describes a cross-domain recommender system, which can be realized through data enrichment that offers knowledge about related items. Along this line, package recommendations (e.g., [7]) produce composite items consisting of a central item, possibly in the user's main domain of interest, and a set of satellite items from different domains that are compatible with the central item. Compatibility can be either soft (e.g., books that are often purchased together with the movie being browsed) or hard (e.g., battery packs that must be compatible with a laptop, or a travel destination that must be within a certain distance of the main destination). Composite items can be further constrained by specific criteria, such as a price budget on purchases or a time budget on travel itineraries.
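A minimal sketch of assembling such a composite item under a price budget follows. It uses a greedy heuristic: satellites are considered in decreasing order of a soft compatibility score and kept only if they pass a hard compatibility check and still fit the remaining budget. The `Item` type, the scores and the battery-matching rule are hypothetical, and this is not the algorithm of [7]; since the selection is knapsack-like, greedy choice is not guaranteed to be optimal.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    name: str
    price: float
    score: float  # soft compatibility with the central item, e.g. a co-purchase rate


def build_package(central: Item, satellites: List[Item], budget: float,
                  compatible: Callable[[Item, Item], bool]) -> List[Item]:
    """Greedily add the highest-scoring satellites that satisfy the hard
    compatibility check and fit into what is left of the budget."""
    remaining = budget - central.price
    if remaining < 0:
        return []  # the central item alone already exceeds the budget
    package = [central]
    for sat in sorted(satellites, key=lambda s: s.score, reverse=True):
        if compatible(central, sat) and sat.price <= remaining:
            package.append(sat)
            remaining -= sat.price
    return package


def compatible(central: Item, sat: Item) -> bool:
    """Hypothetical hard constraint: a battery must match the laptop's model suffix."""
    if sat.name.startswith("battery_"):
        return sat.name.split("_")[1] == central.name.split("_")[1]
    return True


laptop = Item("laptop_x1", 900.0, 1.0)
satellites = [
    Item("battery_y2", 60.0, 0.95),  # highest soft score, but built for another model
    Item("battery_x1", 80.0, 0.90),
    Item("mouse", 30.0, 0.70),
    Item("bag", 40.0, 0.50),
]
print([i.name for i in build_package(laptop, satellites, 1000.0, compatible)])
# ['laptop_x1', 'battery_x1']
```

With a budget of 1000, the incompatible battery is skipped despite its higher soft score, and after the matching battery there is not enough budget left for the mouse or the bag.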

Apart from recommendations for single users, there are cases, such as visiting a restaurant or selecting a holiday destination, where a group of people participates in an activity. For such cases, group recommendations try to satisfy the preferences of all group members (e.g., [5]). A different aspect of group recommendations arises when specific constraints apply to the members of the group [9]. In this setting, constraints express preferences that group members have about the other participants. For example, a vacation package may seem more attractive to a user if the other members of the group are of a similar age, whereas a course may be recommended to a group of students with similar or diverse backgrounds, depending on the scope of the course. Constraints may describe limitations from the user/customer or the system/company perspective. In the latter case, constraints refer to a set of properties that the group under construction must satisfy, expressing the company's requirements for the group that an item targets.
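Two aggregation strategies that are common in the group recommendation literature, average and least misery, can be sketched as follows: each member has a predicted score per item, and the group score is either the mean or the minimum of the members' scores. The member names and scores are hypothetical, and the sketch does not reproduce the clustering-based method of [5].

```python
from statistics import mean
from typing import Callable, Dict, List

Predictions = Dict[str, Dict[str, float]]  # member -> item -> predicted score

STRATEGIES: Dict[str, Callable[[List[float]], float]] = {
    "average": mean,      # favour overall satisfaction
    "least_misery": min,  # protect the least satisfied member
}


def group_scores(predicted: Predictions, strategy: str = "average") -> Dict[str, float]:
    """Aggregate members' scores for every item that all members have a score for."""
    members = list(predicted)
    common = set.intersection(*(set(predicted[m]) for m in members))
    aggregate = STRATEGIES[strategy]
    return {item: aggregate([predicted[m][item] for m in members]) for item in common}


def best_item(predicted: Predictions, strategy: str) -> str:
    """Item with the highest aggregated group score."""
    scores = group_scores(predicted, strategy)
    return max(scores, key=scores.get)


# Hypothetical predicted scores for three friends choosing a restaurant.
predicted = {
    "ann": {"steakhouse": 5.0, "salad_bar": 3.0},
    "ben": {"steakhouse": 1.0, "salad_bar": 3.0},  # ben is vegetarian
    "cara": {"steakhouse": 5.0, "salad_bar": 3.0},
}
print(best_item(predicted, "average"))       # steakhouse (mean about 3.67 vs 3.0)
print(best_item(predicted, "least_misery"))  # salad_bar (minimum 3.0 vs 1.0)
```

The choice of strategy matters: averaging lets a strong majority override one very unhappy member, whereas least misery never recommends an item that some member strongly dislikes.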

Since users usually have different preferences under different circumstances, context can be employed in recommender systems for both single-user and group recommendations [1]. Furthermore, given that the user taste captured by a profile is, in general, too coarse-grained, some recommender systems help users express their needs by allowing them to provide examples from which the system's suggestions are derived (e.g., Pandora).
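One simple way to let context influence suggestions is contextual pre-filtering: keep only the ratings given in the current context, then run an ordinary (here, deliberately naive) predictor on what remains. The sketch below assumes that each rating is tagged with a single companion context; the data and the mean-rating predictor are hypothetical, and it is not meant to reproduce the full multidimensional approach of [1].

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Rating = Tuple[str, str, str, float]  # (user, item, context, rating)


def item_means(ratings: List[Rating], context: Optional[str] = None) -> Dict[str, float]:
    """Mean rating per item, optionally using only ratings given in `context`."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for _user, item, ctx, value in ratings:
        if context is None or ctx == context:  # contextual pre-filtering step
            sums[item] += value
            counts[item] += 1
    return {item: sums[item] / counts[item] for item in sums}


def best_for(ratings: List[Rating], context: Optional[str] = None) -> str:
    """Item with the highest mean rating in the given context (or overall)."""
    means = item_means(ratings, context)
    return max(means, key=means.get)


# Hypothetical movie ratings tagged with the companion context.
ratings: List[Rating] = [
    ("u1", "cartoon", "family", 5.0),
    ("u2", "cartoon", "friends", 2.0),
    ("u1", "thriller", "family", 2.0),
    ("u3", "thriller", "friends", 5.0),
    ("u2", "comedy", "friends", 4.0),
]
print(best_for(ratings))             # comedy (context ignored)
print(best_for(ratings, "family"))   # cartoon
print(best_for(ratings, "friends"))  # thriller
```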

d) Recommendations explanation and visualization: The success of recommendations, i.e., their adoption by end users, relies on explaining the reasons behind them. To this end, in addition to the suggested items, several approaches provide the user with an explanation for each suggested item, i.e., the reason why the specific item appears in the list of recommendations. Along the same lines, other approaches focus on the effective presentation of the recommended items to the end user, aiming to minimize the user's browsing effort and help him get a broader view of them. For example, [8] exploits preferences defined by users on items, extracts a ranking of preferences that is used for ordering the suggested items, and summarizes these items in a compact, yet intuitive and representative way.
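A minimal neighbour-based explanation can be produced alongside the prediction itself: the score is a similarity-weighted average of peers' ratings, and the peers with the largest contributions are reported as the reason. The peer names, similarity values and explanation wording are hypothetical; this is one simple explanation style, not the summarization approach of [8].

```python
from typing import Dict, List, Tuple


def predict_with_explanation(item: str, peers: Dict[str, float],
                             ratings: Dict[str, Dict[str, float]],
                             top: int = 2) -> Tuple[float, List[str]]:
    """Similarity-weighted mean of peers' ratings for `item`, plus the peers whose
    contribution (similarity * rating) is largest, to be shown as an explanation."""
    contribs = [(peer, sim, ratings[peer][item])
                for peer, sim in peers.items()
                if item in ratings.get(peer, {})]
    total_sim = sum(sim for _, sim, _ in contribs)
    if total_sim <= 0:
        return 0.0, []  # no (positively similar) peer has rated the item
    score = sum(sim * r for _, sim, r in contribs) / total_sim
    contribs.sort(key=lambda c: c[1] * c[2], reverse=True)
    return score, [peer for peer, _, _ in contribs[:top]]


peers = {"p1": 0.9, "p2": 0.5, "p3": 0.2}  # hypothetical peer similarities
ratings = {"p1": {"film": 5.0}, "p2": {"film": 4.0}, "p3": {"film": 1.0}}
score, because = predict_with_explanation("film", peers, ratings)
print(f"predicted {score:.2f}; recommended because {' and '.join(because)} rated it highly")
# predicted 4.19; recommended because p1 and p2 rated it highly
```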

e) Exploring your past: As data and knowledge bases become larger and accessible to a more diverse and less technically oriented audience, new forms of data seeking become increasingly attractive. A specific form of exploration is re-finding. In our context, re-finding aims to locate suggestions, possibly via browsing, that have been produced for and seen by a user in the past. Unlike the typical task of constructing recommendations, here we face the task of recovery. Following the notion of personal dataspaces, it is challenging to integrate all data pertaining to a user from different sources and allow him to visualize, search and explore his recommendations through a specific piece of information. Explicit (given by a user) or implicit (extracted, for instance, from his online traces, e.g., via Foursquare) feedback on suggestions, content and other users can significantly increase the quality of recommendations and the search features of a system. Recently, [4] introduced complementary types of feedback, derived from the evolution of the interests of the users in the social neighborhood of the user at hand, or from his reactions, either attractions or aversions, towards past suggestions.

f) Lifelong recommendations and learning: The majority of recommendation approaches require the whole set of data (users, items and preferences) as input (static case), an assumption that is obsolete nowadays due to the huge amount of generated data and the lifelong tracking of users' presence online. Keeping track of the user history not only results in more data for recommendations (useful for tackling the sparsity problem), but also allows for studying possible changes in user tastes and identifying periodicity in user habits. A stream-mining-inspired approach is data ageing, which downgrades historical ratings as obsolete and pays more attention to recent ones, which best reflect the current user profile (e.g., [8]). However, results on the effect of time on the quality of recommendations are sometimes contradictory, since approaches that discard past instances may lose too much signal. In such cases, more elaborate methods that separate transient factors from lasting ones appear to be beneficial. In the same scenario, the general notion of context, such as location and companion, can be employed as well. Online reviews can be used to implicitly extract such information. However, long-term user monitoring implies extensive knowledge about user tastes and preferences, which might pose privacy risks for the user.
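Data ageing can be sketched with an exponential decay: each rating's weight halves every `half_life` time units, so recent ratings dominate the user's estimated preference. The half-life value, the timestamps (in days) and the ratings are hypothetical, and exponential decay is only one possible ageing function; as noted above, aggressive decay can also discard useful long-term signal.

```python
from typing import List, Tuple


def decayed_mean(history: List[Tuple[float, float]], now: float, half_life: float) -> float:
    """Weighted mean of (timestamp, rating) pairs, where a rating loses half of its
    weight for every `half_life` time units that separate it from `now`."""
    num = den = 0.0
    for ts, rating in history:
        weight = 0.5 ** ((now - ts) / half_life)
        num += weight * rating
        den += weight
    return num / den if den > 0 else 0.0


# Hypothetical history for one genre (days): loved a year ago, disliked recently.
history = [(0.0, 5.0), (30.0, 5.0), (300.0, 1.0), (330.0, 1.0)]
plain = sum(r for _, r in history) / len(history)
aged = decayed_mean(history, now=360.0, half_life=30.0)
print(plain, round(aged, 2))  # 3.0 1.0
```

The plain mean still suggests a lukewarm interest in the genre, while the aged estimate reflects the user's current dislike; a longer half-life would move the estimate back toward the plain mean.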

Recommendations for Data Exploration: Recommendations have always been an important area for both research and industry. Lately, they are being reshaped by the huge amount of mostly heterogeneous data that is continuously collected from the Web. Data integration methods can handle the huge volumes of data, their different types of heterogeneity and their evolution. The resulting enriched data can be fed into the recommendation process, and such a wealth of data can improve the user experience with a recommender system.

Recommendations are widely considered one of the main aspects of exploratory search, since they tend to anticipate user needs by automatically suggesting the information that is most appropriate to the users and their current context. Their sophisticated capabilities are valuable for data discovery in numerous applications across domains such as social media, healthcare, telecommunications, e-commerce and Web analytics, business intelligence, and cyber-security. Moving forward, there is still a need to develop novel paradigms for recommendation-like user-data interactions that emphasize user context and interactivity, with the goal of facilitating the exploration, interpretation, retrieval, and assimilation of information. This year's ExploreDB workshop, a premier workshop on Exploratory Search in Databases and the Web, co-located with ACM SIGMOD/PODS 2015, will cover the fascinating topic of recommendations and encompass a wide range of research directions, highlighting data discovery and exploration.

REFERENCES
[1] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Trans. Inf. Syst., 23(1):103–145, 2005.

[2] H. Kondylakis and D. Plexousakis. Exelixis: evolving ontology-based data integration system. In SIGMOD, 2011.

[3] H. Kondylakis and D. Plexousakis. Ontology evolution without tears. J. Web Sem., 19:42–58, 2013.

[4] W. Lu, S. Ioannidis, S. Bhagat, and L. V. S. Lakshmanan. Optimal recommendations under attraction, aversion, and social influence. In KDD, 2014.

[5] E. Ntoutsi, K. Stefanidis, K. Norvag, and H. Kriegel. Fast group recommendations by applying user clustering. In ER, 2012.

[6] E. Ntoutsi, K. Stefanidis, K. Rausch, and H. Kriegel. Strength lies in differences: Diversifying friends for recommendations through subspace clustering. In CIKM, 2014.

[7] S. B. Roy, S. Amer-Yahia, A. Chawla, G. Das, and C. Yu. Constructing and exploring composite items. In SIGMOD, 2010.

[8] K. Stefanidis, E. Ntoutsi, M. Petropoulos, K. Norvag, and H. Kriegel. A framework for modeling, computing and presenting time-aware recommendations. T. Large-Scale Data- and Knowledge-Centered Systems, 10:146–172, 2013.

[9] K. Stefanidis and E. Pitoura. Finding the right set of users: Generalized constraints for group recommendations. In PersDB, 2012.

[10] M. Zimmermann, E. Ntoutsi, and M. Spiliopoulou. Discovering and monitoring product features and the opinions on them with OPINSTREAM. Neurocomputing, 150:318–330, 2015.

Blogger Profiles:

Kostas Stefanidis is a research scientist at ICS-FORTH, Greece. Previously, he worked as a post-doctoral researcher at NTNU, Norway, and at CUHK, Hong Kong. He received his PhD in personalized data management from the Univ. of Ioannina, Greece. His research interests include recommender systems, personalized and context-aware data management, social networks, and information extraction, resolution and integration. He has co-authored more than 30 papers in peer-reviewed conferences and journals, including ACM SIGMOD, IEEE ICDE and ACM TODS. He is the General co-Chair of the Workshop on Exploratory Search in Databases and the Web (ExploreDB), and he will be the Web & Information Chair of SIGMOD/PODS 2016.

Eirini Ntoutsi is a researcher at LMU, Germany. She received her PhD in data mining from the Univ. of Piraeus, Greece. Previously, she worked as a data mining expert at OTE SA, the largest telecommunications operator in Greece. Her research interests lie in the areas of data mining, machine learning and databases, with a current focus on recommendations, opinionated streams and pattern stability analysis, stream mining and high-dimensional data. She has co-authored more than 40 publications in international venues, including KDD, DKE and CIKM, serves as a reviewer for several conferences and journals, and co-organizes the BASNA workshop on Business Applications of Social Network Analysis.

Haridimos Kondylakis is a research scientist at ICS-FORTH, Greece. He received his PhD in Computer Science from the Univ. of Crete, Greece. His research interests span the following areas: Semantic Integration & Enrichment; Knowledge Evolution; Applications of Semantic Technologies to Information Systems. He has more than 40 publications in international conferences, books and journals including ACM SIGMOD, JWS and KER. He has also served as a reviewer in several journals and conferences, such as JWS, JODS, CIKM, EDBT, ISWC and as a PC member in premier conferences and workshops.
