June 21, 2014
Exploratory search has gained prominence in recent years. There is increased interest from several scientific communities, from information retrieval and databases to human-computer interaction and visualization, in moving beyond the traditional query-browse-refine model supported by web search engines and database systems alike, and towards support for human intelligence amplification and information understanding. Exploratory search systems should inherently support symbiotic human-machine relationships that provide guidance in exploring unfamiliar information landscapes [Whi 09]. In this post, four researchers set out to explore exploratory search from different angles and study the emerging needs and objectives as well as the challenges and problems that need to be tackled.
Exploratory Search in Structured Data: Data Warehouses and OLAP to the Rescue?
by Melanie Herschel
Looking more closely at what problems the field of exploratory search encompasses, Marchionini has established two main aspects of exploratory search that go beyond the classical problem of looking up information using classical search mechanisms [Mar 06]. More specifically, exploratory search has the general goal of (i) learning, thus acquiring new knowledge and (ii) investigating to possibly reveal new facts. Sub-tasks to achieve these goals include:
Learning: knowledge acquisition, comprehension and interpretation, comparison, aggregation and integration, and socialization.
Investigating: accretion, analysis, synthesis, evaluation, discovery, planning and forecasting, and transformation.
Clearly, exploratory search applies to both structured and unstructured data, as we want to leverage as much data as possible in the above tasks. However, if we focus on structured data, we notice that quite a few of the tasks mentioned above resemble those for which data warehouses and online analytical processing (OLAP) have been developed. One example of a learning task supported by data warehouses is integration, as one of the main purposes of data warehouses is to integrate data from distributed and heterogeneous sources in a single repository. Consequently, data comparison is easily supported as well. As for supporting investigation tasks, OLAP and data mining algorithms readily support analysis of data stored in a data warehouse, and reports synthesize such data. Finally, data warehouses are deployed to support decision making, such as planning and forecasting.
So are the problems of exploratory search already solved for structured data through data warehouses and OLAP?
To some extent, yes, but there are still many areas of exploratory search that data warehouses do not cover, such as socialization, comprehension and interpretation, and discovery. In the following, I will highlight three main areas where I believe exploratory search will require novel approaches to reach its full potential.
Discovery. Data warehouses are designed based on a predefined information demand, meaning that the analytical queries they support are fixed in advance and certainly cannot go beyond the queries that the schema of the data warehouse supports. Essentially, the data warehouse is designed to efficiently answer predefined queries. In exploratory search, by contrast, the information demand is unknown a priori: the goal of discovery is also to reveal things that the designers of the system did not think about at design time and that are thus not covered by the rigid schema. Hence, system design for exploratory search needs to plan for the unplanned. Making an analogy to farming, we can say that whereas data warehouses and OLAP are useful for harvesting what has been planted, exploratory search is all about exploring and cultivating new land.
Adaptation. Given the rigid cage that schemas impose on a data warehouse, these systems typically have limited capabilities for handling changing or evolving information needs. But such needs are the essence of exploratory search, as learning is an iterative process that requires a search-refine-expand paradigm. That is, whereas data warehouses allow us to move freely within a cage, exploratory search systems need to be able to adapt to a new and rapidly changing environment.
Data and Users. As a final distinction, we have limited the discussion above to structured data, but obviously a wealth of information resides in unstructured data that exploratory search should natively support. This brings us back to a long-running discussion on integrating databases and information retrieval. Additionally, many limitations of the data warehouse come from the rigid schema, so could NoSQL databases perhaps be of help here? Finally, data warehouses and OLAP are targeted towards expert users who know their domain, whereas exploratory search is for the masses, and human-computer interaction will be key to success.
To conclude, we have seen that data warehouses may support exploratory search to some extent, but they have serious limitations when it comes to discovery, adaptability, and the support of all types of relevant data and users.
Exploratory Search and Web searching
by Yannis Tzitzikas
Web searching is probably the most frequent user task on the Web, and users typically get back a linear list of hits. Web search engines are very good at ranking, and there is a plethora of ranking methods for capturing various aspects (topic relevance, authority, popularity, credibility, and personalization). However, ranking is not enough. Ranking is adequate for focalized (also called precision-oriented) information needs, but according to various user studies the majority of information needs are recall-oriented. For recall-oriented information needs, the user wants to find and understand more than one search hit. Examples include bibliographic survey writing, medical information seeking, patent search, hotel booking, and car buying. In general, recall-oriented information needs have an exploratory nature and aim at decision making (based on one or more criteria). To support such information needs, it is beneficial to provide methods that:
– enable easy access to low ranked hits,
– allow browsing the relevant hits and resources in groups (according to various criteria),
– offer overviews (of variable complexity) of the hits, and
– let the user gradually restrict the search results.
Over the last few years, we at the University of Crete and FORTH-ICS have been investigating methods that can complement classical Web searching with functionality for exploratory search. First, we developed the web search engine (WSE) Mitos, a system that offers faceted search over the results of submitted queries. Specifically, it supports facets corresponding to metadata attributes of the web pages (static metadata), as well as facets corresponding to the outcome of snippet-based clustering algorithms (a kind of dynamic metadata). The user can then restrict her focus gradually, by interacting with the resulting multidimensional structure through simple clicks.
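The faceted restriction described above can be illustrated with a minimal sketch. This is not the Mitos implementation: the hit records, the facet names (`lang`, `type`, `cluster`), and the two helper functions are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical search hits; field names are illustrative, not Mitos's schema.
# "lang" and "type" stand in for static metadata, "cluster" for a label
# produced by a snippet-based clustering step.
HITS = [
    {"title": "Faceted search survey", "lang": "en", "type": "pdf", "cluster": "search"},
    {"title": "OLAP primer", "lang": "en", "type": "html", "cluster": "databases"},
    {"title": "Recherche exploratoire", "lang": "fr", "type": "html", "cluster": "search"},
    {"title": "Trie indexes", "lang": "en", "type": "pdf", "cluster": "indexing"},
]
FACETS = ["lang", "type", "cluster"]


def facet_counts(hits, facets):
    """For each facet, count how many hits carry each value."""
    return {f: Counter(h[f] for h in hits if f in h) for f in facets}


def restrict(hits, facet, value):
    """Simulate a click on a facet value: keep only the matching hits."""
    return [h for h in hits if h.get(facet) == value]


counts = facet_counts(HITS, FACETS)           # overview shown next to the hit list
english = restrict(HITS, "lang", "en")        # first click
english_pdfs = restrict(english, "type", "pdf")  # second click narrows further
```

After each click the facet counts would be recomputed over the remaining hits, so the user always sees which restrictions are still available.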
The system IOS [Faf 12] was developed next, providing analogous features but during query typing (type-ahead search). This functionality is offered for frequent queries and takes advantage of novel partitioned trie-based indexes for reducing the increased main memory requirements. Subsequently, through XSearch [Faf 12b] we investigated and showed how the available Linked Open Data (LOD) can be exploited for offering facets based on the outcome of entity mining techniques. Again, this approach is applied at query time over the textual snippets of the search hits, where LOD provides information about the names of the entities of interest. This approach has proven successful for professional search, specifically in the domains of patent search and marine search. With X-ENS [Faf 12c] we focused on configurability, i.e., allowing the end user to configure the entities of interest and also browse the information that LOD provides for these entities.
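To give a feel for type-ahead search over frequent queries, here is a minimal sketch of a plain in-memory trie. It is not the partitioned index of IOS: partitioning is omitted entirely, and the class names, the query log, and the counts are assumptions for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # > 0 when a logged query ends at this node


class QueryTrie:
    """Plain (unpartitioned) trie over frequent queries."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, query, count=1):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def complete(self, prefix, k=3):
        """Return up to k logged queries starting with prefix, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:  # depth-first walk of the subtree below the prefix
            cur, text = stack.pop()
            if cur.count:
                found.append((text, cur.count))
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        found.sort(key=lambda item: (-item[1], item[0]))
        return [text for text, _ in found[:k]]


# Hypothetical query log with frequencies.
trie = QueryTrie()
for query, freq in [("exploratory search", 5), ("explain plan", 2),
                    ("expert finding", 3), ("olap cube", 4)]:
    trie.add(query, freq)

suggestions = trie.complete("exp")
```

In a system like the one described, each suggestion would additionally carry precomputed facets, which is where the main memory pressure mentioned above comes from.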
However, to offer entity mining not over the snippets but over the full contents of the search hits, more than one machine is needed for downloading and analyzing the full contents of the hits. A recent work shows how MapReduce can be used for this purpose [Kit 14].
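The general shape of such a computation can be sketched in plain Python, with the map, shuffle, and reduce phases run in a single process. This is only an illustration of the MapReduce pattern, not the method of [Kit 14]: the gazetteer, the naive substring matching, and the document contents are all assumptions.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical gazetteer; in the setting above, LOD would supply entity names.
ENTITIES = {"tuna": "Species", "crete": "Location", "aegean sea": "Location"}


def map_doc(doc_id, text):
    """Map step: emit ((entity, type), doc_id) for each gazetteer name in the text.

    Matching is a naive case-insensitive substring test, for illustration only.
    """
    lowered = text.lower()
    return [((name, etype), doc_id)
            for name, etype in ENTITIES.items() if name in lowered]


def reduce_entity(key, doc_ids):
    """Reduce step: collect the distinct documents that mention one entity."""
    return key, sorted(set(doc_ids))


def run(docs):
    groups = defaultdict(list)
    pairs = chain.from_iterable(map_doc(d, t) for d, t in docs.items())
    for key, doc_id in pairs:  # shuffle: group emitted pairs by key
        groups[key].append(doc_id)
    return dict(reduce_entity(k, v) for k, v in groups.items())


# Hypothetical full contents of three search hits.
DOCS = {
    "d1": "Tuna stocks in the Aegean Sea",
    "d2": "Fishing near Crete: tuna and swordfish",
    "d3": "Hotels in Crete",
}
result = run(DOCS)
```

In a real deployment the map calls would run on different machines, each downloading and analyzing its share of the hits, and only the grouped pairs would cross the network.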
Subsequently, we investigated how we could enable the user to easily express preferences, simple and complicated ones, over the elements the user interacts with. Such preferences can affect the ordering of the different aspects of the multidimensional (and hierarchically organized) information space. We have developed the prototype system Hippalus that realized this approach over a proposed preference framework [Pap 14].
More information about the aforementioned systems is available here.
Based on our experience, we can identify four main topics that we believe current IR/Web search does not cover and that exploratory search will need to address to reach its full potential:
Ubiquity. Faceted browsing of search results and gradual restriction should be possible for any kind of query, for any domain, and with no predetermined facets. In other words, methods that bypass the need for explicit configuration (regarding facets, entity types, LOD sources) are required.
Fusion of Structured and Unstructured Content. The exploitation of LOD in the exploratory search process is promising, e.g. for Named Entity Recognition and disambiguation. However, the fusion of structured and unstructured content requires more work.
User Control. Explicit, user-provided, and controllable preference management is beneficial for supporting a transparent decision-making process. We believe that the framework supported by the Hippalus system is a first step in this direction.
Evaluation. We need easy-to-follow methods for evaluating the effectiveness of exploratory search methods, and easily reproducible evaluation results. Although classical IR has well-established methodologies for evaluation, things are not so clear and straightforward in interactive IR (IIR).
Exploratory Search and Multimedia Data
by K. Selcuk Candan
By its very nature, multimedia data exploration shares the 3V challenges ([V]olume, [V]elocity, and [V]ariety) of the so-called “Big Data” applications. Systems supporting multimedia data exploration, however, must tackle additional, more specific, challenges, including those posed by the [H]igh-dimensional, [M]ulti-modal (temporal, spatial, hierarchical, and graph-structured), and inter-[L]inked nature of most multimedia data as well as the [I]mprecision of the media features and [S]parsity of the observations in the real world.
Moreover, since the end-users for most multimedia data exploration tasks are us (i.e., humans), we need to consider fundamental constraints posed by [H]uman beings: the difficulties they face in providing unambiguous specifications of interest or preference, the subjectivity in their interpretations of results, and their limitations in perception and memory. Last but not least, since a large portion of multimedia data is human-centered, we also need to account for the users’ (and others’) needs for [P]rivacy.
Multimedia exploration (on data with the above characteristics and by users with the above limitations) is an inherently dynamic process, and systems for multimedia exploration must be able to support, efficiently and effectively, a continuous exploration cycle involving four key steps:
sense & integrate: the system takes as inputs and integrates data, media, and models of the application space and continuously sensed real-time media data,
filter, rank & recommend: the system provides support for context-aware access to integrated media data sets,
visualize & feedback: the system acquires accurate user feedback through an intuitive data and result representation, and
act & adapt: the system provides continuous adaptation of models of data, context, and user preference based on user feedback.
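The four steps above can be read as a feedback loop. The following toy sketch shows one possible shape of that loop; the item features, the linear scoring, the additive update rule, and all names are assumptions for illustration, not a description of any particular system.

```python
# Hypothetical integrated media collection (the output of "sense & integrate"):
# each item is described by a few named feature values.
ITEMS = {
    "beach.jpg": {"outdoor": 1.0, "people": 0.0},
    "party.jpg": {"outdoor": 0.0, "people": 1.0},
    "hike.jpg": {"outdoor": 1.0, "people": 0.5},
}


def score(weights, features):
    """Linear preference score of one item under the current user model."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank(items, weights, k=2):
    """Filter, rank & recommend: top-k items, ties broken by name."""
    return sorted(items, key=lambda name: (-score(weights, items[name]), name))[:k]


def adapt(weights, features, liked, rate=0.5):
    """Act & adapt: move the user model towards (or away from) a judged item."""
    sign = 1.0 if liked else -1.0
    updated = dict(weights)
    for name, value in features.items():
        updated[name] = updated.get(name, 0.0) + sign * rate * value
    return updated


def explore(items, feedback_rounds, k=2):
    """Run the cycle: show results, take one judgment per round, adapt, repeat."""
    weights, shown = {}, []
    for judged_item, liked in feedback_rounds:
        shown.append(rank(items, weights, k))   # what the user sees (visualize)
        weights = adapt(weights, items[judged_item], liked)  # feedback applied
    shown.append(rank(items, weights, k))
    return shown, weights


history, weights = explore(ITEMS, [("party.jpg", True)])
```

With an empty user model the first result list is ordered by name only; after the user marks `party.jpg` as interesting, items featuring people move to the top.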
Naturally, each and every step of this multimedia exploration cycle poses significant challenges. While it would be impossible to enumerate all the challenges, I would like to identify the following five “core” challenges:
– media annotation, summarization, and (dimensionality) reduction,
– user, community, context, preference modeling and feedback,
– multi-modal and richly structured/linked data exploration,
– dynamic/evolving multimedia data exploration, and of course
– bridging the semantic gap in media exploration.
Unfortunately, while as a community we have made great advances in tackling these core challenges, we probably have to admit that we are still quite far from addressing any of these issues satisfactorily, especially within the context of large multimedia data collections. In fact, at least in the short- to medium-term future, these five issues will continue to form the core challenges in multimedia exploration and retrieval.
If there is one thing that is becoming more urgent, however, it is that the ever-increasing scale and speed of the data imply that, to support the above media exploration cycle, our emphasis must shift towards the development of integrated data platforms that can support, in an optimized and scalable manner, both media analysis (feature extraction, clustering, partitioning, aggregation, summarization, classification, latent analysis) and data manipulation (filtering, integration, personalized and task-oriented retrieval) operations.
Personalizing Data Exploration: Exploring the Past
by Amelie Marian
Personal data is now pervasive as digital devices are capturing every part of our lives. Data is constantly collected and saved by users, either voluntarily in files, emails, social media interactions, multimedia objects, calendar items, contacts, etc., or passively by various applications such as GPS tracking of mobile devices, records of utility usage, financial transactions, or quantified self sensors. A typical user will have data recorded on many devices, cloud services, and proprietary systems; access to the data can be difficult because of the data fragmentation, security issues, or practical concerns.
Companies and organizations have famously been able to take advantage of the wealth of information produced by individuals, learning patterns and visualizing trends on a large number of data points produced by many users. Yet individual users do not have accessible tools to retrieve, manage, and analyze their own data.
Leveraging personal data is critical to many data exploration tasks:
Exploring our past. A specific type of data exploration is re-finding [Tee 04], [Ber 08], whose goal is to find information that has been created, received, or seen by the user. Unlike traditional web search or exploration tasks whose focus is usually on discovering new information/documents, a re-finding exploration task has a target object it is attempting to recover.
Fragmentation of data is a main challenge. Users rarely own and store their personal data, with the exception of personal files. Most personal information is stored in the cloud by commercial companies that may offer some limited access, usually via a web browser or an API, to a user’s personal data. Attempting to retrieve and cross-reference personal information then leads to a tedious, often maddening, process of individually accessing all the relevant sources of data and manually linking their information. For example, checking that an insurance claim was correctly processed may require looking at a calendar application to find the doctor’s appointment date, checking the claim status on the insurance web site, and consulting a bank web portal to confirm that the payment was received.
The future of personal data exploration lies in the creation of personal data exploration tools, in the spirit of Dataspaces [Blu 07], [Hal 06], that will integrate personal data from a variety of sources, and allow users to visualize and search through their digital memories based on any piece of information they remember, following threads of information to navigate from one memory to the next.
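The insurance-claim example can be turned into a small sketch of what following a thread across sources might look like. The three in-memory sources, their field names, and the matching rules (same service date, same amount) are assumptions for illustration; real sources would sit behind separate services and APIs.

```python
from datetime import date

# Hypothetical extracts from three separate personal data sources.
CALENDAR = [{"title": "Dr. Smith appointment", "date": date(2014, 3, 4)}]
CLAIMS = [{"claim_id": "C-17", "service_date": date(2014, 3, 4),
           "status": "processed", "amount_cents": 8000}]
BANK = [{"date": date(2014, 3, 20), "memo": "Insurance reimbursement",
         "amount_cents": 8000}]


def trace_claim(appointment_title, calendar, claims, bank):
    """Follow one thread: appointment -> insurance claim -> bank payment.

    Returns None if the appointment is unknown; otherwise a dict whose
    "claim" and "payment" entries are None when no match was found.
    """
    appt = next((e for e in calendar if e["title"] == appointment_title), None)
    if appt is None:
        return None
    # Link the claim to the appointment by service date (illustrative rule).
    claim = next((c for c in claims if c["service_date"] == appt["date"]), None)
    payment = None
    if claim is not None:
        # Link the payment by amount, on or after the service date.
        payment = next((t for t in bank
                        if t["amount_cents"] == claim["amount_cents"]
                        and t["date"] >= claim["service_date"]), None)
    return {"appointment": appt, "claim": claim, "payment": payment}


thread = trace_claim("Dr. Smith appointment", CALENDAR, CLAIMS, BANK)
```

The point of the sketch is the linking step: once the sources are accessible in one place, the manual cross-referencing described above reduces to a few joins on remembered details such as a date or an amount.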
Exploring our social data. Personal data is not limited to a user’s own data production. In an increasingly connected world, what other users in our social network share with us is also relevant to our data exploration. Users looking for book recommendations may want to explore their social network connections for books they have read or discussed. Some desktop search systems, such as deskWeb [Zer 10], have integrated the user’s social network graph, expanding the searched data set to include information available in the social network.
Personalizing our data exploration. Users have individual habits and patterns, as well as different types of data, context and information needs. By inferring knowledge from their past personal information and their interactions with their data, we can personalize data exploration and fit it to the users’ individual needs.
Social network information can be leveraged to learn contextual information about a user and guide data exploration. For instance, “Student Center” has different meanings for different groups of users. Knowing which social group the user belongs to can help us identify entities: for example, observing that a majority of the user’s friends go to Rutgers University and attend events at “Busch Campus Student Center, Rutgers University” increases the likelihood that the phrase “Student Center” in the user’s data and queries refers to that particular student center. Similarly, past queries can be used to learn some information about the type of the entities present in the personal data, or about the personal context.
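A minimal sketch of this kind of disambiguation simply counts how often the user's friends are associated with each candidate entity. The candidate list, the second (made-up) student center, and the friends' event data are assumptions for illustration; a real system would weigh many more signals.

```python
from collections import Counter

# Hypothetical candidate entities for an ambiguous phrase.
CANDIDATES = {
    "Student Center": [
        "Busch Campus Student Center, Rutgers University",
        "Student Center, Example State University",  # made-up alternative
    ],
}

# Hypothetical places where the user's friends attended events.
FRIEND_EVENTS = {
    "ann": ["Busch Campus Student Center, Rutgers University"],
    "bob": ["Busch Campus Student Center, Rutgers University"],
    "cat": ["Student Center, Example State University"],
}


def disambiguate(phrase, candidates, friend_events):
    """Pick the candidate entity most often attended by the user's friends.

    Returns None when the phrase is unknown or no friend attended any candidate.
    """
    options = set(candidates.get(phrase, []))
    votes = Counter(place
                    for places in friend_events.values()
                    for place in places
                    if place in options)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


best = disambiguate("Student Center", CANDIDATES, FRIEND_EVENTS)
```

Here two of three friends point to the Rutgers location, so that entity is chosen for the phrase “Student Center”.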
This type of personalized data exploration is gaining traction in prospective memory applications, such as Google Now, which focus on reminding users of their appointments and to-do items.
Overall, understanding and leveraging personal data is crucial for many data exploration tasks, either because the exploration is on the personal information itself, or because personal information can give significant insights and directions for traditional data exploration needs.
Copyright © 2014, Melanie Herschel, Yannis Tzitzikas, K. Selçuk Candan, Amélie Marian. All rights reserved.