{"id":2897,"date":"2019-06-20T15:42:36","date_gmt":"2019-06-20T15:42:36","guid":{"rendered":"http:\/\/wp.sigmod.org\/?p=2897"},"modified":"2019-09-23T12:47:28","modified_gmt":"2019-09-23T12:47:28","slug":"the-rise-of-natural-language-interfaces-to-databases","status":"publish","type":"post","link":"https:\/\/wp.sigmod.org\/?p=2897","title":{"rendered":"The Rise of Natural Language Interfaces to Databases"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\" style=\"text-align:left\">The\nvision of natural language interfaces to databases (NLIDBs) is to make data\nstores more accessible for a wide range of non-tech savvy end users with the\nultimate goal to talk to a database (almost) like to a human. While initially\nthe database community focused on relational databases, there is currently a\nrenaissance of building natural language interfaces for RDF-triple stores with\nDBPedia as the major playground. In particular, the semantic web as well as the\nnatural language processing communities are very active in this field. The\ntrend of building NLIDBs is triggered mainly by the recent success stories of artificial\nintelligence and in particular deep learning. The main idea is to consider the\ndesign of natural language interfaces as a machine translation problem using a\nsupervised machine learning approach. This trend has the benefit that suddenly\nthe database community has a challenging problem to solve that attracts\nresearchers who are interested in artificial intelligence and deep learning. As\na consequence, the database community gets a new refreshing influx of ideas &#8211; as\nwe discuss throughout this blog post.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this blog post, we will first provide two challenging real-world case studies that motivated us to work on NLIDBs. 
Afterwards, we give an overview of the major design choices for NLIDBs and sketch some of the major open research challenges at the intersection of database research, natural language processing and artificial intelligence. The aim of this blog post is not to provide an exhaustive overview of NLIDB research but to highlight the major challenges, to show how they are currently addressed, and to sketch open research problems that may stimulate further work in this area. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Case Studies <\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\"> Relational Database Technology in Banking<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Major\nglobal banks typically store their most critical data in large data warehouses,\noften in combination with data lakes (Jossen et al., 2012). These data\nwarehouses are fed by thousands of source systems and have tens of thousands of\nattributes. The data sets are analyzed by tens to hundreds of different\nbusiness units. However, business users typically have neither technical knowledge\nof the underlying database or data lake structure nor sufficient knowledge\nof SQL to perform complex analytics tasks by themselves without considerable\nsupport from highly skilled IT professionals or data scientists. One of the\ncritical use cases became apparent\nduring the financial crisis that culminated in 2008, when business users\nneeded to find all customers with exposure to Lehman Brothers above a certain\nlimit. Solving this use case was very hard for the business users since the data\nwas spread across databases and data warehouses. Moreover, financial products\nsuch as derivatives are highly complex, often with non-trivial hierarchical\nstructures that require deep SQL know-how to fully analyze.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It is these types of business users that need to be empowered by NLIDBs. 
One approach to bridging the gap between business and IT personnel is SODA (Search Over Data Warehouse, Blunschi et al., 2012). SODA showed very promising results for keyword queries. However, one of the open challenges in SODA is how to design a visual data exploration interface in which the system guides users by summarizing the most important data items and suggesting which queries to ask.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"> Semantic Web Technology in Biology<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Studying biological phenomena is a very complex problem. Moreover, data stores are even more heterogeneous than traditional business data warehouses. Some data is stored in relational databases while the majority is stored in RDF-triple stores (Sima et al., 2019). Other bioinformatics databases use proprietary data formats. This heterogeneity makes it non-trivial to query across databases, which is an important requirement in biology. Even though there are many biological ontologies and world-wide efforts to standardize such ontologies, data is inherently ambiguous and there is no single truth of data models or ontologies. Unlike in chemistry, where the periodic table consists of 118 uniquely identified chemical elements, there is no common agreement on the naming of genes. Humans have about 20,000 different genes, whose names often conflict. Hence, executing queries means resolving ambiguities interactively and suggesting potential correlations that the end user was not even aware of. We are currently building Bio-SODA (Sima et al., 
2019) as a joint research effort between two universities and a major bioinformatics institute, namely the Swiss Institute of Bioinformatics (SIB) &#8211; the creator of one of the most widely used protein sequence databases, UniProtKB\/Swiss-Prot.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One of the major open issues is how to disambiguate user queries by designing a query dialog that intelligently ranks the query results and interacts with the users without being intrusive. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Major Design Choices <\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We\nwill now discuss the main design choices of different approaches to NLIDBs and\nreference some of the major papers in that area. Note that the list of papers\nis not meant to be exhaustive but rather serves as a starting point for people\nwho want to get into the field of NLIDBs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">According to a recent survey (Affolter et al., 2019), we can distinguish between the following five approaches to designing NLIDBs: (a) Keyword-based, (b) Pattern-based, (c) Parsing-based, (d) Grammar-based, and (e) Neural machine translation-based.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Keyword-Based<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Rather than supporting full natural language queries, these systems typically only allow keyword-based queries. The basic idea of this type of NLIDB is to build an inverted index on the base data and on the metadata, i.e. the database schema. Given a keyword query, the systems try to match the keywords against the inverted index. The inverted index is used to identify which tables contain the requested data. If a query contains several keywords, the result could be several matching tables.  
<br><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Consider the following simplified entity-relationship diagram of a database about movies: <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"http:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure1-1024x576.png\" alt=\"\" class=\"wp-image-2908\" srcset=\"https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure1-1024x576.png 1024w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure1-150x84.png 150w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure1-300x169.png 300w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure1-768x432.png 768w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure1-590x332.png 590w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure1-102x57.png 102w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 1: Simplified ER diagram of a movie database<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Consider\nthe following keyword query: \u201cActor in Beautiful Mind\u201d. In order to answer this\nquery, the first step is to design a query parser that identifies which\nkeywords of the query are contained in the database. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Assume\nthat in this example \u201cActor\u201d is identified as a table. \u201cBeautiful\nMind\u201d is identified as the name of a movie since table \u201cMovie\u201d has a column\ncalled \u201cname\u201d that contains \u201cBeautiful Mind\u201d. The basic idea is to build two\ndifferent inverted indexes that help to solve this problem.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One\ninverted index is built on the base data that describes in which table and\ncolumn a specific keyword is contained. 
The following table shows a few\nexamples of what the index could look like:<\/p>\n\n\n\n<table class=\"wp-block-table has-fixed-layout is-style-regular\"><tbody><tr><td><strong>KEYWORD<\/strong><\/td><td><strong>TABLENAME<\/strong><\/td><td><strong>COLUMNNAME<\/strong><\/td><\/tr><tr><td>Beautiful Mind<\/td><td>Movie<\/td><td>Name<\/td><\/tr><tr><td>Arnold<\/td><td>Actor<\/td><td>FirstName<\/td><\/tr><tr><td>Zurich<\/td><td>Address<\/td><td>City<\/td><\/tr><\/tbody><\/table>\n\n\n\n<p class=\"wp-block-paragraph\">The second inverted index is built on the database schema. In our example above, \u201cActor\u201d is identified as a table, since there exists a table called \u201cActor\u201d. On the other hand, the keyword \u201cname\u201d could be identified as a column of the table \u201cMovie\u201d or of the table \u201cActor\u201d (see example index below). <\/p>\n\n\n\n<table class=\"wp-block-table has-fixed-layout\"><tbody><tr><td><strong>KEYWORD<\/strong><\/td><td><strong>TABLENAME<\/strong><\/td><td><strong>COLUMNNAME<\/strong><\/td><\/tr><tr><td>Actor<\/td><td>Actor<\/td><td><\/td><\/tr><tr><td>Name<\/td><td>Movie<\/td><td>Name<\/td><\/tr><tr><td>Name<\/td><td>Actor<\/td><td>Name<\/td><\/tr><\/tbody><\/table>\n\n\n\n<p class=\"wp-block-paragraph\">In the next step, the relationships between these tables (primary key\/foreign key relationships) are analyzed to identify how these tables can be joined. A common approach is to join the tables along the shortest path of primary key\/foreign key relationships that connects them. After the minimal join paths are identified, the SQL statement can be generated.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Assume that the query parser identifies the tables \u201cActor\u201d and \u201cMovie\u201d. How can these tables be joined automatically? In this case we can take advantage of the entity-relationship diagram, which shows that \u201cActor\u201d and \u201cMovie\u201d can be joined via the relationship \u201cplays\u201d. 
In the database, there exists a primary key\/foreign key relationship between the tables \u201cActor\u201d and \u201cplays\u201d as well as between the tables \u201cMovie\u201d and \u201cplays\u201d. By \u201cchasing\u201d the primary key\/foreign key relationships, we can identify that the tables \u201cActor\u201d and \u201cMovie\u201d can be joined via table \u201cplays\u201d.  <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There are several strategies to improve this basic approach:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">(a) Use natural language processing techniques such as stemming and stop word removal to process the input query.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">(b) Use synonyms or ontologies to enable semantic queries rather than only direct match queries.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">(c) Use different ranking algorithms that, for instance, take into account the \u201cimportance\u201d of tables and relationships to resolve ambiguities when a query results in multiple possible answers.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Examples of keyword-based systems are Pr\u00e9cis (Simitsis et al., 2008), SODA (Blunschi et al., 2012) and Aqqu (Bast and Haussmann, 2015). <\/p>\n\n\n\n<h4 class=\"wp-block-heading\"> Pattern-Based<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Pattern-based\nNLIDBs are extensions of keyword-based systems that can answer more complex\nqueries in natural language rather than only keywords. The basic idea is to be\nable to handle <em>certain language patterns<\/em>\nto identify, for instance, <em>aggregation<\/em>\nqueries. Consider the following query: \u201cShow the number of movies by actor\u201d.\nIn this case, the trigger word \u201cby\u201d specifies the aggregation pattern. The difficulty,\nhowever, is that aggregations could also be formulated with the trigger phrase\n\u201cfor each\u201d. 
Consider the following reformulation of the same question: \u201cShow\nthe number of movies for each actor\u201d.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Other patterns\ncould, for instance, be certain domain-specific <em>concepts<\/em>, such as \u201cgreat movies\u201d or \u201cexpensive productions\u201d. In\nboth cases, these concepts require an explicit definition; for instance, a great\nmovie could be defined as one with a rating of 5 out of 5. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The\nchallenge is how to handle these language patterns and how to properly deal\nwith ambiguities in the queries. One possible solution is to interact with the\nuser and design the system in such a way that the user interactions are\nminimized.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Examples of pattern-based systems are QuestIO (Damljanovic et al., 2008) and NLQ\/A (Zheng et al., 2017). <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Parsing-Based<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">These\nsystems are a further extension of\npattern-based systems. The basic idea is to use a natural language parser to\nanalyze the input query and to reason about the grammatical structure of the\nquery. The grammatical structure can then be used to better understand the\ndependencies between tokens of the query.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Figure 2 shows the part-of-speech (PoS) tags and the dependency tree for the query \u201cWho is the director of Inglorious Basterds\u201d using Stanford CoreNLP. For instance, \u201cWP\u201d refers to a \u201cwh-pronoun\u201d such as \u201cwho\u201d. \u201cVBZ\u201d refers to a verb in third person singular present and \u201cNN\u201d refers to a noun. 
<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"397\" height=\"78\" src=\"http:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure2.png\" alt=\"\" class=\"wp-image-2912\" srcset=\"https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure2.png 397w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure2-150x29.png 150w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure2-300x59.png 300w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure2-102x20.png 102w\" sizes=\"auto, (max-width: 397px) 100vw, 397px\" \/><figcaption>Figure 2: Dependency tree for the query \u201cWho is the director of Inglorious Basterds\u201d<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Consider\nthe following queries: \u201cShow the number of movies by actor\u201d and \u201cMovies by\nSchwarzenegger\u201d. Both queries contain the trigger word \u201cby\u201d. However, the first\nquery is an aggregation while the second one is not. The basic idea of\nparsing-based systems is to analyze the grammatical structure, deduce matching\npatterns and thus disambiguate queries.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Examples of parsing-based systems are NaLIR (Li and Jagadish, 2014) and ATHENA (Saha et al., 2016). <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Grammar-Based<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The\nbasic idea is to use a set of rules, i.e. a grammar, to define how questions\ncan be built, understood and answered by the system. 
These rules can then be\nused to assist the users in typing their queries via autocompletion.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Consider, for example, the following simple grammar:<br><\/p>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\"><strong>Sentence<\/strong> &#8211;&gt; NounPhrase VerbPhrase<\/p>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\"><strong>NounPhrase<\/strong> &#8211;&gt; Determiner Noun<\/p>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\"><strong>Noun<\/strong> &#8211;&gt; \u201cPerson\u201d | \u201cMovie\u201d | \u201cInglorious Basterds\u201d | \u201cThe Girl with the Dragon Tattoo\u201d | \u201cZurich\u201d<\/p>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\"><strong>Determiner<\/strong> &#8211;&gt; \u201cWho\u201d | \u201cWhich\u201d | \u201cWhat\u201d<\/p>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\"><strong>VerbPhrase<\/strong> &#8211;&gt; Verb Noun<\/p>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\"><strong>Verb<\/strong> &#8211;&gt; \u201cdirects\u201d | \u201cplays\u201d | \u201cis filmed\u201d <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The above grammar says that a sentence (S) consists of a noun phrase (NP) followed by a verb phrase (VP). A noun phrase consists of a determiner (Det) followed by a noun (N), etc. The grammar can then be used to identify the syntactic structure of a sentence, e.g. \u201cWhich movie is filmed in Zurich\u201d (see Figure 3). Finally, SQL or SPARQL can be generated by traversing the syntax tree. 
<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"523\" src=\"http:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure3-1024x523.png\" alt=\"\" class=\"wp-image-2915\" srcset=\"https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure3-1024x523.png 1024w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure3-150x77.png 150w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure3-300x153.png 300w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure3-768x393.png 768w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure3-590x302.png 590w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure3-102x52.png 102w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure3.png 1665w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 3: Syntax tree for the sentence \u201cWhich movie is filmed in Zurich\u201d based on a simple grammar.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Examples of grammar-based systems are TR Discover (Song et al., 2015) and SPARKLIS (Ferr\u00e9, 2017). <\/p>\n\n\n\n<h4 class=\"wp-block-heading\"> Neural Machine Translation-Based<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The\nnewest approach to building NLIDBs is neural machine\ntranslation. The basic idea is to apply supervised machine learning techniques to a\nset of question\/answer pairs where the questions are the natural language\nqueries and the answers are the respective SQL or SPARQL statements. For\ntranslating from natural language to SQL or SPARQL, the same techniques can be\napplied as for natural language translation, e.g. from English to French or\nSpanish.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One of the most commonly used methods is a type of recurrent neural network (RNN) called Long Short-Term Memory (LSTM; Hochreiter &amp; Schmidhuber, 1997). 
To apply these neural networks to translating natural language into SQL or SPARQL, the questions and answers need to be transformed into vectors by applying word embedding techniques. These vectors are then used by a bi-directional neural network consisting of an encoder and a decoder. Figure 4 sketches the basic ideas. <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"http:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure4-1024x576.png\" alt=\"\" class=\"wp-image-2917\" srcset=\"https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure4-1024x576.png 1024w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure4-150x84.png 150w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure4-300x169.png 300w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure4-768x432.png 768w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure4-590x332.png 590w, https:\/\/wp.sigmod.org\/wp-content\/uploads\/2019\/06\/Figure4-102x57.png 102w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 4: Recurrent neural network architecture based on Long Short-Term Memory to translate a natural language question to SQL via sequence learning. The left part shows the encoder, while the right part shows the decoder.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Neural machine translation-based systems have been introduced by Iyer et al. (2017), Basik et al. (2018) and Yavuz et al. (2018). First results are very promising. However, the major disadvantage of these types of supervised machine learning solutions is that they typically require a large amount of training data, which is often not available. Hence, some approaches tackle this problem by generating training data. 
The idea is to use existing questions and reformulate them to introduce additional linguistic variations and thus enlarge the available training data. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Conclusions <\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We\nhave studied several approaches to designing NLIDBs. Currently the most\neffective systems are those based on a parsing or grammar approach according to\na recent survey (Affolter et al., 2019). However, neural machine translation\napproaches show very promising results even though they are \u201cnot enterprise ready\nfor real database systems\u201d yet.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The techniques based on neural machine translation are currently very popular \u2013 not only in the database community but also in the semantic web, natural language processing and artificial intelligence communities. On the one hand, these techniques open up exciting new perspectives to make core data management problems visible in other communities. On the other hand, the database community can also attract bright researchers from artificial intelligence and deep learning \u2013 two fields that are currently among the most popular in both academia and industry. In summary, studying NLIDBs is not only an interesting research challenge, but also an excellent way of attracting new talent to tackle one of the open research challenges, namely how to talk to databases (almost) like to humans. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">References<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Affolter, K., Stockinger, K., &amp; Bernstein, A. (2019). A comparative survey of recent natural language interfaces for databases. <em>The VLDB Journal<\/em>. <a href=\"https:\/\/doi.org\/10.1007\/s00778-019-00567-8\">https:\/\/doi.org\/10.1007\/s00778-019-00567-8<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Basik, F., H\u00e4ttasch, B., Ilkhechi, A., Usta, A., Ramaswamy, S., Utama, P., &amp; Cetintemel, U. (2018, May). 
DBPal: A Learned NL-Interface for Databases. In <em>Proceedings of the 2018 International Conference on Management of Data<\/em> (pp. 1765-1768). ACM.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Bast, H., &amp; Haussmann, E. (2015,\nOctober). More accurate question answering on freebase. In <em>Proceedings of\nthe 24th ACM International on Conference on Information and Knowledge\nManagement<\/em> (pp. 1431-1440). ACM.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Blunschi,\nL., Jossen, C., Kossmann, D., Mori, M., &amp; Stockinger, K. (2012). Soda: Generating sql for business users. <em>Proceedings of\nthe VLDB Endowment<\/em>, <em>5<\/em>(10), 932-943.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Damljanovic, D., Agatonovic, M.,\n&amp; Cunningham, H. (2010, May). Natural language interfaces to ontologies:\nCombining syntactic analysis and ontology-based lookup through the user interaction.\nIn <em>Extended Semantic Web Conference<\/em> (pp. 106-120). Springer, Berlin,\nHeidelberg.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Ferr\u00e9, S. (2017). Sparklis: an\nexpressive query builder for SPARQL endpoints with guidance in natural\nlanguage. <em>Semantic Web<\/em>, <em>8<\/em>(3), 405-418.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Hochreiter, S., &amp; Schmidhuber,\nJ. (1997). Long short-term memory. <em>Neural computation<\/em>, <em>9<\/em>(8),\n1735-1780.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Iyer, S., Konstas, I., Cheung, A., Krishnamurthy,\nJ., &amp; Zettlemoyer, L. (2017, April). Learning a Neural Semantic Parser from\nUser Feedback. In <em>55th Annual Meeting of the Association for Computational\nLinguistics<\/em>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Jossen, C.,\nBlunschi, L., Mori, M., Kossmann, D., &amp; Stockinger, K. (2012, April). The credit suisse meta-data warehouse. In <em>2012 IEEE 28th\nInternational Conference on Data Engineering<\/em> (pp. 1382-1393). 
IEEE.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Kaufmann, E., Bernstein, A., &amp;\nFischer, L. (2007, June). NLP-Reduce: A naive but domainindependent natural\nlanguage interface for querying ontologies. In <em>4th European Semantic Web\nConference ESWC<\/em> (pp. 1-2).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Li, F., &amp; Jagadish, H. V.\n(2014). Constructing an interactive natural language interface for relational\ndatabases. <em>Proceedings of the VLDB Endowment<\/em>, <em>8<\/em>(1), 73-84.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Saha, D., Floratou, A.,\nSankaranarayanan, K., Minhas, U. F., Mittal, A. R., &amp; \u00d6zcan, F. (2016).\nATHENA: an ontology-driven system for natural language querying over relational\ndata stores. <em>Proceedings of the VLDB Endowment<\/em>, <em>9<\/em>(12), 1209-1220.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Sima, A. C., Stockinger, K., de\nFarias, T. M., &amp; Gil, M., Semantic Integration and Enrichment of\nHeterogeneous Biological Databases, To appear in <em>Evolutionary Genomics: Statistical and Computational Methods<\/em>, 2nd Edition,\nSpringer<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Simitsis, A., Koutrika, G., &amp;\nIoannidis, Y. (2008). Pr\u00e9cis: from unstructured keywords as queries to\nstructured databases as answers. <em>The VLDB Journal\u2014The International Journal\non Very Large Data Bases<\/em>, <em>17<\/em>(1), 117-149.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Song, D.,\nSchilder, F., Smiley, C., Brew, C., Zielund, T., Bretz, H., &#8230; &amp; Harrison, J. (2015, October). TR discover: A natural\nlanguage interface for querying and analyzing interlinked datasets. In <em>International\nSemantic Web Conference<\/em> (pp. 21-37). Springer, Cham.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Zheng, W., Cheng, H., Zou, L., Yu, J. X., &amp; Zhao, K. (2017, November). Natural language question\/answering: let users talk with the knowledge graph. 
In <em>Proceedings of the 2017 ACM on Conference on Information and Knowledge Management<\/em> (pp. 217-226). ACM.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Blogger Profile<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Prof. Dr. <\/em><a href=\"http:\/\/www.zhaw.ch\/en\/about-us\/person\/stog\"><em>Kurt Stockinger <\/em><\/a><em>is Professor of Computer Science, Director of Studies in Data Science at Zurich University of Applied Sciences (ZHAW) and Deputy Head of the <\/em><a href=\"http:\/\/www.zhaw.ch\/datalab\"><em>ZHAW Datalab<\/em><\/a><em>. His research focuses on Data Science with emphasis on Big Data, data warehousing, business intelligence, advanced analytics and natural language query processing on knowledge bases. He is also on the Advisory Board of Callista Group AG.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Previously, Kurt Stockinger worked at Credit Suisse in Zurich, Switzerland, at Lawrence Berkeley National Laboratory in Berkeley, California, at California Institute of Technology, California as well as at CERN in Geneva, Switzerland. He holds a Ph.D. in computer science from CERN \/ University of Vienna.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"> Copyright @ 2019, Kurt Stockinger, All rights reserved. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>The vision of natural language interfaces to databases (NLIDBs) is to make data stores more accessible for a wide range of non-tech savvy end users with the ultimate goal to talk to a database (almost) like to a human. 
While initially the database community focused on relational databases, there is currently a renaissance of building [&hellip;]<\/p>\n","protected":false},"author":67,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20,16,14],"tags":[],"coauthors":[],"class_list":["post-2897","post","type-post","status-publish","format-standard","hentry","category-data-exploration","category-databases","category-search"],"views":6814,"_links":{"self":[{"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=\/wp\/v2\/posts\/2897","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=\/wp\/v2\/users\/67"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2897"}],"version-history":[{"count":18,"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=\/wp\/v2\/posts\/2897\/revisions"}],"predecessor-version":[{"id":2928,"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=\/wp\/v2\/posts\/2897\/revisions\/2928"}],"wp:attachment":[{"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2897"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2897"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2897"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/wp.sigmod.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcoauthors&post=2897"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}