Kian-Lee Tan

Data Management For The Metaverse


In 2009, we wrote an article highlighting some database challenges in a co-space environment [1]. In such an environment, the physical space and the digital space co-exist in a “universe” and applications can manipulate the data flow within and across the two spaces. Thirteen years have since passed and progress on co-space research has been (very) slow. However, this may be about to change. First, the buzzword for “co-space” is now the “metaverse”. The metaverse is still at an early stage of development and is still finding its identity. Second, there has been tremendous progress on technologies that can potentially support metaverse research: hardware accelerators like GPUs and FPGAs can deliver the horsepower needed for high-performance processing in the metaverse; novel programmable network architectures and IoT over 5G networks (with 6G soon to come) can offer the lower latency and faster download speeds needed for remote access and data exchange between the two spaces; advances in immersive technologies (e.g., augmented reality and virtual reality) can enable real-time, immersive and interactive end-user experiences. Third, AI/ML can unlock the potential of the metaverse through intelligent avatars/digital twins, natural language understanding that facilitates human-machine interaction, and learning and drawing insights from data. At the same time, blockchain technologies can guarantee safe, secure and protected transactions. Fourth, there is a significant amount of investment and interest from the tech industry, including Meta, Microsoft and Google. It has been estimated that the metaverse is an $800 billion market opportunity [2].

In the metaverse, we can design innovative applications that provide experiences and opportunities that neither the physical nor the virtual space alone can offer. Example applications include shopping partnerships between online and physical shoppers, an enhanced digital model that captures physical troop movement, location-based games and social networking. Clearly, the metaverse will encompass large amounts of data flowing between the physical and digital spaces to keep the real and virtual worlds synchronized. Given that these applications are data-driven, and the potential size of the data that can be generated is enormous, we believe that the database community has much to offer to drive the growth of this field.

A Use Case: Marketplace in the Metaverse

Today’s shoppers may visit a mall onsite or buy products online. In tomorrow’s metaverse-oriented marketplace, a physical mall can be “expanded” into a virtual mall that houses many more shops than the physical one. Shoppers (online or onsite) may enjoy a richer shopping experience through immersive technologies: they can be “teleported” to different shops instantly; they can browse and handle 3D models of products with augmented product information; they can easily find shops (within the same physical mall if they are physically onsite) selling similar products and possibly compare their prices/quality; their avatars (mini digital models of themselves in fitting dimensions) can easily find shirts/dresses that fit them exactly, etc. The virtual mall needs to be kept up-to-date with real-time information from the physical mall, e.g., live programs that are happening in the physical mall, on-going lucky draws, updates on availability of products, etc. In addition, the cyber and physical shoppers can interact with one another. Socially connected “friends” in the social networks (direct or through common friends) who happen to be in the same shop/mall (e.g., one in the physical space and the other in the virtual space) may communicate and benefit from discounts (e.g., for a “buy two for the price of one” offer, each can buy one while sharing the cost) or share their opinions on the products with each other.

We observe that the metaverse generates large amounts of data from many different data sources and sensing devices. In addition, a large amount of data may have to be streamed from one space to another, particularly from the physical to the virtual, to ensure real-time tracking of the environment. The metaverse will also produce a large number of events which may trigger further actions/events in both the physical and virtual worlds. Finally, a large number of users will interact with the metaverse, and their actions correspond to data/queries on the metaverse. Each user device effectively contributes a node to a highly distributed environment. Moreover, such a marketplace offers retail analysts a rich source of social and behavioral data as well as shoppers’ product preferences. Clearly, our community has been dealing with a wide range of relevant research problems: sensor networks, data streams, distributed databases, update-intensive operations, search and data retrieval. As such, our experience will enable us to contribute to this new multi-disciplinary field and to chart the research directions ahead. Here, we shall highlight some challenges in designing database engines for the metaverse (using the marketplace as our running example); interested readers may refer to [1] for more challenges and issues (e.g., data fusion over heterogeneous data sources, distributed architectures, buffer management, data consistency, and data privacy).

Storage Manager

For the metaverse, we have data in diverse formats. In the marketplace, products are associated with structured data (e.g., item name, cost, quantity available, and so on). In addition, products may come with images/videos, textual descriptions and 3D models. While it makes sense to manage different types of data separately (i.e., structured data may be stored using a relational DBMS, while multimedia data are managed by a multimedia database), it is less clear whether data from the physical and virtual spaces should be handled differently. For example, should the locations of onsite shoppers be stored differently from those of virtual shoppers? How about products of physical malls/shops versus products of virtual shops? On the one hand, we can simply tag data to reflect the space it belongs to. This offers a unified view of the metaverse and simplifies data management and processing.

On the other hand, it may be beneficial to organize the data from the two spaces separately, e.g., performance is likely to improve in situations where only data from a single space is required. A hybrid strategy may also be preferable. For example, structured data may be stored together while location information is managed separately (so that promotions tailored to onsite shoppers can be disseminated to them without spamming the online shoppers). It would also be interesting to investigate how recent storage designs, such as row or column stores and self-organizing storage, can be exploited for metaverse applications. We can expect different storage engines to be used for different data types.
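To make the trade-off between the two layouts concrete, here is a minimal in-memory sketch. The class names (TaggedStore, PartitionedStore), the record type and the sample shoppers are all hypothetical and chosen only for illustration; a real engine would involve persistent storage, indexes and much more.

```python
from dataclasses import dataclass
from enum import Enum


class Space(Enum):
    PHYSICAL = "physical"
    VIRTUAL = "virtual"


@dataclass(frozen=True)
class ShopperLocation:
    shopper_id: str
    shop: str
    space: Space  # tag recording which space the record belongs to


class TaggedStore:
    """Unified layout: one collection, every record carries a space tag."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(row)

    def scan(self, space=None):
        # A single-space query still has to filter the whole collection.
        return [r for r in self._rows if space is None or r.space == space]


class PartitionedStore:
    """Separate layout: one collection per space."""

    def __init__(self):
        self._parts = {space: [] for space in Space}

    def insert(self, row):
        self._parts[row.space].append(row)

    def scan(self, space=None):
        if space is not None:
            # A single-space query touches only its own partition.
            return list(self._parts[space])
        # A cross-space query has to merge the partitions.
        return [r for part in self._parts.values() for r in part]


# Hypothetical sample data.
rows = [
    ShopperLocation("alice", "bookshop", Space.PHYSICAL),
    ShopperLocation("bob", "bookshop", Space.VIRTUAL),
    ShopperLocation("carol", "cafe", Space.PHYSICAL),
]
tagged, partitioned = TaggedStore(), PartitionedStore()
for row in rows:
    tagged.insert(row)
    partitioned.insert(row)

# Onsite-only promotion: both layouts return the same shoppers.
print([r.shopper_id for r in partitioned.scan(Space.PHYSICAL)])
```

Both stores answer the same queries; they differ only in how much data a single-space or cross-space request has to touch, which is the performance question raised above.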

Another closely related issue is how best to fuse data from these diverse heterogeneous sources. While this is similar to data integration in traditional heterogeneous databases, metaverse data management requires more complex logical inference over the data sources. Recent work on polyglot data stores/management offers a good starting point here [3].

Query Processing and Optimization

We can expect query processing and optimization to be more complex in the metaverse. First, new operators may have to be introduced. For example, Jane, who is considering renovating her dining room, may walk into a furniture showroom (physical or online). She may like the displayed dining room furniture set (comprising dining table, chairs, chandelier, wall frames and pictures, etc.). However, she would also like to get a sense of what the dining room would look like if some of the items were swapped. In addition, she would not want to exceed her budget. She could go on a virtual tour of the showroom and issue an exploratory-type query that returns new display sets within her budget. She could iteratively refine the results by fixing the items that she wants and continuing to explore the other items. Here, a variant of the preference query over a set of objects needs to be designed.
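As a rough illustration of what such an exploratory query might compute, the sketch below enumerates display sets (one item per category) that stay within a budget, optionally with some items pinned by the shopper, and ranks them by a total preference score. The catalogue, prices, scores and the function name display_sets are all made up for this example, and the brute-force enumeration is only a stand-in: designing an efficient operator for this is exactly the open problem.

```python
from itertools import product

# Hypothetical catalogue: category -> list of (item name, price, preference score).
CATALOGUE = {
    "table": [("oak table", 900, 9), ("pine table", 500, 6)],
    "chairs": [("leather chairs", 600, 8), ("wooden chairs", 300, 5)],
    "chandelier": [("crystal chandelier", 700, 9), ("simple pendant", 200, 4)],
}


def display_sets(catalogue, budget, fixed=None, top_k=3):
    """Return up to top_k display sets within budget, best total score first.

    fixed maps a category to the item name the shopper has already settled on.
    Each result is (total score, total cost, item names in category order).
    """
    fixed = fixed or {}
    categories = sorted(catalogue)
    # Restrict each category to the pinned item, if any.
    choices = [
        [item for item in catalogue[c] if c not in fixed or item[0] == fixed[c]]
        for c in categories
    ]
    results = []
    for combo in product(*choices):
        cost = sum(price for _, price, _ in combo)
        if cost <= budget:
            score = sum(s for _, _, s in combo)
            names = tuple(name for name, _, _ in combo)
            results.append((score, cost, names))
    # Higher score first; the cheaper set wins ties.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:top_k]


# First exploration: anything within the budget.
for result in display_sets(CATALOGUE, budget=1800):
    print(result)

# Refinement: Jane fixes the oak table and explores the remaining items.
for result in display_sets(CATALOGUE, budget=1800, fixed={"table": "oak table"}):
    print(result)
```

Calling the function again with a larger fixed mapping mimics the iterative refinement described above: each round narrows the search space while the budget constraint stays in force.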

Figure 1. Amazon already provides a “simple” virtual showroom that allows users to “mix-and-match” according to their liking. However, the engine is static, and products can only be swapped manually, i.e., the user selects the items to swap out and their replacements. The figure on the right shows 5 items replaced from the figure on the left. The future metaverse-based showroom will offer a better shopper experience.

As another example, for performance reasons, query processing/optimization algorithms have to be context-aware. Several flavors of awareness have to be considered. The algorithm may need to be space-aware. For example, a navigation query from point A to point B in the physical space may require a detailed map and directions to be provided; for a virtual user, it may be sufficient to simply provide a list of shops along the way as the shopper speedily walks through the virtual environment.

The algorithm also has to be device-aware. Depending on the device used (e.g., mobile phone, PC, 3D VR headgear), a feasible and optimal plan tailored to the device can be generated. Yet another notion is that of visibility-awareness. Depending on the viewpoint of the user, only objects that are visible need to be accessed. Moreover, objects further away from the viewpoint may be approximated by coarser representations, while those nearer may require higher resolution to enhance the user's immersive experience. All these context-aware scenarios call for new forms of approximation operators to be developed.
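The visibility-aware idea can be sketched in a few lines: skip objects that are not visible, and pick a resolution level for the rest based on distance from the viewpoint. Everything specific here is an assumption made for illustration: the 2D coordinates, the precomputed visible flag, the distance thresholds and the three level names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    name: str
    x: float
    y: float
    visible: bool  # assumed to be precomputed by some visibility test


# Hypothetical distance thresholds mapped to resolution levels.
LOD_THRESHOLDS = [(5.0, "high"), (20.0, "medium")]
FALLBACK_LOD = "low"


def choose_lod(distance):
    """Pick the finest level whose distance limit covers the object."""
    for limit, level in LOD_THRESHOLDS:
        if distance <= limit:
            return level
    return FALLBACK_LOD


def plan_fetch(viewpoint, objects):
    """Return (name, level) pairs for the objects worth fetching."""
    vx, vy = viewpoint
    plan = []
    for obj in objects:
        if not obj.visible:
            continue  # occluded objects are not accessed at all
        distance = math.hypot(obj.x - vx, obj.y - vy)
        plan.append((obj.name, choose_lod(distance)))
    return plan


scene = [
    SceneObject("sofa", 3.0, 4.0, True),      # distance 5 from the origin
    SceneObject("lamp", 12.0, 9.0, True),     # distance 15
    SceneObject("poster", 30.0, 40.0, True),  # distance 50
    SceneObject("shelf", 1.0, 1.0, False),    # occluded
]
print(plan_fetch((0.0, 0.0), scene))
```

A device-aware variant could use a different threshold table per device class, for example coarser levels on a mobile phone than on a VR headset.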

From the mall/shop owner's perspective, besides traditional analytics (on each of the spaces) that seek to provide insights on mall visitors, capture rates, proximity traffic, path analytics, sales conversion, etc., there may be a need to integrate findings from onsite and online visitors. Are both kinds of visitors influenced in the same ways by recommendations? Can we adapt and apply effective measures from one space to the other? How can we exploit the actual locations of online/onsite shoppers more effectively? How do we manage users who are onsite and online at the same time?


As mentioned, the metaverse offers a wide diversity of data. To manage this, we may need novel indexing methods to support efficient similarity search of 3D objects (which are represented as 3D models comprising tens of thousands of polygons). This is becoming increasingly important as the metaverse not only manages complex 3D data objects but virtual users are expected to interact with and handle such objects.

Another aspect that has not received much attention is the visual fidelity of virtual environments. In particular, some background objects may be occluded while others may not need to be streamed in high resolution. One direction to explore is to design data structures that can support a hierarchy of coarse-grained and fine-grained resolutions. An example is the HDoV-tree [4], which captures objects at fine granularity at the leaf level but represents collections of objects at the internal levels in coarser forms. In this way, it can be tuned to trade off visual fidelity and performance based on the degree of visibility of objects.

However, this structure is built statically with a fixed granularity at each level, and incurs high computation overhead. In the metaverse, we may need a more robust and dynamic structure. For example, it may be possible to build an HDoV-tree that captures the models at all levels in high resolution but dynamically transforms each model into an appropriate resolution for transmission in real time. In addition, to cater to frequent updates of information, we need more flexible schemes that can handle update-intensive applications and frequently changing scenes.
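The general idea of a resolution hierarchy driven by degree of visibility can be sketched as follows. This is a simplified illustration only and not the published HDoV-tree structure or algorithm from [4]: the Node fields, the per-node visibility values (assumed to be given for the current viewpoint), the threshold and the polygon counts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    polygons: int      # size of this node's own representation
    visibility: float  # assumed degree of visibility in [0, 1]
    children: list = field(default_factory=list)


def select(node, threshold):
    """Return the representations to stream for the subtree rooted at node."""
    if node.visibility == 0.0:
        return []  # fully occluded: stream nothing
    if not node.children or node.visibility < threshold:
        # A leaf is streamed as-is; a barely visible internal node is
        # replaced by its own coarse representation of the whole subtree.
        return [node]
    chosen = []
    for child in node.children:
        chosen.extend(select(child, threshold))
    return chosen


# Hypothetical scene: internal nodes hold coarse stand-ins for their children.
mall = Node("mall", 500, 0.9, [
    Node("shop A", 2000, 0.8, [
        Node("table", 40000, 0.9),
        Node("chair", 25000, 0.6),
    ]),
    Node("shop B", 1500, 0.2, [
        Node("sofa", 60000, 0.2),
        Node("lamp", 8000, 0.1),
    ]),
    Node("shop C", 1800, 0.0, [
        Node("shelf", 30000, 0.0),
    ]),
])

picked = select(mall, threshold=0.5)
print([n.name for n in picked])
print(sum(n.polygons for n in picked))
```

Raising the threshold makes the traversal stop higher in the tree, so fewer polygons are streamed at lower fidelity; lowering it does the opposite. In this sketch the coarse representations are stored statically at each node, which is the limitation discussed above; a dynamic variant would derive them on demand.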


Advances in technology (fueled by the pandemic) have changed the way we live. Many of the things that we do in the physical space (e.g., shopping, education, etc.) can also be replicated in the digital world. We believe the co-existence of these two spaces, not as independent entities but as an integrated universe, will seamlessly enhance the immersive user experience in the metaverse. The metaverse is still in its infancy. For it to be successful, researchers from various disciplines (multimedia, visualization, networking, hardware and databases) must come together. This is a good opportunity for the database community to work with other research communities to tackle this multi-disciplinary challenge. In this discussion, we have largely focused on designing a database engine for managing data for real-time processing. With the metaverse, we can also travel back in time. We may be able to visit a historical site and virtually experience an event that transpired in the past. Perhaps, by 2030, we will experience the world of the metaverse as end-users and be brought “back to the future”!

Blogger Profile

Kian-Lee Tan is Tan Sri Runme Shaw Senior Professor of Computer Science at the School of Computing, National University of Singapore (NUS). He received his Ph.D. in computer science in 1994 from NUS. His research interests include query processing and optimization in advanced database systems, database performance and data analytics. Kian-Lee has served as co-EIC of the VLDB Journal (2009-2015), and member of the VLDB Endowment Board (2013-2017) and PVLDB Advisory Committee (2014-2017). Kian-Lee has also co-chaired the TPC of VLDB’2010 and ICDE’2011. Kian-Lee was a 2013 IEEE Technical Achievement Award recipient for his contributions to advanced query processing.


[1] B.C. Ooi, K.L. Tan, A. K. H. Tung: Sense the physical, walkthrough the virtual, manage the co (existing) spaces: a database perspective. SIGMOD Rec. 38(3): 5-10 (2009). (Look out for an updated version in arXiv soon.)


[3] D. Glake, F. Kiehn, M. Schmidt, F. Panse, N. Ritter: Towards Polyglot Data Stores — Overview and Open Research Questions. arXiv:2204.05779 (2022).

[4] L. Shou, Z. Huang, K.L. Tan: HDoV-tree: The Structure, The Storage, The Speed. ICDE 2003: 557-568.

Copyright @ 2022, Kian-Lee Tan, All rights reserved.