VAiRoma: Visual Interface for Spatial/Temporal Analysis of Big Data


Eric Sauda, Wenwen Dou, Isaac Cho, Bill Ribarsky

https://vairoma.uncc.edu/

At UNC Charlotte, the Digital Arts Center in the College of Arts + Architecture and the Charlotte Visualization Center in the College of Computing and Informatics have created a prototype of VAiRoma (Visual Analytic interface for Roma), a visual analytics system based on space, time, and topic. It focuses on the 3,000-year history of Rome in the broadest sense, encompassing the Roman Republic and the Empire, through to the Renaissance and the formation of the modern state with Rome as the seat of the worldwide Catholic Church. This history is placed in the context of external events and civilizations that affected the history of Rome. Our work is unique in applying interactive visual analytic techniques to the humanities and in incorporating geospatial data into existing work on temporal and topical data. It is also unique in its ability to take hundreds of thousands of individually rich but collectively unstructured documents and organize them into a coherent narrative that can be explored for new insights and unexpected relationships.

Initially we used Wikipedia as a data source, since we could download all of its articles (nearly 5 million in English). We assembled a subset of 189,000 articles, selected based on whether they contained the words “Rome”, “Roma”, or “Roman”. Beyond this initial step, there was no hands-on manipulation of the collection. All subsequent analyses were automated, including topic modeling and entity extraction (extraction of dates and named entities, such as locations). Thus the formation of the topics that were interpreted to create a long-timeline narrative was driven solely by the textual patterns in the collection, without human guidance. Our topic modeling approach is based on Latent Dirichlet Allocation (LDA), a probabilistic machine learning approach that reveals the topics embedded in a text collection by generating a distinct, ranked set of keywords for each topic. Each document in the collection is then categorized by its main topics. We have significantly extended LDA by developing a scalable version, a hierarchical topic structure, and methods for revealing changing topic behavior over time.
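To make the selection and date-extraction steps concrete, here is a minimal sketch in Python. It is not the VAiRoma pipeline: the whole-word, case-sensitive matching rule, the year pattern, and the function names mentions_rome and extract_years are all assumptions made for illustration, and the actual system's entity extraction is certainly more sophisticated.

```python
import re
from typing import List

# Assumption: whole-word, case-sensitive matching. The post only says the
# articles "contained the words" Rome, Roma, or Roman.
KEYWORDS = re.compile(r"\b(?:Rome|Roma|Roman)\b")

# Hypothetical pattern for explicit years such as "44 BC" or "476 A.D."
YEAR = re.compile(r"\b(\d{1,4})\s*(B\.?C\.?|A\.?D\.?)(?!\w)")


def mentions_rome(text: str) -> bool:
    """Return True if the text contains one of the three selection keywords."""
    return KEYWORDS.search(text) is not None


def extract_years(text: str) -> List[int]:
    """Return signed years in order of appearance: negative for B.C., positive for A.D."""
    years = []
    for number, era in YEAR.findall(text):
        value = int(number)
        years.append(-value if era.upper().startswith("B") else value)
    return years


if __name__ == "__main__":
    sample = "The Roman Republic lasted from 509 BC until 27 B.C."
    print(mentions_rome(sample), extract_years(sample))
```

Under these assumptions an article about Romania would not be selected, because "Romania" is not a whole-word match for any of the three keywords. A looser substring match would pull in many more articles, so the choice of rule affects the size of the resulting subset.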

The Wikipedia collection is comprehensive if not complete. We have found that it gives a good overview of Roman history, but it is not very deep or authoritative. We are now adding a much more scholarly collection. Through an agreement with JSTOR, we have obtained access to the complete database, which contains full-text articles from nearly 2,000 scholarly journals, some of them spanning hundreds of years. Using the same initial keywords as above, we have selected a corpus of 800,000 articles to which we will apply our topic modeling and entity extraction approaches.

We are seeking scholarly partners both to help support the continued development of this system and to demonstrate its utility in exploring and understanding important and very extensive scholarly archives.

Visual Analytics is the field of analytical reasoning facilitated by interactive visual interfaces. Its important features include the ability to deal with high-dimensional data sets, to present information visually, and to allow users to interact with this information, building knowledge and decision-making capability. Visual analytics’ fundamental premise is that analysis is better undertaken as a symbiosis of the computational power of computers with the pattern and sense-making capacity of human users. It is this premise that guides the VAiRoma project.

How, then, can visual analytics enhance archival research in the humanities? The potential utility of the VAiRoma system is revealed by a use case drawn from the ongoing scholarly research of our colleague, Professor Jeffrey Balmer.

This research focuses on the interior iconography of Bernini’s Sant’Andrea al Quirinale church in Rome. There are ample primary source materials available on this building. Balmer began his research by examining Bernini’s entire oeuvre, attempting to place Sant’Andrea in that context, or others: Roman churches undertaken in the 1650s; those constructed in the Quirinal district during the 17th century; or perhaps churches dedicated to Saint Andrew, or those commissioned by the Jesuit Order – all within narrow parameters of time and place, to allow for an exhaustive examination of all relevant archival materials.

Such parameters are normative for traditional archival scholarship. There are, however, broader contexts of interest. Sant’Andrea al Quirinale might be studied in the context of a history of the Jesuit Order, the Roman Baroque, or the liturgical and aesthetic priorities of both patrons and designers with the onset of the Counter-Reformation. Wider yet, the church might be placed within broader histories of painting, sculpture, architecture or urban design. Traditionally, as the frame of reference grows wider, so too does the scope of work: exhaustive command of a subject is demanded of scholarship. Understandably, such efforts remain the province of those special individuals with either super-human stamina or rare access to the support (financial and temporal) required for such work. Even then, such works are surveys that adopt a measure of superficiality as a necessary premise. The pitfalls of such attempts are well documented: scholars may base their conclusions on a necessary reliance on secondary sources, which risk being biased, flawed, or merely incomplete.

The proposed VAiRoma system could track the history not just of a single building but of entire districts in Rome, while preserving Balmer’s direct access to primary sources and to deep, highly relevant (non-summary) secondary sources from across scholarly areas. For example, the Quirinal district witnessed widespread construction throughout the 17th century, driven by the expansion of institutions affiliated with the Church in the wake of the Counter-Reformation. The VAiRoma system would give Balmer the means not only to track construction records for Sant’Andrea, but also to see when and where specific contractors, plasterers, gilders, or mosaicists were active on other projects. Such minutiae, presently isolated into discrete packets of archival particulars, could, like the tesserae of a mosaic, be arranged and re-arranged at will through the capacities that the VAiRoma system would provide researchers.

VAiRoma has already proved to be a useful tool when employed by non-experts and students, who are a better fit for the initial Wikipedia collection. We have developed a complete VAiRoma interface for use by these groups, as described in the attached paper. The interface has been evaluated in multiple user studies and case studies. For example, a group of students who spent a summer session in Rome studying the piazzas were introduced to VAiRoma on their return. After 15 minutes of preparation, they were able to use VAiRoma to find multiple additional Wikipedia articles on the piazzas (about 4 relevant articles per student), even though they had already been collecting articles for a paper on Roman piazzas. All the students rated the interface highly for ease of use, understandability, and superiority to standard searches. We have since deployed the interface to other student classes with similar results.

In a case study, we gave a small group of faculty members and graduate students access to VAiRoma and the Wikipedia collection. We asked them to construct a timeline, with references, of the history of ancient Rome over the period from 500 B.C. to 500 A.D., which encompasses the Roman Republic and Empire. These users had no expertise in the history of ancient Rome; they were not historians. After 15-20 minutes of training, members of the group began exploring for main events, guided by the automatically generated topics, event peaks in the timeline, and geographic patterns extracted from the locations in the documents. Each participant focused on a 250-year period, though some studied wider time ranges. After a few hours, they had together collected about 200 of the most relevant references describing main events. These main events matched a high-level timeline compiled by historians. This result shows how the VAiRoma interface lets one quickly and effectively filter a very large number of initial documents (189,000) down to a small number of the most relevant ones, which can then be placed in an interrelated knowledge structure. Furthermore, the Rome timeline that was created went beyond a narrow Roman history. For example, participants established a sub-timeline concerning the development of the Christian church, especially in the Mediterranean area, and the concurrent development of the Jewish Rabbinical tradition. They even discovered the connection between the Roman Empire and the Indian and Chinese civilizations via trade routes, including the overland and sea-borne Silk Road paths. An interesting but unexpected connection was established between the Roman Empire and the Han Dynasty in China (a concurrent empire with similarities to the Roman Empire).
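The "event peaks in the timeline" that guided participants can be illustrated with a small Python sketch. It assumes that the years mentioned in the documents have already been extracted as signed integers (negative for B.C.), and it uses a deliberately simple rule: a bin is a peak if it holds at least a minimum number of mentions and strictly more than either neighbouring bin. The function names, the 50-year bin size, and the peak rule are all hypothetical choices for illustration, not a description of how VAiRoma computes its timeline.

```python
from collections import Counter
from typing import Dict, Iterable, List


def bin_counts(years: Iterable[int], bin_size: int = 50) -> Dict[int, int]:
    """Count year mentions per bin; each bin is keyed by its starting year."""
    # Floor division keeps B.C. years in the right bin: -44 // 50 * 50 == -50.
    return dict(Counter((year // bin_size) * bin_size for year in years))


def find_peaks(counts: Dict[int, int], bin_size: int = 50, min_count: int = 2) -> List[int]:
    """Return bins with at least min_count mentions and more than both neighbours."""
    peaks = []
    for start, count in sorted(counts.items()):
        left = counts.get(start - bin_size, 0)    # empty bins count as zero
        right = counts.get(start + bin_size, 0)
        if count >= min_count and count > left and count > right:
            peaks.append(start)
    return peaks


if __name__ == "__main__":
    # Toy input standing in for years extracted from a document collection.
    mentions = [-44, -44, -27, 64, 79, 79, 79, 117, 476]
    print(find_peaks(bin_counts(mentions)))
```

A user would then open the documents that fall inside a peak bin and check whether they describe a common event. The choice of bin size matters: wide bins merge separate events, while narrow bins fragment a single event into several small bumps.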

Another important advantage of VAiRoma is that it is now available as a Web service. The interface has been placed on a robust Web server and can be accessed from a range of devices, including desktop, laptop, and tablet computers. (Access via smartphones is also possible and has been provided for our other Web service-based tools.) A scalable cloud computing backend infrastructure has also been developed so that the full database of analysis products, plus (where we are permitted to share them) the full texts of the articles in the collection, is available for fast, interactive use.