Finished Theses

Bachelor's Theses

d3-textVis: Text Visualization Techniques Based on JavaScript

Status: Finished 2024 (thesis report in Swedish)

Grading: Bachelor's Project for Several Students (TNM094)

Area: Information Visualization

Supervisor(s): Dr. Kostiantyn Kucher

Active Student(s): Isak Karlsson, Daniel Laesker, Berkay Orhan, and Alma Linder

Content and Tasks:

The term "text visualization" is typically used for information visualization techniques that in some cases focus on raw textual data, in other cases on results of text mining algorithms [1]. There are multiple concerns related to design, implementation, and evaluation of text visualization approaches within academic research and applications [2], and while more specialized problems require custom visual representations and techniques, there is a need to develop and maintain basic functionality for text visualization that could be used as building blocks for a variety of projects.
One of the arguably best options for such common blocks is the widely used library D3.js [3–4], which provides a straightforward API for developing plugins [5]. The main aim of this thesis project is, thus, to develop a number of (arguably) generic text visualization techniques as D3 plugins. Generalizability must be considered here, e.g., by providing a function that splits an input text string into tokens and represents them as a list with D3 while ensuring support for languages beyond English. The list of thesis project objectives includes the following:

Propose designs of several basic text visualization techniques based on the existing literature, tools, and generalizability considerations.
Design and implement the respective approaches as D3.js library plugins.
Prepare software documentation.
Implement basic demos of individual techniques as well as a larger demo with several interacting techniques (including text data in several languages, genres, and document lengths).
Make the source code, documentation, and demos available via an open source repository.
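
The tokenization example mentioned in the project description (splitting an input string into tokens while supporting languages beyond English) could be sketched roughly as follows. This is only an illustration in Python, not the thesis implementation, which targets JavaScript and D3. The function name `tokenize` and the token pattern are assumptions made here. The sketch relies on whitespace and punctuation as delimiters, so scripts written without spaces between words (e.g., Chinese or Japanese) would need a dedicated segmenter instead.

```python
import re
import unicodedata

# Unicode-aware word pattern: in Python 3, \w matches letters and digits of any script.
# An apostrophe or hyphen is allowed inside a token (e.g. "don't", "well-known").
_TOKEN_PATTERN = re.compile(r"\w+(?:['’\-]\w+)*")


def tokenize(text: str) -> list[str]:
    """Split a text string into word tokens (hypothetical helper, not part of D3)."""
    # Normalize to NFC so that composed and decomposed forms of accented
    # characters yield identical tokens.
    normalized = unicodedata.normalize("NFC", text)
    return _TOKEN_PATTERN.findall(normalized)


print(tokenize("Textvisualisering är roligt, eller hur?"))
print(tokenize("Візуалізація тексту, це цікаво!"))
```

Normalizing before matching is a design choice: without it, the same word typed with a combining accent and with a precomposed character would be counted as two different tokens.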

D3 Discovery [5]
(taken from here).

Prerequisites:

Confident front-end web development skills (including JavaScript)
Basic knowledge in information visualization

References:

Kostiantyn Kucher and Andreas Kerren. 2015. Text Visualization Techniques: Taxonomy, Visual Survey, and Community Insights. In Proceedings of the IEEE Pacific Visualization Symposium (PacificVis '15), pages 117–121. IEEE. https://doi.org/10.1109/PACIFICVIS.2015.7156366
Kostiantyn Kucher, Nicole Sultanum, Angel Daza, Vasiliki Simaki, Maria Skeppstedt, Barbara Plank, Jean-Daniel Fekete, and Narges Mahyar. 2022. An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics: Position Paper. In Proceedings of the 2022 IEEE Workshop on Evaluation and Beyond - Methodological Approaches to Visualization (BELIV '22). IEEE. https://doi.org/10.1109/BELIV57783.2022.00008
Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. 2011. D3: Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pages 2301–2309. https://doi.org/10.1109/TVCG.2011.185
Mike Bostock and Observable, Inc. 2023. D3 by Observable: The JavaScript Library for Bespoke Data Visualization. https://d3js.org/
Webkid. 2023. D3 Discovery: Finding D3 Plugins with Ease. https://d3-discovery.net/

Master's Theses

Lightweight Text Visualization for Close and Distant Reading

Status: Finished 2026 (thesis report) (LiU DiVA)

Grading: Master's Thesis

Area: Information Visualization / Information Visualization Evaluation / Experimental Human-Computer Interaction

Supervisor: Dr. Kostiantyn Kucher

Examiner: Prof. Dr. Andreas Kerren

Active Student(s): Philip Robertsson

Content and Tasks:

Text visualization approaches allow the users to represent and interact with the contents of individual text documents, document collections (corpora), or document streams, including the cases when support for additional text data analyses is required [1–2]. One of the ways to consider possible uses of text visualization is to contrast the traditional, manual analyses of texts (close reading) vs the ability to gain overview of the complete documents or collections (distant reading) [3]. Support for such interactive tools can make a difference in a variety of applications, as demonstrated by Voyant Tools [4] widely used in (digital) humanities, for instance. The majority of the approaches developed within the text visualization / visual text analytics research community, however, focus on highly specific scenarios that require complex solutions that might be overwhelming to most users without sufficient expertise or training, resulting in their eventual abandonment.
The main aim of this thesis project is, thus, to develop a lightweight web-based text visualization tool that would implement generic close + distant reading techniques for individual texts as well as texts from several documents (without large-scale data support requirements). Generalizability must be considered here, e.g., by ensuring support for languages beyond English. Furthermore, priority must be given to implementing the basic functionality with D3.js [5–6] and/or other JavaScript libraries on the frontend only, so that a tool deployed online on a public web server would allow for analysing user data without uploading it to the server (for both performance and privacy reasons). The possibility of applying more advanced natural language processing analyses on the frontend [7] in this case (e.g., text classification with specific categories [8]) must be explored further.

DoSVis tool [8] providing both close and distant reading functionality [3] for a single document.
(taken from here).

The list of thesis project objectives includes the following:

Propose a design of the interactive lightweight text visualization tool supporting individual documents and (small-scale) document collections based on the existing literature, tools, and generalizability considerations.
Implement a working prototype of the tool with basic visual representations and interactions in order to support close and distant reading.
Explore the possibilities to implement support for more advanced NLP analyses.
Evaluate the usability of the resulting prototype with user studies.
Prepare software documentation.
Make the source code, documentation, and deployment scripts/packages available via an open source repository.

Prerequisites:

Confident front-end web development skills (including JavaScript)
Basic knowledge in D3.js
Basic knowledge in information visualization
Willingness to recruit and communicate with user study participants
Programming skills in Python are a plus
Basic knowledge in natural language processing is a plus

References:

Kostiantyn Kucher and Andreas Kerren. 2015. Text Visualization Techniques: Taxonomy, Visual Survey, and Community Insights. In Proceedings of the IEEE Pacific Visualization Symposium (PacificVis '15), pages 117–121. IEEE. https://doi.org/10.1109/PACIFICVIS.2015.7156366
Kostiantyn Kucher, Nicole Sultanum, Angel Daza, Vasiliki Simaki, Maria Skeppstedt, Barbara Plank, Jean-Daniel Fekete, and Narges Mahyar. 2022. An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics: Position Paper. In Proceedings of the 2022 IEEE Workshop on Evaluation and Beyond - Methodological Approaches to Visualization (BELIV '22). IEEE. https://doi.org/10.1109/BELIV57783.2022.00008
Stefan Jänicke, Greta Franzini, Muhammad Faisal Cheema, and Gerik Scheuermann. 2017. Visual Text Analysis in Digital Humanities. Computer Graphics Forum, vol. 36, no. 6, pages 226–250. https://doi.org/10.1111/cgf.12873
Stéfan Sinclair and Geoffrey Rockwell. 2015. Text Analysis and Visualization: Making Meaning Count. In: A New Companion to Digital Humanities. Wiley. https://doi.org/10.1002/9781118680605.ch19 . Online tool: https://voyant-tools.org/
Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. 2011. D3: Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pages 2301–2309. https://doi.org/10.1109/TVCG.2011.185
Mike Bostock and Observable, Inc. 2023. D3 by Observable: The JavaScript Library for Bespoke Data Visualization. https://d3js.org/
Charu C. Aggarwal. 2018. Machine Learning for Text. Springer. https://doi.org/10.1007/978-3-319-73531-3
Kostiantyn Kucher, Carita Paradis, and Andreas Kerren. 2018. DoSVis: Document Stance Visualization. In Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP '18) — Volume 3: IVAPP, pages 168–175. SciTePress. https://doi.org/10.5220/0006539101680175

Interactive Visual Mining of Event Sequence Patterns at Multiple Levels of Event Granularity

Status: Finished 2026 (thesis report) (LiU DiVA)

Grading: Master's Thesis

Area: Visual Analytics / Data Mining, Process Mining

Supervisor(s): Dr. Katerina Vrotsou

Examiner: Prof. Dr. Andreas Kerren

Active Student(s): Caitlin Wu

Content and Tasks: This thesis is concerned with interactively and visually mining event sequence patterns at multiple levels of detail or abstraction. Such an approach can be particularly useful in complex domains where events can be viewed from various perspectives, ranging from very detailed (fine-grained) to more generalized (coarse-grained).

Background: This thesis project is part of research concerned with the design and development of visual analytics approaches for efficient and flexible visual exploration and analysis of event sequence data. Today, a vast number of data-driven applications in society and industry produce event sequence data, such as various types of logging and monitoring data. Event sequence data comprise sequences of point or interval events occurring over time. Effective analysis of this data can enable analysts to gain crucial understanding of complex and interconnected processes. Examples include the study of patients’ medical records for diagnostics and treatment planning, the analysis of visitor patterns in an exhibition for exhibit placement, and the analysis of logged events in business processes for optimization of workflows. The analysis of event sequence data commonly focuses on the identification and exploration of patterns across the data, where a pattern is defined as a sub-sequence of events that displays some interesting behaviour (e.g., frequency of occurrence). For example, consider a collection of event sequences describing a pizza making process. A common pattern in this context may be: buy ingredients (bi) -> create base (cb) -> add tomato (at) -> add cheese (ac) -> add ham (ah) -> bake in oven (bo).
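
To make the notion of a frequent pattern concrete, the following minimal Python sketch counts how often a pattern occurs across a collection of sequences. It is an illustration only, not the mining algorithm used in the project. The function names and the sample logs are invented here, and the sketch assumes that a pattern may match with gaps (other events may occur between the pattern's events), which the description above does not specify.

```python
def contains_pattern(sequence: list[str], pattern: list[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` as a (possibly gapped) sub-sequence."""
    events = iter(sequence)
    # `event in events` consumes the iterator, so the order of events is respected.
    return all(event in events for event in pattern)


def support(sequences: list[list[str]], pattern: list[str]) -> float:
    """Fraction of sequences that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(contains_pattern(s, pattern) for s in sequences) / len(sequences)


# Hypothetical pizza-making logs using the event codes from the example above.
logs = [
    ["bi", "cb", "at", "ac", "ah", "bo"],
    ["bi", "cb", "at", "ac", "bo"],
    ["cb", "at", "ah", "ac", "bo"],
    ["bi", "cb", "ac", "bo"],
]

print(support(logs, ["cb", "at", "ac", "bo"]))  # 0.75
```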

In our previous work in the field of event sequence analytics, an approach for interactive visual sequence mining was proposed that allows the user to intervene in the execution of a pattern-growth algorithm and steer it towards directions of interest. This approach offers flexibility in the mining process, optimizes the pattern search space, and incorporates the analyst's interests and expertise. To further optimize the pattern discovery process, there are some additional directions of interest that could be pursued.

Problem: The events composing sequences are commonly described at a certain level of detail, which is decided prior to starting the pattern discovery process. Sometimes, to increase flexibility in the descriptions, a hierarchy of event descriptions can be defined; for example, the events add tomato (at), add cheese (ac), and add ham (ah) could be aggregated into a single event add ingredients (AI). Either way, when mining for patterns with traditional pattern mining algorithms, a fixed level of detail needs to be selected prior to starting the mining and all patterns are mined at the same level, which limits the analysis. Having the flexibility to discover patterns composed of events defined at different levels of detail can often provide more meaningful patterns and insights. Mining patterns at different levels of granularity brings challenges with respect to both computation and meaningful representation of patterns, which makes it an interesting problem to study. This will be the focus of this thesis, which sets out to explore the implementation of an interactive visual analytics approach for discovering patterns at multiple levels of granularity.
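
The aggregation example above (at, ac, ah -> AI) can be sketched in Python as a mapping from fine-grained to coarse-grained events. This is a hypothetical illustration, not the approach developed in the thesis. The names `HIERARCHY` and `generalize` are invented here, and merging adjacent repeats of a lifted event is an assumption about how aggregation might behave.

```python
# Hypothetical two-level event hierarchy: fine-grained code -> coarse-grained code.
HIERARCHY = {"at": "AI", "ac": "AI", "ah": "AI"}


def generalize(sequence: list[str], to_coarse: set[str]) -> list[str]:
    """Lift the events listed in `to_coarse` to their parent and merge adjacent repeats."""
    result: list[str] = []
    for event in sequence:
        lifted = HIERARCHY.get(event, event) if event in to_coarse else event
        # Merge consecutive occurrences of the same lifted (coarse) event into one.
        if result and result[-1] == lifted and lifted != event:
            continue
        result.append(lifted)
    return result


sequence = ["bi", "cb", "at", "ac", "ah", "bo"]
print(generalize(sequence, {"at", "ac", "ah"}))  # ['bi', 'cb', 'AI', 'bo']
print(generalize(sequence, {"ac", "ah"}))        # ['bi', 'cb', 'at', 'AI', 'bo']
```

The second call shows a sequence that mixes levels of detail: add tomato stays fine-grained while the remaining toppings are lifted to the coarse event.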

Research questions

How is multi-level mining of events handled today in event sequence analytics approaches?
How could event granularity be dynamically adjusted during the mining process?
How could visualization be used to explore mined patterns at multiple levels of event granularity?

The work in this thesis can be tailored to the students’ interests and skills in algorithm development, visualization development or tool development.

Why choose this thesis

You are interested in working on cutting-edge research problems in the field of data analytics and visualization.
You are interested in learning about the analysis and processing of event data across multiple application domains.
You are interested in implementing interactive approaches to support actual challenges of data analysts.

References and Further Reading:

Zerbato, F., Seiger, R., Di Federico, G., Burattin, A., & Weber, B. (2021). Granularity in Process Mining: Can we fix it? In Problems@BPM (pp. 40–44).
Vrotsou, K., & Nordman, A. (2019). Exploratory visual sequence mining based on pattern-growth. IEEE Transactions on Visualization and Computer Graphics, 25(8), 2597–2610.

ViNCent 2.0: A Web-Based Implementation for the Visualization and Analysis of Multivariate and Temporal Networks

Status: Finished 2026 (thesis report) (LiU DiVA)

Grading: Master's Thesis

Area: Information Visualization / Visual Analytics

Supervisor: Prof. Dr. Andreas Kerren

Examiner: Dr. Kostiantyn Kucher

Active Student(s): Adrian Szuter

Content and Tasks: Centrality analysis determines the importance of vertices in a network based on their connectivity within the network structure. It is a widely used technique to analyze network-structured data. In the life sciences, centrality measures help scientists to understand underlying biological processes and have been successfully applied to different biological networks. Generally speaking, those measures form a multivariate data set attached to the network nodes.
We have already developed a tool, called ViNCent, that combines exploratory data visualization with automatic analysis techniques, such as computing a variety of centrality values for network nodes as well as hierarchical clustering or node reordering based on centrality or any other multivariate values. Automatic and interactive approaches are seamlessly integrated, which provides insight into the importance of an individual node or groups of nodes and allows quantifying the network structure.
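
As a minimal illustration of what a centrality value is, the Python sketch below computes normalized degree centrality, one of the simplest such measures. It is not taken from ViNCent, which computes a variety of measures. The function name and the toy network are assumptions made for this example.

```python
from collections import defaultdict


def degree_centrality(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Normalized degree centrality for an undirected simple graph given as an edge list."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # A node connected to all other n - 1 nodes gets the maximum value 1.0.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical toy network: A is connected to every other node.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")]
centrality = degree_centrality(edges)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(ranking[0], centrality["A"])  # A 1.0
```

Computing several such measures per node yields one vector of values per node, which is the multivariate data set referred to above.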

Screenshot of ViNCent.
Taken from here; there is also a video available.

The aim of this work is to reimplement the ViNCent tool (especially the part that uses the Prefuse library) in JavaScript, to transform it into a web-based visualization environment where expensive computations are performed on the server side, and to reimplement the node-link representations by using the yFiles library for graph layouts. The legacy visualizations (currently done with the Prefuse library, which is no longer supported) must also be reimplemented in JavaScript with the help of D3.js (d3js.org). Additional interactive features could also be implemented. Those additions may include embedding the centrality views into the node-link representation or extending the tool for the analysis of temporal networks (networks that change over time). This thesis is also well-suited for two students who want to work as a team.

Prerequisites:

Very good experience with Java and JavaScript
Knowledge in client-server architectures
Interest in building a more complex visualization system
Basic knowledge in D3.js

Dynamic Graph Comparison Using a Magic Lens — Enhancing Network Visualisation for Temporal and Multivariate Data

Status: Finished 2024 (thesis report) (LiU DiVA)

Grading: Master's Thesis

Area: Information Visualization

Supervisor: Prof. Dr. Andreas Kerren

Examiner: Dr. Kostiantyn Kucher

Active Student(s): Casper Larsson

Content and Tasks: So-called Magic Lenses are one possibility to interact with complex visualizations; see the image below for an example. A challenge in the visualization of complex and large networks is to integrate additional information into the drawing. One way to achieve this is to develop a magic lens that shows additional information when the user moves the lens over parts of the network. This can be done in several ways. The simple way is to show the information as a separate visualization within the lens. Another possibility is to seamlessly integrate it into the different network elements, for example, as a visual node or edge attribute. The latter idea was already implemented by our group, and the resulting tool is called The Network Lens.

A screenshot of the current Network Lens tool!

This lens shows the attributes of graph nodes using glyphs.
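
The core selection step of such a lens, deciding which nodes currently lie under it, can be sketched as follows. This is a simplified Python illustration assuming a circular lens and a fixed 2D node layout; it does not describe how The Network Lens is implemented, and the function and variable names are invented here.

```python
import math


def nodes_under_lens(
    positions: dict[str, tuple[float, float]],
    lens_center: tuple[float, float],
    lens_radius: float,
) -> list[str]:
    """Return the nodes whose layout position falls inside a circular lens."""
    cx, cy = lens_center
    return [
        node
        for node, (x, y) in positions.items()
        if math.hypot(x - cx, y - cy) <= lens_radius
    ]


# Hypothetical node layout; only nodes under the lens would be drawn with glyphs.
positions = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (10.0, 10.0)}
print(nodes_under_lens(positions, lens_center=(0.0, 0.0), lens_radius=5.0))  # ['a', 'b']
```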

The aim of this thesis is the reimplementation and extension of "The Network Lens" in JavaScript, including a user study on how exactly people use the tool. With respect to possible extensions, the next version could provide a method to visualize temporal networks, consider the visualization of data quality (uncertainty), and offer more advanced node glyphs. This master's thesis is also well-suited for two students.

Prerequisites:

JavaScript, D3.js, possibly WebGL
Good knowledge in computer graphics (2D) and information visualization

XploreSMR: Visual Analytic Tool for Classification and Exploration of Mass Casualty Incidents Using News Media Data

Status: Finished 2024 (thesis report) (LiU DiVA)

Grading: Master's Thesis

Area: Information Visualization / Visual Analytics / Machine Learning

Supervisor: Dr. Kostiantyn Kucher

Examiner: Dr. Katerina Vrotsou

Active Student: Erik Gimbergsson

Content and Tasks: This project focuses on the design and development of a visual text analytics approach for mapping and exploring mass trauma epidemiology with an aim to increase surge capacity. Such an approach bears specific relevance for low-income countries where epidemiological databases are currently unavailable.
Surge capacity is defined as “the ability to manage a sudden, unexpected increase in patient volume that would otherwise severely challenge or exceed the current capacity of the health care system” [1]. During disasters, patient numbers increase, and patients can present with specific injuries and exposures, putting the health system under additional strain. To develop surge capacity, it is essential to understand the mass trauma epidemiology of the specific setting. No globally accepted definition exists for mass trauma, and its definition may vary depending on context and the capacity to handle the trauma. However, difficulties in linking hospital data to information on disasters collected and reported outside the hospital setting hinder a holistic understanding of disaster medicine epidemiology.
To address this challenge, researchers at LiU have developed a methodology called the “systematic media review” [4]. The methodology was piloted in a study that assessed the epidemiology of mass-trauma events in Rwanda between 2010 and 2020. Rwandan and international news media were analyzed using the NexisUni search engine [2], a tool that has previously been used primarily for sociological research.
This master's thesis project aims to build further on this approach and develop a visual analytics system for performing “systematic media reviews”. The system should include a machine learning (ML) component for identification and classification of relevant news media entries and a visual interface for exploring and assessing the spatial, temporal, and contextual characteristics of the retrieved media content.

The project will involve the following steps:

Survey of the field in visualization approaches for mapping and exploration of epidemiology to get an overview of existing research.
Survey of the field in Natural Language Processing techniques for mining text corpora to identify appropriate candidates for the ML component.
Design study to identify potential user needs and define task requirements on the interface.
Design and implementation of the visual analytics system.
Pilot case study of the tool by assessing mass-trauma in Rwanda for the years 2010–2020 (so that the data can be compared with the data from the media review referenced above).

The vision is that such a tool will be beneficial for disaster medicine research as described above; in future iterations, however, it could also be expanded to other types of epidemiological research.

References

Barbera J, McIntyre A. Jane’s Mass Casualty Handbook: Hospital. Emergency Preparedness and Response. Surrey, U.K.: Jane’s Information Group, Ltd; 2013.
LexisNexis Academic website [Internet]. 2021. Available at: www.nexisuni.com
Singh JP, Dwivedi YK, Rana NP, Kumar A, Kapoor KK. Event classification and location prediction from tweets during disasters. Ann Oper Res. 1 December 2019;283(1):737–57.
Velin L, Donatien M, Wladis A, Nkeshimana M, Riviello R, Uwitonze J-M, et al. Systematic media review: A novel method to assess mass-trauma epidemiology in absence of databases—A pilot-study in Rwanda. PLOS ONE. 13 October 2021;16(10):e0258446.
Wei, L. L. Y., Ibrahim, A. A. A., Nisar, K., Ismail, Z. I. A., & Welch, I. (2020). Survey on geographic visual display techniques in epidemiology: Taxonomy and characterization. Journal of Industrial Information Integration, 18, 100139.

Visual Analysis of Humor Assessment in Edited News Headlines

Status: Finished 2023 (thesis report) (LiU DiVA)

Grading: Master's Thesis for Two Students

Area: Information Visualization / Visual Analytics / Text Mining

Supervisor: Dr. Kostiantyn Kucher

Examiner: Prof. Dr. Andreas Kerren

Active Students: Johanna Folde and Elin Akkurt

Content and Tasks:

Identification, prediction, and generation of text with highly subjective and context-dependent properties are important and difficult challenges in computational linguistics. Sarcasm and irony detection are examples of natural language processing (NLP) tasks that are considered challenging, including the issue of agreeing on the consistent annotations/labels for particular sentences or documents among human annotators with respect to such elusive categories.
Computational approaches for identifying and analyzing humor in texts have also been in the focus of the NLP research community, with the shared task (contest) titled "Assessing Humor in Edited News Headlines" recognized as the best task at SemEval-2020 [1]. The respective task provides data on news headlines in English with minimal edits (e.g., single word replacements) made in order to make the respective headlines humorous [2–4]. The actual level of humor/funniness of such edited headlines was assessed by multiple annotators on an ordinal scale [5–6]. The aim of the contest was to discover better-performing computational methods (e.g., machine learning approaches) that would predict the funniness level or rank two edited versions of a headline. While a number of solutions focusing on such regression and classification tasks were proposed [3], there are further questions to be asked and insights to be discovered within the respective data, for instance, how consistent the funniness level annotations are across the topics, or to what extent the funniness scores can range across several related headlines.
To support answering such questions (rather than focusing on text classification or regression itself), a visual text analytic approach is required, and this is precisely the topic of this thesis project. The prior work on text visualization and visual text analytics [7–8] can be used to guide the design process for this project; however, computational analysis of humor is not widely covered by such prior work. The very recent DeHumor approach by Wang et al. [9], for instance, focuses on in-depth multimodal analyses of comedy performances. In order to support interactive visual analyses of humor in edited news headlines, the challenges of representing and interacting with the data in the respective text genre, multiple annotations, and further possible data facets (such as named entities and topics identified in headlines) will have to be addressed.
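
One simple way to quantify the annotation consistency question raised above is to compare the mean and the spread of the grades that each headline received. The Python sketch below assumes integer grades per annotator and uses invented headline identifiers and values. It is only an illustration of the kind of summary a visual interface might display, not the method used in the thesis.

```python
from statistics import mean, pstdev


def summarize_grades(
    grades_by_headline: dict[str, list[int]],
) -> dict[str, tuple[float, float]]:
    """Map each headline id to (mean grade, population standard deviation across annotators)."""
    return {
        headline: (mean(grades), pstdev(grades))
        for headline, grades in grades_by_headline.items()
        if grades  # skip headlines without any annotation
    }


# Hypothetical annotations: annotators agree on h1 and disagree strongly on h2.
grades = {"h1": [2, 2, 2, 2], "h2": [0, 3, 0, 3]}
for headline, (avg, spread) in summarize_grades(grades).items():
    print(headline, avg, spread)
```

A low standard deviation indicates that annotators largely agree on a headline, while a high value flags headlines whose funniness is contested.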

Humorous headline annotation interface by Hossain et al. [6]
(taken from here).

Prerequisites:

Programming skills in Python and JavaScript
Basic knowledge in D3.js and/or Plotly.js
Basic knowledge in information visualization
Basic knowledge in natural language processing / text mining

References:

International Workshop on Semantic Evaluation 2020. https://alt.qcri.org/semeval2020/
SemEval-2020 Task 7: Assessing Humor in Edited News Headlines. https://competitions.codalab.org/competitions/20970
Nabil Hossain, John Krumm, Michael Gamon, and Henry Kautz. 2020. SemEval-2020 Task 7: Assessing Humor in Edited News Headlines. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 746–758, Barcelona (online). International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.98
Headline Humor Dataset. https://www.cs.rochester.edu/u/nhossain/humicroedit.html
Nabil Hossain, John Krumm, and Michael Gamon. 2019. “President Vows to Cut Hair”: Dataset and Analysis of Creative Text Editing for Humorous Headlines. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 133–142, Minneapolis, Minnesota. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1012
Nabil Hossain, John Krumm, Tanvir Sajed, and Henry Kautz. 2020. Stimulating Creativity with FunLines: A Case Study of Humor Generation in Headlines. arXiv preprint 2002.02031. https://doi.org/10.48550/arXiv.2002.02031
Kostiantyn Kucher and Andreas Kerren. 2015. Text Visualization Techniques: Taxonomy, Visual Survey, and Community Insights. In Proceedings of the 8th IEEE Pacific Visualization Symposium (PacificVis '15), pages 117–121. IEEE. https://doi.org/10.1109/PACIFICVIS.2015.7156366
Mohammad Alharbi and Robert S. Laramee. 2019. SoS TextVis: An Extended Survey of Surveys on Text Visualization. Computers 8, no. 1, article 17. https://doi.org/10.3390/computers8010017
Xingbo Wang, Yao Ming, Tongshuang Wu, Haipeng Zeng, Yong Wang, and Huamin Qu. 2022. DeHumor: Visual Analytics for Decomposing Humor. IEEE Transactions on Visualization and Computer Graphics, vol. 28, no. 12, pages 4609-4623. https://doi.org/10.1109/TVCG.2021.3097709