Finished Theses
Bachelor's Theses
d3-textVis: Text Visualization Techniques Based on JavaScript
Status: Finished 2024 (thesis report in Swedish)
Grading: Bachelor's Project for Several Students (TNM094)
Area: Information Visualization
Supervisor(s): Dr. Kostiantyn Kucher
Active Student(s): Isak Karlsson, Daniel Laesker, Berkay Orhan, and Alma Linder
Content and Tasks:
The term "text visualization" is typically used for information visualization techniques that focus in some cases on raw textual data and in other cases on the results of text mining algorithms [1].
There are multiple concerns related to design, implementation, and evaluation of text visualization approaches within academic research and applications [2], and while more specialized problems require custom visual representations and techniques, there is a need to develop and maintain basic functionality for text visualization that could be used as building blocks for a variety of projects.
Arguably one of the best options for such common blocks is the widely used library D3.js [3–4], which provides a straightforward API for developing plugins [5].
The main aim of this thesis project is, thus, to develop a number of (arguably) generic text visualization techniques as D3 plugins.
Generalizability must be considered here, e.g., by providing a function that splits an input text string into tokens and represents them as a list with D3 while ensuring support for languages beyond English.
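As a minimal sketch of the tokenization example above, assuming the standard `Intl.Segmenter` API is available in the target browser or Node.js runtime: the helper names `tokenize` and `renderTokenList` are hypothetical and not part of D3 or of the thesis deliverables. The rendering helper takes the `d3` object as an argument so that the tokenizer itself can run without a DOM.

```javascript
// Hypothetical helper (not part of D3): split a string into word tokens.
// Intl.Segmenter applies locale-aware word boundaries, so the same code
// handles English, Swedish, and scripts without spaces between words.
function tokenize(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const tokens = [];
  for (const { segment, index, isWordLike } of segmenter.segment(text)) {
    // Keep word-like segments only; whitespace and punctuation are skipped.
    if (isWordLike) tokens.push({ text: segment, index });
  }
  return tokens;
}

// Hypothetical rendering helper: binds the tokens to <li> elements.
// `d3` is passed in explicitly; this function is not called below
// because it needs a browser DOM.
function renderTokenList(d3, container, tokens) {
  return d3.select(container)
    .append("ul")
    .selectAll("li")
    .data(tokens)
    .join("li")
    .text(d => d.text);
}

const english = tokenize("Text visualization, explained.");
const swedish = tokenize("Visualisering av text är roligt!", "sv");
console.log(english.map(t => t.text));
console.log(swedish.map(t => t.text));
```

Keeping the character offset (`index`) with each token is one possible design choice; it would let a plugin map a list item back to its position in the source text.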
The list of thesis project objectives includes the following:
- Propose designs of several basic text visualization techniques based on the existing literature, tools, and generalizability considerations.
- Design and implement the respective approaches as D3.js library plugins.
- Prepare software documentation.
- Implement basic demos of individual techniques as well as a larger demo with several interacting techniques (including text data in several languages, genres, and document lengths).
- Make the source code, documentation, and demos available via an open source repository.
D3 Discovery [5]
Prerequisites:
- Confident front-end web development skills (including JavaScript)
- Basic knowledge in information visualization
References
- Kostiantyn Kucher and Andreas Kerren. 2015. Text Visualization Techniques: Taxonomy, Visual Survey, and Community Insights. In Proceedings of the IEEE Pacific Visualization Symposium (PacificVis '15), pages 117–121. IEEE. https://doi.org/10.1109/PACIFICVIS.2015.7156366
- Kostiantyn Kucher, Nicole Sultanum, Angel Daza, Vasiliki Simaki, Maria Skeppstedt, Barbara Plank, Jean-Daniel Fekete, and Narges Mahyar. 2022. An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics: Position Paper. In Proceedings of the 2022 IEEE Workshop on Evaluation and Beyond - Methodological Approaches to Visualization (BELIV '22). IEEE. https://doi.org/10.1109/BELIV57783.2022.00008
- Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. 2011. D3: Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pages 2301–2309. https://doi.org/10.1109/TVCG.2011.185
- Mike Bostock and Observable, Inc. 2023. D3 by Observable: The JavaScript Library for Bespoke Data Visualization. https://d3js.org/
- Webkid. 2023. D3 Discovery: Finding D3 Plugins with Ease. https://d3-discovery.net/
Master's Theses
Dynamic Graph Comparison Using a Magic Lens — Enhancing Network Visualisation for Temporal and Multivariate Data
Status: Finished 2024 (thesis report) (LiU DiVA)
Grading: Master's Thesis
Area: Information Visualization
Supervisor: Prof. Dr. Andreas Kerren
Examiner: Dr. Kostiantyn Kucher
Active Student(s): Casper Larsson
Content and Tasks: So-called Magic Lenses are one way to interact with complex visualizations; see the image below for an example. A challenge in the visualization of complex and large networks is to integrate additional information into the drawing. One way to achieve this is to develop a magic lens that shows additional information when the user moves the lens over parts of the network. This can be done in several ways. The simple way is to show the information as a separate visualization within the lens. Another possibility is to seamlessly integrate it into the different network elements, for example, as a visual node or edge attribute. The latter idea was already implemented by our group, and the resulting tool is called The Network Lens.
A screenshot of the current Network Lens tool. This lens shows attributes of graph nodes using glyphs.
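The core of the lens interaction described above, deciding which network elements the lens currently covers, can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of The Network Lens: the function name `nodesUnderLens`, the circular lens shape, and the node fields `id`, `x`, and `y` are all hypothetical.

```javascript
// Hypothetical helper: return the nodes whose positions fall inside a
// circular lens centered at (lens.x, lens.y) with the given radius.
function nodesUnderLens(nodes, lens) {
  const radiusSquared = lens.radius * lens.radius;
  return nodes.filter(node => {
    const dx = node.x - lens.x;
    const dy = node.y - lens.y;
    // Compare squared distances to avoid a square root per node.
    return dx * dx + dy * dy <= radiusSquared;
  });
}

// Made-up node positions in screen coordinates.
const sampleNodes = [
  { id: "a", x: 10, y: 10 },
  { id: "b", x: 50, y: 50 },
  { id: "c", x: 55, y: 48 },
  { id: "d", x: 200, y: 120 },
];

// In an interactive tool the lens position would follow the pointer.
const covered = nodesUnderLens(sampleNodes, { x: 52, y: 50, radius: 10 });
console.log(covered.map(node => node.id));
```

Only the covered nodes would then be redrawn with additional attributes, for example as glyphs, while the rest of the drawing stays unchanged.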
The aim of this thesis is the reimplementation and extension of "The Network Lens" in JavaScript, including a user study on how exactly people use the tool. With respect to possible extensions, the next version could provide a method to visualize temporal networks, support the visualization of data quality (uncertainty), and offer more advanced node glyphs. This master's thesis is also well-suited for two students.
Prerequisites:
- JavaScript, D3.js, possibly WebGL
- Good knowledge of computer graphics (2D) and information visualization
XploreSMR: Visual Analytic Tool for Classification and Exploration of Mass Casualty Incidents Using News Media Data
Status: Finished 2024 (thesis report) (LiU DiVA)
Grading: Master's Thesis
Area: Information Visualization / Visual Analytics / Machine Learning
Supervisor: Dr. Kostiantyn Kucher
Examiner: Dr. Katerina Vrotsou
Active Student: Erik Gimbergsson
Content and Tasks: This project focuses on the design and development of a visual text analytics approach for mapping and exploring mass trauma epidemiology with the aim of increasing surge capacity. Such an approach is especially relevant for low-income countries where epidemiological databases are currently unavailable.
Surge capacity is defined as “the ability to manage a sudden, unexpected increase in patient volume that would otherwise severely challenge or exceed the current capacity of the health care system” [1]. During disasters, patient numbers increase, and patients can present with specific injuries and exposures, putting the health system under additional strain. To develop surge capacity, it is essential to understand the mass trauma epidemiology of the specific setting. No globally accepted definition of mass trauma exists, and its definition may vary depending on the context and the capacity to handle the trauma. Moreover, difficulties in linking hospital data to information on disasters collected and reported outside the hospital setting hinder a holistic understanding of disaster medicine epidemiology.
To address this challenge, researchers at LiU have developed a methodology called the “systematic media review” [4]. The methodology was piloted in a study that assessed the epidemiology of mass-trauma events in Rwanda between 2010 and 2020. Rwandan and international news media were analyzed using the NexisUni search engine [2], software that has previously been used primarily for sociological research.
This master's thesis project aims to build on this approach and develop a visual analytics system for performing “systematic media reviews”. The system should include a machine learning (ML) component for identification and classification of relevant news media entries and a visual interface for exploring and assessing the spatial, temporal, and contextual characteristics of the retrieved media content.
The project will involve the following steps:
- Survey of the field in visualization approaches for mapping and exploration of epidemiology to get an overview of existing research.
- Survey of the field in Natural Language Processing techniques for mining text corpora to identify appropriate candidates for the ML component.
- Design study to identify potential user needs and define task requirements on the interface.
- Design and implementation of the visual analytics system.
- Pilot case study of the tool, assessing mass trauma in Rwanda for the years 2010–2020 (so that the results can be compared with the data from the media review referenced above).
References
- Barbera J, McIntyre A. Jane’s Mass Casualty Handbook: Hospital. Emergency Preparedness and Response. Surrey, U.K.: Jane’s Information Group, Ltd; 2013.
- LexisNexis Academic website [Internet]. 2021. Available at: www.nexisuni.com
- Singh JP, Dwivedi YK, Rana NP, Kumar A, Kapoor KK. Event classification and location prediction from tweets during disasters. Ann Oper Res. 1 December 2019;283(1):737–57.
- Velin L, Donatien M, Wladis A, Nkeshimana M, Riviello R, Uwitonze J-M, et al. Systematic media review: A novel method to assess mass-trauma epidemiology in absence of databases—A pilot-study in Rwanda. PLOS ONE. 13 October 2021;16(10):e0258446.
- Wei, L. L. Y., Ibrahim, A. A. A., Nisar, K., Ismail, Z. I. A., & Welch, I. (2020). Survey on geographic visual display techniques in epidemiology: Taxonomy and characterization. Journal of Industrial Information Integration, 18, 100139.
Visual Analysis of Humor Assessment in Edited News Headlines
Status: Finished 2023 (thesis report) (LiU DiVA)
Grading: Master's Thesis for Two Students
Area: Information Visualization / Visual Analytics / Text Mining
Supervisor: Dr. Kostiantyn Kucher
Examiner: Prof. Dr. Andreas Kerren
Active Students: Johanna Folde and Elin Akkurt
Content and Tasks:
Identification, prediction, and generation of text with highly subjective and context-dependent properties are important and difficult challenges in computational linguistics. Sarcasm and irony detection are examples of natural language processing (NLP) tasks that are considered challenging, including the issue of agreeing on the consistent annotations/labels for particular sentences or documents among human annotators with respect to such elusive categories.
Computational approaches for identifying and analyzing humor in texts have also been a focus of the NLP research community, with the shared task (contest) titled "Assessing Humor in Edited News Headlines" recognized as the best task at SemEval-2020 [1]. The task provides data on news headlines in English with minimal edits (e.g., single word replacements) made in order to make the headlines humorous [2-4]. The actual level of humor/funniness of such edited headlines was assessed by multiple annotators on an ordinal scale [5-6]. The aim of the contest was to discover better-performing computational methods (e.g., machine learning approaches) that would predict the funniness level or rank two edited versions of a headline. While a number of solutions focusing on such regression and classification tasks were proposed [3], there are further questions to be asked and insights to be discovered within the data, for instance, how consistent the funniness level annotations are across topics, or to what extent the funniness scores vary across several related headlines.
To support answering such questions (rather than focusing on text classification or regression itself), a visual text analytics approach is required, and this is precisely the topic of this thesis project. The prior work on text visualization and visual text analytics [7–8] can be used to guide the design process for this project; however, computational analysis of humor is not widely covered by such prior work. The very recent DeHumor approach by Wang et al. [9], for instance, focuses on in-depth multimodal analyses of comedy performances. In order to support interactive visual analyses of humor in edited news headlines, the challenges of representing and interacting with the data in the respective text genre, multiple annotations, and further possible data facets (such as named entities and topics identified in headlines) will have to be addressed.
Humorous headline annotation interface by Hossain et al. [6]
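The question of annotation consistency across topics raised above could be explored with a simple summary statistic before any visual design. The following is a rough sketch under stated assumptions: the record fields `id`, `topic`, and `grades`, the sample values, and both function names are hypothetical, and the standard deviation is used only as a crude stand-in for a proper agreement measure on ordinal data.

```javascript
// Hypothetical records: each edited headline carries the ordinal grades
// assigned by individual annotators (field names and values are made up).
const headlines = [
  { id: "h1", topic: "politics", grades: [0, 1, 1, 2, 1] },
  { id: "h2", topic: "politics", grades: [3, 0, 2, 0, 1] },
  { id: "h3", topic: "sports", grades: [2, 2, 2, 2, 2] },
];

// Mean and population standard deviation of one headline's grades.
function gradeStats(grades) {
  const mean = grades.reduce((sum, g) => sum + g, 0) / grades.length;
  const variance =
    grades.reduce((sum, g) => sum + (g - mean) ** 2, 0) / grades.length;
  return { mean, sd: Math.sqrt(variance) };
}

// Average per-headline standard deviation for each topic; a higher value
// suggests that annotators disagreed more on headlines of that topic.
function spreadByTopic(items) {
  const byTopic = new Map();
  for (const item of items) {
    const sds = byTopic.get(item.topic) ?? [];
    sds.push(gradeStats(item.grades).sd);
    byTopic.set(item.topic, sds);
  }
  return new Map(
    [...byTopic].map(([topic, sds]) => [
      topic,
      sds.reduce((sum, v) => sum + v, 0) / sds.length,
    ])
  );
}

const spread = spreadByTopic(headlines);
console.log(spread);
```

Such per-topic values could then drive a visual encoding, for example the color or size of a topic mark, but that mapping is left open here.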
Prerequisites:
- Programming skills in Python and JavaScript
- Basic knowledge in D3.js and/or Plotly.js
- Basic knowledge in information visualization
- Basic knowledge in natural language processing / text mining
References
- International Workshop on Semantic Evaluation 2020. https://alt.qcri.org/semeval2020/
- SemEval-2020 Task 7: Assessing Humor in Edited News Headlines. https://competitions.codalab.org/competitions/20970
- Nabil Hossain, John Krumm, Michael Gamon, and Henry Kautz. 2020. SemEval-2020 Task 7: Assessing Humor in Edited News Headlines. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 746–758, Barcelona (online). International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.98
- Headline Humor Dataset. https://www.cs.rochester.edu/u/nhossain/humicroedit.html
- Nabil Hossain, John Krumm, and Michael Gamon. 2019. “President Vows to Cut Hair”: Dataset and Analysis of Creative Text Editing for Humorous Headlines. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 133–142, Minneapolis, Minnesota. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1012
- Nabil Hossain, John Krumm, Tanvir Sajed, and Henry Kautz. 2020. Stimulating Creativity with FunLines: A Case Study of Humor Generation in Headlines. arXiv preprint 2002.02031. https://doi.org/10.48550/arXiv.2002.02031
- Kostiantyn Kucher and Andreas Kerren. 2015. Text Visualization Techniques: Taxonomy, Visual Survey, and Community Insights. In Proceedings of the 8th IEEE Pacific Visualization Symposium (PacificVis '15), pages 117–121. IEEE. https://doi.org/10.1109/PACIFICVIS.2015.7156366
- Mohammad Alharbi and Robert S. Laramee. 2019. SoS TextVis: An Extended Survey of Surveys on Text Visualization. Computers 8, no. 1, article 17. https://doi.org/10.3390/computers8010017
- Xingbo Wang, Yao Ming, Tongshuang Wu, Haipeng Zeng, Yong Wang, and Huamin Qu. 2022. DeHumor: Visual Analytics for Decomposing Humor. IEEE Transactions on Visualization and Computer Graphics, vol. 28, no. 12, pages 4609-4623. https://doi.org/10.1109/TVCG.2021.3097709