DH Inside Out - Mini-conference at DH2024 - Program

DH Inside Out - Program

From August 6-9, 2024, DH2024 will be held at George Mason University in Arlington, Virginia, USA. DHTech will hold a mini-conference at DH2024 themed “DH Inside Out”. Typical DH conference presentations are focussed on the research with a slight nod to the technical details; we want to flip that format and dive more deeply into the technical aspects of the work, while still keeping it in context of the research and domain specifics.

We received a great number of high quality submissions, showing the importance of providing a venue for technically focused talks. We were able to accept 16 submissions to be included in the mini-conference. We thank all the submitters and reviewers for their work!

Agenda - Monday, August 5, 9:00am to 4:30pm

Session 1 - Moderator: Malte Vogl
9:00am - 9:20amDHTech Steering CommitteeWelcome
9:20am - 9:40amBryan TarpleyCorpora: A Dataset Studio for the Digital Humanities
This session will showcase Corpora, an open-source, web-based tool created by the author for allowing humanities researchers to conveniently create, manage, and explore their datasets; while also serving as a dynamically generated backend (REST API) for 3rd-party tools or public facing websites. A single instance of Corpora running on modest hardware currently serves data behind the scenes for the New Variorum Shakespeare, the Carlyle Letters Online, the Maria Edgeworth Letters Project, and the Advanced Research Consortium. The author will also survey the tech stack comprising Corpora, which includes a multi-model database architecture relying on MongoDB, Elasticsearch, and Neo4J.
9:40am-10:00amAdam PorterFolium
Folium is a Python library that facilitates the creation of interactive HTML maps with Leaflet.js. Working with Leaflet directly requires some knowledge of Javascript and HTML, but Folium allows mapmakers to work entirely within the Python environment to create a wide variety of maps. Drawing on a couple of different datasets, my presentation will provide a brief overview of the types of maps Folium creates before delving more deeply into choropleth maps. I will discuss how to establish useful color scales when the data distribution is uneven and has significant positive (or negative) skews.
10:00am-10:20amNilo PedrazziniGramTypix: a Python pipeline for subtoken-based language typology and probabilistic semantic mapping from parallel corpora
GramTypix is a Python-based pipeline designed for linguistic typology and probabilistic semantic mapping from massively parallel corpora. It employs a subtoken-based approach to capture cross-linguistic variation in constructions, encoded both lexically and morphologically. The method integrates lexical associations with character n-grams from multiple languages to build semantic maps based on both lexical items and morphological markers, while automatically considering potential allomorphs as one marker. The emphasis is on customizable components in n-gram searches and on statistical analysis techniques, such as multidimensional scaling and geostatistical interpolation (Kriging), now standard in typological research.
10:20am-10:40amRebecca KoeserSimulating risk attitudes and rationality using agent-based modeling
10:40am-11:00amBreak
Session 2 - Moderator: Cole Crawford
11:00am-11:20amJulia Damerow, Rebecca S. Koeser, Malte VoglBuilding a Community for DH Code Review
Code review is a commonly used practice in software development. In a code review, a developer who is not the author of a piece of code, reads through the code and makes suggestions on how to improve it, asks questions where the work is unclear, and points out cases that may not be handled. In digital humanities, many developers writing code for a research project do not work in teams, which poses the question, how is it possible to integrate code review effectively in this situation? Our talk will discuss the DHTech working group on code review that is aiming to address this question by building a community for peer code review. We will share the successes and challenges the working group has faced, and our experiences and lessons learnt from facilitating code reviews.
11:20am-11:40amSam BlickhanFunding maintenance and sustaining free, open-source DH tools on the Zooniverse platform
This presentation will describe an NEH-funded project to review key digital humanities resources on the Zooniverse crowdsourcing platform in preparation for plans to scale. We will describe the technical motivations, the code audit and survey design process, and early results of the user surveys. It will dig into the complexities of maintaining and sustaining free, open-source software in an environment where funding is traditionally tied to innovation, and how we balance technical necessities with user needs. We hope that audience members will leave understanding our approach to code maintenance and community preference, and what opportunities exist for funding DH tools.
11:40am-12:00pmJulia DamerowEmbracing Mistakes, Accepting Imperfection
Research software is in many cases developed by people who learn the craft of software engineering while they are developing the software. What are the consequences of this situation, in which the developers of a piece of code are not expert software engineers? I will give examples of projects I have worked on in which mistakes and decisions were made that have negatively influenced the development even years later. I believe that it is important to be open about the mistakes and bad decisions we make along the way to encourage every member of the community to share their experiences and their code, so we can learn from each other in order to create trustworthy research software that creates reliable results.
12:00pm-12:20pmJose Hernandez, David Nelson, Julia DamerowDHTech Education & Training Working Group
The DHTech Interest Group has established the Education and Training Working Group to address the identification of necessary competencies for DH software developers and the creation of educational resources tailored for our community. This session shall inform attendees of the ongoing labors of the working group, which have focused on the consolidation and creation of materials for digital project organization and tangible digital skills such as Python packaging and the implementation of documentation. In addition, a discussion space will be provided to identify the areas of focus for the next year by the community and gauge interest for additional contributors.
12:20pm-1:30pmLunch Break
Session 3 - Moderator: Rebecca Sutton Koeser
1:30pm-1:50pmJulian GonggrijpThe READ-IT interface: RDF throughout
The READ-IT interface is a central hub through which users upload, annotate and search through sources containing reading testimonies. This web application is unusual in its thorough adoption of linked data (RDF). Even the data model is dynamically read from an OWL ontology, so the client can offer appropriate controls to the user while staying agnostic of the problem domain. Hence, the client may be considered an advanced reasoner. We examine some of the abstractions employed to make this possible and some of the challenges encountered while attempting to embed our application in the existing web of linked open data.
1:50pm-2:10pmMalte VoglTippingPigs - Serious games as communication device between scientists, programmers and the public
In this tool presentation, we will introduce the game Tipping Pigs. The game was initially developed as a science outreach exercise, with the goal to communicate the intricacies of cascading tipping points to general audiences. Developed in the opensource framework Godot, the games source code is available at Gitlab and can be played online on Itch.io. The game interface is in German, but should be more or less self-explaining. In the tool presentation, we will present the game motivation and logic and present our argument for a stronger integration of game development and scientific inquiry.
2:10pm-2:30pmJose HernandezParallel Processing in DH Projects: Web Scraping with Python
This session will discuss the application of parallel processing methods for digital humanities research by showcasing a web-scraping project on online extremism. Here we will explain two methods: the Python multiprocessing module and SLURM job arrays for the use of supercomputing systems. The example project extracted 2.5 million URLs, each corresponding to one forum, and then scraped these to compile 15 million posts for future data analysis. The presentation aims to illustrate parallelism as another tool for DH software developers and provide a comprehensive guide for identifying projects that could benefit from parallel processing in the attendees’ areas of research.
2:30pm-2:50pmAshley Dennis-HendersonUsing Regular Expressions and Optimisation to Create a Method for Automatically Extracting Dates from War Diaries
Analysing historic diaries using natural language processing techniques is advantageous as they contain temporal information regarding the sequence the text was written. However, dates within such diaries are not always written consistently, and can be challenging and time consuming to extract. Using a collection of Australian World War I diaries as a case study, we have devised a method for automatically extracting dates using a combination of regular expressions and optimisation. This talk explores this method, focusing on the optimisation program format, the method’s accuracy, and the use of a GUI to manually create a testing data set.
2:50pm-3:00pmBreak
Session 4 - Moderator: Robert Casties
3:00pm-3:20pmMees van StiphoutI-Analyzer – A Digital Text Corpus Tool for All Forever?
I-Analyzer is a text corpus analysis tool that implements distant reading functionalities for a wide range of corpora, from historical newspapers to funerary inscriptions. Over six years of continual development has given us some insight into how a research software can grow in size and scope based on the demand from researchers, as well as into some of the challenges that come from the discrepancy between the project-based funding and the continual need for maintenance. During this session we hope to not only share these insights, but also hear how others have dealt with these challenges.
3:20pm-3:40pmKeli DuSoftware development begins when the code is complete: Lessons learnt from the development of Pydistinto
In corpus and computational linguistics as well as in the digital humanities, there are a variety of measures of distinctiveness (also known as keyness measures) that can be used for comparative text analysis. In our talk, we will first introduce our Python package Pydistinto, give an overview of the nine implemented measures, describe the parameters that need to be set, and explain the process of comparing a target corpus with a comparison corpus as well as visualizing the results. With this in mind, we would like to focus on sharing our experiences with the design and development of the program.
3:40pm-4:00pmNick LaiaconaPublishing Digital Editions with EditionCrafter
EditionCrafter is an open source tool for publishing digital editions encoded in TEI XML. It is based on code from the digital critical edition "Secrets of Craft and Nature in Renaissance France" published by the Making and Knowing Project at the Center for Science and Society at Columbia University. EditionCrafter can generate sophisticated diplomatic renderings of text side-by-side with a deep zoom window of the original folios. There are also built-in functions for variorums, editorial notes, and glossaries. In this presentation, we will talk about how we created EditionCrafter and demonstrate how to incorporate it into your website.
4:00pm-4:20pmTrent WintermeierAVAnnotate
AVAnnotate is an application and workflow, designed by Dr. Tanya Clement and Brumfield Labs, which allows users to build digital exhibits of annotated audiovisual artifacts. AVAnnotate connects open source tools for annotation (such as Audacity), audiovisual management (Aviary), public code and document repositories (GitHub), and the AVAnnotate web application to create and share IIIF manifests and annotations via static, multi-page, online exhibits. In this Tools Presentation, we will discuss how the technical infrastructure of AVAnnotate represents a new kind of ecosystem where exchange is opened between institutional repositories, annotation software, online repositories and publication platforms, and all kinds of users.
4:20pm-4:30pmDHTech Steering CommitteeGoodbye and Thank You