DH RSE Survey Discussion - Part 1

The survey results can be found here. A second blog post that takes a closer look at the demographic questions of the survey will follow. A total of 66 people responded to the survey.

Self-classification

About half of all respondents classified themselves as “Service RSE,” which we defined as someone who works with a team of other Research Software Engineers (RSEs) to provide support for digital projects. This was rather surprising to us, as in our experience most “technical” people in the digital humanities seem to be people who use coding in their research or who are in postdoc or similar roles doing “the coding part.” This might be because we were more successful at reaching established technical teams than at reaching people who do technical work but are less connected to formal and informal networks like ours. Most of the other half of respondents self-identified as “Embedded RSEs,” which we defined as someone who works with others who do not have any coding skills and helps them with their coding-related work. For the most part, we considered participants who classified themselves as neither Service nor Embedded RSE as having a role similar to Embedded RSEs, in the sense that they were researchers with robust technical expertise. A few respondents didn’t fall clearly into either category.

We realize that the classifications we provided were less than ideal. Many people doing technical work in the digital humanities might not refer to themselves as research software engineers. If we redo this survey at some point, we should rethink this question to make it applicable to a wider range of people. For the purposes of this blog post, we use the terms “service RSE” and “embedded RSE” in the widest possible sense: not just software engineers fall into these categories, but anyone doing technical work.

Reflecting a bit more on these numbers, however, we wonder what this means for building a community of people doing technical work in the digital humanities. It is reasonable to assume that people in more service-oriented RSE roles are more closely aligned with traditional software engineering positions. They might be more likely to belong to national RSE associations or to more general software engineering communities. As for conferences, they might attend yearly RSE events, but they are probably much less likely to attend the yearly DH conference, which tends to focus more on the humanities side of digital humanities than on the technical side. Embedded RSEs and researchers who program for their own research, on the other hand, are probably more likely to attend the DH conference or community events aimed at their research area than an RSE event. We make these assumptions because they seem to be supported by our personal experience. They would also explain why we were surprised by the responses to the self-classification question: if our assumption is right and you connect to your community through the DH conference or smaller domain-specific events, you will probably not meet many people in service RSE positions.

The question is: is this a problem? Should we aim to bring together people from both ends of the spectrum? Should there be a venue where they can exchange ideas and best practices? We believe there should be, and that it would improve the work of both groups. People who use coding as a tool to further their research might learn from people with potentially more robust software engineering expertise, experience, and skills how to improve their code, their code management, and, more generally, the software development lifecycle of digital projects. Those in defined RSE roles and careers, in turn, might gain a better understanding of the challenges, issues, and questions that humanities research raises.

Programming Languages and Frameworks

We compared the results of the question about programming languages with the results from the 2018 RSE survey. Unsurprisingly, Python is the language of choice for RSEs in general, DH RSE or not (we are using RSE here in the broadest sense to include anyone doing coding or technical work). When it comes to the second and third most used languages, though, the digital humanities have different preferences. While the second most preferred language among RSEs in general is C++ or R in most countries, JavaScript is the clear winner for DH RSEs. We assume this is because a majority of digital humanities projects have at least one component that presents their work through web applications. Since the objects of study are typically texts, images, videos, or similar data types, many projects develop web applications or web pages that use JavaScript.

Another difference from the general RSE community with regard to programming languages is that places three and five in the DH RSE survey are taken by XML-related technologies (XSLT, XQuery) and PHP, respectively. While PHP is also used by RSEs in general, though it ranks lower overall, XSLT and XQuery do not appear on their list at all. We attribute this to the widespread use of TEI and to the many digital edition and collection projects that use XML to encode their texts.

Another thing to note is that a very small number of respondents indicated that they didn’t program at all or only very little, which suggests that their work focuses on other aspects of software development. This adds another dimension to the question of what technical work in DH is or entails.

When it comes to frameworks, the most common ones are, unsurprisingly, Python, JavaScript, and PHP frameworks, most (if not all) of them web application frameworks. This fits our assumption that DH projects often develop web applications or web pages to present their work.

Classification of Tasks

Half of the respondents indicated that they did project management tasks in addition to other (technical) work. Since we didn’t define what we meant by project management, this number might actually be even higher if we count as project management any task related to the processes and workflows around the development of software and digital resources. We wonder what this means for the quality of the resulting products.

In many industry software development projects, there are dedicated project managers who handle tasks such as setting up processes for user feedback, communicating with stakeholders, or developing budget plans. In a research setting, there is often no such role. Many projects are managed by the PI, and the software development process is often managed by the person doing the technical work. In many cases, where only small ad hoc projects are managed at a time, this makes sense. However, the question arises: should there be project management training opportunities? How well do people manage their technical work? Does everybody use a bug tracker or a similar tool to track bugs and change or feature requests? Would it be valuable to have sessions at conferences and webinars that explain basic (and not so basic) project management topics? Do we all know how to properly document code and simply don’t do it, or do we not know any better? Could the problem of unmaintained (and unmaintainable), unsustainable code that we are all familiar with be reduced by teaching our community better project management skills?

Obviously, strategies for addressing these issues depend on the case at hand, and no single solution will fit all projects. For one project, simply tracking bugs via GitHub issues is enough, while bigger projects, or teams working on multiple projects at the same time, might need a dedicated project manager who, for instance, handles communication with stakeholders and developers. We don’t have the answers, but we believe these are questions worth considering.

Topic preference

In the last section of this blog post, we look at three questions that were asked together: which topics respondents mostly worked on, which topics they enjoyed working on most, and which topics they would like to improve in. The first thing to notice is that a majority of respondents work in software engineering. However, only about ⅔ of them actually enjoy it. We should note, though, that since we asked which topics people enjoyed most rather than which they didn’t enjoy, software engineering might simply not be the part of their job that some respondents enjoy most, even if they don’t mind it.

Only 5 people indicated that they would like to improve their software engineering skills. This raises the question of whether people doing software engineering in DH are already so skilled that they don’t require more training, whether they are simply not aware that there is more to learn, or whether they know there is more to learn but are not interested in doing so. As with software engineering, quite a few people’s work involves server setup and maintenance, but hardly anyone enjoys doing it. There seems to be greater awareness of its usefulness, however, as twice as many people indicated that they would like to improve their skills in this area (10 vs. 5).

When we take into account how respondents classified themselves, we find that a total of 15 people who either self-classified as service RSEs or are in a sufficiently similar role stated that they enjoy software engineering. Nine service RSEs, however, did not indicate that they enjoyed software engineering. The reverse is true for embedded RSEs: 7 stated that they enjoyed it, while 9 did not. It seems plausible that service RSEs enjoy software engineering more often if we assume that many of them do their job because they enjoy building software, while embedded RSEs might often code out of necessity to further their research.

We noticed that topics more closely aligned with research tasks, such as algorithm development, data modeling, or machine learning, were all more in demand for skill improvement than tasks related to building and maintaining software (software engineering and server setup). We have two possible explanations for this. First, developing software might be more of a means to an end, especially for people whose main focus is their research. For them, it might be good enough that their code produces results, and they might not care about employing software engineering best practices. Second, publishing results in papers is still valued more for career progression in research-focused positions than creating clean, maintainable, and reusable code. Hence, given the many time constraints we all face, brushing up on one’s software engineering skills might simply not be a priority.

Conclusion

We’ve made quite a few assumptions in this blog post to explain the results we’ve seen. Most of them are backed by our personal experience. Obviously, there are other explanations we haven’t thought of, and ours might not be correct. However, we believe many of the questions we have raised should be considered and thought through. There might be some simple measures we can take as a community to address some of the issues we’ve identified, be it workshops at conferences or webinars. We will soon publish a second blog post discussing the other half of the survey, which asked for demographic information.

Call for Opinions

We are interested in your views on this. If you would like to share experiences from your professional setting or discuss these questions with us, feel free to join our Slack group or simply write a short opinion post for this blog.