A review of RSECon24

17 September 2024

EPCC's Kara Moraw writes about attending the Eighth Annual Conference for Research Software Engineering.

RSECon24 in Newcastle got off to a perfect start with a sunny evening at a street food market on the banks of the Tyne. Everyone picked their favourite food, took photos of the picturesque view of the bridges, and sat down to welcome both old and new faces into the research software engineering (RSE) community. 

This was the first time I had attended RSECon, and I found that all the good things they say about it are true – everyone was friendly and excited to be there, and the programme was fun, inclusive and interesting. In fact, I found it surprisingly hard to pick which sessions to attend from the packed schedule, with so many great activities running in parallel. 

Programme highlights

The mix of workshops and talks included everything from research software practices to applications and community. For example, on the technical end, I went to a talk showcasing how to write more performant data analysis code in Python. Robert Chisholm gave a dense overview of small code patterns that can lead to significant speedups, which I will certainly use as a reference from now on. The next day, Philip Whybra walked us through how they blended rainfall data from the seventeenth century with more recent data in an efficient and traceable manner.

I was also very excited to attend sessions about software development processes, for example the walkthrough showcasing the peer-review process used for submissions to the Journal of Open Source Software, which ensures high software quality using a collaborative approach. 

One of my favourite sessions was Agile Methods for RSEs, where the RSE team at the University of Manchester walked us through how they adapted the Scrum framework to a research context, where teams are often much smaller than in industry and work on multiple projects at the same time. They first gave an overview of the agile values and the expected sprint process, and then showed how they scaled the process up or down depending on the needs of a project. In their experience, the expectations set out clearly by the different roles in a Scrum process were extremely helpful both for the developers and the researchers who acted as stakeholders.  

Poster: analysing the lifecycle of a research software repository

I also had a small part to play in the conference programme, namely in the poster session, where I brought a poster presenting the tools I had developed for one of my projects at EPCC: an exploratory analysis of patterns and events in the lifecycle of a research software repository. (Here is the poster: Mining RSE repository timelines on GitHub: How long will it live, and who will notice?)

My work was part of research carried out by the Software Sustainability Institute, but before we could even start looking for patterns such as community engagement phases, we had to tackle two big questions: How do we find research software repositories, and what data can we retrieve about them? 

We decided to retrieve over 200,000 research publications from ePrints repositories hosted by 16 universities across the UK. An ePrints repository contains all publications authored by researchers affiliated with the respective university, across all disciplines. For each ePrints entry, we were able to extract metadata such as the publication title and authors for potential linking with other data sources at a later stage, as well as the PDF of the research paper, which we searched for links to GitHub repositories. 
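To give a flavour of the link-searching step, here is a minimal sketch of how GitHub repository links could be pulled out of text that has already been extracted from a PDF. The regular expression, the function name `find_github_repos` and the normalisation rules are assumptions made for illustration only, not the code used in the project.

```python
import re

# Assumed pattern: "github.com/<owner>/<repo>", with or without a URL scheme.
GITHUB_REPO = re.compile(
    r"github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)", re.IGNORECASE
)


def find_github_repos(text):
    """Return a sorted list of unique 'owner/repo' names found in text."""
    repos = set()
    for owner, repo in GITHUB_REPO.findall(text):
        # Drop a sentence-ending full stop that the pattern may have captured.
        repo = repo.rstrip(".")
        # Treat clone URLs ("repo.git") as the same repository.
        if repo.endswith(".git"):
            repo = repo[:-4]
        if repo:
            # GitHub names are case-insensitive, so compare in lower case.
            repos.add(f"{owner}/{repo}".lower())
    return sorted(repos)
```

Calling `find_github_repos` on the text of one paper returns each repository once, however often it is mentioned. Real PDFs are messier than this (links broken across lines, for example), so a production version would need more care.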

We then used GitHub’s API to collect a wide range of data about the repositories linked in the research publications, such as issues and contribution statistics, from the creation of the software repository until the date of analysis (June 2023), and reshaped this data into a timeline format. 
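As a rough illustration of what reshaping into a timeline can look like, the sketch below buckets timestamped events by calendar month. It assumes the events have already been downloaded and reduced to `(kind, timestamp)` pairs, with timestamps in the ISO 8601 UTC form `YYYY-MM-DDTHH:MM:SSZ`. The function name `build_timeline`, the event kinds and the monthly granularity are hypothetical choices for this example, not a description of the actual toolkit.

```python
from collections import defaultdict
from datetime import datetime


def build_timeline(events):
    """Count events per calendar month and per kind.

    events: iterable of (kind, timestamp) pairs, e.g. ("star", "2021-03-01T10:00:00Z").
    Returns a dict mapping "YYYY-MM" to {kind: count}, ordered by month.
    """
    timeline = defaultdict(lambda: defaultdict(int))
    for kind, timestamp in events:
        when = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        timeline[when.strftime("%Y-%m")][kind] += 1
    # Convert to plain dicts, sorted chronologically by month key.
    return {month: dict(counts) for month, counts in sorted(timeline.items())}
```

With the data in this shape, it becomes straightforward to plot activity over time or to line it up against a publication date.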

The resulting graphs provide an insight into common phases of a research software repository. They show increased interest after the publication date in the form of stars, forks and issues, and indicate that repositories with a larger group of contributors tend to remain active for much longer (five years on average) than those with just one contributor, which have a life expectancy of only one year. This underlines the importance of building a community around a piece of software. 
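A comparison like this could be computed along the following lines. This is only a sketch under stated assumptions: it defines a repository's lifespan as the time between its first and last recorded activity, and it splits repositories into those with one contributor and those with more. Both choices, as well as the field names `contributors`, `first` and `last`, are assumptions for illustration and may differ from the definitions used in the analysis.

```python
from datetime import date
from statistics import mean


def lifespan_years(first_activity, last_activity):
    """Time between first and last activity, in years (assumed definition)."""
    return (last_activity - first_activity).days / 365.25


def mean_lifespan_by_group(repos):
    """Average lifespan for single-contributor and multi-contributor repositories.

    repos: iterable of dicts with hypothetical keys
    'contributors' (int), 'first' (date) and 'last' (date).
    """
    groups = {"single": [], "multiple": []}
    for repo in repos:
        key = "single" if repo["contributors"] <= 1 else "multiple"
        groups[key].append(lifespan_years(repo["first"], repo["last"]))
    # Skip groups with no repositories, since a mean is undefined for them.
    return {key: mean(values) for key, values in groups.items() if values}
```

The function takes one record per repository and returns one average per group, which is enough to reproduce the kind of comparison described above on any set of repositories.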

While the analysis was only exploratory, the resulting toolkit is reusable and modular, and can hopefully be used on a wider range of repositories. That would let us learn much more about how to encourage other RSEs to contribute to existing software, thus extending its life span beyond the length of a research project. It was exciting to see the amount of interest at RSECon in the tools and results. 

The poster session was a lot of fun, and everyone who came to talk to us had great ideas for similar research in this area and how to build on our work. I was thrilled to be awarded second place in the poster competition and would love to see how other RSEs reuse the tools to discover more about how to make our software more sustainable.  

Community Discovery Day

The final day of RSECon was the Community Discovery Day, dedicated to exploring different groups of interest within the research software engineering community. In the morning I joined the Birds of a Feather (BoF) session looking at RSE training and learned how other RSE groups tackle training and certifying their RSEs as educators. After that, EPCC’s Kirsty Pringle hosted the Green RSE BoF, with two inspiring talks about adopting greener ways of developing research software, followed by a discussion about the proposed GREENER principles for RSE. Both the Training and the Green RSE groups plan to set up a special interest group within the Society of Research Software Engineering, so it’ll be exciting to see how these discussions continue beyond the conference!  

https://rsecon24.society-rse.org

Author

Ms Kara Moraw