Two papers on Open Source and a reflection

✍️ Written on 2022-04-29 in 845 words.
Part of cs software-development community

Motivation

Grazer Linuxtage, which we organized, is an event on free software, open source, and open hardware platforms. Relatedly, starting in June, I will spend eight hours per week developing open source software. So I care about the FLOSS (‘Free/Libre/Open Source Software’) movement, and I read two related academic papers. Let’s talk about them.

Underproduction: An Approach for Measuring Risk in Open Source Software

In the paper “Underproduction: An Approach for Measuring Risk in Open Source Software” (peer-reviewed at SANER 2021, video on YouTube) (my paper review), the authors Kaylea Champion and Benjamin Mako Hill define the notion of underproduction for software. Roughly, the notion captures an imbalance between the activity and attention an open source package receives and the demand for it (values below 1, in particular).

To quantify underproduction empirically, they compared the time for bug resolution with the number of installations for 21,902 Debian packages. 4,327 packages were identified as underproduced (about 20%). One interesting result is that the most underproduced applications are GUI applications. The authors argue that GUI applications are more difficult to maintain: they often involve not only GUI development but also interaction with the operating system or kernel, which requires more maintainer expertise.
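To make the idea concrete for myself, here is a toy sketch in Python. It is not the authors' method. It only ranks a handful of made-up packages by installations and by bug resolution time and divides one rank by the other. The package names, the numbers, and the `mismatch` ratio are all assumptions for illustration. The last line re-does the "about 20%" arithmetic from above.

```python
# Toy illustration only: package names and numbers are made up,
# and the ratio below is NOT the measure defined in the paper.

def ranks(values, reverse=False):
    """Return 1-based ranks (1 = first after sorting); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

# (name, installations, median days until a bug is resolved) -- hypothetical
packages = [
    ("gui-app", 90_000, 400),
    ("cli-tool", 50_000, 30),
    ("library", 20_000, 60),
    ("niche-daemon", 1_000, 200),
]

usage_rank = ranks([p[1] for p in packages], reverse=True)  # 1 = most installed
quality_rank = ranks([p[2] for p in packages])              # 1 = fastest bug fixes

# Assumed toy ratio: below 1 means a package is used more than it is cared for.
mismatch = {p[0]: usage_rank[i] / quality_rank[i] for i, p in enumerate(packages)}
underproduced = [name for name, ratio in mismatch.items() if ratio < 1]

# Share of underproduced Debian packages, using the counts quoted above.
share = 4_327 / 21_902

if __name__ == "__main__":
    for name, ratio in mismatch.items():
        print(f"{name}: {ratio:.2f}")
    print(f"underproduced: {underproduced}, share: {share:.1%}")
```

In this made-up data set only the heavily installed but slowly fixed package ends up below 1, which is the kind of imbalance the notion is after.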

A Replication Study on Measuring the Growth of Open Source

In the paper “A Replication Study on Measuring the Growth of Open Source” (updated 4 times between 2020-08 and 2022-01, I read version 5, not peer-reviewed yet) (my paper review), the authors Michael Dorner, Maximilian Capraro, Ann Barcomb, and Krzysztof Wnuk replicate previous results from 2003 (“Characteristics of open source projects”), 2007 (“Software evolution in open source projects—a large-scale investigation”), and 2008 (“The Total Growth of Open Source”) on the Open Hub project index. Open Hub lists 355,111 open-source projects as of 2021-06-06. The previous studies claimed that Open Source grows w.r.t. byte size (2003), that it grows quadratically (2007) or exponentially (2008) w.r.t. lines of code, and that it grows exponentially w.r.t. the number of projects (2008).

The exponential growth claimed in 2008 was confirmed up to 2010, but it did not continue. The authors recognize a peak around 2010 and 2013 and a steady decline afterwards in several parameters (size in bytes, LoC, number of projects).
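The growth shapes mentioned here can be told apart with simple differences. The sketch below uses made-up yearly totals (not Open Hub data) and hypothetical helper names: exponential growth shows a constant ratio between consecutive years, quadratic growth shows a constant second difference, and a decline shows up as a negative first difference.

```python
# Synthetic series for illustration only; none of this is Open Hub data.

def differences(series):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(series, series[1:])]

def ratios(series):
    """Ratios between consecutive values."""
    return [b / a for a, b in zip(series, series[1:])]

def is_constant(values, tolerance=1e-9):
    """True if all values agree within the given tolerance."""
    return max(values) - min(values) <= tolerance

def peak_index(series):
    """Index of the last value before the first decline, or None if there is none."""
    for i, step in enumerate(differences(series)):
        if step < 0:
            return i
    return None

years = range(6)
exponential = [100 * 2 ** t for t in years]     # doubles every year
quadratic = [100 + 10 * t ** 2 for t in years]  # grows with t squared
peaked = [100, 200, 400, 380, 350, 300]         # grows, then declines

if __name__ == "__main__":
    print(is_constant(ratios(exponential)))                 # constant ratio
    print(is_constant(differences(differences(quadratic)))) # constant 2nd difference
    print(peak_index(peaked))                               # position of the peak
```

Real data is noisy, so a serious analysis would fit models instead of testing for exact constancy. The sketch only shows what the three claims mean.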

The paper shows how bad the situation regarding empirical replication of data in software development is. On the other hand, it shows how the situation has improved as version control and public availability of data improved. My personal summary is simply that Open Source changed over time, and since 2013 in particular.

Comments

The FLOSS movement certainly changed. From a subjective point of view, the FLOSS movement evolved from a rebellious group (the conflict is illustrated by e.g. the Halloween documents) towards a community with properties orthogonal to those of proprietary software products. Both fields have their issues (proprietary software cannot prove software behavior to a convincing extent in the IT security sector, FLOSS has issues with sustainable business models), but neither is dominant enough to overtake the other. As a technician, you are certainly happier if you can look at the code and study the software, which makes FLOSS particularly interesting for newcomers in IT. But the complexity is often enough of an issue to prevent contributions from peer production.

So what has really changed (e.g. in 2013)? It is difficult to tell, but I think GitHub has had a tremendous effect on how we deal with open source software since then. Apparently Microsoft also chimed in around that time. We easily look at source code, create issues, run CI/CD pipelines and plan designs on GitHub (or other platforms). The process of validating your contributions is usually easier these days, which enables people from the outside to join. But this contradicts the results of the second paper: FLOSS is declining, not continuing its exponential growth.

Apparently, there must be a certain saturation. The number of people investing in FLOSS in their leisure time might be saturated. On the one hand, there are more people in the software industry, but at the same time the rebellious character is gone and FLOSS development is just an alternative to doing sports/cooking/movie watching/…. I think many parts of software development have also professionalized. Maintaining software takes a huge effort, because more and more aspects are now part of software development (e.g. the process of bug reporting, public relations, questions of community inclusivity) as we became aware of past problems. So in some ways, I would claim some parts just balanced out and thus we don’t see exponential growth. The decline does not really indicate to me that FLOSS will be gone in a few decades.

Conclusion

Understanding and quantifying FLOSS is difficult. But these papers help to analyze issues. It was fun to share my thoughts and read the papers. Cheers!