The Open Source Software (OSS) movement is the logical outcome of the free flow of information across political boundaries. During the internet boom of the nineties, decentralized networks of talented and passionate developers began writing open-source versions of proprietary software to be distributed over the internet. With some technical knowledge, it is now possible to download free software with feature parity with closed software suites like Microsoft's Windows and Office or Adobe's Creative Cloud. The dream of West Coast Techno-Libertarianism made manifest, libre gratis. OSS is found everywhere and is the foundation for the lightspeed innovation we've seen in the last three and a half decades.
The openness of OSS means that individuals from anywhere in the world can contribute to projects. But what are the incentives for writing OSS? Software is hard, and a workable and secure product requires constant attention and planning. Motivations fall into a few broad categories:
Payroll Developers: Many open-source developers are on the payroll of software companies. These companies often have open-source components that support or interact with their products, like the PayPal SDK, which allows e-shop developers to use the PayPal system. Sometimes the open-source version is a stripped-down version of the paid product, like GitLab and GitLab Enterprise. In other cases, customers pay the company to host the software it wrote, as with Canvas or MongoDB.
Grants: A couple of private and public programs support the development of OSS and the developers' compensation. The Google Open Source office webpage boasts support for a couple dozen open source projects like Python, OpenTitan, LLVM, and Apache, along with initiatives that promote advocacy (the Software Freedom Conservancy), interoperability (Schema.org), and preservation (Software Heritage). Typically, these programs come from organizations with deep pockets, like the German Government, advanced technical needs, like Mozilla, or both, in the case of Google.
Academia: Researchers are often expected to open source at least the foundations of their projects to get published, so plenty of projects, like the Linux kernel, have their roots in academia.
Company Uses It: If a company consumes OSS, there's a decent chance that it will want a new feature or come across a bug. Plus, it is in a company's interest to curry favor with the maintainers of projects it uses. One of AWS's first services was EC2, which allows users to spin up their own virtual machines on Amazon's infrastructure. While EC2 ran on the Xen hypervisor, Amazon was one of the primary contributors to the project.
Love of the Game: Some contributors don't require compensation, for various reasons. Some come across a problem at their day job, solve it, then publish the solution. Some contribute to OSS for resume credit, or because they ideologically support the cause of open source software. Some just really like software engineering and do it in their free time.
Many companies contribute to open source, even if the consultants tell them it's eating into their profit margins. In Engineerland, there is an unspoken hierarchy of the Magnificent Seven ("Mag 7") companies based on their willingness to interact with and contribute to the open source ecosystem.
Microsoft: 2/10 (Owning GitHub is doing a LOT of heavy lifting)
Apple: 5/10
Meta: 6/10
Amazon: 3/10
Google: 9/10
Tesla: 4/10
Nvidia: 6/10
So Who Is Writing Our Software?
I selected 24 open source projects with diverse use cases to analyze. There are web browsers, operating systems, GUI-based consumer software, web servers, libraries, content management systems, and databases. Some software is meant for desktops, like the password manager clients, Chromium, and GIMP, while other projects are intended to be run on servers for IT operations, like Linux, Tomcat, Jenkins, and Kubernetes.
Each of these projects is managed on a platform called GitHub, which is now owned by Microsoft. On GitHub, developers can create a "Repository" or "Repo" to hold the code for a project, and "Contributors" can "Commit" code to the repo. The process for accepting commits varies from repo to repo, but typically a group of developers called "Maintainers" has taken charge of the project. The maintainers are in charge of accepting commits to the repo and often set the roadmap for the product. Some smaller projects can get by with only one or two maintainers, but almost all of the projects listed below likely have multiple full-time developers who manage the project.
Of the 24 projects analyzed, roughly two-thirds of the code changes come from the Linux Kernel repository and the Chromium web browser. The Linux Kernel is the foundation for every Android phone and almost every computer that isn't a desktop or laptop, and the Chromium browser is the foundation for most modern browsers, with Firefox and Safari as the notable exceptions. These two pieces of software are among the most widely used and are foundational to interacting with the modern software ecosystem.
I pulled the email attached to every commit in the dataset, then isolated the domain name and Top-Level Domain (TLD) for each email. In the email foo@bar.com, the domain is bar.com, with the TLD .com. Every country is assigned a two-letter TLD to distribute, called a country code TLD or ccTLD, like .uk for the United Kingdom and .de for Germany. If the TLD of the domain matches an existing ccTLD, we can guess the email is somehow affiliated with an organization based in that country.
Some countries allow private entities to register domains under their ccTLD, even if they do not reside within the country. For example, twitch.tv uses the .tv ccTLD from Tuvalu. In that case, I go on to the next step. Those states and their ccTLDs are:
Niue: .nu
British Indian Ocean Territory: .io
Montenegro: .me
Anguilla: .ai
Tuvalu: .tv
Colombia: .co
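Here's a minimal Python sketch of the ccTLD step described above. It is not the exact script behind the charts. The function names are made up for illustration, and the ccTLD table is a tiny hand-picked subset; a real run would need the full ccTLD list.

```python
# Illustrative subset only; a real lookup needs the complete ccTLD table.
CCTLD_TO_COUNTRY = {
    "uk": "United Kingdom",
    "de": "Germany",
    "jp": "Japan",
    "cn": "China",
}

# ccTLDs from the list above that anyone can register,
# so they say nothing about where the organization is based.
OPEN_CCTLDS = {"nu", "io", "me", "ai", "tv", "co"}


def split_email(email):
    """Return (domain, tld) for an address such as foo@bar.com."""
    domain = email.rsplit("@", 1)[-1].strip().lower().rstrip(".")
    tld = domain.rsplit(".", 1)[-1]
    return domain, tld


def guess_country(email):
    """Return a country name, or None when the ccTLD can't answer the question."""
    _, tld = split_email(email)
    if tld in OPEN_CCTLDS:
        return None  # falls through to the next step
    return CCTLD_TO_COUNTRY.get(tld)
```

A None result covers both cases that need more work: generic TLDs like .com and the open ccTLDs listed above.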
If we can't guess the origin country of the email from the ccTLD, we can instead turn to the WHOIS system to get a guess. The WHOIS system holds the registration information for a domain, including the registrant's country. While this doesn't tell us much about the individual who made the commit, it tells us where the organization they are affiliated with is located. I tallied up all the countries of origin and calculated how many commits each country made.
I also took the democracy value from the V-Dem dataset for 2024 and colored each country accordingly—red for authoritarian, blue for democratic.
Whoa! It looks like the United States is dominating in this space. Lots of the data will be like that.
Let's remove the US from the equation and see what the data looks like:
Looks like Europe, Canada, China, and Japan are writing most of the software. While China has a much higher population, it is still producing much less code. We'll dive into this later.
You can likely access the WHOIS system yourself. Open a command line and type whois followed by whatever domain you want to investigate. If you've tried running whois on a company's domain, there's a chance that WHOIS returned MARKMONITOR for all the fields, which refers to an American company that protects against fraud. We're going to throw all those out.
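For the curious, here's a rough sketch of how the WHOIS lookup and the MarkMonitor filter could be scripted in Python. It assumes the whois command-line client is installed and that the response contains a "Registrant Country:" line, which many registrars return but not all. The function names are hypothetical, and treating any MarkMonitor-filled registrant field as unusable is my assumption about how to apply the rule above.

```python
import subprocess


def run_whois(domain):
    """Call the system whois client (needs the CLI installed and network access)."""
    result = subprocess.run(
        ["whois", domain], capture_output=True, text=True, timeout=30
    )
    return result.stdout


def registrant_country(whois_text):
    """Return the registrant country code, or None if missing or masked."""
    country = None
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip().lower(), value.strip()
        if not key.startswith("registrant"):
            continue
        if "markmonitor" in value.lower():
            return None  # assumption: throw the whole record out
        if key == "registrant country" and value:
            country = value.upper()
    return country
```

Usage would look like registrant_country(run_whois("example.com")). WHOIS output is not standardized across registries, so expect to special-case a few formats.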
Let's calculate how many commits per capita each country has:
It looks like this metric is biased towards smaller countries with high incomes and high education.
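The per capita math is simple enough to sketch in a few lines of Python. The function name and the per-million scaling are my own choices for illustration, and any numbers you feed it here are placeholders, not values from the dataset.

```python
def commits_per_capita(commits, population, per=1_000_000):
    """Commits per `per` residents, highest first.

    Countries without a population figure are skipped.
    """
    rates = {}
    for country, count in commits.items():
        people = population.get(country)
        if people:
            rates[country] = count * per / people
    # Sort so the most productive countries per resident come first.
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))
```

Dividing by population is what pushes small countries up the ranking: a few hundred commits go a long way when the denominator is small.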
Each commit is logged with the number of code lines deleted and inserted. We can use this to get a rough estimate of how substantial individual changes are. Here's the breakdown of lines of code changed by country.
Without the US:
And here's that same metric per capita:
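If you want to gather line counts like these yourself, git can print insertions and deletions per file with git log --numstat. Below is a small Python sketch that parses that output. The "@@" marker and the "|" separator in the format string are my own convention, not something the original analysis is known to use, and the function name is hypothetical.

```python
def parse_numstat(log_text):
    """Parse the output of: git log --numstat --format='@@%H|%ae'

    Returns a list of (commit_hash, author_email, lines_changed) tuples.
    """
    commits = []
    for line in log_text.splitlines():
        if line.startswith("@@"):
            # Header line for a new commit: hash and author email.
            commit_hash, _, email = line[2:].partition("|")
            commits.append([commit_hash, email, 0])
            continue
        parts = line.split("\t")
        if len(parts) < 3 or not commits:
            continue  # blank line or something unexpected
        added, deleted = parts[0], parts[1]
        # Binary files report "-" instead of a number, so skip them.
        if added.isdigit() and deleted.isdigit():
            commits[-1][2] += int(added) + int(deleted)
    return [tuple(c) for c in commits]
```

Counting insertions plus deletions treats a one-line edit as two changed lines, which is one reason this is only a rough measure.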
A few years ago, a Linux maintainer named Qu Wenruo openly flamed Zhen Lei, a contributor from Huawei, for submitting useless or low-impact changes, asking them to "do real contribution" to fix their "damaged reputation". Qu Wenruo accused Huawei of KPI grabbing, where experienced developers submit easy changes rather than tackle the tough problems so that management can brag that Huawei is in "the top 2 contributors to the Linux Kernel 5.10".
Again, while calculating the lines changed per commit is an imperfect measurement for impact, it should be a decent approximation. Knowing that at least one Chinese company has garnered a reputation for being inefficient about their commits, let's break down the data:
It looks like some gigachads in Bangladesh and Uganda are putting in the work. There's probably something funky going on. Let's limit it to countries with over 7,500 commits.
These are all the countries with more than 7,500 commits in the dataset: only 19 of them. Of this group, China is the least efficient with its commits. This data, combined with the reputation of certain Chinese tech giants in the open source space, implies that China, a country full of capable developers, should strive for higher-quality contributions to the open source ecosystem.
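As a sketch of the filter used here, the snippet below computes lines changed per commit and drops any country at or below a commit threshold. It assumes "efficiency" means lines changed per commit, as in the discussion above. The function name and input shape are hypothetical.

```python
def lines_per_commit(stats, min_commits=7500):
    """Rank countries by average lines changed per commit, lowest first.

    `stats` maps country -> (commit_count, lines_changed).
    Countries with `min_commits` or fewer commits are dropped as too noisy.
    """
    ratios = {
        country: lines / commits
        for country, (commits, lines) in stats.items()
        if commits > min_commits
    }
    return sorted(ratios.items(), key=lambda kv: kv[1])
```

The threshold matters because a country with a dozen commits can post an enormous average off a single bulk import.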
What does this mean for security?
With the sheer volume of commits that maintainers wade through, there are almost certainly dormant or undiscovered backdoors. Let's imagine an attacker can insert their own malicious code into a foundational piece of software like the Linux Kernel. A breach in the security of that one repo quickly renders millions of machines vulnerable, with more becoming vulnerable the longer the flaw goes unnoticed. This class of attack is a supply-chain attack, where an attacker inserts a vulnerability into a software supply chain, rendering downstream users of the software vulnerable.
We saw an incident like this beginning in 2019, when the Russian Foreign Intelligence Service injected a Trojan into "Orion", a platform built by SolarWinds, an American software company. This Trojan was distributed to an estimated 18,000 customers, placing a backdoor into each Orion instance that only the Russians knew about. Those customers included federal agencies like the Departments of Energy, Justice, Treasury, Homeland Security, and Defense, as well as prominent technology suppliers to the government and private sector, including Microsoft, Intel, and Cisco. Effectively, the Russian Government was able to plant a foothold in the most critical networks in the public and private world: a national security nightmare.
The SolarWinds hack affected a closed-source enterprise product, so it only impacted enterprises. Let's look at the botched xz-utils hack, which would have had a massive impact if successful. xz-utils is an open source compression library packaged with most Linux installations, and a compromise in that software would lead to devastating consequences. Starting in 2021, a malicious GitHub account called JiaT75, or "Jia Tan," began committing to the xz-utils repo, building trust in the project. Russian-owned Kaspersky's post on the subject claims that the owner of "Jia Tan" used alternative accounts to file feature requests against the xz-utils repo and to complain about feature velocity on the project's mailing list, pressuring the existing maintainer into taking on a new team member: "Jia Tan." From there, "Jia Tan" added a backdoor to the compiled installation of version 5.6.0, released in February 2024, which would provide remote code execution to attackers, a worst-case scenario. Weirdly enough, the Russian Foreign Intelligence Service is the primary suspect for this event. Perhaps Kaspersky recognized that not publishing the work would be more suspicious.
On March 29th, 2024, the vulnerability was publicized on the CVE database and hit the front pages of Tech Twitter, Hacker News, and other nerd watering holes. The industry had seen supply chain attacks happen in widely used closed-source enterprise software, but the xz-utils hack could feasibly have given the Russian Government a backdoor into any Linux computer, from a hobbyist's laptop to a cloud provider's hypervisor. In fact, Kaspersky claims one of the alternative accounts used in the social engineering campaign had pressured a prominent Linux distribution to add the vulnerable software. A terrifying thought for any security professional. This hack required patience and complexity, and was almost flawlessly executed. Almost.
The open source ecosystem is a target for nations with advanced cyber capabilities. Open source is a resource that needs to be defended. No one is hawkish enough to want to exclude our neighbors in authoritarian countries, like the People's Republic of China and the Russian Federation, from contributing to the open source ecosystem. China and Russia are home to some of the world's most competent and educated developers, and their exclusion would only worsen the state of the open source ecosystem. Not only is there no legal mechanism to do so, but such exclusion is also antithetical to the values of the open source community. Besides, there would quickly be a parallel open source community shared within and between the excluded countries. If you're an open source maintainer, thank you for the work you do, and keep in mind that you are a target.
Next Steps
I'm going to dig into this data more. I'm interested in producing a "democracy" score for the open source ecosystem by looking at the share of code that comes from democratic countries, and how that has changed over time.
Music I listened to this week
All Day and All of the Night - The Kinks
Should I Stay Or Should I Go - Los Fabulosos Cadillacs
Solar Sailer - Daft Punk
Talk talk featuring troye sivan - Charli xcx