February 7, 2019 — Like millions of other programmers, every day I depend on central package repositories (CR) like npm, PyPI and CRAN.
The other day I was curious: does every programming language have one of these? I decided to find out. I put my crawler to work and trained a model to check for a package repository for each of the 3,006 languages I am tracking. The results surprised me.
My model found only 39 languages with central package repositories. (For comparison, Wikipedia lists ~20.) That's just ~1% of languages. I thought it would be higher.
If a programming language is popular enough to appear in my top 100 list, it is about 15 to 30x more likely to have a CR. Of the languages outside the top 100, fewer than 1% have a CR.
There are over 2,000,000 packages (aka modules or libraries) across these CRs. That means there are about 1,000x more packages than there are programming languages.
With that many packages, name collision is certainly a problem (maybe a subject for another post), though not as much of a problem as in the domain name system, where 130,000,000+ ".coms" alone are registered.
At over 900,000, JavaScript's npm has almost as many packages as all other CRs combined. JavaScript, Java, PHP, Perl, and Python account for about 80% of packages.
GitHub is large and growing into something of a universal (though totally unmoderated) central package repository, and many (if not the majority) of the packages in these CRs are also listed there. So it's conceivable that GitHub is the largest CR and that the number of packages out there is easily 10x bigger than 2M.
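As a quick sanity check on the ratios quoted so far, here is a minimal Python sketch that recomputes them from the figures in this post. The variable names are my own, and the "over" and "about" figures are treated as exact, so the results are rough lower bounds rather than precise measurements.

```python
# Figures quoted in the post. "Over 2,000,000" and "over 900,000" are
# treated as exact here, which is an assumption made for illustration.
languages_tracked = 3006
languages_with_cr = 39
total_packages = 2_000_000
npm_packages = 900_000

# Share of tracked languages that have a central package repository.
share_with_cr = languages_with_cr / languages_tracked

# Packages per tracked language (not per language with a CR).
packages_per_language = total_packages / languages_tracked

# npm's share of all packages across the CRs.
npm_share = npm_packages / total_packages

print(f"Languages with a CR: {share_with_cr:.1%}")                    # 1.3%
print(f"Packages per tracked language: {packages_per_language:.0f}")  # 665
print(f"npm share of all packages: {npm_share:.0%}")                  # 45%
```

With these inputs the package-to-language ratio comes out in the high hundreds, the same order of magnitude as the "about 1,000x" above, and npm lands a little under half of the total.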
This surprised me. The median age of a language with a CR is 24 years (created in 1995). Of the top 5 languages I mentioned above, all were created by then. The creation of the CR almost always follows the launch of the language, sometimes by months, sometimes by years. I expected most CRs to be from newer languages, but that wasn't the case. While some new languages like Rust and Julia have CRs, others like Go and Kotlin do not.
Here's my list of the main central package repositories for languages that have them. I cut the list a bit to only include CRs with more than 100 packages available. If you spot any omissions or mistakes, let me know on Twitter.
As Jay18001 pointed out, a few of these repositories serve packages for more than one language: CocoaPods also serves Objective-C, and NuGet also serves F# and other .NET languages. In this post I collapsed things so each repo is listed under only one language.
Update: 8/26/2019. Multiple readers pointed out that my stat for Ruby was off by 10x. I used the number I found on this page, which turned out to count just the gems beginning with the letter A. I apologize for the mistake and am very grateful for the corrections.
Thanks to PallHaraldsson for reviewing this post and encouraging it to get updated.