PyPI project backends - version 2


Backends declared in the pyproject.toml files

This dataset contains CSV and SQLite files with data about project build backends, extracted from "Metadata about every file uploaded to PyPI":

  • extract-pyproject-all-versions.csv, extract-pyproject-all-versions.db: for every project that has a pyproject.toml file and was uploaded after 2018, the project_name, max project_version, max uploaded_on, list of distinct project_version, list of distinct uploaded_on, list of distinct path, ...
  • extract-pyproject-latest.csv, extract-pyproject-latest.db: for each project found in extract-pyproject-all-versions, the data for the latest uploaded_on date (1)
  • pyproject_backends.csv, pyproject_backends.db: for each project, the build backend found in extract-pyproject-latest.db, considering only the pyproject.toml file at the root of the project (2)
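
As an illustration of how the SQLite files listed above could be explored, here is a minimal sketch that lists the tables of a database and counts projects per build backend. The table name `pyproject_backends` and the column name `backend` are assumptions, not taken from the dataset: check the real schema with `list_tables` before relying on them.

```python
import sqlite3
from collections import Counter


def list_tables(conn):
    """Return the names of the tables present in the SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def count_backends(conn, table="pyproject_backends", column="backend"):
    """Count projects per build backend.

    `table` and `column` are hypothetical names: adapt them to the
    schema actually found in pyproject_backends.db.
    """
    # Identifiers cannot be passed as bound parameters, so they are
    # inserted as double-quoted names in the statement.
    query = f'SELECT "{column}", COUNT(*) FROM "{table}" GROUP BY "{column}"'
    return Counter(dict(conn.execute(query).fetchall()))
```

With the file downloaded locally, `count_backends(sqlite3.connect("pyproject_backends.db")).most_common(10)` would give the ten most frequent backends, provided the assumed names match.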

Source code for the data extraction.

(1) Some projects (e.g. poetry) contain several pyproject.toml files, often in test folders
(2) The test is quite basic, but only a few projects have several pyproject.toml files matching it
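
To make note (2) concrete, the sketch below shows one possible "basic" root test based on path depth. It is not the test used for this dataset. It assumes that the `path` values look like `packages/<project>/<archive>/<top-level dir>/pyproject.toml`; both this layout and the `prefix_parts` parameter are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def is_root_pyproject(path, prefix_parts=3):
    """Return True if `path` points at a pyproject.toml at the project root.

    Assumed layout (hypothetical):
        packages/<project>/<archive>/<top-level dir>/pyproject.toml
    `prefix_parts` is the number of leading components before the
    top-level directory of the archive.
    """
    parts = PurePosixPath(path).parts
    # A root file has exactly: prefix + top-level directory + file name.
    return len(parts) == prefix_parts + 2 and parts[-1] == "pyproject.toml"
```

A test this simple can still match more than one file per project, for example when a project is published as several archives, which is consistent with the caveat in note (2).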

PyPI metadata further analysis

After the publication of the first charts, I wanted to complete the first statistics by finding out how many projects had no source package and how many had no pyproject.toml.

This dataset contains CSV and SQLite files extracted from the same source (parquet files from "Metadata about every file uploaded to PyPI"):

  • extract-project-releases-2018-and-later.csv, extract-project-releases-2018-and-later.db: the metadata of the projects uploaded to PyPI since 2018: project_name, project_version, project_release, release type (source or wheel), ...

These files weigh 1.1 GB and 1.3 GB respectively.
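
Given the size of these files, the question of how many projects have no source package can be answered by streaming the CSV rather than loading it whole. The sketch below is one way to do it. The column name `release_type` and the value `source` are assumptions (only `project_name` is named in the description), so adjust them to the actual header.

```python
import csv


def projects_without_source(rows, name_col="project_name",
                            type_col="release_type", source_value="source"):
    """Return the set of projects that have no source release.

    `rows` is any iterable of dicts, e.g. a csv.DictReader.
    `type_col` and `source_value` are hypothetical names.
    """
    seen = set()
    with_source = set()
    for row in rows:
        name = row[name_col]
        seen.add(name)
        if row[type_col] == source_value:
            with_source.add(name)
    # Projects seen at least once but never with a source release.
    return seen - with_source


def projects_without_source_in_file(csv_path, **kwargs):
    """Stream a CSV file row by row and apply projects_without_source."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return projects_without_source(csv.DictReader(handle), **kwargs)
```

Only two sets of project names are kept in memory, so the approach should remain practical for a file of about 1 GB.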

Source code for the second data extraction.

Download instructions

The dataset is anonymously accessible through HTTPS

Download from
Publication date
Françoise CONIL
Link to previous dataset version
Dataset size
2.8 GB, 8 files