PyPSA-Eur is compiled from a variety of data sources. The following table provides an overview of the data sources used in PyPSA-Eur. Different licenses apply to the data sources.

.. toctree::
   :maxdepth: 1

   ../data-base-network
   ../data-cutouts
   ../data-repos

Many of the data sources used in PyPSA-Eur are updated regularly.
To ensure reproducibility, PyPSA-Eur uses a versioning system for data sources which
allows users to select specific versions of the data sources to use in their models.
In addition to versioning, and where the license allows, most datasets are also mirrored to
public file storage at https://data.pypsa.org.

.. note::

   For users, the choice of data sources is controlled through the configuration file.
   See :ref:`data_cf` for details. In most cases you will want to stick with the latest archive
   version. Reproducibility is preserved even when using the ``latest`` tag, because
   ``versions.csv`` is itself version controlled.

The file data/versions.csv is the central registry for all data sources and their versions.
Each row defines a specific version of a dataset with the following columns:

- ``dataset``: The name of the dataset (e.g., ``worldbank_urban_population``).
- ``version``: The version identifier, typically following the original data source's versioning (e.g., ``2025-08-14``).
- ``source``: The source type: ``primary`` (original data source), ``archive`` (mirrored copy on ``data.pypsa.org``), or ``build`` (generated from other data).
- ``tags``: Space-separated tags like ``latest``, ``supported`` or ``deprecated``.
- ``added``: The date when this entry was added to the registry.
- ``note``: Optional notes about the dataset or version.
- ``url``: The download URL for the data.

Entries in ``versions.csv`` are never deleted. If a dataset has been removed or is no longer available, its entry is marked as ``deprecated`` instead.
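
To illustrate how the registry columns fit together, the following is a minimal sketch of reading registry rows and picking an entry by dataset, source and version. It is not the lookup code used by PyPSA-Eur: the helper names ``load_registry`` and ``select``, the dataset names and the ``example.org`` URLs are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical registry excerpt; the real rows live in data/versions.csv.
SAMPLE = """\
dataset,version,source,tags,added,note,url
example_dataset,2024-01-01,primary,supported,2024-01-05,,https://example.org/v2024.zip
example_dataset,2025-08-14,primary,latest supported,2025-08-20,,https://example.org/v2025.zip
example_dataset,2025-08-14,archive,latest supported,2025-08-21,,https://example.org/mirror/v2025.zip
old_dataset,2020-01-01,primary,deprecated,2020-02-01,no longer available,https://example.org/old.zip
"""


def load_registry(text):
    """Parse registry text into a list of dicts, splitting tags into a set."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["tags"] = set(row["tags"].split())  # tags are space-separated
    return rows


def select(rows, dataset, source, version="latest"):
    """Return the row for a dataset and source, by 'latest' tag or explicit version."""
    for row in rows:
        if row["dataset"] != dataset or row["source"] != source:
            continue
        if version == "latest" and "latest" in row["tags"]:
            return row
        if row["version"] == version:
            return row
    raise KeyError(f"no entry for {dataset!r} ({source}, {version})")


rows = load_registry(SAMPLE)
entry = select(rows, "example_dataset", "archive")
print(entry["version"], entry["url"])
```

Because the registry file is version controlled, resolving ``latest`` against a given commit of the file always yields the same row.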

.. note::

   For primary sources, each combination of dataset and version should point to a specific version of that dataset with a unique URL.
   If the original data source does not provide versioned URLs (i.e., the URL always points to the latest data), the version is set to ``unknown``.
   In this case, the corresponding archive entries do not mirror the same version but represent snapshots taken at specific points in time from that primary source.

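
The uniqueness rule for primary sources could be checked with a few lines of code. The sketch below is an illustration only: the function ``find_shared_urls`` is hypothetical and not part of PyPSA-Eur, and the rows are invented. It reports URLs that are shared by more than one versioned primary entry and skips entries whose version is ``unknown``.

```python
from collections import defaultdict


def find_shared_urls(rows):
    """Return URLs used by more than one versioned primary (dataset, version) pair."""
    seen = defaultdict(set)
    for row in rows:
        # Entries with version "unknown" point at a moving target and are skipped.
        if row["source"] != "primary" or row["version"] == "unknown":
            continue
        seen[row["url"]].add((row["dataset"], row["version"]))
    return {url: pairs for url, pairs in seen.items() if len(pairs) > 1}


# Invented rows: two versions of dataset "a" wrongly share one URL.
rows = [
    {"dataset": "a", "version": "1.0", "source": "primary", "url": "https://example.org/a-1.0.csv"},
    {"dataset": "a", "version": "2.0", "source": "primary", "url": "https://example.org/a-1.0.csv"},
    {"dataset": "b", "version": "unknown", "source": "primary", "url": "https://example.org/b.csv"},
]
shared = find_shared_urls(rows)
print(shared)
```
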
If you notice that a data source has been updated and want to add the new version to PyPSA-Eur:

- Add a new row to ``data/versions.csv`` with the same ``dataset`` name, the new ``version``, ``source`` set to ``primary``, and the ``url`` pointing to the original data source.
- Set appropriate tags (typically ``latest supported``).
- Update the tags of the previous version (remove ``latest``, keep ``supported`` if still compatible).
- Create a pull request with your changes.
- Of course, any potential workflow adjustments should be considered and implemented as well.
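
The tag bookkeeping in the steps above can be sketched as follows. This is an assumption-laden illustration, not a tool shipped with PyPSA-Eur: the function ``add_version`` is hypothetical, rows are plain dictionaries with space-separated ``tags`` strings, and the sketch assumes the previous version remains compatible, so ``supported`` is kept.

```python
def add_version(rows, dataset, version, url, added):
    """Append a new primary row and move the 'latest' tag onto it (edits rows in place)."""
    for row in rows:
        if row["dataset"] == dataset and row["source"] == "primary":
            # Drop 'latest' from earlier versions, keep all other tags.
            tags = row["tags"].split()
            row["tags"] = " ".join(t for t in tags if t != "latest")
    rows.append({
        "dataset": dataset, "version": version, "source": "primary",
        "tags": "latest supported", "added": added, "note": "", "url": url,
    })
    return rows


# Invented example rows and URLs.
rows = [
    {"dataset": "example_dataset", "version": "2024-01-01", "source": "primary",
     "tags": "latest supported", "added": "2024-01-05", "note": "",
     "url": "https://example.org/v2024.zip"},
]
add_version(rows, "example_dataset", "2025-08-14",
            "https://example.org/v2025.zip", "2025-08-20")
for row in rows:
    print(row["version"], "->", row["tags"])
```

In practice the same edit is made by hand in ``data/versions.csv`` and submitted as a pull request.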

.. note::

   If the primary source has version set to ``unknown`` (i.e., the URL always points to the latest data) and a new version is available that has not been archived yet, please open an issue on the PyPSA-Eur GitHub repository to request an archive update.

To add a completely new data source to PyPSA-Eur:

- Add a ``primary`` entry to ``data/versions.csv`` with a new unique dataset name, version, and URL pointing to the original data source.
- Implement a ``retrieve`` rule for your dataset in ``rules/retrieve.smk``. Take inspiration from existing rules in the file.
- Add the new data source to:

  - ``data`` section in the pydantic schema ``scripts/lib/validation/config/data.py``
  - ``data_inventory.csv`` data inventory for PyPSA-Eur

- Create a pull request with your changes.

.. note::

   Maintainers of the repository will create the corresponding archive entry after reviewing your contribution.