Requirements for new repositories

Note that these requirements have evolved along with Repology, and some repositories already supported by Repology may not comply with them. This is not an excuse for new repositories not to comply, and noncompliance may be grounds for delisting or pessimization of existing repositories.

Rationale

As time passes, Repology requires more and more time to maintain: keeping an increasing number of supported repositories updating successfully, ensuring that an increasing number of packages are properly grouped into corresponding projects, and preventing an increasing number of these projects from being affected by incorrect data from some repositories.

To keep providing reliable information on packaging statuses and latest versions, and to free our resources for improving Repology instead of processing incorrect data reports and adding rules, we are gradually raising the requirements for repositories to provide consistent data in an easy-to-process way.

Apart from keeping Repology up and running, we strive to unite packaging communities, and these requirements serve this purpose no less: consistent package data allows more unified tools to appear.

Data format

We expect package data to be in a machine-readable format which does not require complex parsing code, let alone execution of third-party code.

Acceptable formats:

  • Commonly used data interchange formats, such as JSON (preferred), XML, YAML, Protobuf, CSV/DSV.
  • Plain text key/value-like formats such as Debian Sources.xz.

Formats which are not acceptable include raw shell (PKGBUILDs, ebuilds) or build system (CMake) scripts. For instance, these may contain variable substitutions which require full-fledged execution to be parsed reliably. HTML is also not acceptable, as it's intended for humans and is prone to layout changes which break parsing.
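As an illustration of what "does not require complex parsing code" can look like, here is a minimal sketch that reads a JSON dump with nothing but the standard library. The dump layout and the field names (name, version, homepage) are assumptions made for this example, not a schema defined by Repology.

```python
import json

# Hypothetical package dump; the field names and values are illustrative only.
SAMPLE_DUMP = """
[
  {"name": "firefox", "version": "115.0", "homepage": "https://www.mozilla.org/firefox/"},
  {"name": "libogg", "version": "1.3.5", "homepage": "https://xiph.org/ogg/"}
]
"""


def load_packages(text):
    """Parse the whole dump in one call; no third-party code is executed."""
    return [(pkg["name"], pkg["version"]) for pkg in json.loads(text)]


packages = load_packages(SAMPLE_DUMP)
```

A shell or build system script, by contrast, would have to be executed (or re-implemented) to resolve its variables before the same two fields could be read.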

Availability

We expect to be able to fetch package data in a fast, easy, reliable, and consistent way. A single file is preferred; a tarball or git repository with a bunch of small files is acceptable, but it must not contain any weighty unusable data such as software sources.

An approach where Repology has to make an HTTP request for each package, or paginate through an API, is not acceptable. It involves a lot of HTTP requests, which implies slow fetching, multiplicatively increases the chance of an unsuccessful fetch, and provides inconsistent data from different points in time.
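The multiplicative effect can be made concrete with a small calculation. This is a back-of-the-envelope sketch which assumes that requests fail independently and with the same probability; the 99.9% per-request success rate and the 10000-package repository are made-up numbers.

```python
def full_fetch_success_probability(per_request_success, request_count):
    """Probability that every request of a fetch succeeds, assuming independence."""
    return per_request_success ** request_count


# A single-file fetch needs one request.
single_file = full_fetch_success_probability(0.999, 1)

# A per-package fetch of a hypothetical 10000-package repository needs 10000.
per_package = full_fetch_success_probability(0.999, 10000)
```

Under these assumptions the single-file fetch keeps its 99.9% success rate, while the per-package fetch almost never completes without at least one failed request.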

Sources which are not publicly available, such as those which require registration or private access tokens, are absolutely not supported.

Completeness

We require the following package information to be available:

  • Package name in a compatible format (see below).
  • Package version (although an exception may be made for repositories providing all packages from the VCS master branch).
  • Some kind of upstream URL such as homepage or download. We rely on that to split unrelated, but similarly named projects.
  • Package recipe URL (e.g. a valid link to Makefile/.spec/PKGBUILD/ebuild/...). This is required for anyone to be able to check where package data comes from and verify its correctness. It does not need to be explicitly provided in the data if it can be constructed from other data fields (for example, by filling in the package name and version in a link template such as http://example.com/specs/{name}/{version}.spec).
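Constructing a recipe URL from other fields can be as simple as the following sketch. It uses the example template from the list above; the package dictionary and its keys are assumptions made for illustration.

```python
from urllib.parse import quote

# The example template from the text; a real repository would supply its own.
RECIPE_URL_TEMPLATE = "http://example.com/specs/{name}/{version}.spec"


def recipe_url(package):
    """Fill the template from package fields, percent-encoding each value."""
    return RECIPE_URL_TEMPLATE.format(
        name=quote(package["name"], safe=""),
        version=quote(package["version"], safe=""),
    )


url = recipe_url({"name": "libogg", "version": "1.3.5"})
```

The same approach applies to any other per-package link which follows a fixed pattern.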

The following information is optional, but desirable:

  • Maintainer(s), if applicable (used in maintainer search, to generate per-maintainer statistics and feeds, and in project filtering). Note that it's also possible to configure a default maintainer for a repository.
  • One-line summary (shown on project information page). Multiline descriptions are currently not supported.
  • License (shown on project information page).
  • CPE information (used to report bad or missing CPE information back to repositories).
  • Homepage and download URLs (used to match related projects, shown on project information page, broken links are reported back to repositories).
  • Categories or tags (used in project filtering).
  • Alternative package names or identifiers (used for various purposes such as tracking packages across renames and creating human readable project names). In particular, the list of binary package names for a source package.

The following links are also very desirable:

  • Links to patches.
  • Links to package build logs and build status pages.
  • Links to bug tracker issues for a package.
  • Links to package related statistics (such as Debian Popularity Contest).
  • Links to package documentation (such as related wiki pages).

These are currently only shown on project information pages, but wider support is planned, e.g. providing dedicated pages with all build logs, statuses, patches or issues for upstream's convenience. As with links to recipes, there's no need to explicitly provide these URLs if they can be constructed from other package data.

The following information is not currently used, but will be in the future:

  • Architecture.
  • Dependencies.

Consistency and quality

We need data (mainly names and versions) to be in a compatible form in order to be able to match packages and compare versions from different repositories.

Requirements on package names:

  • Should be short project names as used in URLs, distfiles, repository and obviously package names, such as firefox, clementine, or gnome-games. It should not be some obscure (org.gnome.games) or human readable (Firefox Web Browser) custom format.
  • If a repository commonly provides multiple packages for a single project (for example, there may be packages named libogg0, libogg-dev, libogg-dbg, libogg-doc for the libogg project), a common name (libogg in this case) should be available for all the packages. Some repositories may call it basename or source package name.
  • Likewise, if a repository commonly uses prefixes or suffixes for package names (such as -git or -devel when packaging development versions, or -compat for legacy versions), it should be easy to strip these prefixes or suffixes off.
  • If a repository packages programming language (Perl/Python/Ruby/PHP/Node.js/Haskell/R/Rust etc.) modules, these should be appropriately and consistently prefixed (suffixed) to distinguish them from each other and from other projects both within a single repository and across different repositories. Packages are also expected to be named exactly the same way they are in official module repositories (such as Rubygems or PyPI).

    For example, Python modules may have a python-<PyPI name>, py39-<PyPI name> or lib<PyPI name>-python naming pattern, and a module named python-twitter may have to be packaged as e.g. python-python-twitter (otherwise it will clash with python-twitter, a package for the twitter module).

For repositories failing to comply with these requirements, Repology may be unable to merge some packages into their designated projects (which, in turn, prevents it from reporting new versions and vulnerabilities), or, which is worse, may merge packages into unrelated projects.
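The naming requirements above can be sketched as follows. This is an illustration only: the suffix list, the function names and the python- prefix are assumptions, and this is not how Repology itself processes names.

```python
# Suffixes a hypothetical repository uses for development and legacy packages.
STRIPPABLE_SUFFIXES = ("-git", "-devel", "-compat")


def project_name(package_name, basename=None):
    """Prefer an explicit basename; otherwise strip one known suffix."""
    if basename:
        return basename
    for suffix in STRIPPABLE_SUFFIXES:
        if package_name.endswith(suffix):
            return package_name[: -len(suffix)]
    return package_name


def python_module_package_name(pypi_name, prefix="python-"):
    """Always add the prefix, even if the PyPI name already starts with it."""
    return prefix + pypi_name


names = [
    project_name("libogg-dev", basename="libogg"),
    project_name("wine-git"),
    python_module_package_name("twitter"),
    python_module_package_name("python-twitter"),
]
```

Adding the prefix unconditionally is what keeps the twitter and python-twitter modules apart: they become python-twitter and python-python-twitter respectively.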

Requirements on package versions:

  • No trimming of version components is allowed. E.g. 1.2.3alpha4 must not be shortened to 1.2.3.
  • No incompatible changes to version scheme. E.g. 1.2.3alpha4, 2.04, 1.4.0-rc5, 1.20.30 must not be conveyed as e.g. 1.2.3.a.4, 2.04000000, 1.4.0.5, and 1.2030 respectively. Still, some modifications are allowed, for instance it's OK to change version component separators (1-2_3 equals 1.2.3) or trim trailing zero components (1.2 equals 1.2.0). See libversion documentation for details.
  • It should be possible to strip repository-specific extra version components (such as package epoch and revisions). For instance, 1:2.3.4_5 is OK as long as a colon separates an epoch and an underscore separates a revision, so these can be stripped to get upstream version 2.3.4, while with something like 2.3.4+dfsg1~alpha1+1-2.3 it would be impossible and Repology will have to ignore such a version.
  • No unrelated appendages, such as the version or name of another product (zfs 0.7.12-4.18.20, where 4.18.20 is the version of the kernel, not related to the version of zfs), or a branch name (wine 3.14-staging).
  • Obviously, no fake versions, i.e. versions which were not officially released by upstream. Note that a mere mention of a "next" version by upstream (in a changelog or build system script) does not make it official. A git tag or a release announcement does.
  • Snapshot versions are generally allowed, but impose additional requirements:

    • Must have a consistent scheme across a repository (so we can reliably match them with a single pattern and process them specially).
    • Must be distinguishable from official versions (so no official versions are falsely matched by the named pattern).
    • Must use a version relative to the latest official release, that is, if the latest upstream release was 1.2, a snapshot may have a version like 1.2.20210101, but not 1.3.20210101 (based on a non-existent version) or 20210101 (incompatible scheme).

    Snapshots cannot be compared with each other meaningfully as there's no common compatible version format for them (but we'd like repositories to adopt one), but complying with these requirements allows Repology to handle them gracefully, which includes:

    • Not treating them as fake versions.
    • Not incorrectly considering them outdated.
    • Being able to report when a snapshot is outdated by a new official release.
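The snapshot requirements above can be sketched with a single pattern. The <release>.<YYYYMMDD> scheme, the pattern and the result labels are assumptions chosen to match the 1.2.20210101 example; a repository may use any scheme as long as it is consistent and distinguishable from official versions.

```python
import re

# Hypothetical repository-wide snapshot scheme: <latest release>.<YYYYMMDD>.
SNAPSHOT_RE = re.compile(r"^(?P<base>\d+(?:\.\d+)*)\.(?P<date>\d{8})$")


def classify(version, latest_release):
    """Label a version relative to the latest official upstream release."""
    match = SNAPSHOT_RE.match(version)
    if match is None:
        return "not a snapshot"
    if match.group("base") == latest_release:
        return "snapshot"
    return "snapshot with a different base"


results = [
    classify("1.2", "1.2"),
    classify("1.2.20210101", "1.2"),
    classify("1.3.20210101", "1.2"),
    classify("20210101", "1.2"),
]
```

Note that a bare 20210101 is not recognized as a snapshot at all here, so it would be treated as an ordinary (and very large) version, which is why such a scheme is incompatible.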

Repositories failing to comply with these requirements may have package statuses reported incorrectly or, which is worse, may make Repology report incorrect statuses to other repositories.
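The epoch and revision stripping and the allowed separator changes described in the version requirements can be sketched as follows. The [epoch:]upstream[_revision] scheme is the one from the 1:2.3.4_5 example; the function names are hypothetical, and numeric_key is a deliberate simplification which only handles purely numeric versions, not the libversion algorithm.

```python
import re


def strip_repository_parts(version):
    """Drop a leading 'epoch:' and a trailing '_revision', if present."""
    if ":" in version:
        version = version.split(":", 1)[1]
    if "_" in version:
        version = version.rsplit("_", 1)[0]
    return version


def numeric_key(version):
    """Comparison key ignoring separator choice and trailing zero components.

    Raises ValueError for non-numeric components such as 'alpha4'.
    """
    parts = [int(part) for part in re.split(r"[._-]", version)]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


upstream = strip_repository_parts("1:2.3.4_5")
same_separators = numeric_key("1-2_3") == numeric_key("1.2.3")
same_trailing_zero = numeric_key("1.2") == numeric_key("1.2.0")
```

In a repository where the underscore marks a revision, stripping has to happen before any such comparison; for a version like 2.3.4+dfsg1~alpha1+1-2.3 no equally simple rule recovers the upstream part.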