Repology bots

You're likely seeing this because your site has been visited by one of Repology's robots.

What's Repology?

Repology is a free and open source service which monitors a huge number of package repositories, comparing versions of packaged software across them and gathering other information on free and open source projects which may be useful to the F/OSS community.

What robots are used by Repology?

repology-fetcher

Identifies itself as repology-fetcher/0 (+https://repology.org/docs/bots)

This process regularly retrieves information from software repositories. The preferred way is to fetch a single file which describes all available packages, but for repositories which don't support this, the robot may iterate over some web API. The robot visits a site once per update cycle (currently ~2-3 hours) and fetches the files it needs sequentially (i.e. it never issues parallel requests).
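
For illustration, the fetcher's access pattern roughly corresponds to the following Python sketch. The URLs here are hypothetical, and the real fetcher (linked below) is considerably more involved:

    import urllib.request

    # Hypothetical per-repository index files; the real fetcher derives
    # these from its repository configuration.
    INDEX_URLS = [
        "https://example-repo.org/packages.json",
        "https://another-repo.example/api/packages?page=1",
    ]

    # The bot identifies itself with this User-Agent on every request
    USER_AGENT = "repology-fetcher/0 (+https://repology.org/docs/bots)"

    def fetch_all():
        # Requests are issued one after another, never in parallel
        for url in INDEX_URLS:
            req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
            with urllib.request.urlopen(req) as resp:
                data = resp.read()
            print(f"fetched {url}: {len(data)} bytes")

    if __name__ == "__main__":
        fetch_all()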

You may find metadata on which repositories are fetched here and the fetcher code here.

If you think the robot creates excess load on your site, feel free to file an issue on GitHub. If Repology gets information on your repository through a web API, we'd also greatly appreciate it if you could provide a regular dump of package information in a machine-readable format (preferably JSON); the data used by Repology includes package name, version, one-line summary, list of maintainers, list of categories/tags, homepage and download URLs, and license information. This would allow more frequent updates with less load on the repository side, a faster update process, simpler parsing code, and probably more useful data for Repology.
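
As a purely illustrative sketch, one entry of such a dump could look like the following (shown here via Python; all field names and values are hypothetical, since Repology does not prescribe a fixed schema):

    import json

    # Hypothetical package entry covering the data Repology uses;
    # the field names are illustrative, not a required schema.
    package = {
        "name": "example-package",
        "version": "1.2.3",
        "summary": "One-line description of the package",
        "maintainers": ["maintainer@example.org"],
        "categories": ["utils", "networking"],
        "homepage": "https://example.org/example-package",
        "downloads": ["https://example.org/dist/example-package-1.2.3.tar.gz"],
        "license": "MIT",
    }

    # A full dump would be a list of such entries
    print(json.dumps([package], indent=2))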

repology-linkchecker

Identifies itself as repology-linkchecker/1 (+https://repology.org/docs/bots)

This process pokes links retrieved from package metadata to check that they are alive. Dead links and links which involve redirects are reported to package maintainers so that the package metadata can be updated accordingly. If this robot visits your site, it means your site is mentioned in some package's metadata.

The process visits each link once a week. It issues a HEAD request first and falls back to a GET request only if that fails. This means that in most cases the robot won't retrieve the contents of a URL at all, using only a marginal amount of web traffic. Also, there's a delay of 3 seconds between consecutive requests to a single hostname, to ensure no excess load is generated on the site.
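
This behaviour can be sketched in Python as follows. It is a simplified illustration, not the actual implementation; in particular, urllib follows redirects transparently, so redirect detection is omitted here:

    import time
    import urllib.error
    import urllib.request
    from urllib.parse import urlparse

    USER_AGENT = "repology-linkchecker/1 (+https://repology.org/docs/bots)"
    HOST_DELAY = 3  # seconds between consecutive requests to one hostname

    def probe(url, method):
        # Returns the final HTTP status, or None on a connection failure
        req = urllib.request.Request(
            url, method=method, headers={"User-Agent": USER_AGENT}
        )
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                return resp.status
        except urllib.error.HTTPError as e:
            return e.code
        except urllib.error.URLError:
            return None

    def throttle(last_request, host):
        # Keep at least HOST_DELAY seconds between requests to one hostname
        elapsed = time.monotonic() - last_request.get(host, float("-inf"))
        if elapsed < HOST_DELAY:
            time.sleep(HOST_DELAY - elapsed)

    def check_links(urls):
        last_request = {}  # hostname -> time of the last request to it
        results = {}
        for url in urls:
            host = urlparse(url).hostname
            throttle(last_request, host)
            status = probe(url, "HEAD")  # cheap HEAD request first
            last_request[host] = time.monotonic()
            if status is None or status >= 400:
                # Fall back to GET only when HEAD fails
                throttle(last_request, host)
                status = probe(url, "GET")
                last_request[host] = time.monotonic()
            results[url] = status
        return results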

You may see the link checker source code here.

repology-vulnupdater

Identifies itself as repology-vulnupdater/1 (+https://repology.org/docs/bots)

This process keeps the information on software security vulnerabilities in Repology up to date by periodically fetching NVD JSON feeds. It issues a GET request to each feed every 10 minutes; the ETag and If-None-Match HTTP headers are used to avoid refetching files which have not changed since the last request.
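
A minimal sketch of this conditional-request pattern, assuming only Python's standard library (the real updater, linked below, differs in the details):

    import urllib.error
    import urllib.request

    USER_AGENT = "repology-vulnupdater/1 (+https://repology.org/docs/bots)"

    def fetch_if_changed(url, etag=None):
        # Returns (new_etag, body), or (etag, None) if the feed is unchanged
        headers = {"User-Agent": USER_AGENT}
        if etag:
            # Ask the server to send the body only if its ETag has changed
            headers["If-None-Match"] = etag
        req = urllib.request.Request(url, headers=headers)
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.headers.get("ETag"), resp.read()
        except urllib.error.HTTPError as e:
            if e.code == 304:  # Not Modified: no body is transferred
                return etag, None
            raise

The ETag saved from the previous response is sent back as If-None-Match on the next poll; a 304 Not Modified reply means the feed body is not transferred again.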

You may see the vuln updater source code here.

robots.txt policy

Please note that none of our robots is a crawler. Unlike most search engine robots, which try to gather all available URLs from a website and may therefore need to be restricted through a robots.txt file, Repology only interacts with a small, fixed set of man-made links and needs unconditional access to them to perform its tasks (e.g. retrieving repository information and checking link availability). Therefore, none of Repology's robots respects robots.txt.