robots.txt

Tue, 03 Mar 2026 00:00:00 +0000

robots.txt is a plain-text file placed at the root of a website (e.g., https://example.com/robots.txt) that tells web crawlers which parts of the site they may or may not access. It follows the Robots Exclusion Protocol, first proposed by Martijn Koster in 1994 and published by the IETF as RFC 9309 in 2022.

How it works

A robots.txt file consists of one or more records, each specifying a user-agent (the crawler’s identifier) and a set of Allow and Disallow directives. Crawlers are expected to fetch this file before crawling any other page and to respect its directives. Compliance is voluntary, however: robots.txt is a convention, not an access control mechanism.
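
As a rough illustration of how a well-behaved crawler might consult these records, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module and asks whether particular URLs may be fetched. The file contents, the crawler names (SomeCrawler, ExampleBot), and the paths are all hypothetical, chosen only for this example. Note also that Python's parser applies the first matching rule within a record, which can differ from the longest-match rule in RFC 9309 when Allow and Disallow rules overlap; the rules here are ordered so that both approaches agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-note.html
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler with no record of its own falls back to the "*" record.
allowed_home = parser.can_fetch("SomeCrawler", "https://example.com/index.html")
allowed_private = parser.can_fetch("SomeCrawler", "https://example.com/private/notes.html")
allowed_note = parser.can_fetch("SomeCrawler", "https://example.com/private/public-note.html")

# ExampleBot has its own record, which disallows the whole site.
allowed_bot = parser.can_fetch("ExampleBot", "https://example.com/index.html")

print(allowed_home, allowed_private, allowed_note, allowed_bot)
```

Nothing in this check is enforced by the server: a crawler that never calls can_fetch, or ignores its answer, can still request the disallowed URLs, which is why robots.txt should not be relied on to protect sensitive content.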

Crawling on emsenn.net
