Registration information. An analysis of WHOIS and RDAP consistency
Public registration information on domain names, such as the accredited registrar, the domain name expiration date, or the abuse contact, is crucial for many security tasks, from automated abuse notifications to botnet or phishing detection and classification systems. Domain registration data is usually accessible through the WHOIS or RDAP protocols. A priori, both provide the same data, but they use distinct formats and communication protocols: WHOIS aims to provide human-readable data, whereas RDAP uses a machine-readable format. Deciding which protocol to use is therefore generally considered a straightforward technical choice that depends on the use case and on the required level of automation and security.
However, WHOIS and RDAP records are spread across multiple servers through a referral system, and are sometimes managed by different entities. Moreover, neither protocol guarantees that the values of a given field are identical across servers.
To challenge the assumption that registration data is coherent across servers and protocols, and to characterise the inconsistencies where it is not, we collected, processed, and compared 164 million WHOIS and RDAP records for a sample of 55 million domain names. We also collected 360k DNS entries to check, in specific cases, which source holds the correct value. This dataset contains the parsed WHOIS and RDAP records, in a format that makes inconsistencies easy to detect, together with the DNS records.
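
As an illustration of what detecting an inconsistency can look like, the following minimal sketch compares the parsed records of a single domain across sources. The field names (`registrar`, `expiration_date`, `abuse_email`), the source labels, and the normalisation rules are assumptions made for this example; they are not the dataset's actual schema or the comparison method used in the study.

```python
from datetime import datetime, timezone


def normalize(field, value):
    """Reduce a raw field value to a comparable form (hypothetical rules)."""
    value = value.strip()
    if field == "expiration_date":
        # Parse ISO 8601 timestamps and compare at day granularity in UTC.
        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assume UTC when unspecified
        return dt.astimezone(timezone.utc).date().isoformat()
    # Other fields: ignore case differences only.
    return value.casefold()


def find_inconsistencies(records):
    """Return the fields whose normalized values differ between sources.

    `records` maps a source label to a dict of field name -> raw value.
    Fields missing from a source are skipped rather than counted as a mismatch.
    """
    fields = set().union(*(rec.keys() for rec in records.values()))
    inconsistent = {}
    for field in sorted(fields):
        values = {
            source: normalize(field, rec[field])
            for source, rec in records.items()
            if rec.get(field) is not None
        }
        if len(set(values.values())) > 1:
            inconsistent[field] = values
    return inconsistent


# Hypothetical parsed records for one domain, from two servers.
records = {
    "whois_registry": {
        "registrar": "Example Registrar, Inc.",
        "expiration_date": "2025-03-01T00:00:00Z",
        "abuse_email": "abuse@example-registrar.test",
    },
    "rdap_registrar": {
        "registrar": "example registrar, inc.",
        "expiration_date": "2026-03-01T00:00:00Z",
        "abuse_email": "abuse@example-registrar.test",
    },
}

print(find_inconsistencies(records))
```

In this example only the expiration date is reported: the registrar names differ in letter case alone and are treated as equal. How aggressively to normalise values before comparing them is a design choice that directly affects how many inconsistencies are counted.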