Friday, June 20, 2025
All the Bits Fit to Print
System for detecting dead websites and ownership changes improves crawler efficiency and data quality
Marginalia Search has implemented a new system to detect when websites are offline or have undergone significant changes, such as ownership transfers or domain parking, using minimal server requests to avoid burdening web servers. This system relies mainly on HTTP HEAD requests and DNS queries to gather data on site availability and changes, improving the quality of search results and crawler efficiency.
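As a rough illustration of the HEAD-plus-DNS probing described above, here is a minimal Python sketch using only the standard library. It is not Marginalia Search's implementation: the `Probe` record, the `classify` labels, and the `dns_changed` heuristic are all hypothetical names and rules chosen for this example.

```python
import http.client
import socket
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Probe:
    """One lightweight observation of a domain (hypothetical record type)."""
    dns_addresses: Tuple[str, ...]  # sorted IPs returned by the resolver
    status: Optional[int]           # HTTP status of the HEAD request, if any
    error: Optional[str]            # exception class name when the request failed


def resolve(host: str) -> Tuple[str, ...]:
    """Return the sorted set of addresses for host, or () if DNS lookup fails."""
    try:
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return ()
    return tuple(sorted({info[4][0] for info in infos}))


def head(host: str, timeout: float = 5.0) -> Tuple[Optional[int], Optional[str]]:
    """Send a single HEAD request for '/' and return (status, error)."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().status, None
    except (OSError, http.client.HTTPException) as exc:
        return None, type(exc).__name__
    finally:
        conn.close()


def probe(host: str) -> Probe:
    """Resolve first; skip the HTTP request entirely when DNS has no answer."""
    addresses = resolve(host)
    if not addresses:
        return Probe((), None, "dns")
    status, error = head(host)
    return Probe(addresses, status, error)


def classify(p: Probe) -> str:
    """Map a probe to a coarse availability label (labels are assumptions)."""
    if not p.dns_addresses:
        return "dns_failure"
    if p.status is None:
        return "connection_error"
    return "ok" if p.status < 400 else "http_error"


def dns_changed(old: Probe, new: Probe) -> bool:
    """Assumed heuristic: flag a domain for closer inspection when both probes
    resolved but share no address. This hints at a move or ownership change;
    it does not prove one."""
    if not old.dns_addresses or not new.dns_addresses:
        return False
    return set(old.dns_addresses).isdisjoint(new.dns_addresses)
```

Calling `classify(probe("example.com"))` costs at most one DNS lookup and one HEAD request per domain, which is the property the summary emphasizes. A real system would also need to handle redirects, plain-HTTP sites, and rate limiting, all of which this sketch leaves out.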
Why it matters: Detecting dead or changed websites prevents serving broken links and helps decide when to recrawl or archive domains.
The big picture: The web’s complexity and inconsistent standards make reliable uptime and change detection difficult but crucial for search engines and crawlers.
Stunning stat: A check of over 1 million domains in 8 hours returned 777,062 successes and tens of thousands of connection errors of various kinds.
Commenters say: Users appreciate the nuanced approach and acknowledge how hard the problem is, but point to edge cases such as domain re-registration and geopolitical effects, and suggest integrating archived web content such as the Wayback Machine.