How the Crawler Works

February 19, 2026

Under the hood, rails2static uses a breadth-first crawler to discover and render every page in your app. Understanding how it works helps you get the most out of it.

The process starts with entry paths — by default, just /. The crawler fetches each entry path using Rack::Test, which calls your Rails app directly through the middleware stack, without an HTTP server. That makes the crawl fast, and there's no separate server process to start or manage.
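
An in-process fetch looks roughly like this. It's a minimal sketch using the standard Rack::Test mixin, not rails2static's actual internals:

```ruby
require "rack/test"

class PageFetcher
  include Rack::Test::Methods

  # Rack::Test drives this app directly, straight through the
  # Rails middleware stack; no sockets, no server process.
  def app
    Rails.application
  end
end

fetcher = PageFetcher.new
fetcher.get("/")
html = fetcher.last_response.body if fetcher.last_response.ok?
```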

For each HTML page it fetches, the crawler parses the response with Nokogiri and extracts every internal URL it finds in a[href], link[href], script[src], and img[src] attributes, plus srcset candidates. The discovered URLs are added to a queue.
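
A sketch of that extraction step, assuming an extract_urls helper name that isn't part of rails2static's API:

```ruby
require "nokogiri"

def extract_urls(html)
  doc = Nokogiri::HTML(html)
  urls = doc.css("a[href], link[href]").map { |node| node["href"] }
  urls += doc.css("script[src], img[src]").map { |node| node["src"] }
  # srcset holds comma-separated "URL [descriptor]" candidates;
  # keep just the URL part of each.
  doc.css("[srcset]").each do |node|
    urls += node["srcset"].split(",").map { |c| c.strip.split(/\s+/).first }
  end
  # Keep only internal, same-site paths.
  urls.compact.select { |url| url.start_with?("/") }
end
```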

The crawler continues until the queue is empty or it hits the max_pages safety limit (default: 10,000). It tracks visited URLs to avoid cycles and skips anything matching your exclude_patterns.
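
Putting those pieces together, the crawl loop is a standard breadth-first traversal. This sketch reuses the hypothetical fetch and extract_urls helpers from above; the real implementation will differ in detail:

```ruby
require "set"

def crawl(entry_paths, max_pages: 10_000, exclude_patterns: [])
  queue   = entry_paths.dup   # FIFO queue => breadth-first order
  visited = Set.new

  until queue.empty? || visited.size >= max_pages
    path = queue.shift
    next if visited.include?(path)                                 # avoid cycles
    next if exclude_patterns.any? { |pattern| pattern.match?(path) } # skip excluded

    visited << path
    html = fetch(path)                # e.g. via Rack::Test, as above
    queue.concat(extract_urls(html))  # enqueue newly discovered paths
  end

  visited
end
```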

After crawling, the link rewriter adjusts href attributes so they resolve correctly when served as static files. If trailing_slash mode is enabled (the default), /about is written as /about/index.html and links point to /about/ — a form most static hosts serve correctly.
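
The mapping itself is simple. Here's a hypothetical output_path helper, not rails2static's own, showing the trailing_slash behavior described above:

```ruby
def output_path(url_path, trailing_slash: true)
  return "index.html" if url_path == "/"

  if trailing_slash && File.extname(url_path).empty?
    # Extensionless pages become directories with an index.html inside.
    File.join(url_path.delete_prefix("/"), "index.html")
  else
    # Paths with an extension (assets, feeds) pass through unchanged.
    url_path.delete_prefix("/")
  end
end

output_path("/about")       # => "about/index.html"
output_path("/styles.css")  # => "styles.css"
```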

Finally, the asset collector fetches all CSS, JavaScript, images, and fonts referenced in your pages. It even parses CSS files for url() references and @import statements, fetching those recursively.
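
Extracting those references can be approximated with two regular expressions. This is a simplified sketch; real CSS has edge cases (escaped quotes, data: URIs) that a production scanner must handle:

```ruby
def css_references(css)
  # url(...) with optional single or double quotes around the path.
  urls = css.scan(/url\(\s*['"]?([^'")]+)['"]?\s*\)/).flatten
  # @import "path" or @import 'path'.
  urls += css.scan(/@import\s+['"]([^'"]+)['"]/).flatten
  urls.uniq
end

css_references("body { background: url('/bg.png'); } @import '/base.css';")
# => ["/bg.png", "/base.css"]
```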

The result is a self-contained _site/ directory with everything needed to serve your site from any static host.
