Building a crawler in Rust: Scraping Javascript Single Page Applications (SPA) with a headless browser


Nowadays, more and more websites generate parts of their pages client-side, using JavaScript. To scrape this data, we need a headless browser: a browser that can be driven remotely and programmatically.

This post is an excerpt from my book Black Hat Rust

For that, we will use chromedriver.

On a Debian-style machine, it can be installed with:

$ sudo apt install chromium-browser chromium-chromedriver

Because the headless browser client methods require a mutable reference (&mut self), we need to wrap it with a mutex to be able to use it safely in our pool of scrapers.
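The pattern can be illustrated with a minimal, self-contained sketch (the names `FakeClient` and `Spider` are hypothetical stand-ins, using a blocking `std` mutex instead of the async one the spider uses): the mutex hands out a mutable guard at runtime, so a shared `&self` is enough to call `&mut self` methods.

```rust
use std::sync::Mutex;

// Stand-in for the WebDriver client: its methods require &mut self.
struct FakeClient {
    pages_fetched: u32,
}

impl FakeClient {
    fn goto(&mut self, _url: &str) {
        self.pages_fetched += 1;
    }
}

// The spider only needs a shared reference to itself, thanks to the Mutex.
struct Spider {
    client: Mutex<FakeClient>,
}

fn main() {
    let spider = Spider {
        client: Mutex::new(FakeClient { pages_fetched: 0 }),
    };
    // Locking yields a mutable guard, even through &spider.
    spider.client.lock().unwrap().goto("https://example.com");
    println!("{}", spider.client.lock().unwrap().pages_fetched);
}
```

In the real spider, the mutex is `tokio::sync::Mutex`, whose `lock()` is awaited instead of blocking the thread.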


impl QuotesSpider {
    pub async fn new() -> Result<Self, Error> {
        let mut caps = serde_json::map::Map::new();
        let chrome_opts = serde_json::json!({ "args": ["--headless", "--disable-gpu"] });
        caps.insert("goog:chromeOptions".to_string(), chrome_opts);
        let webdriver_client = ClientBuilder::rustls()
            .capabilities(caps)
            .connect("http://localhost:4444")
            .await?;

        Ok(QuotesSpider {
            webdriver_client: Mutex::new(webdriver_client),
        })
    }
}

Fetching a web page with our headless browser can be achieved in two steps:

  • first, we go to the URL
  • then, we fetch the source


async fn scrape(&self, url: String) -> Result<(Vec<Self::Item>, Vec<String>), Error> {
    let mut items = Vec::new();
    let html = {
        let mut webdriver = self.webdriver_client.lock().await;
        webdriver.goto(&url).await?;
        webdriver.source().await?
    };

Once we have the rendered source of the page, we can scrape it like any other HTML page:

let document = Document::from(html.as_str());

let quotes = document.select(Class("quote"));
for quote in quotes {
    let mut spans = quote.select(Name("span"));
    let quote_span = spans.next().expect("quote span not found");
    let quote_str = quote_span.text().trim().to_string();

    let author = spans
        .next()
        .expect("author span not found")
        .select(Class("author"))
        .next()
        .expect("author not found")
        .text()
        .trim()
        .to_string();

    items.push(QuotesItem {
        quote: quote_str,
        author,
    });
}

let next_pages_link = document
    .select(Class("pager").descendant(Class("next")).descendant(Name("a")))
    .filter_map(|n| n.attr("href"))
    .map(|url| self.normalize_url(url))
    .collect::<Vec<String>>();

Ok((items, next_pages_link))
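The `normalize_url` helper called above is a method on the spider and is not shown here; a plausible std-only sketch of what it does (the base URL is an assumption, since this spider scrapes quotes.toscrape.com, and the actual implementation may differ) could look like:

```rust
// Hedged sketch: turn the relative links found on the page (e.g. "/page/2/")
// into absolute URLs. Written as a free function here; in the spider it is a
// method taking &self.
fn normalize_url(url: &str) -> String {
    let base = "https://quotes.toscrape.com";
    if url.starts_with("http://") || url.starts_with("https://") {
        url.to_string() // already absolute
    } else if url.starts_with('/') {
        format!("{}{}", base, url) // root-relative link
    } else {
        format!("{}/{}", base, url) // relative link
    }
}

fn main() {
    println!("{}", normalize_url("/page/2/"));
}
```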

To run this spider, you first need to launch chromedriver in a separate shell:

$ chromedriver --port=4444 --disable-dev-shm-usage

Then, in another shell, go to the git repository accompanying this book, in ch_05/crawler/, and run:

$ cargo run -- run --spider quotes

Tags: hacking, programming, rust, tutorial

Want to learn Rust, Cryptography and Security? Get my book Black Hat Rust!