Firecrawl: A Simple Web Scraping Script
A command-line utility for scraping web content through the Firecrawl API with automatic file naming and content cleaning
Code is available via gist here
A command-line utility for scraping web content through the Firecrawl API. The script processes URLs—either individually or in bulk from a text file—and outputs clean, formatted content with automatic file naming based on page titles.
Beyond basic scraping, it handles the tedious aspects of web content extraction: stripping navigation elements, removing duplicate content, and managing filename collisions.
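To make the naming behavior concrete, here is a minimal sketch of title-based file naming with collision handling; the helper name and exact slug rules are illustrative, not the script's actual internals:

```python
import re
from pathlib import Path


def filename_from_title(title: str, out_dir: Path) -> Path:
    """Hypothetical helper: turn a page title into a safe, unique markdown filename."""
    # Slugify: lowercase, collapse anything non-alphanumeric into single dashes
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
    path = out_dir / f"{slug}.md"
    # On collision, append an incrementing suffix: slug-1.md, slug-2.md, ...
    counter = 1
    while path.exists():
        path = out_dir / f"{slug}-{counter}.md"
        counter += 1
    return path
```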
The script maintains a focused scope: it takes a URL or file of URLs as input, processes them through Firecrawl’s API endpoints, and saves the results in a specified directory (defaulting to ./scraped when no output location is provided).
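A rough sketch of that command-line surface, assuming argparse; the -o flag matches the usage examples below, but the long option name and the helper function are hypothetical:

```python
import argparse
from pathlib import Path


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Scrape URLs through the Firecrawl API")
    # Positional input: either a single URL or a path to a text file of URLs
    parser.add_argument("target", help="a URL or a text file containing URLs, one per line")
    # Output directory, defaulting to ./scraped as noted above
    parser.add_argument("-o", "--output", type=Path, default=Path("./scraped"),
                        help="directory to write scraped files to")
    return parser.parse_args()


def collect_urls(target: str) -> list[str]:
    """Treat the argument as a file of URLs if it exists on disk, otherwise as a single URL."""
    path = Path(target)
    if path.is_file():
        return [line.strip() for line in path.read_text().splitlines() if line.strip()]
    return [target]
```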
It also uses inline dependencies (PEP 723 script metadata) so the script is self-contained and can be run anywhere with minimal setup.
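Concretely, that means a PEP 723 metadata block at the top of the file, which uv reads and resolves into a throwaway environment before running. The requirements shown here are an assumed example, not copied from the gist:

```python
# /// script
# requires-python = ">=3.12"  # assumed minimum version
# dependencies = [
#     "requests",  # assumed HTTP client for the Firecrawl API calls
# ]
# ///
```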
With the magic of uv, you can even run this via a gist URL, making it extremely portable.
Usage
Locally
# Set your API key
export FIRE_CRAWL_API_KEY=your-api-key-here
# Run with a single URL
uv run firecrawl_scrape.py https://example.com
# Or with a file containing URLs
uv run firecrawl_scrape.py urls.txt
# Optionally specify output directory (default: ./scraped)
uv run firecrawl_scrape.py urls.txt -o ./my-scrapes
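The URLs file is assumed to be plain text with one URL per line (blank lines ignored), for example:

```text
https://example.com/first-post
https://example.com/second-post
```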
Via gist
# Set your API key
export FIRE_CRAWL_API_KEY=your-api-key-here
# Run with a single URL
uv run https://gist.githubusercontent.com/safurrier/8714235a36a5dc502a8f4b2edb98ece3/raw/969f25a37895943725e8a42cae6a219bda3565fa/firecrawl_scrape.py https://example.com
# Or with a file containing URLs
uv run https://gist.githubusercontent.com/safurrier/8714235a36a5dc502a8f4b2edb98ece3/raw/969f25a37895943725e8a42cae6a219bda3565fa/firecrawl_scrape.py urls.txt
# Optionally specify output directory (default: ./scraped)
uv run https://gist.githubusercontent.com/safurrier/8714235a36a5dc502a8f4b2edb98ece3/raw/969f25a37895943725e8a42cae6a219bda3565fa/firecrawl_scrape.py urls.txt -o ./my-scrapes
A simple script with a simple purpose. Plus, by running it via the gist URL, you have it available anywhere, anytime.