Archive API Pricing

Power advanced pipelines for model training and web search with the world's largest web data repository

PAY AS YOU GO
$0.20 / 1K HTMLs
No commitment
Get started
Includes:
  • API access with advanced filtering (domains, categories, dates, languages, countries, paths)
  • Flexible delivery: AWS, GCP, Snowflake, Databricks and more
  • Standard support
  • Historical data (older than 72 hours) starts at $1/1K HTMLs
ENTERPRISE
Contact us for a personalized quote
Talk to a sales expert
Includes:
  • API access with advanced filtering (domains, categories, dates, languages, countries, paths)
  • Flexible delivery: AWS, GCP, Snowflake, Databricks and more
  • Standard support
  • Dedicated Account Manager
  • 24/7 premium support
  • SLA guarantees
  • Volume discounts for large-scale data needs
  • Custom integration support
  • Extended delivery options
* Volume discounts are available for large data volumes, long-term commitments, or multi-solution projects.
We accept these payment methods:
Using AWS? You can now pay through the AWS Marketplace
Get started
Trusted by 20,000+ customers worldwide

Customer favorite features

  • Petabyte-scale repository
  • Full HTML pages & metadata
  • Advanced filtering & search
  • ~2.5 PB added daily
  • Text, images, video and audio
  • Flexible delivery options
  • 5T+ text tokens added daily
  • API-first access
  • AI-ready data
  • 2.5B+ image/video URLs added daily
  • Maintenance-free
  • 99.99% uptime + 24/7 support
STREAMLINED

Payments with AWS Marketplace

Leverage your purchases to meet your AWS commitments and enjoy streamlined procurement and invoicing all in one place. Benefit from AWS’s robust validation and compliance checks for partners.

COMPLIANT

Industry Leading Compliance

Our privacy practices comply with data protection laws, including the EU's General Data Protection Regulation (GDPR) and the CCPA, and we honor requests to exercise privacy rights.

Archive API FAQ

Archive API is a massive, continuously expanding, cached repository by Bright Data, designed to capture and deliver public web data at scale. It provides full web pages and metadata, making it ideal for AI training, machine learning, and large-scale data analysis. Unlike traditional web crawls, Archive API prioritizes relevance, freshness, and usability, giving you access to the most important parts of the internet as they are scraped daily.

You can start accessing data immediately through our Archive API. The API allows you to search, retrieve, and filter data snapshots from Archive seamlessly and efficiently.
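
For illustration only, the sketch below shows what requesting and retrieving a snapshot might look like from Python. The base URL, field names, and authentication header are assumptions for the example, not Bright Data's documented API; consult the Archive API documentation for the actual request format.

```python
import time
import requests

# Illustrative placeholders -- not the documented Archive API.
API = "https://api.example.com/archive"          # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

# 1. Request a data snapshot (recent HTML pages for one domain).
resp = requests.post(
    f"{API}/snapshots",
    json={"domains": ["example.com"], "format": "html"},
    headers=HEADERS,
    timeout=30,
)
resp.raise_for_status()
snapshot_id = resp.json()["snapshot_id"]         # assumed response field

# 2. Poll until the snapshot is ready, then fetch the result location.
while True:
    status = requests.get(f"{API}/snapshots/{snapshot_id}",
                          headers=HEADERS, timeout=30).json()
    if status.get("state") == "ready":           # assumed status field
        print("Download from:", status.get("result_url"))
        break
    time.sleep(60)  # recent data can take minutes to a few hours
```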

  • Data from the last 3 days: delivered within minutes to a few hours, depending on snapshot size
  • Data older than 3 days: processed and delivered within a few hours to 3 days, depending on snapshot size

Archive offers two delivery options to ensure seamless integration into your existing workflows:

  • Amazon S3 bucket: Have your Data Snapshot delivered directly to your S3 bucket.
  • Webhook: Receive your data via webhook for real-time integration into your systems (a minimal receiver sketch follows below).
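
As a rough illustration of the webhook option, here is a minimal receiver built with Flask. The route path and payload fields are assumptions made for the example; the actual delivery format is defined in the Archive API documentation.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/archive-webhook", methods=["POST"])  # path is your choice
def receive_snapshot():
    # Payload structure here is an assumption for illustration;
    # check the Archive API docs for the real delivery format.
    delivery = request.get_json(force=True)
    snapshot_id = delivery.get("snapshot_id")
    records = delivery.get("records", [])
    print(f"Received snapshot {snapshot_id} with {len(records)} records")
    # ...hand the records off to your pipeline here...
    return {"status": "ok"}, 200

if __name__ == "__main__":
    app.run(port=8080)
```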

Absolutely! Archive API allows filtering by category, domains, date, languages, and country before retrieving data, ensuring you only get what you need.
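
As a hedged example of what such a filter block could look like, the field names below are illustrative placeholders matching the filter dimensions listed above, not the documented parameter names of the Archive API.

```python
# Illustrative filter block -- field names are assumptions.
filters = {
    "categories": ["news", "ecommerce"],
    "domains": ["example.com", "example.org"],
    "date_from": "2024-06-01",
    "date_to": "2024-06-30",
    "languages": ["en", "de"],
    "countries": ["US", "DE"],
}
# Attach this block to the snapshot request (see the earlier sketch)
# so only matching pages are retrieved.
```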

When working with large-scale web data, freshness, relevance, and accessibility are key. While Common Crawl provides a broad historical snapshot of the web, Bright Data's Archive API offers real-time, continuously updated data with advanced filtering and delivery options. Here's how they compare:

Feature comparison: Bright Data's Archive API vs. Common Crawl
  • Data collection: Archive continuously captures public web data in real time, providing results as recent as "now." Common Crawl runs periodic crawls (not real time), updated monthly or bimonthly, so data can be outdated.
  • Data volume: Archive has collected 17.5 PB in 8 months, covering 118 billion pages (28 billion unique URLs from 40 million domains), and adds ~2.5 PB and billions of unique URLs per week. Common Crawl has collected 250 billion pages over 18 years.
  • Website coverage & relevance: Archive focuses on high-value, relevant website data based on real scraping business needs. Common Crawl crawls indiscriminately, including outdated or low-quality pages.
  • Data types: Archive delivers full web pages (JS-rendered). Common Crawl is 98.6% HTML and text.
  • Filtering & delivery: Archive is a full discovery and delivery platform with filtering by category, domain, language, date, etc., delivered via Amazon S3 or webhook. Common Crawl has no built-in filtering or delivery; you must manually process huge raw WARC files.

Not sure what you need?