
Integrating OpenAI Agents SDK With a Web Unlocker for High Performance

Build powerful AI agents in Python using OpenAI’s Agents SDK and Web Unlocker API to access and extract data from any website.

In this guide, you will learn:

  • What the OpenAI Agents SDK is
  • Why integrating it with a web unlocker service is key to maximizing its effectiveness
  • How to build a Python agent using the OpenAI Agents SDK and a Web Unlocker API in a detailed step-by-step tutorial

Let’s dive in!

What Is the OpenAI Agents SDK?

The OpenAI Agents SDK is an open-source Python library from OpenAI. It is designed for building agent-based AI applications in a simple, lightweight, and production-ready way. This library is a refined evolution of OpenAI’s earlier experimental project called Swarm.

The OpenAI Agents SDK focuses on providing just a few core primitives with minimal abstraction:

  • Agents: LLMs paired with specific instructions and tools to perform tasks
  • Handoffs: To let agents delegate tasks to other agents when needed
  • Guardrails: To validate agent inputs to ensure they meet expected formats or conditions

These building blocks, combined with Python’s flexibility, make it easy to define complex interactions between agents and tools.

The SDK also includes built-in tracing, so you can visualize, debug, and evaluate your agent workflows. It even supports fine-tuning models for your specific use cases.
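Here is a minimal sketch of what an agent looks like in code (assuming the OPENAI_API_KEY environment variable is set):

from agents import Agent, Runner

# A bare-bones agent: an LLM paired with instructions, no tools or handoffs
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant that answers concisely.",
)

# Run the agent loop synchronously and print its final answer
result = Runner.run_sync(agent, "Explain what an AI agent is in one sentence.")
print(result.final_output)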

Biggest Limitations of This Approach to Building AI Agents

Most AI agents aim to automate operations on web pages, whether that means retrieving content or interacting with on-page elements. In other words, they need to browse the Web programmatically.

Aside from potential misunderstandings from the AI model itself, the biggest challenge these agents face is dealing with websites’ protective measures. The reason is that many sites implement anti-bot and anti-scraping technologies that can block or mislead AI agents. This is especially true today, as anti-AI CAPTCHAs and advanced bot detection systems become increasingly common.

So, is this the end of the road for AI web agents? Absolutely not!

To overcome these barriers, you need to enhance your agent’s ability to navigate the Web by integrating it with a solution like Bright Data’s Web Unlocker API. This tool works with any HTTP client or solution that connects to the Internet (including AI agents), acting as a web-unlocking gateway. It delivers clean, unblocked HTML from any webpage. No more CAPTCHAs, IP bans, or blocked content.

See why combining the OpenAI Agents SDK with the Web Unlocker API is the ultimate strategy for building powerful, web-savvy AI agents!

How to Integrate the OpenAI Agents SDK With a Web Unlocker API

In this guided section, you will learn how to integrate the OpenAI Agents SDK with Bright Data’s Web Unlocker API to build an AI agent capable of:

  1. Summarizing the text of any web page
  2. Retrieving structured product data from e-commerce websites
  3. Gathering key information from news articles

To achieve that, the agent will instruct the OpenAI Agents SDK to use the Web Unlocker API as an engine for fetching the content of any web page. Once the content is retrieved, the agent will apply AI logic to extract and format the data as needed for each of the above tasks.

Disclaimer: The above three use cases are just examples. The approach shown here can be extended to many other scenarios by customizing the agent’s behavior.

Follow the steps below to build an AI scraping agent in Python using the OpenAI Agents SDK and Bright Data’s Web Unlocker API for high performance!

Prerequisites

Before diving into this tutorial, make sure you have the following:

  • Python 3 installed locally
  • An active Bright Data account
  • An active OpenAI account
  • A basic understanding of how HTTP requests work
  • Some knowledge of how Pydantic models work
  • A general idea of how AI agents function

Do not worry if everything is not set up just yet. You will be guided through the setup in the next sections.

Step #1: Project Setup

Before we begin, make sure you have Python 3 installed on your system. If not, download Python and follow the installation instructions for your operating system.

Open your terminal and create a new folder for your scraping agent project:

mkdir openai-sdk-agent

The openai-sdk-agent folder will contain all the code for your Python-based, Agents SDK-powered agent.

Navigate into the project folder and set up a virtual environment:

cd openai-sdk-agent
python -m venv venv

Load the project folder in your favorite Python IDE. Visual Studio Code with the Python extension or PyCharm Community Edition are great choices.

Inside the openai-sdk-agent folder, create a new Python file called agent.py. Your folder structure should now look like this:

The file structure of the AI agent project

Currently, agent.py is a blank Python script, but it will soon contain the desired AI agent logic.

In the IDE’s terminal, activate the virtual environment. On Linux or macOS, run this command:

source venv/bin/activate

Equivalently, on Windows, execute:

venv\Scripts\activate

You are all set! You now have a Python environment to build a powerful AI agent using the OpenAI Agents SDK and a web unlocker.

Step #2: Install the Project’s Dependencies and Get Started

This project uses the following Python libraries:

  • openai-agents: The OpenAI Agents SDK, used to build AI agents in Python.
  • requests: To connect to Bright Data’s Web Unlocker API and fetch the HTML content of a web page that the AI agent will operate on. Learn more in our guide on mastering the Python Requests library.
  • pydantic: To define structured output models, allowing the agent to return data in a clear and validated format.
  • markdownify: To convert raw HTML content into clean Markdown. (We will explain why this is useful soon.)
  • python-dotenv: To load environment variables from a .env file. That is where we’ll store secrets for OpenAI and Bright Data.

With the virtual environment activated, install them all with:

pip install requests pydantic openai-agents markdownify python-dotenv

Now, initialize agent.py with the following imports and async boilerplate code:

import asyncio
from agents import Agent, Runner, function_tool
import requests
from pydantic import BaseModel
from markdownify import markdownify as md
from dotenv import load_dotenv

# AI agent logic...

async def run():
    # Call the async AI agent logic...
    pass

if __name__ == "__main__":
    asyncio.run(run())

Wonderful! Time to load environment variables.

Step #3: Set Up Environment Variables Reading

Add a .env file in your project folder:

Adding a .env file to your project

This file will hold your environment variables, such as API keys and secret tokens. To load the environment variables from the .env file, use load_dotenv() from the dotenv package:

load_dotenv()

You can now read specific environment variables using os.getenv() like this:

os.getenv("ENV_NAME")

Do not forget to import os from the Python standard library:

import os

Great! The environment variables are ready to be read.

Step #4: Set Up OpenAI Agents SDK

You need a valid OpenAI API key to utilize the OpenAI Agents SDK. If you have not generated one yet, follow OpenAI’s official guide to create your API key.

Once you have it, add the key to your .env file like this:

OPENAI_API_KEY="<YOUR_OPENAI_KEY>"

Be sure to replace the <YOUR_OPENAI_KEY> placeholder with your actual key.

No additional setup is required, as the openai-agents SDK is designed to automatically read the API key from the OPENAI_API_KEY environment variable.
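If you prefer not to rely on the environment variable alone, the SDK also exposes a helper to set the key programmatically. A minimal sketch:

from agents import set_default_openai_key
from dotenv import load_dotenv
import os

load_dotenv()

# Optional: pass the OpenAI API key explicitly instead of relying on
# the OPENAI_API_KEY environment variable being picked up automatically
set_default_openai_key(os.getenv("OPENAI_API_KEY"))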

Step #5: Set Up Web Unlocker API

If you have not already, create a Bright Data account. Otherwise, simply log in.

Next, read Bright Data’s official Web Unlocker documentation to retrieve your API token. Alternatively, follow the steps below.

In your Bright Data “User Dashboard” page, click the “Get proxy products” option:

Clicking the “Get proxy products” option

In the products table, locate the row labeled “unblocker” and click on it:

Clicking the “unblocker” row

⚠️ Note: You will first need to create a new Web Unlocker API zone if you have not done so already. Go over the Web Unlocker setup documentation to get started.

On the “unblocker” zone page, copy your API token using the clipboard icon:

Copying the API token

Also, make sure the toggle in the top-right corner is switched to “On,” which indicates that the Web Unlocker product is active.

Under the “Configuration” tab, ensure these options are enabled for optimal effectiveness:

Making sure that the premium options for effectiveness are enabled

In the .env file, add the following environment variable:

BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN="<YOUR_BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN>"

Replace the placeholder with your actual API token.
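Before moving on, you can optionally verify the token works with a quick standalone request. This sketch assumes your zone is named “unblocker,” as in the rest of this tutorial:

import os
import requests
from dotenv import load_dotenv

load_dotenv()

# Quick sanity check: the same endpoint and payload the agent tool will use later
response = requests.post(
    "https://api.brightdata.com/request",
    headers={
        "Authorization": f"Bearer {os.getenv('BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN')}",
    },
    json={
        "zone": "unblocker",  # must match your Web Unlocker zone name
        "url": "https://example.com",
        "format": "raw",
    },
)
print(response.status_code)  # 200 means the unblocked HTML is in response.text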

Perfect! You can now use both the OpenAI SDK and Bright Data’s Web Unlocker API in your project.

Step #6: Create the Web Page Content Extraction Function

Create a get_page_content() function that:

  1. Reads the BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN environment variable
  2. Uses requests to send a request to Bright Data’s Web Unlocker API using the provided URL
  3. Retrieves the raw HTML returned by the API
  4. Converts the HTML to Markdown and returns it

This is how you can implement the above logic:

@function_tool
def get_page_content(url: str) -> str:
    """
    Retrieves the HTML content of a given web page using Bright Data's Web Unlocker API,
    bypassing anti-bot protections. The response is converted from raw HTML to Markdown
    for easier and cheaper processing.

    Args:
        url (str): The URL of the web page to scrape.

    Returns:
        str: The Markdown-formatted content of the requested page.
    """

    # Read Bright Data's Web Unlocker API token from the environment
    BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN = os.getenv("BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN")

    # Configure the Web Unlocker API call
    api_url = "https://api.brightdata.com/request"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN}"
    }
    data = {
        "zone": "unblocker",
        "url": url,
        "format": "raw"
    }

    # Make the call to the Web Unlocker API to retrieve the unblocked HTML of the target page
    response = requests.post(api_url, headers=headers, json=data)

    # Extract the raw HTML response
    html = response.text

    # Convert the HTML to markdown and return it
    markdown_text = md(html)

    return markdown_text

Note 1: The function must be annotated with @function_tool. This special decorator tells the OpenAI Agents SDK that this function can be used as a tool by an agent to perform specific actions. In this case, the function acts as the “engine” the agent can utilize to retrieve the content of the web page it will operate on.

Note 2: The get_page_content() function must declare its input types explicitly. If you omit them, you will get an error like:

Error getting response: Error code: 400 - {'error': {'message': "Invalid schema for function 'get_page_content': In context=('properties', 'url'), schema must have a 'type' key."}}

Now, you might be wondering: why convert raw HTML to Markdown? The answer is simple: efficiency and cost-effectiveness!

HTML is highly verbose and often includes unnecessary elements like scripts, styles, and metadata. That is content that AI agents typically do not need. If your agent only needs the essentials like text, links, and images, Markdown provides a much cleaner and more compact representation.

In practice, the HTML-to-Markdown conversion can reduce the input size by up to 99%, saving both:

  • Tokens, which lowers the cost when using OpenAI models
  • Processing time, since models work faster on smaller inputs

For more insight, read the article “Why Are the New AI Agents Choosing Markdown Over HTML?”.
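To see the conversion in action, here is a small illustration of how markdownify strips HTML noise (the exact output may vary slightly between versions):

from markdownify import markdownify as md

html = """
<div class="article-body" data-tracking-id="abc-123">
  <h1>Title</h1>
  <p>Some <strong>important</strong> text with a <a href="https://example.com">link</a>.</p>
</div>
"""

markdown_text = md(html)
print(markdown_text)
# Title
# =====
#
# Some **important** text with a [link](https://example.com).
print(f"{len(html)} HTML chars -> {len(markdown_text)} Markdown chars")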

Step #7: Define the Data Models

To return structured output, agents built with the OpenAI Agents SDK rely on Pydantic models that define the expected shape of their output data. Now, remember that the agent we are building can return one of three possible outputs:

  1. A summary of the page
  2. Product information
  3. News article information

So, let’s define three corresponding Pydantic models:

class Summary(BaseModel):
    summary: str

class Product(BaseModel):
    name: str
    price: Optional[float] = None
    currency: Optional[str] = None
    ratings: Optional[int] = None
    rating_score: Optional[float] = None

class News(BaseModel):
    title: str
    subtitle: Optional[str] = None
    authors: Optional[List[str]] = None
    text: str
    publication_date: Optional[str] = None

Note: The use of Optional makes your agent more robust and general-purpose. Not all pages will include every piece of data defined in the schema, so this flexibility helps prevent errors when fields are missing.

Do not forget to import Optional and List from typing:

from typing import Optional, List
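As a quick illustration of why Optional matters, a page missing some fields still validates cleanly:

# A product page with no ratings still produces a valid object,
# since the missing fields simply default to None
product = Product(name="PlayStation 5 console", price=499.0, currency="USD")
print(product.model_dump())
# {'name': 'PlayStation 5 console', 'price': 499.0, 'currency': 'USD', 'ratings': None, 'rating_score': None}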

Fantastic! You are now ready to build your agent’s logic.

Step #8: Initialize the Agent Logic

Use the Agent class from the openai-agents SDK to define the three specialized agents:

summarization_agent = Agent(
    name="Text Summarization Agent",
    instructions="You are a content summarization agent that summarizes the input text.",
    tools=[get_page_content],
    output_type=Summary,
)

product_info_agent = Agent(
    name="Product Information Agent",
    instructions="You are a product parsing agent that extracts product details from text.",
    tools=[get_page_content],
    output_type=Product,
)

news_info_agent = Agent(
    name="News Information Agent",
    instructions="You are a news parsing agent that extracts relevant news details from text.",
    tools=[get_page_content],
    output_type=News,
)

Each agent:

  1. Includes a clear instruction string that describes what it is supposed to do. This is what the OpenAI Agents SDK will use to guide the agent’s behavior.
  2. Employs get_page_content() as a tool to retrieve the input data (i.e., the content of the web page).
  3. Returns its output in one of the Pydantic models (Summary, Product, or News) defined earlier.

To automatically route user requests to the correct specialized agent, define a higher-level agent:

routing_agent = Agent(
    name="Routing Agent",
    instructions=(
        "You are a high-level decision-making agent. Based on the user's request, "
        "hand off the task to the appropriate agent."
    ),
    handoffs=[summarization_agent, product_info_agent, news_info_agent],
)

This is the agent you will invoke in your run() function to drive the AI agent logic.
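For example, before wiring up the full loop, you could test the handoff with a one-off call like this (the request wording and URL are illustrative):

# One-off test: a summarization request should be handed off to
# summarization_agent, so final_output will be a Summary instance
async def test_routing():
    output = await Runner.run(
        routing_agent,
        input="Summarize this page https://example.com",
    )
    print(type(output.final_output))  # expected: <class '__main__.Summary'>
    print(output.final_output.summary)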

Step #9: Implement the Execution Loop

In the run() function, add the following loop to launch your AI agent logic:

# Keep iterating until the user types "exit"
while True:
    # Read the user's request
    request = input("Your request -> ")
    # Stop the execution if the user types "exit"
    if request.lower() == "exit":
        print("Exiting the agent...")
        break
    # Read the page URL to operate on
    url = input("Page URL -> ")

    # Route the user's request to the right agent
    output = await Runner.run(routing_agent, input=f"{request} {url}")
    # Convert the agent's output to a JSON string
    json_output = json.dumps(output.final_output.model_dump(), indent=4)
    print(f"Output -> \n{json_output}\n\n")

This loop continuously listens for user input and processes each request by routing it to the right agent (summary, product, or news). It combines the user’s query with the target URL, runs the routing logic, and then prints the structured result as JSON using the json module from the standard library. Import it with:

import json

Amazing! Your OpenAI Agents SDK integration with Bright Data’s Web Unlocker API is now complete.

Step #10: Put It All Together

Your agent.py file should now contain:

import asyncio
from agents import Agent, Runner, function_tool
import requests
from pydantic import BaseModel
from markdownify import markdownify as md
from dotenv import load_dotenv
import os
from typing import Optional, List
import json

# Load the environment variables from the .env file
load_dotenv()

# Define the Pydantic output models for your AI agent
class Summary(BaseModel):
    summary: str

class Product(BaseModel):
    name: str
    price: Optional[float] = None
    currency: Optional[str] = None
    ratings: Optional[int] = None
    rating_score: Optional[float] = None

class News(BaseModel):
    title: str
    subtitle: Optional[str] = None
    authors: Optional[List[str]] = None
    text: str
    publication_date: Optional[str] = None

@function_tool
def get_page_content(url: str) -> str:
    """
    Retrieves the HTML content of a given web page using Bright Data's Web Unlocker API,
    bypassing anti-bot protections. The response is converted from raw HTML to Markdown
    for easier and cheaper processing.

    Args:
        url (str): The URL of the web page to scrape.

    Returns:
        str: The Markdown-formatted content of the requested page.
    """

    # Read Bright Data's Web Unlocker API token from the environment
    BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN = os.getenv("BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN")

    # Configure the Web Unlocker API call
    api_url = "https://api.brightdata.com/request"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN}"
    }
    data = {
        "zone": "unblocker",
        "url": url,
        "format": "raw"
    }

    # Make the call to the Web Unlocker API to retrieve the unblocked HTML of the target page
    response = requests.post(api_url, headers=headers, json=data)

    # Extract the raw HTML response
    html = response.text

    # Convert the HTML to markdown and return it
    markdown_text = md(html)

    return markdown_text

# Define the individual OpenAI agents
summarization_agent = Agent(
    name="Text Summarization Agent",
    instructions="You are a content summarization agent that summarizes the input text.",
    tools=[get_page_content],
    output_type=Summary,
)

product_info_agent = Agent(
    name="Product Information Agent",
    instructions="You are a product parsing agent that extracts product details from text.",
    tools=[get_page_content],
    output_type=Product,
)

news_info_agent = Agent(
    name="News Information Agent",
    instructions="You are a news parsing agent that extracts relevant news details from text.",
    tools=[get_page_content],
    output_type=News,
)

# Define a high-level routing agent that delegates tasks to the appropriate specialized agent
routing_agent = Agent(
    name="Routing Agent",
    instructions=(
        "You are a high-level decision-making agent. Based on the user's request, "
        "hand off the task to the appropriate agent."
    ),
    handoffs=[summarization_agent, product_info_agent, news_info_agent],
)

async def run():
    # Keep iterating until the user types "exit"
    while True:
        # Read the user's request
        request = input("Your request -> ")
        # Stop the execution if the user types "exit"
        if request.lower() == "exit":
            print("Exiting the agent...")
            break
        # Read the page URL to operate on
        url = input("Page URL -> ")

        # Route the user's request to the right agent
        output = await Runner.run(routing_agent, input=f"{request} {url}")
        # Convert the agent's output to a JSON string
        json_output = json.dumps(output.final_output.model_dump(), indent=4)
        print(f"Output -> \n{json_output}\n\n")


if __name__ == "__main__":
    asyncio.run(run())

Et voilà! In just over 100 lines of Python, you have built an AI agent that can:

  • Summarize content from any web page
  • Extract product information from any e-commerce site
  • Pull out news details from any online article

Time to see it in action!

Step #11: Test the AI Agent

To start your AI agent, run:

python agent.py

Now, suppose you want to summarize the content from Bright Data’s AI services hub. Just enter a request like this:

The input to get a summary of Bright Data’s AI services
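In text form, the input looks something like this (the exact wording is up to you, and the URL shown here is illustrative):

Your request -> Summarize this page
Page URL -> https://brightdata.com/ai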

Below is the result in JSON format you will get:

The summary returned by your AI agent

This time, assume you want to retrieve product data from an Amazon product page, like the PS5 listing:

The Amazon PS5 page

Normally, Amazon’s CAPTCHA and anti-bot systems would block your request. Thanks to the Web Unlocker API, your AI agent can access and parse the page without getting blocked:

Getting Amazon product data

The output will be:

{
    "name": "PlayStation\u00ae5 console (slim)",
    "price": 499.0,
    "currency": "USD",
    "ratings": 6321,
    "rating_score": 4.7
}

That is the exact product data from the Amazon page!

Finally, suppose you want to get structured news info from a Yahoo News article:

The target Yahoo News article

Achieve your goal with the following input:

Your request -> Give me news info
Page URL -> https://www.yahoo.com/news/pope-francis-dies-88-080859417.html

The result will be:

{
    "title": "Pope Francis Dies at 88",
    "subtitle": null,
    "authors": [
        "Nick Vivarelli",
        "Wilson Chapman"
    ],
    "text": "Pope Francis, the 266th Catholic Church leader who tried to position the church to be more inclusive, died on Easter Monday, Vatican officials confirmed. He was 88. (omitted for brevity...)",
    "publication_date": "Mon, April 21, 2025 at 8:08 AM UTC"
}

Once again, the AI agent delivers precise data. And thanks to the Web Unlocker API, the news site does not block the request!

Conclusion

In this blog post, you learned how to use the OpenAI Agents SDK in combination with a web unlocking API to build a highly effective web agent in Python.

As demonstrated, combining the OpenAI SDK with Bright Data’s Web Unlocker API helps you create AI agents that can reliably operate on truly any web page. This is just one example of how Bright Data’s products and services can support powerful AI integrations.

Explore our solutions for AI agent development:

  • Autonomous AI agents: Search, access, and interact with any website in real-time using a powerful set of APIs.
  • Vertical AI apps: Build reliable, custom data pipelines to extract web data from industry-specific sources.
  • Foundation models: Access compliant, web-scale datasets to power pre-training, evaluation, and fine-tuning.
  • Multimodal AI: Tap into the world’s largest repository of images, videos, and audio—optimized for AI.
  • Data providers: Connect with trusted providers to source high-quality, AI-ready datasets at scale.
  • Data packages: Get curated, ready-to-use datasets—structured, enriched, and annotated.

For more information, explore our full range of AI products.

Create a Bright Data account and try all our products and services for AI agent development!

No credit card required