In this guide, you will learn:
- What the OpenAI Agents SDK is
- Why integrating it with a web unlocker service is key to maximizing its effectiveness
- How to build a Python agent using the OpenAI Agents SDK and a Web Unlocker API in a detailed step-by-step tutorial
Let’s dive in!
What Is the OpenAI Agents SDK?
The OpenAI Agents SDK is an open-source Python library from OpenAI. It is designed for building agent-based AI applications in a simple, lightweight, and production-ready way. This library is a refined evolution of OpenAI’s earlier experimental project called Swarm.
The OpenAI Agents SDK focuses on providing just a few core primitives with minimal abstraction:
- Agents: LLMs paired with specific instructions and tools to perform tasks
- Handoffs: To let agents delegate tasks to other agents when needed
- Guardrails: To validate agent inputs (and outputs) to ensure they meet expected formats or conditions
These building blocks, combined with Python’s flexibility, make it easy to define complex interactions between agents and tools.
The SDK also includes built-in tracing, so you can visualize, debug, and monitor your agent workflows. Those traces can also be used with OpenAI’s evaluation, fine-tuning, and distillation tools for your specific use cases.
Biggest Limitations of This Approach to Building AI Agents
Most AI agents aim to automate operations on web pages, whether that means retrieving content or interacting with elements on a page. In other words, they need to programmatically browse the Web.
Aside from potential misunderstandings from the AI model itself, the biggest challenge these agents face is dealing with websites’ protective measures. The reason is that many sites implement anti-bot and anti-scraping technologies that can block or mislead AI agents. This is especially true today, as anti-AI CAPTCHAs and advanced bot detection systems become increasingly common.
So, is this the end of the road for AI web agents? Absolutely not!
To overcome these barriers, you need to enhance your agent’s ability to navigate the Web by integrating it with a solution like Bright Data’s Web Unlocker API. This tool works with any HTTP client or solution that connects to the Internet (including AI agents), acting as a web-unlocking gateway. It delivers clean, unblocked HTML from any webpage. No more CAPTCHAs, IP bans, or blocked content.
See why combining the OpenAI Agents SDK with the Web Unlocker API is the ultimate strategy for building powerful, web-savvy AI agents!
How to Integrate Agents SDK with a Web Unlocker API
In this guided section, you will learn how to integrate the OpenAI Agents SDK with Bright Data’s Web Unlocker API to build an AI agent capable of:
- Summarizing the text of any web page
- Retrieving structured product data from e-commerce websites
- Gathering key information from news articles
To achieve that, the agent built with the OpenAI Agents SDK will use the Web Unlocker API as its engine for fetching the content of any web page. Once the content is retrieved, the agent will apply AI logic to extract and format the data as needed for each of the above tasks.
Disclaimer: The above three use cases are just examples. The approach shown here can be extended to many other scenarios by customizing the agent’s behavior.
Follow the steps below to build an AI scraping agent in Python using the OpenAI Agents SDK and Bright Data’s Web Unlocker API for high performance!
Prerequisites
Before diving into this tutorial, make sure you have the following:
- Python 3.9 or higher installed locally
- An active Bright Data account
- An active OpenAI account
- A basic understanding of how HTTP requests work
- Some knowledge of how Pydantic models work
- A general idea of how AI agents function
Do not worry if everything is not set up just yet. You will be guided through the setup in the next sections.
Step #1: Project Setup
Before we begin, make sure you have Python 3.9 or higher installed on your system. If not, download Python and follow the installation instructions for your operating system.
Open your terminal and create a new folder for your scraping agent project:
The openai-sdk-agent folder will contain all the code for your Python-based, Agents SDK-powered agent.
Navigate into the project folder and set up a virtual environment:
Load the project folder in your favorite Python IDE. Visual Studio Code with the Python extension or PyCharm Community Edition are great choices.
Inside the openai-sdk-agent folder, create a new Python file called scraper.py. Your folder structure should now look like this:
Currently, scraper.py is a blank Python script, but it will soon contain the desired AI agent logic.
In the IDE’s terminal, activate the virtual environment. On Linux or macOS, run this command:
Equivalently, on Windows, execute:
You are all set! You now have a Python environment to build a powerful AI agent using the OpenAI Agents SDK and a web unlocker.
Step #2: Install the Project’s Dependencies and Get Started
This project uses the following Python libraries:
- openai-agents: The OpenAI Agents SDK, used to build AI agents in Python.
- requests: To connect to Bright Data’s Web Unlocker API and fetch the HTML content of a web page that the AI agent will operate on. Learn more in our guide on mastering the Python Requests library.
- pydantic: To define structured output models, allowing the agent to return data in a clear and validated format.
- markdownify: To convert raw HTML content into clean Markdown. (We will explain why this is useful soon.)
- python-dotenv: To load environment variables from a .env file. That is where we’ll store secrets for OpenAI and Bright Data.
In an activated virtual environment, install them all with:
Now, initialize scraper.py with the following imports and async boilerplate code:
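As a rough sketch of that starting point (assuming the script is driven by an async run() function, as later steps describe), the skeleton below relies only on asyncio from the standard library. The third-party imports listed in the comments are the ones this project is likely to need later; treat that list as an assumption and adjust it to your final code.

```python
import asyncio

# Third-party imports used in later steps (installed in Step #2), for example:
# from agents import Agent, Runner, function_tool
# from dotenv import load_dotenv
# from markdownify import markdownify
# from pydantic import BaseModel
# import requests


async def run() -> None:
    # The agent's input/output loop will be added here in a later step
    pass


if __name__ == "__main__":
    # asyncio.run() creates an event loop, runs the coroutine to completion, and closes the loop
    asyncio.run(run())
```

Keeping the entry point async matters because the SDK’s runner is awaited from inside run().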
Wonderful! Time to load environment variables.
Step #3: Set Up Environment Variables Reading
Add a .env file in your project folder:
This file will hold your environment variables, such as API keys and secret tokens. To load the environment variables from the .env file, use load_dotenv() from the dotenv package:
You can now read specific environment variables using os.getenv() like this:
Do not forget to import os from the Python standard library:
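Since os.getenv() silently returns None for a missing variable, it can help to fail fast with a clear message instead. The require_env() helper below is a hypothetical convenience, not part of python-dotenv or the SDK; it assumes load_dotenv() has already been called so that values from .env are visible.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable {name!r}; add it to your .env file")
    return value


# After load_dotenv() has run, values from .env are visible through os.getenv(), e.g.:
# bright_data_token = require_env("BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN")
```

A missing token then surfaces as an immediate, readable error rather than as a failed API call later on.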
Great! The environment variables are ready to be read.
Step #4: Set Up OpenAI Agents SDK
You need a valid OpenAI API key to utilize the OpenAI Agents SDK. If you have not generated one yet, follow OpenAI’s official guide to create your API key.
Once you have it, add the key to your .env file like this:
Be sure to replace the <YOUR_OPENAI_KEY> placeholder with your actual key.
No additional setup is required, as the openai-agents SDK is designed to automatically read the API key from the OPENAI_API_KEY environment variable.
Step #5: Set Up Web Unlocker API
If you have not already, create a Bright Data account. Otherwise, simply log in.
Next, read Bright Data’s official Web Unlocker documentation to retrieve your API token. Alternatively, follow the steps below.
In your Bright Data “User Dashboard” page, press the “Get proxy products” option:
In the products table, locate the row labeled “unblocker” and click on it:
⚠️ Note: You will have to create a new Web Unlocker API zone first if you have not done it yet. Go over the Web Unlocker setup documentation to get started.
On the “unlocker” page, copy your API token using the clipboard icon:
Also, make sure the toggle in the top-right corner is switched to “On,” which indicates that the Web Unlocker product is active.
Under the “Configuration” tab, ensure these options are enabled for optimal effectiveness:
In the .env file, add the following environment variable:
Replace the placeholder with your actual API token.
Perfect! You can now use both the OpenAI SDK and Bright Data’s Web Unlocker API in your project.
Step #6: Create the Web Page Content Extraction Function
Create a get_page_content() function that:
- Reads the BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN environment variable
- Uses requests to send a request to Bright Data’s Web Unlocker API using the provided URL
- Retrieves the raw HTML returned by the API
- Converts the HTML to Markdown and returns it
This is how you can implement the above logic:
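The tutorial’s get_page_content() uses requests and markdownify and is registered with @function_tool. As a dependency-free sketch of the same HTTP call, the block below swaps in urllib.request from the standard library. The endpoint and the zone/url/format JSON fields follow Bright Data’s documented /request API at the time of writing, so check the current docs before relying on them. The helper names build_unlocker_request() and fetch_unblocked_html(), the 60-second timeout, and the zone name "unblocker" (taken from the zone shown in Step #5) are assumptions.

```python
import json
import os
import urllib.request

# Bright Data's Web Unlocker REST endpoint (verify against the current docs)
UNLOCKER_API_URL = "https://api.brightdata.com/request"


def build_unlocker_request(url: str, token: str, zone: str = "unblocker") -> urllib.request.Request:
    """Build (but do not send) a POST request asking Web Unlocker to fetch `url`."""
    payload = {
        "zone": zone,     # name of your Web Unlocker zone (assumed: "unblocker")
        "url": url,       # the page the agent wants to read
        "format": "raw",  # ask for the page's raw HTML as the response body
    }
    return urllib.request.Request(
        UNLOCKER_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_unblocked_html(url: str, zone: str = "unblocker", timeout: float = 60.0) -> str:
    """Send the request and return the unblocked HTML as text."""
    token = os.getenv("BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN")
    if not token:
        raise RuntimeError("BRIGHT_DATA_WEB_UNLOCKER_API_TOKEN is not set")
    request = build_unlocker_request(url, token, zone)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

In the agent itself, you would call this logic from a get_page_content(url: str) -> str function decorated with @function_tool and return markdownify(html) rather than the raw HTML. Splitting request construction from sending also lets you check the payload without spending API credits.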
Note 1: The function must be annotated with @function_tool. This special decorator tells the OpenAI Agents SDK that this function can be used as a tool by an agent to perform specific actions. In this case, the function acts as the “engine” the agent can utilize to retrieve the content of the web page it will operate on.
Note 2: The get_page_content() function must declare the input types explicitly. If you omit them, you will get an error like: Error getting response: Error code: 400 - {'error': {'message': "Invalid schema for function 'get_page_content': In context=('properties', 'url'), schema must have a 'type' key."
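Why do missing type hints cause that error? The tool’s JSON schema is derived from the function signature, so each parameter’s "type" comes from its annotation. The toy function below is a simplified illustration of that idea, not the SDK’s actual schema generator; toy_params_schema() and the two sample functions are hypothetical.

```python
import inspect
from typing import Any, Callable

# Minimal mapping from Python annotations to JSON Schema types (toy version)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def toy_params_schema(func: Callable[..., Any]) -> dict:
    """Derive a JSON-schema-like dict for a function's parameters (illustration only)."""
    properties: dict = {}
    for name, param in inspect.signature(func).parameters.items():
        prop: dict = {}
        # Only annotated parameters get a "type"; unannotated ones stay empty
        if param.annotation in _JSON_TYPES:
            prop["type"] = _JSON_TYPES[param.annotation]
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": list(properties)}


def annotated(url: str) -> str: ...
def unannotated(url): ...


print(toy_params_schema(annotated)["properties"])    # {'url': {'type': 'string'}}
print(toy_params_schema(unannotated)["properties"])  # {'url': {}}
```

The empty {} for the unannotated url parameter mirrors the "schema must have a 'type' key" complaint from the OpenAI API.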
Now, you might be wondering: why convert raw HTML to Markdown? The answer is simple—performance efficiency and cost-effectiveness!
HTML is highly verbose and often includes unnecessary elements like scripts, styles, and metadata. That is content that AI agents typically do not need. If your agent only needs the essentials like text, links, and images, Markdown provides a much cleaner and more compact representation.
In detail, the HTML-to-Markdown transformation can reduce the input size by up to 99%, saving both:
- Tokens, which lowers the cost when using OpenAI models
- Processing time, since models work faster on smaller inputs
For more insight, read the article “Why Are the New AI Agents Choosing Markdown Over HTML?”
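To make the size argument concrete, here is a rough standard-library illustration. It is not markdownify: it only keeps visible text and drops the contents of script and style elements, whereas markdownify also preserves headings, links, and images as Markdown. The sample HTML is made up for the demo.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample_html = (
    '<html><head><style>body{margin:0;font-family:sans-serif}</style>'
    '<script>window.dataLayer=window.dataLayer||[];</script></head>'
    '<body><div class="wrapper"><div class="content">'
    '<h1 class="title">Example Product</h1><p class="price">$19.99</p>'
    '</div></div></body></html>'
)

text = visible_text(sample_html)
print(text)
print(len(sample_html), "characters of HTML ->", len(text), "characters of text")
```

Even on this tiny page, most of the characters are markup and scripts the model would otherwise pay to read; on real pages the gap is usually far larger.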
Step #7: Define the Data Models
To return structured results, an OpenAI SDK agent needs an output type, typically a Pydantic model, that defines the expected structure of its output data. Now, remember that the agent we are building can return one of three possible outputs:
- A summary of the page
- Product information
- News article information
So, let’s define three corresponding Pydantic models:
Note: The use of Optional makes your agent more robust and general-purpose. Not all pages will include every piece of data defined in the schema, so this flexibility helps prevent errors when fields are missing.
Do not forget to import Optional and List from typing:
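One possible shape for the three models is sketched below. The field names are illustrative assumptions, not a required schema; adapt them to the data you actually want the agent to extract. Fields that may be absent on a page are declared Optional with a None default.

```python
from typing import List, Optional

from pydantic import BaseModel


class Summary(BaseModel):
    """Output of the summarization agent."""
    summary: str


class Product(BaseModel):
    """Output of the product-information agent (fields are illustrative)."""
    name: str
    price: Optional[float] = None
    currency: Optional[str] = None
    ratings: Optional[int] = None         # number of customer ratings, if shown
    rating_score: Optional[float] = None  # average rating score, if shown


class News(BaseModel):
    """Output of the news agent (fields are illustrative)."""
    title: str
    subtitle: Optional[str] = None
    authors: Optional[List[str]] = None
    text: str
    publication_date: Optional[str] = None
```

Keeping only one or two fields required (such as name or title) lets the agent still return a valid object when a page omits the rest.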
Fantastic! You are now ready to build your agent’s logic.
Step #8: Initialize the Agent Logic
Use the Agent class from the openai-agents SDK to define the three specialized agents:
Each agent:
- Includes a clear instruction string that describes what it is supposed to do. This is what the OpenAI Agents SDK will use to guide the agent’s behavior.
- Employs get_page_content() as a tool to retrieve the input data (i.e., the content of the web page).
- Returns its output in one of the Pydantic models (Summary, Product, or News) defined earlier.
To automatically route user requests to the correct specialized agent, define a higher-level agent:
This is the agent you will query in your run() function to drive the AI agent logic.
Step #9: Implement the Execution Loop
In the run() function, add the following loop to launch your AI agent logic:
This loop continuously listens for user input and processes each request by routing it to the right agent (summary, product, or news). It combines the user’s query with the target URL, runs the logic, and then prints the structured result in JSON format using the json module from the Python standard library. Import it with:
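The loop described above can be sketched as follows. To keep it runnable and testable without the SDK or an API key, the agent call is passed in as a run_agent coroutine function, and input/output go through read_input and write callables. These parameter names, the prompts, and the exit keywords are hypothetical.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict


async def run_loop(
    run_agent: Callable[[str], Awaitable[Dict[str, Any]]],
    read_input: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> None:
    """Read requests until the user types 'exit' or 'quit', printing each result as JSON."""
    while True:
        request = read_input("Your request -> ").strip()
        if request.lower() in {"exit", "quit"}:
            write("Exiting the agent...")
            break
        page_url = read_input("Page URL -> ").strip()
        # Combine the user's query with the target URL into a single prompt
        output = await run_agent(f"{request} {page_url}")
        write(json.dumps(output, indent=4))
```

With the SDK, run_agent could be an async function that awaits Runner.run(routing_agent, prompt) and returns result.final_output.model_dump(), where routing_agent stands for the higher-level agent from Step #8. Injecting the agent call this way lets you exercise the loop with a fake agent before spending tokens.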
Amazing! Your OpenAI Agents SDK integration with Bright Data’s Web Unlocker API is now complete.
Step #10: Put It All Together
Your scraper.py file should now contain:
Et voilà! In just over 100 lines of Python, you have built an AI agent that can:
- Summarize content from any web page
- Extract product information from any e-commerce site
- Pull out news details from any online article
Time to see it in action!
Step #11: Test the AI Agent
To start your AI agent, run:
Now, suppose you want to summarize the content from Bright Data’s AI services hub. Just enter a request like this:
Below is the result in JSON format you will get:
This time, assume you want to retrieve product data from an Amazon product page, like the PS5 listing:
Normally, Amazon’s CAPTCHA and anti-bot systems would block your request. Thanks to the Web Unlocker API, your AI agent can access and parse the page without getting blocked:
The output will be:
That is the exact product data from the Amazon page!
Finally, suppose you want to get structured news info from a Yahoo News article:
Achieve your goal with the following input:
The result will be:
Once again, the AI agent delivers precise data—and thanks to Web Unlocker, there are no blocks from the news site!
Conclusion
In this blog post, you learned how to use the OpenAI Agents SDK in combination with a web unlocking API to build a highly effective web agent in Python.
As demonstrated, combining the OpenAI SDK with Bright Data’s Web Unlocker API helps you create AI agents that can reliably operate on truly any web page. This is just one example of how Bright Data’s products and services can support powerful AI integrations.
Explore our solutions for AI agent development:
- Autonomous AI agents: Search, access, and interact with any website in real-time using a powerful set of APIs.
- Vertical AI apps: Build reliable, custom data pipelines to extract web data from industry-specific sources.
- Foundation models: Access compliant, web-scale datasets to power pre-training, evaluation, and fine-tuning.
- Multimodal AI: Tap into the world’s largest repository of images, videos, and audio—optimized for AI.
- Data providers: Connect with trusted providers to source high-quality, AI-ready datasets at scale.
- Data packages: Get curated, ready-to-use datasets—structured, enriched, and annotated.
For more information, explore our full range of AI products.
Create a Bright Data account and try all our products and services for AI agent development!