OpenAI Agents SDK Integration With Oxylabs Web Scraper API


Vytenis Kaubrė
OpenAI just made it easier to orchestrate AI agent workflows with the release of the production-ready Agents SDK. Integrating Oxylabs Web Scraper API significantly cuts web search costs and ensures real-time access to web data at a ceiling price of $1.35 per 1,000 results. Follow this guide to see how you can leverage both tools to build smart AI agents.
The Agents SDK is a powerful open-source developer toolkit for building goal-driven, autonomous agents powered by OpenAI’s LLMs. Replacing the experimental Swarm SDK, it offers a streamlined way to create agents that can use tools, maintain memory, and make multi-step decisions to accomplish complex tasks.
Under the hood, the SDK utilizes the Responses API and the Chat Completions API and features tools for custom function calls, web search, file search, and computer use, among other capabilities. Developers can also monitor and debug agent behavior in real time and continuously improve performance using the dashboard's evaluation tool.
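To get a feel for the SDK before adding any scraping logic, here's a minimal sketch of a single agent run. It assumes the openai-agents package is installed and the OPENAI_API_KEY environment variable is set; the agent name and prompt are arbitrary examples.
from agents import Agent, Runner

# A bare-bones agent with instructions only - no tools, no structured output.
assistant = Agent(
    name='Hello Agent',
    instructions='You are a concise assistant.',
)

# Runner.run_sync blocks until the agent produces its final output.
result = Runner.run_sync(assistant, 'Say hello in one sentence.')
print(result.final_output)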
Let’s create a basic agent workflow that scrapes text from any website with Web Scraper API and uses the Agents SDK to either summarize the content or parse product information.
Claim your 1-week free trial to test Web Scraper API for your project needs.
In your Python project, run the following pip command.
pip install requests pydantic openai-agents
The requests library will handle HTTP communication with Web Scraper API, pydantic will enforce data structure validation, and openai-agents will orchestrate AI assistant workflows.
Before coding, use your terminal to configure environment variables for API authentication and run the echo command to confirm everything is in place. Use the Web Scraper API credentials you’ve created in the Oxylabs dashboard.
On Windows, run the following in Command Prompt:
setx OXYLABS_CREDENTIALS "username:password"
setx OPENAI_API_KEY "your-openai-api-key"
Exit and open a new terminal session to print the values:
echo %OXYLABS_CREDENTIALS%
echo %OPENAI_API_KEY%
echo "export OXYLABS_CREDENTIALS='username:password'" >> ~/.zshrc
echo "export OPENAI_API_KEY='your-openai-api-key'" >> ~/.zshrc
source ~/.zshrc
echo $OXYLABS_CREDENTIALS
echo $OPENAI_API_KEY
Note: This snippet uses Zsh (~/.zshrc). If you’re using another shell, update the corresponding config file (e.g., ~/.bashrc).
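If you'd rather confirm the variables from Python itself, a quick sanity check like the one below works on any platform; it only reads the environment and reports whether each variable is visible:
import os

# Print whether each required environment variable is visible to Python.
for name in ('OXYLABS_CREDENTIALS', 'OPENAI_API_KEY'):
    print(f'{name}: {"set" if os.environ.get(name) else "missing"}')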
Inside a new Python script file, import the libraries as follows.
import asyncio
import os
import requests
from agents import Agent, RunResult, Runner, function_tool
from pydantic import BaseModel
Pydantic models help you define the structure for agent outputs, ensuring data consistency and validation. HtmlSummary captures general content summaries with a single text field, while Product structures specific item details with title, price, and availability fields.
class HtmlSummary(BaseModel):
    summary: str


class Product(BaseModel):
    title: str
    price: float
    availability: str
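To see the validation these models provide, here's a quick illustrative check with made-up values; Pydantic coerces compatible types (the price string becomes a float) and raises a ValidationError for anything incompatible:
# Illustrative only: the values below are invented for demonstration.
item = Product(title='Example Keyboard', price='49.99', availability='In stock')
print(item.price)  # 49.99 (coerced from the string)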
Define a function tool that agents can use to extract public website content. Here, os.environ.get pulls your API credentials from the environment variable. Pay attention to the payload dictionary, which lets you add more API parameters and refine the parsing instructions according to your needs. Feel free to visit our documentation to learn more.
@function_tool
def scrape_website(url: str) -> str:
    '''Scrapes the website and returns the text.

    Args:
        url (str): The URL of the website to scrape.

    Returns:
        str: The text of the website.
    '''
    USERNAME, PASSWORD = os.environ.get('OXYLABS_CREDENTIALS').split(':')

    # Structure payload.
    payload = {
        'source': 'universal',
        'url': url,  # Link to the target website.
        'render': 'html',
        'parse': True,
        'parsing_instructions': {
            'text': {
                '_fns': [
                    {
                        '_fn': 'xpath',
                        '_args': ['//*//text()']  # Extract all text.
                    }
                ]
            }
        }
    }

    status = None
    for attempt in range(3):
        try:
            # Send a POST request using the Realtime integration method.
            response = requests.post(
                'https://realtime.oxylabs.io/v1/queries',
                auth=(USERNAME, PASSWORD),
                json=payload
            )
            # Check if status code is 200.
            status = response.json()['results'][0]['status_code']
            if status == 200:
                return response.json()['results'][0]['content']['text']
        except Exception as e:
            print(e)

    # If all retries failed.
    print(f'Status code: {status}. Stopping execution.')
    return ''
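If you'd like to verify the payload on its own before wiring the tool into an agent, you can send the same request directly, without the @function_tool decorator. The sketch below assumes the same OXYLABS_CREDENTIALS variable; the target URL is only a placeholder to swap for a page you want to test:
import os
import requests

USERNAME, PASSWORD = os.environ.get('OXYLABS_CREDENTIALS').split(':')

# One-off test of the same payload used by the scrape_website tool.
test_payload = {
    'source': 'universal',
    'url': 'https://example.com',  # Placeholder - replace with your target URL.
    'render': 'html',
    'parse': True,
    'parsing_instructions': {
        'text': {'_fns': [{'_fn': 'xpath', '_args': ['//*//text()']}]}
    }
}

response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=(USERNAME, PASSWORD),
    json=test_payload
)
print(response.json()['results'][0]['content']['text'])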
Create as many specialized AI agents as your application requires to handle specific tasks, each configured with appropriate instructions and output types. This example showcases two simple agents for demonstration.
summarization_agent = Agent(
    name='Product summarization Agent',
    instructions='You are a summarization agent that summarizes the text.',
    tools=[scrape_website],
    output_type=HtmlSummary,
)

product_parsing_agent = Agent(
    name='Product Parsing Agent',
    instructions='You are a product parsing agent that parses the text.',
    tools=[scrape_website],
    output_type=Product,
)
Next, set up an entry point agent that analyzes user intent and routes requests to specialized agents for efficient task delegation.
triage_agent = Agent(
    name='Triage Agent',
    instructions=(
        'You are a triage agent that decides '
        'what to do with the user intention.'
    ),
    handoffs=[product_parsing_agent, summarization_agent],
)
Construct a function that utilizes the previously defined Pydantic classes to format raw agent responses into user-friendly output based on the result type.
def format_output(output: RunResult) -> str:
    if isinstance(output.final_output, Product):
        return (
            f'{output.final_output.title} - '
            f'{output.final_output.price} - '
            f'{output.final_output.availability}'
        )
    elif isinstance(output.final_output, HtmlSummary):
        return output.final_output.summary
    return str(output.final_output)
Finally, let’s make a very simple interactive command-line interface for communicating with the agent.
async def _run() -> None:
    while True:
        question = input('\033[1;34mQuestion ->\033[0m ')
        if question.lower() in ['exit']:
            print('\033[1;31mExiting the assistant.\033[0m')
            break
        url = input('\033[1;34mURL ->\033[0m ')
        output = await Runner.run(triage_agent, input=f'{question} {url}')
        print(
            '\033[1;33mAnswer -> ' +
            f'\n\033[1;32m{format_output(output)}\033[0m'
        )


if __name__ == '__main__':
    asyncio.run(_run())
By now, you should have compiled the following script that’s ready for execution:
import asyncio
import os

import requests
from agents import Agent, RunResult, Runner, function_tool
from pydantic import BaseModel


class HtmlSummary(BaseModel):
    summary: str


class Product(BaseModel):
    title: str
    price: float
    availability: str


@function_tool
def scrape_website(url: str) -> str:
    '''Scrapes the website and returns the text.

    Args:
        url (str): The URL of the website to scrape.

    Returns:
        str: The text of the website.
    '''
    USERNAME, PASSWORD = os.environ.get('OXYLABS_CREDENTIALS').split(':')

    # Structure payload.
    payload = {
        'source': 'universal',
        'url': url,  # Link to the target website.
        'render': 'html',
        'parse': True,
        'parsing_instructions': {
            'text': {
                '_fns': [
                    {
                        '_fn': 'xpath',
                        '_args': ['//*//text()']  # Extract all text.
                    }
                ]
            }
        }
    }

    status = None
    for attempt in range(3):
        try:
            # Send a POST request using the Realtime integration method.
            response = requests.post(
                'https://realtime.oxylabs.io/v1/queries',
                auth=(USERNAME, PASSWORD),
                json=payload
            )
            # Check if status code is 200.
            status = response.json()['results'][0]['status_code']
            if status == 200:
                return response.json()['results'][0]['content']['text']
        except Exception as e:
            print(e)

    # If all retries failed.
    print(f'Status code: {status}. Stopping execution.')
    return ''


summarization_agent = Agent(
    name='Product summarization Agent',
    instructions='You are a summarization agent that summarizes the text.',
    tools=[scrape_website],
    output_type=HtmlSummary,
)

product_parsing_agent = Agent(
    name='Product Parsing Agent',
    instructions='You are a product parsing agent that parses the text.',
    tools=[scrape_website],
    output_type=Product,
)

triage_agent = Agent(
    name='Triage Agent',
    instructions=(
        'You are a triage agent that decides '
        'what to do with the user intention.'
    ),
    handoffs=[product_parsing_agent, summarization_agent],
)


def format_output(output: RunResult) -> str:
    if isinstance(output.final_output, Product):
        return (
            f'{output.final_output.title} - '
            f'{output.final_output.price} - '
            f'{output.final_output.availability}'
        )
    elif isinstance(output.final_output, HtmlSummary):
        return output.final_output.summary
    return str(output.final_output)


async def _run() -> None:
    while True:
        question = input('\033[1;34mQuestion ->\033[0m ')
        if question.lower() in ['exit']:
            print('\033[1;31mExiting the assistant.\033[0m')
            break
        url = input('\033[1;34mURL ->\033[0m ')
        output = await Runner.run(triage_agent, input=f'{question} {url}')
        print(
            '\033[1;33mAnswer -> ' +
            f'\n\033[1;32m{format_output(output)}\033[0m'
        )


if __name__ == '__main__':
    asyncio.run(_run())
When executed, this code enables you to submit a question with a target website URL, automatically routing your query to either the summarization or product parsing agent.
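Save the script (the filename below is just an example) and launch it from a terminal session where the environment variables are set; the assistant will then prompt you for a question and a URL on each loop iteration, and typing exit ends the session:
python agent_scraper.py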
With Agents SDK and Web Scraper API, you've got a powerful combination that unlocks endless possibilities for AI agent development. This integration provides cost-effective web access, enabling you to build specialized agents that can search, analyze, and transform real-time web data into insights. You can try the API at no cost by activating your free trial through the dashboard.
If you have any questions, don’t hesitate to contact us via live chat or email.
About the author
Vytenis Kaubrė
Technical Copywriter
Vytenis Kaubrė is a Technical Copywriter at Oxylabs. His love for creative writing and a growing interest in technology fuels his daily work, where he crafts technical content and web scrapers with Oxylabs’ solutions. Off duty, you might catch him working on personal projects, coding with Python, or jamming on his electric guitar.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.