
OpenAI Agents SDK Integration With Oxylabs Web Scraper API


Vytenis Kaubrė

2025-03-14 · 3 min read

OpenAI just made it easier to orchestrate AI agent workflows with the release of its production-ready Agents SDK. Integrating Oxylabs Web Scraper API significantly cuts web search costs and ensures real-time access to web data at a ceiling price of $1.35 per 1K results. Follow this guide to see how you can leverage both tools to build smart AI agents.

What is Agents SDK?

The Agents SDK is a powerful open-source developer toolkit for building goal-driven, autonomous agents powered by OpenAI’s LLMs. Replacing the experimental Swarm SDK, it offers a streamlined way to create agents that can use tools, maintain memory, and make multi-step decisions to accomplish complex tasks.

Under the hood, the SDK utilizes the Responses API and Chat Completions API and features tools for custom function calling, web search, file search, and computer use, among other capabilities. Developers can also monitor and debug agent behavior in real time and continuously improve performance using the dashboard's evaluation tool.

How to create AI agents with Agents SDK

Let’s create a basic agent workflow that scrapes text from any website using Web Scraper API and uses the Agents SDK to either summarize the content or parse product information.

Get a free trial

Claim your 1-week free trial to test Web Scraper API for your project needs.

  • 5K requests
  • No credit card required

1. Install prerequisites

In your Python project, run the following pip command.

pip install requests pydantic openai-agents

The requests library will handle HTTP communication with Web Scraper API, pydantic will enforce data structure validation, and openai-agents will orchestrate AI assistant workflows.

2. Set up environment variables

Before coding, use your terminal to configure environment variables for API authentication and run the echo command to confirm everything is in place. Use the Web Scraper API credentials you’ve created in the Oxylabs dashboard.

Windows

setx OXYLABS_CREDENTIALS "username:password"
setx OPENAI_API_KEY "your-openai-api-key"

Exit and open a new terminal session to print the values:

echo %OXYLABS_CREDENTIALS%
echo %OPENAI_API_KEY%

macOS/Linux

echo "export OXYLABS_CREDENTIALS='username:password'" >> ~/.zshrc
echo "export OPENAI_API_KEY='your-openai-api-key'" >> ~/.zshrc
source ~/.zshrc
echo $OXYLABS_CREDENTIALS
echo $OPENAI_API_KEY

Note: This snippet uses Zsh (~/.zshrc). If you’re using another shell, update the corresponding config file (e.g., ~/.bashrc).
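
If you'd rather double-check from Python, a quick sanity check like the one below (this snippet is only illustrative and not part of the final script) confirms both variables are visible to the interpreter:

import os

# Print whether each required environment variable is set.
for name in ('OXYLABS_CREDENTIALS', 'OPENAI_API_KEY'):
    print(f"{name}: {'set' if os.environ.get(name) else 'MISSING'}")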

3. Import libraries

Inside a new Python script file, import the libraries as follows.

import asyncio
import os

import requests
from agents import Agent, RunResult, Runner, function_tool
from pydantic import BaseModel

4. Define data models with Pydantic

Pydantic models help you define the structure for agent outputs, ensuring data consistency and validation. HtmlSummary captures general content summaries with a single text field, while Product structures specific item details with title, price, and availability fields.

class HtmlSummary(BaseModel):
    summary: str

class Product(BaseModel):
    title: str
    price: float
    availability: str
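
As a quick illustration of that validation (the values below are made up), Pydantic coerces compatible input and rejects anything that doesn't fit the declared types:

item = Product(title='Sample item', price='19.99', availability='In stock')
print(item.price)  # 19.99 — the numeric string is coerced to a float

# An incompatible value raises a ValidationError instead of passing through silently:
# Product(title='Sample item', price='unknown', availability='In stock')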

5. Create a web scraping function tool

Define a function tool that agents can use to extract public website content. Here, os.environ.get pulls your API credentials from the environment variable you set earlier. Pay attention to the payload dictionary, which lets you add more API parameters and refine the parsing instructions to suit your needs. Feel free to visit our documentation to learn more.

@function_tool
def scrape_website(url: str) -> str:
    '''Scrapes the website and returns the text.
    
    Args:
        url (str): The URL of the website to scrape.
        
    Returns:
        str: The text of the website.
    '''
    USERNAME, PASSWORD = os.environ.get('OXYLABS_CREDENTIALS').split(':')

    # Structure payload.
    payload = {
        'source': 'universal',
        'url': url, # Link to the target website.
        'render': 'html',
        'parse': True,
        'parsing_instructions': {
            'text': {
                '_fns': [
                    {
                        '_fn': 'xpath',
                        '_args': ['//*//text()'] # Extract all text.
                    }
                ]
            }
        }
    }

    status = None
    for attempt in range(3):
        try:
            # Send a POST request using the Realtime integration method.
            response = requests.post(
                'https://realtime.oxylabs.io/v1/queries',
                auth=(USERNAME, PASSWORD),
                json=payload
            )

            # Check if status code is 200.
            status = response.json()['results'][0]['status_code']
            if status == 200:
                return response.json()['results'][0]['content']['text']
            
        except Exception as e:
            print(e)

    # All attempts failed.
    print(f'All attempts failed (last status code: {status}). Stopping execution.')
    return ''
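
For instance, if you only need specific elements rather than all text, you could swap in narrower parsing instructions. The selectors below are hypothetical and follow the same _fns/_fn/_args structure as above; adjust the XPath expressions to your target pages:

# Hypothetical alternative: extract only the page title and link URLs.
parsing_instructions = {
    'title': {
        '_fns': [{'_fn': 'xpath', '_args': ['//title/text()']}]
    },
    'links': {
        '_fns': [{'_fn': 'xpath', '_args': ['//a/@href']}]
    }
}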

6. Initialize specialized agents

Create as many specialized AI agents as your application requires to handle specific tasks, each configured with appropriate instructions and output types. This example uses two simple agents for demonstration.

summarization_agent = Agent(
    name='Product summarization Agent',
    instructions='You are a summarization agent that summarizes the text.',
    tools=[scrape_website],
    output_type=HtmlSummary,
)

product_parsing_agent = Agent(
    name='Product Parsing Agent',
    instructions='You are a product parsing agent that parses the text.',
    tools=[scrape_website],
    output_type=Product,
)
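
Following the same pattern, adding further specialists only takes a new output model and the shared scraping tool. The review-parsing agent below is a hypothetical example, not part of the original workflow; any new agent also needs to be listed in the triage agent's handoffs in the next step:

class Review(BaseModel):
    author: str
    rating: float
    text: str

review_parsing_agent = Agent(
    name='Review Parsing Agent',
    instructions='You are a review parsing agent that extracts a review from the text.',
    tools=[scrape_website],
    output_type=Review,
)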

7. Create a triage agent

Next, set up an entry point agent that analyzes user intent and routes requests to specialized agents for efficient task delegation.

triage_agent = Agent(
    name='Triage Agent',
    instructions=(
        'You are a triage agent that decides '
        'what to do with the user intention.'
    ),
    handoffs=[product_parsing_agent, summarization_agent],
)
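
Routing quality depends on what the triage model can infer from your request and the agent definitions. If handoffs feel unreliable, one option (assuming your SDK version exposes the handoff_description parameter on Agent) is to give each specialist a short description for the triage agent to work with:

# Hypothetical refinement: describe the agent to improve triage routing.
product_parsing_agent = Agent(
    name='Product Parsing Agent',
    handoff_description='Handles requests for structured product details such as title, price, and availability.',
    instructions='You are a product parsing agent that parses the text.',
    tools=[scrape_website],
    output_type=Product,
)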

8. Define output formatting

Construct a function that utilizes the previously defined Pydantic classes to format raw agent responses into user-friendly output based on the result type.

def format_output(output: RunResult) -> str:
    if isinstance(output.final_output, Product):
        return (
            f'{output.final_output.title} - '
            f'{output.final_output.price} - '
            f'{output.final_output.availability}'
        )
    elif isinstance(output.final_output, HtmlSummary):
        return output.final_output.summary
    return str(output.final_output)

9. Implement the main execution loop

Finally, let’s make a very simple interactive command-line interface for communicating with the agent.

async def _run() -> None:
    while True:
        question = input('\033[1;34mQuestion ->\033[0m ')
        if question.lower() in ['exit']:
            print('\033[1;31mExiting the assistant.\033[0m')
            break
        url = input('\033[1;34mURL ->\033[0m ')

        output = await Runner.run(triage_agent, input=f'{question} {url}')
        print(
            '\033[1;33mAnswer -> ' + 
            f'\n\033[1;32m{format_output(output)}\033[0m'
        )


if __name__ == '__main__':
    asyncio.run(_run())
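
If you don't need an interactive loop, for example in a test or a batch script, the SDK also ships a synchronous entry point. The sketch below assumes Runner.run_sync blocks until the run finishes and returns the same kind of result object; the question and URL are placeholders:

# One-shot run without the interactive loop (placeholder inputs).
result = Runner.run_sync(
    triage_agent,
    input='Summarize this page https://example.com',
)
print(format_output(result))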

Full code and live demonstration

By now, you should have compiled the following script that’s ready for execution:

import asyncio
import os

import requests
from agents import Agent, RunResult, Runner, function_tool
from pydantic import BaseModel


class HtmlSummary(BaseModel):
    summary: str

class Product(BaseModel):
    title: str
    price: float
    availability: str


@function_tool
def scrape_website(url: str) -> str:
    '''Scrapes the website and returns the text.
    
    Args:
        url (str): The URL of the website to scrape.
        
    Returns:
        str: The text of the website.
    '''
    USERNAME, PASSWORD = os.environ.get('OXYLABS_CREDENTIALS').split(':')

    # Structure payload.
    payload = {
        'source': 'universal',
        'url': url, # Link to the target website.
        'render': 'html',
        'parse': True,
        'parsing_instructions': {
            'text': {
                '_fns': [
                    {
                        '_fn': 'xpath',
                        '_args': ['//*//text()'] # Extract all text.
                    }
                ]
            }
        }
    }

    status = None
    for attempt in range(3):
        try:
            # Send a POST request using the Realtime integration method.
            response = requests.post(
                'https://realtime.oxylabs.io/v1/queries',
                auth=(USERNAME, PASSWORD),
                json=payload
            )

            # Check if status code is 200.
            status = response.json()['results'][0]['status_code']
            if status == 200:
                return response.json()['results'][0]['content']['text']
            
        except Exception as e:
            print(e)

    # All attempts failed.
    print(f'All attempts failed (last status code: {status}). Stopping execution.')
    return ''


summarization_agent = Agent(
    name='Product summarization Agent',
    instructions='You are a summarization agent that summarizes the text.',
    tools=[scrape_website],
    output_type=HtmlSummary,
)

product_parsing_agent = Agent(
    name='Product Parsing Agent',
    instructions='You are a product parsing agent that parses the text.',
    tools=[scrape_website],
    output_type=Product,
)

triage_agent = Agent(
    name='Triage Agent',
    instructions=(
        'You are a triage agent that decides '
        'what to do with the user intention.'
    ),
    handoffs=[product_parsing_agent, summarization_agent],
)


def format_output(output: RunResult) -> str:
    if isinstance(output.final_output, Product):
        return (
            f'{output.final_output.title} - '
            f'{output.final_output.price} - '
            f'{output.final_output.availability}'
        )
    elif isinstance(output.final_output, HtmlSummary):
        return output.final_output.summary
    return str(output.final_output)


async def _run() -> None:
    while True:
        question = input('\033[1;34mQuestion ->\033[0m ')
        if question.lower() in ['exit']:
            print('\033[1;31mExiting the assistant.\033[0m')
            break
        url = input('\033[1;34mURL ->\033[0m ')

        output = await Runner.run(triage_agent, input=f'{question} {url}')
        print(
            '\033[1;33mAnswer -> ' + 
            f'\n\033[1;32m{format_output(output)}\033[0m'
        )


if __name__ == '__main__':
    asyncio.run(_run())

When executed, this code enables you to submit a question with a target website URL, automatically routing your query to either the summarization or product parsing agent.

Wrap up

With Agents SDK and Web Scraper API, you've got a powerful combination that unlocks endless possibilities for AI agent development. This integration provides cost-effective web access, enabling you to build specialized agents that can search, analyze, and transform real-time web data into insights. You can try the API at no cost by activating your free trial through the dashboard.

If you have any questions, don’t hesitate to contact us via live chat or email.

About the author


Vytenis Kaubrė

Technical Copywriter

Vytenis Kaubrė is a Technical Copywriter at Oxylabs. His love for creative writing and a growing interest in technology fuels his daily work, where he crafts technical content and web scrapers with Oxylabs’ solutions. Off duty, you might catch him working on personal projects, coding with Python, or jamming on his electric guitar.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
