To start, let's install all of the libraries needed for the integrations we set out to cover. We'll use OpenAI as the LLM provider, but you can use any model supported by the LangChain framework.
Inside a new project’s folder, initialize a Python virtual environment and activate it:
python -m venv .venv
source .venv/bin/activate
Then, run the following pip command to install the libraries in the activated .venv:
pip install -U langchain-oxylabs "langchain[openai]" langgraph langchain-mcp-adapters requests python-dotenv
Next, save your Oxylabs credentials and the LLM key as environment variables. You can easily do so by creating a .env file in your project’s directory and storing the authentication details as shown below:
OXYLABS_USERNAME=your-username
OXYLABS_PASSWORD=your-password
OPENAI_API_KEY=your-openai-key
You may also save environment variables system-wide using your terminal and thus skip the dotenv library altogether.
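Either way, it helps to fail fast when a credential is missing before any agent code runs. Here's a minimal check using only the standard library (the variable names match the .env file above):

```python
import os

REQUIRED_VARS = ("OXYLABS_USERNAME", "OXYLABS_PASSWORD", "OPENAI_API_KEY")

def check_credentials(env=os.environ):
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = check_credentials()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```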
To showcase the integration, let's first define a multi-agent supervisor workflow that performs a data enrichment process for a given company.
#!/usr/bin/env python3
import os
from dotenv import load_dotenv
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor
from langchain_oxylabs import OxylabsSearchAPIWrapper, OxylabsSearchRun
from langchain_openai import ChatOpenAI
load_dotenv()
# Initialize search tool
search = OxylabsSearchRun(
    wrapper=OxylabsSearchAPIWrapper(
        oxylabs_username=os.getenv("OXYLABS_USERNAME"),
        oxylabs_password=os.getenv("OXYLABS_PASSWORD")
    )
)
# Create agent
research_agent = create_react_agent(
    model="openai:gpt-4.1",
    tools=[search],
    name="research_agent",
    prompt=(
        "You are a research agent helping another LLM agent to find information about a given query. "
        "You have access to a search tool that can search the web for information. "
        "You will use the search tool to find information about the query. "
        "You will then use the information to help the other LLM agent. "
        "You will only use the search tool to find information about the query. "
        "You will not use the search tool to find information about anything else."
    )
)
# Initialize the model for supervisor
supervisor_model = ChatOpenAI(model="gpt-4.1", temperature=0)
# Create supervisor workflow
workflow = create_supervisor(
    agents=[research_agent],
    model=supervisor_model,
    prompt=(
        "You are a Company Intelligence Supervisor responsible for orchestrating comprehensive company data enrichment processes. "
        "Your primary objective is to create detailed company profiles by coordinating with your research agent.\n\n"
        "CORE MISSION:\n"
        "Transform basic company information into comprehensive business intelligence profiles through systematic data enrichment.\n\n"
        "WORKFLOW PROCESS:\n"
        "1. INITIAL ANALYSIS: Parse the company input (name, domain, or basic info)\n"
        "2. RESEARCH COORDINATION: Direct your research agent to gather comprehensive company data\n"
        "3. DATA SYNTHESIS: Organize findings into a structured company profile\n"
        "4. QUALITY ASSURANCE: Ensure completeness and accuracy of the enriched profile\n"
        "5. FINAL DELIVERY: Present a professional company intelligence report\n\n"
        "RESEARCH AREAS TO COVER:\n"
        "• Company Overview: Legal name, founding date, headquarters, business model\n"
        "• Financial Information: Revenue, funding rounds, valuation, financial health\n"
        "• Industry & Market: Sector classification, market position, competitors\n"
        "• Leadership: Key executives, founders, board members\n"
        "• Products & Services: Core offerings, product lines, key differentiators\n"
        "• Technology Stack: Known technologies, platforms, technical infrastructure\n"
        "• Recent News: Latest developments, partnerships, acquisitions, press releases\n"
        "• Contact Information: Official website, social media, contact details\n"
        "• Employee Information: Company size, key departments, hiring trends\n"
        "• Regulatory & Compliance: Industry regulations, certifications, legal status\n\n"
        "DELEGATION INSTRUCTIONS:\n"
        "Always delegate research tasks to your research_agent with specific, targeted queries. "
        "Break down complex research into multiple focused searches for comprehensive coverage.\n\n"
        "OUTPUT FORMAT:\n"
        "Deliver results as a structured company profile with clear sections, bullet points, and actionable intelligence. "
        "Include data sources and confidence levels where appropriate.\n\n"
        "QUALITY STANDARDS:\n"
        "• Accuracy: Verify information through multiple sources when possible\n"
        "• Completeness: Address all requested profile areas\n"
        "• Timeliness: Focus on recent and current information\n"
        "• Relevance: Prioritize actionable business intelligence\n"
        "• Professional Format: Present findings in a clear, executive-ready format"
    )
)
def main():
    app = workflow.compile()
    user_input = "Oxylabs"
    # Invoke the agent with the user input as the first message
    result = app.invoke({"messages": [{"role": "user", "content": user_input}]})
    # Print the agent's response
    print(result["messages"][-1].content)

if __name__ == "__main__":
    main()
Here we've created a two-agent team: one agent performs any research that's requested, while the other oversees the whole data collection process for a given company.
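Conceptually, the supervisor pattern boils down to routing: the supervisor breaks the task into focused queries, hands each one to a worker agent, and merges the results. A stripped-down, library-free sketch of that control flow (the functions here are hypothetical stand-ins, not LangGraph code):

```python
def research_agent(query: str) -> str:
    # Stand-in for the real agent that calls the Oxylabs search tool.
    return f"findings for '{query}'"

def supervisor(company: str) -> str:
    # 1. Break the enrichment task into focused research queries.
    queries = [f"{company} overview", f"{company} financials", f"{company} leadership"]
    # 2. Delegate each query to the research agent.
    findings = [research_agent(q) for q in queries]
    # 3. Synthesize the findings into a single profile.
    return "\n".join(findings)

print(supervisor("Oxylabs"))
```

In the real workflow, create_supervisor handles this routing through LLM-driven handoffs rather than a fixed list of queries.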
Let’s look at what it can gather:
The report looks quite extensive and well-researched.
Another way of integrating Oxylabs as a tool for an LLM is through an MCP server. To launch the server from our code, we need a package runner, and uv provides exactly that through its uvx command.
Install the uv package in your environment by following this installation guide. For example, on macOS, you can use Homebrew:
brew install uv
Now, let's add the MCP code to our agent orchestration script and adjust the conversation flow a bit, so that the user can ask for multiple data collection tasks consecutively.
#!/usr/bin/env python3
import os
import asyncio
from dotenv import load_dotenv
from langchain_mcp_adapters.sessions import create_session
from langchain_mcp_adapters.tools import load_mcp_tools
from langgraph_supervisor import create_supervisor
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
load_dotenv()
# Define the MCP server config
config = {
    "transport": "stdio",
    "command": "uvx",
    "args": ["oxylabs-mcp"],
    "env": {
        "OXYLABS_USERNAME": os.getenv("OXYLABS_USERNAME"),
        "OXYLABS_PASSWORD": os.getenv("OXYLABS_PASSWORD"),
    }
}
async def main():
    # Initialize an MCP server session and load the tools
    async with create_session(config) as session:
        await session.initialize()
        tools = await load_mcp_tools(session)

        # Create an AI agent that uses the MCP server tools
        research_agent = create_react_agent(
            model="openai:gpt-4.1",
            tools=tools,
            name="research_agent",
            prompt=(
                "You are a research agent helping another LLM agent to find information about a given query. "
                "You have access to a search tool that can search the web for information. "
                "You will use the search tool to find information about the query. "
                "You will then use the information to help the other LLM agent. "
                "You will only use the search tool to find information about the query. "
                "You will not use the search tool to find information about anything else."
            )
        )

        # Initialize the model for supervisor
        supervisor_model = ChatOpenAI(model="gpt-4.1", temperature=0)

        # Create supervisor workflow
        workflow = create_supervisor(
            agents=[research_agent],
            model=supervisor_model,
            prompt=(
                "You are a Company Intelligence Supervisor responsible for orchestrating comprehensive company data enrichment processes. "
                "Your primary objective is to create detailed company profiles by coordinating with your research agent.\n\n"
                "CORE MISSION:\n"
                "Transform basic company information into comprehensive business intelligence profiles through systematic data enrichment.\n\n"
                "WORKFLOW PROCESS:\n"
                "1. INITIAL ANALYSIS: Parse the company input (name, domain, or basic info)\n"
                "2. RESEARCH COORDINATION: Direct your research agent to gather comprehensive company data\n"
                "3. DATA SYNTHESIS: Organize findings into a structured company profile\n"
                "4. QUALITY ASSURANCE: Ensure completeness and accuracy of the enriched profile\n"
                "5. FINAL DELIVERY: Present a professional company intelligence report\n\n"
                "RESEARCH AREAS TO COVER:\n"
                "• Company Overview: Legal name, founding date, headquarters, business model\n"
                "• Financial Information: Revenue, funding rounds, valuation, financial health\n"
                "• Industry & Market: Sector classification, market position, competitors\n"
                "• Leadership: Key executives, founders, board members\n"
                "• Products & Services: Core offerings, product lines, key differentiators\n"
                "• Technology Stack: Known technologies, platforms, technical infrastructure\n"
                "• Recent News: Latest developments, partnerships, acquisitions, press releases\n"
                "• Contact Information: Official website, social media, contact details\n"
                "• Employee Information: Company size, key departments, hiring trends\n"
                "• Regulatory & Compliance: Industry regulations, certifications, legal status\n\n"
                "DELEGATION INSTRUCTIONS:\n"
                "Always delegate research tasks to your research_agent with specific, targeted queries. "
                "Break down complex research into multiple focused searches for comprehensive coverage.\n\n"
                "OUTPUT FORMAT:\n"
                "Deliver results as a structured company profile with clear sections, bullet points, and actionable intelligence. "
                "Include data sources and confidence levels where appropriate.\n\n"
                "QUALITY STANDARDS:\n"
                "• Accuracy: Verify information through multiple sources when possible\n"
                "• Completeness: Address all requested profile areas\n"
                "• Timeliness: Focus on recent and current information\n"
                "• Relevance: Prioritize actionable business intelligence\n"
                "• Professional Format: Present findings in a clear, executive-ready format"
            )
        )

        app = workflow.compile()

        # A loop to ask questions and get answers
        while True:
            question = input("\nQuestion -> ")
            if question == "exit":
                break
            result = await app.ainvoke({"messages": [{"role": "user", "content": question}]})
            print(f"\n{result['messages'][-1].content}\n")

if __name__ == "__main__":
    asyncio.run(main())
If we run the code, we can see the new interactive loop, as well as the MCP integration in action:
LangGraph is a powerful AI agent framework that lets you orchestrate multi-agent workflows capable of accomplishing complex tasks through a divide-and-conquer approach. Combined with Oxylabs' Web Scraper API, these workflows can process vast amounts of data in seconds and provide you with detailed analyses.
Interested in content like this? Check out more integrations, such as CrewAI integration and AutoGen integration, on our website.
Please be aware that this is a third-party tool not owned or controlled by Oxylabs. Each third-party provider is responsible for its own software and services. Consequently, Oxylabs will have no liability or responsibility to you regarding those services. Please carefully review the third party's policies and practices and/or conduct due diligence before accessing or using third-party services.
LangGraph is an open-source framework for building, deploying, and managing complex generative AI agent workflows. Instead of linear chains, you define workflows as directed graphs of nodes (agent functions) and edges (state transitions). It supports streaming outputs, checkpointed state persistence, and human-in-the-loop interventions for complex or long-running AI tasks.
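To make the graph idea concrete, here is a tiny library-free executor: nodes are functions that update a shared state dict, and each node's return value acts as an edge naming the next node to run. This is a conceptual illustration only, not the actual LangGraph API:

```python
def fetch(state):
    state["data"] = f"raw data about {state['query']}"
    return "summarize"          # edge: go to the summarize node next

def summarize(state):
    state["summary"] = state["data"].upper()
    return None                 # no outgoing edge: stop

NODES = {"fetch": fetch, "summarize": summarize}

def run_graph(entry, state):
    node = entry
    while node is not None:     # follow edges until a terminal node
        node = NODES[node](state)
    return state

result = run_graph("fetch", {"query": "Oxylabs"})
print(result["summary"])
```

Because edges are computed at runtime, a node can loop back or branch on the state's contents, which is exactly the flexibility LangGraph's graph model provides over a linear chain.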
LangChain organizes LLM calls in mostly linear or tree-structured chains, ideal for straightforward prompting or tool use. See LangChain integration for more useful information. LangGraph, by contrast, uses graph-based orchestration, allowing nodes to loop back, share state, and run agents concurrently. It also provides built-in persistence layers and finer control over execution flow.
Advanced chatbots: Route user queries between specialized sub-agents, maintain multi-turn context, and recover via checkpointed state.
Automated web scraping pipelines: Coordinate scraping tools (like Oxylabs Web Scraper API) with LLMs to extract, filter, and summarize data in one stateful workflow.
Multi-agent retrieval systems: Run multiple retrieval agents for tasks such as news monitoring or compliance checks, aggregating their outputs into a unified decision graph.
Start by evaluating your project's complexity and requirements: simple projects and one-off tasks call for simple, linear-focused libraries, while complex projects benefit from graph-based systems. Also consider ecosystem maturity, documentation quality, and available integrations.
To help you learn more and compare different AI frameworks, check our comparison blog posts: CrewAI vs AutoGen, LangGraph vs LangChain, n8n vs Flowise.