Features
The core logic, defined in src/agent/graph.py, orchestrates a search workflow with the following features:
- Intent Classification: Uses Gemini 2.0 Flash to classify queries into four categories:
  - general_search: News, facts, definitions, explanations
  - product_search: Shopping, prices, reviews, recommendations
  - web_scraping: Data extraction from specific websites
  - comparison: Comparing multiple items or services
- Multi-Modal Search:
  - Google Search: Via Bright Data’s MCP search engine for general queries
  - Web Scraping: Using Bright Data’s Web Unlocker for targeted data extraction
  - Smart Routing: Automatically chooses the best search strategy based on intent
- Result Processing:
  - Sanitizes and deduplicates results
  - Scores results on relevance and quality
  - Returns configurable top N results with confidence scores
  - Provides query summaries
- Error Handling: Graceful fallbacks and comprehensive error management
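The result-processing steps listed above (deduplicate, score, return the top N) can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: the SearchResult class, the normalize_url helper, and the plain average used for ranking are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Hypothetical result shape; field names are assumptions for this sketch.
    title: str
    url: str
    snippet: str
    relevance_score: float
    quality_score: float


def normalize_url(url: str) -> str:
    """Trim whitespace and a trailing slash so near-identical links compare equal."""
    return url.strip().rstrip("/")


def process_results(results: list[SearchResult], max_results: int = 5) -> list[SearchResult]:
    """Drop empty and duplicate URLs, rank by score, and keep the top N."""
    seen: set[str] = set()
    unique: list[SearchResult] = []
    for result in results:
        key = normalize_url(result.url)
        if not key or key in seen:
            continue  # skip results without a URL and repeats of an earlier URL
        seen.add(key)
        unique.append(result)
    # Assumed ranking: plain average of the two scores (the real weighting may differ).
    unique.sort(key=lambda r: (r.relevance_score + r.quality_score) / 2, reverse=True)
    return unique[:max_results]
```

The first occurrence of a URL wins here; a real implementation might instead keep the higher-scoring duplicate.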
Architecture
The agent follows a graph-based workflow:
START → Intent Classifier → [Google Search | Web Unlocker] → Final Processing → END
Routing Logic:
- URLs in query → Direct to Web Unlocker
- general_search → Google Search only
- product_search → Google Search, then Web Scraping
- web_scraping → Web Unlocker only
- comparison → Both search methods in parallel
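The routing rules above amount to a small lookup with a URL override. The sketch below is one way to express them in plain Python. The Plan class, the strategy labels, and the fallback to Google Search for an unrecognized intent are assumptions for illustration; the actual routing in src/agent/graph.py may be structured differently.

```python
import re
from dataclasses import dataclass

# Matches an http(s) URL anywhere in the query text.
URL_PATTERN = re.compile(r"https?://\S+")


@dataclass(frozen=True)
class Plan:
    # Hypothetical container: which strategies to run, and whether to run them concurrently.
    steps: tuple[str, ...]
    parallel: bool = False


# Strategy names are illustrative labels, not identifiers taken from the project.
ROUTES = {
    "general_search": Plan(("google_search",)),
    "product_search": Plan(("google_search", "web_unlocker")),  # sequential
    "web_scraping": Plan(("web_unlocker",)),
    "comparison": Plan(("google_search", "web_unlocker"), parallel=True),
}


def route(query: str, intent: str) -> Plan:
    """Pick a search plan: a URL in the query overrides the classified intent."""
    if URL_PATTERN.search(query):
        return Plan(("web_unlocker",))
    # Assumed fallback: unknown intents are treated like general_search.
    return ROUTES.get(intent, ROUTES["general_search"])
```

For example, route("extract contact info from https://example.com", "general_search") yields a Web Unlocker-only plan even though the intent is general_search.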
Tech Stack
- LangGraph
- Gemini 2.0 Flash
- Bright Data MCP
- Pydantic
- LangGraph Studio
Getting Started
- Install dependencies along with the LangGraph CLI:
cd unified-search-agent
pip install -e . "langgraph-cli[inmem]"
- Set up environment variables. Create a .env file with your API keys:
cp .env.example .env
Add your API keys to the .env file:
# Required
GOOGLE_API_KEY=your_gemini_api_key_here
BRIGHT_DATA_API_TOKEN=your_bright_data_token_here
# Optional zones (defaults provided)
WEB_UNLOCKER_ZONE=unblocker
BROWSER_ZONE=scraping_browser
# Optional - for LangSmith tracing
LANGSMITH_API_KEY=lsv2...
- Start the LangGraph Server:
langgraph dev
- Open LangGraph Studio at the URL provided (typically https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024)
For more information on getting started with LangGraph Server, see the LangGraph Server documentation.
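To show how the environment variables from the setup steps fit together, here is a hedged sketch that reads them with the standard library only. The Settings class and load_settings function are hypothetical, and the zone defaults simply reuse the values shown in the example .env; the project itself may load configuration differently (python-dotenv is listed among its dependencies).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("GOOGLE_API_KEY", "BRIGHT_DATA_API_TOKEN")


@dataclass(frozen=True)
class Settings:
    # Hypothetical settings container for this sketch.
    google_api_key: str
    bright_data_api_token: str
    web_unlocker_zone: str
    browser_zone: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ) and fail fast on missing keys."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        bright_data_api_token=env["BRIGHT_DATA_API_TOKEN"],
        # Assumed defaults, copied from the example .env values.
        web_unlocker_zone=env.get("WEB_UNLOCKER_ZONE", "unblocker"),
        browser_zone=env.get("BROWSER_ZONE", "scraping_browser"),
    )
```

Passing a plain dict instead of os.environ makes the function easy to exercise in tests without touching the real environment.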
Usage Examples
Basic Search
{
  "query": "Who is Or Lenchner",
  "max_results": 3
}
Product Search
{
  "query": "best laptops under $1000",
  "max_results": 5
}
Web Scraping
{
  "query": "extract contact info from https://example.com",
  "max_results": 10
}
Comparison Query
{
  "query": "iPhone 15 vs Samsung Galaxy S24 comparison",
  "max_results": 5
}
Configuration
The agent supports several configurable parameters:
- max_results: Number of final results to return (default: 5)
- Query-specific routing: URLs in queries automatically trigger web scraping
- Search strategies: Automatically determined by intent classification
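As a rough illustration of the input shape and the max_results default described above, the sketch below validates a request before it is sent to the agent. SearchRequest is a hypothetical standard-library stand-in; the project uses Pydantic and may define its inputs differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRequest:
    # Hypothetical input model; mirrors the JSON payloads in the usage examples.
    query: str
    max_results: int = 5  # default stated in the Configuration section

    def __post_init__(self) -> None:
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if self.max_results < 1:
            raise ValueError("max_results must be at least 1")

    def to_payload(self) -> dict:
        """Return the dict form used as graph input."""
        return {"query": self.query, "max_results": self.max_results}
```

Calling SearchRequest("Who is Or Lenchner").to_payload() produces a payload with max_results set to 5.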
How to Customize
- Modify Intent Classification: Update the categories and examples in intent_classifier_node() in src/agent/nodes.py
- Adjust Search Strategies: Modify the routing logic in src/agent/graph.py to change how different intents are handled
- Customize Result Scoring: Update the scoring criteria in final_processing_node() to change how results are ranked
- Add New Search Sources: Extend the graph with additional search nodes for other data sources
- Configure Parameters: Modify the Configuration class in graph.py to expose additional runtime parameters
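For the "Add New Search Sources" option, a LangGraph node is a function that receives the current state and returns a partial state update. The sketch below shows that shape in plain Python. The state keys (query, raw_results), the node name, and the fetch_from_custom_source placeholder are all assumptions; the project's real state schema lives in its own source files.

```python
from typing import TypedDict


class SearchState(TypedDict, total=False):
    # Hypothetical state keys for this sketch.
    query: str
    raw_results: list[dict]


def fetch_from_custom_source(query: str) -> list[dict]:
    """Placeholder for a real data source; returns one canned result."""
    return [
        {
            "title": f"Result for {query}",
            "url": "https://example.com",
            "snippet": "",
            "source": "custom_source",
        }
    ]


def custom_search_node(state: SearchState) -> dict:
    """Append results from the new source to whatever earlier nodes collected."""
    results = fetch_from_custom_source(state["query"])
    return {"raw_results": state.get("raw_results", []) + results}
```

Wiring such a node into the graph would typically go through the graph builder's add_node and add_edge methods in src/agent/graph.py, alongside a matching routing rule.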
Development
While iterating on your graph in LangGraph Studio, you can:
- Edit past state and rerun from previous states to debug specific nodes
- Hot reload – local changes are automatically applied
- Create new threads using the + button to clear previous history
- Visual debugging – see the exact flow and state at each step
The graph structure allows for easy debugging of:
- Intent classification accuracy
- Search result quality
- Routing decisions
- Final result scoring
Result Format
The agent returns structured results with comprehensive scoring:
{
  "final_results": [
    {
      "title": "Result Title",
      "url": "https://example.com",
      "snippet": "Relevant description...",
      "source": "google_search",
      "relevance_score": 0.95,
      "quality_score": 0.88,
      "final_score": 0.92,
      "metadata": {
        "search_engine": "google",
        "via": "bright_data_mcp",
        "query": "original query"
      }
    }
  ],
  "query_summary": "Found information about...",
  "total_processed": 8,
  "intent": "general_search",
  "intent_confidence": 0.95
}
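A caller consuming this response might pick out the best-scoring entry as sketched below. The helper names top_result and format_result are hypothetical; only the field names come from the result format above.

```python
from typing import Optional


def top_result(response: dict, min_score: float = 0.0) -> Optional[dict]:
    """Return the highest-scoring entry at or above min_score, or None if there is none."""
    candidates = [
        r for r in response.get("final_results", [])
        if r.get("final_score", 0.0) >= min_score
    ]
    return max(candidates, key=lambda r: r.get("final_score", 0.0), default=None)


def format_result(result: dict) -> str:
    """Render one result as a single human-readable line."""
    return f"{result['title']} ({result['url']}) score={result['final_score']:.2f}"


# Trimmed copy of the sample response shown above.
response = {
    "final_results": [
        {"title": "Result Title", "url": "https://example.com", "final_score": 0.92}
    ],
    "intent": "general_search",
}

best = top_result(response)
if best is not None:
    print(format_result(best))
```

Raising min_score gives a simple way to discard low-confidence answers before showing them to a user.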
Advanced Features
- Parallel Processing: Comparison queries execute both search methods simultaneously
- Intelligent Fallbacks: Graceful error handling with default responses
- Duplicate Detection: Automatic deduplication of results across sources
- URL Validation: Filters out invalid or empty URLs
- Content Sanitization: Cleans and validates all text content
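The URL validation and content sanitization steps listed above could look something like the following. This is a sketch under stated assumptions: the function names, the http/https-only rule, the tag-stripping regex, and the 300-character cap are illustrative choices, not the project's documented behavior.

```python
import html
import re
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only non-empty http(s) URLs that include a host."""
    if not url or not url.strip():
        return False
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def sanitize_text(text: str, max_length: int = 300) -> str:
    """Strip HTML tags, decode entities, collapse whitespace, and cap the length."""
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag removal, adequate for short snippets
    text = html.unescape(text)
    text = re.sub(r"\s+", " ", text).strip()
    return text[:max_length]
```

Results whose URL fails is_valid_url would be dropped before scoring, while titles and snippets pass through sanitize_text.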
For more advanced features and examples, refer to the LangGraph documentation.
LangGraph Studio integrates with LangSmith for in-depth tracing and team collaboration, allowing you to analyze and optimize your search agent’s performance.
Dependencies
- langgraph>=0.2.6: Core orchestration framework
- langchain-google-genai: Gemini integration for LLM operations
- pydantic>=2.0.0: Data validation and parsing
- mcp-use: MCP client for Bright Data integration
- langchain-core: Core LangChain utilities
- python-dotenv>=1.0.1: Environment variable management
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test with LangGraph Studio
- Submit a pull request
License
This project is licensed under the MIT License – see the LICENSE file for details.