Unified Search Agent

Stack

LangGraph

Gemini 2.0 Flash

Bright Data MCP

Pydantic

LangGraph Studio

Features

The core logic, defined in src/agent/graph.py, orchestrates a search workflow with the following stages:

Intent Classification: Uses Gemini 2.0 Flash to classify queries into four categories:
- general_search: News, facts, definitions, explanations
- product_search: Shopping, prices, reviews, recommendations
- web_scraping: Data extraction from specific websites
- comparison: Comparing multiple items or services
Multi-Modal Search:
- Google Search: Via Bright Data’s MCP search engine for general queries
- Web Scraping: Using Bright Data’s Web Unlocker for targeted data extraction
- Smart Routing: Automatically chooses the best search strategy based on intent
Result Processing:
- Sanitizes and deduplicates results
- Scores results on relevance and quality
- Returns configurable top N results with confidence scores
- Provides query summaries
Error Handling: Falls back to default responses when a step fails, so a single error does not abort the whole query
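
The intent categories and the fallback behavior listed above can be sketched with the standard library alone. This is a minimal illustration, not the code in src/agent/nodes.py: the Intent enum, the IntentResult dataclass, parse_intent(), and the assumed JSON reply shape ({"intent": ..., "confidence": ...}) are all hypothetical, and the real project uses Pydantic for validation.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Intent(str, Enum):
    """The four categories named in the Features list."""
    GENERAL_SEARCH = "general_search"
    PRODUCT_SEARCH = "product_search"
    WEB_SCRAPING = "web_scraping"
    COMPARISON = "comparison"


@dataclass
class IntentResult:
    intent: Intent
    confidence: float


def parse_intent(raw_response: str) -> IntentResult:
    """Parse a model reply, assumed to be JSON (hypothetical shape).

    Any malformed reply or unknown category falls back to
    general_search with zero confidence instead of raising.
    """
    fallback = IntentResult(Intent.GENERAL_SEARCH, 0.0)
    try:
        data = json.loads(raw_response)
        intent = Intent(data["intent"])
        confidence = float(data.get("confidence", 0.0))
    except (ValueError, KeyError, TypeError, AttributeError):
        return fallback
    # Clamp the confidence into the [0, 1] range.
    return IntentResult(intent, min(max(confidence, 0.0), 1.0))
```

Returning a default instead of raising keeps the rest of the graph running when the classifier output is unusable, which is one way to read "graceful fallbacks".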

Architecture

The agent follows a sophisticated graph-based workflow:

START → Intent Classifier → [Google Search | Web Unlocker] → Final Processing → END

Routing Logic:

URLs in query → Direct to Web Unlocker
general_search → Google Search only
product_search → Google Search, then Web Scraping
web_scraping → Web Unlocker only
comparison → Both search methods in parallel
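
The routing rules above can be written as a small pure function. This is a sketch under stated assumptions: choose_route(), the node labels "google_search" and "web_unlocker", and the fallback for unknown intents are hypothetical and may not match the names used in src/agent/graph.py. The plan is a list of stages, and nodes within one stage are meant to run in parallel.

```python
import re

# Any http(s) URL in the query short-circuits intent-based routing.
URL_PATTERN = re.compile(r"https?://\S+")

# Intent -> ordered stages; each stage lists nodes that run together.
ROUTES = {
    "general_search": [["google_search"]],
    "product_search": [["google_search"], ["web_unlocker"]],
    "web_scraping": [["web_unlocker"]],
    "comparison": [["google_search", "web_unlocker"]],
}


def choose_route(query: str, intent: str) -> list[list[str]]:
    """Return the execution plan for a query and its classified intent."""
    if URL_PATTERN.search(query):
        return [["web_unlocker"]]
    # Assumption: unknown intents fall back to a plain Google search.
    return ROUTES.get(intent, [["google_search"]])
```

For example, choose_route("best laptops under $1000", "product_search") yields two sequential stages, while a comparison query yields a single stage containing both nodes.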

Getting Started

Install dependencies along with the LangGraph CLI:

cd unified-search-agent
pip install -e . "langgraph-cli[inmem]"

Set up environment variables. Create a .env file from the provided example:

cp .env.example .env

Add your API keys to the .env file:

# Required
GOOGLE_API_KEY=your_gemini_api_key_here
BRIGHT_DATA_API_TOKEN=your_bright_data_token_here

# Optional zones (defaults provided)
WEB_UNLOCKER_ZONE=unblocker
BROWSER_ZONE=scraping_browser

# Optional - for LangSmith tracing
LANGSMITH_API_KEY=lsv2...

Start the LangGraph Server:

langgraph dev

Open LangGraph Studio at the URL provided (typically https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024)

For more information on getting started with LangGraph Server, see the LangGraph Server documentation.
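
As an illustration of how the environment variables from the .env step might be consumed, the sketch below reads the two required keys and applies defaults for the optional zones. The Settings dataclass and load_settings() are hypothetical names, and the defaults are assumed to match the values shown in the example file. In the actual project the .env file would first be loaded into the environment, for example by python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("GOOGLE_API_KEY", "BRIGHT_DATA_API_TOKEN")


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    bright_data_api_token: str
    web_unlocker_zone: str
    browser_zone: str
    langsmith_api_key: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        bright_data_api_token=env["BRIGHT_DATA_API_TOKEN"],
        # Assumed defaults, taken from the example .env values.
        web_unlocker_zone=env.get("WEB_UNLOCKER_ZONE", "unblocker"),
        browser_zone=env.get("BROWSER_ZONE", "scraping_browser"),
        # Optional: only needed for LangSmith tracing.
        langsmith_api_key=env.get("LANGSMITH_API_KEY"),
    )
```

Failing early on a missing key gives a clearer error than a rejected API call later in the graph.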

Usage Examples

Basic Search

{
 "query": "Who is Or Lenchner",
 "max_results": 3
}

Product Search

{
 "query": "best laptops under $1000",
 "max_results": 5
}

Web Scraping

{
 "query": "extract contact info from https://example.com",
 "max_results": 10
}

Comparison Query

{
 "query": "iPhone 15 vs Samsung Galaxy S24 comparison",
 "max_results": 5
}

Configuration

The agent supports several configurable parameters:

max_results: Number of final results to return (default: 5)
Query-specific routing: URLs in queries automatically trigger web scraping
Search strategies: Automatically determined by intent classification
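
A request object with the documented default for max_results might look like the following. This is a dependency-free sketch using a dataclass. The project itself lists Pydantic for validation, and SearchRequest is a hypothetical name, not the Configuration class from the repository.

```python
from dataclasses import dataclass


@dataclass
class SearchRequest:
    """Input payload: a query plus the number of results to return."""
    query: str
    max_results: int = 5  # documented default

    def __post_init__(self) -> None:
        if not isinstance(self.query, str) or not self.query.strip():
            raise ValueError("query must be a non-empty string")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.max_results, bool) or not isinstance(self.max_results, int):
            raise ValueError("max_results must be an integer")
        if self.max_results < 1:
            raise ValueError("max_results must be at least 1")


# The JSON payloads from the usage examples map directly onto the fields.
request = SearchRequest(**{"query": "Who is Or Lenchner", "max_results": 3})
```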

How to Customize

Modify Intent Classification: Update the categories and examples in intent_classifier_node() in src/agent/nodes.py
Adjust Search Strategies: Modify the routing logic in src/agent/graph.py to change how different intents are handled
Customize Result Scoring: Update the scoring criteria in final_processing_node() to change how results are ranked
Add New Search Sources: Extend the graph with additional search nodes for other data sources
Configure Parameters: Modify the Configuration class in src/agent/graph.py to expose additional runtime parameters

Development

While iterating on your graph in LangGraph Studio, you can:

Edit past state and rerun from previous states to debug specific nodes
Hot reload – local changes are automatically applied
Create new threads using the + button to clear previous history
Visual debugging – see the exact flow and state at each step

The graph structure allows for easy debugging of:

Intent classification accuracy
Search result quality
Routing decisions
Final result scoring

Result Format

The agent returns structured results with comprehensive scoring:

{
  "final_results": [
    {
      "title": "Result Title",
      "url": "https://example.com",
      "snippet": "Relevant description...",
      "source": "google_search",
      "relevance_score": 0.95,
      "quality_score": 0.88,
      "final_score": 0.92,
      "metadata": {
        "search_engine": "google",
        "via": "bright_data_mcp",
        "query": "original query"
      }
    }
  ],
  "query_summary": "Found information about...",
  "total_processed": 8,
  "intent": "general_search",
  "intent_confidence": 0.95
}
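
A caller could consume a response of this shape as follows. The helper top_results() and its min_score parameter are hypothetical and only rely on the field names shown in the format above.

```python
def top_results(response: dict, min_score: float = 0.0) -> list[tuple[str, str, float]]:
    """Return (title, url, final_score) tuples, best score first.

    Results whose final_score is below min_score are dropped.
    """
    results = response.get("final_results", [])
    kept = [r for r in results if r.get("final_score", 0.0) >= min_score]
    kept.sort(key=lambda r: r.get("final_score", 0.0), reverse=True)
    return [
        (r.get("title", ""), r.get("url", ""), r.get("final_score", 0.0))
        for r in kept
    ]


# Sample data in the documented shape (values are illustrative only).
sample = {
    "final_results": [
        {"title": "B", "url": "https://example.com/b", "final_score": 0.61},
        {"title": "A", "url": "https://example.com/a", "final_score": 0.92},
    ],
    "intent": "general_search",
}

for title, url, score in top_results(sample, min_score=0.7):
    print(f"{score:.2f}  {title}  {url}")
```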

Advanced Features

Parallel Processing: Comparison queries execute both search methods simultaneously
Intelligent Fallbacks: Graceful error handling with default responses
Duplicate Detection: Automatic deduplication of results across sources
URL Validation: Filters out invalid or empty URLs
Content Sanitization: Cleans and validates all text content
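
The deduplication, URL validation, and sanitization steps can be sketched together. The approach below is an assumption, not the repository's implementation: the function names are hypothetical, and treating two URLs as duplicates when they share host, path, and query string (ignoring scheme, fragment, and a trailing slash) is one reasonable choice among several.

```python
import re
from urllib.parse import urlsplit


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that have a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def url_key(url: str) -> tuple[str, str, str]:
    """Key used to detect duplicates across sources."""
    parts = urlsplit(url.strip())
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)


def dedupe_results(results: list[dict]) -> list[dict]:
    """Drop invalid or repeated URLs and sanitize the text fields."""
    seen: set[tuple[str, str, str]] = set()
    cleaned = []
    for result in results:
        url = result.get("url") or ""
        if not is_valid_url(url):
            continue
        key = url_key(url)
        if key in seen:
            continue  # keep the first occurrence only
        seen.add(key)
        cleaned.append({
            **result,
            "title": clean_text(result.get("title") or ""),
            "snippet": clean_text(result.get("snippet") or ""),
        })
    return cleaned
```

Keeping the first occurrence means the order in which sources are merged decides which copy survives. A real pipeline might prefer the copy with the higher score instead.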

For more advanced features and examples, refer to the LangGraph documentation.

LangGraph Studio integrates with LangSmith for in-depth tracing and team collaboration, allowing you to analyze and optimize your search agent’s performance.

Dependencies

langgraph>=0.2.6: Core orchestration framework
langchain-google-genai: Gemini integration for LLM operations
pydantic>=2.0.0: Data validation and parsing
mcp-use: MCP client for Bright Data integration
langchain-core: Core LangChain utilities
python-dotenv>=1.0.1: Environment variable management

Contributing

Fork the repository
Create a feature branch
Make your changes
Test with LangGraph Studio
Submit a pull request

License

This project is licensed under the MIT License – see the LICENSE file for details.