Building an AI Research Agent: Autonomous Research with the ReAct Pattern

Author: Pranav Reddy (@saipranav14)
What if an AI could research topics for you, synthesize information from multiple sources, and write comprehensive reports? In this post, I'll show you how to build an autonomous research agent using the ReAct pattern.
🎯 What It Does
Input:
"Research the latest trends in RAG systems for 2024"
Agent Process:
- 🔍 Searches web for "RAG systems 2024 trends"
- 📄 Scrapes top 5 relevant articles
- 🧮 Extracts statistics and metrics
- 🔗 Follows references to related papers
- 📝 Synthesizes findings into report
- ✅ Cites all sources
Output:
# Research Report: RAG Systems 2024 Trends
## Executive Summary
Retrieval-Augmented Generation systems have evolved significantly...
## Key Findings
1. Hybrid search adoption increased 300%
2. New evaluation frameworks (RAGAS) gaining traction
3. Cost optimization through caching strategies
## Sources
[1] https://arxiv.org/...
[2] https://blog.langchain.com/...
🏗️ The ReAct Pattern
ReAct = Reasoning + Acting
The agent follows this loop:
- Thought: Reason about what to do next
- Action: Use a tool (search, scrape, calculate)
- Observation: See the result
- Repeat until goal achieved
Thought: I need to search for recent RAG trends
Action: Search("RAG systems 2024")
Observation: Found 10 articles about RAG improvements
Thought: The top article looks promising
Action: WebScraper("https://example.com/rag-article")
Observation: Article discusses hybrid search and evaluation
Thought: I have enough information to write the report
Final Answer: [Report content]
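Before diving into LangChain, the loop above can be sketched in plain Python. Here `llm` and `tools` are hypothetical stand-ins (a callable that returns a parsed thought/action/input triple, and a dict of tool callables), not real LangChain APIs; frameworks add prompt formatting and output parsing on top of this skeleton.

```python
# Minimal sketch of the ReAct loop with hypothetical `llm` and `tools`.

def react_loop(question, llm, tools, max_iterations=5):
    """Alternate Thought -> Action -> Observation until a final answer."""
    scratchpad = []  # running history of (thought, action, observation)
    for _ in range(max_iterations):
        thought, action, action_input = llm(question, scratchpad)
        if action == "Final Answer":
            return action_input, scratchpad
        observation = tools[action](action_input)  # run the chosen tool
        scratchpad.append((thought, action, observation))
    return "Stopped: iteration limit reached", scratchpad
```

The scratchpad is what LangChain calls `agent_scratchpad`: the growing trace of prior steps that is fed back to the LLM on each turn.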
💻 Implementation
1. Agent Setup with LangChain
Create the research agent:
class ResearchAgent:
    """Autonomous research agent using ReAct pattern."""

    def __init__(self):
        # Initialize LLM
        self.llm = ChatOpenAI(
            model=settings.LLM_MODEL,  # GPT-4
            temperature=settings.TEMPERATURE,
        )
        # Set up tools
        self.tools = self._setup_tools()
        # Create agent
        self.agent = self._create_agent()
    def _setup_tools(self) -> List[Tool]:
        """Set up agent tools."""
        search = DuckDuckGoSearchAPIWrapper()
        tools = [
            Tool(
                name="Search",
                func=search.run,
                description="""Search the web for current information.
                Use for: news, trends, facts, statistics.
                Input should be a search query string.""",
            ),
            Tool(
                name="WebScraper",
                func=WebScraperTool().scrape,
                description="""Extract content from a URL.
                Input: URL string
                Returns: cleaned text content""",
            ),
            Tool(
                name="Calculator",
                func=CalculatorTool().calculate,
                description="""Perform mathematical calculations.
                Input: math expression like '25 * 4 + 10'
                Returns: numeric result""",
            ),
        ]
        return tools
2. Tool Implementations
Web Scraper Tool:
import requests
from bs4 import BeautifulSoup

class WebScraperTool:
    """Scrape content from web pages."""

    def scrape(self, url: str) -> str:
        """Scrape text content from a URL."""
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            # Remove non-content elements
            for element in soup(["script", "style", "nav", "footer"]):
                element.decompose()
            # Get text
            text = soup.get_text()
            # Clean up whitespace
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip()
                      for line in lines
                      for phrase in line.split("  "))
            text = ' '.join(chunk for chunk in chunks if chunk)
            # Limit length to keep the context window manageable
            return text[:5000]
        except Exception as e:
            return f"Error scraping {url}: {str(e)}"
Calculator Tool:
class CalculatorTool:
    """Perform mathematical calculations."""

    def calculate(self, expression: str) -> str:
        """Evaluate a math expression safely."""
        try:
            # numexpr evaluates arithmetic without the dangers of eval()
            import numexpr as ne
            result = ne.evaluate(expression)
            return str(result)
        except Exception as e:
            return f"Error calculating: {str(e)}"
3. Creating the ReAct Agent
    def _create_agent(self) -> AgentExecutor:
        """Create the ReAct agent with a prompt template."""
        template = """Answer the following questions as best you can.
You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""
        prompt = PromptTemplate.from_template(template)
        # Create ReAct agent
        agent = create_react_agent(
            llm=self.llm,
            tools=self.tools,
            prompt=prompt,
        )
        # Wrap in executor
        return AgentExecutor(
            agent=agent,
            tools=self.tools,
            verbose=True,  # Show reasoning process
            max_iterations=settings.MAX_ITERATIONS,
            handle_parsing_errors=True,
            return_intermediate_steps=True,  # Expose steps for report generation
        )
4. Research Execution
    def research(
        self,
        query: str,
        max_iterations: Optional[int] = None,
    ) -> Dict:
        """Conduct research on a query."""
        if max_iterations:
            self.agent.max_iterations = max_iterations
        # Run the agent
        result = self.agent.invoke({"input": query})
        return {
            "query": query,
            "answer": result["output"],
            "intermediate_steps": result.get("intermediate_steps", []),
            "iterations": len(result.get("intermediate_steps", [])),
        }
5. Report Generation
Transform agent output into formatted reports:
    def generate_report(self, research_result: Dict) -> str:
        """Generate a formatted report from research results."""
        query = research_result["query"]
        answer = research_result["answer"]
        steps = research_result.get("intermediate_steps", [])
        # Build report
        report = f"""# Research Report: {query}

## Executive Summary

{answer}

## Research Process

"""
        # Add intermediate steps
        for i, (action, observation) in enumerate(steps, 1):
            tool_used = action.tool if hasattr(action, 'tool') else "Unknown"
            report += f"### Step {i}: {tool_used}\n"
            report += f"**Action**: {action}\n\n"
            # Preview observation
            obs_preview = str(observation)[:200]
            report += f"**Result**: {obs_preview}...\n\n"
        # Extract and add sources
        report += "## Sources\n\n"
        sources = self._extract_sources(steps)
        for i, source in enumerate(sources, 1):
            report += f"{i}. {source}\n"
        report += "\n---\n*Generated by AI Research Agent*\n"
        return report
    def _extract_sources(self, steps: List) -> List[str]:
        """Extract URLs from intermediate steps."""
        import re
        sources = []
        for action, observation in steps:
            # Find URLs in observations
            urls = re.findall(
                r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+',
                str(observation)
            )
            sources.extend(urls[:3])  # Max 3 per step
        # Deduplicate (preserving order) and limit
        return list(dict.fromkeys(sources))[:10]
🎯 Example Agent Execution
agent = ResearchAgent()
result = agent.research(
"What are the key innovations in LangChain released in 2024?"
)
# Agent's reasoning (verbose output):
"""
Thought: I need to search for LangChain 2024 innovations
Action: Search
Action Input: "LangChain 2024 new features innovations"
Observation: Found articles discussing LangChain Expression Language (LCEL),
LangGraph for complex workflows, and LangSmith for debugging.
Thought: I should get more details about LangGraph
Action: WebScraper
Action Input: "https://blog.langchain.dev/langgraph/"
Observation: LangGraph is a library for building stateful,
multi-actor applications with LLMs...
Thought: I have enough information to answer
Final Answer: Key innovations in LangChain for 2024 include:
1. LangChain Expression Language (LCEL) for declarative chains
2. LangGraph for stateful workflows
3. LangSmith for production monitoring
4. Enhanced streaming capabilities
5. Better integration with vector databases
"""
# Generate report
report = agent.generate_report(result)
print(report)
🚀 FastAPI Integration
class ResearchRequest(BaseModel):
    query: str
    max_iterations: int = 5

@app.post("/api/research")
async def research(request: ResearchRequest):
    """Conduct research on a query."""
    result = agent.research(
        request.query,
        request.max_iterations,
    )
    report = agent.generate_report(result)
    return {
        **result,
        "report": report,
    }
📊 Advanced Features
Self-Critique
Agent evaluates its own output:
async def self_critique(self, answer: str, query: str) -> Dict:
    """Agent critiques its own answer."""
    critique_prompt = f"""
    Evaluate this research answer for quality:

    Question: {query}
    Answer: {answer}

    Rate on:
    1. Completeness (1-10)
    2. Accuracy (1-10)
    3. Source quality (1-10)

    Suggest improvements if any score is below 8.
    """
    response = await self.llm.ainvoke(critique_prompt)
    return {
        "critique": response.content,
        "should_refine": "improve" in response.content.lower(),
    }
Multi-Step Planning
Break complex queries into sub-tasks:
async def plan_research(self, query: str) -> List[str]:
    """Break down a research query into steps."""
    planning_prompt = f"""
    Break down this research query into 3-5 sub-tasks:

    Query: {query}

    Return as a numbered list of specific search queries.
    """
    response = await self.llm.ainvoke(planning_prompt)
    # Parse numbered lines into a list of sub-queries
    steps = [
        line.strip().lstrip('0123456789.-) ')
        for line in response.content.split('\n')
        if line.strip() and any(c.isdigit() for c in line[:3])
    ]
    return steps
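The parsing step above can be exercised without an LLM by running it on sample planner output; the sample text below is illustrative:

```python
# Apply the numbered-list parsing from plan_research to sample output.
sample = """1. RAG architecture basics
2) Hybrid search benchmarks
- not numbered, ignored
3. RAGAS evaluation framework"""

steps = [
    line.strip().lstrip('0123456789.-) ')
    for line in sample.split('\n')
    if line.strip() and any(c.isdigit() for c in line[:3])
]
# Only the three numbered lines survive, with their prefixes stripped.
```

The `line[:3]` digit check filters out bullets and prose, so only numbered items become sub-queries.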
🚧 Challenges & Solutions
Challenge 1: Tool Selection
Problem: Agent uses the wrong tools.
Solution:
- Clear tool descriptions
- Examples in prompts
- Limit tool count
Challenge 2: Infinite Loops
Problem: Agent gets stuck repeating actions.
Solution:
- Set max_iterations
- Track action history
- Implement early stopping
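Tracking action history can be as simple as counting repeated (tool, input) pairs. A minimal sketch; `RepeatGuard` is an illustrative name, not a LangChain class:

```python
# Loop detection: flag when the agent re-issues the same tool call.

class RepeatGuard:
    """Track (tool, input) pairs and signal when one repeats too often."""

    def __init__(self, max_repeats=2):
        self.history = {}
        self.max_repeats = max_repeats

    def should_stop(self, tool, tool_input):
        key = (tool, tool_input)
        self.history[key] = self.history.get(key, 0) + 1
        return self.history[key] > self.max_repeats
```

Checked before each tool call, this lets the loop bail out with a partial answer instead of burning the full iteration budget on a stuck query.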
Challenge 3: Cost Management
Problem: Many iterations = high API costs.
Solution:
- Cache tool results
- Use GPT-3.5 for tool selection, GPT-4 for final answer
- Set iteration budgets
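Caching tool results means the agent never pays twice for the same search. A minimal in-memory sketch (a production setup might use Redis with a TTL instead):

```python
import functools

def cached_tool(func):
    """Memoize a tool function on its string input."""
    cache = {}

    @functools.wraps(func)
    def wrapper(query: str) -> str:
        if query not in cache:
            cache[query] = func(query)  # only call through on a miss
        return cache[query]

    wrapper.cache = cache  # exposed for inspection/clearing
    return wrapper
```

Applied as a decorator on `search.run` or a scraper, repeated agent iterations that re-issue the same query hit the cache instead of the API.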
Challenge 4: Hallucinated Sources
Problem: Agent invents URLs.
Solution:
- Validate URLs before using
- Only cite sources from observations
- Use structured output parsing
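One way to combine the first two points: only keep citation URLs that parse cleanly and actually appear in the tool observations. A sketch, where `observed_text` is assumed to be the concatenated observation strings:

```python
from urllib.parse import urlparse

def validate_sources(candidate_urls, observed_text):
    """Keep only well-formed URLs that occur verbatim in observations."""
    valid = []
    for url in candidate_urls:
        parsed = urlparse(url)
        well_formed = parsed.scheme in ("http", "https") and parsed.netloc
        if well_formed and url in observed_text:
            valid.append(url)  # grounded in an actual observation
    return valid
```

Anything the LLM "remembers" but never saw in an observation is dropped, which removes the most common source of hallucinated citations.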
💡 Best Practices
- Clear Tool Descriptions: Be specific about inputs/outputs
- Limit Iterations: Prevent runaway costs
- Handle Errors: Each tool should fail gracefully
- Log Everything: Track agent reasoning for debugging
- Validate Outputs: Check URLs, facts, calculations
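"Fail gracefully" can be enforced once for every tool with a small wrapper: exceptions become an error string the agent observes and can react to, instead of crashing the loop. A sketch:

```python
def safe_tool(func):
    """Wrap a tool so exceptions become observable error strings."""

    def wrapper(tool_input: str) -> str:
        try:
            return func(tool_input)
        except Exception as e:
            # Surface the failure as an observation the agent can route around
            return f"Error in {func.__name__}: {e}"

    return wrapper
```

The scraper above already follows this pattern internally; the decorator form applies it uniformly to any tool you register.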
📦 Tech Stack
- Agent Framework: LangChain Agents
- LLM: GPT-4 for reasoning
- Search: DuckDuckGo (free), SerpAPI (paid)
- Web Scraping: BeautifulSoup, requests
- Backend: FastAPI
🔗 Resources
Series Complete! You've now seen all 5 LLM production projects. Check out the Portfolio Overview for the full collection.