Multi-agent AI math tutor built with LangGraph: CRAG retrieval, episodic & semantic long-term memory, Tavily MCP web search, Google OAuth, and a Neo4j-style memory graph. Powered by LLaMA 3.3 70B on Groq.
Demo Video: https://www.loom.com/share/721d0bdb7c66442ca68a41b5b351dd10
Deep Dive Blog Post (Medium): https://medium.com/@dikshant182004/mathtutor-deep-dive-cf5327141a90
- What Is This? – Multi-Agent JEE Math Tutor
- Features
- Architecture – 14-Node LangGraph Multi-Agent Pipeline
- Agent Pipeline
- Memory System – Episodic, Semantic & Procedural LTM with Redis
- Hybrid CRAG Retrieval – BM25 + Cohere Dense + Reciprocal Rank Fusion
- Agent Tools – Tavily MCP Web Search, SymPy Calculator, FAISS RAG
- Interactive Memory Graph Visualiser – Neo4j-style vis.js
- What We Store in Redis
- Project Structure
- Setup & Installation
- Running the App
- Testing
- Deployment
- What We Tried (But Didn't Work Out)
- Known Limitations
- Scope for Further Advancement
- Tech Stack
JEE Math Tutor is a full-stack AI tutoring system that goes well beyond a simple chat interface. It accepts text, image (OCR), or audio (ASR) input, routes the student's intent intelligently, solves problems step-by-step using a ReAct tool loop, verifies its own answers with a dedicated critic agent, and generates rich personalised explanations.
The system remembers students across sessions – tracking which topics they struggle with, which solving strategies work for them, and what mistakes they commonly make – and uses that memory to personalise every response.
Authentication uses Streamlit's native Google OIDC flow (st.login("google") + .streamlit/secrets.toml auth block). On login, each user is mapped to a stable Redis student_id namespace via get_or_create_user, so memory, threads, and checkpoints stay isolated per student.
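The mapping can be sketched as follows. This is a hypothetical sketch, not the actual implementation: `stable_student_id` and `redis_keys_for` are invented names, assuming a hash-based namespace; the real `get_or_create_user` in `db_utils.py` may derive the ID differently.

```python
import hashlib

def stable_student_id(email: str) -> str:
    # Hypothetical: derive a stable, non-reversible namespace key from
    # the OAuth email so the same user always maps to the same Redis keys.
    return hashlib.sha256(email.lower().encode()).hexdigest()[:16]

def redis_keys_for(student_id: str) -> dict:
    # Per-student key namespacing, mirroring the key patterns documented
    # later in this README (user:, threads:, semantic:, procedural:).
    return {
        "profile": f"user:{student_id}",
        "threads": f"threads:{student_id}",
        "semantic": f"semantic:{student_id}",
        "procedural": f"procedural:{student_id}",
    }
```

Because the ID is a pure function of the email, a returning student lands in the same namespace on every login, which is what keeps memory, threads, and checkpoints isolated per student.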
For non-solve intents (explain, research, generate), the graph routes to direct_response_agent, which writes final_response and optional direct_response_tool_calls into state. The frontend stream handler reads these updates and renders the assistant answer directly in the chat UI, while also showing activity-panel tool cards for web-search-backed responses.
Compared to the original single-agent version described in the README, the system has been substantially upgraded:
- Full LTM (Long-Term Memory) across sessions: episodic, semantic, and procedural memory stored in Redis with vector similarity search
- Intent routing: six distinct intents (solve, explain, hint, formula_lookup, research, generate) with separate pipelines
- Direct Response Agent for non-solve intents – skips the verifier/explainer pipeline entirely
- Hybrid CRAG – BM25 sparse + Cohere dense + Reciprocal Rank Fusion, with corrective relevance filtering
- Two-API-key architecture – separates the solver from all other agents to avoid Groq rate limits
- STM trimming – rolling LLM summarisation keeps the context window under 8k tokens without losing history
- Interactive memory graph – Neo4j-style vis.js visualisation of the student's entire memory graph
- Google OAuth – full authentication with per-student Redis namespacing
- Safety agent – output-level policy check before any response reaches the student
- Activity panel – live sidebar showing every agent node, tool call, and payload as it streams
```mermaid
%%{init: {
  'flowchart': { 'nodeSpacing': 70, 'rankSpacing': 90, 'curve': 'basis' },
  'theme': 'base',
  'themeVariables': {
    'fontSize': '18px',
    'primaryColor': '#111827',
    'primaryTextColor': '#E5E7EB',
    'primaryBorderColor': '#60A5FA',
    'lineColor': '#93C5FD',
    'tertiaryColor': '#0B1220'
  }
}}%%
flowchart TD
    USER(["Student\n(Text / Image / Audio)"])
    APP["Streamlit Frontend\napp.py"]
    AUTH["Google OAuth\nst.login('google')\nMap user -> student_id"]

    USER -->|question| APP
    APP -->|login gate| AUTH
    AUTH -->|authenticated| LANGGRAPH

    subgraph LANGGRAPH["LangGraph StateGraph · AgentState · RedisSaver"]
        direction TB
        DETECT["detect_input\nClassify: text / image / audio\nReset per-problem state"]
        OCR["ocr_node\nGoogle Vision -> text + confidence"]
        ASR["asr_node\nGroq Whisper -> transcript + confidence"]
        GUARD["guardrail_agent\nRule-based + LLM topic check"]
        LTM_R["retrieve_ltm\nEpisodic vector search\nSemantic + Procedural lookup"]
        PARSER["parser_agent\nClean OCR/ASR noise\nExtract variables & constraints"]
        ROUTER["intent_router\nClassify intent (solve/explain/hint/\nformula_lookup/research/generate)\nPick strategy + difficulty"]

        subgraph SOLVE["Solver Pipeline"]
            direction LR
            SOLVER["solver_agent\nTwo-call RAG pattern\nLTM-personalised system prompt"]
            TOOLS["tool_node\nRAG · Web Search · Calculator"]
            SOLVER -- tool_calls --> TOOLS
        end

        DR["direct_response_agent\nexplain / hint / formula_lookup\nresearch / generate"]
        VERIFIER["verifier_agent\nStep-by-step correctness check\nRoutes retry / hitl / pass"]
        SAFETY["safety_agent\nOutput policy check"]
        EXPLAINER["explainer_agent\nStructured ExplainerOutput\nLTM-personalised explanation"]
        HITL["hitl_node\nbad_input · clarification\nverification · satisfaction\ninterrupt() checkpoint"]
        LTM_S["store_ltm\nEpisodic · Semantic · Procedural\nFlow-gated: solver only for\nsemantic + procedural"]

        DETECT -->|image| OCR
        DETECT -->|audio| ASR
        DETECT -->|text| GUARD
        OCR --> GUARD
        ASR --> GUARD
        GUARD -->|passed| PARSER
        GUARD -->|blocked| END1(["END"])
        PARSER -->|clear| LTM_R
        PARSER -->|ambiguous| HITL
        LTM_R --> ROUTER
        ROUTER -->|solve/hint/formula_lookup| SOLVE
        ROUTER -->|explain/research/generate| DR
        DR --> SAFETY
        SOLVE --> VERIFIER
        VERIFIER -->|correct| SAFETY
        VERIFIER -->|incorrect, iter < 3| SOLVER
        VERIFIER -->|needs_human| HITL
        TOOLS -- ToolMessage --> SOLVER
        SAFETY -->|solve path| EXPLAINER
        SAFETY -->|direct path| HITL
        EXPLAINER --> HITL
        HITL -->|satisfied| LTM_S
        HITL -->|not satisfied| EXPLAINER
        LTM_S --> END2(["END"])
    end

    subgraph REDIS["Redis Stack"]
        STM["STM Checkpoints\nRedisSaver · LangGraph state\nper thread_id"]
        LTM_DB["LTM Memory\nEpisodic JSON + HNSW vectors\nSemantic profile\nProcedural strategy table\nThread metadata · User registry"]
    end

    subgraph TOOLS_DETAIL["Solver Tools"]
        RAG["Hybrid CRAG\nCohere embed-english-v3.0\nFAISS IndexFlatIP\nBM25 sparse\nRRF fusion · cosine >= 0.30"]
        WEB["Web Search\nTavily MCP (remote)\nadvanced depth · 5 results"]
        CALC["Symbolic Calculator\nSymPy backend\nfactorials · precision · matrices"]
    end

    LANGGRAPH <-->|checkpoint every node| STM
    LANGGRAPH <-->|retrieve / store| LTM_DB
    TOOLS -.-> RAG
    TOOLS -.-> WEB
    TOOLS -.-> CALC
    AUTH -->|stream_mode=updates| LANGGRAPH
    APP -->|st.write_stream| USER
```
Source file reference:
| Node | File |
|---|---|
| Frontend entry | src/frontend/app.py |
| Graph | src/backend/agents/graph.py |
| Input / OCR / ASR | src/backend/agents/nodes/input.py |
| Guardrail | src/backend/agents/nodes/guardrail.py |
| Parser | src/backend/agents/nodes/parser.py |
| Intent Router | src/backend/agents/nodes/router.py |
| Solver | src/backend/agents/nodes/solver.py |
| Verifier | src/backend/agents/nodes/verifier.py |
| Safety | src/backend/agents/nodes/safety.py |
| Explainer | src/backend/agents/nodes/explainer.py |
| Direct Response | src/backend/agents/nodes/direct_response.py |
| HITL | src/backend/agents/nodes/hitl.py |
| Memory Manager | src/backend/agents/nodes/memory/memory_manager.py |
The graph has 14 nodes. Here is what each one does and what it writes to state.
Classifies the incoming input as text, image, or audio. Resets all per-problem state fields (solve_iterations, hitl flags, messages, etc.) so a new question always starts clean. Routes to OCR/ASR nodes or directly to the guardrail.
OCR uses Google Cloud Vision API. ASR uses Groq Whisper (whisper-large-v3). Both produce a confidence score. If confidence falls below 0.5 or the extracted text is empty, a bad_input HITL is triggered.
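The confidence gate can be sketched as a small routing function. This is a minimal illustration of the rule above, assuming the node names used in the graph; the real routing lives in `graph.py` and may differ in shape.

```python
def route_after_extraction(text: str, confidence: float) -> str:
    # Mirrors the rule described above: empty or low-confidence (< 0.5)
    # OCR/ASR output triggers the bad_input human-in-the-loop stop
    # instead of continuing to the guardrail.
    if not text.strip() or confidence < 0.5:
        return "hitl_bad_input"
    return "guardrail_agent"
```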
Two-stage input safety check:
- Stage 1 (rule-based): Pattern matches for prompt injection, extraction attempts, and PII (email, phone, Aadhaar). Zero LLM cost.
- Stage 2 (LLM): LLaMA 3.3 70B checks topic relevance against `topic_policy.yaml`. It passes anything where mathematics is even loosely the subject. When in doubt, it passes – false positives (blocking a legitimate question) are far more costly than false negatives.
Cleans OCR/ASR noise, normalises math notation (fractions, exponents, Greek letters, integrals), and extracts variables and constraints. Sets `needs_clarification=True` only when the problem is genuinely unsolvable without more information – not for hard or unusual problems. Routes to HITL if clarification is needed, otherwise to retrieve_ltm.
Runs before the intent router so the solver always has student context available. Three independent lookups:
- Episodic: Cohere vector search over past solved problems for this student, filtered by `student_id` tag in the HNSW index
- Semantic: Reads `weak_topics`, `strong_topics`, and `mistake_patterns` from the student's semantic profile
- Procedural: Finds the highest-success-rate strategy for the current topic

Writes `ltm_context` to state. Populates the activity panel with a full breakdown.
Classifies the student's intent into one of six categories and picks a solving strategy. Routes solve/hint/formula_lookup to the solver pipeline and explain/research/generate to the direct response agent. Also sets topic, difficulty, and solver_strategy in solution_plan.
The most complex node. Two separate LLM calls when a PDF is uploaded:
- Call 1: Forces `rag_tool` as the first action via `tool_choice` – the LLM must retrieve relevant passages from the student's notes before writing anything
- Call 2: Receives the RAG context as a plain `HumanMessage` (not a tool message) to avoid Groq's tool validation, then writes the full solution using `[calc, web]` only

On retry iterations, RAG is skipped and verifier feedback is injected instead. The system prompt is personalised with LTM context (best strategy, weak areas, known mistakes, similar past problems). The solver uses a second Groq API key so it does not rate-limit the other agents.
LangGraph's built-in ToolNode executing calculator and web search tool calls from the solver's ReAct loop. RAG is handled inline in solver_agent itself (not through this node) to avoid tool validation issues with Groq.
Checks the solution on three criteria: step-by-step algebraic correctness (citing specific step numbers for errors), units and domain validity, and edge cases (division by zero, undefined log/sqrt, empty set). Routes to safety (correct), retry (incorrect, up to 3 attempts), or HITL (needs human expert).
Handles all non-solve intents with a single LLM call. For research and generate intents, calls Tavily web search synchronously before building the prompt. Returns structured markdown. Writes stub solver/verifier outputs so downstream nodes (store_ltm) don't crash.
Output-level policy check against `output_policy.yaml`. Two-stage: a keyword fast path (no LLM cost), then an LLM check. It only fires after the verifier confirms correctness, preventing harmful content from reaching the student even if the solver was somehow manipulated.
Produces a structured ExplainerOutput (approach summary, step-by-step working with headings, key formulae, key concepts, common mistakes, difficulty rating). Personalises the explanation using LTM: if the student has struggled with this topic or made specific mistakes before, those are called out explicitly. Renders to rich markdown with LaTeX.
Single suspension point for all human-in-the-loop scenarios. Uses LangGraph's interrupt() to checkpoint state and pause: the student's browser can close and reopen, and the graph resumes exactly where it left off. Four HITL types: bad_input, clarification, verification, satisfaction.
Called only when student_satisfied=True. Writes to three memory stores with a critical flow gate: episodic memory is written for all flows, but semantic and procedural memory are written only for solver-flow intents (solve/hint/formula_lookup). This prevents research-style strategy strings from polluting procedural memory.
The memory system has three layers, each serving a different purpose.
What: The live conversation state for one problem-solving session.
Where: Redis via LangGraph's RedisSaver checkpointer. Every node writes its output to a checkpoint automatically.
Key pattern: checkpoint:<thread_id>:*
TTL: 2 hours (matches STM_SUMMARY_TTL)
Trimming: When the message list exceeds 8,000 tokens (tiktoken gpt-4o encoding), older messages are summarised by a separate LLM call and replaced with a single AIMessage containing the rolling summary. The last 6 messages are always kept verbatim. The summary is also persisted to Redis at stm:summary:<thread_id> so it survives restarts within the TTL window.
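The trimming policy can be sketched as below. This is a simplified sketch, assuming callables for the tokenizer (tiktoken in the real system) and the summary LLM call; the real implementation in `memory_manager.py` also persists the summary to Redis.

```python
from typing import Callable

KEEP_LAST = 6          # last N messages always kept verbatim
TOKEN_BUDGET = 8000    # matches the 8,000-token limit described above

def trim_messages(messages: list[str],
                  count_tokens: Callable[[str], int],
                  summarise: Callable[[list[str]], str]) -> list[str]:
    # If the history fits the budget (or is already short), leave it alone.
    total = sum(count_tokens(m) for m in messages)
    if total <= TOKEN_BUDGET or len(messages) <= KEEP_LAST:
        return messages
    # Otherwise summarise everything except the last KEEP_LAST messages
    # and replace that head with a single rolling-summary message.
    head, tail = messages[:-KEEP_LAST], messages[-KEEP_LAST:]
    summary = summarise(head)
    return [f"[summary] {summary}"] + tail
```

Because the summary replaces the head in place, repeated trims fold earlier summaries into the next one, which is what makes the summary "rolling".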
LTM spans sessions and is written at the end of each solved problem when the student confirms satisfaction.
What: One record per solved problem – a "memory of what happened."
Where: Redis JSON + RedisVL HNSW vector index
Key pattern: episodic:<student_id>:<episode_id>
TTL: 90 days
What's stored per episode:
| Field | Description |
|---|---|
| `student_id` | Hashed student identifier |
| `episode_id` | Millisecond timestamp (unique) |
| `topic` | e.g. geometry, calculus |
| `difficulty` | e.g. easy |
| `problem_summary` | First 200 chars of problem text |
| `final_answer` | e.g. π/4, x = 3 |
| `outcome` | e.g. correct |
| `solve_attempts` | How many solver retries were needed |
| `timestamp` | Unix time of storage |
| `access_count` | Incremented each time this episode is retrieved |
| `decay_score` | Spaced-repetition forgetting curve score |
| `embedding` | 1024-dim Cohere float32 vector of "{topic} {difficulty} {summary}" |
Retrieval: At the start of each new problem, the system embeds "{topic} {problem_text[:200]}" and runs a vector similarity search (HNSW, cosine) filtered by student_id. The top 3 most similar past problems are injected into the solver's system prompt.
Decay: `decay_score = e^(-days_old / 30) × log(1 + access_count + 1)`. Episodes retrieved often decay much more slowly (spaced repetition effect). Episodes below threshold (0.05) and older than 30 days are pruned manually via the admin panel.
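A direct transcription of that formula and pruning rule, kept exactly as stated above (including the `1 + access_count + 1` term inside the log):

```python
import math

def decay_score(days_old: float, access_count: int) -> float:
    # e^(-days_old / 30) * log(1 + access_count + 1), as documented above.
    return math.exp(-days_old / 30) * math.log(1 + access_count + 1)

def should_prune(days_old: float, access_count: int) -> bool:
    # Pruning requires BOTH: score below 0.05 AND age over 30 days.
    return decay_score(days_old, access_count) < 0.05 and days_old > 30

# A frequently retrieved episode decays far more slowly:
# decay_score(60, 0) ≈ 0.094, while decay_score(60, 10) ≈ 0.336
```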
What: The student's topic-level strength/weakness profile.
Where: Redis JSON
Key pattern: semantic:<student_id>
TTL: None (permanent)
What's stored:
| Field | Description |
|---|---|
| `weak_topics` | `{topic: fail_count}` – incremented on incorrect attempts |
| `strong_topics` | `{topic: success_count}` – incremented on correct outcomes |
| `mistake_patterns` | `[{pattern, topic, count}]` – deduped by (pattern, topic) |
Struggle signal: Since `store_ltm` is only reached after a correct outcome, a multi-attempt session is handled by writing `solve_attempts - 1` "incorrect" passes first, then the final "correct" pass. This is the only way to populate `weak_topics` without catching mid-session failures.
What: Which solving strategies work for this student on which topics.
Where: Redis JSON
Key pattern: procedural:<student_id>
TTL: None (permanent)
What's stored:
```json
{
  "strategy_success": {
    "geometry": {
      "Use the distance formula and verify collinearity": {
        "success_count": 3,
        "total_count": 4,
        "attempts_sum": 5,
        "success_rate": 0.75,
        "attempts_avg": 1.25
      }
    }
  }
}
```

Best strategy selection: `max(strategies, key=lambda kv: (success_rate, -attempts_avg))` – prefers a high success rate, and breaks ties by fewest average attempts.
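A runnable version of that selection rule, assuming the per-topic dict shape shown in the JSON above:

```python
def best_strategy(strategies: dict) -> str:
    # Highest success_rate wins; ties are broken by the lowest
    # attempts_avg (hence the negation inside the sort key).
    return max(
        strategies.items(),
        key=lambda kv: (kv[1]["success_rate"], -kv[1]["attempts_avg"]),
    )[0]
```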
Flow gate: Only solver-flow intents write to procedural memory. Research/generate strategy strings (e.g. "Use web search to find examples…") are explicitly blocked by checking `intent_type in ("solve", "hint", "formula_lookup")` before every write. A secondary guard inside `update_procedural_memory` rejects strings over 120 characters or containing research keywords as a defence-in-depth measure.
The RAG system is a Corrective Retrieval-Augmented Generation (CRAG) pipeline that searches the student's uploaded PDF notes.
Ingestion: PyPDFLoader → RecursiveCharacterTextSplitter (800 chars, 150 overlap) → Cohere embed-english-v3.0 → FAISS IndexFlatIP (in-memory, per thread). Calling ingest a second time appends to the existing index, so all uploaded PDFs are searched together.
Retrieval pipeline:
- Dense retrieval: Embed the query with Cohere (query mode), search the FAISS index for the top 10 by cosine similarity
- Sparse retrieval: BM25Okapi on tokenised chunks, top 10 by BM25 score
- Reciprocal Rank Fusion: Merge the dense and sparse rankings with `score = 1/(K + rank)` where K = 60, take the top 5
- Corrective filter: Drop any chunk with cosine similarity < 0.30 – these are almost certainly off-topic. This is the "C" in CRAG.
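The fusion step can be sketched as below: each ranking contributes `1/(K + rank)` per document, and the summed scores decide the final order. This is a minimal sketch of the formula above, not the project's exact code.

```python
def rrf_fuse(dense: list[str], sparse: list[str],
             k: int = 60, top_n: int = 5) -> list[str]:
    # Reciprocal Rank Fusion: sum 1/(k + rank) across both rankings,
    # with rank counted from 1. Documents found by both lists win.
    scores: dict[str, float] = {}
    for ranking in (dense, sparse):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Note how a chunk ranked second in both lists beats a chunk ranked first in only one, which is exactly the behaviour that makes RRF a robust way to merge sparse and dense retrieval.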
Two-call pattern in solver: To avoid Groq's tool validation error (which requires a `ToolMessage` for every `AIMessage` with `tool_calls`), RAG is handled inline rather than through the graph's ToolNode:

- Call 1 forces `rag_tool` via `tool_choice`
- The solver intercepts the tool call, executes RAG directly, then builds `messages_for_call2` with the RAG result injected as a plain `HumanMessage` with the sentinel prefix `[RAG context retrieved from student's notes]`
- Call 2 binds only `[calculator, web_search]` (no `rag_tool`) and writes the full solution
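The message assembly for call 2 can be sketched with plain role dicts (the real code uses LangChain `HumanMessage` objects; `build_call2_messages` is an illustrative name):

```python
SENTINEL = "[RAG context retrieved from student's notes]"

def build_call2_messages(history: list[dict],
                         rag_chunks: list[str]) -> list[dict]:
    # Inject retrieved chunks as an ordinary user message rather than a
    # tool message, so call 2 never trips Groq's tool-validation rules.
    context = SENTINEL + "\n\n" + "\n---\n".join(rag_chunks)
    return history + [{"role": "user", "content": context}]
```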
Query strategy: The LLM is instructed to query by concept/theorem/formula name (e.g. "Bayes theorem", "integration by parts formula") NOT by problem text. This ensures retrieval works even when the student's notes contain the formula without a matching example problem.
| Tool | When to use | Backend |
|---|---|---|
| `rag_tool` | First call on every problem when a PDF is uploaded | Cohere + FAISS + BM25 |
| `web_search_tool` | Recent discoveries, new JEE questions, theory lookups, when RAG returns empty | Tavily MCP (remote, mcp.tavily.com) |
| `calculator_tool` | Large factorials, high-precision decimals, large matrix operations ONLY | SymPy |
Web search is handled through the Tavily MCP server (mcp.tavily.com) – a remote MCP endpoint that requires no local setup. The solver calls it via `tavily_mcp_search`, which wraps the MCP client call with `search_depth="advanced"` and returns a Tavily AI direct answer along with the top 5 ranked results (title, URL, snippet).
Multi-query strategy – up to 3 calls per turn:
The solver is instructed to decompose a research task into up to three focused, distinct queries rather than firing one broad query:
- Query 1 – core formula, theorem, or concept
- Query 2 – worked example or step-by-step solution
- Query 3 – edge case, common mistake, or real application (only if needed)
This mirrors how a student would actually research a topic – first understanding the principle, then seeing it applied, then stress-testing the understanding. Each query is narrow and specific so Tavily's advanced search mode can surface high-quality results rather than generic overviews.
When the solver calls web_search_tool:
- Student asks about recent JEE Mains / Advanced questions on a topic
- Student asks about math Olympiad problems (IMO, Putnam, USAMO, RMO)
- Student asks for study resources, textbooks, or video explanations
- CRAG returned empty or insufficient context
- Any question requiring current or up-to-date information
When it should NOT be called:
- For computing math – use its own reasoning or `calculator_tool` instead
- For topics already covered by the student's uploaded notes – `rag_tool` takes priority
The calculator_tool wraps SymPy and is intentionally scoped to a narrow set of cases where symbolic or high-precision computation adds real value over the LLM's own arithmetic. The solver handles all routine JEE-level computation itself; the calculator is only invoked when the LLM's floating-point reasoning would be unreliable or slow.
The three valid use cases:
- Very large factorials / combinatorics – e.g. `binomial(50, 25)`, `factorial(100)`. These produce exact integers that are impractical to compute by hand or by LLM token prediction.
- High-precision decimal results – e.g. `N(integrate(1/sqrt(1-x**2), x), 50)` for a 50-digit result when the problem explicitly demands precision beyond standard floating point.
- Large matrix operations – determinants, inverses, and eigenvalues for matrices too large to expand symbolically in-context: e.g. `Matrix([[1,2,3],[4,5,6],[7,8,9]]).det()`.
The tool calls sp.sympify(expression) followed by sp.N(expr) and returns the result as a plain string. Errors are caught and returned with a hint to check SymPy syntax, so a malformed expression never crashes the agent turn.
Expression syntax (valid SymPy strings):
binomial(50, 25)
factorial(100)
N(integrate(1/sqrt(1-x**2), x), 50)
Matrix([[1,2,3],[4,5,6],[7,8,9]]).det()
Note: The calculator does not have NumPy available. Use SymPy-native equivalents:
binomial(n,k),factorial(n),Matrix([[...]]).det(),sqrt(x)etc. Passing NumPy expressions will raise a calculator error.
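The sympify-then-evaluate wrapper described above can be sketched as follows. This is a minimal sketch, assuming only the behaviour stated in this README; the real `calculator_tool` may format its output differently.

```python
import sympy as sp

def calculator_tool(expression: str) -> str:
    # Parse the expression string with sympify, evaluate numerically
    # with N, and return a plain string. Any failure is caught and
    # returned with a hint, so a malformed expression never crashes
    # the agent turn.
    try:
        result = sp.N(sp.sympify(expression))
        return str(result)
    except Exception as exc:
        return f"Calculator error: {exc}. Check that the expression is valid SymPy syntax."
```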
The memory visualiser at /pages/memory_viz.py renders the student's complete memory graph as an interactive Neo4j-style network using vis.js.
What it shows:
- Student root node (star shape) with profile data
- Session thread nodes (hexagon) – one per conversation
- Agent nodes per thread – showing every node that ran and its payload
- Tool call nodes – every RAG/web search/calculator call
- Episodic memory nodes – one per solved problem, showing topic, difficulty, outcome, answer, decay score
- Semantic profile node – with weak/strong topic children and mistake pattern children
- Procedural profile node – with per-topic strategy children showing success rates
Interactivity:
- Click any node to open a detail panel on the right with all stored fields
- Double-click a node to select and fit to its direct neighbours
- Hover for glow effect and tooltip
- Physics engine (forceAtlas2 / barnesHut / repulsion) with live stabilisation – freeze when stable
- Four layout presets: Radial, Hierarchical, Organic spread, Tight cluster
- Toggle node labels (L key), fit graph (F key), close panel (Esc)
- Export graph as PNG
- Filter by node type – hide agent/tool nodes for a cleaner LTM-only view
- Adjust max threads shown (1–30)
- Decay score colour-coded: green (≥ 0.6), yellow (≥ 0.3), red (< 0.3)
```
Redis Stack
│
├── STM (LangGraph checkpointer)
│   ├── checkpoint:<thread_id>:*              LangGraph state snapshots
│   └── stm:summary:<thread_id>               Rolling LLM summary, TTL 2h
│
├── User Registry
│   ├── user:<student_id>                     Hash: name, email, problems_solved, timestamps
│   └── threads:<student_id>                  Sorted set: thread_ids scored by timestamp
│
├── Thread Metadata
│   └── thread:<thread_id>:meta               Hash: problem_summary, topic, outcome, timestamps
│
├── Episodic LTM
│   ├── episodic:<student_id>:<episode_id>    JSON doc + HNSW vector (1024-dim float32)
│   └── idx:episodic                          RedisVL HNSW vector index (COSINE, FLOAT32)
│
├── Semantic LTM
│   └── semantic:<student_id>                 JSON: weak_topics, strong_topics, mistake_patterns
│
└── Procedural LTM
    └── procedural:<student_id>               JSON: strategy_success per topic
```

RedisInsight UI is available at http://localhost:8001 when running via Docker Compose – useful for inspecting all keys, running queries, and monitoring memory usage.
```
MathTutor/
├── src/
│   ├── backend/
│   │   ├── __init__.py
│   │   ├── exceptions/
│   │   │   └── __init__.py            Agent_Exception with file + line info
│   │   ├── logger/
│   │   │   └── __init__.py            Timestamped file logger (logs/ dir)
│   │   ├── agents/
│   │   │   ├── __init__.py            Shared imports: messages, typing, logger
│   │   │   ├── base.py                BaseAgent – two ChatGroq clients + MediaProcessor
│   │   │   ├── graph.py               LangGraph StateGraph – all nodes + routing functions
│   │   │   ├── state.py               AgentState TypedDict + make_initial_state()
│   │   │   └── nodes/
│   │   │       ├── __init__.py        Node-level shared imports + artifact schemas
│   │   │       ├── input.py           detect_input_type, ocr_node, asr_node
│   │   │       ├── guardrail.py       GuardrailAgent – rule-based + LLM topic check
│   │   │       ├── parser.py          ParserAgent – clean + structure problem text
│   │   │       ├── router.py          IntentRouterAgent – six-intent classification
│   │   │       ├── solver.py          SolverAgent – two-call RAG + ReAct loop
│   │   │       ├── verifier.py        VerifierAgent – three-criteria correctness check
│   │   │       ├── safety.py          SafetyAgent – output policy enforcement
│   │   │       ├── explainer.py       ExplainerAgent – structured personalised explanation
│   │   │       ├── direct_response.py DirectResponseAgent – explain/hint/research/generate
│   │   │       ├── hitl.py            HITLAgent – interrupt() + four HITL types
│   │   │       ├── memory/
│   │   │       │   ├── __init__.py          Redis URLs, TTLs, token limits, index schema
│   │   │       │   └── memory_manager.py    STM trimming, episodic/semantic/procedural R/W
│   │   │       ├── tools/
│   │   │       │   ├── __init__.py          Cohere model constants, TOP_K, MIN_SCORE
│   │   │       │   ├── tools.py             rag_tool, web_search_tool, calculator_tool, ingest_pdf
│   │   │       │   └── mcp/
│   │   │       │       ├── __init__.py            Tavily MCP constants
│   │   │       │       └── tavily_mcp_client.py   Async Tavily MCP – sync wrapper
│   │   │       └── security_checks/
│   │   │           ├── topic_policy.yaml          Allowed/blocked topics for guardrail
│   │   │           ├── injection_patterns.yaml    Prompt injection + extraction patterns
│   │   │           └── output_policy.yaml         Output safety patterns
│   │   └── utils/
│   │       ├── ___init__.py
│   │       ├── artifacts.py           Pydantic schemas: Parser/Router/Verifier/Explainer/Safety
│   │       ├── db_utils.py            Redis singletons, key helpers, user/thread registry, STM
│   │       ├── helper.py              MediaProcessor (OCR/ASR), _log_payload, _render_markdown
│   │       └── memory_graph_reader.py Builds vis.js {nodes, edges} from Redis data
│   ├── frontend/
│   │   ├── __init__.py                AGENT_META, TOOL_META, ANSWER_NODES, HITL prefixes
│   │   ├── app.py                     Main Streamlit app – streaming, HITL, activity panel
│   │   ├── pages/
│   │   │   ├── __init__.py            vis.js visual constants (colours, sizes, shapes)
│   │   │   ├── memory_viz.py          Memory graph Streamlit page
│   │   │   ├── graph.html             vis.js HTML template with %%TOKEN%% injection
│   │   │   ├── graph.css              Neo4j-style dark theme CSS
│   │   │   └── graph.js               vis.js network init + interaction logic
│   │   └── templates/
│   │       ├── __init__.py
│   │       ├── activity_panel.py      Step card builder + panel renderer
│   │       ├── login.py               Google OAuth login page
│   │       ├── profile.py             Profile card HTML builder
│   │       ├── styles.css             Global dark theme – cards, banners, tables
│   │       └── login.css              Login page specific styles
│   └── tests/
│       ├── conftest.py                Shared pytest fixtures and monkeypatch helpers
│       ├── unit/
│       │   ├── test_db_utils_threads_and_stm.py    Thread registry + STM summary persistence helpers
│       │   ├── test_db_utils_user_registry.py      User registry (get_or_create_user etc.) roundtrip
│       │   ├── test_direct_response_agent.py       DirectResponseAgent contract and tool-call logging
│       │   ├── test_env_example_keys.py            Ensures .env.example includes required keys
│       │   ├── test_hitl_processors.py             HITL processing helpers and state updates
│       │   ├── test_input_and_solver_no_api.py     Input reset + solver contracts (no external APIs)
│       │   ├── test_memory_manager_flow_gates.py   Memory manager store/retrieve routing + flow gates
│       │   ├── test_router_node_contract.py        Intent router contract (state in/out, intent labels)
│       │   ├── test_state_and_policies.py          State defaults + guardrail/safety/verifier behaviors
│       │   └── test_tavily_mcp_helpers.py          Tavily MCP helper utilities (no network)
│       ├── integration/
│       │   ├── test_clarification_to_router_loop.py       Clarification HITL -> router loop integration flow
│       │   ├── test_direct_response_followup_flow.py      Direct-response follow-up question flow
│       │   ├── test_memory_store_after_correct_solve.py   End-to-end: solver -> verifier -> memory store (all mocked)
│       │   ├── test_router_to_direct_response_pipeline.py Router -> direct response end-to-end flow
│       │   └── test_user_registry_roundtrip.py            User registry integration roundtrip
│       └── __init__.py
├── logs/                  Timestamped log files (git-ignored)
├── uploads/               Temporary upload staging (git-ignored)
├── secrets/               Google service account JSON (git-ignored)
├── docker-compose.yml     Redis Stack service
├── entrypoint.sh          Start Streamlit (and waits for Redis)
├── run.ps1                Windows PowerShell runner
├── .env.example           API key template
├── .gitignore
├── pytest.ini
└── README.md
```
- Python 3.11+
- Docker Desktop (for Redis Stack)
- A Google Cloud project with Vision API enabled (for OCR)
- API keys: Groq (×2 recommended), Cohere, Tavily, Google OAuth credentials
```shell
git clone https://github.com/dikshant182004/MathTutor.git
cd MathTutor
```

Windows (PowerShell):

```shell
python -m venv myenv
myenv\Scripts\Activate.ps1
```

macOS / Linux:

```shell
python -m venv myenv
source myenv/bin/activate
```

Install dependencies:

```shell
pip install -r requirements.txt
```

Start Redis:

```shell
docker compose up -d redis
```

Redis will be available at localhost:6379 and RedisInsight at http://localhost:8001.
```shell
cp .env.example .env
```

Edit `.env`:

```shell
# LLM inference
GROQ_API_KEY=gsk_...     # Primary key - guardrail, parser, router, verifier, safety, explainer
GROQ_API_KEY_2=gsk_...   # Secondary key - solver + direct_response (avoids rate limits)

# Embeddings
COHERE_API_KEY=CIy...

# Web search
TAVILY_API_KEY=tvly-...

# Redis
REDIS_URL=redis://:jee_secret@localhost:6379

# Google OAuth (Streamlit native auth)
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
OAUTH_REDIRECT_URI=http://localhost:8501/oauth2callback

# Google Vision OCR - choose one:
GOOGLE_CREDENTIALS_JSON='{"type":"service_account",...}'   # JSON string (Streamlit Cloud)
# OR
GOOGLE_APPLICATION_CREDENTIALS=./secrets/your-key.json     # File path (local dev)
```

Create `.streamlit/secrets.toml`:
```toml
# API keys (mirrors .env for Streamlit Cloud deployment)
GROQ_API_KEY = "gsk_..."
GROQ_API_KEY_2 = "gsk_..."
COHERE_API_KEY = "CIy..."
TAVILY_API_KEY = "tvly-..."
REDIS_URL = "redis://:jee_secret@localhost:6379"

# Google Vision - use JSON string for cloud, file path for local.
# Must appear before the [auth] tables so it stays a top-level key.
GOOGLE_CREDENTIALS_JSON = '''{"type":"service_account","project_id":"..."}'''

# Google OAuth - required for st.login("google")
[auth]
redirect_uri = "http://localhost:8501/oauth2callback"
cookie_secret = "your-random-secret-string-here"

[auth.google]
client_id = "your-google-client-id.apps.googleusercontent.com"
client_secret = "GOCSPX-..."
```

Tip: Generate `cookie_secret` with `python -c "import secrets; print(secrets.token_hex(32))"`.
```shell
.\run.ps1
```

This sets PYTHONPATH and launches Streamlit directly.

```shell
chmod +x entrypoint.sh
./entrypoint.sh
```

This script:

- Loads `.env` into the shell environment
- Waits up to 30s for Redis to respond to PING
- Starts Streamlit in the foreground

```shell
# Custom port:
STREAMLIT_PORT=8502 ./entrypoint.sh
```

Terminal 1 – Redis:

```shell
docker compose up -d redis
```

Terminal 2 – Streamlit (Windows PowerShell):

```shell
$env:PYTHONPATH = "$PSScriptRoot\src"
streamlit run src/frontend/app.py
```

Terminal 2 – Streamlit (macOS/Linux):

```shell
PYTHONPATH=src streamlit run src/frontend/app.py
```

Prune stale episodic memories (decay < 0.05 AND age > 30 days):

```python
from backend.agents.nodes.memory.memory_manager import prune_stale_episodic

prune_stale_episodic()                    # all students
prune_stale_episodic("d009fb7ace325090")  # one student
```

This is also available via the sidebar Admin panel in the app.
Pytest is configured via `pytest.ini` and discovers tests under `src/tests`.

```shell
pytest
```

CI runs this same command on every push and pull request via `.github/workflows/tests.yml`.

```shell
pytest -m unit
pytest -m integration
```

The app is designed to deploy on Streamlit Community Cloud with Redis hosted separately (e.g. Redis Cloud free tier).
- Push to a public or private GitHub repository
- Go to share.streamlit.io → New app
- Set Main file path: `src/frontend/app.py`
- Under Advanced settings → Secrets, paste the full contents of your `secrets.toml` (see above)
- Update `REDIS_URL` to point to your hosted Redis instance
- Update `OAUTH_REDIRECT_URI` to `https://your-app.streamlit.app/oauth2callback`
- Add the redirect URI to your Google OAuth 2.0 credentials in Google Cloud Console
- Create an account at redis.com/try-free
- Create a database → choose Redis Stack (required for RedisJSON + RediSearch)
- Copy the public endpoint and password
- Set `REDIS_URL = "redis://:password@host:port"` in secrets
```toml
GROQ_API_KEY = "gsk_..."
GROQ_API_KEY_2 = "gsk_..."
COHERE_API_KEY = "CIy..."
TAVILY_API_KEY = "tvly-..."
REDIS_URL = "redis://:password@redis-cloud-host:port"
GOOGLE_CREDENTIALS_JSON = '''{ full service account JSON }'''

[auth]
redirect_uri = "https://your-app.streamlit.app/oauth2callback"
cookie_secret = "your-64-char-random-secret"

[auth.google]
client_id = "....apps.googleusercontent.com"
client_secret = "GOCSPX-..."
```

What we tried: A local FastMCP server (`manim_mcp_server.py`) that rendered Manim Community Edition animations as `.mp4` files. The explainer agent would generate Manim Python code in a second LLM call (separate from the structured ExplainerOutput call, to avoid Groq's 400 error on large code strings in a function-calling schema), which was then sent to the MCP server via asyncio + nest_asyncio.
The problem: Manim's rendering environment is very sensitive: it requires specific system dependencies (LaTeX, Cairo, FFmpeg), rendering times were unpredictable (10–90 seconds), and the generated code frequently had syntax errors that were hard to recover from gracefully. The async bridging inside Streamlit's event loop added another layer of complexity.
- Manim docs: docs.manim.community
- FastMCP: github.com/jlowin/fastmcp
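The async bridging described above can be sketched roughly as below. The MCP client call is faked with a stand-in coroutine; in the real code, nest_asyncio patches the event loop Streamlit already owns so the nested run is safe:

```python
import asyncio


async def render_via_mcp(scene_code: str) -> str:
    """Stand-in for the FastMCP client call that shipped generated Manim
    code to manim_mcp_server.py and waited for the rendered .mp4 path."""
    await asyncio.sleep(0)  # placeholder for the real network round-trip
    return "/tmp/scene.mp4"


def render_from_streamlit(scene_code: str) -> str:
    # Streamlit callbacks are synchronous, so the async client has to be
    # driven to completion here; nest_asyncio is what made this pattern
    # workable inside Streamlit's already-running loop.
    return asyncio.run(render_via_mcp(scene_code))


print(render_from_streamlit("class Scene: ..."))
```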
What we tried: Generating step-by-step diagram images using Wavespeed.ai's image generation API to visually illustrate geometric constructions, graphs, and number line diagrams alongside the explainer output.
The problem: The generated images were not reliably accurate for mathematical diagrams โ abstract art generators are not optimised for precise geometric figures with exact coordinates, labelled axes, or algebraic curves. The latency was also too high for a real-time tutoring flow.
Status: Removed from the pipeline. If you want to experiment:
- Wavespeed.ai API: wavespeed.ai
- A better approach for math diagrams would be server-side matplotlib/plotly rendering triggered by structured output from the explainer
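A sketch of that structured-output route: the explainer emits a small diagram spec, and the server samples it into points that matplotlib or plotly could then draw. The spec fields here are hypothetical and the plotting call itself is omitted; real code should use `sympy.sympify`/`lambdify` (SymPy is already in the stack) rather than `eval` on LLM output:

```python
import math


def sample_diagram(spec: dict, n: int = 5) -> list:
    """Turn a structured spec like {"expr": "x**2", "domain": (0.0, 2.0)}
    into (x, y) samples ready for a server-side line plot."""
    # eval with a stripped namespace keeps the sketch self-contained;
    # production code should parse the expression with SymPy instead.
    f = eval("lambda x: " + spec["expr"], {"__builtins__": {}, "math": math})
    lo, hi = spec["domain"]
    step = (hi - lo) / (n - 1)
    return [(lo + i * step, f(lo + i * step)) for i in range(n)]


points = sample_diagram({"expr": "x**2", "domain": (0.0, 2.0)})
print(points)  # endpoints (0.0, 0.0) and (2.0, 4.0)
```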
- In-memory RAG index lost on restart. The FAISS index is stored in process memory (the `_STORES` dict). Re-upload your PDF after restarting the Streamlit server. A persistent option would require storing chunk embeddings in Redis or a vector database.
- Groq rate limits. `llama-3.3-70b-versatile` has token-per-minute limits, especially on the free tier. The two-API-key architecture helps, but heavy multi-tool turns (RAG + web search + long solution) can still hit limits. The solver catches rate limit errors and routes to HITL.
- `weak_topics` requires retry sessions to populate. Since `store_ltm` is only reached after a correct final outcome, the system uses a "struggle signal" heuristic (writing `solve_attempts - 1` incorrect passes), but this requires the solver to actually retry. First-attempt-correct sessions never contribute to `weak_topics`.
- `mistake_patterns` requires verifier feedback. The verifier's `suggested_fix` is only populated when the solver got something wrong. Students who get everything right on the first try will always have empty `mistake_patterns`.
- No multi-student isolation for FAISS. The in-memory store is keyed by `thread_id`, not `student_id`, so a student's PDF index is lost when they start a new thread. This is intentional (each problem session gets a fresh context) but means students re-upload PDFs frequently.
- Streamlit reruns on every interaction. Streamlit's execution model reruns the entire script on any widget interaction. The activity panel and HITL state management are carefully designed around this, but complex HITL resumption flows can occasionally require a manual `st.rerun()`.
- No concurrent multi-user scaling. The current setup runs one Streamlit process. For multi-user production use, you would need multiple workers behind a load balancer, with Redis as the shared state layer (which it already is for LTM and STM).
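On the first limitation: persisting the index would mean serialising each chunk's embedding the way the existing episodic LTM store does, as little-endian FLOAT32 byte buffers. A minimal stdlib sketch of that round-trip (the surrounding key naming and Redis writes are omitted):

```python
import struct


def pack_embedding(vec: list) -> bytes:
    # RedisVL-style vector fields expect little-endian FLOAT32 buffers.
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_embedding(buf: bytes) -> list:
    return list(struct.unpack(f"<{len(buf) // 4}f", buf))


vec = [0.25, -1.5, 3.0]              # a 1024-dim Cohere vector in the real app
buf = pack_embedding(vec)
assert unpack_embedding(buf) == vec  # exact: these values fit FLOAT32
```

Note that arbitrary float64 embedding values lose precision when narrowed to FLOAT32, which is fine for similarity search but means the round-trip is not always bit-exact.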
Graph database for relational memory. The current procedural and semantic memory is flat JSON in Redis. A graph database (Neo4j or ArangoDB) would let the system express richer relationships ("this student struggles with integration whenever it involves trigonometric substitution, but not u-substitution") and traverse the knowledge graph to find related weaknesses.
Dedicated vector database. Replacing the FAISS in-memory index with a persistent vector store (Pinecone, Weaviate, Qdrant, or Redis Vector Library with persistence) would make the RAG index survive server restarts. We deliberately avoided adding another database dependency to keep the stack simple โ Redis Stack already handles both JSON storage and vector search for the episodic LTM.
Cross-student knowledge graph. Aggregate anonymised mistake patterns across students to surface the most common errors for each topic: a teacher-facing dashboard showing "75% of students make sign errors when integrating by parts."
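That aggregation could be as simple as counting anonymised mistake records per topic. Field names here are hypothetical, not the actual `mistake_patterns` schema:

```python
from collections import Counter


def top_mistakes(records: list, topic: str) -> list:
    """Share of anonymised records exhibiting each mistake pattern for a topic."""
    relevant = [r for r in records if r["topic"] == topic]
    counts = Counter(r["mistake"] for r in relevant)
    return [(m, n / len(relevant)) for m, n in counts.most_common()]


records = [
    {"topic": "integration_by_parts", "mistake": "sign_error"},
    {"topic": "integration_by_parts", "mistake": "sign_error"},
    {"topic": "integration_by_parts", "mistake": "sign_error"},
    {"topic": "integration_by_parts", "mistake": "dropped_constant"},
]
print(top_mistakes(records, "integration_by_parts"))  # sign_error at 0.75
```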
Multi-modal output. Generate matplotlib/plotly figures server-side from structured solver output (coordinates, function definitions, geometric constructions) and embed them in the explanation. More reliable than image generation APIs for mathematical diagrams.
Adaptive difficulty. Use the semantic memory (strong/weak topics) to automatically adjust the difficulty of generated practice problems โ students who are strong in calculus get hard problems, students who struggle get medium ones with more scaffolding.
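The mapping from semantic memory to practice difficulty could start as simply as this (the difficulty tiers and field names are assumptions, not existing code):

```python
def pick_difficulty(topic: str, strong_topics: set, weak_topics: set) -> str:
    """Map the student's semantic-memory profile to a practice difficulty."""
    if topic in strong_topics:
        return "hard"
    if topic in weak_topics:
        return "medium+scaffolding"  # easier problems, more worked steps
    return "medium"                  # unseen topic: default difficulty


assert pick_difficulty("calculus", {"calculus"}, {"probability"}) == "hard"
assert pick_difficulty("probability", {"calculus"}, {"probability"}) == "medium+scaffolding"
```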
Curriculum sequencing. Track which topics have been covered across sessions and suggest what to study next based on known weaknesses and JEE syllabus dependencies.
Step-level feedback. Instead of just verifying the final answer, the verifier could identify exactly which step the student would likely get stuck on and generate a targeted micro-hint for that step.
Async Streamlit. Migrate to a proper async web framework (FastAPI + HTMX, or Streamlit's upcoming async support) to handle concurrent users without blocking.
Streaming explainer. The explainer currently returns a full structured output in one shot. Streaming token-by-token would improve perceived responsiveness for long explanations.
LangGraph persistence across deployments. Currently the RedisSaver TTL is 2 hours. For a production system, indefinite checkpointing with a separate archival policy would let students resume any past session from the sidebar.
| Layer | Technology |
|---|---|
| LLM inference | LLaMA 3.3 70B Versatile via Groq |
| Agent orchestration | LangGraph (StateGraph, interrupt, RedisSaver) |
| Frontend | Streamlit |
| Authentication | Google OAuth 2.0 via st.login() |
| Embeddings | Cohere embed-english-v3.0 (1024-dim) |
| Dense vector search (RAG) | FAISS IndexFlatIP (in-memory, cosine similarity) |
| Sparse retrieval | BM25Okapi (rank-bm25) |
| LTM vector index | RedisVL HNSW (cosine, FLOAT32, 1024-dim) |
| Database | Redis Stack (Redis + RedisJSON + RediSearch) |
| STM checkpointing | LangGraph RedisSaver |
| Token counting | tiktoken (gpt-4o encoding, local proxy for LLaMA) |
| PDF ingestion | LangChain PyPDFLoader + RecursiveCharacterTextSplitter |
| Web search | Tavily MCP (mcp.tavily.com, remote; no local server needed) |
| Symbolic calculator | SymPy |
| OCR | Google Cloud Vision API |
| ASR | Groq Whisper (whisper-large-v3) |
| Memory visualiser | vis.js 4.21 (via CDN) embedded in Streamlit components.html |
| Output schemas | Pydantic v2 with with_structured_output() |
| Service | Purpose | Get it at |
|---|---|---|
| Groq (×2 recommended) | LLaMA 3.3 70B for all agents | console.groq.com |
| Cohere | PDF chunk embeddings + episodic LTM embeddings | cohere.com |
| Tavily | Real-time web search via MCP | app.tavily.com |
| Google Cloud | Vision API (OCR) + OAuth 2.0 (auth) | console.cloud.google.com |
MIT License. See LICENSE for details.
Built with LangGraph, Streamlit, Groq, Cohere, and Redis Stack.