security: fix prompt injection, thread safety, and supply-chain risks #759
kennydream wants to merge 5 commits into TauricResearch:main from
Conversation
HIGH severity:
- agent_utils: validate company_of_interest via safe_ticker_component before embedding in the system prompt (blocks path-traversal and injection payloads)
- news/social analysts: add EXTERNAL_CONTENT_GUARD to system prompts so the model treats tool-returned content as untrusted data, not commands
- yfinance_news: wrap each article in EXTERNAL_DATA_START/END delimiters
- memory: sanitize and truncate free-text fields (_sanitize_text, 4000-char cap, strip HTML comment delimiters) before writing to the log; wrap get_past_context() output in MEMORY_LOG_START/END so injection payloads stored in past reflections cannot escape the data boundary

MEDIUM severity:
- checkpointer: add a per-DB threading.Lock so concurrent runs of the same ticker cannot interleave SQLite writes (check_same_thread was already False)
- factory: call validate_model() at client creation and emit a UserWarning for unknown model names (catches stale default config values early)
- y_finance: replace bare str(e) in exception returns with a generic "data unavailable" message to prevent internal details from leaking into LLM prompts

LOW severity:
- pyproject.toml: add upper-bound version pins on all dependencies to reduce the supply-chain blast radius of a malicious patch release
- cli/main: emit a UserWarning when .env.enterprise is absent so silent misconfiguration is surfaced at startup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
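The per-DB lock described for the checkpointer can be sketched roughly as below. This is an illustrative reconstruction, not the PR's actual `checkpointer.py`; the function name, table schema, and lock registry are assumptions. The point is that `check_same_thread=False` only disables sqlite3's thread-affinity check — it provides no mutual exclusion, so an external lock keyed by database path is still needed to stop concurrent runs from interleaving writes.

```python
import sqlite3
import threading
from collections import defaultdict

# One lock per database file. defaultdict creation is effectively safe under
# the CPython GIL; a stricter implementation would guard registry access too.
_db_locks: dict = defaultdict(threading.Lock)

def write_checkpoint(db_path: str, key: str, payload: str) -> None:
    """Serialize writes to one SQLite file across threads.

    check_same_thread=False lets threads share connections, but SQLite
    writes still need external locking to avoid interleaving.
    """
    with _db_locks[db_path]:
        conn = sqlite3.connect(db_path, check_same_thread=False)
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS checkpoints "
                "(key TEXT PRIMARY KEY, payload TEXT)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO checkpoints (key, payload) VALUES (?, ?)",
                (key, payload),
            )
            conn.commit()
        finally:
            conn.close()
```

A per-path lock (rather than one global lock) keeps runs against different tickers' databases fully concurrent while serializing only the conflicting case.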
Code Review
This pull request introduces several security and stability enhancements: dependency pinning with upper bounds, prompt-injection guards for external data, per-database threading locks for SQLite, early model validation in the LLM factory, and ticker validation. Review feedback highlights a remaining vulnerability: external news content is wrapped in HTML-comment delimiters without sanitization, so a payload containing the closing delimiter can still perform prompt injection. Reviewers also flagged multiple places where exceptions are caught and returned as generic messages without being logged, which hinders observability.
| news_str += "<!-- EXTERNAL_DATA_START -->\n" | ||
| news_str += f"### {data['title']} (source: {data['publisher']})\n" | ||
| if data["summary"]: | ||
| news_str += f"{data['summary']}\n" | ||
| if data["link"]: | ||
| news_str += f"Link: {data['link']}\n" | ||
| news_str += "\n" | ||
| news_str += "<!-- EXTERNAL_DATA_END -->\n\n" |
The external content (title, publisher, summary) is interpolated into the prompt within <!-- EXTERNAL_DATA_START --> delimiters without sanitization. If the fetched content contains the closing delimiter <!-- EXTERNAL_DATA_END -->, it can "escape" the untrusted data block and perform a prompt injection attack. This bypasses the security boundary intended by the delimiters.
| news_str += "<!-- EXTERNAL_DATA_START -->\n" | |
| news_str += f"### {data['title']} (source: {data['publisher']})\n" | |
| if data["summary"]: | |
| news_str += f"{data['summary']}\n" | |
| if data["link"]: | |
| news_str += f"Link: {data['link']}\n" | |
| news_str += "\n" | |
| news_str += "<!-- EXTERNAL_DATA_END -->\n\n" | |
| news_str += "<!-- EXTERNAL_DATA_START -->\n" | |
| news_str += f"### {data['title'].replace('<!--', '< !--').replace('-->', '-- >')} (source: {data['publisher'].replace('<!--', '< !--').replace('-->', '-- >')})\n" | |
| if data["summary"]: | |
| news_str += f"{data['summary'].replace('<!--', '< !--').replace('-->', '-- >')}\n" | |
| if data["link"]: | |
| news_str += f"Link: {data['link']}\n" | |
| news_str += "<!-- EXTERNAL_DATA_END -->\n\n" |
| news_str += "<!-- EXTERNAL_DATA_START -->\n" | ||
| news_str += f"### {title} (source: {publisher})\n" | ||
| if summary: | ||
| news_str += f"{summary}\n" | ||
| if link: | ||
| news_str += f"Link: {link}\n" | ||
| news_str += "\n" | ||
| news_str += "<!-- EXTERNAL_DATA_END -->\n\n" |
Similar to the issue in get_news_yfinance, the global news content is not sanitized before being wrapped in delimiters. This allows malicious external content to escape the security boundary by including the <!-- EXTERNAL_DATA_END --> string.
| news_str += "<!-- EXTERNAL_DATA_START -->\n" | |
| news_str += f"### {title} (source: {publisher})\n" | |
| if summary: | |
| news_str += f"{summary}\n" | |
| if link: | |
| news_str += f"Link: {link}\n" | |
| news_str += "\n" | |
| news_str += "<!-- EXTERNAL_DATA_END -->\n\n" | |
| news_str += "<!-- EXTERNAL_DATA_START -->\n" | |
| news_str += f"### {title.replace('<!--', '< !--').replace('-->', '-- >')} (source: {publisher.replace('<!--', '< !--').replace('-->', '-- >')})\n" | |
| if summary: | |
| news_str += f"{summary.replace('<!--', '< !--').replace('-->', '-- >')}\n" | |
| if link: | |
| news_str += f"Link: {link}\n" | |
| news_str += "<!-- EXTERNAL_DATA_END -->\n\n" |
```python
except Exception:
    return f"Error fetching news for {ticker}: data unavailable"
```
```python
except Exception:
    return f"Error retrieving fundamentals for {ticker}: data unavailable"
```
…dening

Performance (Batch 1):
- stockstats_utils: cache the wrapped DataFrame per (symbol, curr_date) so 8 indicator computations in one analyst pass share one read+wrap cycle instead of 8 (estimated 5-8x speedup on the technical-indicators path).
- stockstats_utils: stable cache filename (no embedded today date) with mtime-based 24h refresh; eliminates daily cache churn that re-downloaded ~5 years of OHLCV every calendar day. Docstring/code drift fixed (5y).
- interface: add a per-process result cache to route_to_vendor so news+social analysts requesting the same get_news payload deduplicate. Cleared at the start of every propagate() so different runs never share state.
- memory: add an mtime-based parsed-entries cache so multiple load_entries() calls per propagate() (get_pending_entries + get_past_context) skip the read + split + regex-parse cycle.
- trading_graph: _resolve_pending_entries now also sweeps cross-ticker pending entries older than 30 days so abandoned tickers don't accumulate pending entries forever.
- stockstats_utils: yf_retry now raises explicitly if its loop exits without returning, removing an implicit-None footgun.

Architecture (Batch 2):
- New tradingagents/runtime.py exposing get_runtime_config / set_runtime_config / reset_context_config. Backed by a ContextVar so concurrent analyses can have isolated configs, and a threading.Lock for the shared default. dataflows/config.py kept as a thin compatibility wrapper delegating to the new module.
- Replace 'from tradingagents.agents import *' in graph/setup.py with an explicit import list. Remove the same star-import from trading_graph.py where it was completely unused.
- Replace 'from cli.utils import *' in cli/main.py with explicit imports.
- agent_utils.get_language_instruction now imports from runtime (no longer reaches into a 'dataflows' module from the agents layer).
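The ContextVar-backed runtime config described for Batch 2 can be sketched as follows. The function names mirror the commit message (get_runtime_config / set_runtime_config / reset_context_config), but the config shape and internals are assumptions: a ContextVar carries a per-context override, and a lock-guarded module default serves contexts with no override.

```python
import threading
from contextvars import ContextVar, copy_context

# Shared default, guarded by a lock; per-context overrides live in a
# ContextVar so concurrent analyses see isolated configs.
_default_config: dict = {"llm_timeout": 120}
_default_lock = threading.Lock()
_context_config = ContextVar("runtime_config", default=None)

def get_runtime_config() -> dict:
    """Return the current context's override, else a copy of the default."""
    override = _context_config.get()
    if override is not None:
        return override
    with _default_lock:
        return dict(_default_config)

def set_runtime_config(config: dict) -> None:
    """Install an override visible only to the current context."""
    _context_config.set(config)

def reset_context_config() -> None:
    """Drop the current context's override, falling back to the default."""
    _context_config.set(None)
```

Running each analysis inside `copy_context()` (or a separate task/thread with its own context) keeps one run's `set_runtime_config` invisible to the others, which is the isolation property the commit message claims.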
Security (Batch 3):
- alpha_vantage_common: scrub the API key from any HTTPError / RequestException before re-raising, so an upstream 4xx/5xx never leaks the key into stack traces. Add 5/30s connect/read timeouts to outbound HTTP.
- cli/utils.get_ticker now validates input via safe_ticker_component at the prompt, so unsafe tickers (path traversal, control chars) are rejected immediately rather than after a deep stack trace.
- default_config: add llm_timeout=120 default so a stuck provider can't hang the entire propagate() forever.

Code quality (Batch 4):
- Replace print() with logger.warning/exception in alpha_vantage_common, alpha_vantage_indicator, y_finance, dataflows/utils.
- Remove dead code (decorate_all_methods, get_next_weekday, unused json import) from dataflows/utils.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
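The key-scrubbing re-raise in Batch 3 can be sketched like this. Everything here is illustrative: `_http_get` is a stand-in transport (the real code wraps requests' HTTPError / RequestException), and the exception class name is an assumption. The essential move is replacing the original exception with one whose message has the secret redacted, and suppressing the chained traceback so the key cannot resurface via `__cause__`.

```python
class UpstreamError(RuntimeError):
    """Raised in place of the transport error, with the API key redacted."""

def scrub(text: str, secret: str) -> str:
    """Remove the secret anywhere it appears (URLs, echoed headers)."""
    return text.replace(secret, "***REDACTED***") if secret else text

def _http_get(url: str) -> str:
    # Stand-in transport that fails, echoing the URL the way an HTTP
    # client library's error message typically would.
    raise ConnectionError(f"502 Bad Gateway for url: {url}")

def call_vendor(url_template: str, api_key: str) -> str:
    try:
        return _http_get(url_template.format(key=api_key))
    except Exception as e:
        # 'from None' drops the chained original, which still holds the key.
        raise UpstreamError(scrub(str(e), api_key)) from None
```

Note the `from None`: without it, the original exception (key included) would still print as the `__context__` of the new one.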
- tests/test_memory_log: replace test_resolve_skips_other_tickers (which
asserted the prior bug behavior — pending entries for other tickers
never resolving) with two tests that match the new behavior:
* test_resolve_skips_recent_other_tickers — recent cross-ticker entries
stay pending (no premature resolution)
* test_resolve_sweeps_stale_other_tickers — entries older than
_STALE_PENDING_DAYS get swept regardless of current ticker
Both tests propagate the _STALE_PENDING_DAYS class attribute onto the
MagicMock so the unbound-method call sees a real int.
- pyproject.toml: widen langchain ecosystem upper bounds to <2.0 (was
<0.5) to match what the langchain-google-genai dependency chain actually
pulls in (langchain-core>=1.2.5). Bound is still useful as a major-version
cap; the previous pinning was overzealous and broke fresh installs.
Full test suite (109 tests) passes. Smoke tests for runtime config,
ContextVar isolation, ticker validation, vendor cache, memory log cache +
sanitization, API key redaction, and model validator warnings all pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The picker fetched OpenRouter's full model list and showed the top 5 by recency. The first slot was usually a fresh free-tier model (baidu/cobuddy:free was top during local testing) — these share an upstream rate-limit pool, so users who hit Enter on the default crashed mid-analysis with a 429. We hit this in real use after spending tokens on four analyst reports.

Now the picker:
* Filters out :free variants entirely (still reachable via Custom ID).
* Surfaces deepseek / anthropic / openai / google / x-ai / mistralai / meta-llama / qwen first, in declared priority order.
* Caps each priority provider at 2 entries so a chatty provider (deepseek has 11+ variants on OpenRouter) doesn't crowd out the others.
* Within each provider, preserves the API's newest-first order rather than re-sorting alphabetically — alphabetical would surface "gpt-3.5-turbo" before "gpt-5.4", actively misleading users.
* Priority-provider overflow stays hidden (not appended to the long tail) so the picker can't grow stale variants.

The 8 visible slots now look like:
1-2 deepseek/... (two latest)
3-4 anthropic/... (two latest)
5-6 openai/... (two latest)
7-8 google/... (two latest)
followed by Custom ID for power users.

Tests: tests/test_openrouter_curation.py covers free-filter, priority ordering, per-provider cap, API-order preservation, and the long-tail behavior. 10 new tests; full suite 119 passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
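The curation rules in that commit message compose into a small pure function. This is a sketch under stated assumptions — model IDs take the `provider/model` form OpenRouter uses, the input list is already in the API's newest-first order, and the names `curate` / `PRIORITY` are hypothetical, not the repository's actual identifiers.

```python
PRIORITY = [
    "deepseek", "anthropic", "openai", "google",
    "x-ai", "mistralai", "meta-llama", "qwen",
]
PER_PROVIDER_CAP = 2

def curate(model_ids: list) -> list:
    """Apply the picker's rules to a newest-first model list:
    drop :free variants, surface priority providers in declared order
    (capped, preserving API order within each), hide priority overflow,
    and keep non-priority models as the long tail."""
    paid = [m for m in model_ids if not m.endswith(":free")]
    picked = []
    for provider in PRIORITY:
        matches = [m for m in paid if m.split("/", 1)[0] == provider]
        picked.extend(matches[:PER_PROVIDER_CAP])  # overflow is dropped
    tail = [m for m in paid if m.split("/", 1)[0] not in PRIORITY]
    return picked + tail
```

Keeping the within-provider slice in input order is what preserves the API's newest-first ranking; no sort call appears anywhere, by design.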
Reports were emerging in Simplified Chinese even when "Chinese" was selected. Two root causes:
1. The picker offered a single "Chinese (中文)" entry that mapped to the bare string "Chinese" — Mandarin-trained LLMs (DeepSeek, Qwen, GLM) silently default to Simplified for that input.
2. The Trader agent's system prompt didn't include the language instruction, so its section of the report came back in whatever language the model happened to pick (often English or mixed).

Changes:
* cli/utils.ask_output_language: split Chinese into "Traditional Chinese (繁體中文)" and "Simplified Chinese (简体中文)". Bare "Chinese" no longer appears in the picker.
* agent_utils.get_language_instruction: explicit alias resolution. Traditional aliases (Traditional Chinese, 繁體中文, zh-TW, Hong Kong Chinese, ...) emit a firm "use ONLY Traditional, do NOT use Simplified" instruction. Simplified aliases get the inverse. Bare "Chinese" / "中文" resolves to Traditional (project default for ambiguous input — users wanting Simplified pick it explicitly).
* trader.py: include get_language_instruction in the system prompt. Trader output is part of the user-visible report (saved to trader.md, displayed in console) and was the only user-facing agent missing this.

Tests: tests/test_language_instruction.py covers English (empty), Traditional via every alias form including locale codes (zh-TW, zh-HK) and native script, the Simplified inverse, the bare-Chinese fallback, and the generic template for non-Chinese languages. 11 new tests; full suite 130 passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
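The alias-resolution logic reads naturally as a set lookup over normalized input. A minimal sketch — the alias sets and instruction wording are assumptions extrapolated from the commit message (e.g. zh-CN as a Simplified alias is my addition), not the repository's actual tables:

```python
_TRADITIONAL = {"traditional chinese", "繁體中文", "zh-tw", "zh-hk", "hong kong chinese"}
_SIMPLIFIED = {"simplified chinese", "简体中文", "zh-cn"}  # zh-cn assumed
_AMBIGUOUS = {"chinese", "中文"}  # project default: resolve to Traditional

def get_language_instruction(language: str) -> str:
    """Map a user-facing language choice to a firm prompt instruction."""
    key = language.strip().lower()
    if not key or key == "english":
        return ""  # English output needs no instruction
    if key in _TRADITIONAL or key in _AMBIGUOUS:
        return ("Write the report in Traditional Chinese (繁體中文). "
                "Use ONLY Traditional characters; do NOT use Simplified Chinese.")
    if key in _SIMPLIFIED:
        return ("Write the report in Simplified Chinese (简体中文). "
                "Use ONLY Simplified characters; do NOT use Traditional Chinese.")
    return f"Write the report in {language}."  # generic template
```

The explicit "do NOT use Simplified" negative is the load-bearing part: a bare "write in Chinese" is exactly the ambiguous input that Mandarin-trained models resolve to Simplified.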
Summary
This PR addresses 3 HIGH and 6 MEDIUM/LOW severity security issues found during a security audit of the codebase. All changes are purely defensive — no analysis logic, agent behavior, memory structure, or checkpoint format has been altered.
HIGH severity
- company_of_interest: The ticker/company name entered by the user was interpolated directly into agent system prompts without sanitization. Now validated through safe_ticker_component() before reaching any LLM context.
- External news content: added EXTERNAL_CONTENT_GUARD to analyst system prompts and wrapped each article in <!-- EXTERNAL_DATA_START/END --> delimiters so the model treats fetched content as untrusted data.
- Memory log: get_past_context() output is wrapped in <!-- MEMORY_LOG_START/END --> markers.

MEDIUM severity
- SQLite checkpointer: check_same_thread=False was set without any external locking. Added a per-DB threading.Lock in checkpointer.py to prevent concurrent write interleaving.
- Model validation: create_llm_client() now calls validate_model() at construction time and emits a UserWarning for unrecognized names.
- Error leakage: str(e) was returned verbatim from dataflow functions and flowed directly into LLM tool-call results. Replaced with generic "data unavailable" messages; details remain in logs only.

LOW severity
- Dependency pinning: upper bounds added (e.g. langchain-core>=0.3.81,<0.5) to reduce the supply-chain blast radius of a malicious patch release.
- .env.enterprise absence: the file was loaded with no warning when missing. Startup now emits a UserWarning if .env.enterprise is not found.

Test plan
- pytest tests/ passes
- company_of_interest with an injection payload raises ValueError before reaching the prompt
- Fetched articles are wrapped in EXTERNAL_DATA_START/END markers
- get_past_context() output carries the MEMORY_LOG_START/END wrapper
- Unknown model names emit a UserWarning

🤖 Generated with Claude Code