
[DANNA-001] add pykrx vendor and Parquet disk cache#744

Open
bbuduck wants to merge 35 commits into TauricResearch:main from hongsookim:DANNA-001

Conversation

@bbuduck bbuduck commented May 6, 2026

Summary

  • Add @simple_parquet_cache decorator — sha256-keyed Parquet cache with TTL,
    fail-open behavior, and atomic writes
  • Add cache_admin.py CLI — stats / clear subcommands
  • Add pykrx_vendor.py — direct KRX queries: OHLCV, point-in-time universe,
    trading flows by investor type, PER/PBR value factors
  • Register in interface.py — kr_market_data category, route_to_vendor integration
  • Add conftest.py + .env.example — includes a guide for local KRX_ID/KRX_PW authentication
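The sha256 key scheme the first bullet describes can be sketched as follows — a minimal illustration with an assumed key recipe (function name plus arguments), not the PR's actual implementation:

```python
import hashlib


def cache_key(func_name: str, *args, **kwargs) -> str:
    """Derive a stable sha256 cache key from a function name and its arguments."""
    # repr() of sorted kwargs keeps the key stable across call-site ordering.
    payload = repr((func_name, args, sorted(kwargs.items())))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Identical inputs yield identical keys, so a cached Parquet file can be located by hash alone.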

New tools via route_to_vendor

tool                      vendor   description
get_stock_data            pykrx    KRX OHLCV (no authentication required)
get_kr_universe           pykrx    point-in-time KOSPI/KOSDAQ ticker list
get_kr_investor_trading   pykrx    foreign/institutional/retail trading flows
get_kr_value_factors      pykrx    PER, PBR, EPS, BPS, DIV, DPS

Test plan

  • pytest tests/test_dataflows_cache.py — 6 cache unit tests (no network
    required)
  • RUN_NETWORK_TESTS=1 pytest tests/test_pykrx_vendor.py — 9 passing
    (3 unit + 4 smoke + 2 integration)

Setup

Create a .env file based on .env.example:
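A minimal .env sketch — the variable names come from the commit documenting .env.example; the values are placeholders:

```dotenv
# KRX credentials, required since 2025-12-27 for the pykrx
# universe / investor-trading / value-factors endpoints
KRX_ID=your_krx_id
KRX_PW=your_krx_password
```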

Herald and others added 30 commits March 8, 2026 21:31
Add DART (Data Analysis, Retrieval and Transfer) financial statements
and disclosure data as a new vendor for the Fundamentals Analyst.
Korean-listed companies (6-digit ticker codes) can now be analyzed
with official DART filings including revenue, operating profit,
net income, and recent regulatory disclosures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce configurable investment personas that shape decision-making
style of Trader, Research Manager, and Risk Manager agents.
Analysts remain objective for unbiased data gathering.

- Add personas.py with 3 personas × 3 roles = 9 prompt definitions
- Add config["persona"] field (None/warren_buffett/ray_dalio/peter_lynch)
- Wire persona through TradingAgentsGraph → GraphSetup → agent creators
- Add opendartreader to pyproject.toml dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Korea Investment & Securities REST API integration enabling automated
trade execution after agent analysis. Supports both paper trading and real
trading with multi-layer safety guards including position limits, order
amount caps, daily loss limits, and market hours enforcement.

- Add tradingagents/execution/ package with abstract BaseBroker interface
- Implement KISBroker with token management and rate limiting
- Add ExecutionEngine with SafetyGuard orchestration
- Add conditional Executor node to LangGraph (Risk Judge → Executor → END)
- Inject portfolio context into Trader agent prompt
- Add broker configuration to default_config and CLI (Step 9)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create CLAUDE.md with project architecture, config system, development
  patterns, and coding conventions for AI assistant context
- Update README with Investment Persona section (Buffett, Dalio, Lynch)
- Update README with Broker Execution (KIS) section including setup guide,
  safety guards table, and architecture overview
- Add All Configuration Options reference table
- Update Required APIs section with KIS credentials
- Add persona and broker examples to Python usage section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Translate full README.md to Korean including all sections:
framework overview, installation, CLI usage, Python examples,
OpenDART, investment personas, KIS broker execution, safety
guards, and configuration reference. Link from English README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
80 tests covering models, SafetyGuard, ExecutionEngine, KISBroker, and persona system.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…and trading graph

35 additional tests covering KISClient HTTP/token management, GraphSetup executor
node wiring, SignalProcessor LLM interaction, and TradingAgentsGraph broker
initialization with portfolio context injection. Total: 115 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… Anthropic

- Add 6-phase pipeline diagram, agent role table, and data source table
- Change default llm_provider from openai to anthropic (claude-sonnet-4-6, claude-haiku-4-5)
- Update config table and Python examples to reflect new defaults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e LLM support

Add Groq and Together AI as new LLM providers using OpenAI-compatible APIs,
and add Llama/DeepSeek models to Ollama options for local inference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- OpenAI: add GPT-5.4 series, o3-pro, o1-pro; remove deprecated GPT-5.2/5.1, GPT-4o
- Anthropic: add Claude Opus 4.7, Opus 4.6, Sonnet 4.6; remove legacy 3.x models
- Google: add Gemini 3.1 Pro, 3.1 Flash Lite; remove shutdown 3 Pro, deprecated 2.0
- xAI: add Grok 4.20 series (reasoning/non-reasoning/multi-agent); remove old Grok 4
- Groq: add GPT-OSS 120B/20B, Qwen3 32B; remove deprecated Mixtral, Gemma, SpecDec
- Together: add DeepSeek V3.1, Qwen3.5 397B/9B; remove DeepSeek V3, Qwen2.5
- Fix reasoning model detection to include o4-series in openai_client.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Research#555, TauricResearch#553)

YFRateLimitError was crashing analyses because yfinance calls had no
retry logic, no caching (except OHLCV CSV), and rate-limit errors were
swallowed by broad except clauses before reaching the vendor fallback.

Changes:
- Add yfinance_utils.py with exponential backoff retry decorator and
  thread-safe TTL in-memory cache (config-driven, no new dependencies)
- Apply @yfinance_retry + @yfinance_cached to all yfinance data functions
  (fundamentals, balance sheet, cashflow, income statement, insider txns, news)
- Stop swallowing YFRateLimitError in all except clauses — let it propagate
- Extend route_to_vendor() fallback to catch YFRateLimitError alongside
  AlphaVantageRateLimitError for automatic vendor failover
- Add yfinance_retry and cache_ttl config sections to DEFAULT_CONFIG
- Add 12 unit tests covering retry, backoff, cache hit/miss/expiry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
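The retry decorator this commit describes can be sketched as follows — a minimal stand-in with assumed parameters (attempt count, base delay), not the PR's config-driven implementation:

```python
import functools
import time


def retry_with_backoff(max_attempts=3, base_delay=0.5, exceptions=(Exception,)):
    """Retry the wrapped function with exponential backoff on the given exceptions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts - 1:
                        raise  # out of retries: propagate, don't swallow
                    time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
        return wrapper
    return decorator
```

Raising on the final attempt (rather than returning None) is what lets a rate-limit error reach the vendor-fallback layer described above.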
…h#542)

Pre-compute all 12 technical indicators in pure Python code before
passing to Market Analyst LLM, eliminating 8-16 tool call round trips
per analysis. LLM now focuses on interpretation only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add news entries for indicator calculation separation (TauricResearch#542),
yfinance rate limit fix (TauricResearch#555), and LLM model updates. Update
default config values and Technical Analyst description.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…testResult)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…anagement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… queries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ns, trade history

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire the backtest engine, trade tracker, and dashboard builder together
with end-to-end integration tests and a convenience CLI entry point.
Fix monthly_returns key mismatch (return -> return_pct) between
PerformanceCalculator and DashboardBuilder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- I1: TradeRecord.from_dict() filters unknown keys for forward compat
- I2: Sharpe ratio docstring clarifies trade-based approximation
- I3: Extract shared state helpers to backtest/state_utils.py (DRY)
- I4: Add encoding="utf-8" to tracker file operations
- I5: Fix total_trades/win_rate to count only closed trades

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Outer try/except in simple_parquet_cache previously caught wrapped
function exceptions and silently retried, doubling API calls and
hiding the original error. Restructure so func() runs outside all
try blocks; only cache infra failures (config read, mkdir, parquet
I/O) fail-open. Add regression test.
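The restructuring described here — the wrapped call running outside every try block so only cache-infrastructure failures fail open — can be sketched like this (an in-memory store stands in for the Parquet layer; names are illustrative):

```python
import functools


def fail_open_cache(store: dict):
    """Cache reads/writes may fail silently; the wrapped call itself is never retried."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            try:
                if args in store:      # cache read: fail-open
                    return store[args]
            except Exception:
                pass
            result = func(*args)       # outside any try: errors propagate once
            try:
                store[args] = result   # cache write: fail-open
            except Exception:
                pass
            return result
        return wrapper
    return decorator
```

Because `func()` sits outside the try blocks, an upstream API error surfaces immediately instead of triggering a silent second call.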
…func

A malicious or mistaken --kind '../../etc' would let shutil.rmtree
delete arbitrary directories outside the cache root. Resolve target
and verify is_relative_to(root) before any rmtree. Also expand the
abbreviated copyright line to the full Apache 2.0 header per project
convention.
_yyyymmdd previously stripped dashes only, silently producing wrong
output for non-zero-padded inputs like '2024-1-2'. Now uses strptime
to validate format. Top-level 'from pykrx import stock' is moved
inside _fetch_pykrx_ohlcv so the dataflows package can be imported
on machines without pykrx (Task 7 will register this vendor in
interface.py). Add unit tests for the date helper.
bbuduck and others added 5 commits May 5, 2026 16:21
Add python-dotenv dependency and a root conftest.py that calls
load_dotenv() before any test collection. Document KRX_ID/KRX_PW
in .env.example with a note that they're required for pykrx
universe/investor-trading/value-factors endpoints since 2025-12-27.
.env itself stays gitignored (already in .gitignore).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gemini-code-assist (bot) left a comment


Code Review

This pull request significantly expands the TradingAgents framework by introducing a comprehensive backtesting engine, a performance tracking system, and an HTML dashboard for results visualization. Key additions include support for the Korean market via OpenDART and pykrx, real-time trade execution through the KIS broker with multi-layer safety guards, and an investment persona system. Architectural enhancements feature a disk-persisted Parquet cache, improved yfinance rate-limit handling, and a two-layer technical analysis approach.

Feedback from the review highlights several critical technical issues: a concurrency bottleneck in the rate limiter caused by holding a lock during sleep, a memory leak in the cache path-locking mechanism, inefficient object instantiation in the OpenDART provider, redundant data processing in the technical calculator, and potential race conditions in file caching.

Comment on lines +36 to +50
def acquire(self):
    with self._lock:
        now = time.monotonic()
        # Remove timestamps outside the window
        self._timestamps = [
            t for t in self._timestamps if now - t < self.period
        ]
        if len(self._timestamps) >= self.max_calls:
            sleep_time = self.period - (now - self._timestamps[0])
            if sleep_time > 0:
                time.sleep(sleep_time)
                self._timestamps = [
                    t for t in self._timestamps if time.monotonic() - t < self.period
                ]
        self._timestamps.append(time.monotonic())

high

The RateLimiter.acquire method holds the instance lock while sleeping. In a multi-threaded environment, this serializes all requests and prevents other threads from even checking the rate limit status, effectively negating the benefit of a sliding window for concurrent callers. The sleep should occur outside the lock.
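One way to restructure `acquire()` so the sleep happens outside the lock — a sketch re-checking the window in a loop, with the class shape assumed from the snippet above rather than taken from the PR:

```python
import threading
import time


class RateLimiter:
    """Sliding-window limiter: compute the wait under the lock, sleep outside it."""

    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self._timestamps: list[float] = []
        self._lock = threading.Lock()

    def acquire(self) -> None:
        while True:
            with self._lock:
                now = time.monotonic()
                # Drop timestamps that have left the sliding window.
                self._timestamps = [t for t in self._timestamps if now - t < self.period]
                if len(self._timestamps) < self.max_calls:
                    self._timestamps.append(now)
                    return
                sleep_time = self.period - (now - self._timestamps[0])
            # Lock released here: other threads can check the window while we wait.
            time.sleep(max(sleep_time, 0.0))
```

The loop re-validates after waking, since another thread may have claimed the freed slot in the meantime.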

Comment on lines +38 to +49
_path_locks: dict[str, Lock] = {}
_path_locks_master = Lock()


def _get_path_lock(path: str) -> Lock:
    """Per-path in-process lock so concurrent calls don't collide on disk."""
    with _path_locks_master:
        lock = _path_locks.get(path)
        if lock is None:
            lock = Lock()
            _path_locks[path] = lock
        return lock

medium

The _path_locks dictionary grows indefinitely as new unique file paths are accessed, leading to a memory leak. Since each entry is a threading.Lock object associated with a specific cache file, this can become significant over long-running sessions with many tickers or dates. Consider using a weakref.WeakValueDictionary to allow locks to be garbage collected when no longer in use.

Comment on lines +21 to +32
def _get_dart():
    """Get OpenDartReader instance (lazy import)."""
    try:
        import OpenDartReader
    except ImportError:
        raise ImportError("opendartreader package required: pip install opendartreader")

    api_key = os.environ.get("OPENDART_API_KEY", "")
    if not api_key:
        raise ValueError("OPENDART_API_KEY environment variable is not set.")

    return OpenDartReader(api_key)

medium

The _get_dart function creates a new OpenDartReader instance on every call. This is inefficient as it involves redundant initialization and potentially multiple authentication checks. It should be implemented as a singleton or cached at the module level.
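A module-level cache via `functools.lru_cache` would avoid re-instantiation — sketched here with a stand-in client class, since `OpenDartReader` itself is not importable everywhere:

```python
import os
from functools import lru_cache


class _FakeDartClient:
    """Stand-in for OpenDartReader; the real class comes from opendartreader."""
    def __init__(self, api_key: str):
        self.api_key = api_key


@lru_cache(maxsize=1)
def get_dart():
    """Build the client once; later calls return the same instance."""
    api_key = os.environ.get("OPENDART_API_KEY", "")
    if not api_key:
        raise ValueError("OPENDART_API_KEY environment variable is not set.")
    return _FakeDartClient(api_key)
```

A nice property of `lru_cache` here: if construction raises (e.g. missing key), nothing is cached, so a later call with the variable set succeeds.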

Comment on lines +79 to +81
for ind_key in DEFAULT_INDICATORS:
    try:
        indicator_data[ind_key] = _get_stock_stats_bulk(symbol, ind_key, curr_date)

medium

This loop calls _get_stock_stats_bulk for each indicator individually. If the underlying implementation fetches or processes the entire stock history for each call, this results in significant redundant work (e.g., multiple CSV reads or API calls). It is recommended to fetch the full technical dataset once and then extract the required indicators from the resulting DataFrame.
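The fetch-once pattern the reviewer recommends can be sketched with pandas and toy rolling-mean "indicators" — a stand-in, not the project's `_get_stock_stats_bulk`:

```python
import pandas as pd

FETCHES = {"count": 0}


def fetch_history(symbol: str) -> pd.DataFrame:
    """Stand-in for a CSV/API read; counts how often history is loaded."""
    FETCHES["count"] += 1
    return pd.DataFrame({"close": [10.0, 11.0, 12.0, 11.5, 13.0, 12.5]})


def compute_indicators(symbol: str, windows=(2, 3)) -> dict[str, pd.Series]:
    """Load history once, then derive every indicator from the same DataFrame."""
    df = fetch_history(symbol)  # single fetch feeds all indicators
    return {f"sma_{w}": df["close"].rolling(w).mean() for w in windows}
```

However many indicators are requested, the underlying history is read exactly once.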

    and isinstance(result, pd.DataFrame)
    and not result.empty
):
    tmp = path.with_suffix(".parquet.tmp")

medium

Using a fixed temporary filename (.parquet.tmp) can lead to race conditions if multiple processes or threads attempt to write to the same cache entry simultaneously. While the in-process lock mitigates this for threads, it does not protect against multiple processes. Including the PID or a unique identifier in the temporary filename would be safer.

