[DANNA-001] add pykrx vendor and Parquet disk cache #744
bbuduck wants to merge 35 commits into TauricResearch:main
Conversation
Add DART (Data Analysis, Retrieval and Transfer) financial statements and disclosure data as a new vendor for the Fundamentals Analyst. Korean-listed companies (6-digit ticker codes) can now be analyzed with official DART filings including revenue, operating profit, net income, and recent regulatory disclosures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce configurable investment personas that shape the decision-making style of the Trader, Research Manager, and Risk Manager agents. Analysts remain objective for unbiased data gathering.
- Add personas.py with 3 personas × 3 roles = 9 prompt definitions
- Add config["persona"] field (None/warren_buffett/ray_dalio/peter_lynch)
- Wire persona through TradingAgentsGraph → GraphSetup → agent creators
- Add opendartreader to pyproject.toml dependencies
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Korea Investment & Securities (한국투자증권) REST API integration enabling automated trade execution after agent analysis. Supports both paper trading (모의투자) and real trading (실투자) with multi-layer safety guards including position limits, order amount caps, daily loss limits, and market-hours enforcement.
- Add tradingagents/execution/ package with abstract BaseBroker interface
- Implement KISBroker with token management and rate limiting
- Add ExecutionEngine with SafetyGuard orchestration
- Add conditional Executor node to LangGraph (Risk Judge → Executor → END)
- Inject portfolio context into Trader agent prompt
- Add broker configuration to default_config and CLI (Step 9)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create CLAUDE.md with project architecture, config system, development patterns, and coding conventions for AI assistant context
- Update README with Investment Persona section (Buffett, Dalio, Lynch)
- Update README with Broker Execution (KIS) section including setup guide, safety guards table, and architecture overview
- Add All Configuration Options reference table
- Update Required APIs section with KIS credentials
- Add persona and broker examples to Python usage section
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Translate full README.md to Korean including all sections: framework overview, installation, CLI usage, Python examples, OpenDART, investment personas, KIS broker execution, safety guards, and configuration reference. Link from English README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
80 tests covering models, SafetyGuard, ExecutionEngine, KISBroker, and persona system. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…and trading graph 35 additional tests covering KISClient HTTP/token management, GraphSetup executor node wiring, SignalProcessor LLM interaction, and TradingAgentsGraph broker initialization with portfolio context injection. Total: 115 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… Anthropic
- Add 6-phase pipeline diagram, agent role table, and data source table
- Change default llm_provider from openai to anthropic (claude-sonnet-4-6, claude-haiku-4-5)
- Update config table and Python examples to reflect new defaults
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e LLM support Add Groq and Together AI as new LLM providers using OpenAI-compatible APIs, and add Llama/DeepSeek models to Ollama options for local inference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- OpenAI: add GPT-5.4 series, o3-pro, o1-pro; remove deprecated GPT-5.2/5.1, GPT-4o
- Anthropic: add Claude Opus 4.7, Opus 4.6, Sonnet 4.6; remove legacy 3.x models
- Google: add Gemini 3.1 Pro, 3.1 Flash Lite; remove shutdown 3 Pro, deprecated 2.0
- xAI: add Grok 4.20 series (reasoning/non-reasoning/multi-agent); remove old Grok 4
- Groq: add GPT-OSS 120B/20B, Qwen3 32B; remove deprecated Mixtral, Gemma, SpecDec
- Together: add DeepSeek V3.1, Qwen3.5 397B/9B; remove DeepSeek V3, Qwen2.5
- Fix reasoning model detection to include o4-series in openai_client.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Research#555, TauricResearch#553) YFRateLimitError was crashing analyses because yfinance calls had no retry logic, no caching (except OHLCV CSV), and rate-limit errors were swallowed by broad except clauses before reaching the vendor fallback.
Changes:
- Add yfinance_utils.py with exponential backoff retry decorator and thread-safe TTL in-memory cache (config-driven, no new dependencies)
- Apply @yfinance_retry + @yfinance_cached to all yfinance data functions (fundamentals, balance sheet, cashflow, income statement, insider txns, news)
- Stop swallowing YFRateLimitError in all except clauses — let it propagate
- Extend route_to_vendor() fallback to catch YFRateLimitError alongside AlphaVantageRateLimitError for automatic vendor failover
- Add yfinance_retry and cache_ttl config sections to DEFAULT_CONFIG
- Add 12 unit tests covering retry, backoff, cache hit/miss/expiry
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
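The retry-with-backoff pattern described above can be sketched roughly as follows. This is illustrative only — the decorator name, defaults, and config wiring here are not the actual yfinance_utils implementation:

```python
import time
import functools


def retry_with_backoff(max_retries=3, base_delay=1.0, exceptions=(Exception,)):
    """Retry the wrapped function with exponential backoff on the given
    exception types; once retries are exhausted, the error propagates
    (so it can reach the vendor fallback instead of being swallowed)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # out of retries: let the caller handle it
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator
```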
…h#542) Pre-compute all 12 technical indicators in pure Python code before passing to Market Analyst LLM, eliminating 8-16 tool call round trips per analysis. LLM now focuses on interpretation only. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add news entries for indicator calculation separation (TauricResearch#542), yfinance rate limit fix (TauricResearch#555), and LLM model updates. Update default config values and Technical Analyst description. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…testResult) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…anagement Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… queries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ns, trade history Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire the backtest engine, trade tracker, and dashboard builder together with end-to-end integration tests and a convenience CLI entry point. Fix monthly_returns key mismatch (return -> return_pct) between PerformanceCalculator and DashboardBuilder. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- I1: TradeRecord.from_dict() filters unknown keys for forward compat
- I2: Sharpe ratio docstring clarifies trade-based approximation
- I3: Extract shared state helpers to backtest/state_utils.py (DRY)
- I4: Add encoding="utf-8" to tracker file operations
- I5: Fix total_trades/win_rate to count only closed trades
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Outer try/except in simple_parquet_cache previously caught wrapped function exceptions and silently retried, doubling API calls and hiding the original error. Restructure so func() runs outside all try blocks; only cache infra failures (config read, mkdir, parquet I/O) fail-open. Add regression test.
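The restructured control flow can be sketched as follows. Here read_cache/write_cache stand in for the real config/Parquet helpers — only the shape is meant to match: the wrapped function runs outside every try block, so its exceptions propagate once and are never silently retried:

```python
import functools


def simple_cache_sketch(read_cache, write_cache):
    """Illustrative shape of the fix: only cache infrastructure
    (read/write) fails open; the wrapped function itself is never
    wrapped in a try, so errors surface once and API calls aren't doubled."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                hit = read_cache(args, kwargs)   # cache infra may fail-open
            except Exception:
                hit = None
            if hit is not None:
                return hit
            result = func(*args, **kwargs)       # NOT wrapped: errors propagate
            try:
                write_cache(args, kwargs, result)
            except Exception:
                pass                             # fail-open on write too
            return result
        return wrapper
    return decorator
```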
…func A malicious or mistaken --kind '../../etc' would let shutil.rmtree delete arbitrary directories outside the cache root. Resolve target and verify is_relative_to(root) before any rmtree. Also expand the abbreviated copyright line to the full Apache 2.0 header per project convention.
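A minimal sketch of the guard (function and argument names are illustrative, not the PR's actual code):

```python
import shutil
from pathlib import Path


def clear_kind(cache_root: str, kind: str) -> None:
    """Delete one cache subdirectory, refusing anything that resolves
    outside the cache root (e.g. --kind '../../etc')."""
    root = Path(cache_root).resolve()
    target = (root / kind).resolve()
    # Resolve first, then verify containment before any rmtree.
    if target == root or not target.is_relative_to(root):
        raise ValueError(f"refusing to delete outside cache root: {kind!r}")
    if target.exists():
        shutil.rmtree(target)
```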
_yyyymmdd previously stripped dashes only, silently producing wrong output for non-zero-padded inputs like '2024-1-2'. Now uses strptime to validate format. Top-level 'from pykrx import stock' is moved inside _fetch_pykrx_ohlcv so the dataflows package can be imported on machines without pykrx (Task 7 will register this vendor in interface.py). Add unit tests for the date helper.
Add python-dotenv dependency and a root conftest.py that calls load_dotenv() before any test collection. Document KRX_ID/KRX_PW in .env.example with a note that they're required for pykrx universe/investor-trading/value-factors endpoints since 2025-12-27. .env itself stays gitignored (already in .gitignore).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review
This pull request significantly expands the TradingAgents framework by introducing a comprehensive backtesting engine, a performance tracking system, and an HTML dashboard for results visualization. Key additions include support for the Korean market via OpenDART and pykrx, real-time trade execution through the KIS broker with multi-layer safety guards, and an investment persona system. Architectural enhancements feature a disk-persisted Parquet cache, improved yfinance rate-limit handling, and a two-layer technical analysis approach.

The review highlights several critical technical issues:
- a concurrency bottleneck in the rate limiter caused by holding a lock during sleep
- a memory leak in the cache path-locking mechanism
- inefficient object instantiation in the OpenDART provider
- redundant data processing in the technical calculator
- potential race conditions in file caching
```python
def acquire(self):
    with self._lock:
        now = time.monotonic()
        # Remove timestamps outside the window
        self._timestamps = [
            t for t in self._timestamps if now - t < self.period
        ]
        if len(self._timestamps) >= self.max_calls:
            sleep_time = self.period - (now - self._timestamps[0])
            if sleep_time > 0:
                time.sleep(sleep_time)
            self._timestamps = [
                t for t in self._timestamps if time.monotonic() - t < self.period
            ]
        self._timestamps.append(time.monotonic())
```
The RateLimiter.acquire method holds the instance lock while sleeping. In a multi-threaded environment, this serializes all requests and prevents other threads from even checking the rate limit status, effectively negating the benefit of a sliding window for concurrent callers. The sleep should occur outside the lock.
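One way to restructure acquire so the sleep happens outside the lock — a sketch of the reviewer's suggestion, not the PR's actual fix — is to loop: compute the wait under the lock, release, sleep, then re-check the window:

```python
import time
from threading import Lock


class RateLimiter:
    """Sliding-window limiter that sleeps OUTSIDE the lock, so other
    threads can still inspect and update the window while one waits."""

    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self._timestamps: list[float] = []
        self._lock = Lock()

    def acquire(self) -> None:
        while True:
            with self._lock:
                now = time.monotonic()
                # Prune timestamps that have aged out of the window.
                self._timestamps = [
                    t for t in self._timestamps if now - t < self.period
                ]
                if len(self._timestamps) < self.max_calls:
                    self._timestamps.append(now)
                    return
                sleep_time = self.period - (now - self._timestamps[0])
            # Lock released here; sleep, then loop to re-check the window.
            if sleep_time > 0:
                time.sleep(sleep_time)
```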
```python
_path_locks: dict[str, Lock] = {}
_path_locks_master = Lock()


def _get_path_lock(path: str) -> Lock:
    """Per-path in-process lock so concurrent calls don't collide on disk."""
    with _path_locks_master:
        lock = _path_locks.get(path)
        if lock is None:
            lock = Lock()
            _path_locks[path] = lock
        return lock
```
The _path_locks dictionary grows indefinitely as new unique file paths are accessed, leading to a memory leak. Since each entry is a threading.Lock object associated with a specific cache file, this can become significant over long-running sessions with many tickers or dates. Consider using a weakref.WeakValueDictionary to allow locks to be garbage collected when no longer in use.
```python
def _get_dart():
    """Get OpenDartReader instance (lazy import)."""
    try:
        import OpenDartReader
    except ImportError:
        raise ImportError("opendartreader package required: pip install opendartreader")

    api_key = os.environ.get("OPENDART_API_KEY", "")
    if not api_key:
        raise ValueError("OPENDART_API_KEY environment variable is not set.")

    return OpenDartReader(api_key)
```
```python
for ind_key in DEFAULT_INDICATORS:
    try:
        indicator_data[ind_key] = _get_stock_stats_bulk(symbol, ind_key, curr_date)
```
This loop calls _get_stock_stats_bulk for each indicator individually. If the underlying implementation fetches or processes the entire stock history for each call, this results in significant redundant work (e.g., multiple CSV reads or API calls). It is recommended to fetch the full technical dataset once and then extract the required indicators from the resulting DataFrame.
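The suggested fix — compute or fetch the full technical DataFrame once, then slice out each indicator — might look roughly like this (column names are illustrative):

```python
import pandas as pd


def collect_indicators(df: pd.DataFrame, indicators: list[str]) -> dict[str, pd.Series]:
    """Extract requested indicator columns from one pre-computed
    DataFrame instead of re-reading history per indicator."""
    available: dict[str, pd.Series] = {}
    for ind_key in indicators:
        if ind_key in df.columns:
            available[ind_key] = df[ind_key]
    return available
```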
```python
    and isinstance(result, pd.DataFrame)
    and not result.empty
):
    tmp = path.with_suffix(".parquet.tmp")
```
Using a fixed temporary filename (.parquet.tmp) can lead to race conditions if multiple processes or threads attempt to write to the same cache entry simultaneously. While the in-process lock mitigates this for threads, it does not protect against multiple processes. Including the PID or a unique identifier in the temporary filename would be safer.
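A sketch of a process-safe variant using a unique temporary name (the helper name and write_fn callback are illustrative):

```python
import os
import uuid
from pathlib import Path


def atomic_write_sketch(path: Path, write_fn) -> None:
    """Write to a temp file whose name includes the PID and a random
    suffix, so concurrent processes never share a temp path, then
    rename atomically onto the final path."""
    tmp = path.with_name(f"{path.name}.{os.getpid()}.{uuid.uuid4().hex}.tmp")
    try:
        write_fn(tmp)          # e.g. df.to_parquet(tmp)
        os.replace(tmp, path)  # atomic rename onto the final path
    finally:
        if tmp.exists():       # only if write_fn or replace failed
            tmp.unlink()
```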
Summary
- Add @simple_parquet_cache decorator — sha256-keyed Parquet cache with TTL, fail-open error handling, and atomic writes
- Add cache_admin.py CLI — stats/clear subcommands
- Add pykrx_vendor.py — direct KRX queries: OHLCV, point-in-time universe, investor trading flows, PER/PBR value factors
- Register in interface.py — kr_market_data category, route_to_vendor integration
- conftest.py + .env.example — includes a KRX_ID/KRX_PW local credential guide

New tools via route_to_vendor: get_stock_data, get_kr_universe, get_kr_investor_trading, get_kr_value_factors

Test plan
- pytest tests/test_dataflows_cache.py — 6 cache unit tests (no network required)
- RUN_NETWORK_TESTS=1 pytest tests/test_pykrx_vendor.py — 9 passing (3 unit + 4 smoke + 2 integration)
Setup
Create .env based on .env.example:
- KRX_ID / KRX_PW — free signup at data.krx.co.kr (not required for OHLCV)
- OPENDART_API_KEY — free key issued at opendart.fss.or.kr