
[DANNA-001] add pykrx vendor and Parquet disk cache#744

Open
bbuduck wants to merge 35 commits into TauricResearch:main from hongsookim:DANNA-001

Conversation

@bbuduck bbuduck commented May 6, 2026

Summary

  • Add @simple_parquet_cache decorator — sha256-keyed Parquet cache with TTL,
    fail-open behavior, and atomic writes
  • Add cache_admin.py CLI — stats / clear subcommands
  • Add pykrx_vendor.py — direct KRX queries: OHLCV, point-in-time universe,
    trading flows by investor type, PER/PBR value factors
  • Register in interface.py — kr_market_data category, route_to_vendor integration
  • Add conftest.py + .env.example — includes a guide for local KRX_ID/KRX_PW authentication
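The sha256 key scheme the first bullet describes can be sketched as follows — a minimal illustration with an assumed key recipe (function name plus arguments), not the PR's actual implementation:

```python
import hashlib


def cache_key(func_name: str, *args, **kwargs) -> str:
    """Derive a stable sha256 cache key from a function name and its arguments."""
    # repr() of sorted kwargs keeps the key stable across call-site ordering.
    payload = repr((func_name, args, sorted(kwargs.items())))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Identical inputs yield identical keys, so a cached Parquet file can be located by hash alone.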

New tools via route_to_vendor

tool                      vendor   description
get_stock_data            pykrx    KRX OHLCV (no authentication required)
get_kr_universe           pykrx    point-in-time KOSPI/KOSDAQ ticker list
get_kr_investor_trading   pykrx    foreign/institutional/retail trading flows
get_kr_value_factors      pykrx    PER, PBR, EPS, BPS, DIV, DPS

Test plan

  • pytest tests/test_dataflows_cache.py — 6 cache unit tests (no network
    required)
  • RUN_NETWORK_TESTS=1 pytest tests/test_pykrx_vendor.py — 9 passing
    (3 unit + 4 smoke + 2 integration)

Setup

Create a .env file based on .env.example:
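A minimal .env sketch — the variable names come from the commit documenting .env.example; the values are placeholders:

```dotenv
# KRX credentials, required since 2025-12-27 for the pykrx
# universe / investor-trading / value-factors endpoints
KRX_ID=your_krx_id
KRX_PW=your_krx_password
```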

Herald and others added 30 commits March 8, 2026 21:31
Add DART (Data Analysis, Retrieval and Transfer) financial statements
and disclosure data as a new vendor for the Fundamentals Analyst.
Korean-listed companies (6-digit ticker codes) can now be analyzed
with official DART filings including revenue, operating profit,
net income, and recent regulatory disclosures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce configurable investment personas that shape decision-making
style of Trader, Research Manager, and Risk Manager agents.
Analysts remain objective for unbiased data gathering.

- Add personas.py with 3 personas × 3 roles = 9 prompt definitions
- Add config["persona"] field (None/warren_buffett/ray_dalio/peter_lynch)
- Wire persona through TradingAgentsGraph → GraphSetup → agent creators
- Add opendartreader to pyproject.toml dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Korea Investment & Securities REST API integration enabling automated
trade execution after agent analysis. Supports both paper trading and real
trading with multi-layer safety guards including position limits, order
amount caps, daily loss limits, and market hours enforcement.

- Add tradingagents/execution/ package with abstract BaseBroker interface
- Implement KISBroker with token management and rate limiting
- Add ExecutionEngine with SafetyGuard orchestration
- Add conditional Executor node to LangGraph (Risk Judge → Executor → END)
- Inject portfolio context into Trader agent prompt
- Add broker configuration to default_config and CLI (Step 9)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create CLAUDE.md with project architecture, config system, development
  patterns, and coding conventions for AI assistant context
- Update README with Investment Persona section (Buffett, Dalio, Lynch)
- Update README with Broker Execution (KIS) section including setup guide,
  safety guards table, and architecture overview
- Add All Configuration Options reference table
- Update Required APIs section with KIS credentials
- Add persona and broker examples to Python usage section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Translate full README.md to Korean including all sections:
framework overview, installation, CLI usage, Python examples,
OpenDART, investment personas, KIS broker execution, safety
guards, and configuration reference. Link from English README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
80 tests covering models, SafetyGuard, ExecutionEngine, KISBroker, and persona system.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…and trading graph

35 additional tests covering KISClient HTTP/token management, GraphSetup executor
node wiring, SignalProcessor LLM interaction, and TradingAgentsGraph broker
initialization with portfolio context injection. Total: 115 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… Anthropic

- Add 6-phase pipeline diagram, agent role table, and data source table
- Change default llm_provider from openai to anthropic (claude-sonnet-4-6, claude-haiku-4-5)
- Update config table and Python examples to reflect new defaults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e LLM support

Add Groq and Together AI as new LLM providers using OpenAI-compatible APIs,
and add Llama/DeepSeek models to Ollama options for local inference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- OpenAI: add GPT-5.4 series, o3-pro, o1-pro; remove deprecated GPT-5.2/5.1, GPT-4o
- Anthropic: add Claude Opus 4.7, Opus 4.6, Sonnet 4.6; remove legacy 3.x models
- Google: add Gemini 3.1 Pro, 3.1 Flash Lite; remove shutdown 3 Pro, deprecated 2.0
- xAI: add Grok 4.20 series (reasoning/non-reasoning/multi-agent); remove old Grok 4
- Groq: add GPT-OSS 120B/20B, Qwen3 32B; remove deprecated Mixtral, Gemma, SpecDec
- Together: add DeepSeek V3.1, Qwen3.5 397B/9B; remove DeepSeek V3, Qwen2.5
- Fix reasoning model detection to include o4-series in openai_client.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Research#555, TauricResearch#553)

YFRateLimitError was crashing analyses because yfinance calls had no
retry logic, no caching (except OHLCV CSV), and rate-limit errors were
swallowed by broad except clauses before reaching the vendor fallback.

Changes:
- Add yfinance_utils.py with exponential backoff retry decorator and
  thread-safe TTL in-memory cache (config-driven, no new dependencies)
- Apply @yfinance_retry + @yfinance_cached to all yfinance data functions
  (fundamentals, balance sheet, cashflow, income statement, insider txns, news)
- Stop swallowing YFRateLimitError in all except clauses — let it propagate
- Extend route_to_vendor() fallback to catch YFRateLimitError alongside
  AlphaVantageRateLimitError for automatic vendor failover
- Add yfinance_retry and cache_ttl config sections to DEFAULT_CONFIG
- Add 12 unit tests covering retry, backoff, cache hit/miss/expiry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
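The retry decorator this commit describes can be sketched as follows — a minimal stand-in with assumed parameters (attempt count, base delay), not the PR's config-driven implementation:

```python
import functools
import time


def retry_with_backoff(max_attempts=3, base_delay=0.5, exceptions=(Exception,)):
    """Retry the wrapped function with exponential backoff on the given exceptions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts - 1:
                        raise  # out of retries: propagate, don't swallow
                    time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
        return wrapper
    return decorator
```

Raising on the final attempt (rather than returning None) is what lets a rate-limit error reach the vendor-fallback layer described above.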
…h#542)

Pre-compute all 12 technical indicators in pure Python code before
passing to Market Analyst LLM, eliminating 8-16 tool call round trips
per analysis. LLM now focuses on interpretation only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add news entries for indicator calculation separation (TauricResearch#542),
yfinance rate limit fix (TauricResearch#555), and LLM model updates. Update
default config values and Technical Analyst description.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…testResult)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…anagement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… queries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ns, trade history

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire the backtest engine, trade tracker, and dashboard builder together
with end-to-end integration tests and a convenience CLI entry point.
Fix monthly_returns key mismatch (return -> return_pct) between
PerformanceCalculator and DashboardBuilder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- I1: TradeRecord.from_dict() filters unknown keys for forward compat
- I2: Sharpe ratio docstring clarifies trade-based approximation
- I3: Extract shared state helpers to backtest/state_utils.py (DRY)
- I4: Add encoding="utf-8" to tracker file operations
- I5: Fix total_trades/win_rate to count only closed trades

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Outer try/except in simple_parquet_cache previously caught wrapped
function exceptions and silently retried, doubling API calls and
hiding the original error. Restructure so func() runs outside all
try blocks; only cache infra failures (config read, mkdir, parquet
I/O) fail-open. Add regression test.
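The restructuring described here — the wrapped call running outside every try block so only cache-infrastructure failures fail open — can be sketched like this (an in-memory store stands in for the Parquet layer; names are illustrative):

```python
import functools


def fail_open_cache(store: dict):
    """Cache reads/writes may fail silently; the wrapped call itself is never retried."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            try:
                if args in store:      # cache read: fail-open
                    return store[args]
            except Exception:
                pass
            result = func(*args)       # outside any try: errors propagate once
            try:
                store[args] = result   # cache write: fail-open
            except Exception:
                pass
            return result
        return wrapper
    return decorator
```

Because `func()` sits outside the try blocks, an upstream API error surfaces immediately instead of triggering a silent second call.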
…func

A malicious or mistaken --kind '../../etc' would let shutil.rmtree
delete arbitrary directories outside the cache root. Resolve target
and verify is_relative_to(root) before any rmtree. Also expand the
abbreviated copyright line to the full Apache 2.0 header per project
convention.
_yyyymmdd previously stripped dashes only, silently producing wrong
output for non-zero-padded inputs like '2024-1-2'. Now uses strptime
to validate format. Top-level 'from pykrx import stock' is moved
inside _fetch_pykrx_ohlcv so the dataflows package can be imported
on machines without pykrx (Task 7 will register this vendor in
interface.py). Add unit tests for the date helper.
bbuduck and others added 5 commits May 5, 2026 16:21
Add python-dotenv dependency and a root conftest.py that calls
load_dotenv() before any test collection. Document KRX_ID/KRX_PW
in .env.example with a note that they're required for pykrx
universe/investor-trading/value-factors endpoints since 2025-12-27.
.env itself stays gitignored (already in .gitignore).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gemini-code-assist (bot) left a comment


Code Review

This pull request significantly expands the TradingAgents framework by introducing a comprehensive backtesting engine, a performance tracking system, and an HTML dashboard for results visualization. Key additions include support for the Korean market via OpenDART and pykrx, real-time trade execution through the KIS broker with multi-layer safety guards, and an investment persona system. Architectural enhancements feature a disk-persisted Parquet cache, improved yfinance rate-limit handling, and a two-layer technical analysis approach.

Feedback from the review highlights several critical technical issues: a concurrency bottleneck in the rate limiter caused by holding a lock during sleep, a memory leak in the cache path-locking mechanism, inefficient object instantiation in the OpenDART provider, redundant data processing in the technical calculator, and potential race conditions in file caching.

Comment on lines +36 to +50
def acquire(self):
    with self._lock:
        now = time.monotonic()
        # Remove timestamps outside the window
        self._timestamps = [
            t for t in self._timestamps if now - t < self.period
        ]
        if len(self._timestamps) >= self.max_calls:
            sleep_time = self.period - (now - self._timestamps[0])
            if sleep_time > 0:
                time.sleep(sleep_time)
                self._timestamps = [
                    t for t in self._timestamps if time.monotonic() - t < self.period
                ]
        self._timestamps.append(time.monotonic())

high

The RateLimiter.acquire method holds the instance lock while sleeping. In a multi-threaded environment, this serializes all requests and prevents other threads from even checking the rate limit status, effectively negating the benefit of a sliding window for concurrent callers. The sleep should occur outside the lock.
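One way to restructure `acquire()` so the sleep happens outside the lock — a sketch re-checking the window in a loop, with the class shape assumed from the snippet above rather than taken from the PR:

```python
import threading
import time


class RateLimiter:
    """Sliding-window limiter: compute the wait under the lock, sleep outside it."""

    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self._timestamps: list[float] = []
        self._lock = threading.Lock()

    def acquire(self) -> None:
        while True:
            with self._lock:
                now = time.monotonic()
                # Drop timestamps that have left the sliding window.
                self._timestamps = [t for t in self._timestamps if now - t < self.period]
                if len(self._timestamps) < self.max_calls:
                    self._timestamps.append(now)
                    return
                sleep_time = self.period - (now - self._timestamps[0])
            # Lock released here: other threads can check the window while we wait.
            time.sleep(max(sleep_time, 0.0))
```

The loop re-validates after waking, since another thread may have claimed the freed slot in the meantime.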

Comment on lines +38 to +49
_path_locks: dict[str, Lock] = {}
_path_locks_master = Lock()


def _get_path_lock(path: str) -> Lock:
    """Per-path in-process lock so concurrent calls don't collide on disk."""
    with _path_locks_master:
        lock = _path_locks.get(path)
        if lock is None:
            lock = Lock()
            _path_locks[path] = lock
        return lock

medium

The _path_locks dictionary grows indefinitely as new unique file paths are accessed, leading to a memory leak. Since each entry is a threading.Lock object associated with a specific cache file, this can become significant over long-running sessions with many tickers or dates. Consider using a weakref.WeakValueDictionary to allow locks to be garbage collected when no longer in use.

Comment on lines +21 to +32
def _get_dart():
    """Get OpenDartReader instance (lazy import)."""
    try:
        import OpenDartReader
    except ImportError:
        raise ImportError("opendartreader package required: pip install opendartreader")

    api_key = os.environ.get("OPENDART_API_KEY", "")
    if not api_key:
        raise ValueError("OPENDART_API_KEY environment variable is not set.")

    return OpenDartReader(api_key)

medium

The _get_dart function creates a new OpenDartReader instance on every call. This is inefficient as it involves redundant initialization and potentially multiple authentication checks. It should be implemented as a singleton or cached at the module level.
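A module-level cache via `functools.lru_cache` would avoid re-instantiation — sketched here with a stand-in client class, since `OpenDartReader` itself is not importable everywhere:

```python
import os
from functools import lru_cache


class _FakeDartClient:
    """Stand-in for OpenDartReader; the real class comes from opendartreader."""
    def __init__(self, api_key: str):
        self.api_key = api_key


@lru_cache(maxsize=1)
def get_dart():
    """Build the client once; later calls return the same instance."""
    api_key = os.environ.get("OPENDART_API_KEY", "")
    if not api_key:
        raise ValueError("OPENDART_API_KEY environment variable is not set.")
    return _FakeDartClient(api_key)
```

A nice property of `lru_cache` here: if construction raises (e.g. missing key), nothing is cached, so a later call with the variable set succeeds.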

Comment on lines +79 to +81
for ind_key in DEFAULT_INDICATORS:
    try:
        indicator_data[ind_key] = _get_stock_stats_bulk(symbol, ind_key, curr_date)

medium

This loop calls _get_stock_stats_bulk for each indicator individually. If the underlying implementation fetches or processes the entire stock history for each call, this results in significant redundant work (e.g., multiple CSV reads or API calls). It is recommended to fetch the full technical dataset once and then extract the required indicators from the resulting DataFrame.
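The fetch-once pattern the reviewer recommends can be sketched with pandas and toy rolling-mean "indicators" — a stand-in, not the project's `_get_stock_stats_bulk`:

```python
import pandas as pd

FETCHES = {"count": 0}


def fetch_history(symbol: str) -> pd.DataFrame:
    """Stand-in for a CSV/API read; counts how often history is loaded."""
    FETCHES["count"] += 1
    return pd.DataFrame({"close": [10.0, 11.0, 12.0, 11.5, 13.0, 12.5]})


def compute_indicators(symbol: str, windows=(2, 3)) -> dict[str, pd.Series]:
    """Load history once, then derive every indicator from the same DataFrame."""
    df = fetch_history(symbol)  # single fetch feeds all indicators
    return {f"sma_{w}": df["close"].rolling(w).mean() for w in windows}
```

However many indicators are requested, the underlying history is read exactly once.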

    and isinstance(result, pd.DataFrame)
    and not result.empty
):
    tmp = path.with_suffix(".parquet.tmp")

medium

Using a fixed temporary filename (.parquet.tmp) can lead to race conditions if multiple processes or threads attempt to write to the same cache entry simultaneously. While the in-process lock mitigates this for threads, it does not protect against multiple processes. Including the PID or a unique identifier in the temporary filename would be safer.

