All notable changes to AbstractCore will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `--install` readiness check: comprehensive check of all subsystems (default model, provider connectivity, embeddings model, vision fallback, STT/TTS models, ffmpeg, abstractvision, API keys). Reports ✅/⚠️/❌ for each area and offers to download/install missing models interactively. Use `--yes` (`-y`) to auto-accept all downloads for non-interactive environments (e.g. `abstractcore --install --yes`).
- Embeddings: 7 providers supported (was 3). `EmbeddingManager` now accepts `openai`, `openrouter`, `portkey`, and `openai-compatible` in addition to the existing `huggingface`, `ollama`, and `lmstudio`. Added `OpenAIProvider.embed()` method; gateway providers (`OpenRouterProvider`, `PortkeyProvider`) already inherit `embed()` from `OpenAICompatibleProvider`. All server/cloud providers return embeddings in OpenAI-compatible format.
- Interactive config wizard (`--config`), expanded to 7 steps:
  - Step 1: now asks for base URL when the selected provider is a local server (ollama, lmstudio, vllm, openai-compatible). Shows the env var name, current value if set, default URL, and prints the `export` command for shell persistence.
  - Step 4 (NEW): Audio strategy, defaults to `auto` on Enter. Asks about `native_only`/`auto`/`speech_to_text` for audio attachment handling. Mentions `abstractvoice` dependency when needed.
  - Step 5 (NEW): Video strategy, defaults to `auto` on Enter. Asks about `native_only`/`auto`/`frames_caption` for video attachment handling. Mentions `ffmpeg` dependency when needed.
  - Step 6 (NEW): Embeddings provider/model. Asks for embeddings configuration with examples across all 7 supported providers. Validates provider before saving.
  - Step 7: Console logging verbosity (renumbered from step 4).
- Interactive config wizard now covers all major configuration areas (model, base URL, vision, API keys, audio, video, embeddings, logging). Previously only covered model, vision, API keys, and logging.
- `--install` embeddings check: now provider-aware; server-based providers (ollama, lmstudio, openai, openrouter, portkey, openai-compatible) check reachability or API key instead of trying to download via `sentence-transformers`. When `sentence-transformers` is missing, `--install` offers to `pip install "abstractcore[embeddings]"` and then download the model.
- Audio strategy default changed from `native_only` to `auto`: the `AudioConfig.strategy` default was `native_only`, which caused audio attachments to fail on text-only models unless the user explicitly configured it. Changed to `auto` (matching `VideoConfig.strategy`, which was already `auto`). With `auto`, audio works seamlessly when `abstractvoice` is installed (STT fallback) and raises a clear error with install hints when it is not.
- Config-persisted API keys now injected into environment: API keys saved via `abstractcore --set-api-key` (or `--config`) were stored in `~/.abstractcore/config/abstractcore.json` but providers only read from `os.environ` (e.g. `OPENAI_API_KEY`). Added `_apply_api_keys_to_env()` to bridge config-persisted keys into the environment at config load time. Environment variables always take precedence (config keys are injected only when the env var is absent).
- `--install` TTS/STT severity: failed model downloads are now reported as ⚠️ (warning) instead of ❌ (critical) since TTS/STT are optional subsystems.
- `--install` TTS/STT verification: download results are now verified by re-checking the filesystem instead of trusting the subprocess exit code (some prefetch commands exit 0 even on failure).
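
The precedence rule for config-persisted API keys can be illustrated with a minimal sketch. This is not the actual `_apply_api_keys_to_env()` implementation: the function name `apply_api_keys_to_env`, the `ENV_VAR_BY_PROVIDER` mapping, and the key values are hypothetical, and a plain dict stands in for `os.environ` so the example has no side effects.

```python
import os

# Hypothetical mapping from config key names to environment variable names.
ENV_VAR_BY_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def apply_api_keys_to_env(config_keys, environ=None):
    """Copy config-persisted keys into the environment without overriding it."""
    environ = os.environ if environ is None else environ
    injected = []
    for provider, key in config_keys.items():
        env_name = ENV_VAR_BY_PROVIDER.get(provider)
        if not env_name or not key:
            continue  # unknown provider or empty key: nothing to inject
        if env_name not in environ:  # an existing env var always wins
            environ[env_name] = key
            injected.append(env_name)
    return injected


# A dict stands in for os.environ here.
env = {"OPENAI_API_KEY": "from-env"}
injected = apply_api_keys_to_env(
    {"openai": "from-config", "openrouter": "or-key"}, env
)
print(injected, env)
```

Only the key whose environment variable was absent is injected; the pre-existing `OPENAI_API_KEY` value is left untouched.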
- Documentation and internal improvements.
- Portkey provider: OpenAI-compatible gateway with config-based routing (env: `PORTKEY_API_KEY`, `PORTKEY_CONFIG`; optional `PORTKEY_BASE_URL`).
- Tests: Portkey provider payload adaptation, reasoning model restrictions, explicit-None handling, and base URL validation.
- Portkey payload hygiene: forward optional generation parameters only when explicitly set.
- Token parameter mapping: use `max_completion_tokens` for OpenAI reasoning families (gpt-5/o1); keep legacy `max_tokens` for other backends.
- Reasoning model compatibility: drop unsupported parameters (temperature/top_p/penalties) with structured logging.
- Error diagnostics: base URL validation and improved DNS/connectivity hints.
- Server logging: route Python warnings through structured logging; avoid raw stderr warnings at default ERROR verbosity.
- Server UX: print internal/external access URLs outside logging on startup.
- OpenAPI schema: normalize request examples to prevent `/openapi.json` validation failures.
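
The token parameter mapping and reasoning-model parameter handling described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the provider's code: `build_payload` is a hypothetical helper, detecting reasoning families by model-name prefix is an assumption, and the dropped names are returned rather than logged.

```python
# Reasoning families named in the entry above; prefix matching is an assumption.
REASONING_PREFIXES = ("gpt-5", "o1")
UNSUPPORTED_FOR_REASONING = (
    "temperature",
    "top_p",
    "presence_penalty",
    "frequency_penalty",
)


def build_payload(model, max_output_tokens=None, **params):
    """Return (payload, dropped) for an OpenAI-style chat request."""
    is_reasoning = model.startswith(REASONING_PREFIXES)
    payload = {"model": model}
    if max_output_tokens is not None:
        key = "max_completion_tokens" if is_reasoning else "max_tokens"
        payload[key] = max_output_tokens
    dropped = []
    for name, value in params.items():
        if value is None:
            continue  # forward optional parameters only when explicitly set
        if is_reasoning and name in UNSUPPORTED_FOR_REASONING:
            dropped.append(name)  # a real provider would log this
            continue
        payload[name] = value
    return payload, dropped


reasoning_payload, reasoning_dropped = build_payload(
    "o1-mini", 256, temperature=0.2
)
legacy_payload, legacy_dropped = build_payload(
    "llama-3", 256, temperature=0.2, top_p=None
)
print(reasoning_payload, reasoning_dropped)
print(legacy_payload, legacy_dropped)
```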
- Config CLI: interactive vision fallback now accepts any provider/model and uses provider-agnostic guidance.
- Config CLI: interactive console logging default now uses ERROR to match package defaults.
- Portkey usage guidance added across core docs.
- Media docs: clarified vision fallback examples as provider-agnostic.
- Server docs: moved interactive API docs links to the top of the page.
- Config CLI: video defaults (`--set-video-*`) and `--config` alias for interactive setup.
- Faster CLI startup by lazily importing optional web parsing deps in `abstractcore.tools.common_tools`.
- Docs: clarified requirements and configuration for image/video/audio fallbacks (including `abstractcore --config`).
- STT fallback when `abstractvoice` is installed.
- Faster `utils.cli` through lazy loading of the providers.
- Updated the timeout settings (abstractcore config 3600s).
- Skim tool benchmarks: added `examples/skim_tools_benchmark.py` to measure output footprint and latency for `skim_websearch`/`web_search` and `skim_url`/`fetch_url`.
- Import-safety test: added a test to ensure `import abstractcore` does not eagerly import optional deps (`requests`, `bs4`, `sentence_transformers`, `pymupdf*`, ...).
- Skim outputs stay compact: `skim_websearch` now truncates long titles/snippets to keep tool outputs prompt-friendly by default.
- Tool guidance for prompted models: tool prompts now render short `when_to_use` hints for small tool sets and a few high-impact tools (edit/write/execute + web triage tools).
- Tool examples: globally-capped examples now include `skim_websearch`/`skim_url` earlier so models learn the token-efficient web triage workflow.
- Native tool payload compatibility: native tool schemas no longer include non-standard metadata keys (`tags`, `when_to_use`, `examples`) to avoid strict provider schema validation failures.
- Docs accuracy: clarified `fetch_url` behavior for PDFs/binaries and documented the recommended `skim_*` → `fetch_*` workflow in the docs entry points.
- Security policy: added `SECURITY.md` with responsible disclosure guidance.
- API overview doc: added `docs/api.md` as a user-facing map of the public Python API.
- FAQ: added `docs/faq.md` and linked it from the docs entry points.
- Events + logging docs: added `docs/events.md` and `docs/structured-logging.md`.
- Skim tools: added `skim_url` (fast URL triage) and `skim_websearch` (compact/filtered search) to keep agent prompts smaller when you only need “what is this about?”.
- Install composition (default stays small): docs and packaging emphasize a lightweight core install, with heavy features enabled via explicit extras (`tools`, `media`, `embeddings`, `server`, provider SDKs).
- Dependency compatibility: relaxed `abstractcore[huggingface]` `transformers` upper bound to `<6` so it can co-install with `abstractcore[mlx]` (as `mlx-lm` currently pins `transformers==5.0.0rc*`).
- Documentation polish: refreshed wording and navigation for external users; ensured internal links/anchors resolve across docs.
- Skim output footprint: tuned `skim_url` defaults (smaller preview/headings) and made `skim_websearch` JSON compact so tool outputs are more token-efficient by default.
- Web search URLs: `web_search` now unwraps DuckDuckGo redirect URLs (more readable links; smaller tool outputs).
- Docs accuracy: aligned event fields and examples with the current codebase (events, telemetry, and usage data).
- Optional imports: made Telegram Bot API tools import-safe when `requests` is not installed (returns a clear `abstractcore[tools]` install hint when used).
- HTML extraction edge cases: improved main-content selection/pruning so `fetch_url`/`skim_url` previews don’t get wiped by over-aggressive boilerplate removal on some pages.
- MLX throughput benchmarking: `examples/mlx_concurrency_benchmark.py` to sweep concurrency with continuous batching (mlx-lm) and generate summary CSVs + PNG plots.
- MLX install extras: refreshed/clarified `mlx` + `mlx-bench` optional dependencies for Apple Silicon throughput benchmarking.
- Embedding model detection: treat `model_type: "embedding"` as the canonical signal; add `nomic-embed-text-v1.5` (incl. LMStudio alias `text-embedding-nomic-embed-text-v1.5@q6_k`) to `assets/model_capabilities.json`.
- MLX model discovery: `MLXProvider.list_available_models()` now also scans LM Studio's local cache (`~/.lmstudio/models`) (including `lmstudio-community/*` and `mlx-community/*`) and loads from those local directories when present.
- GPT-OSS (Harmony) on MLX: improved prompt formatting (prefers tokenizer chat templates), extracts Harmony transcripts into clean `content` (stores reasoning in `metadata.reasoning`), and propagates correct `finish_reason` (stop/length) for truncation handling.
- Concurrency guide: added MLX concurrency benchmarking notes and tracked benchmark plots/CSVs under `docs/assets/` so docs don't depend on the ignored `test_results/` folder.
- Config CLI parity: implemented missing `ConfigurationManager` methods used by `abstractcore` config commands (streaming defaults, embeddings config, cache dirs, logging controls, vision fallback chain).
- OpenAI-compatible auth: `openai-compatible` provider now reads `OPENAI_COMPATIBLE_API_KEY` when set.
- CLI provider selection: `abstractcore.utils.cli` now exposes `openrouter`, `openai-compatible`, and `vllm` in `--provider` choices (and updates usage examples).
- CLI token controls: `abstractcore.utils.cli` now supports `--max-output-tokens` and interactive `/max-tokens` + `/max-output-tokens`.
- Updated provider/config/CLI/server docs to reflect OpenAI-compatible consolidation, OpenRouter usage, current Claude model naming, and `base_url` usage for OpenAI-compatible endpoints.
- OpenRouter provider: `create_llm("openrouter", ...)` via the OpenAI-compatible API (`https://openrouter.ai/api/v1`), with config support for `OPENROUTER_API_KEY`.
- OpenAI-compatible consolidation: refactored `OpenAICompatibleProvider` into the shared implementation and made `LMStudioProvider`/`VLLMProvider` thin subclasses.
- Config: added `api_keys.openrouter` support and wiring for `abstractcore --set-api-key openrouter ...`.
- Defaults: updated Anthropic default model to `claude-haiku-4-5`.
- Test stability: live-network and local-server provider tests are consistently opt-in via env flags; tracing tests no longer require a running Ollama server.
- Media validation: `AnthropicMediaHandler.validate_media_for_model()` now relies on centralized vision capability detection for newer Claude naming (e.g. `claude-haiku-4-5`).
- Packaging / installability: `pip install abstractcore` now includes `beautifulsoup4` so `import abstractcore` does not fail due to `ModuleNotFoundError: bs4`.
- MCP (Model Context Protocol) Integration: First-class support for MCP servers
  - New `abstractcore.mcp` package with HTTP and stdio client implementations
  - `McpClient` for HTTP-based MCP servers with session management
  - `McpStdioClient` for local stdio-based MCP server processes
  - `McpToolSource` for automatic tool discovery and schema normalization
  - Tool namespacing (`mcp:server_name:tool_name`) to prevent collisions
  - Comprehensive test coverage for MCP integration
- Model Support: Added 5 new models to capabilities database
  - `claude-haiku-4-5`: Claude Haiku 4.5 with 64K max output, 200K context
  - `claude-opus-4-5`: Claude Opus 4.5 with 64K max output, 200K context
  - `glm-4.7`: GLM-4.7 358B MoE with enhanced coding and reasoning (32K output, 128K context)
  - `minimax-m2.1`: MiniMax M2.1 229B MoE optimized for coding (128K output, 200K context)
  - `nemotron-3-nano-30b-a3b`: NVIDIA Nemotron 30B hybrid MoE (23 Mamba-2 + 6 Attention layers, 256K context)
- Architecture Support: Added `nemotron_hybrid_moe` architecture in `architecture_formats.json` for hybrid Mamba-2/Attention models
- Model Name Resolution: Enhanced architecture detection to strip provider prefixes (`nvidia`, `azure`, `bedrock`, `fireworks`, `gemini`, `google`, `groq`, `together`, etc.) from model names for capability lookups (e.g., `lmstudio/qwen/qwen3-next-80b` → `qwen3-next-80b`)
- Tools Infrastructure:
  - Filesystem ignore policy (`abstractcore.tools.abstractignore`) with `.abstractignore` support and default patterns for `*.d/` runtime directories
  - Argument canonicalization (`arg_canonicalizer.py`) for flexible parameter naming (e.g., `file_path`/`filepath`/`path`)
  - JSON-ish parser (`abstractcore.utils.jsonish`) for robust LLM-generated JSON parsing
  - Tool schema now includes `required_args` field in `ToolDefinition.to_dict()`
- Documentation:
  - GLM-4.6V tool format troubleshooting guide (`docs/misc/glm-4.6v-tool-format-inconsistency.md`)
  - Enhanced `docs/tool-calling.md` with best practices
  - Backlog organization with `docs/backlog/README.md` and completed items moved to subdirectory
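
The model name resolution entry above gives `lmstudio/qwen/qwen3-next-80b` → `qwen3-next-80b` as its example. A minimal sketch that reproduces that example is shown below. It simply keeps the last path segment, which is an assumption; the real detection may instead match against an explicit list of provider prefixes. `strip_provider_prefix` is a hypothetical name.

```python
def strip_provider_prefix(model_name: str) -> str:
    """Keep only the final path segment of a provider-qualified model name."""
    return model_name.rsplit("/", 1)[-1]


bare = strip_provider_prefix("lmstudio/qwen/qwen3-next-80b")
unchanged = strip_provider_prefix("qwen3-next-80b")
print(bare, unchanged)
```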
- Tool Output Format (Breaking): Core tools now return structured JSON
  - `execute_command`: Returns `{success, return_code, stdout, stderr, rendered}` dict
  - `fetch_url`: Returns `{rendered, raw_text, normalized_text, ...}` dict
  - Maintains `rendered` field for human-readable output
  - Tool Registry supports structured failure reporting
- Provider Enhancements:
  - `max_tokens` parameter (if provided without `max_output_tokens`) is automatically mapped to `max_output_tokens` for backward compatibility with callers using legacy terminology. Within AbstractCore, `max_output_tokens` remains the first-class citizen alongside `max_input_tokens` and `max_tokens` (context window)
  - Centralized timeout configuration from `abstractcore/config`
  - Server endpoint `/v1/chat/completions` accepts `timeout_s` request field
  - Refactored tool prompt handling for better model-specific format support
  - Enhanced performance tracking with detailed timing metrics
- File Operations:
  - `read_file` max lines increased from 600 to 1000
  - `list_files` now includes directories and uses relative paths
  - `edit_file` enhanced with idempotent insertion behavior, better error messages, diff observability
- Provider Fixes:
  - Anthropic: Unknown `claude*` models default to native tool calling; `claude-haiku-4-5` and `claude-opus-4-5` properly recognized; `role="tool"` messages converted to `tool_result` content blocks
  - OpenAI-Compatible: Fixed tool call normalization for wrapped tool names (e.g., `"{function-name: write_file}"`)
  - Ollama: Added `metadata._provider_request` for provider-wire observability
  - VLLM: Enhanced tool call handling
  - LMStudio: Improved timeout handling
  - All: Normalized timeout errors, enhanced metadata handling, better architecture detection
- Tool Fixes:
  - Web Search: Prefer `ddgs` with fallback to `duckduckgo_search`; bounded retries with query cleaning; region fallback; relevance scoring
  - File Operations: `write_file` now requires `content` parameter; `edit_file` improved diagnostics; enhanced `search_files` and `read_file` context handling
  - Code Analysis: Enhanced `analyze_code` documentation
- Tool Calling Infrastructure:
  - Parser handles doubled tags, broken closing tags, unescaped control characters
  - Bracket prefix support for alternative formats
  - Better Nemotron XMLish format handling
  - Wrapped tool name mapping in `BaseProvider`
  - Enhanced tag rewriting and normalization
- Model Capabilities:
  - Caching for default capabilities warnings (reduces log noise)
  - Updated multiple models to "native" tool support (including `qwen3-next-80b-a3b`)
  - Proper max output token clamping with better error messages
- Testing: Added 30+ new test files for MCP, tool calling, providers, filesystem policy, streaming, and packaging
- Tool Outputs: Update code parsing `execute_command` or `fetch_url` outputs to handle dicts with `rendered` field
- File Operations: Explicitly provide `content` parameter to `write_file` (use `content=""` for empty files)
- Claude Models: Review tool support settings for Claude 4.5 models (now default to native)
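
For the tool-output migration note above, calling code that previously treated tool results as strings could be adapted along these lines. This is a sketch only: `render_tool_output` is a hypothetical helper, and the sample dict merely mirrors the field names listed for `execute_command`.

```python
import json


def render_tool_output(output):
    """Return human-readable text for both legacy string and new dict outputs."""
    if isinstance(output, dict):
        rendered = output.get("rendered")
        if rendered is not None:
            return str(rendered)
        # No rendered field: fall back to a stable JSON dump of the dict.
        return json.dumps(output, sort_keys=True, default=str)
    return str(output)


# Sample shaped like the execute_command result described above.
structured = {
    "success": True,
    "return_code": 0,
    "stdout": "hi\n",
    "stderr": "",
    "rendered": "hi",
}
print(render_tool_output(structured))
print(render_tool_output("legacy plain-text output"))
```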
- 43 commits improving tools, providers, MCP integration, and infrastructure
- 120 files changed: 8,738 insertions, 12,472 deletions
- 5 new models added to capabilities database (135 total models)
- 30+ new test files for comprehensive coverage
- 21,385 total lines changed across the codebase
Add workflow event types: Introduce new event types for workflow progress tracking
- Added EVENT_TYPE constants for workflow steps: WORKFLOW_STEP_STARTED, WORKFLOW_STEP_COMPLETED, WORKFLOW_STEP_WAITING, and WORKFLOW_STEP_FAILED.
- Enhances event tracking capabilities for durable execution processes.
- Model Support: Added 15+ new models including GLM-4.6V, Qwen3-VL series, Devstral, GPT-OSS, MiniMax-M2, and Granite-4.0-H
  - Vision models with enhanced OCR (32 languages) and visual agent capabilities
  - MoE models with detailed expert configurations and quantization specs
  - Coding models optimized for agentic workflows
- Architecture Support: Added 8 new architectures (glm4v_moe, mistral3, ministral3, granitemoehybrid, gpt_oss, qwen3_vl, qwen3_vl_moe, minimax_m2, harmony)
- Compression Modes: Added `CompressionMode` enum for chat history summarization (LIGHT/STANDARD/HEAVY)
- Trace Metadata: Added HTTP header extraction for distributed tracing support
- Token Budget Control: `BasicSummarizer` now supports AUTO mode for token management
  - `max_tokens=-1` (AUTO): Uses model's full context window capability
  - `max_tokens=N`: Hard limit for deployment constraints (GPU/RAM)
  - Same logic applies to `max_output_tokens`
  - CLI supports `--max-tokens auto` or specific values
- Tool Call Parsing: Improved robustness with sanitization for malformed LLM output
  - Handles doubled tags, broken closing tags, and unescaped control characters
  - String-aware JSON escaping preserves structural whitespace
- Summarization: Smart token budget management prevents OOM while optimizing performance
  - AUTO mode uses model's full capability
  - Hard limits respect deployment constraints (GPU memory)
  - Reduces API calls on large-context models (up to 12x improvement)
  - Fallback parsing when structured output fails
- File Editing: Added flexible whitespace matching and unified diff support to `edit_file`
  - Matches patterns ignoring indentation differences
  - Preserves file's original indentation style
- Error Handling: Added fallback strategies throughout for improved reliability
- Async Trace Capture: Improved reliability of trace capture in `agenerate()` for async LLM calls
- All changes maintain backward compatibility
- Default changed to `max_tokens=-1` (AUTO) for optimal performance
- Token limits prevent OOM in memory-constrained environments
- Added deprecation warnings for `execute_tools` parameter
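
The AUTO token budget (`max_tokens=-1`) described in this release can be sketched as follows. The helper names `parse_max_tokens` and `resolve_budget` are hypothetical, and clamping an explicit limit to the model's context window is an assumption rather than documented behavior.

```python
AUTO = -1  # sentinel value used for AUTO mode in the notes above


def parse_max_tokens(value: str) -> int:
    """Turn a CLI-style value ('auto' or a number) into an integer budget."""
    text = value.strip().lower()
    return AUTO if text == "auto" else int(text)


def resolve_budget(max_tokens: int, context_window: int) -> int:
    """Resolve the effective token budget for a given model context window."""
    if max_tokens == AUTO:
        return context_window  # AUTO: use the model's full context window
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive or -1 (AUTO)")
    # Assumption: a hard limit never exceeds what the model supports.
    return min(max_tokens, context_window)


auto_budget = resolve_budget(parse_max_tokens("auto"), 32768)
hard_budget = resolve_budget(parse_max_tokens("8000"), 32768)
print(auto_budget, hard_budget)
```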
- Made PIL/Pillow a required core dependency
  - Providers need media handling, so PIL cannot be optional
  - Fixes import errors when using abstractcore without explicit media installation
  - Modified files: `pyproject.toml`, `abstractcore/media/utils/image_scaler.py`, `abstractcore/utils/vlm_token_calculator.py`
- Fixed `NameError: name 'Image' is not defined` when importing tools module without PIL/Pillow installed
  - `image_scaler.py` used PIL types in annotations but imported conditionally, causing NameError instead of ImportError
  - Changed to direct imports with clear error messages
  - Core functionality (`tools`, `create_llm`) now works without PIL installed
  - Modified files: `abstractcore/media/utils/image_scaler.py`, `abstractcore/utils/vlm_token_calculator.py`
- Fixed `compression` installation group to depend on `media` (includes Pillow)
- Added missing installation groups: `all-non-mlx`, `all-providers-non-mlx`, `local-providers-non-mlx`
- Dynamic Base URL Support for Server Endpoint: POST parameter for runtime base_url configuration
  - New Parameter: `base_url` field in `/v1/chat/completions` request body
  - Use Case: Connect to custom OpenAI-compatible endpoints without environment variables
  - Example: `{"model": "openai-compatible/model-name", "base_url": "http://localhost:1234/v1", ...}`
  - Integration: Works with openai-compatible provider and any provider supporting base_url
  - Logging: Custom base URLs logged with 🔗 emoji for easy debugging
  - Priority: POST parameter > environment variable > provider default
  - Zero Breaking Changes: Optional parameter, existing code unchanged
- OpenAI-Compatible Provider Model Listing: Fixed `/v1/models?provider=openai-compatible` endpoint
  - Root Cause: Provider validation rejected "default" placeholder model used by registry for model discovery
  - Solution: Skip model validation when model == "default" (registry placeholder)
  - Impact: `/v1/models` endpoint now correctly lists all 27 models from LMStudio/llama.cpp servers
  - Verified: Works with environment variable (`OPENAI_COMPATIBLE_BASE_URL`) configuration
  - Model Prefix: All models returned with correct `openai-compatible/` prefix
- Provider Registry: Added openai-compatible to instance-based model listing
  - Previous: Attempted static method call, failed with openai-compatible
  - Fixed: Added "openai-compatible" to instance-based providers list alongside ollama, lmstudio, anthropic
  - Benefit: Proper model discovery with base_url injection from environment variables
- Files Modified:
  - `abstractcore/server/app.py` (added base_url field to ChatCompletionRequest, ~18 lines)
  - `abstractcore/providers/openai_compatible_provider.py` (skip validation for "default" model, ~3 lines)
  - `abstractcore/providers/registry.py` (added openai-compatible to instance providers, 1 line)
  - `abstractcore/utils/version.py` (version bump to 2.6.5)
- Architecture: Clean parameter injection pattern, minimal code changes
- Testing: Validated with LMStudio server on localhost:1234 (qwen/qwen3-next-80b model)
```bash
# POST with dynamic base_url parameter (NEW in v2.6.5)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-compatible/qwen/qwen3-next-80b",
    "messages": [{"role": "user", "content": "Hello"}],
    "base_url": "http://localhost:1234/v1"
  }'

# List models with environment variable (FIXED in v2.6.5)
export OPENAI_COMPATIBLE_BASE_URL="http://localhost:1234/v1"
# The URL is quoted so the shell does not try to interpret the "?"
curl "http://localhost:8080/v1/models?provider=openai-compatible"
# Returns all 27 models with openai-compatible/ prefix
```

- vLLM Provider: Dedicated provider for high-throughput GPU inference on NVIDIA CUDA hardware
  - Native vLLM Features: Exposes guided decoding, Multi-LoRA, and beam search capabilities
  - Guided Decoding: `guided_regex`, `guided_json`, `guided_grammar` parameters for 100% syntax-safe code generation
  - Multi-LoRA Support: `load_adapter()`, `unload_adapter()`, `list_adapters()` for dynamic adapter management
  - Beam Search: `best_of`, `use_beam_search` parameters for higher accuracy on complex tasks
  - Full Async Support: Native async implementation with lazy-loaded httpx.AsyncClient
  - OpenAI-Compatible: Uses `/v1/chat/completions` endpoint while exposing vLLM extensions via `extra_body`
  - Shared Cache: Automatically shares HuggingFace cache with HF/MLX providers via `HF_HOME`
  - Environment Variables: `VLLM_BASE_URL` (default: `http://localhost:8000/v1`), `VLLM_API_KEY` (optional)
  - Default Model: `Qwen/Qwen3-Coder-30B-A3B-Instruct` (or use Qwen2.5-Coder-7B-Instruct for testing)
  - Registry Integration: Listed in `get_all_providers_status()` alongside the other 6 providers
  - Implementation: 823 lines of provider code, 371 lines of tests, comprehensive GPU testing guide
  - Use Cases: Production GPU deployments, multi-GPU tensor parallelism, specialized AI agents with LoRA adapters
- OpenAI-Compatible Generic Provider: Universal provider for any OpenAI-compatible API endpoint
  - Maximum Compatibility: Works with llama.cpp, text-generation-webui, LocalAI, FastChat, Aphrodite, SGLang, proxies
  - Optional Authentication: API key support (optional, many local servers don't require it)
  - Feature Parity: Chat completions, streaming, async, embeddings, structured output, prompted tools
  - Environment Variables: `OPENAI_COMPATIBLE_BASE_URL` (default: `http://localhost:8080/v1`), `OPENAI_COMPATIBLE_API_KEY` (optional)
  - Default Model: `"default"` (server-dependent)
  - 8 Providers Total: Completes provider ecosystem alongside OpenAI, Anthropic, Ollama, LMStudio, MLX, HuggingFace, vLLM
  - Implementation: 764 lines of provider code, 328 lines of tests
  - Architecture: Inherits from BaseProvider, uses httpx for HTTP communication
  - Use Cases: llama.cpp local servers, text-generation-webui deployments, OpenAI-compatible proxies, custom endpoints
  - Future Enhancement: Planned refactoring to create base class for vLLM/LMStudio to reduce code duplication (see `docs/backlog/`)
- Hardware Requirements: Updated README.md and docs/prerequisites.md with hardware compatibility warnings
  - Added "Hardware" column to provider table (MLX: Apple Silicon only, vLLM: NVIDIA CUDA only)
  - Clear installation guidance per hardware platform
- Multi-GPU Setup: Complete guide for tensor parallelism on 4x NVIDIA L4 GPUs
  - Startup commands for single GPU, multi-GPU, production with LoRA
  - Key parameters documentation (`--tensor-parallel-size`, `--gpu-memory-utilization`, `--max-num-seqs`)
  - OOM troubleshooting based on real deployment experience
- Testing Infrastructure: GPU test scripts for quick verification and comprehensive integration testing
  - `test-repl-gpu.py`: Interactive REPL for direct vLLM provider testing
  - `test-gpu.py`: Full stack test with AbstractCore server + curl examples
  - FastDoc UI available at `http://localhost:8080/docs` when server running
- Validated on 4x NVIDIA L4 GPUs (23GB VRAM each, Scaleway Paris)
- Successfully resolved multi-GPU tensor parallelism requirements
- Fixed sampler warm-up OOM by reducing `--max-num-seqs` from 256 to 128
- Documented Triton kernel compilation issues with MoE models (recommend 7B models for reliability)
- Files Created:
  - `abstractcore/providers/vllm_provider.py` (823 lines)
  - `abstractcore/providers/openai_compatible_provider.py` (764 lines)
  - `tests/providers/test_vllm_provider.py` (371 lines)
  - `tests/providers/test_openai_compatible_provider.py` (328 lines)
- Files Modified:
  - `abstractcore/providers/registry.py` (added 2 provider registrations)
  - `abstractcore/providers/__init__.py` (exported 2 new providers)
  - `README.md` (hardware requirements)
  - `docs/prerequisites.md` (multi-GPU setup guide)
- Architecture: Both providers inherit from BaseProvider (not OpenAIProvider) for clean httpx implementation
- Pattern: vLLM uses `extra_body` for vLLM-specific params; OpenAI-compatible is pure OpenAI-compatible
- Branch: `vllm-provider` (pending merge to main)
- More Stringent Assessment Scoring: BasicJudge now applies rigorous, context-aware scoring to prevent grade inflation (2025-12-10)
  - Anti-Grade-Inflation: Explicit guidance to avoid defaulting to high scores (3-4) for adequate work
  - Context-Aware Criteria: Scores criteria based on task type (e.g., innovation=1-2 for routine calculations, not 3)
  - Task-Appropriate Expectations: Different rubrics for routine tasks vs creative work vs complex problem-solving
  - New Evaluation Step: "Assess if each criterion meaningfully applies to this task (if not, score 1-2)"
  - Impact: More accurate and fair assessments that distinguish between routine competence and genuine excellence
  - Example: Basic arithmetic now correctly scores innovation=1-2 (routine formula), not 3 (adequate innovation)
  - Zero Breaking Changes: Assessment API unchanged, only internal scoring logic improved
- Complete Score Visibility: `session.generate_assessment()` now returns all predefined criterion scores in structured format
  - New Field: `scores` dict containing clarity, simplicity, actionability, soundness, innovation, effectiveness, relevance, completeness, coherence
  - Before: Only overall_score, custom_scores, and text feedback visible
  - After: Full transparency with individual scores for both predefined and custom criteria
  - Impact: Users can now see exactly how each criterion was scored, not just overall and custom scores
  - Backward Compatible: New `scores` field added to assessment result without breaking existing code
- Files Modified: `abstractcore/processing/basic_judge.py` (scoring principles), `abstractcore/core/session.py` (score extraction)
- Prompt Enhancement: Added "SCORING PRINCIPLES - CRITICAL" section with 6 explicit guidelines
- Implementation: ~15 lines added to scoring rubric, ~10 lines to session assessment storage
- Programmatic Provider Configuration: Runtime configuration API for provider settings without environment variables (2025-12-01)
  - Simple API: `configure_provider()`, `get_provider_config()`, `clear_provider_config()` functions
  - Runtime Configuration: Set provider base URLs and other settings programmatically
  - Automatic Application: All future `create_llm()` calls automatically use configured settings
  - Provider Discovery: `get_all_providers_with_models()` automatically uses runtime configuration
  - Use Cases:
    - Web UI settings pages: Configure providers through user interfaces
    - Docker startup scripts: Read from custom env vars and configure programmatically
    - Integration testing: Set mock server URLs without environment variables
    - Multi-tenant deployments: Configure different base URLs per tenant
  - Priority System: Constructor parameter > Runtime configuration > Environment variable > Default value
  - Implementation: ~65 lines across 3 files (`config/manager.py`, `config/__init__.py`, `providers/registry.py`)
  - Testing: 9/9 tests passing with real implementations (no mocking)
  - Zero Breaking Changes: Optional runtime configuration, all existing code works unchanged
  - Feature Request: Extension of Digital Article team's base URL configuration request
- README.md: Added Programmatic Configuration section with use cases and priority system
- llms.txt: Added feature line for v2.6.2
- llms-full.txt: Added comprehensive section with Web UI, Docker, testing, and multi-tenant examples
- FEATURE_REQUEST_RESPONSE_ENV_VARS.md: Updated with programmatic API examples
- Architecture: Runtime-only (in-memory), not persisted to config JSON file
- Injection Point: `ProviderRegistry.create_provider_instance()` merges runtime config into kwargs
- Pattern: `merged_kwargs = {**runtime_config, **kwargs}` ensures user kwargs take precedence
- Backward Compatibility: All 6 providers work automatically via registry injection
- Test Coverage: Unit tests for config methods, provider creation, precedence, and registry integration
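
The dict-unpacking merge quoted above gives caller kwargs precedence because later keys overwrite earlier ones. The short sketch below demonstrates that behavior in isolation; `merge_provider_kwargs` and the sample values are hypothetical and are not the registry's actual code.

```python
def merge_provider_kwargs(runtime_config, **kwargs):
    """Merge runtime config with call-site kwargs; later keys win."""
    return {**runtime_config, **kwargs}


# Hypothetical runtime configuration for a provider.
runtime_config = {"base_url": "http://runtime-host:1234/v1", "timeout": 30}

# A constructor argument overrides the runtime value...
overridden = merge_provider_kwargs(
    runtime_config, base_url="http://explicit-host:9999/v1"
)
# ...while settings the caller did not pass fall through from runtime config.
inherited = merge_provider_kwargs(runtime_config)
print(overridden)
print(inherited)
```

The merge builds a new dict, so the stored runtime configuration is not mutated by individual calls.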
- Environment Variable Support for Provider Base URLs: Ollama and LMStudio providers now respect environment variables for custom base URLs (2025-12-01)
  - Ollama Provider: Supports `OLLAMA_BASE_URL` and `OLLAMA_HOST` environment variables
  - LMStudio Provider: Supports `LMSTUDIO_BASE_URL` environment variable
  - Provider Discovery: `get_all_providers_with_models()` automatically respects environment variables when checking provider availability
  - Use Cases:
    - Remote Ollama servers (e.g., GPU server on `http://192.168.1.100:11434`)
    - Docker/Kubernetes deployments with custom networking
    - Non-standard ports for multi-instance deployments (e.g., `:11435`, `:1235`)
    - Accurate provider availability detection in distributed environments
  - Priority System: Programmatic `base_url` parameter > Environment variable > Default value
  - Implementation: ~30 lines across 2 providers, follows existing OpenAI/Anthropic pattern
  - Testing: 12/12 tests passing with real implementations (no mocking)
  - Zero Breaking Changes: Optional environment variables, defaults unchanged, fully backward compatible
  - Feature Request: Submitted by Digital Article team for computational notebook deployment
- README.md: Added Environment Variables section with examples for all providers
- llms.txt: Added feature line for v2.6.1
- llms-full.txt: Added comprehensive Environment Variables section with use cases and code examples
- Architecture: Consistent with OpenAI/Anthropic providers (implemented in v2.6.0)
- Pattern: `base_url or os.getenv("PROVIDER_BASE_URL") or default_value`
- Providers Updated: `ollama_provider.py`, `lmstudio_provider.py`
- Test Coverage: Unit tests for env var reading, precedence, defaults, and integration with provider registry
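The precedence chain (parameter, then environment variable, then default) can be sketched as below. The helper name `resolve_base_url` is hypothetical; `http://localhost:11434` is Ollama's usual local default.

```python
import os
from typing import Optional

OLLAMA_DEFAULT = "http://localhost:11434"


def resolve_base_url(base_url: Optional[str], env_var: str, default: str) -> str:
    """Return the explicit value, else the environment variable, else the default.

    Because `or` treats empty strings as missing, an empty environment
    variable also falls through to the default.
    """
    return base_url or os.getenv(env_var) or default


print(resolve_base_url(None, "OLLAMA_BASE_URL", OLLAMA_DEFAULT))
```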
- Model Download API: Provider-agnostic async model download with progress reporting (2025-12-01)
  - Top-Level Function: `from abstractcore import download_model` - simple, discoverable API
  - Async Progress Reporting: Real-time status updates via async generator pattern
  - Provider Support:
    - ✅ Ollama: Full progress with percent and bytes via `/api/pull` streaming NDJSON
    - ✅ HuggingFace: Start/complete messages via `huggingface_hub.snapshot_download`
    - ✅ MLX: Same as HuggingFace (uses HF Hub internally)
  - Progress Information: `DownloadProgress` dataclass with status, message, percent, downloaded_bytes, total_bytes
  - Error Handling: Clear error messages for connection failures, missing models, and gated repositories
  - Use Cases: Docker deployments, automated setup, web UIs with SSE streaming, batch downloads
  - Implementation: ~240 lines in `abstractcore/download.py`, 11/11 tests passing with real implementations
  - Zero Breaking Changes: New functionality only, fully backward compatible
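To illustrate the async generator pattern for progress reporting, here is a standalone sketch that turns NDJSON status lines into progress updates. It does not call `download_model`. The dataclass reuses the field names listed above, but its types, the `progress_from_ndjson` helper, the status strings, and the sample lines are assumptions for illustration.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import AsyncIterator, Iterable, Optional


@dataclass
class DownloadProgress:
    # Field names follow the changelog entry; the types are assumptions.
    status: str
    message: str
    percent: Optional[float] = None
    downloaded_bytes: Optional[int] = None
    total_bytes: Optional[int] = None


async def progress_from_ndjson(lines: Iterable[str]) -> AsyncIterator[DownloadProgress]:
    """Yield one progress update per non-empty NDJSON line (illustrative only)."""
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        total = event.get("total")
        completed = event.get("completed")
        # Percent is only known once both byte counters are present.
        percent = round(100 * completed / total, 1) if total and completed is not None else None
        yield DownloadProgress(
            status="success" if event.get("status") == "success" else "downloading",
            message=event.get("status", ""),
            percent=percent,
            downloaded_bytes=completed,
            total_bytes=total,
        )
        await asyncio.sleep(0)  # hand control back to the event loop between updates


async def collect(lines: Iterable[str]) -> list:
    return [update async for update in progress_from_ndjson(lines)]


# Sample lines shaped like a streaming pull response (made-up values).
SAMPLE = [
    '{"status": "pulling manifest"}',
    '{"status": "downloading", "total": 1000, "completed": 250}',
    '{"status": "success"}',
]
updates = asyncio.run(collect(SAMPLE))
for update in updates:
    print(update)
```

A consumer such as an SSE endpoint would iterate with `async for` instead of collecting everything into a list.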
- Custom Base URL Support: Configure custom API endpoints for OpenAI and Anthropic providers (2025-12-01)
  - OpenAI Provider: `base_url` parameter + `OPENAI_BASE_URL` environment variable
  - Anthropic Provider: `base_url` parameter + `ANTHROPIC_BASE_URL` environment variable
  - Use Cases:
    - OpenAI-compatible proxies (Portkey, etc.) for observability, caching, cost management
    - Local OpenAI-compatible servers
    - Enterprise gateways for security and compliance
    - Custom endpoints for testing and development
  - Configuration Methods: Programmatic parameter (recommended) or environment variables
  - Implementation: ~30 lines across 2 providers, follows Ollama/LMStudio pattern
  - Testing: 8/10 tests passing, 2 appropriately skipped (OpenAI model validation with test keys)
  - Zero Breaking Changes: Optional parameter with None default, fully backward compatible
  - Note: Azure OpenAI NOT supported (requires AzureOpenAI SDK class)
- Production-Ready Native Async Support: Complete async/await implementation with validated 6-7.5x performance improvement (2025-11-30)
  - Native Async Providers: Ollama, LMStudio, OpenAI, Anthropic now use native async clients (httpx.AsyncClient, AsyncOpenAI, AsyncAnthropic)
  - Performance Validated:
    - Ollama: 7.5x faster for concurrent requests
    - LMStudio: 6.5x faster for concurrent requests
    - OpenAI: 6.0x faster for concurrent requests
    - Anthropic: 7.4x faster for concurrent requests
  - Fallback Providers: MLX and HuggingFace use `asyncio.to_thread()` (industry standard for non-async libraries)
  - Implementation Time: 15-16 hours (vs 80-120 hours originally planned) - simplified approach
  - Code Changes: ~529 lines across 4 provider files (Ollama, LMStudio native implementations)
  - Zero Breaking Changes: All sync APIs unchanged, async purely additive
  - Testing: Comprehensive validation with real models (no mocking), 100% success rate
- Structured Logging Standardization: Completed migration of 14 core modules to structured logging (2025-12-01)
  - 100% Migration Rate: 14/14 target files successfully migrated to `get_logger()` from `abstractcore.utils.structured_logging`
  - Modules Migrated: tools/ (6 files), architectures/, core/, embeddings/, media/, providers/, utils/
  - Simplified Approach: 2 hours implementation (vs 6-12 hours originally planned) - 5-6x more efficient
  - SOTA Compliance: Follows PEP 282, Django, FastAPI, and cloud-native patterns
  - Zero Breaking Changes: Fully backward compatible, all tests passing
  - Benefits: Consistent structured logs, JSON output support, cloud-native ready, improved observability
- Async Documentation:
  - Updated README.md with performance data and provider-specific details
  - Educational async CLI demo with 8 core async/await patterns
  - Created comprehensive async guide in docs/async-guide.md
  - Backlog documents: `async-mlx-hf.md` (investigation), `batching.md` (future enhancement)
- Observability: Consistent structured logging across all critical infrastructure
  - Module-level loggers using `get_logger(__name__)` pattern
  - Structured fields support for machine-readable logs (ELK/Datadog/Splunk)
  - Cloud-native JSON output ready
  - No file dependencies (stdout/stderr only)
- Architecture: `BaseProvider._agenerate_internal()` as extension point for native async
- Lazy-loaded async clients (zero overhead for sync-only users)
- Proper async cleanup in `unload()` methods
- Pattern follows SOTA from LangChain, LiteLLM, Pydantic-AI
- Why MLX/HF use fallback: Libraries don't expose async APIs, direct function calls (no HTTP layer)
- SOTA Validation: Research confirmed approach matches industry best practices
- Average Speedup: ~7x faster for concurrent requests across all providers
- Real Concurrency: True async I/O overlap for network providers (HTTP client/server architecture)
- Fallback Efficiency: MLX/HF keep event loop responsive for mixing with async I/O operations
- Async/Await Support - Updated usage examples
- Async Guide - Comprehensive examples and patterns
- Async CLI Demo - Educational reference for learning
- Async/Await Support: Native async API for concurrent LLM requests with 3-10x performance improvement
  - `agenerate()` Method: Async version of `generate()` works with all 6 providers (OpenAI, Anthropic, Ollama, LMStudio, MLX, HuggingFace)
  - Concurrent Execution: Use `asyncio.gather()` for parallel requests with proven 3.52x speedup on real workloads
  - Async Streaming: Full streaming support with `AsyncIterator` for real-time token generation
  - Session Async: `BasicSession.agenerate()` maintains conversation history in async workflows
  - Zero Breaking Changes: All sync APIs continue to work unchanged - async is purely additive
  - FastAPI Compatible: Works seamlessly with async web frameworks and non-blocking applications
  - Real Concurrency Verified: Benchmark tests confirm true async concurrency, not fake async wrappers
  - Implementation: ~90 lines in 2 files using `asyncio.to_thread()` for thread-pool async execution
  - Files Modified: `abstractcore/providers/base.py`, `abstractcore/core/session.py`
  - Tests: Comprehensive test suite with real provider implementations (no mocking) in `tests/async/`
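The thread-pool approach can be sketched without any provider installed. Here `generate()` is a hypothetical blocking stand-in simulated with `time.sleep`, and `agenerate()` is an illustrative wrapper, not the library's implementation:

```python
import asyncio
import time
from typing import List, Tuple


def generate(prompt: str) -> str:
    """Stand-in for a blocking provider call (hypothetical, simulated with sleep)."""
    time.sleep(0.2)
    return f"echo: {prompt}"


async def agenerate(prompt: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(generate, prompt)


async def run_concurrently(prompts: List[str]) -> Tuple[List[str], float]:
    start = time.perf_counter()
    # gather() schedules all calls at once and preserves input order in the results.
    results = await asyncio.gather(*(agenerate(p) for p in prompts))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_concurrently(["a", "b", "c"]))
print(results, f"{elapsed:.2f}s")
```

Run sequentially, the three simulated calls would take about 0.6s. Run concurrently they overlap, so the elapsed time stays close to that of a single call.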
- Cross-Platform Installation Options: New installation extras for Linux/Windows users
  - `abstractcore[all-non-mlx]` - Complete installation without MLX (for Linux/Windows)
  - `abstractcore[all-providers-non-mlx]` - All providers except MLX
  - `abstractcore[local-providers-non-mlx]` - Ollama and LMStudio without MLX
  - Fixes installation failures when trying to install MLX on non-macOS systems
  - Comprehensive installation guide: `docs/installation-guide.md`
  - Updated README with platform-specific installation instructions
- Async Documentation: Comprehensive documentation updates across all guides
  - README.md: Added async to Key Features and dedicated Async/Await section with examples
  - docs/getting-started.md: New Section 6 covering async patterns and use cases
  - docs/api-reference.md: Complete API documentation for `agenerate()` methods
  - docs/README.md: Added async to Essential Guides navigation
  - llms.txt: Added async code examples and capabilities for AI consumption
  - llms-full.txt: Comprehensive async section with 4 subsections (basic, streaming, session, multi-provider)
- Platform Compatibility: `pip install abstractcore[all]` no longer fails on Linux/Windows
  - Previously, `abstractcore[all]` would fail on non-macOS systems due to MLX dependencies
  - Users should now use `abstractcore[all-non-mlx]` on Linux/Windows for complete installation
- Async Implementation Details:
  - Uses `asyncio.to_thread()` to run sync methods in thread pool without blocking event loop
  - Proper `AsyncIterator` protocol for streaming responses
  - Works with all existing provider implementations automatically via `BaseProvider`
  - Full parameter passthrough for all generation options
  - Tested with real LLM calls across all providers
- Verified Speedup: Benchmark testing shows 3.52x improvement for concurrent requests
  - Sequential: 0.93s for 3 requests
  - Concurrent: 0.26s for 3 requests with `asyncio.gather()`
  - Real async concurrency confirmed (not fake async wrappers)
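For the streaming side, one way to expose a blocking iterator through the `AsyncIterator` protocol is to pull each chunk in a worker thread. This is a sketch under assumptions: `stream_tokens` and `astream` are hypothetical names, and the real provider code may be structured differently.

```python
import asyncio
from typing import AsyncIterator, Iterator, List

_DONE = object()  # sentinel marking the end of the blocking iterator


def stream_tokens(prompt: str) -> Iterator[str]:
    """Stand-in for a blocking streaming call (hypothetical)."""
    for word in prompt.split():
        yield word


async def astream(prompt: str) -> AsyncIterator[str]:
    """Async generator that fetches each chunk via asyncio.to_thread()."""
    iterator = stream_tokens(prompt)
    while True:
        # next(iterator, _DONE) returns the sentinel instead of raising StopIteration.
        chunk = await asyncio.to_thread(next, iterator, _DONE)
        if chunk is _DONE:
            break
        yield chunk


async def collect(prompt: str) -> List[str]:
    return [chunk async for chunk in astream(prompt)]


chunks = asyncio.run(collect("async streaming demo"))
print(chunks)
```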
- Batch document processing
- Multi-provider consensus/comparison
- Non-blocking web applications (FastAPI, async frameworks)
- Parallel data extraction tasks
- High-throughput API endpoints
- Added programmatic interaction tracing to capture complete LLM interaction history, enabling debugging, compliance, and performance analysis.
- Introduced provider-level and session-level tracing with customizable metadata and automatic trace collection.
- Implemented trace retrieval and export utilities for JSONL, JSON, and Markdown formats.
- Enhanced documentation and examples for interaction tracing usage and benefits.
- Comprehensive test coverage added for tracing functionality, ensuring reliability and correctness.
- MiniMax M2 Model Support: Added comprehensive detection for MiniMax M2 Mixture-of-Experts model
  - Model Specs: 230B total parameters with 10B active (MoE architecture)
  - Capabilities: Native tool calling, structured outputs, interleaved thinking with `<think>` tags
  - Context Window: 204K tokens (industry-leading), optimized for coding and agentic workflows
  - Variant Detection: Supports all distribution formats:
    - `minimax-m2` (canonical name)
    - `MiniMaxAI/MiniMax-M2` (HuggingFace official)
    - `mlx-community/minimax-m2` (MLX quantized)
    - `unsloth/MiniMax-M2-GGUF` (GGUF format)
  - Case-Insensitive: All variants detected regardless of case (e.g., `MiniMax-M2`, `MINIMAX-m2`)
  - Source: Official MiniMax documentation (minimax-m2.org, HuggingFace, GitHub)
  - License: Apache-2.0 with no commercial restrictions
  - Note: Added single entry in `model_capabilities.json` with comprehensive aliases for automatic detection across all distribution formats
- [EXPERIMENTAL] Glyph Visual-Text Compression: Renders long text as optimized images for VLM processing
  - ⚠️ Vision Model Requirement: ONLY works with vision-capable models (gpt-4o, claude-3-5-sonnet, llama3.2-vision, etc.)
  - ⚠️ Error Handling: `glyph_compression="always"` raises `UnsupportedFeatureError` if model lacks vision support
  - ⚠️ Auto Mode: `glyph_compression="auto"` (default) logs warning and falls back to text processing for non-vision models
  - PIL-based text rendering with custom font support and proper DPI scaling
  - Markdown-like formatting with hierarchical headers, bold/italic text, and smart newline handling
  - Multi-column layout support with configurable spacing and margins
  - Special OCRB font family support with separate regular/italic variants and stroke-based bold effect
  - Font customization via `--font` (by name) and `--font-path` (by file) parameters
  - Research-based VLM token calculator with provider-specific formulas
  - Thread-safe caching system in `~/.abstractcore/glyph_cache/`
  - Optional dependencies: `pip install abstractcore[compression]` (removed ReportLab dependency)
  - Vision capability validation in `AutoMediaHandler._should_apply_compression()`
- Model Capability Filtering: Clean, type-safe system for filtering models by input/output capabilities
  - Input Capabilities: Filter by what models can analyze (TEXT, IMAGE, AUDIO, VIDEO)
  - Output Capabilities: Filter by what models generate (TEXT, EMBEDDINGS)
  - Python API: `list_available_models(input_capabilities=[...], output_capabilities=[...])`
  - HTTP API: `/v1/models?input_type=image&output_type=text`
  - All Providers: Works consistently across OpenAI, Anthropic, Ollama, LMStudio, MLX, HuggingFace
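A rough sketch of capability-based filtering with enums follows. The enum classes, `ModelInfo`, `filter_models`, the catalog entries, and the "must support every requested capability" rule are all assumptions for illustration. This is not the signature or behavior of `list_available_models()`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, List, Optional


class InputCapability(Enum):
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"


class OutputCapability(Enum):
    TEXT = "text"
    EMBEDDINGS = "embeddings"


@dataclass(frozen=True)
class ModelInfo:
    name: str
    inputs: FrozenSet[InputCapability]
    outputs: FrozenSet[OutputCapability]


def filter_models(
    models: Iterable[ModelInfo],
    input_capabilities: Optional[List[InputCapability]] = None,
    output_capabilities: Optional[List[OutputCapability]] = None,
) -> List[str]:
    """Keep models that support every requested input and output capability."""
    wanted_in = set(input_capabilities or [])
    wanted_out = set(output_capabilities or [])
    return [m.name for m in models if wanted_in <= m.inputs and wanted_out <= m.outputs]


# Made-up catalog entries, for illustration only.
CATALOG = [
    ModelInfo("vision-chat", frozenset({InputCapability.TEXT, InputCapability.IMAGE}),
              frozenset({OutputCapability.TEXT})),
    ModelInfo("text-chat", frozenset({InputCapability.TEXT}), frozenset({OutputCapability.TEXT})),
    ModelInfo("embedder", frozenset({InputCapability.TEXT}), frozenset({OutputCapability.EMBEDDINGS})),
]

print(filter_models(CATALOG, input_capabilities=[InputCapability.IMAGE]))
```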
- Text File Support: Media module now supports 90+ text-based file extensions with intelligent content detection
  - Expanded Mappings: Added support for programming languages (.py, .js, .r, .R, .rs, .go, .jl, etc.), notebooks (.ipynb, .rmd), config files (.yaml, .toml, .ini), web files (.css, .vue, .svelte), build scripts (.sh, .dockerfile), and more
  - Smart Detection: Unknown extensions are analyzed via content sampling (UTF-8, Latin-1, etc.) to automatically detect text files
  - Programmatic Access: New `get_all_supported_extensions()` and `get_supported_extensions_by_type()` functions for querying supported formats
  - CLI Enhancement: `@filepath` syntax now works with ANY text-based file (R scripts, Jupyter notebooks, SQL files, etc.)
  - Fallback Processing: TextProcessor handles all text files via plain text fallback, ensuring universal support
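Content sampling for unknown extensions could look roughly like the heuristic below. The function name `looks_like_text`, the 95% threshold, and the null-byte rule are assumptions. The media module's actual detection logic may differ.

```python
def looks_like_text(sample: bytes, printable_threshold: float = 0.95) -> bool:
    """Guess whether a byte sample comes from a text file (heuristic sketch)."""
    if not sample:
        return True  # empty files are treated as text here
    if b"\x00" in sample:
        return False  # null bytes almost always indicate binary data
    try:
        sample.decode("utf-8")
        return True
    except UnicodeDecodeError:
        pass
    # Latin-1 can decode any byte sequence, so a printable-ratio check is
    # needed to avoid classifying arbitrary binary data as text.
    text = sample.decode("latin-1")
    printable = sum(ch.isprintable() or ch in "\n\r\t" for ch in text)
    return printable / len(text) >= printable_threshold


print(looks_like_text(b"x <- c(1, 2, 3)\nmean(x)\n"))
print(looks_like_text(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
```

A sample cut in the middle of a multi-byte UTF-8 character fails the first check and falls through to the Latin-1 ratio test, so a real implementation would likely read a generous sample.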
- Model Capabilities: Added 50+ VLM models (Mistral Small 3.1/3.2, LLaMA 4, Qwen3-VL, Granite Vision)
- Detection System: All model queries go through `detection.py` with structured logging
- Token Calculation: Accurate image tokenization using model-specific parameters
- Offline-First Architecture: AbstractCore now enforces offline-first operation by default
  - Added centralized offline configuration in `config/manager.py`
  - HuggingFace provider loads models directly from local cache when offline
  - Environment variables (`TRANSFORMERS_OFFLINE`, `HF_HUB_OFFLINE`) set automatically
  - Uses centralized cache directory configuration
  - Designed primarily for open source LLMs with full offline capability
- HuggingFace Provider: Added vision model support for GLM4V architecture (Glyph, GLM-4.1V)
  - Upgraded transformers requirement to >=4.57.1 for GLM4V architecture support
  - Added `_is_vision_model()` detection for AutoModelForImageTextToText models
  - Added `_load_vision_model()` and `_generate_vision_model()` methods
  - Proper multimodal message handling with AutoProcessor
  - Suppressed progress bars and processor warnings during model loading
- Vision Compression: Enhanced test script with exact token counting from API responses
  - Added `--detail` parameter for Qwen3-VL token optimization (`low`, `high`, `auto`, `custom`)
  - Added `--target-tokens` parameter for precise token control per image
  - Improved compression ratio calculation using actual vs estimated tokens
  - Added model-specific context window validation and warnings
- Media Handler Architecture: Clarified OpenAI vs Local handler usage patterns
  - LMStudio uses OpenAIMediaHandler for vision models (API compatibility)
  - Ollama uses LocalMediaHandler with custom image array format
  - Added comprehensive architecture documentation and diagrams
- Cache Creation: Automatic directory creation with proper error handling
- Dependency Validation: Structured logging for missing libraries
- Compression Pipeline: Fixed parameter passing and quality threshold bypass
- GLM4V Architecture: Fixed `KeyError: 'glm4v'` when loading Glyph and GLM-4.1V models
- Text Formatting Performance: Fixed infinite loop in inline formatting parser for large files
- Text Pagination: Implemented proper multi-image splitting for long texts
- Literal Newline Handling: Fixed `\\n` sequences not being converted to actual newlines
- Token Estimation: Added model-specific visual token calculations and context overflow protection
- Media Path Logging: Fixed media output paths not showing in INFO logs
- Qwen3-VL Context Management: Auto-adjusts detail level to prevent memory allocation errors
- LMStudio GLM-4.1V Compatibility: Documented LMStudio's internal vision config limitations
- HuggingFace GLM4V Support: Added proper error handling for transformers version requirements
- Requires vision-capable models (llama3.2-vision, qwen2.5vl, gpt-4o, claude-3-5-sonnet, zai-org/Glyph)
- System dependency on poppler-utils may require manual installation on some systems
- Quality assessment heuristics may be overly conservative for some document types
- Native Structured Output Support for HuggingFace GGUF Models: HuggingFace provider now supports server-side schema enforcement for GGUF models via llama-cpp-python's `response_format` parameter
  - GGUF models loaded through HuggingFace provider automatically get native structured output support
  - Uses the same OpenAI-compatible `response_format` parameter as LMStudio
  - Server-side schema enforcement validates output against the provided schema
  - Transformers models continue to use prompted approach as fallback
  - Provider registry updated to advertise structured output capability
- Native Structured Output via Outlines for HuggingFace Transformers: HuggingFace Transformers models now support native structured output via optional Outlines integration
  - Constrained decoding ensures 100% schema compliance without validation retries
  - Optional dependency - only installed with `pip install abstractcore[huggingface]`
  - Automatic detection and activation when Outlines is available
  - Graceful fallback to prompted approach if Outlines not installed
  - Works with any transformers-compatible model
  - Server-side logit filtering guarantees valid token selection
- Native Structured Output via Outlines for MLX: MLX models now support native structured output via optional Outlines integration
  - Constrained decoding on Apple Silicon with 100% schema compliance
  - Optional dependency - only installed with `pip install abstractcore[mlx]`
  - Automatic detection and activation when Outlines is available
  - Graceful fallback to prompted approach if Outlines not installed
  - Optimized for Apple M-series processors
  - Zero validation retries required
- StructuredOutputHandler: Enhanced provider detection to identify HuggingFace GGUF models, Transformers with Outlines, and MLX with Outlines as having native support
  - Checks for `model_type == "gguf"` to determine GGUF native support
  - Checks for `model_type == "transformers"` with Outlines availability for Transformers native support
  - Checks for Outlines availability for MLX native support
  - GGUF models benefit from llama-cpp-python's constrained sampling
  - Transformers and MLX models benefit from Outlines constrained decoding when available
  - Automatic fallback to prompted strategy if Outlines not installed
- Structured Output Control: Added `structured_output_method` parameter to HuggingFace and MLX providers for explicit control
  - `"auto"` (default): Use Outlines if available, fallback to prompted
  - `"native_outlines"`: Force Outlines usage (error if unavailable)
  - `"prompted"`: Always use prompted fallback (recommended - fastest, 100% success)
  - Allows users to optimize for performance vs theoretical guarantees
- Model Capabilities: Verified and documented native structured output support for Ollama and LMStudio providers
  - Ollama: Confirmed correct implementation using `format` parameter with full JSON schema
  - LMStudio: Documented existing OpenAI-compatible `response_format` implementation
  - Both providers leverage server-side schema enforcement for schema compliance
- Dependencies: Added Outlines as optional dependency for HuggingFace and MLX providers
  - `pip install abstractcore[huggingface]` now includes Outlines for native structured output
  - `pip install abstractcore[mlx]` now includes Outlines for native structured output
  - Base installation remains lightweight - Outlines only installed when needed
- HuggingFace Provider: Added missing `response_model` parameter propagation through internal generation methods
  - Fixed `_generate_internal()` to pass `response_model` to both GGUF and transformers backends
  - Both `_generate_gguf()` and `_generate_transformers()` now accept and handle `response_model` parameter
- Provider Registry: Added `"structured_output"` to supported features for Ollama, LMStudio, HuggingFace, and MLX providers
  - Ensures accurate capability reporting for structured output functionality
Surprising Findings from Comprehensive Testing (October 26, 2025):
Extensive testing on Apple Silicon M4 Max revealed unexpected performance characteristics:
MLX Provider (mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit):
- Prompted fallback: 745-4,193ms, 100% success rate
- Outlines native: 2,031-9,840ms, 100% success rate
- Overhead: 173-409% slower with Outlines constrained generation
- Conclusion: Both approaches achieve 100% schema compliance, but prompted is 2-5x faster
Key Insight: The prompted approach (client-side validation) achieves identical 100% success rate at significantly better performance than Outlines' server-side constrained generation. This is contrary to typical expectations where server-side constraints should be more reliable.
Recommendation:
- Default to `structured_output_method="prompted"` for best performance with proven reliability
- Use `structured_output_method="native_outlines"` only when theoretical guarantees are required despite performance cost
- The `"auto"` setting uses Outlines if installed, which may impact performance without improving reliability
This finding suggests that for these specific models and use cases, the overhead of constrained decoding outweighs its benefits when client-side validation already achieves 100% success.
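For readers unfamiliar with the prompted approach, the general idea (ask for JSON, validate on the client, retry on failure) can be sketched as follows. This is a generic illustration with a fake model callable. `generate_structured` and its `required` mapping are hypothetical and do not reflect how AbstractCore validates against Pydantic models.

```python
import json
from typing import Callable, Dict, List


def generate_structured(
    call_model: Callable[[str], str],
    prompt: str,
    required: Dict[str, type],
    max_retries: int = 2,
) -> dict:
    """Ask for JSON, validate client-side, and retry with feedback on failure."""
    hint = ", ".join(f"{key} ({kind.__name__})" for key, kind in required.items())
    message = f"{prompt}\nReply with a JSON object containing: {hint}"
    last_error = None
    for _ in range(max_retries + 1):
        raw = call_model(message)
        try:
            data = json.loads(raw)
            if not isinstance(data, dict):
                raise ValueError("expected a JSON object")
            for key, kind in required.items():
                if not isinstance(data.get(key), kind):
                    raise ValueError(f"field {key!r} missing or not {kind.__name__}")
            return data
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
            message = f"{message}\nPrevious reply was invalid ({exc}). Return only valid JSON."
    raise ValueError(f"no valid output after {max_retries + 1} attempts: {last_error}")


# Fake model that fails once, then returns valid JSON (for illustration only).
replies = iter(["not json", '{"name": "Ada", "age": 36}'])
calls: List[str] = []


def fake_model(message: str) -> str:
    calls.append(message)
    return next(replies)


result = generate_structured(fake_model, "Describe a person.", {"name": str, "age": int})
print(result, f"({len(calls)} calls)")
```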
- New `intent` CLI application for analyzing conversation intents and detecting deception patterns
- `/intent` command in interactive CLI to analyze participant motivations in real-time conversations
- Support for multi-participant conversation analysis with focus on specific participants
- Native Structured Output Support: LMStudio provider now supports server-side schema enforcement via OpenAI-compatible `response_format` parameter
  - Structured outputs are now guaranteed to match the provided schema without retry logic
  - Works seamlessly with Pydantic models through the existing `response_model` parameter
  - Provider registry updated to advertise structured output capability
- Renamed "Internal CLI" to "AbstractCore CLI" throughout documentation
- File renamed: `docs/internal-cli.md` → `docs/acore-cli.md`
- Model Capabilities: Updated 50+ Ollama-compatible models to report native structured output support (Llama, Qwen, Gemma, Mistral, Phi families)
  - This reflects the actual server-side schema enforcement capabilities these models have when used with Ollama
- Provider Registry: Added `"structured_output"` to supported features for both Ollama and LMStudio providers
- Updated all documentation cross-references to use new CLI naming
- Ollama Provider: Improved documentation of native structured output implementation (was already correct, now better documented)
- StructuredOutputHandler: Enhanced provider detection logic to correctly identify Ollama and LMStudio as having native support regardless of configuration
- Configuration System: Fixed missing configuration module that caused `'NoneType' object is not callable` error
  - Renamed `abstractcore/cli` to `abstractcore/config` to match expected import path
  - Added complete configuration manager implementation with vision, embeddings, and app defaults
  - Fixed `abstractcore --set-vision-provider` and all other configuration commands
- Tools Dependencies: Added missing `requests` dependency to core requirements and created `tools` optional extra for enhanced functionality
- Unified Token Naming: Standardized token terminology across AbstractCore to match input parameter naming
  - `GeneratedResponse` now provides `input_tokens`, `output_tokens`, `total_tokens` properties
  - Maintains backward compatibility with legacy `prompt_tokens` and `completion_tokens` keys
  - All providers now use consistent terminology in usage dictionaries
  - Token counts sourced from: Provider APIs (OpenAI, Anthropic, LMStudio) or AbstractCore's `token_utils.py` (MLX, HuggingFace)
- Provider-Specific Token Handling: Clear documentation of token count sources
  - From Provider APIs: OpenAI, Anthropic, LMStudio (native API token counts)
  - From AbstractCore: MLX, HuggingFace providers (calculated using `token_utils.py`)
  - Mixed Sources: Ollama (combination of provider and calculated tokens)
  - Consistent Interface: All providers normalized through unified `GeneratedResponse.usage` structure
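One way to offer the new property names while still accepting legacy keys is shown below. `UsageView` is a hypothetical class for illustration. It is not `GeneratedResponse` itself, and the summing fallback for `total_tokens` is an assumption.

```python
from typing import Dict, Optional


class UsageView:
    """Read token counts under the new names, falling back to legacy keys."""

    def __init__(self, usage: Optional[Dict[str, int]] = None):
        self._usage = usage or {}

    @property
    def input_tokens(self) -> Optional[int]:
        return self._usage.get("input_tokens", self._usage.get("prompt_tokens"))

    @property
    def output_tokens(self) -> Optional[int]:
        return self._usage.get("output_tokens", self._usage.get("completion_tokens"))

    @property
    def total_tokens(self) -> Optional[int]:
        if "total_tokens" in self._usage:
            return self._usage["total_tokens"]
        # Assumed fallback: derive the total when both parts are known.
        if self.input_tokens is not None and self.output_tokens is not None:
            return self.input_tokens + self.output_tokens
        return None


legacy = UsageView({"prompt_tokens": 12, "completion_tokens": 30})
print(legacy.input_tokens, legacy.output_tokens, legacy.total_tokens)
```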
- Universal Timing: Added `gen_time` property to `GeneratedResponse` across all providers (in milliseconds)
  - Precise Measurement: Tracks actual API call duration for network-based providers (OpenAI, Anthropic, LMStudio, Ollama)
  - Local Processing Time: Measures inference time for local providers (MLX, HuggingFace)
  - Simulated Timing: Local providers include realistic timing simulation
  - Precision: Rounded to 1 decimal place for clean, readable output
  - Performance Insights: Enables performance monitoring, optimization, and comparative analysis across providers
  - Summary Integration: Generation time automatically included in `response.get_summary()` output
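Measuring a call in milliseconds, rounded to one decimal place, can be done with `time.perf_counter()`. The `timed_call` helper is a hypothetical illustration of the measurement, not the provider code:

```python
import time
from typing import Any, Callable, Tuple


def timed_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Return the call result plus elapsed wall-clock time in ms (1 decimal place)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    gen_time_ms = round((time.perf_counter() - start) * 1000, 1)
    return result, gen_time_ms


def slow_echo(text: str) -> str:
    """Stand-in for a generation call (hypothetical, simulated with sleep)."""
    time.sleep(0.05)
    return text


result, gen_time = timed_call(slow_echo, "ok")
print(result, gen_time, "ms")
```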
- Optimized HTML Parsing: Added lxml parser support for 2-3x faster HTML processing (with html.parser fallback)
- Session-Based Connection Reuse: Improved network performance through connection pooling
- Enhanced Encoding Detection: Multiple encoding fallback strategies for better text decoding reliability
- Improved Content Extraction: Better main content detection, removes navigation/footer/sidebar elements
- Smart Download Chunking: Optimized chunk sizes based on content type (32KB for binary, 16KB for text)
- Better JSON Formatting: Smart truncation at logical boundaries for improved readability
- Unified Parameter Support: Added comprehensive `seed` and `temperature` parameter support across all 6 providers
  - Provider-Level: All providers now accept `seed` and `temperature` parameters in constructor and generate() calls
  - Session-Level: BasicSession now supports persistent `temperature` and `seed` parameters across conversation
  - Parameter Inheritance: Session parameters are used as defaults, can be overridden per generate() call
  - Consistent Interface: Same API works across OpenAI, Anthropic, HuggingFace, Ollama, LMStudio, and MLX providers
- OpenAI: Native `seed` parameter support for deterministic outputs (except reasoning models like o1)
- Anthropic: Graceful fallback with debug logging (Claude API doesn't support seed natively)
- HuggingFace: Full seed support for both transformers (`torch.manual_seed()`) and GGUF models (llama-cpp-python)
- Ollama: Native `seed` parameter support via options
- LMStudio: OpenAI-compatible `seed` parameter support
- MLX: Graceful fallback with debug logging (MLX-LM has limited seed support)
- Consistent Handling: Improved temperature parameter consistency across all providers
- Session Persistence: Temperature can be set at session level and persists across generate() calls
- Provider Defaults: Each provider maintains its own default temperature (0.7) when not specified
- Interface-Level Parameter Declaration: Moved `temperature` and `seed` to `AbstractCoreInterface` for consistent contract
- Eliminated Code Duplication: Removed redundant parameter initialization from all 6 providers (DRY principle)
- Centralized Parameter Logic: Added `_extract_generation_params()` helper method for consistent parameter extraction
- Cleaner Provider Code: Providers now focus only on their specific configuration, inheriting common parameters
- Robust Fallback Hierarchy: kwargs → instance variables → interface defaults with elegant one-liner implementation
- Parameter Persistence: Session-level temperature and seed are maintained across conversation
- Flexible Override: Per-call parameters override session defaults without changing session state
- Enhanced Documentation: Updated session docstrings with parameter descriptions
- Non-Breaking: All changes are backward compatible - existing code continues to work
- Provider-Agnostic: Same seed/temperature API works regardless of underlying provider capabilities
- Graceful Degradation: Providers that don't support seed log debug messages instead of failing
- Clean Architecture: Leveraged existing parameter inheritance system in BaseProvider
- Eliminated Duplication: Removed 12 lines of identical parameter initialization across 6 providers
- Interface Contract: Parameters now declared at interface level, ensuring consistent API contract
- Centralized Logic: Single `_extract_generation_params()` method replaces scattered parameter handling
- Simplified Providers: Each provider reduced by 2-4 lines, focusing only on provider-specific concerns
- Maintainability: Future parameter additions only require interface-level changes, not per-provider updates
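The fallback hierarchy (kwargs → instance variables → interface defaults) can be sketched as below. The class `GenerationParamsMixin` is hypothetical and only mirrors the idea behind `_extract_generation_params()`. The actual method body is not shown in this changelog, and the 0.7 default comes from the provider defaults noted above.

```python
from typing import Any, Dict, Optional


class GenerationParamsMixin:
    """Hypothetical stand-in for an interface-level parameter contract."""

    DEFAULT_TEMPERATURE = 0.7  # default temperature stated in this changelog
    DEFAULT_SEED = None

    def __init__(self, temperature: Optional[float] = None, seed: Optional[int] = None):
        # Instance values fall back to the interface-level defaults.
        self.temperature = self.DEFAULT_TEMPERATURE if temperature is None else temperature
        self.seed = self.DEFAULT_SEED if seed is None else seed

    def _extract_generation_params(self, **kwargs: Any) -> Dict[str, Any]:
        """Per-call kwargs win over instance values, which win over defaults."""

        def pick(name: str) -> Any:
            value = kwargs.get(name)
            return getattr(self, name) if value is None else value

        return {"temperature": pick("temperature"), "seed": pick("seed")}


provider = GenerationParamsMixin(temperature=0.3, seed=42)
print(provider._extract_generation_params(temperature=0.8))
print(GenerationParamsMixin()._extract_generation_params())
```

The per-call override does not mutate the instance, so the next call without kwargs sees the original instance values again.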
```python
from abstractcore import create_llm
from abstractcore.core.session import BasicSession

# Provider-level parameters
llm = create_llm("openai", model="gpt-4", temperature=0.3, seed=42)
response = llm.generate("Hello", temperature=0.8)  # Override temperature for this call

# Session-level parameters
session = BasicSession(provider=llm, temperature=0.5, seed=123)
response1 = session.generate("First message")  # Uses session temperature=0.5, seed=123
response2 = session.generate("Second message", temperature=0.9)  # Override temperature, keep seed
```

After independent analysis, the implementation was refactored for maximum elegance and maintainability:
- Code duplication across 6 providers (12 identical lines)
- Inconsistent parameter handling patterns
- Missing interface-level parameter contract
- Scattered parameter extraction logic
- Interface-Level Declaration: Parameters moved to `AbstractCoreInterface` for consistent contract
- DRY Principle: Eliminated all parameter duplication across providers
- Centralized Logic: Single `_extract_generation_params()` method for consistent behavior
- Cleaner Providers: Each provider reduced by 2-4 lines, focusing only on provider-specific concerns
- Future-Proof: New parameters require only interface-level changes, not per-provider updates
- Lines Reduced: 12 lines of duplication eliminated
- Maintainability: 83% reduction in parameter-related code across providers
- Consistency: 100% uniform parameter handling across all 6 providers
- Extensibility: New parameters can be added with 2 lines instead of 12
See Generation Parameters Architecture for detailed technical analysis.
- Basic Parameter Tests: `tests/test_seed_temperature_basic.py` - CI/CD compatible parameter handling tests
- Determinism Tests: `tests/test_seed_determinism.py` - Real-world determinism verification across providers
- Manual Verification: `tests/manual_seed_verification.py` - Interactive script for testing actual determinism
- Test Documentation: `tests/README_SEED_TESTING.md` - Complete testing guide and troubleshooting
- OpenAI: ✅ Native seed support (verified deterministic)
- Anthropic: ❌ No seed support (issues UserWarning when seed provided)
- HuggingFace: ✅ Full support for transformers and GGUF models
- Ollama: ✅ Native seed support via options
- LMStudio: ✅ OpenAI-compatible seed support
- MLX: ✅ Native seed support via mx.random.seed() (corrected implementation)
Empirically Verified: All providers except Anthropic achieve true determinism with seed + temperature=0:

```
# Verified deterministic behavior (100% success rate):
✅ OpenAI (gpt-3.5-turbo): Same seed → Identical outputs
✅ Ollama (gemma3:1b): Same seed → Identical outputs
✅ MLX (Qwen3-4B): Same seed → Identical outputs
⚠️ Anthropic (claude-3-haiku): temperature=0 → Consistent outputs (no seed support)
```

Test Commands:

```bash
# Test all available providers
python tests/manual_seed_verification.py
# Test specific provider determinism
python tests/manual_seed_verification.py --provider openai --prompt "Count to 5"
```

- Missing Media Subpackages: Fixed critical package installation bug where media subpackages were not included in distribution
  - Issue: `pyproject.toml` only listed `abstractcore.media` parent package but not its subpackages
  - Impact: Import `from abstractcore import create_llm` failed with `ModuleNotFoundError: No module named 'abstractcore.media.processors'`
  - Missing Packages:
    - `abstractcore.media.processors` (ImageProcessor, PDFProcessor, OfficeProcessor, TextProcessor)
    - `abstractcore.media.handlers` (OpenAIMediaHandler, AnthropicMediaHandler, LocalMediaHandler)
    - `abstractcore.media.utils` (image_scaler utilities)
  - Solution: Explicitly added all media subpackages to packages list in `pyproject.toml`
  - Root Cause: When explicitly listing packages in pyproject.toml, setuptools does NOT auto-discover subpackages
  - Workaround for 2.4.4: Use `from abstractcore.core.factory import create_llm` instead of `from abstractcore import create_llm`
  - Credit: Bug discovered and reported during production deployment testing
- Missing abstractcore.cli Module: Fixed missing `abstractcore.cli` package from distribution
  - Issue: CLI entry point `abstractcore` command referenced `abstractcore.cli.main:main` but module was not included in package
  - Impact: Configuration CLI commands would fail after installation from PyPI
  - Solution: Added `abstractcore.cli` to packages list in `pyproject.toml`
- New Entry Points: Added convenient aliases to clarify CLI purpose and improve user experience
  - abstractcore-config: Alias for the abstractcore command (configuration CLI for settings, API keys, models)
  - abstractcore-chat: New entry point for the interactive REPL (abstractcore.utils.cli → LLM interaction)
  - Purpose: Distinguish between the configuration CLI (manage settings) and the interactive chat CLI (talk to LLMs)
  - Backwards Compatible: All existing commands continue to work (abstractcore, python -m abstractcore.utils.cli)
- Updated the packages list in pyproject.toml to include all required modules:
  packages = [
      # ... existing packages ...
      "abstractcore.media",
      "abstractcore.media.processors",  # ✅ Added
      "abstractcore.media.handlers",    # ✅ Added
      "abstractcore.media.utils",       # ✅ Added
      "abstractcore.cli"                # ✅ Added
  ]
- Verification: All 19 packages now properly included in distribution
- Testing: Recommended to always test pip install from the built wheel before a PyPI release
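A pre-release check for this class of bug could compare the explicitly declared packages with what is actually on disk. The sketch below uses only the standard library; discover_packages, missing_packages, and the demo tree are hypothetical and are not part of AbstractCore's build tooling.

```python
import os
import tempfile


def discover_packages(root: str, top: str) -> set:
    """Return the dotted name of every directory under root/top that has an __init__.py."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(os.path.join(root, top)):
        if "__init__.py" in filenames:
            found.add(os.path.relpath(dirpath, root).replace(os.sep, "."))
    return found


def missing_packages(declared, root: str, top: str) -> list:
    """List packages present on disk but absent from the declared packages list."""
    return sorted(discover_packages(root, top) - set(declared))


def demo() -> list:
    """Build a throwaway package tree and report what an incomplete list would miss."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("pkg", "pkg/media", "pkg/media/processors"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(path)
            open(os.path.join(path, "__init__.py"), "w").close()
        return missing_packages(["pkg", "pkg.media"], root, "pkg")


print(demo())
```

Run against the real source tree and the packages list from pyproject.toml, a non-empty result would flag a subpackage that the wheel would silently omit.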
- Installation Works: Users can now successfully pip install abstractcore[all] or pip install abstractcore[media]
- Complete Media System: All media processing capabilities (images, PDFs, Office docs) now accessible after installation
- Clear CLI Commands: Users have obvious entry points for different CLI purposes
- Production Ready: Package installation thoroughly tested and verified
No migration needed - this is a pure bug fix release. If you experienced installation issues with 2.4.4:
- Upgrade: pip install --upgrade abstractcore
- Verify: python -c "from abstractcore import create_llm; print('✅ Works!')"
- Use new CLI aliases (optional):
  - abstractcore-config --status instead of abstractcore --status
  - abstractcore-chat instead of python -m abstractcore.utils.cli
- NEW .health() Method: Unified health check interface for all providers
  - Structured Response: Consistent health status format across all providers
  - Connectivity Testing: Uses list_available_models() as an implicit connectivity test
  - Smart Timeout Management: Configurable timeout (default: 5.0s) with automatic restoration
  - Never Throws: Errors are captured in the response structure; the method never raises exceptions
  - Rich Information: Returns status, provider name, model list, model count, error message, and latency
  - Universal Compatibility: Works with all provider types (API, local, server-based)
  - Override-able: Providers can customize health check logic if needed
{
    "status": bool,               # True if provider is healthy/online
    "provider": str,              # Provider class name (e.g., "OllamaProvider")
    "models": List[str] | None,   # Available models if online, None if offline
    "model_count": int,           # Number of models available (0 if offline)
    "error": str | None,          # Error message if offline, None if healthy
    "latency_ms": float           # Health check duration in milliseconds
}

- Centralized Token Counter: Fixed HuggingFace provider to use the centralized TokenUtils for consistency
  - Problem: HuggingFace was the only provider using provider-specific tokenizer.encode() for token counting
  - Solution: Added a _calculate_usage() method matching the MLX provider pattern, using TokenUtils.estimate_tokens()
  - Impact: All local providers now consistently use the centralized token counting infrastructure
  - Benefits:
    - ✅ Consistency across all providers (MLX, HuggingFace)
    - ✅ Robustness when tokenizer unavailable (GGUF models)
    - ✅ Content-type detection for better accuracy (code vs text vs JSON)
    - ✅ Model-family adjustments (qwen, llama, mistral tokenization patterns)
- Comprehensive Token Capture: All providers consistently capture three token metrics
  - prompt_tokens: Input/context tokens (system prompt + history + current prompt)
  - completion_tokens: Generated/output tokens (model's response)
  - total_tokens: Sum of prompt + completion (used for billing/quotas)
  - API Providers: OpenAI, Anthropic, Ollama, LMStudio use exact API-provided counts
  - Local Providers: MLX, HuggingFace use centralized TokenUtils estimation
- Centralized Infrastructure: Located at abstractcore/utils/token_utils.py
  - TokenUtils.estimate_tokens(text, model): Fast estimation with content-type detection
  - TokenUtils.count_tokens(text, model, method): Flexible counting (auto/precise/fast)
  - TokenUtils.count_tokens_precise(text, model): Accurate counting with tiktoken when available
  - Multi-tiered strategy: tiktoken (precise) → provider tokenizer → model-aware heuristics → fast fallback
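The three token metrics can be illustrated with a toy estimator. This is a rough sketch and not the TokenUtils implementation: estimate_tokens here is a hypothetical helper using an assumed ratio of about four characters per token, with none of the content-type or model-family adjustments described above.

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic (assumption): about four characters per token."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def calculate_usage(prompt: str, completion: str) -> dict:
    """Build the three-metric usage record for one generation."""
    prompt_tokens = estimate_tokens(prompt)
    completion_tokens = estimate_tokens(completion)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


print(calculate_usage("Summarize the following report in one line.", "Revenue grew."))
```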
- abstractcore/providers/base.py: Added health() method (lines 870-965)
- abstractcore/providers/huggingface_provider.py:
  - Added _calculate_usage() method using centralized TokenUtils (lines 890-902)
  - Updated _single_generate_transformers() to use centralized token counting (lines 867-868)
- Health Monitoring: Simple interface to check provider connectivity and availability
- Consistency: Unified token counting across all providers with same methodology
- Production Ready: Built-in timeout management prevents hanging health checks
- Developer Experience: Rich health information enables better error handling and monitoring
- Maintainability: Single centralized token counter to update/improve
New .health() method available on all providers:
from abstractcore.core.factory import create_llm

# Check single provider
provider = create_llm("ollama", model="llama2")
health = provider.health(timeout=3.0)
if health["status"]:
    print(f"✅ {health['provider']} is healthy!")
    print(f"   📦 {health['model_count']} models available")
    print(f"   ⏱️ {health['latency_ms']}ms response time")
else:
    print(f"❌ {health['provider']} is offline")
    print(f"   Error: {health['error']}")

No changes required - all existing code continues to work. HuggingFace provider now uses the same centralized token counting infrastructure as other local providers, improving consistency and accuracy.
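For readers curious how a never-throwing health check of this shape might be built, here is a minimal sketch. It is not the code in abstractcore/providers/base.py: health_check is a hypothetical free function, the model-listing callable is passed in, and timeout management is left out for brevity.

```python
import time
from typing import Any, Callable, Dict, List


def health_check(provider_name: str, list_models: Callable[[], List[str]]) -> Dict[str, Any]:
    """Probe connectivity by listing models; report errors in the result instead of raising."""
    start = time.perf_counter()
    try:
        models = list(list_models())
        status, error = True, None
    except Exception as exc:  # any failure is captured, never propagated
        models, status, error = None, False, str(exc)
    latency_ms = round((time.perf_counter() - start) * 1000, 2)
    return {
        "status": status,
        "provider": provider_name,
        "models": models,
        "model_count": len(models) if models else 0,
        "error": error,
        "latency_ms": latency_ms,
    }


def offline() -> List[str]:
    """Simulates an unreachable server."""
    raise ConnectionError("connection refused")


print(health_check("DemoProvider", lambda: ["model-a", "model-b"])["model_count"])
print(health_check("DemoProvider", offline)["error"])
```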
- NEW /v1/responses Endpoint: 100% compatible with OpenAI's Responses API format
  - input_file Support: Native support for {"type": "input_file", "file_url": "..."} in content arrays
  - Backward Compatible: Existing messages format continues to work alongside the new input format
  - Automatic Format Detection: Server automatically detects and converts between OpenAI and legacy formats
  - Streaming Support: Optional streaming with "stream": true for real-time responses (defaults to false)
  - Universal File Processing: Works with all file types (PDF, DOCX, XLSX, CSV, images) across all providers
- type="file" Support: New content type alongside "text" and "image_url" for explicit file attachments
  - Unified Format: {"type": "file", "file_url": {"url": "..."}} works consistently across all endpoints
  - Multiple Sources: Supports HTTP(S) URLs, local file paths, and base64 data URLs
  - Content-Type Detection: Intelligent file type detection from headers and URL extensions
  - Generic Downloader: Replaces image-only downloader with universal file download supporting 15+ file types
- Complete Text Extraction: Full PDF content extraction using PyMuPDF4LLM with formatting preservation
- 40,000+ Character Support: Successfully tested with large documents (Berkshire Hathaway annual letter)
- LLM-Optimized Output: Markdown formatting with preserved tables, headers, and structure
- Automatic Installation: Added PyMuPDF4LLM, PyMuPDF, and Pillow to dependencies
- Graceful Fallbacks: Multi-level fallback ensures content extraction even if advanced processing fails
- Global Configuration Management: Unified configuration at ~/.abstractcore/config/abstractcore.json
  - App-Specific Defaults: Set different models for CLI, summarizer, extractor, and judge apps
  - Global Fallbacks: Configure fallback models when app-specific settings aren't available
  - API Key Management: Centralized API key storage for all providers
  - Cache Configuration: Configurable cache directories for HuggingFace, local models, and general cache
  - Logging Control: Console and file logging levels with enable/disable commands
  - Streaming Defaults: Configure default streaming behavior for CLI applications
- Universal Media API: Same media=[] parameter works across all providers with automatic format conversion
  - Image Processing: Automatic resolution optimization for each model's maximum capability (GPT-4o: 4096px, Claude 3.5: 1568px, qwen2.5vl: 3584px)
  - Document Processing: Full support for PDF, DOCX, XLSX, PPTX with complete content extraction
  - Data Files: CSV, TSV, JSON, XML with intelligent parsing and analysis
  - Provider-Specific Formatting: Automatic conversion to OpenAI JSON, Anthropic Messages API, or local text embedding
  - Error Handling: Multi-level fallback strategy ensures users always get meaningful results
- Vision Fallback for Text-Only Models: Transparent two-stage pipeline enables image processing for any model
  - Automatic Detection: Identifies when text-only models receive images and activates fallback
  - One-Command Setup: abstractcore --download-vision-model downloads and configures the BLIP vision model
  - Flexible Configuration: Supports local models (BLIP, ViT-GPT2, GIT), Ollama, LMStudio, and cloud APIs
  - Transparent Operation: Users don't need to change code - system handles vision fallback automatically
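The two-stage idea can be sketched as follows: when the target model cannot see images, a separate captioning step turns each image into text that is prepended to the prompt. All names here (generate_with_vision_fallback and its callable parameters) are hypothetical, and the prompt layout is an assumption rather than AbstractCore's actual wording.

```python
from typing import Callable, List


def generate_with_vision_fallback(
    prompt: str,
    images: List[str],
    supports_vision: bool,
    describe_image: Callable[[str], str],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Pass images through natively when possible, otherwise describe them as text first."""
    if not images or supports_vision:
        return generate(prompt, images)
    # Stage 1: caption every image with the vision model.
    descriptions = [
        f"Image {i}: {describe_image(path)}" for i, path in enumerate(images, start=1)
    ]
    # Stage 2: send the text-only model the captions plus the original prompt.
    augmented = "\n".join(descriptions) + "\n\n" + prompt
    return generate(augmented, [])


def echo(prompt: str, images: List[str]) -> str:
    """Stand-in model that just reports what it was given."""
    return f"{len(images)} image(s) | {prompt}"


print(generate_with_vision_fallback("What is shown?", ["cat.png"], False,
                                    lambda path: "a cat on a sofa", echo))
```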
- Command-Line Arguments: Added --debug, --host, and --port flags for flexible server startup
  - Debug Mode: --debug enables comprehensive request/response logging with timing metrics
  - Custom Binding: --host and --port allow custom server addresses (default: 127.0.0.1:8000)
  - Environment Integration: Follows centralized config patterns with the ABSTRACTCORE_DEBUG variable
- Comprehensive Error Reporting: Enhanced 422 validation error handling with actionable diagnostics
  - Field-Level Details: Shows exact field path, validation message, and problematic input
  - Request Body Capture: In debug mode, logs full request body for troubleshooting
  - Structured Logging: JSON-formatted logs with client IP, timing, and error context
  - Before vs After: "422 Unprocessable Entity" now shows detailed field validation errors
- OpenAI Vision API Format: Full support for image_url objects with base64 data URLs and HTTP(S) URLs
- File Processing Pipeline: Automatic media extraction, validation, and cleanup with request-specific prefixes
- Size Limits: 10MB per file, 32MB total per request with comprehensive validation
- Cleanup Logic: Automatic temporary file cleanup for abstractcore_img_*, abstractcore_file_*, and abstractcore_b64_* prefixes
- Prompt Adaptation: Intelligent prompt adaptation based on file types to avoid confusion
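The size limits above amount to two checks, one per file and one per request. The sketch below shows one way to express them; validate_sizes is a hypothetical helper, and treating 1MB as 1024 * 1024 bytes is an assumption, since the changelog does not say which convention the server uses.

```python
from typing import Dict, List

MAX_FILE_BYTES = 10 * 1024 * 1024   # 10MB per file (binary megabytes assumed)
MAX_TOTAL_BYTES = 32 * 1024 * 1024  # 32MB per request


def validate_sizes(sizes: Dict[str, int]) -> List[str]:
    """Return a list of human-readable violations; an empty list means the request is fine."""
    errors = []
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            errors.append(f"{name}: {size} bytes exceeds the per-file limit")
    total = sum(sizes.values())
    if total > MAX_TOTAL_BYTES:
        errors.append(f"request: {total} bytes exceeds the total limit")
    return errors


print(validate_sizes({"report.pdf": 2_000_000, "scan.png": 11 * 1024 * 1024}))
```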
- Time Module Scoping: Removed redundant local import time statements causing "cannot access local variable" errors
  - Fixed in lines 1995-1996 and 2123-2124 of abstractcore/server/app.py
  - Now uses global time import consistently throughout server
- Boolean Syntax: Corrected JavaScript boolean syntax (false/true) to Python syntax (False/True)
  - Fixed in lines 625, 813, 824, 1170, 1181, 1214 across request examples and defaults
- Streaming Default: Changed /v1/responses endpoint default from stream=True to stream=False
  - Aligns with OpenAI API standard behavior (streaming opt-in, not opt-out)
  - Line 361 in OpenAIResponsesRequest model
- Payload Input Issue: Fixed /v1/responses endpoint not showing request body in Swagger "Try it out"
  - Replaced raw Request parameter with proper FastAPI Body(...) annotation
  - Added comprehensive examples for OpenAI format, legacy format, file analysis, and streaming
  - Lines 1148-1220 now properly expose request schema to OpenAPI documentation
- PDF Download Failures: Created generic file downloader replacing image-only version
  - Added proper Accept: */* headers instead of image-specific headers
  - Comprehensive content-type mapping for PDF, DOCX, XLSX, CSV, and 10+ other types
  - URL extension fallback when content-type header missing
  - Lines 1502-1627 in abstractcore/server/app.py
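The content-type mapping with a URL-extension fallback could look roughly like this. It is a sketch only: detect_extension and the small CONTENT_TYPE_TO_EXT table are illustrative and much shorter than whatever mapping the server actually ships.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Illustrative subset of a content-type table (hypothetical, not the server's own).
CONTENT_TYPE_TO_EXT = {
    "application/pdf": ".pdf",
    "text/csv": ".csv",
    "image/png": ".png",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": ".xlsx",
}


def detect_extension(url: str, content_type: Optional[str] = None, default: str = ".bin") -> str:
    """Prefer the Content-Type header; fall back to the URL path's extension."""
    if content_type:
        mime = content_type.split(";")[0].strip().lower()
        if mime in CONTENT_TYPE_TO_EXT:
            return CONTENT_TYPE_TO_EXT[mime]
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return ext or default


print(detect_extension("https://example.com/download", "application/pdf; charset=binary"))
print(detect_extension("https://example.com/data.CSV?version=2"))
```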
- Centralized Configuration Integration: All CLI apps (summarizer, extractor, judge) now use centralized config
  - Apps respect abstractcore --set-app-default configuration
  - Fallback to global defaults when app-specific config not set
  - Enhanced --debug mode for all applications
- Vision Configuration CLI: New abstractcore/cli/vision_config.py for vision fallback setup
  - Interactive configuration wizard
  - Model download commands
  - Status checking and validation
- Centralized Configuration: Created docs/centralized-config.md with complete configuration system documentation
  - All available commands with examples
  - Configuration file format and priority system
  - Troubleshooting guide and common tasks
- Media Handling System: Comprehensive docs/media-handling-system.md with production-tested examples
  - "How It Works Behind the Scenes" section explaining multi-layer architecture
  - Provider-specific formatting documentation (OpenAI JSON, Anthropic Messages API)
  - Real-world CLI usage examples with verified working commands
  - Model compatibility matrix and resolution limits
- Server Documentation: Updated docs/server.md with /v1/responses endpoint details
  - OpenAI Responses API format examples
  - File attachment workflows
  - Streaming configuration
  - Media processing capabilities
- Provider Registry Enhancement: Leverages centralized provider registry for model discovery
  - /providers endpoint returns complete provider metadata
  - No hardcoded provider lists - all dynamic discovery
  - Registry version 2.0 indicators in API responses
- Message Preprocessing: New MessagePreprocessor for @filename syntax in CLI
  - Extracts file attachments from text
  - Validates file existence
  - Cleans text for LLM processing
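The three preprocessing steps (extract, validate, clean) can be sketched with a regular expression. This is not the MessagePreprocessor implementation: the pattern, the preprocess function, and the injectable exists check are all assumptions made for illustration.

```python
import os
import re
from typing import Callable, List, Tuple

# Assumed pattern: "@" followed by a path that ends in a file extension.
FILE_REF = re.compile(r"@([\w./\\-]+\.\w+)")


def preprocess(text: str, exists: Callable[[str], bool] = os.path.exists) -> Tuple[str, List[str]]:
    """Return (cleaned_text, attachments); references to missing files are left in the text."""
    attachments = [path for path in FILE_REF.findall(text) if exists(path)]
    cleaned = text
    for path in attachments:
        cleaned = cleaned.replace("@" + path, "")
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, attachments


# The exists check is stubbed so the example does not depend on local files.
print(preprocess("Analyze @report.pdf and @chart.png", exists=lambda path: True))
```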
- Media Type Detection: Intelligent file type detection and processor selection
  - AutoMediaHandler coordinates specialized processors
  - ImageProcessor, PDFProcessor, OfficeProcessor, TextProcessor
  - Graceful fallback ensures processing never fails completely
- Media Examples: Added comprehensive test assets in tests/media_examples/
  - PDF reports, Office documents, spreadsheets, presentations
  - CSV/TSV data files with various encodings
  - Image examples with metadata
- Server Testing: Enhanced test suite for media processing and OpenAI compatibility
  - Real file processing tests (not mocked)
  - Cross-provider media handling verification
  - Streaming with media attachments
None. All changes maintain full backward compatibility with version 2.4.x.
The /v1/responses endpoint now accepts both OpenAI's input format and our legacy messages format:
OpenAI Responses API Format (Recommended):
{
    "model": "gpt-4o",
    "input": [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Analyze this document"},
                {"type": "input_file", "file_url": "https://example.com/doc.pdf"}
            ]
        }
    ],
    "stream": false
}

Legacy Format (Still Supported):
{
    "model": "openai/gpt-4",
    "messages": [
        {"role": "user", "content": "Tell me a story"}
    ],
    "stream": false
}

Note: Streaming is now opt-in (set "stream": true) instead of automatic, matching OpenAI's behavior.
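Format detection between the two request shapes can be pictured as a small normalization step. The sketch below is an assumption about how such a conversion might work, not the server's code; to_messages is a hypothetical function that maps input_text and input_file parts onto the text and file content types described earlier in this changelog.

```python
from typing import Any, Dict, List


def to_messages(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Normalize an 'input'-style or 'messages'-style payload to a list of messages."""
    if "input" in payload:
        messages = []
        for item in payload["input"]:
            parts = []
            for part in item.get("content", []):
                if part["type"] == "input_text":
                    parts.append({"type": "text", "text": part["text"]})
                elif part["type"] == "input_file":
                    parts.append({"type": "file", "file_url": {"url": part["file_url"]}})
                else:
                    parts.append(part)  # unknown part types pass through unchanged
            messages.append({"role": item["role"], "content": parts})
        return messages
    if "messages" in payload:
        return payload["messages"]
    raise ValueError("payload needs either 'input' or 'messages'")


request = {
    "model": "gpt-4o",
    "input": [{"role": "user", "content": [
        {"type": "input_text", "text": "Analyze this document"},
        {"type": "input_file", "file_url": "https://example.com/doc.pdf"},
    ]}],
}
print(to_messages(request))
```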
New centralized configuration system available:
# Set global default model
abstractcore --set-global-default ollama/llama3:8b
# Set app-specific defaults
abstractcore --set-app-default summarizer openai gpt-4o-mini
abstractcore --set-app-default extractor ollama qwen3:4b-instruct
# Configure logging
abstractcore --set-console-log-level WARNING
abstractcore --enable-file-logging
# Check current configuration
abstractcore --status

Configuration is stored in ~/.abstractcore/config/abstractcore.json and respects the following priority:
- Explicit parameters (highest priority)
- App-specific configuration
- Global configuration
- Hardcoded defaults (lowest priority)
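The priority order above reduces to a first-match lookup. The following sketch shows the idea; resolve_model and HARDCODED_DEFAULT are hypothetical, and the placeholder default is not the value AbstractCore actually ships with.

```python
from typing import Dict, Optional

HARDCODED_DEFAULT = "example-provider/example-model"  # placeholder, not the real built-in default


def resolve_model(
    app: str,
    explicit: Optional[str] = None,
    app_defaults: Optional[Dict[str, str]] = None,
    global_default: Optional[str] = None,
) -> str:
    """Pick the model by priority: explicit > app-specific > global > hardcoded."""
    if explicit:
        return explicit
    if app_defaults and app in app_defaults:
        return app_defaults[app]
    if global_default:
        return global_default
    return HARDCODED_DEFAULT


app_defaults = {"summarizer": "openai/gpt-4o-mini"}
print(resolve_model("summarizer", app_defaults=app_defaults, global_default="ollama/llama3:8b"))
print(resolve_model("judge", app_defaults=app_defaults, global_default="ollama/llama3:8b"))
```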
Media processing now supports explicit file types:
CLI (Using @filename syntax):
python -m abstractcore.utils.cli --prompt "Analyze @report.pdf and @chart.png"

Python API:
response = llm.generate(
    "Analyze these documents",
    media=["report.pdf", "chart.png", "data.xlsx"]
)

Server API (New type="file"):
{
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this file"},
                {"type": "file", "file_url": {"url": "https://example.com/doc.pdf"}}
            ]
        }
    ]
}

All formats work identically across all providers with automatic format conversion.
- pymupdf4llm (0.0.27): LLM-optimized PDF text extraction
- pymupdf (1.26.5): Core PDF processing library
- pydantic (2.12.3): Request validation and serialization
- fastapi: Enhanced with latest features
- pillow (12.0.0): Image processing support
- Users: Seamless file attachment across all providers with @filename CLI syntax and media=[] API
- Developers: OpenAI-compatible server endpoints with comprehensive media processing
- Production: Robust error handling, detailed logging, and graceful degradation
- Configuration: Single source of truth for all package-wide preferences and defaults
- Media System Critical Fixes: Resolved implementation issues preventing full media processing functionality
  - PDF Processing: Fixed output_format parameter conflict in PDFProcessor._create_media_content() call (line 128) causing "got multiple values for keyword argument" error
  - Office Document Processing: Fixed element iteration errors in OfficeProcessor by replacing the convert_to_dict() approach with direct element processing for DOCX, XLSX, and PPTX files
  - Unstructured Library Integration: Updated office processor to work correctly with current unstructured library API, eliminating "'NarrativeText' object is not iterable" and "'Table' object is not iterable" errors
- Production-Ready Media System: All file types now working with comprehensive content extraction
  - PDF Files: Full text extraction with formatting preservation using PyMuPDF4LLM
  - Word Documents: Complete document analysis with structure preservation (DOCX)
  - Excel Spreadsheets: Sheet-by-sheet content extraction with intelligent data analysis (XLSX)
  - PowerPoint Presentations: Slide content extraction with comprehensive presentation analysis (PPTX)
  - CSV/TSV Files: Intelligent data parsing with quality assessment and recommendations
  - Images: Seamless vision model integration with existing test infrastructure
- Server Debug Support: Comprehensive debug mode for troubleshooting API issues
  - Command Line Interface: Added --debug, --host, and --port arguments to server startup with comprehensive help
  - Enhanced Error Logging: Detailed 422 validation error reporting with field-level diagnostics and request body capture
  - Request/Response Tracking: Full HTTP request logging with client information, timing metrics, and structured JSON output
  - Centralized Configuration Integration: Follows centralized config system patterns with environment variable support
  - Before vs After: Uninformative "422 Unprocessable Entity" messages now provide actionable field validation details
- CLI Integration: Confirmed @filename syntax works across all file types
  - Tested with real files: PDF reports, Office documents, spreadsheets, presentations, data files, and images
  - Cross-provider compatibility verified with OpenAI, Anthropic, and LMStudio providers
  - All examples documented in docs/media-handling-system.md are production-tested and working
- Comprehensive Media System Documentation: Completely rewrote docs/media-handling-system.md to reflect actual implementation
  - Added detailed "How It Works Behind the Scenes" section explaining the multi-layer architecture
  - Documented provider-specific formatting (OpenAI JSON, Anthropic Messages API, local text embedding)
  - Added real-world CLI usage examples with verified working commands
  - Included cross-provider workflow diagrams and error handling strategies
- Architecture Documentation: Updated docs/architecture.md with comprehensive media system architecture section
  - Added media processing workflow diagrams and component descriptions
  - Documented graceful fallback strategy and provider-specific formatting
  - Included unified media API documentation and CLI integration details
- Robust Error Handling: Multi-level fallback strategy ensures users always get meaningful results
- Advanced processing with specialized libraries (PyMuPDF4LLM, Unstructured)
- Basic processing fallbacks for text extraction
- Metadata-only fallbacks when all else fails
- System never crashes or fails completely
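The multi-level fallback can be pictured as an ordered chain of extractors, each tried in turn until one succeeds. In the sketch below, extract_with_fallbacks and the stub extractors are hypothetical; real extractors would wrap libraries such as PyMuPDF4LLM or Unstructured.

```python
from typing import Any, Callable, Dict, List, Tuple


def extract_with_fallbacks(
    path: str, extractors: List[Tuple[str, Callable[[str], str]]]
) -> Dict[str, Any]:
    """Try each extractor in order; fall back to a metadata-only result if all fail."""
    errors = []
    for level, extractor in extractors:
        try:
            return {"level": level, "content": extractor(path), "errors": errors}
        except Exception as exc:  # record the failure and move to the next level
            errors.append(f"{level}: {exc}")
    return {"level": "metadata", "content": f"[file: {path}]", "errors": errors}


def advanced(path: str) -> str:
    """Stub for a specialized library that is unavailable in this demo."""
    raise ImportError("specialized library not installed")


def basic(path: str) -> str:
    """Stub for plain text extraction."""
    return f"plain text of {path}"


print(extract_with_fallbacks("report.pdf", [("advanced", advanced), ("basic", basic)]))
```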
- Test Infrastructure: Leveraged existing tests/vision_examples/ with production-quality test assets
  - 5 high-quality images with comprehensive JSON metadata for validation
  - Real-world testing with actual provider APIs and file processing
- Users: Can immediately attach any supported file type using @filename syntax
- Developers: Universal media=[] parameter works identically across all providers
- Production: Reliable media processing with comprehensive error handling and graceful degradation
- CLI: Simple file attachment workflow that works with all supported file formats
- Centralized Provider Registry System: Unified provider discovery and metadata management
  - Single Source of Truth: Created abstractcore/providers/registry.py with ProviderRegistry class for centralized provider management
  - Package-wide Discovery Function: get_all_providers_with_models() provides unified access to ALL providers with complete metadata
  - Complete Model Lists: Fixed truncation issue - now returns all models without "... and X more" truncation
  - Rich Metadata: Installation instructions, features, authentication requirements, supported capabilities automatically available
  - HTTP API Integration: Server /providers endpoint now uses centralized registry (registry_version: "2.0")
  - Dynamic Discovery: Automatically discovers providers without hardcoding, eliminating manual synchronization
- Factory System: Simplified create_llm() from 70+ line if/elif chain to single registry call while maintaining full backward compatibility
- Server Endpoints: Enhanced /providers endpoint with comprehensive metadata including model counts, features, and installation instructions
- Documentation: Added "Provider Discovery" section to both llms.txt and llms-full.txt with Python API and HTTP API examples
- Error Messages: Improved error messages with dynamic provider lists from registry
- Manual Provider Synchronization: Eliminated need to manually update provider lists across factory.py, server/app.py, and documentation
- Model List Truncation: Fixed "... and X more" truncation - now returns complete model lists for all providers
- Provider Metadata Inconsistency: Centralized all provider information including features, authentication requirements, and installation extras
- Comprehensive Test Suite: Added 50 tests in tests/provider_registry/ covering core functionality, server integration, and factory integration
- Lazy Loading: Provider classes loaded on-demand for better performance and memory usage
- Backward Compatibility: All existing code continues to work unchanged - no breaking changes
- Extensible Architecture: Easy to add new providers by registering them in the centralized registry
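A registry with lazy class loading and dynamic error messages might be structured along these lines. This is a sketch under assumptions: LazyRegistry is a hypothetical class, not the ProviderRegistry in abstractcore/providers/registry.py, and standard-library classes stand in for provider classes so the example runs anywhere.

```python
import importlib
from typing import Dict, List, Tuple


class LazyRegistry:
    """Maps names to (module path, class name) and imports the class on first use."""

    def __init__(self) -> None:
        self._specs: Dict[str, Tuple[str, str]] = {}
        self._cache: Dict[str, type] = {}

    def register(self, name: str, module_path: str, class_name: str) -> None:
        self._specs[name] = (module_path, class_name)

    def names(self) -> List[str]:
        return sorted(self._specs)

    def get_class(self, name: str) -> type:
        if name not in self._specs:
            # The list of valid names comes from the registry itself, never a hardcoded string.
            raise ValueError(f"Unknown provider '{name}'. Available: {', '.join(self.names())}")
        if name not in self._cache:
            module_path, class_name = self._specs[name]
            module = importlib.import_module(module_path)  # deferred until first request
            self._cache[name] = getattr(module, class_name)
        return self._cache[name]


registry = LazyRegistry()
registry.register("ordered", "collections", "OrderedDict")  # stand-ins for provider classes
registry.register("fraction", "fractions", "Fraction")
print(registry.names())
print(registry.get_class("fraction")(1, 2))
```

With this shape, a factory function reduces to a lookup followed by a constructor call, which is the kind of simplification the changelog describes for create_llm().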
- Developers: Single function to discover all providers programmatically
- Server Users: Enhanced /providers endpoint with rich metadata
- Maintainers: No more manual provider list synchronization across multiple files
- Documentation: Always up-to-date provider information in docs
- Critical Package Distribution Fix: Fixed ModuleNotFoundError: No module named 'abstractcore.exceptions' that occurred when installing from PyPI
  - Added missing abstractcore.exceptions and abstractcore.media packages to the setuptools configuration in pyproject.toml
  - This issue was introduced during the refactoring process when these modules were not included in the package distribution list
  - Users can now successfully import from abstractcore import create_llm after installing from PyPI
  - Verified fix by building and testing the wheel package with the corrected configuration
- Complete Rebranding: Comprehensive rename from "AbstractLLM" to "AbstractCore" throughout the entire project
  - Package Name: Internal package abstractllm/ → abstractcore/ to align with published package name
  - Product Name: "AbstractLLM Core" → "AbstractCore" in all documentation and branding
  - Import statements: All from abstractllm import ... must become from abstractcore import ...
  - Console scripts: Entry points changed from abstractllm.apps.* to abstractcore.apps.*
  - Interface names: AbstractLLMInterface → AbstractCoreInterface, AbstractLLMError → AbstractCoreError
  - Environment variables: ABSTRACTLLM_* → ABSTRACTCORE_* (e.g., ABSTRACTCORE_ONNX_VERBOSE)
  - Cache directories: ~/.abstractllm/ → ~/.abstractcore/
  - Log files: abstractllm_*.log → abstractcore_*.log
  - Module paths: All absolute imports updated throughout codebase
  - Impact: This affects all users - complete migration required from AbstractLLM to AbstractCore branding
To migrate from 2.3.x to 2.4.0, update all references to AbstractLLM:
1. Import Statements:
# Before (2.3.x)
from abstractllm import create_llm
from abstractllm.processing import BasicSummarizer
from abstractllm.embeddings import EmbeddingManager
# After (2.4.0+)
from abstractcore import create_llm
from abstractcore.processing import BasicSummarizer
from abstractcore.embeddings import EmbeddingManager

2. Interface Names:
# Before (2.3.x)
from abstractllm.core.interface import AbstractLLMInterface
# After (2.4.0+)
from abstractcore.core.interface import AbstractCoreInterface

3. Environment Variables:
# Before (2.3.x)
export ABSTRACTLLM_ONNX_VERBOSE=1
# After (2.4.0+)
export ABSTRACTCORE_ONNX_VERBOSE=1

4. Console Scripts:
Console scripts remain the same (both summarizer and abstractcore-summarizer work), but internal module paths have changed to abstractcore.apps.*.
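For larger codebases, the renames in this migration could be applied mechanically. The helper below is a hypothetical sketch, not a tool shipped with AbstractCore; it covers only the module name, the environment variable prefix, and the two interface names listed in this changelog, and its output should be reviewed before committing.

```python
import re

# (pattern, replacement) pairs derived from the renames listed in this changelog.
_RENAMES = [
    (re.compile(r"\babstractllm\b"), "abstractcore"),
    (re.compile(r"\bABSTRACTLLM_"), "ABSTRACTCORE_"),
    (re.compile(r"\bAbstractLLMInterface\b"), "AbstractCoreInterface"),
    (re.compile(r"\bAbstractLLMError\b"), "AbstractCoreError"),
]


def migrate_source(text: str) -> str:
    """Apply the 2.3.x to 2.4.0 renames to a block of source or shell text."""
    for pattern, replacement in _RENAMES:
        text = pattern.sub(replacement, text)
    return text


print(migrate_source("from abstractllm.core.interface import AbstractLLMInterface"))
print(migrate_source("export ABSTRACTLLM_ONNX_VERBOSE=1"))
```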
- Directory Structure: Renamed main package directory from abstractllm/ to abstractcore/
- Configuration Updates: Updated pyproject.toml with new package names, console scripts, and version paths
- Build System: Cleaned and regenerated all build artifacts with correct package structure
- Documentation: Updated all code examples, CLI usage, and module references across documentation
- Examples: Updated all example files with new import statements
- Tests: Updated all test imports and references throughout test suite
- Timeout Handling: Comprehensive timeout parameter handling across all providers
  - All providers now properly handle timeout=None (infinity) as the default
  - HuggingFace Provider: Issues warning when non-None timeout is provided (local models don't support timeouts)
  - MLX Provider: Issues warning when non-None timeout is provided (local models don't support timeouts)
  - Local Providers: Accept timeout parameters appropriately
  - API Providers (OpenAI, Anthropic, Ollama, LMStudio): Properly pass timeout to HTTP clients
  - Added _update_http_client_timeout() method for providers that need to update client timeouts
- Setting timeout default to None (infinity)
- Issue with the version
- Syntax Warning: Fixed invalid escape sequence \( in common_tools.py docstring example
- CLI Enhancement: Added optional focus parameter to /compact command for targeted conversation summarization
  - Usage: /compact [focus] where focus can be "technical details", "key decisions", etc.
  - Leverages existing BasicSummarizer focus functionality for more precise compaction
  - Maintains backward compatibility (no focus = default behavior)
- Vector Embeddings: SOTA open-source models with EmbeddingGemma as default, ONNX optimization, multi-provider support (HuggingFace, Ollama, LMStudio)
- Processing Applications: BasicSummarizer, BasicExtractor, BasicJudge with CLI tools and structured output
- GitHub Pages Website: Professional documentation site with responsive design and provider showcase
- Unified Streaming Architecture: Real-time tool call detection and execution across all providers
- Memory Management: Provider unload() methods for resource management in constrained environments
- Session Management: Complete serialization with analytics (summary, assessment, facts)
- CLI Enhancements: Interactive REPL with tool integration, session persistence, and comprehensive help system
- Critical Tool Compatibility: Tools + structured output now work together with sequential execution pattern
- Ollama Endpoint Selection: Fixed verbose responses by using correct /api/chat endpoint
- Streaming Tool Execution: Consistent formatting between streaming and non-streaming modes
- Architecture Detection: Corrected Qwen3-Next models and universal tool call parsing
- Session Serialization: Fixed parameter consistency and tool result integration
- Timeout Configuration: Unified timeout management across all components (default: 5 minutes)
- Package Dependencies: Made processing module core dependency, fixed installation extras
- Multi-Provider Embedding: Unified API across HuggingFace, Ollama, LMStudio with caching and optimization
- Tool Call Syntax Rewriting: Server-side format conversion for agentic CLI compatibility
- Documentation: Consolidated and professional tone, comprehensive tool calling guide
- Token Management: Helper methods and validation with provider-specific recommendations
- Test Coverage: 346+ tests with real models, comprehensive provider testing
- Event System: Real-time monitoring and observability with OpenTelemetry compatibility
- Circuit Breakers: Netflix Hystrix pattern with exponential backoff retry strategy
- FastAPI Server: OpenAI-compatible endpoints with comprehensive parameter support
- Model Discovery: Heuristic-based filtering and provider-specific routing
- Problem: AbstractCore's tools and response_model parameters were mutually exclusive, preventing users from combining function calling with structured output validation
- Root Cause: StructuredOutputHandler bypassed normal tool execution flow and tried to validate tool call JSON against Pydantic model
- Solution: Implemented sequential execution pattern - tools execute first, then structured output uses results as context
- Impact: Enables sophisticated LLM applications requiring both function calling and structured output validation
- Usage: llm.generate(tools=[func], response_model=Model, execute_tools=True) now works seamlessly
- Limitation: Streaming not supported in hybrid mode (clear error message provided)
- Added: generate() method to BaseProvider implementing AbstractCoreInterface
- Fixed: Proper delegation from generate() to generate_with_telemetry() with full parameter passthrough
- Impact: Ensures consistent API behavior across all provider implementations
- Added _handle_tools_with_structured_output() method with sequential execution strategy
- Modified generate_with_telemetry() to detect and route hybrid requests appropriately
- Enhanced prompt engineering to inject tool execution results into structured output context
- Maintained full backward compatibility for single-mode usage (tools-only or structured-only)
- abstractcore/providers/base.py: Added hybrid handling logic and generate() method implementation
- Sequential execution: Tool execution → Context enhancement → Structured output generation
- Clean error handling with descriptive messages for unsupported combinations
✅ Tools-only mode: Works correctly
✅ Structured output-only mode: Works correctly
✅ NEW: Hybrid mode (tools + structured output): Now works correctly
✅ Backward compatibility: All existing functionality preserved
✅ Error handling: Clear messages for unsupported streaming + hybrid combination
- Professional Website: Created comprehensive GitHub Pages website at https://lpalbou.github.io/AbstractCore/
- Modern UI/UX: Responsive design with dark/light theme toggle, smooth animations, and mobile-first approach
- Interactive Features: Code block copy functionality, smooth scrolling navigation, and dynamic theme switching
- Provider Showcase: Visual display of all supported LLM providers (OpenAI, Anthropic, Ollama, MLX, LMStudio, HuggingFace)
- SEO Optimization: Complete sitemap.xml, robots.txt, and meta tags for search engine visibility
- LLM Integration: Added `llms.txt` and `llms-full.txt` files for enhanced LLM compatibility and content discovery
- New Documentation: Created `docs/tool-calling.md` with complete coverage of the tool calling system
- Rich Decorator Examples: Documented the full capabilities of the `@tool` decorator including metadata injection
- Architecture-Aware Formatting: Explained how tool definitions adapt to different model architectures (Qwen, LLaMA, Gemma)
- Tool Syntax Rewriting: Integrated comprehensive documentation of Tag Rewriter and Syntax Rewriter systems
- Real-World Examples: Showcased actual tools from `common_tools.py` with full metadata and system prompt integration
- Professional Tone: Removed pretentious language, excessive emojis, and marketing hype from all documentation
- Consolidated Content: Merged `tool-syntax-rewriting.md` into comprehensive `tool-calling.md` documentation
- Fixed Cross-References: Updated all internal links in README.md, docs/README.md, and getting-started.md
- Consistent Styling: Standardized documentation format and removed redundant content
- HTML Documentation: Created HTML versions of all documentation for the GitHub Pages website
- Static Site Generation: Pure HTML/CSS/JavaScript implementation for maximum performance and compatibility
- Asset Organization: Structured asset directory with optimized SVG logos and provider icons
- GitHub Pages Optimization: Added `.nojekyll` file and proper CNAME configuration for custom domains
- Documentation Integration: Seamless integration between website and documentation with consistent navigation
- `index.html`: Main landing page with hero section, features showcase, and provider display
- `assets/css/main.css`: Comprehensive styling with CSS variables for theming and responsive design
- `assets/js/main.js`: Interactive functionality including theme switching and mobile navigation
- `llms.txt`: Concise LLM-friendly project overview with key documentation links
- `llms-full.txt`: Complete documentation content aggregated for LLM consumption
- `docs/tool-calling.html`: HTML version of comprehensive tool calling documentation
- `robots.txt` and `sitemap.xml`: SEO optimization files for search engine discovery
- Enhanced `docs/tool-calling.md` with complete `@tool` decorator capabilities and real-world examples
- Updated README.md, docs/README.md, and docs/getting-started.md with professional tone and correct links
- Removed redundant `docs/tool-syntax-rewriting.md` after content integration
- Fixed all cross-references and internal navigation links
- Created clean `gh-pages` branch with optimized website content
- Implemented proper GitHub Pages configuration with SEO optimization
- Added comprehensive LLM compatibility files for enhanced discoverability
- Structured deployment ready for custom domain configuration
- Enhanced Developer Experience: Professional website provides clear project overview and easy navigation
- Improved Documentation Quality: Consolidated, professional documentation without redundancy or pretentious language
- Better LLM Integration: Structured `llms.txt` files enable better LLM understanding and interaction with the project
- Increased Discoverability: SEO-optimized website improves project visibility and accessibility
- Comprehensive Tool Documentation: Complete coverage of tool calling system with practical examples and architecture details
- Problem: ONNX Runtime displayed verbose CoreML execution provider warnings on macOS during embedding model initialization
- Root Cause: ONNX Runtime logs informational messages about CoreML partitioning and node assignment directly to stderr, bypassing Python's warning system
- Solution: Added ONNX Runtime log level configuration in `_suppress_onnx_warnings()` to suppress harmless informational messages
- Impact: Cleaner console output during embedding operations while preserving debugging capability via the `ABSTRACTLLM_ONNX_VERBOSE=1` environment variable
- Technical: Set `onnxruntime.set_default_logger_severity(3)` to suppress warnings that don't affect performance or quality
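A minimal sketch of the approach, assuming the environment variable is checked before the severity is raised. The helper names `onnx_severity` and `configure_onnx_logging` are hypothetical; only `onnxruntime.set_default_logger_severity(3)` and `ABSTRACTLLM_ONNX_VERBOSE=1` come from the entry above.

```python
import os
from typing import Mapping, Optional

# ONNX Runtime severity levels: 0 verbose, 1 info, 2 warning, 3 error, 4 fatal.
ERROR_SEVERITY = 3


def onnx_severity(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the severity to apply, or None to keep ONNX Runtime's default."""
    if env.get("ABSTRACTLLM_ONNX_VERBOSE") == "1":
        return None  # debugging requested: leave logging untouched
    return ERROR_SEVERITY


def configure_onnx_logging(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Apply the severity if onnxruntime is installed; return what was applied."""
    severity = onnx_severity(env)
    if severity is None:
        return None
    try:
        import onnxruntime
    except ImportError:
        return None  # nothing to configure without the runtime
    onnxruntime.set_default_logger_severity(severity)
    return severity
```

Because the CoreML messages are written by the native runtime rather than through Python's `warnings` module, a `warnings.filterwarnings` call alone would not silence them.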
- Problem: Ollama provider was generating excessively verbose responses (1000+ characters for simple questions like "What is 2+2?")
- Root Cause: Provider incorrectly used the `/api/generate` endpoint for all requests, including tool-enabled conversations
- Solution: Updated endpoint selection logic to use `/api/chat` by default, following Ollama's API design recommendations
- Impact: Reduced response length from 977+ characters to 15 characters for simple queries and eliminated the "infinite text" generation issue
- Technical: Modified the `_generate_internal()` method to use `use_chat_format = tools is not None or messages is not None or True` for endpoint routing (the trailing `or True` makes the chat format unconditional)
- Problem: Inconsistent parameter naming between `session.add_message()` using `name` and `session.generate()` using `username`
- Root Cause: Parameter standardization was incomplete during the metadata redesign
- Solution: Standardized both methods to use the `name` parameter, aligning with the `session_schema.json` specification
- Impact: Consistent API across session methods, improved developer experience
- Problem: Tool execution results were missing from chat history during live CLI sessions but appeared after session reload
- Root Cause: Tool results were not being added to session message history during execution
- Solution: Modified `_execute_tool_calls()` in the CLI to explicitly add `role="tool"` messages with execution metadata
- Impact: Tool results are now immediately available to the assistant during conversation, with consistent behavior between live and serialized sessions
- Problem: `list_files` and `search_files` tools failed with type errors when the `head_limit` parameter was passed as a string
- Root Cause: LLM-generated tool calls sometimes provided numeric parameters as strings
- Solution: Added defensive type conversion with fallback to default values on `ValueError`
- Impact: Improved tool reliability and error handling
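The defensive conversion can be sketched like this. `coerce_int` and `list_files_sketch` are hypothetical helpers written for illustration, and catching `TypeError` in addition to `ValueError` is an assumption that goes slightly beyond what the entry above states.

```python
from typing import Any, List


def coerce_int(value: Any, default: int) -> int:
    """Convert an LLM-supplied value to int, falling back to a default."""
    try:
        return int(value)
    except (ValueError, TypeError):
        # "abc" raises ValueError; None or a list raises TypeError.
        return default


def list_files_sketch(names: List[str], head_limit: Any = 50) -> List[str]:
    """Hypothetical tool body: truncate results to head_limit entries."""
    limit = coerce_int(head_limit, default=50)
    return names[:limit]


files = ["a.py", "b.py", "c.py"]
print(list_files_sketch(files, head_limit="2"))     # string from a tool call
print(list_files_sketch(files, head_limit="many"))  # unusable value, default applies
```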
- Session Serialization: Complete session state preservation including provider, model, parameters, system prompt, tool registry, and conversation history
- Optional Analytics: Added `generate_summary()`, `generate_assessment()`, and `extract_facts()` methods for session-level insights
- Versioned Schema: Implemented `session-archive/v1` format with JSON schema validation in `abstractcore/assets/session_schema.json`
- CLI Integration: Added `/save <file> [--summary] [--assessment] [--facts]` and `/load <file>` commands with optional analytics generation
- Backward Compatibility: Graceful handling of legacy session formats during load operations
- Improved Help System: Comprehensive, aesthetically pleasing help text with detailed command documentation and usage examples
- Tool Integration: Added `search_files` tool to CLI with full documentation and status reporting
- Better Banner: Informative startup banner with quick commands and available tools overview
- Parameter Documentation: Clear documentation of `/save` command options and usage patterns
- Extensible Metadata: Moved the `name` field into the `metadata` dictionary for better extensibility
- Location Support: Added a `location` property backed by `metadata['location']` for geographical context
- Property-Based Access: Clean API with `message.name` and `message.location` properties while maintaining metadata flexibility
- Backward Compatibility: Automatic migration of the legacy `name` field to `metadata['name']` during deserialization
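The metadata redesign can be illustrated with a small sketch. The `Message` class below is a hypothetical stand-in, not the class in `abstractcore/core/types.py`; it only shows properties backed by a `metadata` dictionary and the migration of a legacy top-level `name` field.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Message:
    role: str
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def name(self) -> Optional[str]:
        return self.metadata.get("name")

    @name.setter
    def name(self, value: Optional[str]) -> None:
        self.metadata["name"] = value

    @property
    def location(self) -> Optional[str]:
        return self.metadata.get("location")

    @location.setter
    def location(self, value: Optional[str]) -> None:
        self.metadata["location"] = value

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Message":
        """Deserialize, moving a legacy top-level name into metadata."""
        metadata = dict(data.get("metadata") or {})
        legacy_name = data.get("name")
        if legacy_name is not None and "name" not in metadata:
            metadata["name"] = legacy_name
        return cls(role=data["role"], content=data["content"], metadata=metadata)


legacy = {"role": "user", "content": "hello", "name": "alice"}
message = Message.from_dict(legacy)
message.location = "Paris"
print(message.name, message.location, message.metadata)
```

Keeping the values in `metadata` means new fields can be added later without changing the serialized shape of a message.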
- `abstractcore/providers/ollama_provider.py`: Fixed endpoint selection logic to use `/api/chat` by default
- `abstractcore/core/session.py`: Enhanced serialization, standardized parameter naming, added analytics methods
- `abstractcore/core/types.py`: Redesigned metadata system with property-based access
- `abstractcore/utils/cli.py`: Improved help system, added tool integration, enhanced save/load commands
- `abstractcore/tools/common_tools.py`: Added defensive programming for parameter type handling
- `abstractcore/assets/session_schema.json`: Created comprehensive JSON schema for session validation
- `docs/session.md`: New documentation explaining session management and serialization benefits
✅ Ollama responses now concise (15 chars vs 977+ chars previously)
✅ Session serialization preserves complete state including analytics
✅ Tool execution results properly integrated into live chat history
✅ Parameter consistency across all session methods
✅ Defensive tool parameter handling prevents type errors
✅ Backward compatibility maintained for existing session files
- Simplified server implementation in `abstractcore/server/app.py` (reduced from ~4000 to ~1500 lines)
- Removed complex model discovery in favor of direct provider queries
- Added comprehensive endpoint documentation with OpenAI-style descriptions
- Enhanced request/response models with detailed parameter descriptions and examples
- `EmbeddingManager` now supports three providers: HuggingFace, Ollama, and LMStudio
- Unified embedding API across all providers with automatic format conversion
- Provider-specific caching for isolation and performance
- Backward compatible with existing HuggingFace-only code (default provider)
- Added `syntax_rewriter.py` for server-side tool call format conversion
- Supports multiple formats: OpenAI, Codex, Qwen3, LLaMA3, Gemma, XML
- Automatic format detection based on headers, user-agent, and model name
- Enables seamless integration with agentic CLIs (Codex, Crush, Gemini CLI)
- Added `/v1/models?type=text-embedding` endpoint for filtering embedding models
- Heuristic-based model type detection (embedding vs text-generation)
- Embedding patterns: "embed", "all-minilm", "bert-", "-bert", "bge-", "gte-", etc.
- Provider-specific model filtering via query parameters
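A sketch of the name-based heuristic, assuming a simple case-insensitive substring match. The pattern tuple contains only the patterns listed above (the real list is longer, per the "etc."), and `detect_model_type` and `filter_models` are hypothetical names.

```python
from typing import Iterable, List, Optional

# Only the patterns named in this changelog entry; the full list is not shown there.
EMBEDDING_PATTERNS = ("embed", "all-minilm", "bert-", "-bert", "bge-", "gte-")


def detect_model_type(model_name: str) -> str:
    """Classify a model as text-embedding or text-generation by its name."""
    lowered = model_name.lower()
    if any(pattern in lowered for pattern in EMBEDDING_PATTERNS):
        return "text-embedding"
    return "text-generation"


def filter_models(names: Iterable[str], model_type: Optional[str] = None) -> List[str]:
    """Mimic a type filter: no type means no filtering."""
    if model_type is None:
        return list(names)
    return [name for name in names if detect_model_type(name) == model_type]


models = [
    "granite-embedding:278m",
    "sentence-transformers/all-MiniLM-L6-v2",
    "qwen3-coder:30b",
]
print(filter_models(models, "text-embedding"))
```

A name heuristic like this can misclassify models whose names do not follow the usual conventions, which is why it is best treated as a default rather than a guarantee.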
- Enhanced `/v1/embeddings` endpoint with multi-provider support
- Added `type` parameter to `/v1/models` for model type filtering (text-generation/text-embedding)
- Improved `/v1/chat/completions` with comprehensive parameter documentation
- Added `/{provider}/v1/chat/completions` for provider-specific requests
- Enhanced `/v1/responses` endpoint for agentic CLI compatibility
- Updated `/providers` endpoint with detailed provider information
- Added detailed field descriptions and examples to all Pydantic models
- `EmbeddingRequest`: Comprehensive parameter explanations using OpenAI reference style
- `ChatCompletionRequest`: Enhanced with field-level documentation and examples
- `ChatMessage`: Detailed role and content descriptions with use cases
- Default examples updated to use working models
- Automatic tool call format conversion for different agentic CLIs
- Support for custom tool call tags via the `agent_format` parameter
- Configurable tool execution (server-side vs client-side)
- Environment variable configuration for default formats
- Provider parameter added to `EmbeddingManager.__init__()` (default: "huggingface")
- `embed()` and `embed_batch()` methods now delegate to provider-specific implementations
- Ollama provider: Added `embed()` method using the `/api/embeddings` endpoint
- LMStudio provider: Added `embed()` method using the `/v1/embeddings` endpoint
- Cache naming includes provider for proper isolation
- Enhanced provider base classes with improved error handling
- Better streaming support across all providers
- Consistent timeout handling and retry logic
- Improved tool call detection and parsing
- Added `UnsupportedProviderError` for better error messages
- Enhanced exception types for embedding-specific errors
- Improved error context and debugging information
- Merged `common-mistakes.md` into `troubleshooting.md` with cross-references
- Merged `server-api-reference.md` into simplified `server.md` (1006 → 479 lines)
- Created comprehensive `docs/README.md` as navigation hub
- Removed redundant documentation files (8 files consolidated)
- Created `tool-syntax-rewriting.md` covering both tag and syntax rewriters
- Enhanced `embeddings.md` with multi-provider support and examples
- Updated `architecture.md` with server architecture and present-tense language
- Improved `getting-started.md` with comprehensive tool documentation
- Moved `basic-*.md` files to `docs/apps/` subdirectory
- Created `docs/archive/` for superseded documentation
- Added `docs/archive/README.md` explaining archived content
- Updated all cross-references across documentation
- Removed historical/refactoring language ("replaced", "improved", "before/after")
- Converted all documentation to present tense
- Focused on current capabilities and actionable content
- Simplified language for clarity and accessibility
- Added clearer distinction between core library and optional server
- Enhanced documentation section with better organization
- Added "Architecture & Advanced" section
- Improved Quick Links with comprehensive navigation
- Removed unused `simple_model_discovery.py` module
- Cleaned up temporary debug files and scripts
- Removed `integration.py` tool module (functionality moved to providers)
- Better separation of concerns between core and server
- Added comprehensive tests for embedding providers
- Enhanced server endpoint testing
- Improved tool call syntax rewriting tests
- Better test coverage for multi-provider scenarios
None. All changes are backward compatible with version 2.2.x.
If you were using embeddings, no changes needed. The default behavior remains HuggingFace.
To use other providers:
```python
from abstractcore.embeddings import EmbeddingManager

# HuggingFace (default, unchanged)
embedder = EmbeddingManager(model="sentence-transformers/all-MiniLM-L6-v2")

# Ollama (new)
embedder = EmbeddingManager(model="granite-embedding:278m", provider="ollama")

# LMStudio (new)
embedder = EmbeddingManager(model="text-embedding-all-minilm-l6-v2-embedding", provider="lmstudio")
```

Server API endpoints remain compatible. New features:
- Use `?type=text-embedding` to filter embedding models
- Use the `agent_format` parameter for custom tool call formats
- Environment variables for default configuration
- Use `docs/server.md` instead of `server-api-reference.md`
- Use `docs/troubleshooting.md` for all troubleshooting (includes common mistakes)
- Use `docs/README.md` as navigation hub
- Reference `prerequisites.md` instead of deleted `providers.md`
- ONNX Optimization and Warning Management: Improved embedding performance and user experience
  - Smart ONNX Model Selection: EmbeddingManager now automatically selects optimized `model_O3.onnx` for better performance
  - Warning Suppression: Eliminated harmless warnings from PyTorch 2.8+ and sentence-transformers during model loading
  - Graceful Fallbacks: Multiple fallback layers ensure reliability (optimized ONNX → basic ONNX → PyTorch)
  - Performance Improvement: ONNX optimization provides significant speedup for batch embedding operations
  - Clean Implementation: Conservative approach with minimal code changes (40 lines) for maintainability
- Added `_suppress_onnx_warnings()` context manager to handle known harmless warnings
- Added `_get_optimal_onnx_model()` function for intelligent ONNX variant selection
- Enhanced `_load_model()` with multi-layer fallback strategy and clear logging
- Zero breaking changes: all improvements are additive with sensible defaults
- Installation Package [all] Extra: Fixed `pip install abstractcore[all]` to truly install ALL modules
  - Issue: The `[all]` extra was missing development dependencies (dev, test, docs)
  - Solution: Updated `[all]` extra to include complete dependency set (12 total extras)
  - Coverage: Now includes all providers, features, and development tools
    - All Providers (6): openai, anthropic, ollama, lmstudio, huggingface, mlx
    - All Features (3): embeddings, processing, server
    - All Development (3): dev, test, docs
  - Impact: Users can now confidently use `abstractcore[all]` for complete installation without missing dependencies
- Comprehensive Installation: `pip install abstractcore[all]` now installs 12 dependency groups
- Development Ready: Includes all testing frameworks (pytest-cov, responses), code tools (black, mypy, ruff), and documentation tools (mkdocs)
- Verified Configuration: All referenced extras exist and are properly defined with no circular dependencies
- LLM-as-a-Judge: Production-ready objective evaluation with structured assessments
- BasicJudge class for critical assessment with constructive skepticism
- Multiple file support with sequential processing to avoid context overflow
- Global assessment synthesis for multi-file evaluations (appears first, followed by individual file results)
- Enhanced assessment structure with judge summary, source reference, and optional criteria details
- 9 evaluation criteria: clarity, simplicity, actionability, soundness, innovation, effectiveness, relevance, completeness, coherence
- CLI with simple command: `judge file1.py file2.py --context="code review"` (console script entry point)
- Flexible output formats: JSON, plain text, YAML with structured scoring (1-5 scale)
- Optional global assessment control: `--exclude-global` flag for original list behavior
- Built-in Applications: BasicJudge added to production-ready application suite
- Structured output integration with Pydantic validation and FeedbackRetry for validation error recovery
- Chain-of-thought reasoning for transparent evaluation with low temperature (0.1) for consistency
- Custom criteria support and reference-based evaluation for specialized assessment needs
- Comprehensive error handling with graceful fallbacks and detailed diagnostics
- Complete BasicJudge documentation: Enhanced `docs/basic-judge.md` with API reference, examples, and best practices
  - Real-world examples: Code review, documentation assessment, academic writing evaluation, multiple file scenarios
  - CLI parameter documentation with practical usage patterns and advanced options
  - Global assessment examples showing synthesis of multiple file evaluations
- Updated README.md: Added BasicJudge to built-in applications with 30-second examples
- Internal CLI integration: Added `/judge` command for conversation quality evaluation with detailed feedback
- Context overflow prevention: Optimized global assessment prompts to work within model context limits
- Production-grade architecture: Proper Pydantic integration, sequential file processing, backward compatibility
- Console script integration: Simple `judge` command available after package installation (matches `extractor`, `summarizer`)
- Full backward compatibility: All existing functionality preserved, optional features clearly marked
- Timeout Configuration: Unified timeout management across all components
  - Updated default HTTP timeout from 180s to 300s (5 minutes) for better reliability with large models
  - All providers now consistently inherit timeout from base configuration
  - Server endpoints updated to use unified 5-minute default
  - Improved handling of large language models (36B+ parameters) that require longer processing time
- Extractor CLI Improvements: Enhanced command-line interface for knowledge graph extraction
  - Added `--timeout` parameter with proper validation (30s minimum, 2 hours maximum)
  - Users can now configure timeout for large documents and models: `--timeout 3600` for 60 minutes
  - Improved error messages for timeout validation
  - Better support for processing large documents with resource-intensive models
- BasicExtractor JSON-LD Consistency: Resolved structural inconsistencies in knowledge graph output
  - Fixed JSON-LD reference normalization where some providers generated string references instead of proper object format
  - Corrected refinement prompt to match initial extraction format exactly (`@type: "s:Relationship"` vs `@type: "r:provides"`)
  - Added missing `s:name` and `strength` fields in relationship refinement
  - All providers now generate consistent, properly structured JSON-LD output
- Cross-Provider Compatibility: Improved extraction reliability across different LLM providers
  - LMStudio models now generate proper JSON-LD object references through automatic normalization
  - Reduced warning noise by converting normalization messages to debug level
  - Enhanced iterative refinement to follow exact same structure rules as initial extraction
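The reference normalization mentioned above can be sketched as follows, assuming that "proper object format" means a JSON-LD node reference of the form `{"@id": ...}`. The field names `s:about` and `s:object`, the entity identifiers, and the helper `normalize_references` are hypothetical; `@type`, `s:name`, and `strength` are the fields named in this changelog.

```python
from typing import Any, Dict, Tuple

# Hypothetical names for the fields that point at other entities.
REFERENCE_FIELDS: Tuple[str, ...] = ("s:about", "s:object")


def normalize_references(relationship: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy where string references become {"@id": ...} objects."""
    normalized = dict(relationship)
    for key in REFERENCE_FIELDS:
        value = normalized.get(key)
        if isinstance(value, str):
            normalized[key] = {"@id": value}
    return normalized


raw = {
    "@type": "s:Relationship",
    "s:name": "provides",
    "s:about": "e:openai",            # string reference from one provider
    "s:object": {"@id": "e:gpt-4"},   # already in object form
    "strength": 0.9,
}
print(normalize_references(raw))
```

Normalizing after generation, rather than rejecting the output, lets providers that emit the shorter string form still produce a consistent graph.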
- Centralized Timeout Management: All timeout configuration now emanates from `base.py`
  - Providers inherit timeout via `self._timeout` from BaseProvider class
  - Factory system properly propagates timeout parameters through `**kwargs`
  - No hardcoded timeout values remain in provider implementations
  - Consistent 300-second default across HTTP clients, tool execution, and embeddings
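A minimal sketch of how a validated `--timeout` value might flow from the CLI through a factory into a provider. It assumes an `argparse`-based CLI; `timeout_seconds`, `LocalServerProvider`, `PROVIDERS`, and `create_provider` are hypothetical names, and only `self._timeout`, the `**kwargs` passthrough, the 300-second default, and the 30s to 2h bounds come from the entries above.

```python
import argparse

DEFAULT_TIMEOUT_S = 300.0      # unified default stated in this release
MIN_TIMEOUT_S = 30.0           # CLI lower bound
MAX_TIMEOUT_S = 2 * 60 * 60.0  # CLI upper bound (2 hours)


def timeout_seconds(raw: str) -> float:
    """argparse type that enforces the documented 30s to 2h range."""
    try:
        value = float(raw)
    except ValueError:
        raise argparse.ArgumentTypeError(f"timeout must be a number, got {raw!r}")
    if not MIN_TIMEOUT_S <= value <= MAX_TIMEOUT_S:
        raise argparse.ArgumentTypeError(
            f"timeout must be between {MIN_TIMEOUT_S:.0f} and {MAX_TIMEOUT_S:.0f} seconds"
        )
    return value


class BaseProvider:
    """Single place where the timeout is stored."""

    def __init__(self, model: str, timeout: float = DEFAULT_TIMEOUT_S, **kwargs):
        self.model = model
        self._timeout = timeout


class LocalServerProvider(BaseProvider):
    """Hypothetical provider: reads self._timeout instead of hardcoding a value."""

    def request_options(self) -> dict:
        return {"model": self.model, "timeout": self._timeout}


PROVIDERS = {"local-server": LocalServerProvider}


def create_provider(name: str, model: str, **kwargs) -> BaseProvider:
    """Factory that forwards timeout (and anything else) through **kwargs."""
    return PROVIDERS[name](model, **kwargs)


parser = argparse.ArgumentParser(prog="extractor-sketch")
parser.add_argument("--timeout", type=timeout_seconds, default=DEFAULT_TIMEOUT_S)
args = parser.parse_args(["--timeout", "3600"])

provider = create_provider("local-server", "qwen3-coder:30b", timeout=args.timeout)
print(provider.request_options())
```

Storing the value once on the base class means a subclass never needs its own default, which is what keeps hardcoded timeouts out of provider code.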
- Updated Model References: Modernized documentation to use current recommended models
  - Updated `docs/getting-started.md` to use `qwen3:4b-instruct-2507-q4_K_M` (default) and `qwen3-coder:30b` (premium)
  - Replaced outdated `qwen2.5-coder:7b` references throughout getting started guide
  - Added proper cross-references to reorganized documentation (`server.md`, `acore-cli.md`)
  - Enhanced "What's Next?" section with links to universal API server and CLI documentation
- Cross-Reference Validation: Verified all documentation links and anchors
  - Confirmed `docs/prerequisites.md` section anchors match README.md references
  - Validated provider setup links point to correct sections (#openai-setup, #anthropic-setup, etc.)
  - Ensured consistent documentation structure across all guides
Previous version history is available in the git commit log.