Adverse Media Screening System

Python 3.13 · License: MIT · Code Style: Black

An AI-powered CLI tool for screening individuals against adverse media, built to minimize false positives while driving false negatives toward zero.

🎯 Problem Statement

Financial institutions must screen applicants against negative news ("adverse media") to comply with regulations and assess risk. Existing tools generate too many false positives, requiring expensive manual review. Analysts need an intelligent system that can:

  1. Accurately identify if an article is about the specific person (given name + date of birth)
  2. Determine sentiment: whether the article portrays them negatively, positively, or neutrally
  3. Minimize false negatives (missed adverse media is unacceptable for compliance)
  4. Reduce false positives (unnecessary manual reviews are costly)
  5. Handle multiple languages including non-Latin scripts
  6. Provide explainable results with evidence and confidence scores

Cost of Errors

  • False Negatives: 🚨 UNACCEPTABLE - Regulatory violations, reputation damage, financial losses
  • False Positives: 💰 COSTLY - Manual review overhead, delayed decisions, operational inefficiency

πŸ—οΈ Solution Overview

Our system employs a sophisticated ensemble approach combining rule-based algorithms, advanced NLP, and selective AI enhancement:

Core Innovation

  • Hybrid Architecture: Rule-based precision + AI-powered disambiguation
  • Smart API Usage: ≤3 GPT-5 calls per article through intelligent preprocessing
  • Multilingual Support: Handle 100+ languages with context-aware translation
  • Evidence-Based Decisions: Full explainability with quoted evidence
  • Zero False Negative Design: Conservative thresholds prioritize recall over precision

Key Capabilities

✅ Advanced Person Matching: Multilingual NER + fuzzy matching + phonetics + nicknames
✅ Context-Aware Analysis: DOB/age verification + occupation/location cues
✅ Intelligent Polarity Detection: Lexicon-based + GPT-5 disambiguation
✅ Confidence Calibration: Multi-factor confidence scoring with uncertainty quantification
✅ Comprehensive Evaluation: Enhanced testing framework with systematic improvement

πŸ› οΈ High-Level Architecture

graph TB
    subgraph "Input Layer"
        CLI[CLI Interface<br/>adverse-media-screen]
        API_Input[Person + Article URL]
    end
    
    subgraph "Orchestration Layer"
        MainService[AdverseMediaScreeningService<br/>Workflow Orchestration]
    end
    
    subgraph "Data Processing Layer"
        ArticleFetcher[Article Fetcher<br/>HTTP + Content Extraction]
        Processor[Article Processor<br/>Language Detection + Cleaning]
        Translator[AI Translator<br/>Non-English → English]
    end
    
    subgraph "Core AI Engine"
        EnsembleEngine[Ensemble Decision Engine<br/>Rule-based + AI Disambiguation]
        
        subgraph "Feature Extraction"
            FeatureExtractor[Advanced Feature Extractor<br/>NER + Fuzzy + Phonetic]
            NameMatcher[Name Matching<br/>Multilingual + Nicknames]
            AgeMatcher[Age/DOB Extraction<br/>Context-Aware]
            ContextMatcher[Occupation/Location<br/>Cue Detection]
        end
        
        subgraph "Decision Making"
            RuleEngine[Rule-Based Ensemble<br/>Weighted Feature Scoring]
            AIDisambiguator[AI Disambiguator<br/>GPT-5 for Edge Cases]
        end
        
        subgraph "Polarity Analysis"
            PolarityAnalyzer[Advanced Polarity Analyzer<br/>Lexicon + AI]
            LexiconCheck[Adverse Terms Lexicon<br/>Domain-Specific Keywords]
            SentimentAI[AI Sentiment Analysis<br/>GPT-5 for Complex Cases]
        end
    end
    
    subgraph "Support Services"
        AIClient[AI Client<br/>GPT-5 Integration]
        ConfigManager[Configuration Manager<br/>Environment-Based Settings]
        ErrorHandler[Error Handler<br/>Comprehensive Exception Hierarchy]
    end
    
    subgraph "Output Layer"
        DecisionModel[Decision Model<br/>Match + Polarity + Evidence]
        JSONOutput[JSON Output<br/>Structured Results]
        Evidence[Evidence Extraction<br/>Quoted Text + Confidence]
    end
    
    subgraph "Evaluation System"
        EnhancedEval[Enhanced Evaluation<br/>Systematic Testing]
        AutoPipeline[Automated Pipeline<br/>Continuous Monitoring]
        ErrorAnalysis[Error Analysis<br/>Pattern Detection]
    end
    
    %% Flow connections
    CLI --> API_Input
    API_Input --> MainService
    MainService --> ArticleFetcher
    ArticleFetcher --> Processor
    Processor --> Translator
    Translator --> EnsembleEngine
    
    EnsembleEngine --> FeatureExtractor
    FeatureExtractor --> NameMatcher
    FeatureExtractor --> AgeMatcher
    FeatureExtractor --> ContextMatcher
    
    EnsembleEngine --> RuleEngine
    RuleEngine --> AIDisambiguator
    EnsembleEngine --> PolarityAnalyzer
    PolarityAnalyzer --> LexiconCheck
    PolarityAnalyzer --> SentimentAI
    
    AIDisambiguator --> AIClient
    SentimentAI --> AIClient
    Translator --> AIClient
    
    EnsembleEngine --> DecisionModel
    DecisionModel --> JSONOutput
    DecisionModel --> Evidence
    
    %% Support service connections
    MainService -.-> ConfigManager
    MainService -.-> ErrorHandler
    EnsembleEngine -.-> ConfigManager
    
    %% Evaluation connections
    MainService -.-> EnhancedEval
    EnhancedEval --> AutoPipeline
    EnhancedEval --> ErrorAnalysis
    
    %% Styling
    classDef input fill:#e1f5fe
    classDef processing fill:#f3e5f5
    classDef ai fill:#fff3e0
    classDef output fill:#e8f5e8
    classDef evaluation fill:#fce4ec
    
    class CLI,API_Input input
    class ArticleFetcher,Processor,Translator processing
    class EnsembleEngine,FeatureExtractor,PolarityAnalyzer,AIClient ai
    class DecisionModel,JSONOutput,Evidence output
    class EnhancedEval,AutoPipeline,ErrorAnalysis evaluation

🧠 Core Algorithms

1. Advanced Feature Extraction Algorithm

The AdvancedFeatureExtractor employs a multi-stage ensemble approach:

class AdvancedFeatureExtractor:
    """
    Ensemble feature extraction combining:
    - Multilingual NER (spaCy)
    - Fuzzy string matching (RapidFuzz)
    - Phonetic matching (Soundex, Metaphone)
    - Nickname detection
    - Context analysis (±100 characters)
    """
    
    def extract_all_features(self, text: str, person: Person) -> AdvancedExtractedFeatures:
        # 1. NER-based entity extraction
        name_entities = self._extract_person_entities(text)
        
        # 2. Multi-algorithm name matching
        name_matches = self._find_name_matches(text, person.name, name_entities)
        
        # 3. Age/DOB reference detection
        age_references = self._extract_age_references(text, person.dob)
        
        # 4. Occupation/location cue detection
        occupation_refs = self._extract_occupation_references(text, person)
        location_refs = self._extract_location_references(text, person)
        
        # 5. Ensemble scoring with confidence calibration
        return self._calculate_ensemble_features(...)

Matching Strategies:

  • Exact Match: Direct string comparison (confidence: 1.0)
  • Fuzzy Match: Levenshtein distance (confidence: 0.6-0.95)
  • Phonetic Match: Soundex/Metaphone (confidence: 0.4-0.8)
  • Nickname Match: Built-in nickname database (confidence: 0.7-0.9)
  • NER Match: spaCy entity recognition (confidence: 0.5-0.85)
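The strategies above can be illustrated with a minimal, stdlib-only sketch. The real pipeline uses RapidFuzz and spaCy; `fuzzy_ratio` and the tiny Soundex implementation here are simplified stand-ins, and the weights are illustrative, not the project's calibrated values:

```python
from difflib import SequenceMatcher

def fuzzy_ratio(a: str, b: str) -> float:
    """Rough stand-in for a Levenshtein-based similarity score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def soundex(name: str) -> str:
    """Simplified Soundex: first letter + digits from consonant classes."""
    codes = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
    name = name.lower()
    encoded = name[0].upper()
    last = ""
    for ch in name[1:]:
        digit = next((d for k, d in codes.items() if ch in k), "")
        if digit and digit != last:
            encoded += digit
        last = digit
    return (encoded + "000")[:4]

def match_confidence(candidate: str, target: str) -> float:
    """Combine exact, fuzzy, and phonetic signals into one confidence score."""
    if candidate.lower() == target.lower():
        return 1.0                                  # exact match
    fuzzy = fuzzy_ratio(candidate, target)          # fuzzy evidence
    phonetic = 0.6 if soundex(candidate) == soundex(target) else 0.0
    return max(fuzzy * 0.95, phonetic)              # cap fuzzy below exact

print(match_confidence("John Smith", "John Smith"))  # 1.0
print(match_confidence("Jon Smith", "John Smith"))   # high fuzzy score
```

In practice each strategy contributes its own evidence item with its own confidence band, as listed above, rather than collapsing to a single max.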

2. Ensemble Decision Algorithm

The EnsembleDecisionEngine implements sophisticated decision logic:

class EnsembleDecisionEngine:
    """
    Multi-stage decision process:
    1. Rule-based ensemble scoring
    2. AI disambiguation for edge cases
    3. Polarity analysis with lexicon + AI
    4. Confidence calibration and evidence aggregation
    """
    
    def make_decision(self, person: Person, article: Article) -> Decision:
        # Stage 1: Extract features
        features = self.feature_extractor.extract_all_features(article.text, person)
        
        # Stage 2: Rule-based matching with thresholds
        match_score = self._calculate_match_score(features)
        
        if match_score >= 0.7:          # Strong match
            match_result = MatchResult.YES
        elif match_score <= 0.2:        # Weak match
            match_result = MatchResult.NO
        else:                           # Ambiguous - use AI
            match_result = self._ai_disambiguate(person, article, features)
        
        # Stage 3: Polarity analysis
        polarity_result = self.polarity_analyzer.analyze_polarity(
            article.text, features, match_result
        )
        
        # Stage 4: Evidence aggregation and confidence scoring
        return self._build_final_decision(...)

Decision Thresholds:

  • Strong Match (≥0.7): High confidence "YES"
  • Weak Match (≤0.2): High confidence "NO"
  • Ambiguous (0.2-0.7): AI disambiguation required
  • Edge Cases: Conservative bias toward "UNSURE" vs false negatives
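The threshold logic above amounts to a three-way router. A minimal sketch, where `ai_disambiguate` is a placeholder for the GPT call made only in the ambiguous band:

```python
from typing import Callable

STRONG_MATCH = 0.7
WEAK_MATCH = 0.2

def route_match(score: float, ai_disambiguate: Callable[[], str]) -> str:
    """Map an ensemble match score to yes/no, deferring to AI when ambiguous."""
    if score >= STRONG_MATCH:
        return "yes"
    if score <= WEAK_MATCH:
        return "no"
    # Ambiguous band: the conservative design prefers spending an API call
    # (or returning "unsure") over risking a false negative.
    return ai_disambiguate()

print(route_match(0.85, lambda: "unsure"))  # yes
print(route_match(0.10, lambda: "unsure"))  # no
print(route_match(0.45, lambda: "unsure"))  # unsure (AI fallback)
```

Keeping the AI call confined to the 0.2-0.7 band is what bounds API usage per article.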

3. Advanced Polarity Analysis Algorithm

The AdvancedPolarityAnalyzer combines lexicon-based and AI-powered analysis:

class AdvancedPolarityAnalyzer:
    """
    Hybrid polarity detection:
    1. Domain-specific adverse terms lexicon
    2. Context-aware sentiment analysis
    3. GPT-5 disambiguation for complex cases
    """
    
    def analyze_polarity(self, text: str, features: AdvancedExtractedFeatures, match_result: MatchResult) -> PolarityAnalysisResult:
        # Stage 1: Lexicon-based adverse term detection
        adverse_terms = self._find_adverse_terms(text)
        positive_terms = self._find_positive_terms(text)
        
        # Stage 2: Context analysis around person mentions
        context_sentiment = self._analyze_context_sentiment(text, features.name_matches)
        
        # Stage 3: Rule-based polarity decision
        lexicon_polarity = self._calculate_lexicon_polarity(adverse_terms, positive_terms)
        
        # Stage 4: AI disambiguation if unclear
        if self._is_polarity_ambiguous(lexicon_polarity, context_sentiment):
            ai_polarity = self._ai_polarity_analysis(text, features)
            return self._reconcile_polarity_analyses(lexicon_polarity, ai_polarity)
        
        return lexicon_polarity

Adverse Terms Lexicon (Examples):

  • Legal: lawsuit, indictment, convicted, sentenced, fraud, embezzlement
  • Financial: bankruptcy, default, sanctions, money laundering, tax evasion
  • Regulatory: violation, penalty, fine, suspended, revoked, investigation
  • Reputational: scandal, controversy, misconduct, corruption, bribery
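The lexicon stage can be sketched in a few lines. This uses a tiny illustrative subset of the terms above; `find_adverse_terms` mirrors the method name in the class sketch but is not the project's actual implementation:

```python
import re

# Small illustrative subset of the domain-specific adverse-terms lexicon.
ADVERSE_TERMS = {
    "lawsuit", "indictment", "convicted", "fraud", "embezzlement",
    "bankruptcy", "sanctions", "money laundering", "bribery",
}

def find_adverse_terms(text: str) -> list[str]:
    """Return lexicon terms present in the text, using word-boundary
    matching so that e.g. 'defrauding' does not count as 'fraud'."""
    lowered = text.lower()
    hits = []
    for term in sorted(ADVERSE_TERMS):
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append(term)
    return hits

article = "The executive was convicted of fraud after a money laundering probe."
print(find_adverse_terms(article))  # ['convicted', 'fraud', 'money laundering']
```

Each hit then feeds the context-sentiment stage, which weighs terms by their proximity to person mentions.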

πŸ›οΈ Core Classes and Models

Domain Models (models/)

# Core business entities
class Person(BaseModel):
    """Person entity with validation and aliases support"""
    name: str
    dob: date
    aliases: List[str] = []
    occupation: Optional[str] = None
    location: Optional[str] = None

class Article(BaseModel):
    """Article entity with metadata and processing state"""
    url: str
    title: Optional[str] = None
    text: Optional[str] = None
    language: Language = Language.UNKNOWN
    publication_date: Optional[datetime] = None
    word_count: int = 0

class Decision(BaseModel):
    """Final screening decision with evidence and confidence"""
    match: MatchResult          # YES, NO, UNSURE
    polarity: Polarity         # POSITIVE, NEGATIVE, NEUTRAL, UNCLEAR
    confidence: float          # 0.0-1.0
    evidence: List[Evidence]   # Supporting evidence with quotes
    reasoning: str             # Human-readable explanation
    api_calls_used: int        # Cost tracking
    processing_time_ms: int    # Performance tracking

class Evidence(BaseModel):
    """Supporting evidence with source tracking"""
    text: str                  # Quoted evidence text
    confidence: float          # Evidence reliability
    evidence_type: str         # name_match, age_reference, adverse_term
    source_span: Tuple[int, int]  # Character positions in original text

Service Classes (services/)

class AdverseMediaScreeningService:
    """Main orchestration service - coordinates the entire workflow"""
    
class EnsembleDecisionEngine:
    """Core decision engine - implements ensemble algorithms"""
    
class AdvancedFeatureExtractor:
    """Feature extraction - NER + fuzzy matching + phonetics"""
    
class AdvancedPolarityAnalyzer:
    """Polarity analysis - lexicon + AI sentiment analysis"""
    
class ArticleFetcher:
    """Article retrieval - HTTP fetching + content extraction"""
    
class AIClient:
    """GPT-5 integration - translation + disambiguation + sentiment"""

Configuration System (config/)

class ProcessingConfig(BaseModel):
    """Core processing parameters"""
    max_api_calls_per_article: int = 3
    default_confidence_threshold: float = 0.7
    enable_fuzzy_matching: bool = True
    fuzzy_matching_threshold: float = 0.8

class OpenAIConfig(BaseModel):
    """AI service configuration"""
    api_key: SecretStr
    model: str = "gpt-4"
    max_tokens: int = 1000
    temperature: float = 0.1

🚀 Getting Started

Prerequisites

Before setting up the Adverse Media Screening System, ensure you have:

  • Python 3.13+ installed (Download Python)
  • Git for version control
  • OpenAI API Key (required for AI-powered analysis)
  • Internet connection for article fetching and API calls

Local Installation

1. Clone and Navigate to Repository

# Clone the repository
git clone https://github.com/your-org/adverse-media-screening.git
cd adverse-media-screening

2. Set Up Python Virtual Environment

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Verify activation (should show venv path)
which python3

3. Install Dependencies

# Install the package in development mode
pip install -e .

# Install additional dependencies for evaluation
pip install schedule  # For automated evaluation pipeline

# Verify installation
adverse-media-screen --help

4. Environment Configuration

Create your environment configuration file:

# Copy the example environment file
cp .env.example .env

# Edit the .env file with your settings
nano .env  # or vim .env, or your preferred editor

Required environment variables in .env:

# OpenAI Configuration (REQUIRED)
OPENAI_API_KEY=your-openai-api-key-here

# Processing Configuration (Optional - defaults shown)
MAX_API_CALLS_PER_ARTICLE=3
DEFAULT_CONFIDENCE_THRESHOLD=0.7
ENABLE_FUZZY_MATCHING=true
FUZZY_MATCHING_THRESHOLD=0.8

# Logging Configuration (Optional)
LOG_LEVEL=INFO
LOG_FORMAT=json

# Security Configuration (Optional)
RATE_LIMIT_REQUESTS_PER_MINUTE=60
REQUEST_TIMEOUT_SECONDS=30
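These settings are plain environment variables read with typed defaults. A minimal stdlib sketch of that pattern (illustrative only; the project's actual config layer uses pydantic models as shown earlier):

```python
import os

def env_float(name: str, default: float) -> float:
    """Read a float setting from the environment, falling back to a default."""
    raw = os.getenv(name)
    return float(raw) if raw is not None else default

def env_bool(name: str, default: bool) -> bool:
    """Read a boolean setting; accepts 1/true/yes (case-insensitive)."""
    raw = os.getenv(name)
    return raw.strip().lower() in {"1", "true", "yes"} if raw is not None else default

os.environ["FUZZY_MATCHING_THRESHOLD"] = "0.8"      # simulate a .env entry
print(env_float("FUZZY_MATCHING_THRESHOLD", 0.8))   # 0.8
print(env_bool("NONEXISTENT_FLAG_EXAMPLE", True))   # True (unset -> default)
```

The same defaults shown in the .env example above apply when a variable is unset.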

5. Verify Installation

# Test basic functionality
adverse-media-screen version

# Test with a sample screening (requires valid OpenAI API key)
adverse-media-screen screen \
  --name "Test Person" \
  --dob "1990-01-01" \
  --url "https://example.com" \
  --verbose

CLI Tool Usage Guide

The adverse-media-screen command provides a powerful interface for screening individuals against adverse media.

Basic Command Structure

adverse-media-screen COMMAND [OPTIONS]

Available Commands

1. screen - Main Screening Command

Screen a person against an article for adverse media content.

Required Parameters:

  • --name - Person's full name (quoted if contains spaces)
  • --dob - Date of birth in YYYY-MM-DD format
  • --url - Article URL to analyze

Optional Parameters:

  • --output, -o - Save results to JSON file
  • --verbose, -v - Enable detailed processing output

2. version - Version Information

Display the current version of the tool.

adverse-media-screen version

Usage Examples

Basic Screening
# Simple screening with minimal output
adverse-media-screen screen \
  --name "John Smith" \
  --dob "1985-03-15" \
  --url "https://news.example.com/article/12345"
Verbose Analysis
# Detailed analysis with processing information
adverse-media-screen screen \
  --name "Jane Doe" \
  --dob "1978-11-22" \
  --url "https://reuters.com/business/finance/article.html" \
  --verbose
Save Results to File
# Screen and save results to JSON file
adverse-media-screen screen \
  --name "Robert Johnson" \
  --dob "1965-07-08" \
  --url "https://bbc.com/news/business-12345678" \
  --output screening-results.json
Complex Names and Special Characters
# Handle names with special characters or multiple parts
adverse-media-screen screen \
  --name "María José García-López" \
  --dob "1992-12-03" \
  --url "https://elpais.com/economia/articulo" \
  --verbose
Batch Processing (using shell scripting)
# Process multiple people against the same article
while IFS=, read -r name dob; do
  echo "Processing: $name (DOB: $dob)"
  adverse-media-screen screen \
    --name "$name" \
    --dob "$dob" \
    --url "https://example.com/article" \
    --output "results_$(echo "$name" | tr ' ' '_').json"
done < people_list.csv

Understanding the Output

The tool outputs structured JSON with the following key sections:

{
  "decision": {
    "match": "yes|no|unsure",           // Person identification result
    "polarity": "negative|positive|neutral|unclear", // Sentiment analysis
    "confidence": 0.87,                 // Overall confidence score (0-1)
    "evidence": [...],                  // Supporting evidence array
    "reasoning": "Human-readable explanation",
    "api_calls_used": 1,               // Cost tracking
    "processing_time_ms": 1247         // Performance tracking
  },
  "person": {
    "name": "Input name",
    "dob": "Input date of birth"
  },
  "article": {
    "url": "Article URL",
    "title": "Extracted title",
    "language": "Detected language",
    "word_count": 542,
    "publication_date": "2024-01-15T10:30:00Z"
  }
}
Evidence Types

The evidence array contains objects with:

  • text: Quoted text from the article
  • confidence: Reliability of this evidence (0-1)
  • evidence_type: Type of evidence found
    • name_match: Direct name mentions
    • age_reference: Age or DOB references
    • adverse_term: Negative sentiment indicators
    • occupation_match: Professional context
    • location_match: Geographic context
  • source_span: Character positions in original text
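For post-processing, the evidence array can be filtered by type straight from the JSON output. A minimal sketch; the sample payload below is illustrative, following the schema described above:

```python
import json

raw = """
{
  "decision": {
    "match": "yes",
    "polarity": "negative",
    "confidence": 0.87,
    "evidence": [
      {"text": "John Smith was convicted", "confidence": 0.9,
       "evidence_type": "name_match", "source_span": [12, 36]},
      {"text": "convicted of fraud", "confidence": 0.8,
       "evidence_type": "adverse_term", "source_span": [25, 43]}
    ]
  }
}
"""

result = json.loads(raw)
evidence = result["decision"]["evidence"]

# Keep only adverse-term evidence above a confidence floor.
adverse = [e for e in evidence
           if e["evidence_type"] == "adverse_term" and e["confidence"] >= 0.5]
print([e["text"] for e in adverse])  # ['convicted of fraud']
```

The same pattern works on files saved with --output, via `json.load(open(path))`.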

Exit Codes

  • 0 - Success
  • 1 - Error (configuration, API, processing, or validation)

Development Setup

If you're contributing to the project or need to modify the code:

1. Install Development Dependencies

# Install development tools
pip install -e .[dev]

# Or manually install dev dependencies
pip install pytest pytest-cov pytest-mock pytest-asyncio black isort mypy pre-commit

2. Set Up Pre-commit Hooks

# Install pre-commit hooks for code quality
pre-commit install

# Run hooks manually
pre-commit run --all-files

3. Run Tests

# Run all tests
pytest

# Run with coverage report
pytest --cov=src/adverse_media_agent --cov-report=html

# Run specific test files
pytest tests/test_cli.py -v

4. Code Formatting and Linting

# Format code with Black
black src/ tests/

# Sort imports with isort
isort src/ tests/

# Type checking with mypy
mypy src/adverse_media_agent

Using Docker (Alternative Setup)

For containerized deployment:

# Build the Docker image
docker build -t adverse-media-screening .

# Run with environment variables
docker run -e OPENAI_API_KEY=your-key \
  adverse-media-screening \
  adverse-media-screen screen \
  --name "John Doe" \
  --dob "1980-01-01" \
  --url "https://example.com/article"

Troubleshooting

Common Issues and Solutions

1. ModuleNotFoundError: No module named 'schedule'
# Install missing dependency
pip install schedule
2. OpenAI API Key Not Found
# Verify .env file exists and contains valid API key
grep OPENAI_API_KEY .env

# Test API key validity
python3 -c "
import os
import openai
from dotenv import load_dotenv
load_dotenv()
client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
client.models.list()  # raises an authentication error if the key is invalid
print('API key is valid')
"
3. Permission Denied or Command Not Found
# Ensure virtual environment is activated
source venv/bin/activate

# Reinstall in development mode
pip install -e .

# Check if command is available
which adverse-media-screen
4. SSL Certificate Errors
# Update certificates (macOS)
/Applications/Python\ 3.13/Install\ Certificates.command

# Or set environment variable to bypass (not recommended for production)
export SSL_VERIFY=false
5. Memory Issues with Large Articles
# Increase system limits or process articles in smaller chunks
# Check article size before processing
curl -I https://example.com/large-article

Getting Help

  1. Check logs: Use --verbose flag for detailed output
  2. Validate input: Ensure date format is YYYY-MM-DD
  3. Test connectivity: Verify internet access and article URL
  4. API limits: Check OpenAI API usage and rate limits
  5. Issue tracker: Report bugs on GitHub issues page

Performance Tips

  • API Efficiency: The system is designed to use ≤3 API calls per article
  • Caching: Results are not cached by default; implement caching for repeated analyses
  • Batch Processing: For multiple articles, process sequentially to respect API rate limits
  • Article Size: Very large articles (>50KB) may require additional processing time
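Since results are not cached by default, repeated screenings of the same (name, dob, url) triple can be memoized by the caller. A minimal sketch, with `fake_screen` standing in for the real screening call:

```python
from typing import Callable

def cached_screen(screen: Callable[[str, str, str], dict]) -> Callable[[str, str, str], dict]:
    """Wrap a screening function with an in-memory cache keyed on (name, dob, url)."""
    cache: dict[tuple[str, str, str], dict] = {}

    def wrapper(name: str, dob: str, url: str) -> dict:
        key = (name, dob, url)
        if key not in cache:
            cache[key] = screen(name, dob, url)  # pay the API cost only once
        return cache[key]

    return wrapper

calls = 0
def fake_screen(name, dob, url):
    global calls
    calls += 1
    return {"match": "no", "api_calls_used": 1}

screen = cached_screen(fake_screen)
screen("John Doe", "1980-01-01", "https://example.com/a")
screen("John Doe", "1980-01-01", "https://example.com/a")  # served from cache
print(calls)  # 1
```

For long-running batch jobs, the in-memory dict could be swapped for a persistent store, subject to your data-retention policy.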

📊 Evaluation and Testing

Comprehensive Evaluation Framework

The system includes multiple evaluation tools for thorough testing and improvement:

1. Comprehensive Evaluation Script (Recommended)

Test any CSV dataset with detailed statistical analysis and improvement recommendations:

# Run comprehensive evaluation with your data
python evaluation/scripts/comprehensive_evaluation.py evaluation/datasets/your_ground_truth.csv --verbose

# Use sample data for testing
python evaluation/scripts/comprehensive_evaluation.py evaluation/datasets/sample_ground_truth.csv

# Run interactive demo
python evaluation/scripts/run_evaluation_demo.py

Features:

  • ✅ Flexible CSV input format
  • ✅ Complete statistical analysis (confusion matrix, precision, recall, F1, specificity)
  • ✅ Data-driven improvement recommendations
  • ✅ Multiple output formats (JSON, text report, CSV comparison)
  • ✅ Error categorization and pattern analysis

2. Enhanced Evaluation System

Advanced evaluation with systematic error analysis:

# The enhanced evaluation functionality is now integrated into the comprehensive evaluation script
# Use the comprehensive evaluation for all testing needs
python evaluation/scripts/comprehensive_evaluation.py evaluation/datasets/enhanced_test_dataset.csv --verbose

3. Automated Monitoring Pipeline

Continuous evaluation with regression detection:

# Set up automated monitoring
python evaluation/scripts/automated_evaluation_pipeline.py evaluation/datasets/enhanced_test_dataset.csv --daemon

# Generate trend analysis
python evaluation/scripts/automated_evaluation_pipeline.py evaluation/datasets/enhanced_test_dataset.csv --trend-report

CSV Format for Ground Truth Data

To use the comprehensive evaluation script, prepare a CSV file with these columns:

Required Columns

name,dob,url,expected_match,expected_polarity,expected_confidence_min,language,description,notes
"John Doe","1980-01-01","https://example.com/article","yes","negative",0.8,"en","Fraud conviction","Clear adverse case"
"Jane Smith","1990-05-15","https://example.com/article2","no","neutral",0.0,"en","Different person","Clear non-match"

Optional Columns (for enhanced analysis)

  • category: Test case type (true_positive, true_negative, edge_case, name_variation)
  • difficulty: Case complexity (easy, medium, hard)
  • source: Data origin (manual, reuters, ap_news)
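A ground-truth CSV in this format can be sanity-checked before running the evaluation. A small stdlib sketch, using the column names from the format above (this is illustrative, not part of the evaluation scripts):

```python
import csv
import io

REQUIRED = {"name", "dob", "url", "expected_match", "expected_polarity"}

def validate_ground_truth(csv_text: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems
    for i, row in enumerate(reader, start=2):  # row 1 is the header
        if row["expected_match"] not in {"yes", "no", "unsure"}:
            problems.append(f"row {i}: bad expected_match {row['expected_match']!r}")
    return problems

sample = (
    "name,dob,url,expected_match,expected_polarity\n"
    '"John Doe","1980-01-01","https://example.com/a","yes","negative"\n'
)
print(validate_ground_truth(sample))  # []
```

Running a check like this first avoids wasting API calls on rows the evaluator would reject.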

Key Metrics

Metric                Target   Description
False Negative Rate   <5%      🚨 Critical - missed adverse media
Accuracy              >85%     Overall correctness
Precision             >85%     Avoid false positives
Recall                >90%     Catch all true matches
Specificity           >85%     Correctly reject non-matches
API Efficiency        >0.5     Decisions per API call
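The metrics above derive directly from confusion-matrix counts. A small sketch of those standard formulas (the counts passed in are illustrative):

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the key screening metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0             # the critical number
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "false_negative_rate": fnr, "f1": f1, "accuracy": accuracy}

m = screening_metrics(tp=45, fp=5, tn=48, fn=2)
print(round(m["recall"], 3), round(m["false_negative_rate"], 3))  # 0.957 0.043
```

Note that recall and the false negative rate always sum to 1, so the <5% FNR target and the >90% recall target are two views of the same constraint.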

📈 Performance Characteristics

Metric            Typical Value           Notes
Processing Time   1-3 seconds             Per article analysis
API Calls         0-3 per article         Smart optimization
Memory Usage      <100MB                  Efficient text processing
Throughput        20-60 articles/minute   Depends on complexity
Languages         100+ supported          Via AI translation

🔧 Development

Project Structure

src/adverse_media_agent/            # Main application code
├── cli.py                          # CLI interface
├── models/                         # Domain models
│   ├── core.py                     # Person, Article, Evidence
│   ├── decision.py                 # Decision, ScreeningResult
│   ├── enums.py                    # MatchResult, Polarity, Language
│   └── stats.py                    # Performance tracking
├── services/                       # Business logic
│   ├── main_service.py             # Workflow orchestration
│   ├── ensemble_decision_engine.py # Core decision logic
│   ├── advanced_feature_extractor.py # NER + fuzzy matching
│   ├── advanced_polarity_analyzer.py # Sentiment analysis
│   ├── article_fetcher.py          # HTTP + content extraction
│   └── ai_client.py                # GPT-5 integration
├── config/                         # Configuration system
│   ├── base.py                     # Base configuration
│   ├── main_config.py              # Main config aggregation
│   ├── openai_config.py            # AI service settings
│   └── processing_config.py        # Processing parameters
├── processor.py                    # Text processing utilities
└── exceptions.py                   # Error handling

evaluation/                         # Evaluation system
├── scripts/                        # Evaluation tools and scripts
│   ├── comprehensive_evaluation.py # Main evaluation tool
│   ├── automated_evaluation_pipeline.py # Continuous monitoring
│   └── run_evaluation_demo.py      # Interactive demo
├── datasets/                       # Test datasets
│   ├── sample_ground_truth.csv     # Balanced sample data
│   └── enhanced_test_dataset.csv   # Extended test cases
├── results/                        # Evaluation results
└── documentation/                  # Evaluation documentation

tests/                              # Unit tests
├── test_cli.py                     # CLI testing
├── test_models.py                  # Model testing
├── test_fetcher.py                 # Fetcher testing
└── test_processor.py               # Processor testing

coverage/                           # Test coverage reports
└── html/                           # HTML coverage reports

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src/adverse_media_agent --cov-report=html

# Run specific test category
pytest tests/test_models.py -v

📚 Documentation

Complete Documentation Structure

This project includes comprehensive documentation organized by purpose:

docs/                              # 📚 All project documentation
├── README.md                      # Documentation overview and guide
├── development/                   # 🛠️ Development documentation
│   ├── plan.md                    # Development roadmap and current status
│   ├── REORGANIZATION_PLAN.md     # Architecture reorganization details
│   ├── ITERATION_SUMMARY.md       # Development history and sprints
│   └── INFRASTRUCTURE_IMPROVEMENTS.md # Infrastructure enhancements
└── operations/                    # 🔧 Operational documentation
    └── EVALUATION_ORGANIZATION_SUMMARY.md # System organization guide

evaluation/documentation/          # 📊 Evaluation system documentation
├── EVALUATION_USAGE_GUIDE.md      # Detailed evaluation instructions
├── EVALUATION_IMPROVEMENT_PLAN.md # Enhancement strategies
├── EVALUATION_SYSTEM_README.md    # Technical evaluation documentation
└── EVALUATION_SUMMARY.md          # Executive summary

Documentation Quick Links

Purpose                Document                                             Description
Getting Started        README.md                                            This file - project overview and quick start
Development            docs/development/plan.md                             Current status and development roadmap
Testing & Evaluation   evaluation/README.md                                 Complete evaluation system guide
Documentation Guide    docs/README.md                                       Navigation guide for all documentation
System Organization    docs/operations/EVALUATION_ORGANIZATION_SUMMARY.md   File structure and organization changes

For Different Audiences

  • 👥 Users & Stakeholders: Start with this README.md
  • 👨‍💻 Developers: See docs/development/ for development guides
  • 🔬 QA & Testing: Check evaluation/ for testing framework
  • 🔧 Operations: Review docs/operations/ for operational guides

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


Built with ❤️ for financial compliance and risk management
