An intelligent journalism quality analysis API that combines NLP with LLM-powered metrics to evaluate article credibility and objectivity.
- Overview
- Features
- Quick Start
- API Documentation
- NLP Process Service
- Deployment
- Configuration
- Development
- Metrics
- Troubleshooting
- License
- Contributing
- Support
- Acknowledgments
Trust Engine v2 is a REST API that analyzes journalistic articles using Natural Language Processing (NLP) and Large Language Models (LLMs) to provide objective quality metrics. The system evaluates articles across multiple dimensions to help identify bias, sensationalism, and poor writing quality.
- Fact-checkers: Identify potentially biased language in articles
- Journalists: Self-audit writing for objectivity
- Media Literacy: Teach critical reading skills with objective metrics
- Research: Analyze large corpora for language patterns
- Submit article content via REST API
- Stanford Stanza performs linguistic analysis (POS tagging, dependency parsing)
- OpenRouter + DSPy filters subjective language patterns
- Return comprehensive quality metrics
- LLM-Powered Analysis: Uses OpenRouter + DSPy to distinguish qualitative (opinionated) from descriptive (objective) adjectives
- Multi-Metric Evaluation: 4 complementary metrics for comprehensive assessment
- Spanish Language Support: Built on Stanford Stanza for robust Spanish NLP
- REST API: FastAPI-based with automatic OpenAPI/Swagger documentation
- Docker Support: Containerized for easy deployment
- Cloud Run Ready: Automated deployment to Google Cloud Platform
- Auto-scaling: Scales to zero when not in use
- Comprehensive Logging: Track API calls and metric calculations
- Python 3.12+
- pip or conda
- (Optional) Docker for containerized deployment
- (Optional) OpenRouter API key for LLM-powered adjective filtering
1. Clone the repository

   ```bash
   cd trust-engine-v2
   ```

2. Set up Python 3.12 with pyenv

   ```bash
   pyenv install 3.12.7  # skip if already installed
   pyenv local 3.12.7    # uses .python-version
   ```

3. Install Poetry and project dependencies

   ```bash
   pip install poetry  # or pip install --user poetry
   poetry env use $(pyenv which python)
   poetry install
   ```

4. Configure environment variables

   ```bash
   # Copy the example file
   cp .env.example .env
   # Edit .env with your credentials
   nano .env
   ```

   Minimum configuration:

   ```bash
   # Optional but recommended for full functionality
   OPENROUTER_API_KEY=your_api_key_here
   ```

5. Start the API server

   ```bash
   poetry run pre-commit install  # optional: install ruff hooks locally
   poetry run uvicorn trust_api.main:app --reload
   ```

   Tip: `poetry install --with dev` installs both the main dependencies and the dev-only ones under `[tool.poetry.group.dev.dependencies]` (pytest, pre-commit, ruff). It doesn't affect prod dependencies; it just brings in the extra tooling for tests, linting, and hooks.

6. Access the API

   ```bash
   open http://localhost:8000  # or curl the endpoints below
   ```

   - API: http://localhost:8000
   - Interactive Docs: http://localhost:8000/docs
   - Alternative Docs: http://localhost:8000/redoc
- Local: http://localhost:8000
- Production: https://your-service-name.run.app
GET /

Root endpoint with API information.

Response:

```json
{
  "message": "Welcome to MediaParty Trust API",
  "version": "0.1.0",
  "docs": "/docs"
}
```

GET /health

Health check endpoint.

Response:

```json
{
  "status": "healthy"
}
```

POST /api/v1/analyze

Analyzes a journalistic article and returns trust metrics.
Request Body:

```json
{
  "body": "Article content goes here...",
  "title": "Article Title",
  "author": "Author Name",
  "link": "https://example.com/article",
  "date": "2024-03-15",
  "media_type": "news"
}
```

Response:

```json
[
  {
    "id": 0,
    "criteria_name": "Qualitative Adjectives",
    "explanation": "The qualitative adjective ratio (3.2%) is excellent, indicating objective writing.",
    "flag": 1,
    "score": 0.9
  },
  {
    "id": 1,
    "criteria_name": "Word Count",
    "explanation": "The article has 450 words, indicating adequate coverage.",
    "flag": 0,
    "score": 0.6
  },
  {
    "id": 2,
    "criteria_name": "Sentence Complexity",
    "explanation": "Average sentence length is 18 words, indicating good readability.",
    "flag": 1,
    "score": 0.8
  },
  {
    "id": 3,
    "criteria_name": "Verb Tense Analysis",
    "explanation": "Past tense usage (55%) is appropriate for news reporting.",
    "flag": 1,
    "score": 0.75
  }
]
```

Flag Values:

- `1`: Positive indicator (good quality)
- `0`: Neutral (acceptable)
- `-1`: Negative indicator (poor quality)

Score Range: 0.0 to 1.0 (higher is better)
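Consumers of the response can combine flags and scores into a quick overall read. A minimal sketch (the unweighted-average aggregation rule here is illustrative, not part of the API):

```python
# Example /api/v1/analyze response (abridged from the sample above)
metrics = [
    {"criteria_name": "Qualitative Adjectives", "flag": 1, "score": 0.9},
    {"criteria_name": "Word Count", "flag": 0, "score": 0.6},
    {"criteria_name": "Sentence Complexity", "flag": 1, "score": 0.8},
    {"criteria_name": "Verb Tense Analysis", "flag": 1, "score": 0.75},
]

FLAG_LABELS = {1: "good", 0: "acceptable", -1: "poor"}

# Simple unweighted average; a real consumer might weight metrics differently.
overall = sum(m["score"] for m in metrics) / len(metrics)

for m in metrics:
    print(f"{m['criteria_name']}: {FLAG_LABELS[m['flag']]} ({m['score']:.2f})")
print(f"Overall: {overall:.2f}")
```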
- Navigate to http://localhost:8000/docs
- Click on the `/api/v1/analyze` endpoint
- Click "Try it out"
- Use the pre-filled example or modify the JSON
- Click "Execute"
- View the response below
```bash
curl -X POST "http://localhost:8000/api/v1/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "body": "El gobierno anunció hoy nuevas medidas económicas. Las decisiones fueron tomadas después de semanas de análisis. Los expertos consideran que estas políticas tendrán un impacto significativo en la economía nacional.",
    "title": "Nuevas medidas económicas anunciadas",
    "author": "María García",
    "link": "https://example.com/article",
    "date": "2024-03-15",
    "media_type": "news"
  }'
```

Python example:

```python
import requests

url = "http://localhost:8000/api/v1/analyze"

article = {
    "body": "El gobierno anunció hoy nuevas medidas económicas...",
    "title": "Nuevas medidas económicas anunciadas",
    "author": "María García",
    "link": "https://example.com/article",
    "date": "2024-03-15",
    "media_type": "news"
}

response = requests.post(url, json=article)
metrics = response.json()

for metric in metrics:
    print(f"{metric['criteria_name']}: {metric['score']:.2f}")
    print(f"  {metric['explanation']}")
    print()
```

NLP Process Service

A lightweight FastAPI service that receives CSV references (GCS URIs) and triggers NLP processing (currently stubbed; replace with real logic).
- Run locally:

  ```bash
  poetry run uvicorn trust_api.nlp.main:app --reload
  ```

- Env vars: `SERVICE_NAME` (default `nlp-process`), `ENVIRONMENT` (default `local`)
- Deploy via CI: set `GCP_NLP_SERVICE_NAME` in GitHub secrets/vars. The workflow builds a single Docker image (named after `GCP_SERVICE_NAME`, e.g., `trust-engine-v2`) and deploys it to both Cloud Run services:
  - Main service: uses the image with the default `APP_MODULE=trust_api.main:app`
  - NLP service: uses the same image with `APP_MODULE=trust_api.nlp.main:app` set via environment variable

Endpoints:

- `GET /metadata`
- `GET /health`: health check
- `POST /process` with body `{"gcs_uri": "...", "metadata": {...}}` (returns a stub response)
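A call to the stubbed `/process` endpoint might look like the following sketch. The bucket name, metadata values, and helper function are illustrative; only the `gcs_uri` and `metadata` fields come from the documented request body:

```python
import requests

# Hypothetical payload: gcs_uri and metadata are the only documented fields.
payload = {
    "gcs_uri": "gs://my-bucket/articles.csv",  # illustrative bucket/object name
    "metadata": {"source": "batch-2024-03"},   # free-form metadata
}

def trigger_nlp_process(base_url: str = "http://localhost:8000") -> dict:
    """POST the CSV reference to the NLP service and return its (stub) response."""
    resp = requests.post(f"{base_url}/process", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()
```

Run the NLP service locally first (`poetry run uvicorn trust_api.nlp.main:app --reload`), then call `trigger_nlp_process()`.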
1. Build the Docker image

   ```bash
   docker build -t trust-engine-v2 .
   ```

2. Run the container

   ```bash
   docker run -p 8080:8080 \
     -e OPENROUTER_API_KEY=your_key_here \
     trust-engine-v2
   ```

3. Test the deployment

   ```bash
   curl http://localhost:8080/health
   ```
This project includes automated deployment to Google Cloud Run via GitHub Actions.
1. Create GCP Project

   - Go to console.cloud.google.com
   - Create a new project or select an existing one

2. Enable Required APIs

   ```bash
   gcloud services enable run.googleapis.com
   gcloud services enable artifactregistry.googleapis.com
   gcloud services enable cloudbuild.googleapis.com
   ```

3. Create Artifact Registry

   ```bash
   gcloud artifacts repositories create cloud-run-source-deploy \
     --repository-format=docker \
     --location=us-central1 \
     --description="Docker repository for Cloud Run"
   ```

4. Create Service Account

   - Go to IAM & Admin → Service Accounts
   - Create a service account with these roles:
     - Cloud Run Admin
     - Storage Admin
     - Artifact Registry Administrator

5. Authentication

   - Interactive use (CLI): `gcloud auth login`
   - Application Default Credentials (ADC) for SDKs/containers: `gcloud auth application-default login`
   - For CI/CD (GitHub Actions), use Workload Identity Federation (see the secrets below)

6. Set up GitHub Secrets

   Go to your GitHub repository → Settings → Secrets and variables → Actions and add these secrets:

   - `GCP_PROJECT_ID`: Your GCP project ID
   - `GCP_REGION`: Deployment region (e.g., `us-central1`)
   - `GCP_SERVICE_NAME`: Service name (e.g., `trust-engine-v2`)
   - `GCP_WORKLOAD_IDENTITY_PROVIDER`: Workload Identity Provider resource name (e.g., `projects/…/locations/global/workloadIdentityPools/…/providers/…`)
   - `GCP_SERVICE_ACCOUNT_EMAIL`: Service account email to impersonate via WIF
   - `OPENROUTER_API_KEY`: Your OpenRouter API key

   The GitHub Actions workflow authenticates via Workload Identity Federation (no JSON key required).
Automatic Deployment:
Push to the main branch triggers automatic deployment.
Manual Deployment:
- Go to GitHub β Actions
- Select "Deploy to Cloud Run"
- Click "Run workflow"
The deployment process:
- Builds Docker container
- Pushes to Google Artifact Registry
- Deploys to Cloud Run
- Runs health check
- Outputs service URL
Note on Docker Images:
A single Docker image is built and pushed to Artifact Registry with the name based on GCP_SERVICE_NAME (e.g., trust-engine-v2). Both Cloud Run services (the main API service and the optional NLP processing service) use this same image. The difference is in the runtime configuration:
- Main service: uses the default `APP_MODULE=trust_api.main:app`
- NLP service: uses `APP_MODULE=trust_api.nlp.main:app` (set via environment variable)
This approach is efficient as it:
- Reduces storage in Artifact Registry (one image instead of two)
- Ensures both services use the same codebase version
- Simplifies maintenance and deployment
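Locally, the same single-image pattern can be reproduced with plain `docker run` by overriding `APP_MODULE`, assuming the image's entrypoint reads this variable the way the Cloud Run services do (a sketch, not a verified invocation):

```shell
# Main API service (default module)
docker run -p 8080:8080 trust-engine-v2

# NLP service from the same image, selected via APP_MODULE
docker run -p 8081:8080 \
  -e APP_MODULE=trust_api.nlp.main:app \
  trust-engine-v2
```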
Manual Cloud Build + Deploy (no local Docker):

```bash
export GCP_PROJECT_ID=your-project
export GCP_REGION=us-central1
export GCP_SERVICE_NAME=trust-engine-v2

# Optional overrides
export TAG=$(git rev-parse --short HEAD)       # image tag; defaults to git SHA or timestamp
export AR_REPO=cloud-run-source-deploy         # Artifact Registry repo name
export OPENROUTER_API_KEY=your_api_key         # forwarded to Cloud Run if set
export CLOUD_RUN_ENV_VARS="EXAMPLE=1,FOO=bar"  # extra env vars for Cloud Run

./scripts/deploy_cloud_run.sh
```

This uses `gcloud builds submit` to build in Cloud Build and deploys the built image to Cloud Run. The script also creates the Artifact Registry repo (`AR_REPO` in `GCP_REGION`) if it does not exist.

Notes:

- The image pre-downloads the Stanza model during build (uses `STANZA_RESOURCES_DIR=/app/stanza_resources` and `STANZA_LANG`, default `es`). The runtime also passes these env vars so the model path is reused (avoids downloading on startup).
- Local dev defaults `STANZA_RESOURCES_DIR` to `./stanza_resources` (override via env if needed).
After deployment, your API will be available at:
https://[SERVICE-NAME]-[HASH]-[REGION].a.run.app
Check GitHub Actions logs for the exact URL.
Use `gcloud run services proxy` to bind your Cloud Run service to a local port (requires `gcloud auth login` and the correct project/region):

```bash
gcloud run services proxy $GCP_SERVICE_NAME \
  --project $GCP_PROJECT_ID \
  --region $GCP_REGION \
  --port 8080
```

Then call it via http://localhost:8080 (e.g., http://localhost:8080/health or the API endpoints). Stop with Ctrl+C when finished.

You can also use the helper script (loads env vars from your shell):

```bash
source .env  # ensure GCP_PROJECT_ID, GCP_REGION, GCP_SERVICE_NAME are set
./scripts/proxy_cloud_run.sh
```

Default settings (configurable in `.github/workflows/deploy-cloud-run.yml`):
- Memory: 2GB
- CPU: 2 vCPU
- Timeout: 300 seconds
- Max instances: 10
- Min instances: 0 (scales to zero)
- Port: 8080
- Authentication: Public (allow unauthenticated)
Create a `.env` file in the project root:

```bash
# OpenRouter API Configuration (optional but recommended)
OPENROUTER_API_KEY=your_api_key_here

# Google Cloud Platform Configuration (for deployment)
GCP_PROJECT_ID=your-gcp-project-id
GCP_REGION=us-central1
GCP_SERVICE_NAME=trust-engine-v2
GCP_WORKLOAD_IDENTITY_PROVIDER=projects/.../locations/global/workloadIdentityPools/.../providers/...
GCP_SERVICE_ACCOUNT_EMAIL=sa-name@your-gcp-project-id.iam.gserviceaccount.com
```

To get an OpenRouter API key:

1. Sign up at openrouter.ai
2. Go to API Keys
3. Create a new API key
4. Copy it to your `.env` file

Note: Without `OPENROUTER_API_KEY`, the adjective metric still works but uses all adjectives instead of filtering for qualitative ones only.
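The degraded mode can be detected the same way the service presumably does, by checking the environment. A minimal sketch (the function name is illustrative, not the project's actual config API):

```python
import os

def adjective_filtering_enabled() -> bool:
    """True when an OpenRouter key is present, enabling qualitative-adjective filtering."""
    return bool(os.getenv("OPENROUTER_API_KEY", "").strip())

mode = "LLM-filtered" if adjective_filtering_enabled() else "all adjectives"
```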
```
trust-engine-v2/
├── src/trust_api/
│   ├── main.py                 # FastAPI application entry point
│   ├── models.py               # Pydantic models
│   ├── __init__.py
│   ├── nlp/                    # NLP processing service (trust-api-nlp)
│   │   ├── __init__.py
│   │   ├── main.py             # NLP service entry point
│   │   └── core/
│   │       └── config.py
│   ├── api/
│   │   ├── __init__.py
│   │   └── v1/
│   │       ├── __init__.py
│   │       └── endpoints.py    # API endpoints
│   ├── services/
│   │   ├── __init__.py
│   │   ├── metrics.py          # Metric calculation logic
│   │   └── stanza_service.py   # NLP processing service
│   └── core/
│       ├── __init__.py
│       └── config.py           # Configuration management
├── .github/
│   └── workflows/
│       └── deploy-cloud-run.yml  # CI/CD pipeline
├── test/                       # Test examples
├── Dockerfile                  # Container definition
├── .dockerignore               # Docker build exclusions
├── pyproject.toml              # Python dependencies
├── .pre-commit-config.yaml     # Lint/format hooks (ruff)
├── .env.example                # Environment template
├── .gitignore                  # Git exclusions
└── README.md                   # This file
```
To add a new metric:

1. Open `src/trust_api/services/metrics.py`
2. Create a new function following this pattern:

   ```python
   def get_new_metric(doc: Document, metric_id: int) -> Metric:
       """
       Calculate your new metric.

       Args:
           doc: Stanza Document object
           metric_id: Unique metric identifier

       Returns:
           Metric object with results
       """
       # Your analysis logic here
       score = 0.0  # Calculate score (0.0 to 1.0)
       flag = 0     # -1, 0, or 1

       return Metric(
           id=metric_id,
           criteria_name="Your Metric Name",
           explanation="Description of the result",
           flag=flag,
           score=score,
       )
   ```

3. Add it to the analysis pipeline in `src/trust_api/api/v1/endpoints.py`:

   ```python
   metrics = [
       get_adjective_count(doc, metric_id=0),
       get_word_count(doc, metric_id=1),
       get_sentence_complexity(doc, metric_id=2),
       get_verb_tense_analysis(doc, metric_id=3),
       get_new_metric(doc, metric_id=4),  # Add your metric
   ]
   ```

Running the tests:

```bash
# In-process health check (no server needed; skips stanza init)
poetry run pytest test/test_client.py

# Model/metric unit tests (pytest)
poetry run pytest test/test_metrics.py

# Against a running server (full flow)
poetry run uvicorn trust_api.main:app --reload &  # in another terminal
poetry run python test/test_client.py --url http://localhost:8000 --input test/example_article.json
```

Qualitative Adjectives

What it measures: Proportion of opinion-based adjectives vs. descriptive adjectives
Why it matters: Excessive qualitative adjectives signal bias or sensationalism
How it works:
- Extracts all adjectives using Stanza
- Uses OpenRouter + DSPy to classify as qualitative vs. descriptive
- Calculates ratio of qualitative adjectives
Scoring:
- β€5%: Excellent (flag: 1, score: 0.8-1.0)
- 5-10%: Moderate (flag: 0, score: 0.5-0.8)
- >10%: High (flag: -1, score: 0.0-0.5)
Word Count

What it measures: Total article length
Why it matters: Longer articles tend to provide more comprehensive coverage
How it works:
- Counts all words in title + body
- Evaluates against journalistic standards
Scoring:
- >400 words: Good depth (flag: 1)
- 200-400 words: Moderate (flag: 0)
- <200 words: Insufficient (flag: -1)
Sentence Complexity

What it measures: Average sentence length
Why it matters: Proper complexity ensures readability without oversimplification
How it works:
- Calculates average words per sentence
- Optimal range: 15-25 words
Scoring:
- 15-25 words: Optimal (flag: 1, score: 0.8-1.0)
- 10-30 words: Acceptable (flag: 0, score: 0.5-0.8)
- Other: Poor (flag: -1, score: 0.0-0.5)
Verb Tense Analysis

What it measures: Distribution of verb tenses
Why it matters: News articles should primarily use past tense for reported events
How it works:
- Analyzes verb tense distribution using Stanza
- Expected for news: 40-70% past tense
Scoring:
- 40-70% past tense: Appropriate (flag: 1)
- 30-40% or 70-80%: Moderate (flag: 0)
- Other: Inappropriate (flag: -1)
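The banded thresholds above can be captured as a plain scoring helper; here for the verb-tense bands. The flags follow the documented thresholds, while the score values chosen within each band are illustrative:

```python
def score_past_tense(past_ratio: float) -> tuple[int, float]:
    """Map the past-tense share (0.0-1.0) to (flag, score) per the documented bands."""
    if 0.40 <= past_ratio <= 0.70:
        return 1, 0.85   # appropriate for news reporting
    if 0.30 <= past_ratio < 0.40 or 0.70 < past_ratio <= 0.80:
        return 0, 0.6    # moderate
    return -1, 0.3       # inappropriate
```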
Slow startup (first run)

Cause: Stanza is downloading language models on first run.

Solution: Wait for initialization to complete. Check server logs for progress.

```bash
# Check logs
tail -f /var/log/app.log
```

Missing OPENROUTER_API_KEY

Cause: Environment variable not configured.

Solution: The API will work with reduced functionality (all adjectives instead of filtered qualitative ones). To enable full functionality:

```bash
# Add to .env file
OPENROUTER_API_KEY=your_key_here
```

Docker build fails

Cause: Usually dependency or network issues.

Solution:

```bash
# Clear Docker cache and rebuild
docker build --no-cache -t trust-engine-v2 .
```

Cloud Run deployment fails

Cause: Missing permissions or incorrect configuration.

Solution: Verify that:

- GitHub secrets are set correctly
- The service account has the required roles
- The required APIs are enabled in GCP
- The Artifact Registry repository exists

```bash
# Check service account permissions
gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:YOUR_SA_EMAIL"
```

Slow first request after deployment

Cause: Stanza model loading plus container cold start.

Solution:

- The first request may take 30-60 seconds
- Subsequent requests are fast (<2 seconds)
- In Cloud Run, set `min-instances: 1` to avoid cold starts

```bash
# Update Cloud Run to keep 1 instance warm
gcloud run services update trust-engine-v2 \
  --min-instances 1 \
  --region us-central1
```

License

GNU AFFERO GENERAL PUBLIC LICENSE
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
For issues and questions:
- GitHub Issues: Create an issue
- Documentation: http://localhost:8000/docs
Sponsor: Desconfio.org
Built with:
- FastAPI - Modern Python web framework
- Stanford Stanza - NLP toolkit
- DSPy - Programming with foundation models
- OpenRouter - LLM API gateway