zappa/zappa-merle

merle README

This project uses zappa/ollama to deploy Ollama-compatible models to AWS Lambda.

CLI Usage

The CLI provides commands for preparing, deploying, and managing Ollama model deployments:

# Prepare deployment files for an Ollama model
python -m merle.cli prepare --model {OLLAMA_MODEL} [--auth-token TOKEN] [--tags KEY=VALUE,...]

# Deploy a prepared model to AWS Lambda
python -m merle.cli deploy --model {MODEL_NAME} --auth-token {AUTH_TOKEN}

# List all configured models
python -m merle.cli list

# Start an interactive chat session with a deployed model
python -m merle.cli chat --model {MODEL_NAME}

# Tear down a deployed Lambda function
python -m merle.cli destroy --model {MODEL_NAME}

Note: You can find a list of available Ollama models at https://ollama.com/library

AWS Configuration

Before deploying, ensure your AWS credentials are configured. Merle uses the standard AWS credential chain:

# Option 1: Set AWS profile (recommended for multiple accounts)
export AWS_PROFILE=your-profile-name

# Option 2: Set credentials directly
export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key

# Optional: Set default region (overrides the CLI default)
export AWS_DEFAULT_REGION=us-east-1

Region Configuration:

  • Default region: ap-northeast-1
  • Override with --region option: merle prepare --model llama2 --region us-west-2
  • Or set via environment: export AWS_DEFAULT_REGION=us-west-2

Note: The region must be specified during the prepare step, as it is embedded in the deployment configuration.
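The precedence above (CLI flag first, then environment variable, then the built-in default) can be sketched as a small resolver. This is illustrative only; `resolve_region` is not part of merle's API:

```python
import os

DEFAULT_REGION = "ap-northeast-1"  # merle's documented default

def resolve_region(cli_region=None, env=None):
    """Resolve the AWS region: the --region flag wins, then
    AWS_DEFAULT_REGION, then the built-in default."""
    if env is None:
        env = os.environ
    if cli_region:
        return cli_region
    return env.get("AWS_DEFAULT_REGION", DEFAULT_REGION)
```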

Deployment topology (--topology)

merle supports two deployment topologies. Pick one with --topology at prepare / deploy time.

| Topology | Max request duration | Auth layer | When to pick it |
|---|---|---|---|
| `apigw` (default) | 29 seconds (API Gateway REST integration cap; cannot be raised) | API Gateway custom authorizer Lambda validates `X-API-Key` before the request reaches Lambda | Small models whose warm end-to-end latency is well under 29s (e.g. tinyllama, tiny quantised 1B models) |
| `function-url` | Up to Lambda's configured `timeout_seconds` (15 min max) | Lambda Function URL with `AuthType=NONE`; the Flask app validates `X-API-Key` via a `before_request` hook | Anything that can't finish in 29s on CPU: in practice, every real-world model, including schroneko/gemma-2-2b-jpn-it, llama3.2, mistral, and larger |

WARNING — 29s ceiling on apigw: API Gateway REST has a hard 29-second integration timeout that AWS does not let you raise. If cold-start + model-load + first-token exceeds 29s the client gets HTTP 504 "Endpoint request timed out" even though the Lambda itself finishes the request (visible in CloudWatch). Streaming does not help — API Gateway buffers before flushing. For CPU inference on anything larger than toy models, choose --topology function-url.
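The warning above reduces to a simple time budget: worst-case cold-start plus model-load plus time-to-first-token must stay under the cap. A rough go/no-go check, with an invented helper name and an arbitrary safety margin:

```python
APIGW_TIMEOUT_S = 29  # hard API Gateway REST integration cap, not raisable

def fits_apigw(cold_start_s, model_load_s, first_token_s, margin_s=5.0):
    """True only if worst-case time-to-first-byte fits under the
    29 s cap with a safety margin; otherwise use function-url."""
    total = cold_start_s + model_load_s + first_token_s + margin_s
    return total <= APIGW_TIMEOUT_S
```

Warm requests skip the cold-start and model-load terms, which is why tiny models can squeak under the cap while anything larger cannot.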

Function URL example

# Prepare + deploy with a Function URL (no API Gateway)
uvx merle prepare --model schroneko/gemma-2-2b-jpn-it --topology function-url
uvx merle deploy --model schroneko/gemma-2-2b-jpn-it

# Subsequent chat uses the Function URL automatically
uvx merle chat --model schroneko/gemma-2-2b-jpn-it

Under function-url, merle sets MERLE_REQUIRE_API_KEY=true on the Lambda. The Flask app in the container enforces X-API-Key on every route (including /health and /), matching the behaviour of the API Gateway authorizer in apigw mode. The authorizer Lambda and its IAM role are not provisioned in function-url mode.
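The in-app gate described above amounts to a header check that runs before every route. A hedged sketch of that behaviour, not merle's actual implementation (the function name is invented):

```python
import os

def api_key_gate(request_headers, expected_token):
    """Return (allowed, status). When MERLE_REQUIRE_API_KEY is true,
    every route, including /health and /, requires a matching
    X-API-Key header; otherwise requests pass through."""
    require = os.environ.get("MERLE_REQUIRE_API_KEY", "").lower() == "true"
    if not require:
        return (True, 200)
    if request_headers.get("X-API-Key") == expected_token:
        return (True, 200)
    return (False, 403)
```

In merle this logic would run inside a Flask `before_request` hook, so no individual route has to repeat the check.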

To switch an existing deployment between topologies, destroy it first:

uvx merle destroy --model {MODEL}
uvx merle prepare --model {MODEL} --topology function-url
uvx merle deploy --model {MODEL}

OpenAI-compatible clients

merle proxies both Ollama's native API (/api/*) and Ollama's OpenAI-compatible surface (/v1/*). OpenAI SDK users can point at the merle deployment URL directly:

from openai import OpenAI

client = OpenAI(
    base_url="https://<function-url-or-apigw-url>/v1",
    api_key="<the X-API-Key you set at prepare time>",
    default_headers={"X-API-Key": "<same token>"},
)

reply = client.chat.completions.create(
    model="schroneko/gemma-2-2b-jpn-it",
    messages=[{"role": "user", "content": "こんにちは"}],
)

Note: the OpenAI SDK sends the token as Authorization: Bearer ..., but merle's authorizer and in-app gate read X-API-Key. Pass the token in default_headers as shown, or set X-API-Key on each request.
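Since the same token must appear in both places, a tiny helper keeps them in sync. The function name and example URL are illustrative, not part of merle:

```python
def merle_client_kwargs(base_url, token):
    """Build kwargs for openai.OpenAI() so the token is sent both as
    Authorization: Bearer (the SDK default, via api_key) and as
    X-API-Key (the header merle's gate actually reads)."""
    return {
        "base_url": base_url.rstrip("/") + "/v1",
        "api_key": token,
        "default_headers": {"X-API-Key": token},
    }
```

Usage: `client = OpenAI(**merle_client_kwargs(url, token))`.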

Using uvx (Recommended)

You can run merle without installing it using uvx, which executes the CLI in an isolated environment:

# Prepare deployment files (with optional region)
uvx merle prepare --model llama2 --auth-token YOUR_TOKEN --region us-east-1

# Deploy to AWS Lambda
uvx merle deploy --model llama2 --auth-token YOUR_TOKEN

# List configured models
uvx merle list

# Start interactive chat
uvx merle chat --model llama2

# Destroy deployment
uvx merle destroy --model llama2

# Check version
uvx merle --version

Benefits of using uvx:

  • No installation required
  • Always uses an isolated environment
  • Fast subsequent runs due to caching
  • Perfect for CI/CD pipelines and one-off commands

Note: First run may take a moment to set up the environment, but subsequent runs are nearly instant due to uv's caching.

Structure

zappa-merle/
├── .github/
│   └── workflows/
│       ├── register-circleci-project.yml
│       └── test.yml
├── merle/
│   ├── __init__.py
│   ├── app.py
│   ├── chat.py
│   ├── cli.py
│   ├── functions.py
│   ├── settings.py
│   └── templates/
│       ├── Dockerfile.template
│       ├── authorizer.py
│       └── zappa_settings.json.template
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_chat.py
│   ├── test_cli.py
│   ├── test_deployment_completeness.py
│   ├── test_docker.py
│   └── test_functions.py
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── README.md
├── pyproject.toml
└── uv.lock

Local Development

Python: 3.13

Requires uv for dependency management

Installing Development Environment

  1. Install pre-commit hooks (ruff):

    Assumes pre-commit is already installed.

    pre-commit install
  2. Install project and development dependencies:

    uv sync

Run Code Checks

uv run poe check

Run type checking:

uv run poe typecheck

Run Test Cases

This project uses pytest for running test cases.

Test cases should be added in the tests directory.

To run test cases, execute the following command:

pytest -v
# Or, from the parent directory
uv run poe test

About

A zappa-based Ollama model deployment tool