This project uses Zappa and Ollama to deploy Ollama-compatible models to AWS Lambda.
The CLI provides commands for preparing, deploying, and managing Ollama model deployments:
```shell
# Prepare deployment files for an Ollama model
python -m merle.cli prepare --model {OLLAMA_MODEL} [--auth-token TOKEN] [--tags KEY=VALUE,...]

# Deploy a prepared model to AWS Lambda
python -m merle.cli deploy --model {MODEL_NAME} --auth-token {AUTH_TOKEN}

# List all configured models
python -m merle.cli list

# Start an interactive chat session with a deployed model
python -m merle.cli chat --model {MODEL_NAME}

# Tear down a deployed Lambda function
python -m merle.cli destroy --model {MODEL_NAME}
```

Note: You can find a list of available Ollama models at https://ollama.com/library
Before deploying, ensure your AWS credentials are configured. Merle uses the standard AWS credential chain:
```shell
# Option 1: Set AWS profile (recommended for multiple accounts)
export AWS_PROFILE=your-profile-name

# Option 2: Set credentials directly
export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key

# Optional: Set default region (overrides the CLI default)
export AWS_DEFAULT_REGION=us-east-1
```

Region Configuration:
- Default region: ap-northeast-1
- Override with the --region option: merle prepare --model llama2 --region us-west-2
- Or set via environment: export AWS_DEFAULT_REGION=us-west-2

Note: The region must be specified during the prepare step, as it is embedded in the deployment configuration.
merle supports two deployment topologies. Pick one with --topology at prepare / deploy time.
| Topology | Max request duration | Auth layer | When to pick it |
|---|---|---|---|
| apigw (default) | 29 seconds (API Gateway REST integration cap; cannot be raised) | API Gateway custom authorizer Lambda validates X-API-Key before the request reaches Lambda | Small models whose warm end-to-end latency is well under 29s (e.g. tinyllama, tiny quantised 1B models) |
| function-url | Up to Lambda's configured timeout_seconds (15 min max) | Lambda Function URL with AuthType=NONE; the Flask app validates X-API-Key via a before_request hook | Anything that can't finish in 29s on CPU: basically every real-world model, including schroneko/gemma-2-2b-jpn-it, llama3.2, mistral, and larger |
WARNING: 29s ceiling on apigw. API Gateway REST has a hard 29-second integration timeout that AWS does not let you raise. If cold-start + model-load + first-token exceeds 29s, the client gets HTTP 504 "Endpoint request timed out" even though the Lambda itself finishes the request (visible in CloudWatch). Streaming does not help: API Gateway buffers before flushing. For CPU inference on anything larger than toy models, choose --topology function-url.
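In practice the choice reduces to comparing your model's warm end-to-end latency against the 29-second cap with some headroom for cold starts. A minimal sketch of that rule of thumb (the choose_topology helper and the 50% margin are illustrative, not part of merle):

```python
APIGW_CAP_S = 29.0  # hard API Gateway REST integration timeout


def choose_topology(warm_latency_s: float, margin: float = 0.5) -> str:
    """Pick apigw only when warm latency fits well under the 29 s cap;
    the margin leaves headroom for cold starts and network jitter."""
    if warm_latency_s <= APIGW_CAP_S * margin:
        return "apigw"
    return "function-url"
```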
```shell
# Prepare + deploy with a Function URL (no API Gateway)
uvx merle prepare --model schroneko/gemma-2-2b-jpn-it --topology function-url
uvx merle deploy --model schroneko/gemma-2-2b-jpn-it

# Subsequent chat uses the Function URL automatically
uvx merle chat --model schroneko/gemma-2-2b-jpn-it
```

Under function-url, merle sets MERLE_REQUIRE_API_KEY=true on the Lambda. The Flask app in the container enforces X-API-Key on every route (including /health and /), matching the behaviour of the API Gateway authorizer in apigw mode. The authorizer Lambda and its IAM role are not provisioned in function-url mode.
To switch an existing deployment between topologies, destroy it first:
```shell
uvx merle destroy --model {MODEL}
uvx merle prepare --model {MODEL} --topology function-url
uvx merle deploy --model {MODEL}
```

merle proxies both Ollama's native API (/api/*) and Ollama's OpenAI-compatible surface (/v1/*). OpenAI SDK users can point at the merle deployment URL directly:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<function-url-or-apigw-url>/v1",
    api_key="<the X-API-Key you set at prepare time>",
    default_headers={"X-API-Key": "<same token>"},
)
reply = client.chat.completions.create(
    model="schroneko/gemma-2-2b-jpn-it",
    messages=[{"role": "user", "content": "こんにちは"}],
)
```

Note: the OpenAI SDK sends the token as Authorization: Bearer ..., but merle's authorizer and in-app gate read X-API-Key. Pass the token in default_headers as shown, or set X-API-Key on each request.
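The native /api/* surface works the same way. A standard-library-only sketch of a non-streaming call (merle_headers and native_chat are hypothetical helpers written for this example, not part of merle):

```python
import json
import urllib.request


def merle_headers(token: str) -> dict:
    """Headers satisfying both the OpenAI SDK convention and merle's
    X-API-Key gate (merle only checks X-API-Key)."""
    return {
        "Authorization": f"Bearer {token}",  # what OpenAI clients send
        "X-API-Key": token,                  # what merle validates
        "Content-Type": "application/json",
    }


def native_chat(base_url: str, token: str, model: str, prompt: str) -> str:
    """Call Ollama's native /api/chat endpoint (non-streaming) through
    a merle deployment and return the reply text."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/api/chat", data=payload, headers=merle_headers(token)
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["message"]["content"]
```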
You can run merle without installing it using uvx, which executes the CLI in an isolated environment:
```shell
# Prepare deployment files (with optional region)
uvx merle prepare --model llama2 --auth-token YOUR_TOKEN --region us-east-1

# Deploy to AWS Lambda
uvx merle deploy --model llama2 --auth-token YOUR_TOKEN

# List configured models
uvx merle list

# Start interactive chat
uvx merle chat --model llama2

# Destroy deployment
uvx merle destroy --model llama2

# Check version
uvx merle --version
```

Benefits of using uvx:
- No installation required
- Always uses an isolated environment
- Fast subsequent runs due to caching
- Perfect for CI/CD pipelines and one-off commands
Note: First run may take a moment to set up the environment, but subsequent runs are nearly instant due to uv's caching.
```
zappa-merle/
├── .github/
│   └── workflows/
│       ├── register-circleci-project.yml
│       └── test.yml
├── merle/
│   ├── __init__.py
│   ├── app.py
│   ├── chat.py
│   ├── cli.py
│   ├── functions.py
│   ├── settings.py
│   └── templates/
│       ├── Dockerfile.template
│       ├── authorizer.py
│       └── zappa_settings.json.template
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_chat.py
│   ├── test_cli.py
│   ├── test_deployment_completeness.py
│   ├── test_docker.py
│   └── test_functions.py
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── README.md
├── pyproject.toml
└── uv.lock
```
Python: 3.13
Requires uv for dependency management

- Install pre-commit hooks (ruff). Assumes pre-commit is already installed:

```shell
pre-commit install
```

- Install project and development dependencies:

```shell
uv sync
```

Run checks:

```shell
uv run poe check
```

Run type checking:

```shell
uv run poe typecheck
```
This project uses pytest for running test cases. Test cases should be added in the tests directory. To run them, execute:

```shell
pytest -v

# Or, from the parent directory
uv run poe test
```