Merged
33 commits
5eb3af3
feat(migration): SDK subpackage for org-to-org data migration (v1, ad…
chandrasekharan-zipstack May 23, 2026
17aa923
chore(migration): drop cross-repo refs from code comments
chandrasekharan-zipstack May 23, 2026
f28c3c3
feat(migration): connector + tag phases
chandrasekharan-zipstack May 23, 2026
41022d5
refactor(migration): drive POST payloads from DRF OPTIONS schema
chandrasekharan-zipstack May 23, 2026
fcd7048
refactor(migration): centralise POST payload construction
chandrasekharan-zipstack May 23, 2026
8863f29
feat(migration): CustomTool composite phase
chandrasekharan-zipstack May 23, 2026
7fe9c33
feat(migration): WorkflowPhase
chandrasekharan-zipstack May 23, 2026
2a0604c
feat(migration): ToolInstance + WorkflowEndpoint phases
chandrasekharan-zipstack May 23, 2026
f879a8f
refactor(migration): use project-transfer for CustomToolPhase
chandrasekharan-zipstack May 23, 2026
60a8f29
fix(migration): resolve profile adapters by NAME, not UUID
chandrasekharan-zipstack May 23, 2026
0d237a2
feat(migration): Pipeline + APIDeployment phases
chandrasekharan-zipstack May 23, 2026
05055ce
feat(migration): FilesPhase for Prompt Studio document corpus [UN-3479]
chandrasekharan-zipstack May 24, 2026
fe66b05
fix(migration): files phase end-to-end + report quieting + tool_insta…
chandrasekharan-zipstack May 24, 2026
bc4ded6
fix(migration): address greptile P1s on base + workflow_endpoint
chandrasekharan-zipstack May 24, 2026
a756d13
perf(migration): hoist target list_custom_tools out of per-tool loop
chandrasekharan-zipstack May 24, 2026
0811d31
Delete docs/internal/files-migration-plan.md
chandrasekharan-zipstack May 24, 2026
b73556f
docs(migration): rewrite README for end users
chandrasekharan-zipstack May 24, 2026
d2cc32f
docs(migration): add sample report + cross-deployment note
chandrasekharan-zipstack May 24, 2026
d8e6490
docs(migration): slim README, defer to public docs
chandrasekharan-zipstack May 24, 2026
ca1b6db
refactor(clone): rename migration module/CLI to clone
chandrasekharan-zipstack May 24, 2026
da07247
docs(clone): top-of-README note on users + install instructions
chandrasekharan-zipstack May 26, 2026
836962c
docs(clone): update public docs URL to /cloning-orgs/
chandrasekharan-zipstack May 26, 2026
9391ce3
fix(clone): address PR #15 review feedback (P1s + quick wins)
chandrasekharan-zipstack May 26, 2026
5e25736
perf(clone): within-phase parallelism + per-phase timing
chandrasekharan-zipstack May 27, 2026
149bcec
fix(clone): bump result.skipped when tool_instance metadata PATCH is …
chandrasekharan-zipstack May 27, 2026
bf0b8d0
fix(clone): surface 'file already on target' skip reason at INFO
chandrasekharan-zipstack May 27, 2026
0722713
docs(clone): note OAuth connectors need re-auth on target
chandrasekharan-zipstack May 27, 2026
e87822d
docs(clone): note Unstract Cloud trial adapters are skipped
chandrasekharan-zipstack May 27, 2026
876861f
feat(clone): skip unmigratable resources upfront + failures summary
chandrasekharan-zipstack May 27, 2026
29f169f
fix(clone): address PR #15 P1s — workflow detail GET + remap snapshot
chandrasekharan-zipstack May 27, 2026
7fc2f5f
ci: add PR-gate workflow for lint and tests
chandrasekharan-zipstack May 27, 2026
2e313e1
ci: install --all-extras so clone CLI tests can import click
chandrasekharan-zipstack May 27, 2026
4779fb2
Merge pull request #16 from Zipstack/ci/pr-test-gate
chandrasekharan-zipstack May 27, 2026
36 changes: 36 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,36 @@
name: Lint and Test

on:
  pull_request:
    branches: [main, "feat/**", "fix/**"]
  push:
    branches: [main]

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install uv
        uses: astral-sh/setup-uv@v6
        with:
          version: "0.6.14"
          enable-cache: true

      - name: Install dependencies
        run: uv sync --dev --all-extras

      - name: Create test env
        run: cp tests/sample.env tests/.env

      - name: Tests (pytest)
        run: uv run pytest tests/ -v
9 changes: 9 additions & 0 deletions pyproject.toml
@@ -25,6 +25,15 @@ classifiers = [
    "Topic :: Software Development :: Libraries :: Python Modules",
]

[project.optional-dependencies]
clone = [
    "click>=8.1",
    "rich>=13.7",
]

[project.scripts]
unstract-clone = "unstract.clone.cli:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
61 changes: 61 additions & 0 deletions src/unstract/clone/README.md
@@ -0,0 +1,61 @@
# Cloning Organizations

> [!NOTE]
> **Users are not cloned.** Two reasons:
> - The same user may not need access in every environment.
> - The same user may hold different roles across environments.
>
> **Groups _will_ be cloned** (upcoming, not yet implemented). Once available, an admin can add the right users to each group per environment.

Clone an Unstract organization's configured resources into another organization (same deployment or different). Useful for environment promotion (DEV → QA → PROD) and for spinning up a fresh org from a known-good baseline.

Cloned resources: adapters, connectors, custom tools, prompts, profiles, workflows, tool instances, workflow endpoints, tags, API deployments, pipelines, and Prompt Studio document files. The source org is left untouched.

> **Full documentation, behavior notes, CLI reference, and sample report:**
> https://docs.unstract.com/unstract/unstract_platform/api_documentation/versions/cloning-orgs/

## Install

From a clone of this repository:

```bash
uv sync --all-extras
```

This pulls in the `clone` extra (`click`, `rich`) needed by the CLI.

## Quickstart

```bash
UNSTRACT_SRC_PLATFORM_KEY=src_pk_... \
UNSTRACT_TGT_PLATFORM_KEY=tgt_pk_... \
uv run python -m unstract.clone clone \
--source-url https://source.example.com \
--source-org my-source-org \
--target-url https://target.example.com \
--target-org my-target-org
```

Both keys must be **org admin Platform API keys**.

> [!WARNING]
> Both keys grant broad access. Run from a trusted machine and rotate both keys after the clone completes.

> [!NOTE]
> **Unstract Cloud free-trial adapters are not cloned.** Trial adapters are platform-owned and filtered out of the source listing. Prompt Studio projects whose default profile references them are skipped, and that cascades to dependent workflows, API deployments, and pipelines. Provision your own adapters on the target org and re-run the clone to bring the rest across.

> [!NOTE]
> **OAuth-backed connectors need re-authorisation on target.** Connectors that use OAuth (e.g. Google Drive) are cloned without their refresh tokens, because the Platform API never exposes them. Re-connect each one on the target after the clone.

## Re-runs are safe

If a phase fails partway, fix the cause and re-run the same command. Resources already on the target are detected by name and reused. There is no `--resume-from` flag — the target is the state.
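The reconcile-by-name behaviour described above can be illustrated with a small, self-contained sketch. This is not the package's implementation: `reconcile_by_name`, `ReconcileResult`, and the in-memory `target_by_name` dict are hypothetical stand-ins for the real API calls, and the `abort` branch is an assumption about what `--on-name-conflict abort` might mean.

```python
from dataclasses import dataclass, field


@dataclass
class ReconcileResult:
    created: list[str] = field(default_factory=list)
    adopted: list[str] = field(default_factory=list)


def reconcile_by_name(
    source_names: list[str],
    target_by_name: dict[str, str],
    on_conflict: str = "adopt",
) -> ReconcileResult:
    """Create what is missing on the target and reuse what already exists.

    `target_by_name` maps resource name -> target id and plays the role of
    the target org (hypothetical; a real run would list and POST over HTTP).
    """
    result = ReconcileResult()
    for name in source_names:
        if name in target_by_name:
            if on_conflict == "abort":
                # Assumed semantics: stop on the first like-named resource.
                raise ValueError(f"'{name}' already exists on target")
            result.adopted.append(name)
        else:
            # Stand-in for a create call; the id is made up for the sketch.
            target_by_name[name] = f"new-{name}"
            result.created.append(name)
    return result


target = {"invoice-parser": "id-1"}
first = reconcile_by_name(["invoice-parser", "kyc-flow"], target)
second = reconcile_by_name(["invoice-parser", "kyc-flow"], target)
print(first.created, first.adopted)
print(second.created, second.adopted)
```

The second call creates nothing because the first call already left `kyc-flow` on the target, which is why no separate resume state is needed in this model.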

## Files

The Prompt Studio document corpus is the only resource type with bytes on disk. Default cap per file is 25 MB; oversize files are reported for manual re-upload. Use `--skip-files` to skip bytes entirely (document records are still created).
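As a rough illustration of the per-file cap, the sketch below splits a file listing into "upload" and "report for manual re-upload" buckets. `plan_uploads` and its arguments are hypothetical names, not part of the package. The 25 MB default is taken as 25 × 1024 × 1024 bytes to match the CLI's size table, and treating a file exactly at the cap as allowed is an assumption.

```python
DEFAULT_CAP_BYTES = 25 * 1024 * 1024  # 25 MB, using 1024-based units


def plan_uploads(
    files: dict[str, int],
    cap_bytes: int = DEFAULT_CAP_BYTES,
    skip_bytes: bool = False,
) -> tuple[list[str], list[str]]:
    """Return (to_upload, oversize) file names for a name -> size mapping.

    With skip_bytes=True nothing is uploaded or reported, mirroring the idea
    of a metadata-only run (assumed behaviour for this sketch).
    """
    if skip_bytes:
        return [], []
    to_upload: list[str] = []
    oversize: list[str] = []
    for name, size in sorted(files.items()):
        # Assumption: a file exactly at the cap is still allowed.
        if size > cap_bytes:
            oversize.append(name)
        else:
            to_upload.append(name)
    return to_upload, oversize


corpus = {"small.pdf": 120_000, "scan.tiff": 40 * 1024 * 1024}
uploads, too_big = plan_uploads(corpus)
print("upload:", uploads)
print("manual re-upload:", too_big)
```

Oversize files land in a report list rather than raising, so one large document does not stop the rest of the run.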

> [!WARNING]
> Run clones during low-activity windows. Concurrent uploads to the source org during a clone can create duplicate file records on the target.

See the [public docs](https://docs.unstract.com/unstract/unstract_platform/api_documentation/versions/cloning-orgs/) for the full flag list, behavioral notes, and the format of the end-of-run report.
25 changes: 25 additions & 0 deletions src/unstract/clone/__init__.py
@@ -0,0 +1,25 @@
"""Cloning organizations over the Platform API.

Migrates configured resources (adapters, connectors, custom tools, workflows,
etc.) from one Unstract org to another using two admin-issued Platform API
keys. The target deployment is the persistent state — re-runs reconcile
against existing target rows by natural key.
"""

from unstract.clone.context import (
    CloneContext,
    CloneOptions,
    OrgEndpoint,
    RemapTable,
)
from unstract.clone.orchestrator import clone
from unstract.clone.report import CloneReport

__all__ = [
    "CloneContext",
    "CloneOptions",
    "CloneReport",
    "OrgEndpoint",
    "RemapTable",
    "clone",
]
6 changes: 6 additions & 0 deletions src/unstract/clone/__main__.py
@@ -0,0 +1,6 @@
"""Entry point: ``python -m unstract.clone``."""

from unstract.clone.cli import main

if __name__ == "__main__":
    main()
212 changes: 212 additions & 0 deletions src/unstract/clone/cli.py
@@ -0,0 +1,212 @@
"""Click-based CLI for ``unstract.clone``.

Single ``clone`` command. Platform keys can be passed via flags
(``--source-key`` / ``--target-key``) or env vars
(``UNSTRACT_SRC_PLATFORM_KEY`` / ``UNSTRACT_TGT_PLATFORM_KEY``) — env vars
are preferred so the key never lands in shell history.
"""

from __future__ import annotations

import logging
import re
import sys
from typing import Any

import click

from unstract.clone.context import (
    DEFAULT_CONCURRENCY,
    DEFAULT_MAX_FILE_SIZE,
    CloneOptions,
    OrgEndpoint,
)
from unstract.clone.exceptions import CloneError
from unstract.clone.orchestrator import clone as run_clone

_SIZE_UNITS: dict[str, int] = {
    "B": 1,
    "K": 1024,
    "KB": 1024,
    "M": 1024 * 1024,
    "MB": 1024 * 1024,
    "G": 1024 * 1024 * 1024,
    "GB": 1024 * 1024 * 1024,
}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def _parse_size(value: str) -> int:
    """Accept ``25``, ``25MB``, ``1.5GB`` etc. Returns bytes."""
    m = _SIZE_RE.match(value)
    if not m:
        raise click.BadParameter(f"can't parse size '{value}'")
    num, unit = m.group(1), m.group(2).upper() or "B"
    if unit not in _SIZE_UNITS:
        raise click.BadParameter(
            f"unknown size unit '{unit}'; use one of {sorted(_SIZE_UNITS)}"
        )
    return int(float(num) * _SIZE_UNITS[unit])


def _configure_logging(verbose: bool) -> None:
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)-7s %(name)s: %(message)s",
        datefmt="%H:%M:%S",
    )


def _split_csv(value: str | None) -> tuple[str, ...] | None:
    if not value:
        return None
    return tuple(p.strip() for p in value.split(",") if p.strip())


@click.group()
def cli() -> None:
    """Cloning organizations over the Platform API."""


@cli.command("clone")
@click.option("--source-url", required=True, help="Base URL of the source deployment")
@click.option(
    "--source-org", required=True, help="Source organization_id (slug in the URL path)"
)
@click.option(
    "--source-key",
    envvar="UNSTRACT_SRC_PLATFORM_KEY",
    required=True,
    help="Source admin's Platform API key (or env UNSTRACT_SRC_PLATFORM_KEY)",
)
@click.option("--target-url", required=True, help="Base URL of the target deployment")
@click.option(
    "--target-org", required=True, help="Target organization_id (slug in the URL path)"
)
@click.option(
    "--target-key",
    envvar="UNSTRACT_TGT_PLATFORM_KEY",
    required=True,
    help="Target admin's Platform API key (or env UNSTRACT_TGT_PLATFORM_KEY)",
)
@click.option(
    "--dry-run", is_flag=True, help="Plan only — do not POST anything to target"
)
@click.option(
    "--include",
    default=None,
    help="Comma-separated phase names to include (default: all)",
)
@click.option(
    "--exclude",
    default=None,
    help="Comma-separated phase names to exclude",
)
@click.option(
    "--on-name-conflict",
    type=click.Choice(["adopt", "abort"]),
    default="adopt",
    show_default=True,
    help="What to do when a like-named entity exists in target",
)
@click.option(
    "--api-prefix",
    default="api/v1",
    show_default=True,
    help="Backend URL prefix (matches deployment's PATH_PREFIX env)",
)
@click.option(
    "--file-strategy",
    type=click.Choice(["platform_api", "skip"]),
    default="platform_api",
    show_default=True,
    help="How to move Prompt Studio document files. 'skip' = metadata only.",
)
@click.option(
    "--max-file-size",
    default="25MB",
    show_default=True,
    help="Per-file cap for the files phase. Oversize → reported, not aborted.",
)
@click.option(
    "--skip-files",
    is_flag=True,
    help="Alias for --file-strategy=skip.",
)
@click.option(
    "--concurrency",
    type=click.IntRange(min=1, max=32),
    default=DEFAULT_CONCURRENCY,
    show_default=True,
    help="Per-phase worker count. 1 = strictly sequential.",
)
@click.option("-v", "--verbose", is_flag=True, help="Debug logging")
def clone_cmd(
    source_url: str,
    source_org: str,
    source_key: str,
    target_url: str,
    target_org: str,
    target_key: str,
    dry_run: bool,
    include: str | None,
    exclude: str | None,
    on_name_conflict: str,
    api_prefix: str,
    file_strategy: str,
    max_file_size: str,
    skip_files: bool,
    concurrency: int,
    verbose: bool,
) -> None:
    """Clone configured resources from one org to another."""
    _configure_logging(verbose)

    effective_strategy = "skip" if skip_files else file_strategy
    try:
        cap_bytes = _parse_size(max_file_size)
    except click.BadParameter as e:
        raise click.UsageError(str(e)) from e

    options = CloneOptions(
        dry_run=dry_run,
        include=_split_csv(include),
        exclude=_split_csv(exclude) or (),
        on_name_conflict=on_name_conflict,
        verbose=verbose,
        file_strategy=effective_strategy,
        max_file_size=cap_bytes if cap_bytes is not None else DEFAULT_MAX_FILE_SIZE,
        concurrency=concurrency,
    )

    source = OrgEndpoint(
        base_url=source_url,
        organization_id=source_org,
        platform_key=source_key,
        api_path_prefix=api_prefix,
    )
    target = OrgEndpoint(
        base_url=target_url,
        organization_id=target_org,
        platform_key=target_key,
        api_path_prefix=api_prefix,
    )

    try:
        report = run_clone(source, target, options)
    except CloneError as e:
        click.echo(f"Clone failed: {e}", err=True)
        sys.exit(2)

    click.echo(report.render())
    if report.aborted or any(p.failed for p in report.phases):
        sys.exit(1)


def main(argv: list[str] | None = None) -> Any:
    return cli(args=argv, standalone_mode=True)


if __name__ == "__main__":
    main()