diff --git a/.claude/skills/generate-v9-models/SKILL.md b/.claude/skills/generate-v9-models/SKILL.md
Lines changed: 132 additions & 0 deletions
diff --git a/.github/workflows/pyatlan-pr.yaml b/.github/workflows/pyatlan-pr.yaml
Lines changed: 176 additions & 5 deletions
diff --git a/README.md b/README.md
Lines changed: 38 additions & 0 deletions
diff --git a/pyatlan/test_utils/base_vcr.py b/pyatlan/test_utils/base_vcr.py
Lines changed: 7 additions & 3 deletions
@@ -0,0 +1,132 @@
+---
+description: Generate pyatlan_v9 msgspec model files by cloning the models repo and running the Pkl code generator
+---
+
+# Generate v9 Models
+
+Generates pyatlan_v9 msgspec model files from Pkl type definitions in the atlanhq/models repo.
+
+## Usage
+
+- `/generate-v9-models` — Clone models@master, generate and sync v9 models
+- `/generate-v9-models <branch>` — Clone models@<branch> instead of master
+- `/generate-v9-models <branch> test` — Also run tests after sync
+- `/generate-v9-models test` — Clone models@master and run tests after sync
+
+## Instructions
+
+### 1. Clone or update the models repo
+
+Parse args to determine the branch (default: `master`) and whether to run tests (args contain "test").
+
+The models repo should be cloned as a sibling directory of this repo (atlan-python):
+
+```bash
+# Determine paths
+SDK_DIR="$(pwd)"  # atlan-python root
+MODELS_DIR="$(cd .. && pwd)/models"
+BRANCH="master"  # override with first non-"test" arg
+
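+if [ -d "$MODELS_DIR" ]; then
+  # A --single-branch clone only tracks one branch, so fetch the requested branch explicitly,
+  # then reset the local branch to the freshly fetched remote branch
+  git -C "$MODELS_DIR" fetch origin "+refs/heads/${BRANCH}:refs/remotes/origin/${BRANCH}" &&
+    git -C "$MODELS_DIR" checkout -B "$BRANCH" "origin/${BRANCH}"
+else
+  git clone --branch "$BRANCH" --single-branch git@github.com:atlanhq/models.git "$MODELS_DIR"
+fi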
+```
+
+### 2. Run Pkl evaluation
+
+From the models repo root, run the Pkl code generator in **SDK-only mode** (`-p sdkOnly=true`). This generates only Python SDK files — no JSON typedefs, no frontend code, no samples.
+
+Use a temp directory for staging so no generated files land in the models repo:
+
+```bash
+cd "$MODELS_DIR"
+STAGING_DIR="$(mktemp -d)"
+OVERLAYS_PATH="${SDK_DIR}/pyatlan_v9/model/assets/_overlays/"
+
+pkl eval typedefs/*.pkl -m "$STAGING_DIR" -p sdkOnly=true -p sdk=true \
+  -p targetOutputDir=pyatlan_v9/model/assets/ \
+  -p internalPackage=pyatlan_v9.model \
+  -p sdkOverlaysBasePath="$OVERLAYS_PATH"
+```
+
+- `-p sdkOnly=true` skips JSON typedef generation — only Python SDK files are produced
+- `sdkOverlaysBasePath` must be an absolute path — Pkl resolves `read?()` relative to the module file, not CWD
+- Output goes to a temp staging directory (nothing written to models repo)
+
+### 3. Selective sync
+
+Copy generated files from the staging dir to the SDK, **excluding** these files that have manual patches or are hand-written:
+
+| File | Reason |
+|------|--------|
+| `__init__.py` | Hand-written init with `__all__` |
+| `entity.py` | Patched: `_metadata_proxies`, `type_name: Any`, `SaveSemantic` |
+| `referenceable.py` | Patched: `InternalKeywordField`, field descriptors, helper exports |
+| `atlas_glossary.py` | Patched: GTC anchor-in-attributes handling |
+| `atlas_glossary_term.py` | Patched: GTC anchor-in-attributes handling |
+| `atlas_glossary_category.py` | Patched: GTC anchor-in-attributes handling |
+| `quick_sight_dataset.py` | Patched: `useLocalTypeAsPrefix` field naming |
+| `quick_sight_dataset_field.py` | Patched: `useLocalTypeAsPrefix` field naming |
+| `quick_sight_folder.py` | Patched: `useLocalTypeAsPrefix` field naming |
+| `data_quality_rule.py` | Hand-written, not yet generated correctly |
+
+```bash
+rsync -av \
+  --exclude='__init__.py' \
+  --exclude='entity.py' \
+  --exclude='referenceable.py' \
+  --exclude='atlas_glossary.py' \
+  --exclude='atlas_glossary_term.py' \
+  --exclude='atlas_glossary_category.py' \
+  --exclude='quick_sight_dataset.py' \
+  --exclude='quick_sight_dataset_field.py' \
+  --exclude='quick_sight_folder.py' \
+  --exclude='data_quality_rule.py' \
+  "${STAGING_DIR}/pyatlan_v9/model/assets/" \
+  "${SDK_DIR}/pyatlan_v9/model/assets/"
+
+# Clean up staging dir
+rm -rf "$STAGING_DIR"
+```
+
+16 additional types in pyatlan_v9 (persona.py, purpose.py, badge.py, access_control.py, etc.) are hand-written and NOT generated by Pkl — rsync won't touch them since they don't exist in staging.
+
+### 4. Post-sync patches
+
+**related_entity.py** — ensure `relationship_attributes` field exists after `unique_attributes`. This file is not generated in sdkOnly mode, so it persists across regens. If starting from scratch, add:
+```python
+    # Relationship-specific attributes
+    relationship_attributes: Union[dict[str, Any], None, UnsetType] = UNSET
+    """Attributes of the relationship itself (e.g., description, status, etc.)."""
+```
+
+### 5. Run ruff auto-fix and format
+
+After syncing and patching, run ruff to fix unused imports and format the generated files:
+
+```bash
+cd "${SDK_DIR}"
+uv run ruff check --fix --select F401,F811 pyatlan_v9/
+uv run ruff format pyatlan_v9/
+```
+
+### 6. Run tests (if args contain "test")
+
+```bash
+cd "${SDK_DIR}" && uv run pytest tests_v9/unit/ -x -q
+```
+
+### 7. Report summary
+
+Report: how many files were generated, how many synced, how many excluded, and test results if applicable.
+
+## Notes
+
+- The models repo is cloned from `git@github.com:atlanhq/models.git`
+- If `../models` already exists, it fetches and checks out the requested branch instead of re-cloning
+- `-p sdkOnly=true` ensures only Python SDK files are generated (no JSON typedefs written to models repo)
+- Generated files go to a temp staging dir, then are selectively synced to `atlan-python/pyatlan_v9/model/assets/`
+- Fields with `useSetType=true` in the Pkl typedefs generate `set[str]` instead of `list[str]` (used for user/group/role fields)
+- Overlay files (custom methods like `creator()`, `updater()`, policy helpers) live at `pyatlan_v9/model/assets/_overlays/` in this repo
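Step 1 of the skill describes the argument rule only in prose ("override with first non-"test" arg"). A minimal Python sketch of that rule, assuming the skill arguments arrive as a list of whitespace-separated tokens; `parse_skill_args` is a hypothetical helper, not part of pyatlan or Claude Code:

```python
def parse_skill_args(args: list[str]) -> tuple[str, bool]:
    """Hypothetical helper: split skill args into (branch, run_tests)."""
    # Any "test" token enables the test step, regardless of position
    run_tests = "test" in args
    # The first token that is not the literal "test" is the models branch
    branch = next((a for a in args if a != "test"), "master")
    return branch, run_tests
```

One consequence of this rule: a models branch literally named `test` cannot be selected, because that token is always read as the test flag.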
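The `rsync` call in step 3 of the skill can be mirrored in Python, which also yields the synced and excluded counts that step 7 asks to report. This is a sketch under stated assumptions: `selective_sync` and `EXCLUDED` are hypothetical names, and excluded files are matched by basename at any depth, the way an rsync `--exclude` pattern without a slash behaves.

```python
import shutil
from pathlib import Path

# Mirrors the exclusion table in step 3 (hypothetical constant name)
EXCLUDED = frozenset({
    "__init__.py", "entity.py", "referenceable.py",
    "atlas_glossary.py", "atlas_glossary_term.py", "atlas_glossary_category.py",
    "quick_sight_dataset.py", "quick_sight_dataset_field.py", "quick_sight_folder.py",
    "data_quality_rule.py",
})


def selective_sync(staging: Path, target: Path) -> tuple[list[str], list[str]]:
    """Copy generated files from staging into target, skipping EXCLUDED basenames.

    Returns (synced, excluded) as lists of POSIX paths relative to staging.
    """
    synced: list[str] = []
    excluded: list[str] = []
    for src in sorted(p for p in staging.rglob("*") if p.is_file()):
        rel = src.relative_to(staging)
        if src.name in EXCLUDED:
            excluded.append(rel.as_posix())
            continue
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Overwrites generated files; files present only in target are never touched
        shutil.copy2(src, dest)
        synced.append(rel.as_posix())
    return synced, excluded
```

Like rsync without `--delete`, the sketch never removes files that exist only in the target, which is why hand-written types such as `persona.py` survive a regeneration.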
@@ -1,11 +1,10 @@
 name: Pyatlan Pull Request Build
 
 # This workflow runs both sync and async integration tests intelligently:
-# - Sync integration tests: Always run on every PR
-# - Async integration tests: Only run when:
-#   1. Changes detected in pyatlan/*/aio/ or tests/*/aio/ paths
-#   2. PR has the "run-async-tests" label (manual trigger)
-# This prevents adding 12+ minutes to every PR while ensuring async tests run when needed.
+# - Legacy sync integration tests: Always run on every PR with code changes
+# - Legacy async integration tests: Only run when AIO changes detected or "run-async-tests" label
+# - V9 unit tests: Always run on every PR with code changes
+# - V9 integration tests (sync + async): Only run when PR has the "run_pyatlan_v9_integration_tests" label or on a manual workflow_dispatch run
 
 on:
   pull_request:
@@ -117,6 +116,28 @@ jobs:
             echo "⏭️ No AIO changes detected and no manual trigger label found"
           fi
 
+  check-v9-integration-label:
+    runs-on: ubuntu-latest
+    outputs:
+      run-v9-integration: ${{ steps.check-label.outputs.run-v9-integration }}
+    steps:
+      - name: Check for v9 integration test label
+        id: check-label
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            echo "run-v9-integration=true" >> $GITHUB_OUTPUT
+            echo "Manual trigger: running v9 integration tests"
+            exit 0
+          fi
+
+          if [ "${{ contains(github.event.pull_request.labels.*.name, 'run_pyatlan_v9_integration_tests') }}" = "true" ]; then
+            echo "run-v9-integration=true" >> $GITHUB_OUTPUT
+            echo "Found 'run_pyatlan_v9_integration_tests' label"
+          else
+            echo "run-v9-integration=false" >> $GITHUB_OUTPUT
+            echo "No 'run_pyatlan_v9_integration_tests' label found, skipping v9 integration tests"
+          fi
+
   qa-checks-and-unit-tests:
     needs: [check-code-changes, vulnerability-scan]
     if: needs.check-code-changes.outputs.has-code-changes == 'true'
@@ -260,3 +281,153 @@ jobs:
           # Run the async integration test file using `pytest-timer` plugin
           # to display only the durations of the 10 slowest tests with `pytest-sugar`
           command: uv run pytest ${{ matrix.test_file }} -p name_of_plugin --timer-top-n 10 --force-sugar -vv
+
+  # =========================================================================
+  # V9 (msgspec) Jobs
+  # =========================================================================
+
+  v9-qa-checks-and-unit-tests:
+    needs: [check-code-changes, vulnerability-scan]
+    if: needs.check-code-changes.outputs.has-code-changes == 'true'
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+
+      - name: Install dependencies
+        run: uv sync --group dev
+
+      - name: QA checks (ruff-format, ruff-lint, mypy)
+        run: uv run ./qa-checks
+
+      - name: Run v9 unit tests
+        env:
+          ATLAN_API_KEY: ${{ secrets.ATLAN_API_KEY }}
+          ATLAN_BASE_URL: ${{ secrets.ATLAN_BASE_URL }}
+        run: uv run pytest tests_v9/unit --force-sugar -vv
+
+  v9-prepare-integration-tests:
+    needs: [check-code-changes, vulnerability-scan, check-v9-integration-label]
+    if: >-
+      needs.check-code-changes.outputs.has-code-changes == 'true' &&
+      needs.check-v9-integration-label.outputs.run-v9-integration == 'true'
+    runs-on: ubuntu-latest
+    outputs:
+      v9-files: ${{ steps.distribute-v9-files.outputs.v9-files }}
+      v9-aio-files: ${{ steps.distribute-v9-aio-files.outputs.v9-aio-files }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Prepare v9 sync integration tests distribution
+        id: distribute-v9-files
+        run: |
+          files=$(find tests_v9/integration -maxdepth 1 \( -name "test_*.py" -o -name "*_test.py" \) | sort | tr '\n' ' ')
+          if [ -n "$files" ]; then
+            json_files=$(echo "${files[@]}" | jq -R -c 'split(" ")[:-1]')
+          else
+            json_files="[]"
+          fi
+          echo "v9-files=$json_files" >> $GITHUB_OUTPUT
+          echo "V9 sync integration test files: $json_files"
+
+      - name: Prepare v9 async integration tests distribution
+        id: distribute-v9-aio-files
+        run: |
+          if [ -d "tests_v9/integration/aio" ]; then
+            aio_files=$(find tests_v9/integration/aio -name "test_*.py" | sort | tr '\n' ' ')
+            if [ -n "$aio_files" ]; then
+              json_aio_files=$(echo "${aio_files[@]}" | jq -R -c 'split(" ")[:-1]')
+            else
+              json_aio_files="[]"
+            fi
+          else
+            json_aio_files="[]"
+          fi
+          echo "v9-aio-files=$json_aio_files" >> $GITHUB_OUTPUT
+          echo "V9 async integration test files: $json_aio_files"
+
+  v9-integration-tests:
+    needs: [v9-prepare-integration-tests]
+    if: needs.v9-prepare-integration-tests.outputs.v9-files != '[]'
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        test_file: ${{fromJson(needs.v9-prepare-integration-tests.outputs.v9-files)}}
+    concurrency:
+      group: v9-${{ matrix.test_file }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+
+      - name: Install dependencies
+        run: uv sync --group dev
+
+      - name: Run v9 integration test
+        env:
+          ATLAN_API_KEY: ${{ secrets.ATLAN_API_KEY }}
+          ATLAN_BASE_URL: ${{ secrets.ATLAN_BASE_URL }}
+        uses: nick-fields/retry@v3
+        with:
+          max_attempts: 3
+          timeout_minutes: 10
+          command: uv run pytest ${{ matrix.test_file }} -p name_of_plugin --timer-top-n 10 --force-sugar -vv
+
+  v9-async-integration-tests:
+    needs: [v9-prepare-integration-tests]
+    if: needs.v9-prepare-integration-tests.outputs.v9-aio-files != '[]'
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        test_file: ${{fromJson(needs.v9-prepare-integration-tests.outputs.v9-aio-files)}}
+    concurrency:
+      group: v9-async-${{ matrix.test_file }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+
+      - name: Install dependencies
+        run: uv sync --group dev
+
+      - name: Run v9 async integration test
+        env:
+          ATLAN_API_KEY: ${{ secrets.ATLAN_API_KEY }}
+          ATLAN_BASE_URL: ${{ secrets.ATLAN_BASE_URL }}
+        uses: nick-fields/retry@v3
+        with:
+          max_attempts: 3
+          timeout_minutes: 15
+          command: uv run pytest ${{ matrix.test_file }} -p name_of_plugin --timer-top-n 10 --force-sugar -vv
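The two distribution steps in `v9-prepare-integration-tests` turn `find` output into a compact JSON array that feeds the job matrix. A Python approximation, useful for checking locally which files a run would pick up; `sync_matrix` and `aio_matrix` are hypothetical helper names, and ordering can differ from shell `sort` under non-C locales because Python sorts by code point.

```python
import json
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterable


def _compact_json(paths: Iterable[Path]) -> str:
    # Same shape as `jq -c`: no spaces, so an empty result is exactly "[]"
    return json.dumps(sorted(p.as_posix() for p in paths), separators=(",", ":"))


def sync_matrix(root: str = "tests_v9/integration") -> str:
    base = Path(root)
    if not base.is_dir():
        return "[]"
    # Top level only (like `find -maxdepth 1`), so tests under aio/ are not included
    files = [
        p
        for p in base.iterdir()
        if p.is_file()
        and (fnmatchcase(p.name, "test_*.py") or fnmatchcase(p.name, "*_test.py"))
    ]
    return _compact_json(files)


def aio_matrix(root: str = "tests_v9/integration/aio") -> str:
    base = Path(root)
    if not base.is_dir():
        return "[]"
    # Recursive and test_*.py only, like the async `find` call
    return _compact_json(p for p in base.rglob("test_*.py") if p.is_file())
```

Compact output matters here: the downstream jobs compare each output against the literal string `'[]'`, so an empty list must serialize exactly as `[]`.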
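The gating for the v9 integration jobs combines the `check-code-changes` output with the label check in `check-v9-integration-label`. A sketch of that decision as a single predicate; `run_v9_integration` is a hypothetical name, and `labels` stands in for `github.event.pull_request.labels.*.name`:

```python
V9_LABEL = "run_pyatlan_v9_integration_tests"


def run_v9_integration(event_name: str, labels: list[str], has_code_changes: bool) -> bool:
    """Hypothetical predicate mirroring the `if:` on v9-prepare-integration-tests."""
    # Both conditions must hold: code changes AND (manual trigger OR label present)
    if not has_code_changes:
        return False
    if event_name == "workflow_dispatch":
        return True
    # Exact list membership: a label merely containing V9_LABEL does not match
    return V9_LABEL in labels
```

Because the sketch uses exact membership, a label such as `run_pyatlan_v9_integration_tests_old` would not enable the jobs.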
@@ -218,6 +218,44 @@ This will:
 - 🎨 Format code automatically
 - ⚡ Support incremental updates
 
+## 🏗️ pyatlan_v9 Model Generation (msgspec)
+
+The `pyatlan_v9` package uses [msgspec](https://jcristharris.com/msgspec/) `Struct`-based models generated from Pkl type definitions in the [atlanhq/models](https://github.com/atlanhq/models) repo.
+
+### Using Claude Code
+
+The recommended way to regenerate models is via the Claude Code skill:
+
+```text
+# In a Claude Code session started from the atlan-python repo root:
+/generate-v9-models              # Generate from models@master
+/generate-v9-models <branch>     # Generate from a specific models branch
+/generate-v9-models test         # Generate and run tests
+/generate-v9-models <branch> test
+```
+
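+The skill will:
+1. Clone or update `atlanhq/models` at `../models` (a sibling of this repo)
+2. Run the Pkl code generator in SDK-only mode (`-p sdkOnly=true -p sdk=true`), writing output to a temporary staging directory
+3. Selectively sync generated files to `pyatlan_v9/model/assets/`, excluding hand-written and manually patched files
+4. Apply post-sync patches (e.g., the `relationship_attributes` field in `related_entity.py`)
+5. Run `ruff` to remove unused imports and format `pyatlan_v9/`
+6. Optionally run the `tests_v9/unit/` tests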
+
+### Overlay Files
+
+Custom methods (`creator()`, `updater()`, policy helpers, etc.) live in `pyatlan_v9/model/assets/_overlays/`. These are Python files read by the Pkl renderer and injected into generated classes. Each overlay file uses import directives:
+
+- `# IMPORT:` — external imports (not remapped)
+- `# INTERNAL_IMPORT:` — internal imports (remapped to `pyatlan_v9.*`)
+- `# STDLIB_IMPORT:` — standard library imports
+
+### Hand-written Types
+
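+Some files are hand-written or carry manual patches and must not be overwritten by regeneration:
+- Infrastructure: `__init__.py`, `entity.py`, `referenceable.py`
+- GTC types: `atlas_glossary.py`, `atlas_glossary_term.py`, `atlas_glossary_category.py`
+- QuickSight types: `quick_sight_dataset.py`, `quick_sight_dataset_field.py`, `quick_sight_folder.py`
+- Others: `data_quality_rule.py`, plus types not generated by Pkl at all, such as `persona.py`, `purpose.py`, `badge.py`, `access_control.py`, `auth_policy.py`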
+
 ## 📁 Project Structure
 
 Understanding the codebase layout will help you navigate and contribute effectively:
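The overlay import directives added to the README (`# IMPORT:`, `# INTERNAL_IMPORT:`, `# STDLIB_IMPORT:`) are consumed by the Pkl renderer, not by Python. As a rough illustration only, a Python sketch of how such directives could be collected; the payload after each directive (a full import statement) and the `pyatlan.` to `pyatlan_v9.` remapping rule are assumptions, not the renderer's confirmed behavior, and both function names are hypothetical.

```python
import re

# Assumed directive shape: "# <KIND>: <import statement>"
DIRECTIVE = re.compile(r"^#\s*(IMPORT|INTERNAL_IMPORT|STDLIB_IMPORT):\s*(.+?)\s*$")


def remap_internal(stmt: str) -> str:
    # Assumption: internal imports are written against `pyatlan.` and remapped to `pyatlan_v9.`
    return re.sub(r"\bpyatlan\.", "pyatlan_v9.", stmt)


def parse_overlay_imports(source: str) -> dict[str, list[str]]:
    """Collect directive payloads from an overlay file, grouped by directive kind."""
    found: dict[str, list[str]] = {"IMPORT": [], "INTERNAL_IMPORT": [], "STDLIB_IMPORT": []}
    for line in source.splitlines():
        match = DIRECTIVE.match(line.strip())
        if not match:
            continue
        kind, stmt = match.groups()
        if kind == "INTERNAL_IMPORT":
            stmt = remap_internal(stmt)
        found[kind].append(stmt)
    return found
```

Keeping the three kinds separate lets a renderer emit standard-library, third-party, and internal imports as distinct groups, which is the layout `ruff` expects when it formats the output.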
 
@@ -265,7 +265,11 @@ def vcr_cassette_dir(self, request):
 
         :returns: directory path for storing cassettes
         """
-        # Set self._CASSETTES_DIR or use the default directory path based on the test module name
-        return self._CASSETTES_DIR or os.path.join(
-            "tests/vcr_cassettes", request.module.__name__
+        # Set self._CASSETTES_DIR or use the default directory path based on the test module name.
+        # V9 tests (module name starting with tests_v9) use tests_v9/vcr_cassettes; legacy use tests/vcr_cassettes.
+        root = (
+            "tests_v9/vcr_cassettes"
+            if request.module.__name__.startswith("tests_v9")
+            else "tests/vcr_cassettes"
         )
+        return self._CASSETTES_DIR or os.path.join(root, request.module.__name__)
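The `vcr_cassette_dir` change reduces to a pure function of the test module name, which makes the routing easy to check in isolation. `cassette_dir` below is a hypothetical standalone helper that mirrors the new logic; it is not part of `pyatlan.test_utils`.

```python
import os
from typing import Optional


def cassette_dir(module_name: str, override: Optional[str] = None) -> str:
    """Hypothetical mirror of vcr_cassette_dir: pick the cassette root by test suite."""
    # A truthy override wins, like `self._CASSETTES_DIR or ...`
    if override:
        return override
    root = (
        "tests_v9/vcr_cassettes"
        if module_name.startswith("tests_v9")
        else "tests/vcr_cassettes"
    )
    return os.path.join(root, module_name)
```

Because the check is a plain `startswith("tests_v9")`, a module named, for example, `tests_v9_extra` would also be routed to `tests_v9/vcr_cassettes`; matching `"tests_v9."` (plus the bare package name) would be stricter.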