
Commit 60ca25d

Merge pull request #811 from atlanhq/BLDX-434
BLDX-434 | Migrate to `msgspec.Struct` models
2 parents c444099 + 96cc1fd commit 60ca25d

1,032 files changed

Lines changed: 436115 additions & 145 deletions

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
---
description: Generate pyatlan_v9 msgspec model files by cloning the models repo and running the Pkl code generator
---

# Generate v9 Models

Generates pyatlan_v9 msgspec model files from Pkl type definitions in the atlanhq/models repo.

## Usage

- `/generate-v9-models`: Clone `models@master`, generate and sync v9 models
- `/generate-v9-models <branch>`: Clone `models@<branch>` instead of master
- `/generate-v9-models <branch> test`: Also run tests after sync
- `/generate-v9-models test`: Clone `models@master` and run tests after sync

## Instructions

### 1. Clone or update the models repo

Parse args to determine the branch (default: `master`) and whether to run tests (args contain "test").

The models repo should be cloned as a sibling directory of this repo (atlan-python). A `--single-branch` clone only fetches its original branch, so the update path fetches the requested branch explicitly before checking it out:

```bash
# Determine paths
SDK_DIR="$(pwd)" # atlan-python root
MODELS_DIR="$(cd .. && pwd)/models"
BRANCH="master" # override with first non-"test" arg

if [ -d "$MODELS_DIR" ]; then
  cd "$MODELS_DIR" && git fetch origin "$BRANCH" && { git checkout "$BRANCH" 2>/dev/null || git checkout -b "$BRANCH" FETCH_HEAD; } && git pull origin "$BRANCH"
else
  git clone --branch "$BRANCH" --single-branch git@github.com:atlanhq/models.git "$MODELS_DIR"
fi
```
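
The argument rules in Usage and the first paragraph of step 1 fit in a few lines. The sketch below is a hypothetical Python helper, `parse_skill_args`, and does not exist in the repo. It assumes the skill receives its arguments as whitespace-separated tokens. The first token that is not `test` becomes the branch, and any `test` token enables the test step.

```python
from typing import List, Tuple


def parse_skill_args(args: List[str]) -> Tuple[str, bool]:
    """Hypothetical helper: return (branch, run_tests) for /generate-v9-models.

    Assumption: the branch is the first token that is not "test", and the
    presence of any "test" token turns on the test step (step 6).
    """
    run_tests = "test" in args
    candidates = [arg for arg in args if arg != "test"]
    branch = candidates[0] if candidates else "master"  # documented default
    return branch, run_tests


if __name__ == "__main__":
    for raw in ["", "feature-x", "feature-x test", "test"]:
        print(repr(raw), "->", parse_skill_args(raw.split()))
```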

### 2. Run Pkl evaluation

From the models repo root, run the Pkl code generator in **SDK-only mode** (`-p sdkOnly=true`). This generates only Python SDK files: no JSON typedefs, no frontend code, no samples.

Use a temp directory for staging so no generated files land in the models repo:

```bash
cd "$MODELS_DIR"
STAGING_DIR="$(mktemp -d)"
OVERLAYS_PATH="${SDK_DIR}/pyatlan_v9/model/assets/_overlays/"

pkl eval typedefs/*.pkl -m "$STAGING_DIR" -p sdkOnly=true -p sdk=true \
  -p targetOutputDir=pyatlan_v9/model/assets/ \
  -p internalPackage=pyatlan_v9.model \
  -p sdkOverlaysBasePath="$OVERLAYS_PATH"
```

- `-p sdkOnly=true` skips JSON typedef generation, so only Python SDK files are produced
- `sdkOverlaysBasePath` must be an absolute path, because Pkl resolves `read?()` relative to the module file, not the CWD
- Output goes to a temp staging directory (nothing is written to the models repo)
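
If the generator is ever driven from Python instead of a shell, two details from the notes above matter. First, the shell normally expands the `typedefs/*.pkl` glob. Second, `sdkOverlaysBasePath` must be absolute. The sketch below uses a hypothetical helper, `build_pkl_argv`, which only builds the argument list and runs nothing. The flags are copied from the command above.

```python
import glob
import os
from typing import List


def build_pkl_argv(models_dir: str, staging_dir: str, sdk_dir: str) -> List[str]:
    """Hypothetical helper: the step 2 `pkl eval` command as an argv list.

    It only assembles arguments; it does not run pkl.
    """
    # A shell would expand typedefs/*.pkl; without one we glob (sorted for stable order).
    typedefs = sorted(glob.glob(os.path.join(models_dir, "typedefs", "*.pkl")))
    # Pkl resolves read?() relative to the module file, not the CWD, so make this absolute.
    # The trailing "" adds a path separator, mirroring the trailing slash in OVERLAYS_PATH.
    overlays = os.path.join(
        os.path.abspath(sdk_dir), "pyatlan_v9", "model", "assets", "_overlays", ""
    )
    return [
        "pkl", "eval", *typedefs,
        "-m", staging_dir,
        "-p", "sdkOnly=true",
        "-p", "sdk=true",
        "-p", "targetOutputDir=pyatlan_v9/model/assets/",
        "-p", "internalPackage=pyatlan_v9.model",
        "-p", f"sdkOverlaysBasePath={overlays}",
    ]
```

You could then pass the list to `subprocess.run(argv, cwd=models_dir, check=True)`. That function is standard library. Whether `pkl` is on `PATH` depends on the local setup.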

### 3. Selective sync

Copy generated files from the staging dir to the SDK, **excluding** these files that have manual patches or are hand-written:

| File | Reason |
|------|--------|
| `__init__.py` | Hand-written init with `__all__` |
| `entity.py` | Patched: `_metadata_proxies`, `type_name: Any`, `SaveSemantic` |
| `referenceable.py` | Patched: `InternalKeywordField`, field descriptors, helper exports |
| `atlas_glossary.py` | Patched: GTC anchor-in-attributes handling |
| `atlas_glossary_term.py` | Patched: GTC anchor-in-attributes handling |
| `atlas_glossary_category.py` | Patched: GTC anchor-in-attributes handling |
| `quick_sight_dataset.py` | Patched: `useLocalTypeAsPrefix` field naming |
| `quick_sight_dataset_field.py` | Patched: `useLocalTypeAsPrefix` field naming |
| `quick_sight_folder.py` | Patched: `useLocalTypeAsPrefix` field naming |
| `data_quality_rule.py` | Hand-written, not yet generated correctly |
75+
```bash
76+
rsync -av \
77+
--exclude='__init__.py' \
78+
--exclude='entity.py' \
79+
--exclude='referenceable.py' \
80+
--exclude='atlas_glossary.py' \
81+
--exclude='atlas_glossary_term.py' \
82+
--exclude='atlas_glossary_category.py' \
83+
--exclude='quick_sight_dataset.py' \
84+
--exclude='quick_sight_dataset_field.py' \
85+
--exclude='quick_sight_folder.py' \
86+
--exclude='data_quality_rule.py' \
87+
"${STAGING_DIR}/pyatlan_v9/model/assets/" \
88+
"${SDK_DIR}/pyatlan_v9/model/assets/"
89+
90+
# Clean up staging dir
91+
rm -rf "$STAGING_DIR"
92+
```
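
The rsync call copies everything from staging except the listed names and deletes nothing. Each `--exclude` pattern applies at every directory level. The Python sketch below is a rough equivalent, not a replacement for rsync. It uses `shutil.copytree` with `shutil.ignore_patterns` and `dirs_exist_ok=True` (Python 3.8+). The name `selective_sync` is illustrative, and so is the returned list of copied paths.

```python
import os
import shutil
from typing import Iterable, List, Set

# The step 3 exclusion list (hand-written or manually patched files).
EXCLUDED = [
    "__init__.py", "entity.py", "referenceable.py",
    "atlas_glossary.py", "atlas_glossary_term.py", "atlas_glossary_category.py",
    "quick_sight_dataset.py", "quick_sight_dataset_field.py", "quick_sight_folder.py",
    "data_quality_rule.py",
]


def selective_sync(src: str, dst: str, excluded: Iterable[str] = EXCLUDED) -> List[str]:
    """Hypothetical helper: copy src into dst, skipping excluded names at any depth.

    Like `rsync -a` without --delete, files that exist only in dst are left alone.
    Returns the relative paths of the files that were copied.
    """
    copied: List[str] = []
    ignore = shutil.ignore_patterns(*excluded)

    def _ignore(directory: str, names: List[str]) -> Set[str]:
        skipped = ignore(directory, names)
        # Record the regular files that copytree is about to copy from this directory.
        for name in names:
            path = os.path.join(directory, name)
            if name not in skipped and os.path.isfile(path):
                copied.append(os.path.relpath(path, src))
        return skipped

    shutil.copytree(src, dst, ignore=_ignore, dirs_exist_ok=True)
    return sorted(copied)
```

Nothing is deleted, so hand-written files such as `persona.py` survive the sync. The hand-written types rely on this same property.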

16 additional types in pyatlan_v9 (persona.py, purpose.py, badge.py, access_control.py, etc.) are hand-written and NOT generated by Pkl; rsync won't touch them since they don't exist in staging.

### 4. Post-sync patches

**related_entity.py**: ensure the `relationship_attributes` field exists after `unique_attributes`. This file is not generated in sdkOnly mode, so it persists across regens. If starting from scratch, add:
```python
# Relationship-specific attributes
relationship_attributes: Union[dict[str, Any], None, UnsetType] = UNSET
"""Attributes of the relationship itself (e.g., description, status, etc.)."""
```
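
Because `related_entity.py` survives regeneration, step 4 amounts to an idempotent check. The sketch below does that check as a text patch. It assumes, hypothetically, that each field in the file is one line, optionally followed by a one-line docstring. `ensure_relationship_attributes` is an illustrative name, and the inserted lines are the ones listed in step 4.

```python
from typing import List

# The step 4 lines, without indentation (indentation is copied from the target class).
PATCH_LINES: List[str] = [
    "# Relationship-specific attributes",
    "relationship_attributes: Union[dict[str, Any], None, UnsetType] = UNSET",
    '"""Attributes of the relationship itself (e.g., description, status, etc.)."""',
]


def ensure_relationship_attributes(source: str) -> str:
    """Hypothetical helper: insert relationship_attributes after unique_attributes.

    Idempotent: returns the source unchanged if the field already exists.
    Assumes single-line fields, each optionally followed by a one-line docstring.
    """
    lines = source.splitlines()
    if any(line.strip().startswith("relationship_attributes:") for line in lines):
        return source
    for i, line in enumerate(lines):
        if line.strip().startswith("unique_attributes:"):
            indent = line[: len(line) - len(line.lstrip())]
            insert_at = i + 1
            # Keep the field's own one-line docstring attached to it.
            if insert_at < len(lines) and lines[insert_at].strip().startswith('"""'):
                insert_at += 1
            lines[insert_at:insert_at] = [indent + patch for patch in PATCH_LINES]
            return "\n".join(lines) + "\n"
    raise ValueError("unique_attributes field not found")
```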

### 5. Run ruff auto-fix and format

After syncing and patching, run ruff to remove unused imports (F401) and unused redefinitions (F811), then format the generated files:

```bash
cd "${SDK_DIR}"
uv run ruff check --fix --select F401,F811 pyatlan_v9/
uv run ruff format pyatlan_v9/
```

### 6. Run tests (if args contain "test")

```bash
cd "${SDK_DIR}" && uv run pytest tests_v9/unit/ -x -q
```

### 7. Report summary

Report how many files were generated, synced, and excluded, plus the test results if tests were run.
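
The three counts in the report are linked: every generated file is either synced or excluded. The tiny sketch below shows that bookkeeping. It uses a hypothetical `summarize_sync` helper that only compares file names and does not look into subdirectories.

```python
from typing import Dict, Iterable


def summarize_sync(generated: Iterable[str], excluded: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: counts for the step 7 report.

    `generated` holds file names found in the staging dir; `excluded` is the
    step 3 list. Excluded names that Pkl did not generate are not counted.
    """
    generated_set = set(generated)
    skipped = generated_set & set(excluded)
    return {
        "generated": len(generated_set),
        "synced": len(generated_set - skipped),
        "excluded": len(skipped),
    }
```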

## Notes

- The models repo is cloned from `git@github.com:atlanhq/models.git`
- If `../models` already exists, it fetches and checks out the requested branch instead of re-cloning
- `-p sdkOnly=true` ensures only Python SDK files are generated (no JSON typedefs written to models repo)
- Generated files go to a temp staging dir, then are selectively synced to `atlan-python/pyatlan_v9/model/assets/`
- Fields with `useSetType=true` in the Pkl typedefs generate `set[str]` instead of `list[str]` (used for user/group/role fields)
- Overlay files (custom methods like `creator()`, `updater()`, policy helpers) live at `pyatlan_v9/model/assets/_overlays/` in this repo

.github/workflows/pyatlan-pr.yaml

Lines changed: 176 additions & 5 deletions
@@ -1,11 +1,10 @@
 name: Pyatlan Pull Request Build

 # This workflow runs both sync and async integration tests intelligently:
-# - Sync integration tests: Always run on every PR
-# - Async integration tests: Only run when:
-# 1. Changes detected in pyatlan/*/aio/ or tests/*/aio/ paths
-# 2. PR has the "run-async-tests" label (manual trigger)
-# This prevents adding 12+ minutes to every PR while ensuring async tests run when needed.
+# - Legacy sync integration tests: Always run on every PR with code changes
+# - Legacy async integration tests: Only run when AIO changes detected or "run-async-tests" label
+# - V9 unit tests: Always run on every PR with code changes
+# - V9 integration tests (sync + async): Only run when PR has the "run_pyatlan_v9_integration_tests" label

 on:
   pull_request:
@@ -117,6 +116,28 @@ jobs:
             echo "⏭️ No AIO changes detected and no manual trigger label found"
           fi

+  check-v9-integration-label:
+    runs-on: ubuntu-latest
+    outputs:
+      run-v9-integration: ${{ steps.check-label.outputs.run-v9-integration }}
+    steps:
+      - name: Check for v9 integration test label
+        id: check-label
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            echo "run-v9-integration=true" >> $GITHUB_OUTPUT
+            echo "Manual trigger: running v9 integration tests"
+            exit 0
+          fi
+
+          if echo '${{ toJson(github.event.pull_request.labels.*.name) }}' | grep -q "run_pyatlan_v9_integration_tests"; then
+            echo "run-v9-integration=true" >> $GITHUB_OUTPUT
+            echo "Found 'run_pyatlan_v9_integration_tests' label"
+          else
+            echo "run-v9-integration=false" >> $GITHUB_OUTPUT
+            echo "No 'run_pyatlan_v9_integration_tests' label found, skipping v9 integration tests"
+          fi
+
   qa-checks-and-unit-tests:
     needs: [check-code-changes, vulnerability-scan]
     if: needs.check-code-changes.outputs.has-code-changes == 'true'
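
The label check in `check-v9-integration-label` reduces to a small decision function, restated below in Python for illustration only. `should_run_v9_integration` is a hypothetical name, and `labels_json` stands in for the `toJson(github.event.pull_request.labels.*.name)` expression. One difference is deliberate. The workflow pipes that JSON through `grep -q`, which matches substrings, so a label such as `run_pyatlan_v9_integration_tests_later` would also trigger the tests. The sketch compares whole label names.

```python
import json
from typing import List

LABEL = "run_pyatlan_v9_integration_tests"


def should_run_v9_integration(event_name: str, labels_json: str) -> str:
    """Sketch of the check-label step; returns the value written to $GITHUB_OUTPUT.

    `labels_json` stands in for toJson(github.event.pull_request.labels.*.name).
    """
    if event_name == "workflow_dispatch":
        return "true"  # manual runs always execute the v9 integration tests
    # Treat empty output or JSON null as "no labels".
    labels: List[str] = json.loads(labels_json or "[]") or []
    # Exact name match; the workflow's `grep -q` is a looser substring match.
    return "true" if LABEL in labels else "false"
```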
@@ -260,3 +281,153 @@ jobs:
           # Run the async integration test file using `pytest-timer` plugin
           # to display only the durations of the 10 slowest tests with `pytest-sugar`
           command: uv run pytest ${{ matrix.test_file }} -p name_of_plugin --timer-top-n 10 --force-sugar -vv
+
+  # =========================================================================
+  # V9 (msgspec) Jobs
+  # =========================================================================
+
+  v9-qa-checks-and-unit-tests:
+    needs: [check-code-changes, vulnerability-scan]
+    if: needs.check-code-changes.outputs.has-code-changes == 'true'
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+
+      - name: Install dependencies
+        run: uv sync --group dev
+
+      - name: QA checks (ruff-format, ruff-lint, mypy)
+        run: uv run ./qa-checks
+
+      - name: Run v9 unit tests
+        env:
+          ATLAN_API_KEY: ${{ secrets.ATLAN_API_KEY }}
+          ATLAN_BASE_URL: ${{ secrets.ATLAN_BASE_URL }}
+        run: uv run pytest tests_v9/unit --force-sugar -vv
+
+  v9-prepare-integration-tests:
+    needs: [check-code-changes, vulnerability-scan, check-v9-integration-label]
+    if: >-
+      needs.check-code-changes.outputs.has-code-changes == 'true' &&
+      needs.check-v9-integration-label.outputs.run-v9-integration == 'true'
+    runs-on: ubuntu-latest
+    outputs:
+      v9-files: ${{ steps.distribute-v9-files.outputs.v9-files }}
+      v9-aio-files: ${{ steps.distribute-v9-aio-files.outputs.v9-aio-files }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Prepare v9 sync integration tests distribution
+        id: distribute-v9-files
+        run: |
+          files=$(find tests_v9/integration -maxdepth 1 \( -name "test_*.py" -o -name "*_test.py" \) | sort | tr '\n' ' ')
+          if [ -n "$files" ]; then
+            json_files=$(echo "${files[@]}" | jq -R -c 'split(" ")[:-1]')
+          else
+            json_files="[]"
+          fi
+          echo "v9-files=$json_files" >> $GITHUB_OUTPUT
+          echo "V9 sync integration test files: $json_files"
+
+      - name: Prepare v9 async integration tests distribution
+        id: distribute-v9-aio-files
+        run: |
+          if [ -d "tests_v9/integration/aio" ]; then
+            aio_files=$(find tests_v9/integration/aio -name "test_*.py" | sort | tr '\n' ' ')
+            if [ -n "$aio_files" ]; then
+              json_aio_files=$(echo "${aio_files[@]}" | jq -R -c 'split(" ")[:-1]')
+            else
+              json_aio_files="[]"
+            fi
+          else
+            json_aio_files="[]"
+          fi
+          echo "v9-aio-files=$json_aio_files" >> $GITHUB_OUTPUT
+          echo "V9 async integration test files: $json_aio_files"
+
+  v9-integration-tests:
+    needs: [v9-prepare-integration-tests]
+    if: needs.v9-prepare-integration-tests.outputs.v9-files != '[]'
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        test_file: ${{fromJson(needs.v9-prepare-integration-tests.outputs.v9-files)}}
+    concurrency:
+      group: v9-${{ matrix.test_file }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+
+      - name: Install dependencies
+        run: uv sync --group dev
+
+      - name: Run v9 integration test
+        env:
+          ATLAN_API_KEY: ${{ secrets.ATLAN_API_KEY }}
+          ATLAN_BASE_URL: ${{ secrets.ATLAN_BASE_URL }}
+        uses: nick-fields/retry@v3
+        with:
+          max_attempts: 3
+          timeout_minutes: 10
+          command: uv run pytest ${{ matrix.test_file }} -p name_of_plugin --timer-top-n 10 --force-sugar -vv
+
+  v9-async-integration-tests:
+    needs: [v9-prepare-integration-tests]
+    if: needs.v9-prepare-integration-tests.outputs.v9-aio-files != '[]'
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        test_file: ${{fromJson(needs.v9-prepare-integration-tests.outputs.v9-aio-files)}}
+    concurrency:
+      group: v9-async-${{ matrix.test_file }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+
+      - name: Install dependencies
+        run: uv sync --group dev
+
+      - name: Run v9 async integration test
+        env:
+          ATLAN_API_KEY: ${{ secrets.ATLAN_API_KEY }}
+          ATLAN_BASE_URL: ${{ secrets.ATLAN_BASE_URL }}
+        uses: nick-fields/retry@v3
+        with:
+          max_attempts: 3
+          timeout_minutes: 15
+          command: uv run pytest ${{ matrix.test_file }} -p name_of_plugin --timer-top-n 10 --force-sugar -vv
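
The two `Prepare ... distribution` steps turn a `find` listing into a JSON array that `fromJson` expands into the test matrices. `tr '\n' ' '` leaves a trailing space. `split(" ")` therefore ends with an empty element, which the jq filter's `[:-1]` drops. The Python sketch below produces the same output using a hypothetical `distribute_test_files` helper. It differs from the shell version in two ways. It copes with paths that contain spaces, which the `split(" ")` approach would break apart. Its `sorted()` order may also differ from `sort` under some locales.

```python
import fnmatch
import json
import os
from typing import List


def distribute_test_files(root: str, recursive: bool = False) -> str:
    """Sketch of a v9 'Prepare ... distribution' step: a compact JSON array of test files.

    recursive=False mirrors: find ROOT -maxdepth 1 \\( -name "test_*.py" -o -name "*_test.py" \\)
    recursive=True mirrors:  find ROOT -name "test_*.py"
    """
    if not os.path.isdir(root):
        return "[]"  # both shell steps also end up with "[]" when there is nothing to list
    patterns = ["test_*.py"] if recursive else ["test_*.py", "*_test.py"]
    files: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # find -name is case-sensitive, hence fnmatchcase
            if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
                files.append(os.path.join(dirpath, name))
        if not recursive:
            break  # -maxdepth 1: stop after the top-level directory
    # Compact separators match `jq -c`; an empty list serializes to "[]",
    # which the downstream `if: ... != '[]'` conditions skip.
    return json.dumps(sorted(files), separators=(",", ":"))
```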

README.md

Lines changed: 38 additions & 0 deletions
@@ -218,6 +218,44 @@ This will:
 - 🎨 Format code automatically
 - Support incremental updates

+## 🏗️ pyatlan_v9 Model Generation (msgspec)
+
+The `pyatlan_v9` package uses [msgspec](https://jcristharris.com/msgspec/) `Struct`-based models generated from Pkl type definitions in the [atlanhq/models](https://github.com/atlanhq/models) repo.
+
+### Using Claude Code
+
+The recommended way to regenerate models is via the Claude Code skill:
+
+```bash
+# From the atlan-python repo root:
+/generate-v9-models # Generate from models@master
+/generate-v9-models <branch> # Generate from a specific models branch
+/generate-v9-models test # Generate and run tests
+/generate-v9-models <branch> test
+```
+
+The skill will:
+1. Clone/update `atlanhq/models` at `../models`
+2. Run the Pkl code generator in SDK-only mode into a temporary staging directory (`pkl eval typedefs/*.pkl -m <staging-dir> -p sdkOnly=true -p sdk=true ...`)
+3. Selectively sync generated files to `pyatlan_v9/model/assets/` (excluding hand-written types)
+4. Apply post-sync patches (e.g., the `relationship_attributes` field in `related_entity.py`)
+5. Optionally run `tests_v9/unit/` tests
+
+### Overlay Files
+
+Custom methods (`creator()`, `updater()`, policy helpers, etc.) live in `pyatlan_v9/model/assets/_overlays/`. These are Python files read by the Pkl renderer and injected into generated classes. Each overlay file uses import directives:
+
+- `# IMPORT:` external imports (not remapped)
+- `# INTERNAL_IMPORT:` internal imports (remapped to `pyatlan_v9.*`)
+- `# STDLIB_IMPORT:` standard library imports
+
+### Hand-written Types
+
+Some types are not yet fully generated and are maintained by hand:
+- Infrastructure: `__init__.py`, `entity.py`, `referenceable.py`
+- GTC types: `atlas_glossary.py`, `atlas_glossary_term.py`, `atlas_glossary_category.py`
+- Others: `persona.py`, `purpose.py`, `badge.py`, `access_control.py`, `auth_policy.py`, etc.
+
 ## Project Structure

 Understanding the codebase layout will help you navigate and contribute effectively:
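
The README names the three overlay import directives but documents neither their exact syntax nor the remapping rule. The sketch below assumes a hypothetical format of one directive per line, `# <KIND>: <import statement>`. It also assumes the remapping swaps a leading `pyatlan` package for `pyatlan_v9`. The real parsing happens in the Pkl renderer in atlanhq/models and may differ. `collect_overlay_imports` is an illustrative name.

```python
import re
from typing import Dict, List

# Assumed directive format (hypothetical): "# <KIND>: <import statement>".
DIRECTIVE = re.compile(r"^#\s*(IMPORT|INTERNAL_IMPORT|STDLIB_IMPORT):\s*(.+?)\s*$")


def collect_overlay_imports(overlay_source: str) -> Dict[str, List[str]]:
    """Group an overlay file's import directives by kind, remapping internal ones."""
    groups: Dict[str, List[str]] = {"IMPORT": [], "INTERNAL_IMPORT": [], "STDLIB_IMPORT": []}
    for line in overlay_source.splitlines():
        match = DIRECTIVE.match(line.strip())
        if not match:
            continue  # ordinary code or comments are left to the renderer
        kind, statement = match.groups()
        if kind == "INTERNAL_IMPORT":
            # Assumed remapping: "pyatlan." becomes "pyatlan_v9."; already-remapped
            # "pyatlan_v9." is untouched because "_" follows "pyatlan" there.
            statement = re.sub(r"\bpyatlan(?=\.)", "pyatlan_v9", statement)
        groups[kind].append(statement)
    return groups
```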

pyatlan/test_utils/base_vcr.py

Lines changed: 7 additions & 3 deletions
@@ -265,7 +265,11 @@ def vcr_cassette_dir(self, request):

         :returns: directory path for storing cassettes
         """
-        # Set self._CASSETTES_DIR or use the default directory path based on the test module name
-        return self._CASSETTES_DIR or os.path.join(
-            "tests/vcr_cassettes", request.module.__name__
+        # Set self._CASSETTES_DIR or use the default directory path based on the test module name.
+        # V9 tests (module name starting with tests_v9) use tests_v9/vcr_cassettes; legacy use tests/vcr_cassettes.
+        root = (
+            "tests_v9/vcr_cassettes"
+            if request.module.__name__.startswith("tests_v9")
+            else "tests/vcr_cassettes"
         )
+        return self._CASSETTES_DIR or os.path.join(root, request.module.__name__)
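
The new branch in `vcr_cassette_dir` picks the cassette root from the test module's dotted name. The standalone function below restates it for illustration. `cassette_dir` is a hypothetical name, `module_name` stands in for `request.module.__name__`, and `override` stands in for `self._CASSETTES_DIR`.

```python
import os
from typing import Optional


def cassette_dir(module_name: str, override: Optional[str] = None) -> str:
    """Illustrative restatement of vcr_cassette_dir's path choice."""
    # A plain string-prefix test on the dotted module name.
    root = "tests_v9/vcr_cassettes" if module_name.startswith("tests_v9") else "tests/vcr_cassettes"
    # An explicit override (self._CASSETTES_DIR in the real method) wins when set.
    return override or os.path.join(root, module_name)
```

Because `startswith("tests_v9")` is a plain prefix test, a module path such as `tests_v9_extra.test_x` would also go to `tests_v9/vcr_cassettes`. For the two package names used in this commit, that makes no difference.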
