---
name: test-sdk
description: End-to-end test the scrapegraph-py v2 SDK against the live API using a user-provided API key. Exercises every public method on both ScrapeGraphAI (sync) and AsyncScrapeGraphAI (async), including the crawl/monitor/history namespaces. Use when the user asks to "test the SDK", "run full SDK tests", or validate a release candidate. NEVER push directly to main; all changes go via a feature branch + PR.
---

# Test the scrapegraph-py v2 SDK end-to-end

## Hard rules

1. **DO NOT push directly to `main`.** `main` is the protected release branch. If changes are needed, create a feature branch and open a PR. Never `git push origin main`, never force-push to main, never self-merge.
2. **Never hardcode or commit the API key.** Accept it from the user at runtime. Pass it via the `SGAI_API_KEY` env var or the `ScrapeGraphAI(api_key=...)` constructor. Do not write it to any file, log, or commit.
3. **Do not modify production config** (`env.py`, release workflows, `pyproject.toml` version) as part of testing.
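
Rule 1 can also be enforced mechanically in any helper script before it commits. A minimal sketch, assuming `git` is on `PATH`; `current_branch`, `ensure_not_main`, and `PROTECTED_BRANCHES` are hypothetical names, not part of the SDK or this repo:

```python
import subprocess

# Hypothetical: add other protected branches here if the repo has them.
PROTECTED_BRANCHES = {"main"}


def current_branch() -> str:
    """Return the checked-out branch name (git prints "HEAD" when detached)."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def ensure_not_main(branch: str) -> None:
    """Exit before any commit when the working branch is protected."""
    if branch in PROTECTED_BRANCHES:
        raise SystemExit(
            f"Refusing to commit on protected branch {branch!r}; "
            "create a feature branch first (git checkout -b ...)."
        )
```

Call `ensure_not_main(current_branch())` before any `git commit`. A detached `HEAD` passes this guard, so treat that case as "stop and ask the user" as well.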

## Required input

Ask the user for their ScrapeGraph API key before doing anything else:

> I need a ScrapeGraph API key to run the live SDK tests. Please paste it (I will use it in-process only and will not write it to disk or commit it).

Export it for the session only:

```bash
export SGAI_API_KEY="<user-provided-key>"
```
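
Inside the throwaway script, the key can be read from that variable and scrubbed from anything printed, since error messages may echo request details. A minimal sketch; `load_api_key` and `redact` are hypothetical helper names:

```python
import os


def load_api_key(env_var: str = "SGAI_API_KEY") -> str:
    """Read the key from the environment; exit instead of falling back to any other key."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise SystemExit(f"{env_var} is not set; ask the user for a key.")
    return key


def redact(text: str, key: str) -> str:
    """Replace every occurrence of the key before text is printed or reported."""
    return text.replace(key, "<redacted>") if key else text
```

Pass the result to `ScrapeGraphAI(api_key=load_api_key())`, and wrap every error string in `redact(...)` before it reaches the summary table.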

## Scope: the v2 SDK surface

The SDK exposes two top-level classes, both importable from `scrapegraph_py`:

- `ScrapeGraphAI` (sync), defined in `scrapegraph_py.client`
- `AsyncScrapeGraphAI` (async), defined in `scrapegraph_py.async_client`

> Note: there is **no** `smartscraper`, `markdownify`, or `agentic_scraper` method. Those names are stale. Use the endpoints below, which mirror the Playground (Scrape, Extract, Search, Crawl, Monitor).

### Endpoints to exercise

| Endpoint | Sync method | Async method |
|----------|-------------|--------------|
| Scrape | `client.scrape(...)` | `await aclient.scrape(...)` |
| Extract | `client.extract(...)` | `await aclient.extract(...)` |
| Search | `client.search(...)` | `await aclient.search(...)` |
| Crawl | `client.crawl.start(...)` | `await aclient.crawl.start(...)` |
| Monitor | `client.monitor.create(...)` | `await aclient.monitor.create(...)` |
| Credits | `client.credits()` | `await aclient.credits()` |

### Namespace sub-methods (cover the full lifecycle)

- `client.crawl`: `start`, `get`, `stop`, `resume`, `delete`
- `client.monitor`: `create`, `list`, `get`, `update`, `pause`, `resume`, `activity`, `delete`
- `client.history`: `list`, `get` *(supporting, not shown in Playground)*

The async client exposes the same namespaces under the same attribute names; their methods are coroutines and must be awaited. Both clients also provide the utility methods `health()` and `close()`: call `health()` first as a sanity check, and call `close()` when the run finishes.
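
Before spending credits, a cheap offline check can confirm that the installed SDK exposes every method listed above and none of the stale names. A minimal sketch that only inspects attributes and makes no API calls; the method lists are copied from this skill, `surface_problems` is a hypothetical helper, and `health`/`close` are checked for presence only because this skill does not say whether their async versions are coroutines:

```python
import inspect

# Method lists copied from this skill.
TOP_LEVEL = ["scrape", "extract", "search", "credits"]
UTILITY = ["health", "close"]  # presence only: sync/async kind not specified here
NAMESPACES = {
    "crawl": ["start", "get", "stop", "resume", "delete"],
    "monitor": ["create", "list", "get", "update", "pause", "resume", "activity", "delete"],
    "history": ["list", "get"],
}
STALE = ["smartscraper", "markdownify", "agentic_scraper"]


def surface_problems(client: object, *, is_async: bool) -> list[str]:
    """Return missing, wrong-kind (sync vs async), or stale methods on a client instance."""
    problems: list[str] = []

    def check(owner: object, label: str, enforce_kind: bool = True) -> None:
        name = label.rsplit(".", 1)[-1]
        fn = getattr(owner, name, None)
        if not callable(fn):
            problems.append(f"missing: {label}")
        elif enforce_kind and inspect.iscoroutinefunction(fn) != is_async:
            problems.append(f"not {'async' if is_async else 'sync'}: {label}")

    for name in TOP_LEVEL:
        check(client, name)
    for name in UTILITY:
        check(client, name, enforce_kind=False)
    for ns, methods in NAMESPACES.items():
        owner = getattr(client, ns, None)
        if owner is None:
            problems.append(f"missing namespace: {ns}")
            continue
        for name in methods:
            check(owner, f"{ns}.{name}")
    problems += [f"stale name present: {n}" for n in STALE if hasattr(client, n)]
    return problems
```

For example, `surface_problems(AsyncScrapeGraphAI(), is_async=True)` should return `[]`. If the SDK wraps async methods in decorators that return awaitables without being coroutine functions, this check reports false `not async` entries; confirm those by awaiting the call.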

## Scope: test the WHOLE SDK

Every public method above must be exercised against the live API on both `ScrapeGraphAI` and `AsyncScrapeGraphAI`. Do not mock.

For each call:
- Use a minimal valid payload (e.g. `https://example.com` with a trivial prompt, or a short query for `search`).
- Record the response type, confirm it matches the expected Pydantic model / `ApiResult`, and note any surfaced error.
- Cover each namespace's full lifecycle so no test artifacts are left behind:
  - `crawl`: `start` → `get` → `stop` → `resume` → `delete`
  - `monitor`: `create` → `list` → `get` → `update` → `pause` → `resume` → `activity` → `delete`
  - `history`: `list`, then `get` on an entry returned by `list`
- Call `credits()` before and after the full run so the user can see credit consumption.
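
The lifecycle rule can be sketched as a runner that stops at the first failing step but still deletes the resource once it exists. A minimal sketch; `run_lifecycle` is hypothetical, and `create` is assumed to return whatever identifier the later calls need (read it from the `start`/`create` response model, since its field name is not specified here):

```python
from typing import Any, Callable, Sequence

Step = tuple[str, Callable[[Any], object]]


def run_lifecycle(
    create: Callable[[], Any],
    steps: Sequence[Step],
    delete: Callable[[Any], object],
) -> list[tuple[str, str]]:
    """Create a resource, run steps against its id, and always delete it afterwards.

    Returns (step name, "ok" or error repr) pairs in execution order.
    """
    try:
        resource_id = create()
    except Exception as exc:
        return [("create", f"error: {exc!r}")]  # nothing was created, nothing to delete
    results: list[tuple[str, str]] = [("create", "ok")]
    try:
        for name, step in steps:
            try:
                step(resource_id)
            except Exception as exc:
                results.append((name, f"error: {exc!r}"))
                break  # later lifecycle steps usually depend on this one
            results.append((name, "ok"))
    finally:
        # Cleanup runs even if a step failed, so no test artifacts are left behind.
        try:
            delete(resource_id)
            results.append(("delete", "ok"))
        except Exception as exc:
            results.append(("delete", f"error: {exc!r}"))
    return results
```

For `crawl`, `create` would wrap `client.crawl.start(...)`, `steps` would cover `get`, `stop`, and `resume`, and `delete` would wrap `client.crawl.delete` called with that identifier (an assumption about its signature). The async client needs the same shape with awaited calls.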

## Procedure

1. Confirm the working directory is clean (`git status`). If not, stop and ask the user.
2. Confirm you're not on `main`. If making commits, branch first: `git checkout -b test/sdk-smoke-YYYYMMDD`. Running tests without committing does not require a branch.
3. Install deps: `uv sync`.
4. Run the existing unit tests first: `uv run pytest tests/ -v`. Fix any failures before live testing.
5. Write a throwaway script (e.g. `scripts/smoke_sdk.py`) that:
   - Imports `ScrapeGraphAI` and `AsyncScrapeGraphAI` from `scrapegraph_py`.
   - Calls every top-level method and every namespace method listed above, on both clients.
   - Prints a compact table: method | sync ok | async ok | notes.
6. Run it: `uv run python scripts/smoke_sdk.py`.
7. Delete the throwaway script. Do not commit it.
8. Report a summary: which methods passed, which failed, credits consumed, and any suspicious response shapes.
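
One way to structure the step 5 script is to register each method as a pair of zero-argument callables (sync, async) and collect one table row per pair. A minimal sketch, assuming failed calls raise an exception; if the SDK instead reports failures inside an `ApiResult`, inspect that object in `_outcome`. `build_checks` wires only the zero-argument calls named in this skill (`health()`, `credits()`); the payload-taking methods must be added with the arguments the installed SDK's request models require. All helper names are hypothetical:

```python
import asyncio
import inspect
from typing import Any, Callable

Thunk = Callable[[], Any]


async def _outcome(fn: Thunk) -> tuple[bool, str]:
    """Run one call, awaiting its result if needed; return (ok, note)."""
    try:
        result = fn()
        if inspect.isawaitable(result):
            result = await result
    except Exception as exc:  # keep the error text verbatim for the notes column
        return False, f"{type(exc).__name__}: {exc}"
    return True, type(result).__name__


async def run_checks(checks: dict[str, tuple[Thunk, Thunk]]) -> list[tuple[str, str, str, str]]:
    """Run each (sync, async) pair in order; one row per method."""
    rows = []
    for method, (sync_fn, async_fn) in checks.items():
        s_ok, s_note = await _outcome(sync_fn)
        a_ok, a_note = await _outcome(async_fn)
        notes = s_note if s_note == a_note else f"sync={s_note}; async={a_note}"
        rows.append((method, "yes" if s_ok else "NO", "yes" if a_ok else "NO", notes))
    return rows


def format_table(rows: list[tuple[str, str, str, str]]) -> str:
    """Render rows as a compact method | sync ok | async ok | notes table."""
    all_rows = [("method", "sync ok", "async ok", "notes"), *rows]
    widths = [max(len(row[i]) for row in all_rows) for i in range(4)]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip() for row in all_rows]
    return "\n".join(lines)


def build_checks(client: Any, aclient: Any) -> dict[str, tuple[Thunk, Thunk]]:
    """Wire the zero-argument calls named in this skill. Add scrape, extract, search,
    and the crawl/monitor/history methods with the payloads the SDK requires."""
    return {
        "health": (client.health, aclient.health),
        "credits": (client.credits, aclient.credits),
    }
```

With `client` and `aclient` as the two SDK clients, `print(format_table(asyncio.run(run_checks(build_checks(client, aclient)))))` prints the table. Calls run one at a time, which keeps credit usage easy to attribute to individual methods.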

## If you find a bug

- Branch: `git checkout -b fix/<short-description>`.
- Fix it. Then run the full pre-commit suite from `CLAUDE.md`:
  ```bash
  uv run ruff format src tests
  uv run ruff check src tests --fix
  uv build
  uv run pytest tests/ -v
  ```
- Commit with a `fix:` prefix (keeps the semantic-release bump at patch).
- Push the branch and open a PR. **Do not merge to main yourself.**

## Reminders to surface to the user

- Live tests consume API credits. Confirm before running.
- If any method returns a 4xx/5xx, report the error verbatim. Retry at most once, and mention the retry in the report; never retry silently.
- If the user's key is invalid or rate-limited, stop and tell them; do not swap in any other key.
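
The retry rule can be captured in a small wrapper that logs each failure verbatim and retries exactly once. A minimal sketch, assuming failures surface as exceptions; `call_with_one_retry` and its `retryable` predicate are hypothetical, and which SDK exceptions mean "invalid key" or "rate-limited" must be checked against the installed package:

```python
from typing import Any, Callable


def call_with_one_retry(
    fn: Callable[[], Any],
    log: Callable[[str], None] = print,
    retryable: Callable[[Exception], bool] = lambda exc: True,
) -> Any:
    """Call fn; log each failure verbatim, retry at most once, re-raise on final failure."""
    try:
        return fn()
    except Exception as first:
        if not retryable(first):
            log(f"failed, not retried: {first!r}")
            raise
        log(f"attempt 1 failed: {first!r}; retrying once")
    try:
        return fn()
    except Exception as second:
        log(f"attempt 2 failed: {second!r}; giving up")
        raise
```

Pass a `retryable` predicate that returns `False` for auth and rate-limit errors, so those stop the run immediately instead of being retried.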