2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -4,7 +4,7 @@ Here you'll find a contributing guide to get started with development.

## Environment

For local development, it is required to have Python 3.10 (or a later version) installed.
For local development, it is required to have Python 3.11 (or a later version) installed.

We use [uv](https://docs.astral.sh/uv/) for project management. Install it and set up your IDE accordingly.

13 changes: 11 additions & 2 deletions docs/01_introduction/index.mdx
@@ -9,7 +9,16 @@ import CodeBlock from '@theme/CodeBlock';

import IntroductionExample from '!!raw-loader!./code/01_introduction.py';

The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides useful features like Actor lifecycle management, local storage emulation, and Actor event handling.
The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides everything you need to build an Actor and run it both locally and on the [Apify platform](https://docs.apify.com/platform). With the SDK, you can:

- Manage the Actor lifecycle: initialization, graceful shutdown, status messages, rebooting, and metamorphing.
- Work with datasets, key-value stores, and request queues, with automatic local emulation when running outside the platform.
- Read the Actor input, including automatic decryption of secret fields.
- React to platform events (system info, migration, abort) and persist state across migrations and restarts.
- Manage proxies, both [Apify Proxy](https://docs.apify.com/platform/proxy) and your own, with session and tiered-proxy support.
- Start, call, and abort Actors and tasks, create webhooks, and access the full Apify API client.
- Charge users with the pay-per-event pricing model.
- Integrate with [Crawlee](../guides/crawlee) and [Scrapy](../guides/scrapy), with guides for [Playwright](../guides/playwright) and others.

<CodeBlock className="language-python">
{IntroductionExample}
@@ -29,7 +38,7 @@ Explore the Guides section in the sidebar for a deeper understanding of the SDK'

## Installation

The Apify SDK for Python requires Python version 3.10 or above. It is typically installed when you create a new Actor project using the [Apify CLI](https://docs.apify.com/cli). To install it manually in an existing project, use:
The Apify SDK for Python requires Python version 3.11 or above. It is typically installed when you create a new Actor project using the [Apify CLI](https://docs.apify.com/cli). To install it manually in an existing project, use:

```bash
pip install apify
13 changes: 8 additions & 5 deletions docs/01_introduction/quick-start.mdx
@@ -17,11 +17,11 @@ import UnderscoreMainExample from '!!raw-loader!./code/actor_structure/__main__.

## Step 1: Create Actors

To create and run Actors in Apify Console, refer to the [Console documentation](/platform/actors/development/quick-start/web-ide).
To create and run Actors in [Apify Console](https://docs.apify.com/platform/console), refer to the [Console documentation](/platform/actors/development/quick-start/web-ide).

To create a new Apify Actor on your computer, you can use the [Apify CLI](/cli) and select one of the [Python Actor templates](https://apify.com/templates?category=python).

For example, to create an Actor from the "[beta] Python SDK" template, you can use the [`apify create` command](/cli/docs/reference#apify-create-actorname).
For example, to create an Actor from the "Getting started with Python" template, you can use the [`apify create` command](/cli/docs/reference#apify-create-actorname).

```bash
apify create my-first-actor --template python-start
@@ -59,15 +59,15 @@ The Actor's runtime dependencies are specified in the `requirements.txt` file, w
The Actor's source code is in the `src` folder. This folder contains two important files:

- `main.py` - which contains the main function of the Actor
- `__main__.py` - which is the entrypoint of the Actor package setting up the Actor [logger](../concepts/logging) and executing the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).
- `__main__.py` - which is the entrypoint of the Actor package, executing the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).

<Tabs>
<TabItem value="main.py" label="main.py" default>
<CodeBlock className="language-python">
{MainExample}
</CodeBlock>
</TabItem>
<TabItem value="__main__.py" label="__main.py__">
<TabItem value="__main__.py" label="__main__.py">
<CodeBlock className="language-python">
{UnderscoreMainExample}
</CodeBlock>
@@ -79,13 +79,15 @@ We recommend keeping the entrypoint for the Actor in the `src/__main__.py` file.

## Next steps

Now that you can create and run an Actor locally, explore the rest of the SDK's features and its framework integrations.

### Concepts

To learn more about the features of the Apify SDK and how to use them, check out the Concepts section in the sidebar:

- [Actor lifecycle](../concepts/actor-lifecycle)
- [Actor input](../concepts/actor-input)
- [Working with storages](../concepts/storages)
- [Storages](../concepts/storages)
- [Actor events & state persistence](../concepts/actor-events)
- [Proxy management](../concepts/proxy-management)
- [Interacting with other Actors](../concepts/interacting-with-other-actors)
@@ -94,6 +96,7 @@ To learn more about the features of the Apify SDK and how to use them, check out
- [Logging](../concepts/logging)
- [Actor configuration](../concepts/actor-configuration)
- [Pay-per-event monetization](../concepts/pay-per-event)
- [Storage clients](../concepts/storage-clients)

### Guides

4 changes: 2 additions & 2 deletions docs/02_concepts/01_actor_lifecycle.mdx
@@ -21,7 +21,7 @@ import RebootExample from '!!raw-loader!roa-loader!./code/01_reboot.py';

import StatusMessageExample from '!!raw-loader!roa-loader!./code/01_status_message.py';

This guide explains how an **Apify Actor** starts, runs, and shuts down, describing the complete Actor lifecycle. For information about the core concepts such as Actors, the Apify Console, storages, and events, check out the [Apify platform documentation](https://docs.apify.com/platform).
This guide explains how an **Apify Actor** starts, runs, and shuts down, describing the complete Actor lifecycle. For information about the core concepts such as Actors, the [Apify Console](https://docs.apify.com/platform/console), storages, and events, check out the [Apify platform documentation](https://docs.apify.com/platform).

## Actor initialization

@@ -106,4 +106,4 @@ Update the status only when the user's understanding of progress changes - avoid

## Conclusion

This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the <ApiLink to="">reference docs</ApiLink>, [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).
This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the <ApiLink to="class/Actor">`Actor`</ApiLink> API reference, [guides](../guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).
6 changes: 5 additions & 1 deletion docs/02_concepts/02_actor_input.mdx
@@ -12,7 +12,7 @@ import ApiLink from '@theme/ApiLink';

The Actor gets its [input](https://docs.apify.com/platform/actors/running/input) from the input record in its default [key-value store](https://docs.apify.com/platform/storage/key-value-store).

To access it, instead of reading the record manually, you can use the <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink> convenience method. It will get the input record key from the Actor configuration, read the record from the default key-value store,and decrypt any [secret input fields](https://docs.apify.com/platform/actors/development/secret-input).
To access it, instead of reading the record manually, you can use the <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink> convenience method. It gets the input record key from the Actor configuration, reads the record from the default key-value store, and decrypts any [secret input fields](https://docs.apify.com/platform/actors/development/secret-input).

For example, if an Actor received a JSON input with two fields, `{ "firstNumber": 1, "secondNumber": 2 }`, this is how you might process it:

@@ -34,4 +34,8 @@ The Apify platform supports [secret input fields](https://docs.apify.com/platfor

No special handling is needed in your code — when you call <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink>, encrypted fields are automatically decrypted using the Actor's private key, which is provided by the platform via environment variables. You receive the plaintext values directly.

## Conclusion

This page has shown how to read Actor input with <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink>, how to load URL sources with <ApiLink to="class/ApifyRequestList">`ApifyRequestList`</ApiLink>, and how secret input fields are decrypted automatically when you read them.

For more details on Actor input and how to define input schemas, see the [Actor input](https://docs.apify.com/platform/actors/running/input) and [input schema](https://docs.apify.com/platform/actors/development/input-schema) documentation on the Apify platform.
16 changes: 10 additions & 6 deletions docs/02_concepts/03_storages.mdx
@@ -1,6 +1,6 @@
---
id: storages
title: Working with storages
title: Storages
description: Use datasets, key-value stores, and request queues to persist Actor data.
---

@@ -45,11 +45,11 @@ Each dataset item, key-value store record, or request in a request queue is then

When developing locally, opening any storage uses local storage by default. To change this behavior and use remote storage instead, pass the `force_cloud=True` argument to <ApiLink to="class/Actor#open_dataset">`Actor.open_dataset`</ApiLink>, <ApiLink to="class/Actor#open_request_queue">`Actor.open_request_queue`</ApiLink>, or <ApiLink to="class/Actor#open_key_value_store">`Actor.open_key_value_store`</ApiLink>. Proper use of this argument allows you to work with both local and remote storages.

Calling another remote Actor and accessing its default storage is typical use-case for using `force-cloud=True` argument to open remote Actor's storages.
Calling another remote Actor and accessing its default storage is a typical use case for the `force_cloud=True` argument, which lets you open the remote Actor's storages from a local run.

### Local storage persistence

By default, the storage contents are persisted across multiple Actor runs. To clean up the Actor storages before the running the Actor, use the `--purge` flag of the [`apify run`](https://docs.apify.com/cli/docs/reference#apify-run) command of the Apify CLI.
By default, the storage contents are persisted across multiple Actor runs. To clean up the Actor storages before running the Actor, use the `--purge` flag of the [`apify run`](https://docs.apify.com/cli/docs/reference#apify-run) command of the Apify CLI.

```bash
apify run --purge
@@ -106,8 +106,8 @@ To get an iterator of the data, you can use the <ApiLink to="class/Dataset#itera
### Exporting items

You can also export the dataset items into a key-value store, as either a CSV or a JSON record,
using the <ApiLink to="class/Dataset#export_to_csv">`Dataset.export_to_csv`</ApiLink>
or <ApiLink to="class/Dataset#export_to_json">`Dataset.export_to_json`</ApiLink> method.
using the <ApiLink to="class/Dataset#export_to">`Dataset.export_to`</ApiLink> method with the
`content_type` argument set to `'csv'` or `'json'`.

<RunnableCodeBlock className="language-python" language="python">
{DatasetExportsExample}
@@ -183,6 +183,10 @@ To check if all the requests in the queue are handled, you can use the <ApiLink

## Storage clients

Behind the scenes, the SDK uses storage clients to communicate with the storage backend. The appropriate client is selected automatically based on the runtime environment — on the Apify platform, data is persisted via the Apify API, while local runs use the filesystem. For most use cases, you don't need to think about storage clients at all. If you want to learn more about how storage clients work, the available implementations, or how to configure them, see the [Crawlee storage clients guide](https://crawlee.dev/python/docs/guides/storage-clients). The Apify-specific clients are available in the `apify.storage_clients` module.
Behind the scenes, the SDK uses storage clients to communicate with the storage backend. The appropriate client is selected automatically based on the runtime environment. On the Apify platform, data is persisted via the Apify API, while local runs use the filesystem. For most use cases, you don't need to think about storage clients at all. To learn about the available implementations, how to switch between a single and shared request queue, or how to configure a custom client, see [Storage clients](./storage-clients). For a deeper look at how storage clients work internally, see the [Crawlee storage clients guide](https://crawlee.dev/python/docs/guides/storage-clients).

## Conclusion

This page has covered the three storage types (datasets, key-value stores, and request queues): how they are emulated on the local filesystem, how to open named and unnamed storages, and how to read from and write to each through the `Actor` shortcuts and the storage classes.

For comprehensive information about storage on the Apify platform, see the [storage documentation](https://docs.apify.com/platform/storage), including the pages on [datasets](https://docs.apify.com/platform/storage/dataset), [key-value stores](https://docs.apify.com/platform/storage/key-value-store), and [request queues](https://docs.apify.com/platform/storage/request-queue).
32 changes: 18 additions & 14 deletions docs/02_concepts/04_actor_events.mdx
@@ -14,6 +14,8 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o

## Event types

A listener can optionally receive a single argument, a Pydantic model with the event's data. The following table lists the events, the type of that data object, and when each event is emitted.

<table>
<thead>
<tr>
@@ -25,25 +27,23 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
<tbody>
<tr>
<td><code>SYSTEM_INFO</code></td>
<td><pre>{`{
"created_at": datetime,
"cpu_current_usage": float,
"mem_current_bytes": int,
"is_cpu_overloaded": bool
}`}
</pre></td>
<td><ApiLink to="class/EventSystemInfoData"><code>EventSystemInfoData</code></ApiLink></td>
<td>
<p>This event is emitted regularly and it indicates the current resource usage of the Actor.</p>
The <code>is_cpu_overloaded</code> argument indicates whether the current CPU usage is higher than <code>Config.max_used_cpu_ratio</code>
<p>Emitted regularly to report the Actor's current resource usage. The
<code>cpu_info.used_ratio</code> field reports the fraction of CPU currently in use
(a float between <code>0.0</code> and <code>1.0</code>), and <code>memory_info.current_size</code>
reports the current memory usage. Compare <code>cpu_info.used_ratio</code> against
<code>Configuration.max_used_cpu_ratio</code> to detect CPU overload.</p>
</td>
</tr>
<tr>
<td><code>MIGRATING</code></td>
<td><code>None</code></td>
<td><ApiLink to="class/EventMigratingData"><code>EventMigratingData</code></ApiLink></td>
<td>
<p>Emitted when the Actor running on the Apify platform
is going to be <a href="https://docs.apify.com/platform/actors/development/state-persistence#what-is-a-migration">migrated</a>
{' '}to another worker server soon.</p>
{' '}to another worker server soon. The <code>time_remaining</code> field reports how much time
the Actor has left before it is force-migrated.</p>
You can use it to persist the state of the Actor so that once it is executed again on the new server,
it doesn't have to start over from the beginning.
Once you have persisted the state of your Actor, you can call <ApiLink to="class/Actor#reboot">`Actor.reboot`</ApiLink>
@@ -52,7 +52,7 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
</tr>
<tr>
<td><code>ABORTING</code></td>
<td><code>None</code></td>
<td><ApiLink to="class/EventAbortingData"><code>EventAbortingData</code></ApiLink></td>
<td>
When a user aborts an Actor run on the Apify platform,
they can choose to abort gracefully to allow the Actor some time before getting killed.
@@ -61,7 +61,7 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
</tr>
<tr>
<td><code>PERSIST_STATE</code></td>
<td><pre>{`{ "is_migrating": bool }`}</pre></td>
<td><ApiLink to="class/EventPersistStateData"><code>EventPersistStateData</code></ApiLink></td>
<td>
<p>Emitted in regular intervals (by default 60 seconds) to notify the Actor that it should persist its state,
in order to avoid repeating all work when the Actor restarts.</p>
@@ -73,7 +73,7 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
</tr>
<tr>
<td><code>EXIT</code></td>
<td><code>None</code></td>
<td><ApiLink to="class/EventExitData"><code>EventExitData</code></ApiLink></td>
<td>
Emitted by the SDK (not the platform) when the Actor is about to exit. You can use this event to perform final cleanup tasks,
such as closing external connections or sending notifications, before the Actor shuts down.
@@ -103,4 +103,8 @@ You can optionally specify a `key` (the key-value store key under which the stat
{UseStateExample}
</RunnableCodeBlock>

## Conclusion

This page has described the events emitted during a run (`SYSTEM_INFO`, `MIGRATING`, `ABORTING`, `PERSIST_STATE`, and `EXIT`): how to handle them with <ApiLink to="class/Actor#on">`Actor.on`</ApiLink>, and how to persist state automatically with <ApiLink to="class/Actor#use_state">`Actor.use_state`</ApiLink>.

For more details on platform events and state persistence, see the [system events](https://docs.apify.com/platform/actors/development/programming-interface/system-events) and [state persistence](https://docs.apify.com/platform/actors/development/state-persistence) documentation on the Apify platform.