@@ -0,0 +1,44 @@
import asyncio

from crawlee.browsers import BrowserPool, PlaywrightBrowserController, PlaywrightBrowserPlugin
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee._utils.context import ensure_context
from typing_extensions import override


class CustomBrowserPlugin(PlaywrightBrowserPlugin):
    """A custom browser plugin that launches a browser from a custom executable path."""

    def __init__(self, executable_path: str, **kwargs: object) -> None:
        super().__init__(**kwargs)
        self._executable_path = executable_path

    @ensure_context
    @override
    async def new_browser(self) -> PlaywrightBrowserController:
        if not self._playwright:
            raise RuntimeError('Playwright browser plugin is not initialized.')

        browser = await self._playwright.chromium.launch(
            executable_path=self._executable_path,
            headless=True,
        )
        return PlaywrightBrowserController(
            browser=browser,
            max_open_pages_per_browser=self.max_open_pages_per_browser,
        )


async def main() -> None:
    plugin = CustomBrowserPlugin(executable_path='/path/to/custom/browser')
    browser_pool = BrowserPool(plugins=[plugin])
    crawler = PlaywrightCrawler(browser_pool=browser_pool)

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Crawling: {context.request.url}')

    await crawler.run(['https://crawlee.dev'])


asyncio.run(main())
16 changes: 15 additions & 1 deletion docs/guides/playwright_crawler.mdx
@@ -88,4 +88,18 @@ Navigation hooks allow for additional configuration at specific points during pa

## Conclusion

This guide introduced the <ApiLink to="class/PlaywrightCrawler">`PlaywrightCrawler`</ApiLink> and explained how to configure it using <ApiLink to="class/BrowserPool">`BrowserPool`</ApiLink> and <ApiLink to="class/PlaywrightBrowserPlugin">`PlaywrightBrowserPlugin`</ApiLink>. You learned how to launch multiple browsers, configure browser and context settings, use <ApiLink to="class/BrowserPool">`BrowserPool`</ApiLink> lifecycle hooks, and apply navigation hooks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/crawlee-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!

## Extending the browser plugin

For full control over how the browser is launched, subclass <ApiLink to="class/PlaywrightBrowserPlugin">`PlaywrightBrowserPlugin`</ApiLink> and override its `new_browser` method. This lets you integrate any Playwright-compatible browser backend, such as a custom Chromium build, a stealth browser, or a browser with a persistent profile.

The overridden `new_browser` method must return a <ApiLink to="class/PlaywrightBrowserController">`PlaywrightBrowserController`</ApiLink> instance wrapping your custom browser. Pass your plugin to <ApiLink to="class/BrowserPool">`BrowserPool`</ApiLink>, which you then provide to <ApiLink to="class/PlaywrightCrawler">`PlaywrightCrawler`</ApiLink> via the `browser_pool` argument.

<CodeBlock className="language-python">
{ExtendingPluginExample}
</CodeBlock>
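
The example uses a placeholder executable path. As a minimal sketch, assuming you want to fail early with a clear message when the browser binary is missing, a small helper could resolve the path before the plugin is created. The function name `resolve_browser_executable`, the `CUSTOM_BROWSER_PATH` environment variable, and the `chromium` fallback command are hypothetical choices for this illustration and are not part of Crawlee or Playwright.

```python
import os
import shutil
from pathlib import Path


def resolve_browser_executable(
    env_var: str = 'CUSTOM_BROWSER_PATH',  # hypothetical variable name
    fallback: str = 'chromium',  # hypothetical command looked up on PATH
) -> str:
    """Return an absolute path to a browser executable, or raise if none is found."""
    candidate = os.environ.get(env_var)
    if candidate:
        # An explicitly configured path must point to an existing file.
        path = Path(candidate).expanduser()
        if path.is_file():
            return str(path.resolve())
        raise FileNotFoundError(f'{env_var} points to {candidate!r}, which is not a file.')

    # No explicit path configured: look for the fallback command on PATH.
    found = shutil.which(fallback)
    if found is None:
        raise FileNotFoundError(f'Set {env_var} or install {fallback!r} on PATH.')
    return str(Path(found).resolve())
```

The result can then be passed to the plugin from the example above, for instance `CustomBrowserPlugin(executable_path=resolve_browser_executable())`, so that a misconfigured path raises before any crawling starts.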

For a real-world example of a custom browser plugin, see the [Camoufox example](../examples/playwright-crawler-with-camoufox).

:::note
Third-party projects that provide alternative browser backends for Crawlee can link to this section as the canonical reference for plugin subclassing.
:::