Commit a70e4ce

internal repo sync
1 parent 9243bd6 commit a70e4ce

7 files changed

Lines changed: 222 additions & 34 deletions

README.md

Lines changed: 81 additions & 24 deletions
@@ -1,42 +1,99 @@
-<img src="./banner.png" />
+# 🤖💻 WorkArena - How Capable are Web Agents at Solving Common Knowledge Work Tasks?
 
-`WorkArena` is a suite of browser-based tasks designed for ServiceNow products, acting as a benchmark for automating commonly conducted activities within the product environment.
+[[Paper]](https://arxiv.org/abs/2403.07718)[[Benchmark Contents]](#benchmark-contents)[[Getting Started]](#getting-started)[[BrowserGym]](https://github.com/ServiceNow/BrowserGym)[[Citing This Work]](#citing-this-work)
 
-## Setup
+`WorkArena` is a suite of browser-based tasks tailored to gauge web agents' effectiveness in supporting routine tasks for knowledge workers.
+By harnessing the ubiquitous [ServiceNow](https://www.servicenow.com/what-is-servicenow.html) platform, this benchmark will be instrumental in assessing the widespread state of such automations in modern knowledge work environments.
 
-### ServiceNow Instance
+WorkArena is included in [BrowserGym](https://github.com/ServiceNow/BrowserGym), a conversational gym environment for the evaluation of web agents.
 
-1. Go to https://developer.servicenow.com/ and create an account
-2. Request a Utah developer instance (initializing it might take a while)
-3. Log into your ServiceNow instance via the browser and change the admin password if instructed to do so. If you're already registered in the instance, you can find the instance information (Username, Password, instance URL) in the `My Instances` section of your developer account.
-4. Set the following environment variables:
+
+## Benchmark Contents
+
+At the moment, WorkArena includes `23,150` task instances drawn from `29` tasks that cover the main components of the ServiceNow user interface. The following videos show an agent built on `GPT-4-vision` interacting with every such component. As emphasized by our results, this benchmark is not solved and thus, the performance of the agent is not always on point.
+
+### Knowledge Bases
+
+**Goal:** The agent must search for specific information in the company knowledge base.
+
+_The agent interacts with the user via BrowserGym's conversational interface._
+
+https://github.com/ServiceNow/ui-copilot/assets/2374980/a778fbfd-6f9c-41b2-9c20-1d97cc348866
+
+### Forms
+
+**Goal:** The agent must fill a complex form with specific values for each field.
+
+https://github.com/ServiceNow/ui-copilot/assets/2374980/1f3fa96d-d76e-4f04-a75f-bcf758c5aa42
+
+### Service Catalogs
+
+**Goal:** The agent must order items with specific configurations from the company's service catalog.
+
+https://github.com/ServiceNow/ui-copilot/assets/2374980/8451faa8-3776-4e52-bb90-a560ea23a709
+
+### Lists
+
+**Goal:** The agent must filter a list according to some specifications.
+
+_In this example, the agent struggles to manipulate the UI and fails to create the filter._
+
+https://github.com/ServiceNow/ui-copilot/assets/2374980/042f058b-a966-4f5e-a38f-146464132c49
+
+### Menus
+
+**Goal:** The agent must navigate to a specific application using the main menu.
+
+https://github.com/ServiceNow/ui-copilot/assets/2374980/d5f89fd0-ed72-49b8-81ce-8a493a2c8f5f
+
+
+## Getting Started
+
+To set up WorkArena, you will need to get your own ServiceNow instance, install our Python package, and upload some data to your instance. Follow the steps below to achieve this.
+
+### a) Create a ServiceNow Developer Instance
+
+1. Go to https://developer.servicenow.com/ and create an account.
+2. Click on `Request an instance` and select the `Vancouver` release (initializing the instance will take a few minutes)
+3. Once the instance is ready, you will see a popup showing its URL and credentials. You will also receive a copy by email. Based on this information, set the following environment variables:
     * `SNOW_INSTANCE_URL`: URL of your ServiceNow developer instance
-    * `SNOW_INSTANCE_UNAME`: username for your instance (usually `admin`)
-    * `SNOW_INSTANCE_PWD`: password for your instance (you'll receive this by email and you can get it from your ServiceNow developer account)
+    * `SNOW_INSTANCE_UNAME`: Just use "admin"
+    * `SNOW_INSTANCE_PWD`: The password for your instance. Make sure you place the value in quotes "" since it might contain special characters.
+4. Log into your instance via a browser using the admin credentials. Close any popup that appears on the main screen (e.g., agreeing to analytics).
 
-To set environment variables in Bash, you can use the `export` command. Here's an example:
+**Warning:** Feel free to look around the platform, but please make sure you revert any changes (e.g., changes to list views, pinning some menus, etc.) as these changes will be persistent and affect the benchmarking process.
 
+### b) Install WorkArena and Initialize your Instance
+
+Run the following command to install WorkArena in the [BrowserGym](https://github.com/servicenow/browsergym) environment:
 ```
-export SNOW_INSTANCE_URL="https://your-instance-url.service-now.com"
-export SNOW_INSTANCE_UNAME="your-username"
-export SNOW_INSTANCE_PWD="your-password"
+pip install browsergym-workarena
 ```
 
-Another option is to add the environment variables to your conda environment. To do this, you can execute the following command :
-
+Then, run this command in a terminal to upload the benchmark data to your ServiceNow instance:
 ```
-conda env config vars set ENV_VAR=VALUE
+workarena-install
 ```
 
-### Install Data
-
-Run the following code to install all the data shipped with the benchmark:
+### c) Validate Your Installation
 
+There are a lot of moving parts (authentication credentials, benchmark data, etc.) so we highly recommend that you sanity-check your installation using our provided unit tests. Do this by running (might take a few minutes):
 ```
-from browsergym.workarena.install import setup
-setup()
+pytest -v .
 ```
 
-### Finally
+Your installation is now complete! 🎉
+
 
-1. Run `pytest -v .` to make sure that your setup works.
+## Citing This Work
+
+```
+@misc{workarena2024,
+  title={WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?},
+  author={Alexandre Drouin and Maxime Gasse and Massimo Caccia and Issam H. Laradji and Manuel Del Verme and Tom Marty and Léo Boisvert and Megh Thakkar and Quentin Cappart and David Vazquez and Nicolas Chapados and Alexandre Lacoste},
+  year={2024},
+  eprint={2403.07718},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG}
+}
+```
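
The Getting Started steps in this README depend on three environment variables (`SNOW_INSTANCE_URL`, `SNOW_INSTANCE_UNAME`, `SNOW_INSTANCE_PWD`). As a minimal sketch, and not part of WorkArena itself, a pre-flight check along these lines could confirm they are set before running `workarena-install`. The helper name `missing_snow_vars` is hypothetical.

```python
import os

# Variable names taken from the README's "Getting Started" section.
REQUIRED_VARS = ("SNOW_INSTANCE_URL", "SNOW_INSTANCE_UNAME", "SNOW_INSTANCE_PWD")


def missing_snow_vars(env=None):
    """Return the required variables that are unset or empty in `env`.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_snow_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All ServiceNow environment variables are set.")
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to exercise without touching the real environment.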

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -28,6 +28,9 @@ dynamic = ["dependencies", "version"]
 [project.urls]
 homepage = "https://github.com/ServiceNow/WorkArena"
 
+[project.scripts]
+workarena-install = "browsergym.workarena.install:main"
+
 [tool.hatch.version]
 path = "src/browsergym/workarena/__init__.py"
 
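
The `[project.scripts]` entry maps the `workarena-install` command to an object reference of the form `module:attribute`. The sketch below illustrates, using only the standard library, how such a reference can be resolved to a callable. `resolve_entry_point` is a hypothetical helper written for illustration (installers generate their own wrapper scripts), and `json:dumps` stands in for `browsergym.workarena.install:main` so that the example runs without the package installed.

```python
import importlib


def resolve_entry_point(reference: str):
    """Resolve a 'module:attribute' reference to the object it names.

    Hypothetical helper for illustration only.
    """
    module_name, _, attr_path = reference.partition(":")
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted (e.g. "pkg.mod:Class.method");
    # an empty attribute part resolves to the module itself.
    for part in filter(None, attr_path.split(".")):
        obj = getattr(obj, part)
    return obj


if __name__ == "__main__":
    # Stand-in reference; the real one would need browsergym-workarena installed.
    dumps = resolve_entry_point("json:dumps")
    print(dumps({"ok": True}))
```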

src/browsergym/workarena/install.py

Lines changed: 9 additions & 0 deletions
@@ -285,3 +285,12 @@ def setup():
     # XXX: Install workflows first because they may automate some downstream installations
     setup_workflows()
     setup_knowledge_base()
+
+
+def main():
+    """
+    Entrypoint for package CLI installation command
+
+    """
+    logging.basicConfig(level=logging.INFO)
+    setup()

src/browsergym/workarena/tasks/form.py

Lines changed: 1 addition & 1 deletion
@@ -322,7 +322,7 @@ def _run_init_scripts(self, page: Page) -> None:
 
     def _generate_random_config(self, seed: int, page: Page) -> None:
         """Generate a random configuration for the task."""
-        super().setup(seed, page)
+        self.pre_setup(seed, page)
         self._run_init_scripts(page)
         # Determine task fields
         logging.debug("Determining task fields")

src/browsergym/workarena/tasks/list.py

Lines changed: 2 additions & 2 deletions
@@ -222,7 +222,7 @@ def get_goal(self) -> str:
         return self.goal
 
     def _generate_random_config(self, seed: int, page: Page):
-        super().setup(seed, page)
+        self.pre_setup(seed, page)
         self._wait_for_ready(page)
         self.list_info = self._extract_list_info(page)
         # Get available fields
@@ -427,7 +427,7 @@ def setup(self, seed: int, page: Page) -> tuple[str, dict]:
         self.filter_len = len(self.filter_columns)
 
     def _generate_random_config(self, seed: int, page: Page):
-        super().setup(seed, page)
+        self.pre_setup(seed, page)
         self._wait_for_ready(page)
 
         # Extract the list from the page

src/browsergym/workarena/tasks/service_catalog.py

Lines changed: 47 additions & 7 deletions
@@ -196,11 +196,23 @@ def _wait_for_ready(self, page: Page, wait_for_form_api=False) -> None:
     def form_js_selector(self):
         return self.js_prefix + "." + self.js_api_forms
 
-    def setup(self, page: playwright.sync_api.Page, seed: int = None) -> tuple[str, dict]:
-        self.pre_setup(seed=seed, page=page)
-
-        self._add_init_scripts_to_context_and_reload(page, ["registerGsftMainLoaded();"])
+    def setup(self, page: playwright.sync_api.Page, seed: int = None) -> None:
+        self.pre_setup(page=page, seed=seed)
+        # the cart is shared for all agents running in parallel. The "Order Now" button
+        # is not affected so we'll make sure the agent can only use that one
+        disable_add_to_cart = """
+            window.addEventListener('DOMContentLoaded', (event) => {
+                const button = document.querySelector('button[aria-label="Add to Cart"]');
+                if (button) {
+                    button.disabled = true;
+                }
+            });
+        """
+        self._add_init_scripts_to_context_and_reload(
+            page, ["registerGsftMainLoaded()", disable_add_to_cart]
+        )
         self._wait_for_ready(page)
+        self._remove_top_items_panel(page)
         assert self.all_configs is not None, "No configuration available for the task."
         config = self.fixed_config if self.fixed_config else self.random.choice(self.all_configs)
         # use fixed config if any
@@ -219,6 +231,23 @@ def setup(self, page: playwright.sync_api.Page, seed: int = None) -> tuple[str,
 
         return goal, info
 
+    def _remove_top_items_panel(self, page: Page):
+        """Removes the 'top items' panel that sometimes appears on the landing page"""
+        frame = page.wait_for_selector("iframe#gsft_main").content_frame()
+
+        # Use evaluate to find and remove divs containing an element with role="heading" and the text "Top Requests"
+        frame.evaluate(
+            """() => {
+                const headings = Array.from(document.querySelectorAll('[role="heading"]'));
+                headings.forEach((heading) => {
+                    if (heading.textContent.includes("Top Requests")) {
+                        let parentDiv = heading.closest('div.drag_section');
+                        if (parentDiv) parentDiv.remove();
+                    }
+                });
+            }"""
+        )
+
     def cheat(self, page: Page, chat_messages: list[str]) -> None:
         super().cheat(page, chat_messages)
         self._wait_for_ready(page=page)
@@ -302,9 +331,20 @@ def cheat(self, page: Page, chat_messages: list[str]) -> None:
         order_now_button.click()
 
     def _generate_random_config(self, seed: int, page: Page):
-        super().setup(seed=seed, page=page)
-
-        self._add_init_scripts_to_context_and_reload(page, ["registerGsftMainLoaded()"])
+        self.pre_setup(page=page, seed=seed)
+        # the cart is shared for all agents running in parallel. The "Order Now" button
+        # is not affected so we'll make sure the agent can only use that one
+        disable_add_to_cart = """
+            window.addEventListener('DOMContentLoaded', (event) => {
+                const button = document.querySelector('button[aria-label="Add to Cart"]');
+                if (button) {
+                    button.disabled = true;
+                }
+            });
+        """
+        self._add_init_scripts_to_context_and_reload(
+            page, ["registerGsftMainLoaded()", disable_add_to_cart]
+        )
         self._wait_for_ready(page)
         if self.fixed_request_item:
             self.requested_item = self.fixed_request_item
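
The same `disable_add_to_cart` script is defined twice in this file, once in `setup` and once in `_generate_random_config`. As a sketch of one possible way to keep the two copies in sync (a suggestion only, not something this commit does), the script could live in a module-level constant. The names `DISABLE_ADD_TO_CART_JS` and `catalog_init_scripts` are hypothetical.

```python
# JavaScript copied from the diff above: it disables the "Add to Cart" button
# once the DOM has loaded, because the cart is shared between parallel agents.
DISABLE_ADD_TO_CART_JS = """
window.addEventListener('DOMContentLoaded', (event) => {
    const button = document.querySelector('button[aria-label="Add to Cart"]');
    if (button) {
        button.disabled = true;
    }
});
"""


def catalog_init_scripts() -> list[str]:
    """Return the init scripts that both methods pass to the reload helper.

    Hypothetical helper; the first entry is the existing registration call.
    """
    return ["registerGsftMainLoaded()", DISABLE_ADD_TO_CART_JS]
```

Both methods could then call `self._add_init_scripts_to_context_and_reload(page, catalog_init_scripts())`, assuming that method keeps the signature shown in the diff.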

tests/test_task_setup.py

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+"""
+Tests that are not specific to any particular kind of task.
+
+"""
+
+import pytest
+import json
+import logging
+import random
+
+# bugfix: use same playwright instance in browsergym and pytest
+from utils import setup_playwright
+from playwright.sync_api import Page, TimeoutError
+from tenacity import retry, stop_after_attempt, retry_if_exception_type
+from browsergym.workarena.config import ORDER_APPLE_WATCH_TASK_CONFIG_PATH
+
+from browsergym.workarena.tasks.service_catalog import OrderAppleWatchTask
+
+
+@retry(
+    stop=stop_after_attempt(5),
+    retry=retry_if_exception_type(TimeoutError),
+    reraise=True,
+    before_sleep=lambda _: logging.info("Retrying due to a TimeoutError..."),
+)
+@pytest.mark.slow
+def test_add_to_cart_disabled(page: Page):
+    task_config = json.load(open(ORDER_APPLE_WATCH_TASK_CONFIG_PATH, "r"))[0]
+    task = OrderAppleWatchTask(fixed_config=task_config)
+    # setup the task and try clicking on the "Add to cart button"
+    task.setup(page=page)
+    order_apple_watch_page = (
+        task.instance.snow_url
+        + "/now/nav/ui/classic/params/target/com.glideapp.servicecatalog_cat_item_view.do%3Fv%3D1%26sysparm_id%3D774906834fbb4200086eeed18110c737%26sysparm_link_parent%3Dd258b953c611227a0146101fb1be7c31%26sysparm_catalog%3De0d08b13c3330100c8b837659bba8fb4%26sysparm_catalog_view%3Dcatalog_default%26sysparm_view%3Dcatalog_default"
+    )
+    task.page.goto(order_apple_watch_page)
+    page.wait_for_timeout(1000)
+    iframe_element = task.page.wait_for_selector("#gsft_main")
+    iframe = iframe_element.content_frame()
+
+    # verify that Add to cart is disabled and order now is enabled
+    assert iframe.locator('button[aria-label="Add to Cart"]').is_disabled()
+    assert iframe.locator('button[aria-label="Order Now"]').is_enabled()
+
+
+def test_top_items_panel_removed(page: Page):
+    def check_top_items_panel(page: Page) -> bool:
+        """Checks if the 'top items' panel exists on the landing page"""
+        frame = page.wait_for_selector("iframe#gsft_main").content_frame()
+
+        # Use evaluate to find divs containing an element with role="heading" and the text "Top Requests"
+        panel_exists = frame.evaluate(
+            """() => {
+                const headings = Array.from(document.querySelectorAll('[role="heading"]'));
+                let panelExists = false;
+                headings.forEach((heading) => {
+                    if (heading.textContent.includes("Top Requests")) {
+                        panelExists = true;
+                    }
+                });
+                return panelExists;
+            }"""
+        )
+
+        return panel_exists
+
+    task_config = json.load(open(ORDER_APPLE_WATCH_TASK_CONFIG_PATH, "r"))[0]
+    task = OrderAppleWatchTask(fixed_config=task_config)
+
+    # Setup the task and check if the Top Items panel exists
+    task.setup(page=page)
+    panel_exists = check_top_items_panel(page=page)
+    page.wait_for_timeout(2000)
+    assert panel_exists is False
+    # Reload the page and check if the Top Items panel exists
+    page.goto(task.start_url)
+    page.wait_for_timeout(2000)
+    panel_exists = check_top_items_panel(page=page)
+    assert panel_exists is True
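
Both tests load the first task configuration with `json.load(open(...))`, which leaves closing the file handle to the garbage collector. A minimal alternative sketch, assuming a hypothetical helper named `load_first_config`, uses a context manager so the file is closed deterministically. The temporary file and its contents below are placeholders, not real WorkArena configurations.

```python
import json
import os
import tempfile


def load_first_config(path: str) -> dict:
    """Load a JSON list of task configurations and return its first entry.

    Hypothetical helper for illustration only.
    """
    with open(path, "r") as f:  # the file is closed when the block exits
        return json.load(f)[0]


if __name__ == "__main__":
    # Placeholder data; the tests would pass ORDER_APPLE_WATCH_TASK_CONFIG_PATH.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        json.dump([{"placeholder": 1}, {"placeholder": 2}], tmp)
    try:
        print(load_first_config(tmp.name))
    finally:
        os.remove(tmp.name)
```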
