|
1 | | -<img src="./banner.png" /> |
| 1 | +# 🤖💻 WorkArena - How Capable are Web Agents at Solving Common Knowledge Work Tasks? |
2 | 2 |
|
3 | | -`WorkArena` is a suite of browser-based tasks designed for ServiceNow products, acting as a benchmark for automating commonly conducted activities within the product environment. |
| 3 | +[[Paper]](https://arxiv.org/abs/2403.07718) ♦ [[Benchmark Contents]](#benchmark-contents) ♦ [[Getting Started]](#getting-started) ♦ [[BrowserGym]](https://github.com/ServiceNow/BrowserGym) ♦ [[Citing This Work]](#citing-this-work) |
4 | 4 |
|
5 | | -## Setup |
| 5 | +`WorkArena` is a suite of browser-based tasks tailored to gauge web agents' effectiveness in supporting routine tasks for knowledge workers. |
| 6 | +By building on the ubiquitous [ServiceNow](https://www.servicenow.com/what-is-servicenow.html) platform, this benchmark is intended to help assess the current state of such automation in modern knowledge work environments. |
6 | 7 |
|
7 | | -### ServiceNow Instance |
| 8 | +WorkArena is included in [BrowserGym](https://github.com/ServiceNow/BrowserGym), a conversational gym environment for the evaluation of web agents. |
8 | 9 |
|
9 | | -1. Go to https://developer.servicenow.com/ and create an account |
10 | | -2. Request a Utah developer instance (initializing it might take a while) |
11 | | -3. Log into your ServiceNow instance via the browser and change the admin password if instructed to do so. If you're already registered in the instance, you can find the instance information (Username, Password, instance URL) in the `My Instances` section of your developer account. |
12 | | -4. Set the following environment variables: |
| 10 | + |
| 11 | +## Benchmark Contents |
| 12 | + |
| 13 | +At the moment, WorkArena includes `23,150` task instances drawn from `29` tasks that cover the main components of the ServiceNow user interface. The following videos show an agent built on `GPT-4-vision` interacting with each of these components. As our results show, this benchmark is far from solved, so the agent does not always succeed. |
| 14 | + |
| 15 | +### Knowledge Bases |
| 16 | + |
| 17 | +**Goal:** The agent must search for specific information in the company knowledge base. |
| 18 | + |
| 19 | +_The agent interacts with the user via BrowserGym's conversational interface._ |
| 20 | + |
| 21 | +https://github.com/ServiceNow/ui-copilot/assets/2374980/a778fbfd-6f9c-41b2-9c20-1d97cc348866 |
| 22 | + |
| 23 | +### Forms |
| 24 | + |
| 25 | +**Goal:** The agent must fill out a complex form with specific values for each field. |
| 26 | + |
| 27 | +https://github.com/ServiceNow/ui-copilot/assets/2374980/1f3fa96d-d76e-4f04-a75f-bcf758c5aa42 |
| 28 | + |
| 29 | +### Service Catalogs |
| 30 | + |
| 31 | +**Goal:** The agent must order items with specific configurations from the company's service catalog. |
| 32 | + |
| 33 | +https://github.com/ServiceNow/ui-copilot/assets/2374980/8451faa8-3776-4e52-bb90-a560ea23a709 |
| 34 | + |
| 35 | +### Lists |
| 36 | + |
| 37 | +**Goal:** The agent must filter a list according to some specifications. |
| 38 | + |
| 39 | +_In this example, the agent struggles to manipulate the UI and fails to create the filter._ |
| 40 | + |
| 41 | +https://github.com/ServiceNow/ui-copilot/assets/2374980/042f058b-a966-4f5e-a38f-146464132c49 |
| 42 | + |
| 43 | +### Menus |
| 44 | + |
| 45 | +**Goal:** The agent must navigate to a specific application using the main menu. |
| 46 | + |
| 47 | +https://github.com/ServiceNow/ui-copilot/assets/2374980/d5f89fd0-ed72-49b8-81ce-8a493a2c8f5f |
| 48 | + |
| 49 | + |
| 50 | +## Getting Started |
| 51 | + |
| 52 | +To set up WorkArena, you will need to get your own ServiceNow instance, install our Python package, and upload some data to your instance. Follow the steps below to do this. |
| 53 | + |
| 54 | +### a) Create a ServiceNow Developer Instance |
| 55 | + |
| 56 | +1. Go to https://developer.servicenow.com/ and create an account. |
| 57 | +2. Click on `Request an instance` and select the `Vancouver` release (initializing the instance will take a few minutes). |
| 58 | +3. Once the instance is ready, you will see a popup showing its URL and credentials. You will also receive a copy by email. Based on this information, set the following environment variables: |
13 | 59 | * `SNOW_INSTANCE_URL`: URL of your ServiceNow developer instance |
14 | | - * `SNOW_INSTANCE_UNAME`: username for your instance (usually `admin`) |
15 | | - * `SNOW_INSTANCE_PWD`: password for your instance (you'll receive this by email and you can get it from your ServiceNow developer account) |
| 60 | + * `SNOW_INSTANCE_UNAME`: Just use "admin" |
| 61 | + * `SNOW_INSTANCE_PWD`: The password for your instance. Make sure you place the value in quotes "" since it might contain special characters. |
| 62 | +4. Log into your instance via a browser using the admin credentials. Close any popup that appears on the main screen (e.g., agreeing to analytics). |
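Before running `workarena-install`, it can help to confirm that the three variables from step 3 are actually visible to Python. The sketch below is a hypothetical helper, not part of the WorkArena package; only the variable names come from the instructions above, and the function name `missing_snow_vars` is an illustrative assumption.

```python
import os
from typing import List, Mapping

# Variable names taken from the setup instructions; the helper itself is hypothetical.
REQUIRED_VARS = ("SNOW_INSTANCE_URL", "SNOW_INSTANCE_UNAME", "SNOW_INSTANCE_PWD")


def missing_snow_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required ServiceNow variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    # Report (rather than abort) so the check is safe to run anywhere.
    missing = missing_snow_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All ServiceNow environment variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to test without touching your shell environment.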
16 | 63 |
|
17 | | -To set environment variables in Bash, you can use the `export` command. Here's an example: |
| 64 | +**Warning:** Feel free to look around the platform, but please make sure you revert any changes (e.g., changes to list views, pinning some menus, etc.) as these changes will be persistent and affect the benchmarking process. |
18 | 65 |
|
| 66 | +### b) Install WorkArena and Initialize your Instance |
| 67 | + |
| 68 | +Run the following command to install WorkArena in the [BrowserGym](https://github.com/servicenow/browsergym) environment: |
19 | 69 | ``` |
20 | | -export SNOW_INSTANCE_URL="https://your-instance-url.service-now.com" |
21 | | -export SNOW_INSTANCE_UNAME="your-username" |
22 | | -export SNOW_INSTANCE_PWD="your-password" |
| 70 | +pip install browsergym-workarena |
23 | 71 | ``` |
24 | 72 |
|
25 | | -Another option is to add the environment variables to your conda environment. To do this, you can execute the following command : |
26 | | - |
| 73 | +Then, run this command in a terminal to upload the benchmark data to your ServiceNow instance: |
27 | 74 | ``` |
28 | | -conda env config vars set ENV_VAR=VALUE |
| 75 | +workarena-install |
29 | 76 | ``` |
30 | 77 |
|
31 | | -### Install Data |
32 | | - |
33 | | -Run the following code to install all the data shipped with the benchmark: |
| 78 | +### c) Validate Your Installation |
34 | 79 |
|
| 80 | +There are a lot of moving parts (authentication credentials, benchmark data, etc.), so we highly recommend that you sanity-check your installation using our provided unit tests. Do this by running the following command (it might take a few minutes): |
35 | 81 | ``` |
36 | | -from browsergym.workarena.install import setup |
37 | | -setup() |
| 82 | +pytest -v . |
38 | 83 | ``` |
39 | 84 |
|
40 | | -### Finally |
| 85 | +Your installation is now complete! 🎉 |
| 86 | + |
41 | 87 |
|
42 | | -1. Run `pytest -v .` to make sure that your setup works. |
| 88 | +## Citing This Work |
| 89 | + |
| 90 | +``` |
| 91 | +@misc{workarena2024, |
| 92 | + title={WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?}, |
| 93 | + author={Alexandre Drouin and Maxime Gasse and Massimo Caccia and Issam H. Laradji and Manuel Del Verme and Tom Marty and Léo Boisvert and Megh Thakkar and Quentin Cappart and David Vazquez and Nicolas Chapados and Alexandre Lacoste}, |
| 94 | + year={2024}, |
| 95 | + eprint={2403.07718}, |
| 96 | + archivePrefix={arXiv}, |
| 97 | + primaryClass={cs.LG} |
| 98 | +} |
| 99 | +``` |