| 1 | +# Advanced User Guide |
| 2 | + |
| 3 | +This document highlights some of the more advanced features of |
| 4 | +`openml-python`. |
| 5 | + |
| 6 | +## Configuration |
| 7 | + |
| 8 | +The configuration file is called `config` and resides in the directory |
| 9 | +`.config/openml` in the user's home directory. More specifically, it |
| 10 | +resides in the [configuration directory specified by the XDG Base |
| 11 | +Directory |
| 12 | +Specification](https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html). |
| 13 | +It consists of `key = value` pairs which are separated by newlines. The |
| 14 | +following keys are defined: |
| 15 | + |
| 16 | +- apikey: required to access the server. |
| 17 | +- server: the server to connect to (default: `http://www.openml.org`). |
| 18 | + For connection to the test server, set this to `test.openml.org`. |
| 19 | +- cachedir: the root folder where the cache file directories should be created. |
| 20 | + If not given, will default to `~/.openml/cache`. |
| 21 | +- avoid_duplicate_runs: if set to `True` (default), certain functions |
| 22 | + first check whether a matching run already exists on the |
| 23 | + server. If so, those results are downloaded instead of |
| 24 | + running the experiment again. |
| 25 | +- retry_policy: defines how to react when the server is unavailable or |
| 26 | + experiencing high load. It determines both how often to |
| 27 | + attempt to reconnect and how quickly to do so. Please don't |
| 28 | + use `human` in an automated script that you run more than |
| 29 | + one instance of, as it might increase the time to complete |
| 30 | + your jobs and those of others. One of: |
| 31 | + - human (default): For people running openml in interactive |
| 32 | + fashion. Try only a few times, but in quick succession. |
| 33 | + - robot: For people using openml in an automated fashion. Keep |
| 34 | + trying to reconnect for a longer time, quickly increasing |
| 35 | + the time between retries. |
| 36 | + |
| 37 | +- connection_n_retries: number of times to retry a request if it fails. |
| 38 | + The default depends on `retry_policy` (5 for `human`, 50 for `robot`). |
| 39 | +- verbosity: the level of output: |
| 40 | + - 0: normal output |
| 41 | + - 1: info output |
| 42 | + - 2: debug output |
| 43 | + |
| 44 | +This file can be edited through the `openml` command line interface. |
| 45 | +To see where the file is stored, and what its values are, use |
| 46 | +`openml configure none`. |
| 47 | + |
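The location and format described above can be illustrated with a short sketch. This is not how `openml-python` itself loads its configuration: the helper names `default_config_path` and `parse_config` are hypothetical, and the sample values are placeholders. It assumes only what is stated above, namely that the file is called `config`, lives under `openml` in the XDG configuration directory (`$XDG_CONFIG_HOME`, falling back to `~/.config`), and holds newline-separated `key = value` pairs.

```python
import os
from pathlib import Path


def default_config_path(env=None):
    """Return the expected config file path (hypothetical helper)."""
    env = os.environ if env is None else env
    # XDG Base Directory Specification: use $XDG_CONFIG_HOME if it is set
    # and non-empty, otherwise fall back to ~/.config.
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "openml" / "config"


def parse_config(text):
    """Parse newline-separated `key = value` pairs into a dict of strings."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Placeholder values for illustration only.
sample = """
apikey = YOUR_API_KEY
server = test.openml.org
retry_policy = robot
verbosity = 1
"""

config = parse_config(sample)
print(default_config_path())
print(config["retry_policy"])
```

All values are kept as strings here; a real loader would still need to convert entries such as `verbosity` or `connection_n_retries` to integers.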
| 48 | +## Docker |
| 49 | + |
| 50 | +It is also possible to try out the latest development version of |
| 51 | +`openml-python` with docker: |
| 52 | + |
| 53 | +``` bash |
| 54 | +docker run -it openml/openml-python |
| 55 | +``` |
| 56 | + |
| 57 | +See the [openml-python docker |
| 58 | +documentation](https://github.com/openml/openml-python/blob/main/docker/readme.md) |
| 59 | +for more information. |
| 60 | + |
| 61 | +## Key concepts |
| 62 | + |
| 63 | +OpenML is built around a few key concepts that make machine learning |
| 64 | +research shareable. A machine learning experiment consists of one or |
| 65 | +several **runs**, each of which describes the performance of an algorithm |
| 66 | +(called a **flow** in OpenML) with its hyperparameter settings (called a |
| 67 | +**setup**) on a **task**. A **task** is the combination of a |
| 68 | +**dataset**, a split and an evaluation metric. In this user guide we |
| 69 | +will go from listing and exploring existing **tasks** to actually |
| 70 | +running machine learning algorithms on them. In a further user guide we |
| 71 | +will examine how to search through **datasets** in order to curate a |
| 72 | +list of **tasks**. |
| 73 | + |
| 74 | +A further explanation is given in the [OpenML user |
| 75 | +guide](https://openml.github.io/OpenML/#concepts). |
| 76 | + |