---
icon: material/rocket-launch
---

## :computer: Installation

The OpenML package is available in many languages and integrates deeply with many machine learning libraries.

=== "Python/sklearn"

    - [Python/sklearn repository](https://github.com/openml/openml-python)
    - `pip install openml`

=== "PyTorch"

    - [PyTorch repository](https://github.com/openml/openml-pytorch)
    - `pip install openml-pytorch`

=== "TensorFlow"

    - [TensorFlow repository](https://github.com/openml/openml-tensorflow)
    - `pip install openml-tensorflow`

=== "R"

    - [R repository](https://github.com/openml/openml-R)
    - `install.packages("mlr3oml")`

=== "Julia"

    - [Julia repository](https://github.com/JuliaAI/OpenML.jl/tree/master)
    - `using Pkg; Pkg.add("OpenML")`

=== "Rust"

    - [Rust repository](https://github.com/mbillingr/openml-rust)
    - Install from source

=== ".NET"

    - [.NET repository](https://github.com/openml/openml-dotnet)
    - `Install-Package openMl`

You can find detailed guides for the different libraries in the top menu.


## :key: Authentication

OpenML is entirely open, and you do not need an account to access data (rate limits apply). However, <a href="https://www.openml.org" target="_blank">signing up via the OpenML website</a> is easy and free, and an account is required to upload new resources to OpenML and to manage them online.

API authentication happens via an **API key**, which you can find in your profile after logging in to openml.org.

```python
import openml
openml.config.apikey = "YOUR KEY"
```

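Hard-coding the key in a script makes it easy to leak through version control. The following is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The variable name `OPENML_API_KEY` and the helper `read_api_key` are hypothetical choices for this example and are not part of the OpenML API.

```python
import os


def read_api_key(env=os.environ, var_name="OPENML_API_KEY"):
    """Return the API key stored in `var_name`, or None if it is unset or blank.

    Both the variable name and this helper are illustrative assumptions.
    """
    key = env.get(var_name, "").strip()
    return key or None


api_key = read_api_key()
if api_key is not None:
    import openml  # imported lazily so the helper itself has no dependencies

    openml.config.apikey = api_key
```

If the variable is not set, the configuration is left untouched, which is sufficient for read-only access to public data.
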
## :joystick: Minimal Example

:material-database: Use the following code to load the [credit-g](https://www.openml.org/search?type=data&sort=runs&status=active&id=31) [dataset](https://docs.openml.org/concepts/data/) directly into a pandas dataframe. OpenML can load all datasets automatically, separate the data `X` from the labels `y`, and give you useful dataset metadata (e.g. feature names and which features are categorical).

```python
import openml

dataset = openml.datasets.get_dataset("credit-g")  # or by ID: get_dataset(31)
X, y, categorical_indicator, attribute_names = dataset.get_data(target="class")
```


:trophy: Get a [task](https://docs.openml.org/concepts/tasks/) for [supervised classification on credit-g](https://www.openml.org/search?type=task&id=31&source_data.data_id=31).
Tasks specify how a dataset should be used, including, for example, the train and test splits.

```python
task = openml.tasks.get_task(31)
dataset = task.get_dataset()
X, y, categorical_indicator, attribute_names = dataset.get_data(target=task.target_name)
# get splits for the first fold of 10-fold cross-validation
train_indices, test_indices = task.get_train_test_split_indices(fold=0)
```

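The index arrays returned for a fold are row positions into `X` and `y`. Below is a minimal sketch of how they select the two partitions, using plain Python lists as stand-ins for the real data so that it runs without downloading anything. The helper `select_rows` and the toy values are illustrative assumptions. With the pandas objects returned by `get_data`, the equivalent selection would be `X.iloc[train_indices]` and `y.iloc[train_indices]`.

```python
def select_rows(rows, indices):
    """Return the rows at the given positions, in the order of `indices`."""
    return [rows[i] for i in indices]


# Toy stand-ins for the feature rows and labels of a six-row dataset.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = ["good", "bad", "good", "good", "bad", "good"]

# Stand-ins for the arrays returned by task.get_train_test_split_indices(fold=0).
train_indices = [0, 1, 3, 4]
test_indices = [2, 5]

X_train, y_train = select_rows(X, train_indices), select_rows(y, train_indices)
X_test, y_test = select_rows(X, test_indices), select_rows(y, test_indices)

# The two partitions are disjoint and together cover every row.
assert set(train_indices).isdisjoint(test_indices)
assert len(X_train) + len(X_test) == len(X)
```

Because the splits are defined by the task rather than generated locally, everyone who evaluates on the same task and fold uses the same rows.
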
:bar_chart: Use an [OpenML benchmarking suite](https://docs.openml.org/concepts/benchmarking/) to get a curated list of machine-learning tasks:

```python
suite = openml.study.get_suite("amlb-classification-all")  # Get a curated list of tasks for classification
for task_id in suite.tasks:
    task = openml.tasks.get_task(task_id)
```

:star2: You can now benchmark your models easily across many datasets at once. Each training of a model on a task is called a run:

```python
from sklearn import neighbors

task = openml.tasks.get_task(403)
clf = neighbors.KNeighborsClassifier(n_neighbors=5)
run = openml.runs.run_model_on_task(clf, task)
```

:raised_hands: You can now publish your experiment on OpenML so that others can build on it:

```python
myrun = run.publish()
print(f"kNN on {task.get_dataset().name}: {myrun.openml_url}")
```


## Learning more about OpenML

Next, check out the :rocket: [10 minute tutorial](notebooks/getting_started.ipynb) and the :mortar_board: [short description of OpenML concepts](concepts/index.md).