Merge branch 'main' into update-concept-description

PGijsbers · PGijsbers · commit 9421f794a5ba · 2026-02-04T18:03:10.000+02:00
diff --git a/README.md b/README.md
@@ -5,22 +5,31 @@ The OpenML documentation in written in MarkDown. The sources are generated by [M
 
 The overall structure (navigation) of the docs is configured in the `mkdocs.yml` file.
 
-Some of the API's use other documentation generators, such as [Sphinx](https://restcoder.readthedocs.io/en/latest/sphinx-docgen.html) in openml-python. This documentation is pulled in via iframes to gather all docs into the same place, but they need to be edited in their own GitHub repo's.
+Documentation for the other APIs is pulled in using the [multirepo plugin](https://github.com/jdoiro3/mkdocs-multirepo-plugin) to gather all docs in one place, but it must be edited in the respective GitHub repositories. This keeps the documentation close to the code and lets it follow the conventions of the respective community.
 
 ## Editing documentation
 Documentation can be edited by simply editing the markdown files in the `docs` folder and creating a pull request.
 
 End users can edit the docs by simply clicking the edit button (the pencil icon) on the top of every documentation page. It will open up an editing page on [GitHub](https://github.com/) (you do need to be logged in on GitHub). When you are done, add a small message explaining the change and click 'commit changes'. On the next page, just launch the pull request. We will then review it and approve the changes, or discuss them if necessary. 
 
+For more information on how to write and build the documentation locally, see our [contributing page](./docs/contributing/OpenML-Docs.md#general-documentation).
+
 ## Deployment
 The documentation is hosted on GitHub pages.
 
-To deploy the documentation, you need to have MkDocs and MkDocs-Material installed, and then run `mkdocs gh-deploy` in the top directory (with the `mkdocs.yml` file). This will build the HTML files and push them to the gh-pages branch of openml/docs. `https://docs.openml.org` is just a reverse proxy for `https://openml.github.io/docs/`.  
+To deploy the documentation, you need MkDocs and the required plugins installed locally. Then run `mkdocs gh-deploy` in the top directory (the one containing the `mkdocs.yml` file). This builds the HTML files and pushes them to the gh-pages branch of openml/docs. `https://docs.openml.org` is just a reverse proxy for `https://openml.github.io/docs/`.
+
+MkDocs and all required extensions can be installed as follows:
+```
+pip install -r requirements.txt
+```
 
-MKDocs and MkDocs-Material can be installed as follows:
+To preview the documentation locally, run:
 ```
-pip install mkdocs
-pip install mkdocs-material
-pip install -U fontawesome_markdown
+mkdocs serve
 ```
 
+To deploy to GitHub Pages, run:
+```
+mkdocs gh-deploy
+```
diff --git a/docs/concepts/benchmarking.md b/docs/concepts/benchmarking.md
@@ -9,11 +9,11 @@ Collections of tasks can be published as _benchmarking suites_. Seamlessly integ
 - standardized train-test splits are provided to ensure that results can be objectively compared
 - results can be shared in a reproducible way through the APIs
 - results from other users can be easily downloaded and reused 
 
-You can search for <a href="https://www.openml.org/search?type=benchmark&sort=tasks_included&study_type=task" target="_blank">all existing benchmarking suites</a> or create your own. For all further details, see the [benchmarking guide](../benchmark/benchmark.md).
+You can search for <a href="https://www.openml.org/search?type=benchmark&sort=tasks_included&study_type=task" target="_blank">all existing benchmarking suites</a> or create your own. For all further details, see the [benchmarking guide](../benchmark/index.md).
 
 <img src="../../img/studies.png" style="width:100%; max-width:800px;"/>
 
 ## Benchmark studies
 Collections of runs can be published as _benchmarking studies_. They contain the results of all runs (possibly millions) executed on a specific benchmarking suite. OpenML allows you to easily download all such results at once via the APIs, and also to visualize them online in the Analysis tab (next to the complete list of included tasks and runs). Below is an example of <a href="https://www.openml.org/search?type=benchmark&study_type=run&id=226" target="_blank">a benchmark study for AutoML algorithms</a>.
 
-<img src="../../img/run_study.png" style="width:100%; max-width:1000px;"/>
+<img src="../../img/run_study.png" style="width:100%; max-width:1000px;"/>
diff --git a/docs/contributing/OpenML-Docs.md b/docs/contributing/OpenML-Docs.md
@@ -1,23 +1,34 @@
+## Documentation
+
+OpenML's documentation consists of general information pages, such as these, which explain common concepts.
+Additionally, each software package, such as the Python, Java, and R connectors, has its own documentation.
+For convenience, those documentation pages are also available through this common documentation portal.
+
+We always value contributions to our documentation. If you notice any mistake in these documentation pages, click the :material-pencil: button (on the top right). It will open up an editing page on [GitHub](https://github.com/) (you do need to be logged in). When you are done, add a small message explaining the change and click 'commit changes'. On the next page, just launch the pull request. We will then review it and approve the changes, or discuss them if necessary.
+
+Below you can find more information about how each set of documentation pages is built.
+
 ## General Documentation
-High-quality and up-to-date documentation are crucial. If you notice any mistake in these documentation pages, click the :material-pencil: button (on the top right). It will open up an editing page on [GitHub](https://github.com/) (you do need to be logged in). When you are done, add a small message explaining the change and click 'commit changes'. On the next page, just launch the pull request. We will then review it and approve the changes, or discuss them if necessary.
 
 The sources are generated by [MkDocs](http://www.mkdocs.org/), using the [Material theme](https://squidfunk.github.io/mkdocs-material/).
 Check these docs to see what is possible in terms of styling.
 
-OpenML is a big project with multiple repositories. To keep the documentation close to the code, it will always be kept in the relevant repositories (see below), and 
+OpenML is a big project with multiple repositories.
+To keep the documentation close to the code, it is always kept in the relevant repositories (see below) and
 combined into these documentation pages using [MkDocs multirepo](https://github.com/jdoiro3/mkdocs-multirepo-plugin/issues/3).
 
-!!! note "Developer note"
-    To work on the documentation locally, do the following:
-    ```
-    git clone https://github.com/openml/docs.git
-    pip install -r requirements.txt
-    ```
-    To build the documentation, run `mkdocs serve` in the top directory (with the `mkdocs.yml` file). Any changes made after that will be hot-loaded.
+To build the documentation locally, first make sure all dependencies specified in `requirements.txt` are installed:
+
+```bash
+git clone https://github.com/openml/docs.git
+cd docs
+python -m venv .venv
+source .venv/bin/activate  # on Windows: .venv\Scripts\activate
+python -m pip install uv
+uv pip install -r requirements.txt
+```
 
-    The documentation will be auto-deployed with every push or merge with the master branch of `https://www.github.com/openml/docs/`. In the background, a CI job
-    will run `mkdocs gh-deploy`, which will build the HTML files and push them to the gh-pages branch of openml/docs. `https://docs.openml.org` is just a reverse proxy for `https://openml.github.io/docs/`.
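+After installing the dependencies, run `mkdocs serve -f mkdocs-local.yml` in the top directory (the one containing the `mkdocs.yml` file). This builds only the pages in this repository, without importing documentation from other repositories, so it is faster. Any changes you make afterwards are hot-reloaded.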
 
+To build the full documentation, including the documentation imported from other repositories, run `mkdocs serve` in the top directory. This can take a while to compile, so only use it when needed. You might also need to set the environment variable `NUMPY_EXPERIMENTAL_DTYPE_API=1` (`export NUMPY_EXPERIMENTAL_DTYPE_API=1` on Linux/macOS, or `set NUMPY_EXPERIMENTAL_DTYPE_API=1` in the Windows command prompt).
 
 ## Python API
 To edit the tutorial, you have to edit the `reStructuredText` files on [openml-python/doc](https://github.com/openml/openml-python/tree/master/doc). When done, you can do a pull request.
diff --git a/docs/index.md b/docs/index.md
@@ -15,56 +15,15 @@ icon: material/creation
 <p><i class="fa fa-graduation-cap fa-fw fa-lg"></i>&nbsp; Make your work more visible and reusable</p>
 <p><i class="fa fa-bolt fa-fw fa-lg"></i>&nbsp; Built for automation: streamline your experiments and model building</p>
 
-## Installation
+## How to use OpenML
 
-The OpenML package is available in many languages and across libraries. For more information about them, see the [Integrations](./ecosystem/index.md) page.<br><br>
+OpenML is accessible to a wide range of people:  
 
-=== "Python/sklearn"
+:computer: <a href="https://www.openml.org" target="_blank">Explore the OpenML website</a> to discover, download and upload ML resources.
 
-    - [Python/sklearn repository](https://github.com/openml/openml-python)
-    -  `pip install openml`
+:robot: [Install an OpenML library](intro/index.md) to access and share resources programmatically through our APIs. Select one of the detailed guides in the top menu.
 
-=== "Pytorch"
-
-    -  [Pytorch repository](https://github.com/openml/openml-pytorch)
-    -  `pip install openml-pytorch`
-
-=== "Keras"
-
-    - [Keras repository](https://github.com/openml/openml-keras)
-    - `pip install openml-keras`
-
-=== "TensorFlow"
-    
-    - [TensorFlow repository](https://github.com/openml/openml-tensorflow)
-    - `pip install openml-tensorflow`
-  
-=== "R"
-        
-    - [R repository](https://github.com/openml/openml-R)
-    - `install.packages("mlr3oml")`
-=== "Julia"
-        
-    - [Julia repository](https://github.com/JuliaAI/OpenML.jl/tree/master)
-    - `using Pkg;Pkg.add("OpenML")`
-
-=== "RUST"
-        
-    - [RUST repository](https://github.com/mbillingr/openml-rust)
-    - Install from source
-
-=== ".Net"
-        
-    - [.Net repository](https://github.com/openml/openml-dotnet)
-    - `Install-Package openMl`
-
-
-You might also need to set up the API key. For more information, see [Authentication](http://localhost:8000/concepts/openness/).
-
-## Learning OpenML
-
-Aside from the individual package documentations, you can learn more about OpenML through the following resources:<br>
-The core concepts of OpenML are explained in the [Concepts](./concepts/index.md) page. These concepts include the principle behind using Datasets, Runs, Tasks, Flows, Benchmarking and much more. Going through them will help you leverage OpenML even better in your work.<br>
+:mortar_board: [Get started](./concepts/index.md) by learning more about the structure and concepts behind OpenML, such as Datasets, Tasks, Flows, Runs, Benchmarking and much more. This will help you leverage OpenML even better in your work.
 
 ## Contributing to OpenML
 
diff --git a/docs/intro/index.md b/docs/intro/index.md
@@ -0,0 +1,107 @@
+---
+icon: material/rocket-launch
+---
+
+## :computer: Installation
+
+The OpenML package is available in many languages and has deep integration in many machine learning libraries.
+
+=== "Python/sklearn"
+
+    - [Python/sklearn repository](https://github.com/openml/openml-python)
+    -  `pip install openml`
+
+=== "PyTorch"
+
+    -  [PyTorch repository](https://github.com/openml/openml-pytorch)
+    -  `pip install openml-pytorch`
+
+=== "TensorFlow"
+    
+    - [TensorFlow repository](https://github.com/openml/openml-tensorflow)
+    - `pip install openml-tensorflow`
+  
+=== "R"
+        
+    - [R repository](https://github.com/openml/openml-R)
+    - `install.packages("mlr3oml")`
+
+=== "Julia"
+        
+    - [Julia repository](https://github.com/JuliaAI/OpenML.jl/tree/master)
+    - `using Pkg;Pkg.add("OpenML")`
+
+=== "Rust"
+
+    - [Rust repository](https://github.com/mbillingr/openml-rust)
+    - Install from source
+
+=== ".NET"
+
+    - [.NET repository](https://github.com/openml/openml-dotnet)
+    - `Install-Package openMl`
+
+You can find detailed guides for the different libraries in the top menu.
+
+
+## :key: Authentication
+
+OpenML is entirely open, and you do not need an account to access data (rate limits apply). However, <a href="https://www.openml.org" target="_blank">signing up via the OpenML website</a> is easy, free, and required to upload new resources to OpenML and to manage them online.
+
+API authentication happens via an **API key**, which you can find in your profile after logging in to openml.org. In Python, set it as follows:
+
+```python
+import openml
+
+openml.config.apikey = "YOUR KEY"
+```
+
+## :joystick: Minimal Example
+
+:material-database: Use the following code to load the [credit-g](https://www.openml.org/search?type=data&sort=runs&status=active&id=31) [dataset](https://docs.openml.org/concepts/data/) directly into a pandas DataFrame. OpenML downloads the data for you, separates the features X from the labels y, and returns useful dataset metadata, such as the feature names and which features are categorical.
+
+```python
+import openml
+
+dataset = openml.datasets.get_dataset("credit-g") # or by ID get_dataset(31)
+X, y, categorical_indicator, attribute_names = dataset.get_data(target="class")
+```
+
+
+:trophy: Get a [task](https://docs.openml.org/concepts/tasks/) for [supervised classification on credit-g](https://www.openml.org/search?type=task&id=31&source_data.data_id=31).
+Tasks specify how a dataset should be used, for example which feature is the target and which train/test splits to evaluate on.
+
+```python
+task = openml.tasks.get_task(31)
+dataset = task.get_dataset()
+X, y, categorical_indicator, attribute_names = dataset.get_data(target=task.target_name)
+# get splits for the first fold of 10-fold cross-validation
+train_indices, test_indices = task.get_train_test_split_indices(fold=0)
+```
+
+:bar_chart: Use an [OpenML benchmarking suite](https://docs.openml.org/concepts/benchmarking/) to get a curated list of machine-learning tasks:
+```python
+suite = openml.study.get_suite("amlb-classification-all")  # Get a curated list of tasks for classification
+for task_id in suite.tasks:
+    task = openml.tasks.get_task(task_id)
+```
+
+:star2: Training and evaluating a model on a task is called a run. Because each task defines its own target and splits, you can benchmark your models across many datasets in the same way:
+
+```python
+from sklearn import neighbors
+
+task = openml.tasks.get_task(403)
+clf = neighbors.KNeighborsClassifier(n_neighbors=5)
+run = openml.runs.run_model_on_task(clf, task)
+```
+
+:raised_hands: You can now publish your experiment on OpenML so that others can build on it:
+
+```python
+myrun = run.publish()
+print(f"kNN on {task.get_dataset().name}: {myrun.openml_url}")
+```
+
+
+## Learning more about OpenML
+
+Next, check out the :rocket: [10-minute tutorial](../notebooks/getting_started.ipynb) and the :mortar_board: [short description of OpenML concepts](../concepts/index.md).
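The new intro page calls `task.get_train_test_split_indices(fold=0)` to get "the first fold of 10-fold cross-validation". As a rough illustration of what such fold indices look like, here is a minimal, stdlib-only sketch of plain k-fold splitting. `kfold_indices` is a hypothetical helper, not part of openml-python, and it does not reproduce OpenML's splits: OpenML stores predefined splits with each task (which may, for example, be shuffled or stratified), and `get_train_test_split_indices` returns those.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, n_folds: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for plain, unshuffled k-fold CV.

    Hypothetical helper for illustration only; it does NOT reproduce the
    predefined splits that OpenML stores with each task.
    """
    if not 1 < n_folds <= n_samples:
        raise ValueError("require 1 < n_folds <= n_samples")
    base, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds get one extra row, so fold sizes differ by at most one.
        size = base + (1 if fold < remainder else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Fold 0 of 10-fold CV on 1000 rows: rows 0-99 are held out for testing.
train_indices, test_indices = next(kfold_indices(1000, n_folds=10))
print(len(train_indices), len(test_indices))  # 900 100
```

Each fold holds out a disjoint block of rows as its test set and trains on all remaining rows, so every row is tested exactly once across the folds. With a pandas DataFrame `X`, indices like these can be used to select rows, e.g. `X.iloc[train_indices]`.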
diff --git a/docs/notebooks/getting_started.ipynb b/docs/notebooks/getting_started.ipynb
@@ -49,7 +49,7 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "# Getting Started\n",
+        "# OpenML in 10 minutes\n",
         "\n",
         "This page will guide you through the process of getting started with OpenML. While this page is a good starting point, for more detailed information, please refer to the [integrations section](Scikit-learn/index.md) and the rest of the documentation.\n",
         "\n"
diff --git a/mkdocs-local.yml b/mkdocs-local.yml
@@ -82,6 +82,12 @@ markdown_extensions:
 plugins:
     - autorefs
     - section-index
+    - mkdocs-jupyter:
+        ignore: ['temp_dir/**/*','docs/examples/**/*']
+        theme: light
+        remove_tag_config:
+            remove_input_tags:
+                - hide_code
     - redirects:
         redirect_maps:
             'APIs.md': 'https://www.openml.org/apis'
@@ -98,9 +104,10 @@ plugins:
     - git-committers:
         repository: openml/docs
 nav:
-    - OpenML:
-        - Introduction: index.md
-        - Getting Started: notebooks/getting_started.ipynb
+    - OpenML: index.md
+    - Get Started:
+        - OpenML: intro/index.md
+        - 10 Minute Tutorial: notebooks/getting_started.ipynb
         - Concepts:
             - Main concepts: concepts/index.md
             - Data: concepts/data.md
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -120,6 +120,8 @@ plugins:
                     docstring_section_style: table
                     show_docstring_functions: true
                     docstring_style: numpy
+                    follow_imports: false
+                    show_submodules: false
     - gen-files:
         scripts:
             - scripts/gen_python_ref_pages.py
@@ -131,9 +133,10 @@ plugins:
     - git-committers:
         repository: openml/docs
 nav:
-    - OpenML:
-        - Introduction: index.md
-        - Getting Started: notebooks/getting_started.ipynb
+    - OpenML: index.md
+    - Get Started:
+        - OpenML: intro/index.md
+        - 10 Minute Tutorial: notebooks/getting_started.ipynb
         - Concepts:
             - Main concepts: concepts/index.md
             - Data: concepts/data.md
@@ -213,6 +216,7 @@ extra_css:
     - css/extra.css
 extra_javascript:
     - js/extra.js
+    - js/reset_nav.js
 exclude_docs: |
     scripts/
     old/
diff --git a/requirements.txt b/requirements.txt
@@ -5,14 +5,15 @@ mkdocs-redirects==1.2.1
 mkdocs-jupyter==0.25.0
 mkdocs-awesome-pages-plugin==2.9.3
 mkdocs-multirepo-plugin==0.8.3
-mkdocs-autorefs
-mkdocs-section-index
-mkdocs-gen-files
-mkdocs-literate-nav
-mkdocs-git-committers-plugin-2
-mkdocs-git-revision-date-localized-plugin
-mkdocstrings
-mkdocstrings-python
-markdown-include
+mkdocs-autorefs==1.2.0
+mkdocs-section-index==0.3.9
+mkdocs-gen-files==0.5.0
+mkdocs-literate-nav==0.6.1
+mkdocs-git-committers-plugin-2==2.5.0
+mkdocs-git-revision-date-localized-plugin==1.3.0
+mkdocstrings==0.26.2
+mkdocstrings-python==1.12.1
+markdown-include==0.8.1
 notebook==6.4.12
-tqdm
+jupyter_contrib_nbextensions==0.7.0
+tqdm
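The `requirements.txt` change pins most documentation dependencies to exact versions, but `tqdm` is still listed without one. A small stdlib-only sketch like the following could flag any remaining unpinned entries. The `unpinned_requirements` helper is hypothetical and deliberately simple: it treats only `name==version` lines as pinned and does not understand extras, environment markers, or pip options such as `-r`.

```python
import re
from typing import List

# Matches a requirement pinned to an exact version, e.g. "mkdocs-jupyter==0.25.0".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[^\s=]+$")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned with '=='.

    Blank lines and comments are skipped; everything else that does not
    match `name==version` is reported as unpinned.
    """
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = """\
mkdocs-jupyter==0.25.0
mkdocs-autorefs==1.2.0
notebook==6.4.12
tqdm
"""
print(unpinned_requirements(sample))  # ['tqdm']
```

To check the real file, pass it `pathlib.Path("requirements.txt").read_text()` from the repository root.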
diff --git a/scripts/gen_python_ref_pages.py b/scripts/gen_python_ref_pages.py