
Commit 79e647d

janvanrijn, mfeurer, and PGijsbers authored
Extend extensions page (#1080)
* started working on additional information for extension

* extended documentation

* final pass over extensions

* Update doc/extensions.rst

Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>

* Update doc/extensions.rst

Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>

* changes suggested by MF

* Update doc/extensions.rst

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

* Update doc/extensions.rst

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

* Update doc/extensions.rst

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

* added info to optional method

* fix documentation building

* updated doc

Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
1 parent bb17e72 commit 79e647d

3 files changed

Lines changed: 86 additions & 10 deletions


doc/contributing.rst

Lines changed: 1 addition & 5 deletions
@@ -25,8 +25,4 @@ In particular, a few ways to contribute to openml-python are:
2525

2626
* Visit one of our `hackathons <https://meet.openml.org/>`_.
2727

28-
* Contribute to another OpenML project, such as `the main OpenML project <https://github.com/openml/OpenML/blob/main/CONTRIBUTING.md>`_.
29-
30-
.. _extensions:
31-
32-
28+
* Contribute to another OpenML project, such as `the main OpenML project <https://github.com/openml/OpenML/blob/master/CONTRIBUTING.md>`_.

doc/extensions.rst

Lines changed: 82 additions & 4 deletions
@@ -27,9 +27,14 @@ Connecting new machine learning libraries
2727
Content of the Library
2828
~~~~~~~~~~~~~~~~~~~~~~
2929

30-
To leverage support from the community and to tap in the potential of OpenML, interfacing
31-
with popular machine learning libraries is essential. However, the OpenML-Python team does
32-
not have the capacity to develop and maintain such interfaces on its own. For this, we
30+
To leverage support from the community and to tap into the potential of OpenML,
31+
interfacing with popular machine learning libraries is essential.
32+
The OpenML-Python package is capable of downloading meta-data and results (data,
33+
flows, runs), regardless of the library that was used to upload them.
34+
However, in order to simplify the process of uploading flows and runs from a
35+
specific library, an additional interface can be built.
36+
The OpenML-Python team does not have the capacity to develop and maintain such
37+
interfaces on its own. For this reason, we
3338
have built an extension interface to allow others to contribute back. Building a suitable
3439
extension therefore requires an understanding of the current OpenML-Python support.
3540

@@ -48,7 +53,7 @@ API
4853
* This class needs to have all the functions from `class Extension` overridden as required.
4954
* The redefined functions should have adequate and appropriate docstrings. The
5055
:class:`Sklearn Extension API <openml.extensions.sklearn.SklearnExtension>`
51-
is a good benchmark to follow.
56+
is a good example to follow.
5257

5358

5459
Interfacing with OpenML-Python
@@ -57,6 +62,79 @@ Once the new extension class has been defined, the openml-python module to
5762
:meth:`openml.extensions.register_extension` must be called to allow OpenML-Python to
5863
interface with the new extension.
5964

65+
The following methods should be implemented. Although the documentation of
66+
the `Extension` interface is always authoritative, here we list some additional
67+
information and best practices.
68+
The :class:`Sklearn Extension API <openml.extensions.sklearn.SklearnExtension>`
69+
is a good example to follow. Note that most methods are relatively simple and can be implemented in a few lines of code.
70+
71+
* General setup (required)
72+
73+
* :meth:`can_handle_flow`: Takes an OpenML flow as argument and checks
74+
whether it can be handled by the current extension. The OpenML database
75+
contains many flows from various workbenches (e.g., scikit-learn, Weka,
76+
mlr). This method is called before a model is deserialized.
77+
Typically, the flow-dependency field is used to check whether the specific
78+
library is present and that no unknown libraries are listed there.
79+
* :meth:`can_handle_model`: Similar to :meth:`can_handle_flow`, except that
80+
in this case a Python object is given. As such, in many cases, this method
81+
can be implemented by checking whether the object is an instance of a certain base class.
82+
* Serialization and De-serialization (required)
83+
84+
* :meth:`flow_to_model`: deserializes the OpenML Flow into a model (if the
85+
library can indeed handle the flow). This method has an important interplay
86+
with :meth:`model_to_flow`.
87+
Running these two methods in succession should result in exactly the same
88+
model (or flow). This property can be used for unit testing (e.g., build a
89+
model with hyperparameters, make predictions on a task, serialize it to a flow,
90+
deserialize it back, make it predict on the same task, and check whether the
91+
predictions are exactly the same.)
92+
The example in the scikit-learn interface might seem daunting, but note that
93+
some complicated design choices were made there to allow for all sorts of
94+
interesting research questions. It is probably good practice to start simple.
95+
* :meth:`model_to_flow`: The inverse of :meth:`flow_to_model`. Serializes a
96+
model into an OpenML Flow. The flow should preserve the class, the library
97+
version, and the tunable hyperparameters.
98+
* :meth:`get_version_information`: Return a tuple with the version information
99+
of the important libraries.
100+
* :meth:`create_setup_string`: No longer used, and will be deprecated soon.
101+
* Performing runs (required)
102+
103+
* :meth:`is_estimator`: Takes a class as input and checks whether it has the
104+
status of an estimator in the library (typically, whether it has a train method
105+
and a predict method).
106+
* :meth:`seed_model`: Sets a random seed on the model.
107+
* :meth:`_run_model_on_fold`: One of the main requirements for a library to
108+
generate run objects for the OpenML server. Obtains a train split (with
109+
labels) and a test split (without labels) and the goal is to train a model
110+
on the train split and return the predictions on the test split.
111+
In addition to the actual predictions, the class probabilities should also be
112+
determined.
113+
For classifiers that do not return class probabilities, this can just be the
114+
one-hot encoded predicted label.
115+
The predictions will be evaluated on the OpenML server.
116+
Also, additional information can be returned, for example, user-defined
117+
measures (such as runtime information, as this cannot be inferred on the
118+
server).
119+
Additionally, information about a hyperparameter optimization trace can be
120+
provided.
121+
* :meth:`obtain_parameter_values`: Obtains the hyperparameters of a given
122+
model and their current values. Please note that in the case of a hyperparameter
123+
optimization procedure (e.g., random search), you should only return the
124+
hyperparameters of this procedure (e.g., the hyperparameter grid, budget,
125+
etc.) and that the chosen model will be inferred from the optimization trace.
126+
* :meth:`check_if_model_fitted`: Check whether the train method of the model
127+
has been called (and as such, whether the predict method can be used).
128+
* Hyperparameter optimization (optional)
129+
130+
* :meth:`instantiate_model_from_hpo_class`: If a given run has recorded the
131+
hyperparameter optimization trace, then this method can be used to
132+
reinstantiate the model with hyperparameters of a given hyperparameter
133+
optimization iteration. Has some similarities with :meth:`flow_to_model` (as
134+
this method also sets the hyperparameters of a model).
135+
Note that although this method must be defined, it is not necessary to implement
136+
any logic if hyperparameter optimization is not supported. Simply raise
137+
a `NotImplementedError` in that case.
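
The interplay between :meth:`can_handle_model`, :meth:`can_handle_flow`, :meth:`model_to_flow` and :meth:`flow_to_model` described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual OpenML-Python API: ``ToyModel``, ``ToyFlow``, ``ToyExtension`` and the library name ``toylib`` are hypothetical stand-ins, and a real extension would subclass the `Extension` interface and work with real OpenML flow objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


class ToyModel:
    """Hypothetical base class of an imaginary library called 'toylib'."""

    def __init__(self, depth: int = 3, rate: float = 0.1) -> None:
        self.depth = depth
        self.rate = rate

    def get_params(self) -> Dict[str, Any]:
        # The tunable hyperparameters and their current values.
        return {"depth": self.depth, "rate": self.rate}


@dataclass
class ToyFlow:
    """Stand-in for an OpenML flow (assumption: not the real flow class)."""

    class_name: str
    dependencies: str
    parameters: Dict[str, Any] = field(default_factory=dict)


class ToyExtension:
    """Illustrative extension; names and fields are assumptions."""

    LIBRARY = "toylib"
    VERSION = "0.1.0"

    @classmethod
    def can_handle_model(cls, model: Any) -> bool:
        # Checking against a base class is often sufficient.
        return isinstance(model, ToyModel)

    @classmethod
    def can_handle_flow(cls, flow: ToyFlow) -> bool:
        # Accept the flow only if every listed dependency is our library.
        deps = [d.strip() for d in flow.dependencies.split("\n") if d.strip()]
        return bool(deps) and all(d.startswith(cls.LIBRARY + "==") for d in deps)

    def model_to_flow(self, model: ToyModel) -> ToyFlow:
        # Preserve the class, the library version and the hyperparameters.
        return ToyFlow(
            class_name=type(model).__name__,
            dependencies=f"{self.LIBRARY}=={self.VERSION}",
            parameters=model.get_params(),
        )

    def flow_to_model(self, flow: ToyFlow) -> ToyModel:
        if not self.can_handle_flow(flow):
            raise ValueError("Flow cannot be handled by this extension")
        return ToyModel(**flow.parameters)


# Round trip: serialize, deserialize, and compare the hyperparameters.
extension = ToyExtension()
original = ToyModel(depth=5, rate=0.5)
flow = extension.model_to_flow(original)
restored = extension.flow_to_model(flow)
round_trip_ok = restored.get_params() == original.get_params()
```

A unit test for a real extension could follow the same pattern and additionally compare predictions on a task before and after the round trip.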
60138

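
For the :meth:`_run_model_on_fold` item above, the fallback for classifiers without class probabilities can be illustrated with a small helper. This is only a sketch under assumptions: the function name ``one_hot_probabilities`` and its signature are hypothetical and not part of OpenML-Python, and the format in which a real extension has to return probabilities may differ.

```python
from typing import List, Sequence


def one_hot_probabilities(
    predictions: Sequence[str], class_labels: Sequence[str]
) -> List[List[float]]:
    """Turn hard label predictions into one-hot 'probability' rows."""
    # Map each class label to its column position.
    index = {label: i for i, label in enumerate(class_labels)}
    rows = []
    for predicted in predictions:
        row = [0.0] * len(class_labels)
        row[index[predicted]] = 1.0  # all probability mass on the predicted class
        rows.append(row)
    return rows


# Three test instances, three possible classes.
probabilities = one_hot_probabilities(["b", "a", "c"], ["a", "b", "c"])
```

Each row sums to one and puts all mass on the predicted class, so the result can be treated like ordinary class probabilities.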
61139
Hosting the library
62140
~~~~~~~~~~~~~~~~~~~

doc/usage.rst

Lines changed: 3 additions & 1 deletion
@@ -77,7 +77,9 @@ Docker
7777

7878
It is also possible to try out the latest development version of ``openml-python`` with docker:
7979

80-
``docker run -it openml/openml-python``
80+
.. code:: bash
81+
82+
    docker run -it openml/openml-python
8183
8284
See the `openml-python docker documentation <https://github.com/openml/openml-python/blob/main/docker/readme.md>`_ for more information.
8385
