@@ -27,9 +27,14 @@ Connecting new machine learning libraries
2727Content of the Library
2828~~~~~~~~~~~~~~~~~~~~~~
2929
30- To leverage support from the community and to tap in the potential of OpenML, interfacing
31- with popular machine learning libraries is essential. However, the OpenML-Python team does
32- not have the capacity to develop and maintain such interfaces on its own. For this, we
30+ To leverage support from the community and to tap into the potential of OpenML,
31+ interfacing with popular machine learning libraries is essential.
32+ The OpenML-Python package is capable of downloading meta-data and results (data,
33+ flows, runs), regardless of the library that was used to upload them.
34+ However, in order to simplify the process of uploading flows and runs from a
35+ specific library, an additional interface can be built.
36+ The OpenML-Python team does not have the capacity to develop and maintain such
37+ interfaces on its own. For this reason, we
3338have built an extension interface to allow others to contribute back. Building a suitable
3439extension therefore requires an understanding of the current OpenML-Python support.
3540
4853* This class needs to have all the functions from `class Extension` overloaded as required.
4954* The redefined functions should have adequate and appropriate docstrings. The
5055 `Sklearn Extension API :class:`openml.extensions.sklearn.SklearnExtension.html`
51- is a good benchmark to follow.
56+ is a good example to follow.
5257
5358
5459Interfacing with OpenML-Python
@@ -57,6 +62,79 @@ Once the new extension class has been defined, the openml-python module to
5762:meth:`openml.extensions.register_extension` must be called to allow OpenML-Python to
5863interface with the new extension.
5964
65+ The following methods should be implemented. Although the documentation in
66+ the `Extension` interface always takes precedence, here we list some additional
67+ information and best practices.
68+ The `Sklearn Extension API :class:`openml.extensions.sklearn.SklearnExtension.html`
69+ is a good example to follow. Note that most methods are relatively simple and can be implemented in a few lines of code.
70+
71+ * General setup (required)
72+
73+ * :meth:`can_handle_flow`: Takes an OpenML flow as its argument and checks
74+ whether the current extension can handle it. The OpenML database
75+ consists of many flows, from various workbenches (e.g., scikit-learn, Weka,
76+ mlr). This method is called before a model is deserialized.
77+ Typically, the flow-dependency field is used to check whether the specific
78+ library is present, and that no unknown libraries are present there.
79+ * :meth:`can_handle_model`: Similar to :meth:`can_handle_flow`, except that
80+ in this case a Python object is given. As such, in many cases, this method
81+ can be implemented by checking whether the object adheres to a certain base class.
82+ * Serialization and De-serialization (required)
83+
84+ * :meth:`flow_to_model`: Deserializes the OpenML Flow into a model (if the
85+ library can indeed handle the flow). This method has an important interplay
86+ with :meth:`model_to_flow`.
87+ Running these two methods in succession should result in exactly the same
88+ model (or flow). This property can be used for unit testing (e.g., build a
89+ model with hyperparameters, make predictions on a task, serialize it to a flow,
90+ deserialize it back, make it predict on the same task, and check whether the
91+ predictions are exactly the same).
92+ The example in the scikit-learn interface might seem daunting, but note that
93+ some complicated design choices were made there to allow for all sorts of
94+ interesting research questions. It is probably good practice to start simple.
95+ * :meth:`model_to_flow`: The inverse of :meth:`flow_to_model`. Serializes a
96+ model into an OpenML Flow. The flow should preserve the class, the library
97+ version, and the tunable hyperparameters.
98+ * :meth:`get_version_information`: Returns a tuple with the version information
99+ of the important libraries.
100+ * :meth:`create_setup_string`: No longer used, and will be deprecated soon.
101+ * Performing runs (required)
102+
103+ * :meth:`is_estimator`: Takes a class as input and checks whether it has the
104+ status of estimator in the library (typically, whether it has a train method
105+ and a predict method).
106+ * :meth:`seed_model`: Sets a random seed on the model.
107+ * :meth:`_run_model_on_fold`: One of the main requirements for a library to
108+ generate run objects for the OpenML server. Obtains a train split (with
109+ labels) and a test split (without labels); the goal is to train a model
110+ on the train split and return the predictions on the test split.
111+ In addition to the actual predictions, the class probabilities should also be
112+ determined.
113+ For classifiers that do not return class probabilities, this can just be the
114+ one-hot encoded predicted label.
115+ The predictions will be evaluated on the OpenML server.
116+ Additional information can also be returned, for example, user-defined
117+ measures (such as runtime information, as this cannot be inferred on the
118+ server).
119+ Additionally, information about a hyperparameter optimization trace can be
120+ provided.
121+ * :meth:`obtain_parameter_values`: Obtains the hyperparameters of a given
122+ model and their current values. Please note that in the case of a hyperparameter
123+ optimization procedure (e.g., random search), you should only return the
124+ hyperparameters of this procedure (e.g., the hyperparameter grid, budget,
125+ etc.); the chosen model will be inferred from the optimization trace.
126+ * :meth:`check_if_model_fitted`: Checks whether the train method of the model
127+ has been called (and as such, whether the predict method can be used).
128+ * Hyperparameter optimization (optional)
129+
130+ * :meth:`instantiate_model_from_hpo_class`: If a given run has recorded the
131+ hyperparameter optimization trace, then this method can be used to
132+ reinstantiate the model with the hyperparameters of a given hyperparameter
133+ optimization iteration. It has some similarities with :meth:`flow_to_model` (as
134+ this method also sets the hyperparameters of a model).
135+ Note that although this method is required, it is not necessary to implement
136+ any logic if hyperparameter optimization is not implemented. In that case, simply
137+ raise a `NotImplementedError`.
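
The round-trip property of the serialization methods and the one-hot fallback for class probabilities can be illustrated with a minimal, library-independent sketch. Everything in it is an assumption made for illustration: `ToyModel`, the dict-based flow, and the helper `one_hot_probabilities` are hypothetical and are not part of OpenML-Python, and the free functions only mirror the names of the extension methods rather than their real signatures.

```python
from typing import Any, Dict, List, Sequence


class ToyModel:
    """Hypothetical stand-in for a library model; not part of OpenML-Python."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def predict(self, rows: Sequence[float]) -> List[int]:
        # Predict label 1 when the single feature exceeds the threshold, else 0.
        return [int(x > self.threshold) for x in rows]


def model_to_flow(model: ToyModel) -> Dict[str, Any]:
    # Hypothetical flow: a plain dict with the class name and hyperparameters.
    return {
        "class_name": type(model).__name__,
        "parameters": {"threshold": model.threshold},
    }


def flow_to_model(flow: Dict[str, Any]) -> ToyModel:
    # Inverse of model_to_flow for the toy flow format above.
    if flow["class_name"] != "ToyModel":
        raise ValueError("cannot handle flow for " + flow["class_name"])
    return ToyModel(**flow["parameters"])


def one_hot_probabilities(
    labels: Sequence[int], classes: Sequence[int]
) -> List[List[float]]:
    # One row per prediction: 1.0 in the predicted class column, 0.0 elsewhere.
    return [[1.0 if label == c else 0.0 for c in classes] for label in labels]


test_rows = [0.1, 0.7, 0.5, 0.9]
original = ToyModel(threshold=0.6)
restored = flow_to_model(model_to_flow(original))

# Round trip: the restored model must predict exactly like the original one.
predictions = original.predict(test_rows)
round_trip_ok = predictions == restored.predict(test_rows)

# Fallback "probabilities" for a model that only returns hard labels.
probabilities = one_hot_probabilities(predictions, classes=[0, 1])
```

A unit test for a real extension could follow the same shape, replacing the toy functions with the extension's own serialization methods and the toy rows with the test split of a task.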
60138
61139Hosting the library
62140~~~~~~~~~~~~~~~~~~~