
Commit 0235c51

ArlindKadra authored and mfeurer committed
Refactoring run_flow_on_task and doc add for run_model (#516)
* Documentation fix
* Add doc for run_model_on_task
* Initial additions
* Added functions to cache flows
* Tweaking a function from flow which will be used to create a task dict as a pre step for publish
* Undo 22b1e62.
* PEP8 compliance.
* Add (unused) flag to (not) upload flow. Rename get_seeded_model method as the name did not reflect the functionality.
* Add RunExistsError.
* RunsExistsError now correctly allows multiple runs, reflected in name.
* Towards offline run_model_on_task
* Fix name.
* Py3 style.
* Fix typo.
* Allow run flow locally. Caching and upload not implemented.
* Clean up test with new Error type.
* Check if flow exists before uploading.
* Remove one-line method that was only called from other method.
* Change error type. Add typehint.
* Fix imports.
* Publish flow if flow_id is None.
* Do not allow for mutable parameter.
* Fill in parameter_settings based on the referenced flow.
* Allow parameters to be extracted for model which is not part of the object.
* Can not use reinstantiated model.
* to/from filesystem methods.
* When (de)serializing, if a local flow was used, also (de)serialize the flow.
* When loading a locally stored run, do not force fields for which the flow is required to have been uploaded.
* Updated publish_error for new publish.
* Use mock for existing_flow
* Add documentation on the offline functionality.
* Disable two unit tests for now.
* Fix typo.
* PEP8.
* Remove old check.
* Update to reflect the change that uploading the flow is no longer default behavior.
* Fixed an error where non-existant flows still got the treatment to check for duplicates.
* Make tests actually fully local. Update for new parameter order.
* Type hints. Explicitly check for int rather than implicit cast of int to bool.
* Add errors for inconsistencies between local flows and server information.
* Now only sets hyperparameters if sync happened.
* Always sync with server if we know the flow to exist on the server.
* Update vanilla test. Add test for local flow upload after file stored to disk.
* Raise an error if `flow.publish` is called on a flow with different local id than the one known on the server.
* Add tests to verify identical behavior if run is loaded from disk instead.
* Line too long.
* Docs, typehint. Remove unused method publish_flow_is_necessary.
* Changed summary as suggested by @mfeurer.
* Type hints.
* Fix naming inconsistency between from_filesystem and to_filesystem.
* Updated for the new parametername.
* Function signature formatting improvements.
* Consistent spacing around colons. Add parameter description of `from_server`
* Add missing parenthesis.
* Doc changes, typehint.
* Remove check for flow as I think it is outdated.
* PrivateDatasetError and RunsExistError now prefixed with 'OpenML'
* Updated unit test to verify flows existence before/after run_model_on_task and publish.
* Start for testing model on downloaded flow.
* Explicit test for none as other __len__ can get invoked on some models to test for truthiness.
* Unit test now downloads flow after ensuring it exists.
* Test with run_flow_on_task instead so a sentinel can be added to the flow to ensure it does not exist on the server.
* Fixed a bug where run.flow_id would be set to False instead of None if associated flow did not exist but was also not uploaded. This gave errors at publish-time.
* Fix typo.
1 parent 96ddc13 commit 0235c51

12 files changed

Lines changed: 620 additions & 276 deletions


examples/flows_and_runs_tutorial.py

Lines changed: 27 additions & 0 deletions
@@ -89,6 +89,33 @@
 myrun = run.publish()
 print("Uploaded to http://test.openml.org/r/" + str(myrun.run_id))
 
+###############################################################################
+# Running flows on tasks offline for later upload
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+# For those scenarios where there is no access to internet, it is possible to run
+# a model on a task without uploading results or flows to the server immediately.
+
+# To perform the following line offline, it is required to have been called before
+# such that the task is cached on the local openml cache directory:
+task = openml.tasks.get_task(6)
+
+# The following lines can then be executed offline:
+run = openml.runs.run_model_on_task(
+    pipe,
+    task,
+    avoid_duplicate_runs=False,
+    upload_flow=False)
+
+# The run may be stored offline, and the flow will be stored along with it:
+run.to_filesystem(directory='myrun')
+
+# They may later be loaded and uploaded
+run = openml.runs.OpenMLRun.from_filesystem(directory='myrun')
+run.publish()
+
+# Publishing the run will automatically upload the related flow if
+# it does not yet exist on the server.
+
 ############################################################################
 # Challenge
 # ^^^^^^^^^
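
Reviewer note: the store-then-load pattern that the tutorial relies on can be tried without a server or the openml package. The sketch below is only an illustration modelled on the `to_filesystem` / `from_filesystem` pair this commit adds to `openml/flows/flow.py`. `StoredRecord`, `FILE_NAME` and `round_trip` are hypothetical names, and the real run serialization may write more than a single file.

```python
import os
import tempfile


class StoredRecord:
    """Hypothetical stand-in for an object that serializes itself to XML."""

    # Hypothetical file name; the flow code in this commit uses 'flow.xml'.
    FILE_NAME = 'record.xml'

    def __init__(self, xml: str):
        self.xml = xml

    def to_filesystem(self, directory: str) -> None:
        # Create the directory if needed, but refuse to overwrite an
        # earlier record, mirroring the ValueError raised in the diff.
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, self.FILE_NAME)
        if os.path.exists(path):
            raise ValueError(
                'Output directory already contains {}.'.format(self.FILE_NAME))
        with open(path, 'w') as f:
            f.write(self.xml)

    @classmethod
    def from_filesystem(cls, directory: str) -> 'StoredRecord':
        # Read the stored XML back and rebuild the object from it.
        with open(os.path.join(directory, cls.FILE_NAME), 'r') as f:
            return cls(f.read())


def round_trip(xml: str) -> str:
    """Store a record in a temporary directory and load it again."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, 'myrun')
        StoredRecord(xml).to_filesystem(target)
        return StoredRecord.from_filesystem(target).xml
```

Writing to the same directory twice raises `ValueError` in this sketch, so a tutorial that is re-run would need a fresh directory name or a cleanup step first.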

openml/datasets/functions.py

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@
     OpenMLCacheException,
     OpenMLHashException,
     OpenMLServerException,
-    PrivateDatasetError,
+    OpenMLPrivateDatasetError,
 )
 from ..utils import (
     _create_cache_directory,
@@ -360,7 +360,7 @@ def get_dataset(dataset_id):
         # if there was an exception,
         # check if the user had access to the dataset
         if e.code == 112:
-            raise PrivateDatasetError(e.message) from None
+            raise OpenMLPrivateDatasetError(e.message) from None
         else:
             raise e
     finally:

openml/exceptions.py

Lines changed: 21 additions & 12 deletions
@@ -1,29 +1,29 @@
 class PyOpenMLError(Exception):
-    def __init__(self, message):
+    def __init__(self, message: str):
         self.message = message
-        super(PyOpenMLError, self).__init__(message)
+        super().__init__(message)
 
 
 class OpenMLServerError(PyOpenMLError):
     """class for when something is really wrong on the server
     (result did not parse to dict), contains unparsed error."""
 
-    def __init__(self, message):
-        super(OpenMLServerError, self).__init__(message)
+    def __init__(self, message: str):
+        super().__init__(message)
 
 
 class OpenMLServerException(OpenMLServerError):
     """exception for when the result of the server was
     not 200 (e.g., listing call w/o results). """
 
     # Code needs to be optional to allow the exception to be picklable:
-    # https://stackoverflow.com/questions/16244923/how-to-make-a-custom-exception-class-with-multiple-init-args-pickleable
-    def __init__(self, message, code=None, additional=None, url=None):
+    # https://stackoverflow.com/questions/16244923/how-to-make-a-custom-exception-class-with-multiple-init-args-pickleable # noqa: E501
+    def __init__(self, message: str, code: str = None, additional: str = None, url: str = None):
         self.message = message
         self.code = code
         self.additional = additional
         self.url = url
-        super(OpenMLServerException, self).__init__(message)
+        super().__init__(message)
 
     def __str__(self):
         return '%s returned code %s: %s' % (
@@ -38,16 +38,25 @@ class OpenMLServerNoResult(OpenMLServerException):
 
 class OpenMLCacheException(PyOpenMLError):
     """Dataset / task etc not found in cache"""
-    def __init__(self, message):
-        super(OpenMLCacheException, self).__init__(message)
+    def __init__(self, message: str):
+        super().__init__(message)
 
 
 class OpenMLHashException(PyOpenMLError):
     """Locally computed hash is different than hash announced by the server."""
     pass
 
 
-class PrivateDatasetError(PyOpenMLError):
+class OpenMLPrivateDatasetError(PyOpenMLError):
     """ Exception thrown when the user has no rights to access the dataset. """
-    def __init__(self, message):
-        super(PrivateDatasetError, self).__init__(message)
+    def __init__(self, message: str):
+        super().__init__(message)
+
+
+class OpenMLRunsExistError(PyOpenMLError):
+    """ Indicates run(s) already exists on the server when they should not be duplicated. """
+    def __init__(self, run_ids: set, message: str):
+        if len(run_ids) < 1:
+            raise ValueError("Set of run ids must be non-empty.")
+        self.run_ids = run_ids
+        super().__init__(message)
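
Reviewer note: a caller might consume the new `OpenMLRunsExistError` roughly as sketched below. The two exception classes are copied from the diff so that the snippet runs without the openml package. `check_no_duplicates` and `describe` are hypothetical helpers, and the error message is made up for illustration; none of these is part of the library.

```python
class PyOpenMLError(Exception):
    def __init__(self, message: str):
        self.message = message
        super().__init__(message)


class OpenMLRunsExistError(PyOpenMLError):
    """Copied from the diff: carries the ids of the runs that already exist."""

    def __init__(self, run_ids: set, message: str):
        if len(run_ids) < 1:
            raise ValueError("Set of run ids must be non-empty.")
        self.run_ids = run_ids
        super().__init__(message)


def check_no_duplicates(existing_run_ids: set) -> None:
    """Hypothetical helper: raise if any matching run is already known."""
    if existing_run_ids:
        raise OpenMLRunsExistError(
            existing_run_ids,
            "One or more runs already exist for this setup.",  # illustrative text
        )


def describe(existing_run_ids: set) -> str:
    """Show how a caller can read the offending ids from the exception."""
    try:
        check_no_duplicates(existing_run_ids)
    except OpenMLRunsExistError as e:
        return 'duplicates: {}'.format(sorted(e.run_ids))
    return 'no duplicates'
```

Because the constructor rejects an empty set, the exception can only be raised when at least one existing run id is known, which is why the helper checks for a non-empty set before raising.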

openml/flows/flow.py

Lines changed: 50 additions & 13 deletions
@@ -1,8 +1,10 @@
 from collections import OrderedDict
+import os
 
 import xmltodict
 
 import openml._api_calls
+import openml.exceptions
 from ..utils import extract_xml_tags
 
 
@@ -128,7 +130,7 @@ def __init__(self, name, description, model, components, parameters,
         self.dependencies = dependencies
         self.flow_id = flow_id
 
-    def _to_xml(self):
+    def _to_xml(self) -> str:
         """Generate xml representation of self for upload to server.
 
         Returns
@@ -144,7 +146,7 @@ def _to_xml(self):
         flow_xml = flow_xml.split('\n', 1)[-1]
         return flow_xml
 
-    def _to_dict(self):
+    def _to_dict(self) -> dict:
         """ Helper function used by _to_xml and itself.
 
         Creates a dictionary representation of self which can be serialized
@@ -312,8 +314,32 @@ def _from_dict(cls, xml_dict):
 
         return flow
 
-    def publish(self):
-        """Publish flow to OpenML server.
+    def to_filesystem(self, output_directory: str) -> None:
+        os.makedirs(output_directory, exist_ok=True)
+        if 'flow.xml' in os.listdir(output_directory):
+            raise ValueError('Output directory already contains a flow.xml file.')
+
+        run_xml = self._to_xml()
+        with open(os.path.join(output_directory, 'flow.xml'), 'w') as f:
+            f.write(run_xml)
+
+    @classmethod
+    def from_filesystem(cls, input_directory) -> 'OpenMLFlow':
+        with open(os.path.join(input_directory, 'flow.xml'), 'r') as f:
+            xml_string = f.read()
+        return OpenMLFlow._from_dict(xmltodict.parse(xml_string))
+
+    def publish(self, raise_error_if_exists: bool = False) -> 'OpenMLFlow':
+        """ Publish this flow to OpenML server.
+
+        Raises a PyOpenMLError if the flow exists on the server, but
+        `self.flow_id` does not match the server known flow id.
+
+        Parameters
+        ----------
+        raise_error_if_exists : bool, optional (default=False)
+            If True, raise PyOpenMLError if the flow exists on the server.
+            If False, update the local flow to match the server flow.
 
         Returns
         -------
@@ -326,16 +352,27 @@ def publish(self):
         # instantiate an OpenMLFlow.
         import openml.flows.functions
 
-        xml_description = self._to_xml()
+        flow_id = openml.flows.functions.flow_exists(self.name, self.external_version)
+        if not flow_id:
+            if self.flow_id:
+                raise openml.exceptions.PyOpenMLError("Flow does not exist on the server, "
+                                                      "but 'flow.flow_id' is not None.")
+            xml_description = self._to_xml()
+            file_elements = {'description': xml_description}
+            return_value = openml._api_calls._perform_api_call(
+                "flow/",
+                'post',
+                file_elements=file_elements,
+            )
+            server_response = xmltodict.parse(return_value)
+            flow_id = int(server_response['oml:upload_flow']['oml:id'])
+        elif raise_error_if_exists:
+            error_message = "This OpenMLFlow already exists with id: {}.".format(flow_id)
+            raise openml.exceptions.PyOpenMLError(error_message)
+        elif self.flow_id is not None and self.flow_id != flow_id:
+            raise openml.exceptions.PyOpenMLError("Local flow_id does not match server flow_id: "
+                                                  "'{}' vs '{}'".format(self.flow_id, flow_id))
 
-        file_elements = {'description': xml_description}
-        return_value = openml._api_calls._perform_api_call(
-            "flow/",
-            'post',
-            file_elements=file_elements,
-        )
-        server_response = xmltodict.parse(return_value)
-        flow_id = int(server_response['oml:upload_flow']['oml:id'])
         flow = openml.flows.functions.get_flow(flow_id)
         _copy_server_fields(flow, self)
         try:
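
Reviewer note: the branching in the new `publish` is easier to check when it is separated from the network calls. The sketch below restates only the decision, under the assumption (suggested by the `if not flow_id` test in the diff) that the existence lookup yields a falsy value when the flow is unknown to the server. `decide_publish` and `FlowPublishError` are hypothetical names used for illustration and are not part of openml.

```python
from typing import Optional, Union


class FlowPublishError(Exception):
    """Hypothetical stand-in for openml.exceptions.PyOpenMLError."""


def decide_publish(local_flow_id: Optional[int],
                   server_flow_id: Union[int, bool, None],
                   raise_error_if_exists: bool = False) -> str:
    """Return 'upload' or 'reuse', or raise, following the diff's branches."""
    if not server_flow_id:
        # The server does not know the flow: a local id would be inconsistent.
        if local_flow_id:
            raise FlowPublishError(
                "Flow does not exist on the server, but a local id is set.")
        return 'upload'
    if raise_error_if_exists:
        # The caller asked for a hard failure on duplicates.
        raise FlowPublishError(
            "Flow already exists with id: {}.".format(server_flow_id))
    if local_flow_id is not None and local_flow_id != server_flow_id:
        # Both sides have an id, but they disagree.
        raise FlowPublishError(
            "Local flow_id does not match server flow_id: "
            "'{}' vs '{}'".format(local_flow_id, server_flow_id))
    # The flow exists and is consistent: sync the local copy with the server.
    return 'reuse'
```

For example, `decide_publish(None, False)` returns `'upload'`, while `decide_publish(None, 7)` returns `'reuse'`. Mismatched ids, a local id without a server entry, and an existing flow with `raise_error_if_exists=True` all raise.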
