|
| 1 | +:orphan: |
| 2 | + |
| 3 | +.. _usage: |
| 4 | + |
| 5 | +Basic Usage |
| 6 | +*********** |
| 7 | + |
| 8 | +Connecting to the OpenML server |
| 9 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 10 | + |
| 11 | +The OpenML server can only be accessed by users who have signed up to the OpenML |
| 12 | +platform. If you don't have an account yet, |
| 13 | +`sign up now <http://openml.org/register>`_. |
| 14 | + |
| 15 | +.. code:: python |
| 16 | +
|
| 17 | + >>> from openml.apiconnector import APIConnector |
| 18 | +
|
| 19 | + >>> username = "Your OpenML username" |
| 20 | + >>> password = "Your OpenML password" |
| 21 | + >>> connector = APIConnector(username=username, password=password) |
| 22 | +
|
| 23 | +The :class:`~openml.apiconnector.APIConnector` creates a cache directory |
| 24 | +and authenticates you at the OpenML server. Authentication yields a session |
| 25 | +key, which is valid for one hour. |
| 26 | + |
| 27 | +You can also configure the OpenML package, for example to change the cache |
| 28 | +directory. The configuration options are described in the |
| 29 | +`OpenML client API description |
| 30 | +<https://github.com/openml/OpenML/wiki/Client-API>`_. |
| 31 | + |
| 32 | +Working with datasets |
| 33 | +~~~~~~~~~~~~~~~~~~~~~ |
| 34 | + |
| 35 | +.. code:: python |
| 36 | +
|
| 37 | + >>> dataset_id = 31 |
| 38 | + >>> dataset = connector.download_dataset(dataset_id) |
| 39 | +
|
| 40 | +Attributes of the dataset are stored as member variables: |
| 41 | + |
| 42 | +.. code:: python |
| 43 | +
|
| 44 | + >>> dataset.name |
| 45 | + u'credit-g' |
| 46 | + >>> dataset.default_target_attribute |
| 47 | + u'class' |
| 48 | +
|
| 49 | +Data can be loaded in the following ways: |
| 50 | + |
| 51 | +.. code:: python |
| 52 | +
|
| 53 | + >>> df, categorical = dataset.get_pandas() |
| 54 | +
|
| 55 | +This returns the dataset as a ``pandas.DataFrame`` together with a list of |
| 56 | +booleans indicating which attributes are categorical. Categorical attributes |
| 57 | +are already encoded as integers. |
| 58 | + |
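The boolean list lines up position by position with the attributes of the dataset. The sketch below shows one way to turn such a mask into attribute names and column indices. The attribute names and mask values here are illustrative placeholders, not output of the OpenML package.

```python
from itertools import compress

# Illustrative stand-ins (assumptions): in practice the names come from the
# DataFrame columns and the mask is the list returned next to the DataFrame.
attribute_names = ["checking_status", "duration", "credit_history", "credit_amount"]
categorical = [True, False, True, False]

# Keep the names whose mask entry is True (categorical) or False (numeric).
categorical_names = list(compress(attribute_names, categorical))
numeric_names = list(compress(attribute_names, (not c for c in categorical)))

# Positional indices of the categorical attributes.
categorical_indices = [i for i, is_cat in enumerate(categorical) if is_cat]

print(categorical_names)
print(numeric_names)
print(categorical_indices)
```

The index form is convenient for tools that expect column positions instead of a boolean mask.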
| 59 | +.. code:: python |
| 60 | +
|
| 61 | + >>> X, y, categorical = dataset.get_pandas() |
| 62 | +
|
| 63 | +This returns the dataset split into ``X`` and ``y``, as well as a list |
| 64 | +indicating which attributes are categorical. If you are working with |
| 65 | +`scikit-learn <http://scikit-learn.org>`_, you can use this data right away: |
| 66 | + |
| 67 | +.. code:: python |
| 68 | +
|
| 69 | + >>> from sklearn import preprocessing, ensemble |
| 70 | + >>> enc = preprocessing.OneHotEncoder(categorical_features=categorical) |
| 71 | + >>> enc.fit(X) |
| 72 | + OneHotEncoder(categorical_features=[True, False, True, True, False, True, |
| 73 | + True, False, True, True, False, True, False, True, True, False, True, |
| 74 | + False, True, True], dtype=<type 'float'>, n_values='auto', sparse=True) |
| 75 | + >>> X = enc.transform(X).todense() |
| 76 | + >>> clf = ensemble.RandomForestClassifier() |
| 77 | + >>> clf.fit(X, y) |
| 78 | + RandomForestClassifier(bootstrap=True, compute_importances=None, |
| 79 | + criterion='gini', max_depth=None, max_features='auto', |
| 80 | + max_leaf_nodes=None, min_density=None, min_samples_leaf=1, |
| 81 | + min_samples_split=2, n_estimators=10, n_jobs=1, |
| 82 | + oob_score=False, random_state=None, verbose=0) |
| 83 | +
|
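The ``categorical_features`` argument of ``OneHotEncoder`` was removed in scikit-learn 0.22, so the snippet above only runs on older releases. On newer releases the same idea can be sketched with ``ColumnTransformer``, as below. The small array, labels, and mask are made-up stand-ins for the OpenML data, and the sketch assumes scikit-learn 0.20 or later.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Made-up stand-ins (assumptions) for the X, y and categorical mask of a dataset.
X = np.array([[0, 1.5, 2],
              [1, 0.5, 0],
              [0, 2.5, 1],
              [1, 3.5, 2]])
y = np.array([0, 1, 0, 1])
categorical = [True, False, True]

# Convert the boolean mask into column indices for the transformer.
cat_idx = [i for i, is_cat in enumerate(categorical) if is_cat]

# One-hot encode the categorical columns; pass the remaining columns through.
pre = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), cat_idx)],
    remainder="passthrough",
)
X_enc = pre.fit_transform(X)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_enc, y)
print(X_enc.shape)
```

The encoded columns come first and the untouched numeric columns are appended to the right, which is similar to how the older encoder ordered its output.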
| 84 | +Working with tasks |
| 85 | +~~~~~~~~~~~~~~~~~~ |
| 86 | + |
| 87 | +Using the cache |
| 88 | +~~~~~~~~~~~~~~~ |
| 89 | + |
| 90 | +Large scale experiments |
| 91 | +~~~~~~~~~~~~~~~~~~~~~~~ |