Commit d8a73df

Forgot to add usage.rst
1 parent b1e52e1 commit d8a73df

1 file changed

Lines changed: 91 additions & 0 deletions

source/usage.rst

@@ -0,0 +1,91 @@
:orphan:

.. _usage:

Basic Usage
***********

Connecting to the OpenML server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The OpenML server can only be accessed by users who have signed up to the OpenML
platform. If you don't have an account yet,
`sign up now <http://openml.org/register>`_.

.. code:: python

    >>> from openml.apiconnector import APIConnector

    >>> username = "Your OpenML username"
    >>> password = "Your OpenML password"
    >>> connector = APIConnector(username=username, password=password)

The :class:`~openml.apiconnector.APIConnector` creates a cache directory
and authenticates you at the OpenML server. Authentication yields a session
key, which is valid for one hour.

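Because the session key is only valid for one hour, a long-running script may
need to authenticate again. The sketch below only tracks the age of a session
on the client side. The helper ``session_expired`` and the idea of recording
the login time yourself are assumptions made for illustration; they are not
part of the OpenML package, which may handle renewal on its own.

.. code:: python

    from datetime import datetime, timedelta

    # Lifetime of a session key as stated above.
    SESSION_LIFETIME = timedelta(hours=1)


    def session_expired(obtained_at, now=None):
        """Return True if a key obtained at `obtained_at` is no longer valid.

        Hypothetical helper, not provided by the OpenML package.
        """
        if now is None:
            now = datetime.now()
        return now - obtained_at >= SESSION_LIFETIME


    login_time = datetime(2015, 1, 1, 12, 0)
    print(session_expired(login_time, now=datetime(2015, 1, 1, 12, 30)))  # False
    print(session_expired(login_time, now=datetime(2015, 1, 1, 13, 5)))   # True

If such a check reports an expired session, creating a new ``APIConnector`` as
shown above should authenticate again and obtain a fresh key.
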
You can also configure the OpenML package, for example to change the cache
directory. The configuration options are described in the
`OpenML client API description <https://github.com/openml/OpenML/wiki/Client-API>`_.

Working with datasets
~~~~~~~~~~~~~~~~~~~~~

A dataset is downloaded by its ID:

.. code:: python

    >>> dataset_id = 31
    >>> dataset = connector.download_dataset(dataset_id)

Attributes of the dataset are stored as member variables:

.. code:: python

    >>> dataset.name
    u'credit-g'
    >>> dataset.default_target_attribute
    u'class'

Data can be loaded in the following ways:

.. code:: python

    >>> df, categorical = dataset.get_pandas()

This returns the dataset as a ``pandas.DataFrame`` together with a list of
booleans indicating which attributes are categorical. Categorical attributes
are already encoded as integers.

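The boolean list lines up with the columns of the DataFrame, so it can be used
to separate categorical from numerical attributes. The following is a minimal
sketch on a hand-made frame; the column names and values are made up for
illustration and are not taken from the OpenML dataset.

.. code:: python

    import pandas as pd

    # Toy stand-in for the DataFrame and indicator list described above.
    frame = pd.DataFrame(
        {
            "checking_status": [0, 1, 2, 1],      # integer-encoded categorical
            "duration": [6.0, 48.0, 12.0, 42.0],  # numerical
            "purpose": [3, 3, 0, 1],              # integer-encoded categorical
        },
        columns=["checking_status", "duration", "purpose"],
    )
    categorical = [True, False, True]

    # Pair every column name with its flag and split the names accordingly.
    categorical_columns = [
        name for name, is_cat in zip(frame.columns, categorical) if is_cat
    ]
    numerical_columns = [
        name for name, is_cat in zip(frame.columns, categorical) if not is_cat
    ]

    print(categorical_columns)
    print(frame[numerical_columns].mean())

Keeping the two groups apart this way makes it easy to summarize or scale only
the numerical attributes while leaving the integer codes untouched.
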
.. code:: python

    >>> X, y, categorical = dataset.get_pandas()

This returns the dataset split into ``X`` and ``y``, as well as a list
indicating which attributes are categorical. If you are working with
`scikit-learn <http://scikit-learn.org>`_, you can use this data right away:

.. code:: python

    >>> from sklearn import preprocessing, ensemble
    >>> # One-hot encode only the attributes flagged as categorical.
    >>> enc = preprocessing.OneHotEncoder(categorical_features=categorical)
    >>> enc.fit(X)
    OneHotEncoder(categorical_features=[True, False, True, True, False, True,
           True, False, True, True, False, True, False, True, True, False, True,
           False, True, True], dtype=<type 'float'>, n_values='auto',
           sparse=True)
    >>> X = enc.transform(X).todense()
    >>> clf = ensemble.RandomForestClassifier()
    >>> clf.fit(X, y)
    RandomForestClassifier(bootstrap=True, compute_importances=None,
                criterion='gini', max_depth=None, max_features='auto',
                max_leaf_nodes=None, min_density=None, min_samples_leaf=1,
                min_samples_split=2, n_estimators=10, n_jobs=1,
                oob_score=False, random_state=None, verbose=0)

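The ``categorical_features`` argument used above was removed from
``OneHotEncoder`` in later scikit-learn releases (0.22 and newer). With a
recent scikit-learn, the same idea can be sketched with ``ColumnTransformer``,
which accepts the boolean list as a column selector. The data below is made up
for illustration and only stands in for ``X``, ``y`` and ``categorical``.

.. code:: python

    import numpy as np
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import OneHotEncoder

    # Made-up data: columns 0 and 2 are integer-encoded categoricals.
    X = np.array([
        [0, 1.5, 2],
        [1, 2.5, 0],
        [0, 3.5, 1],
        [1, 4.5, 2],
    ])
    y = np.array([0, 1, 0, 1])
    categorical = [True, False, True]

    # One-hot encode the masked columns and pass the others through unchanged.
    # sparse_threshold=0 makes the transformer return a dense array.
    encoder = ColumnTransformer(
        [("onehot", OneHotEncoder(), categorical)],
        remainder="passthrough",
        sparse_threshold=0,
    )
    X_encoded = encoder.fit_transform(X)
    print(X_encoded.shape)  # (4, 6): 2 + 3 one-hot columns plus 1 numerical

    clf = RandomForestClassifier(n_estimators=10, random_state=0)
    clf.fit(X_encoded, y)

The encoded categorical columns come first in the result and the untouched
numerical columns are appended at the end.
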
Working with tasks
~~~~~~~~~~~~~~~~~~

Using the cache
~~~~~~~~~~~~~~~

Large scale experiments
~~~~~~~~~~~~~~~~~~~~~~~
