@@ -16,13 +16,11 @@ platform. If you don't have an account yet,
 
     >>> from openml.apiconnector import APIConnector
 
-    >>> username = "Your OpenML username"
-    >>> password = "Your OpenML password"
-    >>> connector = APIConnector(username=username, password=password)
+    >>> apikey = 'Your API key'
+    >>> connector = APIConnector(apikey=apikey)
 
 The :class:`~openml.apiconnector.APIConnector` will create a cache directory
-and authenticate you at the OpenML server. By this you obtain a session key,
-which is valid for one hour.
+and manage all your queries to the OpenML server.
 
 You can also configure the OpenML package, e.g. change the cache directory.
 Information about the configuration is in the
@@ -35,7 +33,7 @@ Working with datasets
 .. code:: python
 
     >>> dataset_id = 31
-    >>> dataset = connector.download_dataset(1)
+    >>> dataset = connector.download_dataset(dataset_id)
 
 Attributes of the dataset are stored as member variables:
 
@@ -50,18 +48,24 @@ Data can be loaded in the following ways:
 
 .. code:: python
 
-    >>> pd, categorical = dataset.get_pandas()
+    >>> X = dataset.get_dataset()
 
-returns the dataset as a pandas.DataFrame and a list of booleans,
-indicating which attributes are categorical. Categorical attributes are
-already encoded as integers.
+returns the dataset as a np.ndarray. In case the data is sparse,
+a scipy.sparse.csr matrix is returned.
+
+Most of the time, having only the X matrix is not enough. Two very useful
+arguments are `target` and `return_categorical_indicator`. `target` makes
+`get_dataset()` return `X` and `y` separately; `return_categorical_indicator`
+makes `get_dataset()` return a boolean array indicating which attributes are
+categorical (and should be one-hot encoded).
 
 .. code:: python
 
-    >>> X, y, categorical = dataset.get_pandas()
+    >>> X, y, categorical = dataset.get_dataset(
+            target=dataset.default_target_attribute,
+            return_categorical_indicator=True)
 
-returns the dataset split into X and y, as well as a list indicating which
-attributes are categorical. In case you are working with `scikit-learn
+In case you are working with `scikit-learn
 <http://scikit-learn>`_, you can use this data right away:
 
 .. code:: python
@@ -72,7 +76,7 @@ attributes are categorical. In case you are working with `scikit-learn
            True, False, True, True, False, True, False, True, True, False, True,
            False, True, True], dtype=<type 'float'>, n_values='auto',
            sparse=True)
-    >>> X = enc.transform(X).todense()
+    >>> X = enc.fit_transform(X).todense()
     >>> clf = ensemble.RandomForestClassifier()
     >>> clf.fit(X, y)
     RandomForestClassifier(bootstrap=True, compute_importances=None,
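For readers following the change, here is a minimal sketch of what the boolean categorical indicator is used for: expanding only the flagged columns into one-hot columns and leaving the numeric ones untouched. It uses plain NumPy rather than the scikit-learn encoder shown in the diff. The helper `one_hot_encode` and the small arrays are hypothetical stand-ins, not part of the OpenML package, and the sketch assumes a dense `X` with no missing values.

```python
import numpy as np


def one_hot_encode(X, categorical):
    """One-hot encode the columns of X flagged in the boolean mask.

    Hypothetical helper for illustration; not part of the OpenML package.
    Assumes X is dense and contains no missing values.
    """
    X = np.asarray(X, dtype=float)
    categorical = np.asarray(categorical, dtype=bool)
    if categorical.shape != (X.shape[1],):
        raise ValueError("categorical must have one entry per column of X")

    blocks = []
    for j, is_categorical in enumerate(categorical):
        column = X[:, j]
        if is_categorical:
            # One indicator column per distinct value, in sorted order.
            values = np.unique(column)
            blocks.append((column[:, None] == values[None, :]).astype(float))
        else:
            # Numeric columns pass through unchanged.
            blocks.append(column[:, None])
    return np.hstack(blocks)


# Stand-ins for the X and categorical arrays a dataset would provide.
X = np.array([[0.0, 1.5, 2.0],
              [1.0, 0.5, 0.0],
              [0.0, 2.5, 1.0]])
categorical = np.array([True, False, True])

X_encoded = one_hot_encode(X, categorical)
print(X_encoded.shape)
```

The first column has two distinct values and the third has three, so the encoded matrix has 2 + 1 + 3 columns. A fitted encoder object, as in the diff, is still the better choice when the same encoding has to be reapplied to new data.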