"""Module pointing to different implementations of Data class
 
-DiCE requires only few parameters about the data such as the range of continuous features and the levels of categorical features. Hence, DiCE can be used for a private data whose meta data are only available (such as the feature names and range/levels of different features) by specifying appropriate parameters.
+DiCE requires only few parameters about the data such as the range of continuous
+features and the levels of categorical features. Hence, DiCE can be used for a
+private data whose meta data are only available (such as the feature names and
+range/levels of different features) by specifying appropriate parameters.
 """
 
 
@@ -12,23 +15,26 @@ def __init__(self, **params):
 
         :param **params: a dictionary of required parameters.
         """
-
         self.decide_implementation_type(params)
 
     def decide_implementation_type(self, params):
         """Decides if the Data class is for public or private data."""
-
-        self.__class__ = decide(params)
+        self.__class__ = decide(params)
         self.__init__(params)
 
-# To add new implementations of Data, add the class in data_interfaces subpackage and import-and-return the class in an elif loop as shown in the below method.
 
 def decide(params):
-    """Decides if the Data class is for public or private data."""
-
-    if 'dataframe' in params:  # if params contain a Pandas dataframe, then use PublicData class
+    """Decides if the Data class is for public or private data.
+
+    To add new implementations of Data, add the class in data_interfaces
+    subpackage and import-and-return the class in an elif loop as shown
+    in the below method.
+    """
+    if 'dataframe' in params:
+        # if params contain a Pandas dataframe, then use PublicData class
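The dispatch pattern touched by this hunk can be sketched as below. This is a minimal illustration, not the dice_ml source: `PublicDataSketch`, `PrivateDataSketch` and `DataSketch` are hypothetical stand-ins, and only the `'dataframe' in params` test and the `self.__class__ = decide(params)` reassignment are taken from the diff.

```python
class PublicDataSketch:
    """Hypothetical stand-in for a data interface built from a dataframe."""

    def __init__(self, params):
        self.kind = "public"
        self.params = params


class PrivateDataSketch:
    """Hypothetical stand-in for a data interface built from meta data only."""

    def __init__(self, params):
        self.kind = "private"
        self.params = params


def decide(params):
    """Return the class to use, based on which keys are present."""
    if 'dataframe' in params:
        return PublicDataSketch
    # A further implementation would add an elif branch above this line.
    return PrivateDataSketch


class DataSketch:
    """Hypothetical wrapper mirroring the pattern in the diff."""

    def __init__(self, **params):
        self.decide_implementation_type(params)

    def decide_implementation_type(self, params):
        # Rebind the instance to the chosen class, then run that class's initializer.
        self.__class__ = decide(params)
        self.__init__(params)


public = DataSketch(dataframe=object(), outcome_name="income")
private = DataSketch(features={"age": [17, 90]}, outcome_name="income")
```

After construction, each object is an instance of the class that `decide` returned, so callers only ever construct the wrapper.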
dice_ml/data_interfaces/private_data_interface.py (40 additions, 23 deletions)
@@ -3,34 +3,44 @@
 import sys
 import pandas as pd
 import numpy as np
-from sklearn.model_selection import train_test_split
 import collections
-from collections import OrderedDict
 import logging
+
+
 logging.basicConfig(level=logging.NOTSET)
-from sklearn.preprocessing import LabelEncoder
+
 
 class PrivateData:
     """A data interface for private data with meta information."""
 
     def __init__(self, params):
         """Init method
 
-        :param features: Dictionary or OrderedDict with feature names as keys and range in int/float (for continuous features) or categories in string (for categorical features) as values. For python version <=3.6, should provide only an OrderedDict.
+        :param features: Dictionary or OrderedDict with feature names as keys and range in int/float
+                         (for continuous features) or categories in string (for categorical features)
+                         as values. For python version <=3.6, should provide only an OrderedDict.
         :param outcome_name: Outcome feature name.
-        :param type_and_precision (optional): Dictionary with continuous feature names as keys. If the feature is of type int, just string 'int' should be provided, if the feature is of type float, a list of type and precision should be provided. For instance, type_and_precision: {cont_f1: 'int', cont_f2: ['float', 2]} for continuous features cont_f1 and cont_f2 of type int and float (and precision up to 2 decimal places) respectively. Default value is None and all features are treated as int.
-        :param mad (optional): Dictionary with feature names as keys and corresponding Median Absolute Deviations (MAD) as values. Default MAD value is 1 for all features.
+        :param type_and_precision (optional): Dictionary with continuous feature names as keys.
+                                              If the feature is of type int, just string 'int' should be provided,
+                                              if the feature is of type float, a list of type and precision should be
+                                              provided. For instance, type_and_precision: {cont_f1: 'int',
+                                              cont_f2: ['float', 2]} for continuous features cont_f1 and cont_f2 of
+                                              type int and float (and precision up to 2 decimal places) respectively.
+                                              Default value is None and all features are treated as int.
+        :param mad (optional): Dictionary with feature names as keys and corresponding Median Absolute Deviations (MAD)
+                               as values.
+                               Default MAD value is 1 for all features.
         :param data_name (optional): Dataset name
-
         """
-
-        if sys.version_info > (3,6,0) and type(params['features']) in [dict, collections.OrderedDict]:
+        if sys.version_info > (3, 6, 0) and type(params['features']) in [dict, collections.OrderedDict]:
@@ new lines 41-43 @@
-                "should provide dictionary with feature names as keys and range (for continuous features) or categories (for categorical features) as values. For python version <3.6, should provide an OrderedDict")
+                "should provide dictionary with feature names as keys and range"
+                "(for continuous features) or categories (for categorical features) as values. "
+                "For python version <3.6, should provide an OrderedDict")
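The parameters described in the `PrivateData` docstring could be assembled roughly as follows. This is a sketch under stated assumptions: the feature names and values are made up, and `features_are_acceptable` and `split_feature_names` are hypothetical helpers, not dice_ml functions. The first helper applies the same type test as the `if` line in the diff; its fallback branch is an assumption about the behaviour the error message describes.

```python
import collections
import sys

# Made-up meta data for illustration; none of these names come from the PR.
params = {
    "features": {
        "age": [17, 90],                                  # continuous: [min, max]
        "hours_per_week": [1.0, 99.0],                    # continuous: [min, max]
        "education": ["School", "Bachelors", "Masters"],  # categorical levels
    },
    "outcome_name": "income",
    # 'int' for integer features, ['float', precision] for float features.
    "type_and_precision": {"age": "int", "hours_per_week": ["float", 2]},
    # Median Absolute Deviation per feature; the docstring gives 1 as the default.
    "mad": {"age": 10.0, "hours_per_week": 3.0},
}


def features_are_acceptable(features):
    """Hypothetical helper applying the same type test as the diff.

    Plain dicts are accepted on Python newer than 3.6.0; otherwise only an
    OrderedDict is accepted (assumed fallback).
    """
    if sys.version_info > (3, 6, 0) and type(features) in [dict, collections.OrderedDict]:
        return True
    return type(features) is collections.OrderedDict


def split_feature_names(features):
    """Hypothetical helper: separate continuous from categorical features.

    Assumes categorical levels are strings and continuous ranges are numbers.
    """
    continuous, categorical = [], []
    for name, values in features.items():
        if all(isinstance(v, str) for v in values):
            categorical.append(name)
        else:
            continuous.append(name)
    return continuous, categorical


ok = features_are_acceptable(params["features"])
continuous, categorical = split_feature_names(params["features"])
```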
@@ new lines 192-194 @@
-        self.ohe_base_df = self.prepare_df_for_ohe_encoding()  # base dataframe for doing one-hot-encoding
-        # ohe_encoded_feature_names and ohe_base_df are created (and stored as data class's parameters) when get_data_params_for_gradient_dice() is called from gradient-based DiCE explainers
+        # base dataframe for doing one-hot-encoding
+        # ohe_encoded_feature_names and ohe_base_df are created (and stored as data class's parameters)
+        # when get_data_params_for_gradient_dice() is called from gradient-based DiCE explainers
@@ new line 346 @@
-        """Transforms query_instance into one-hot-encoded and min-max normalized data. query_instance should be a dict, a dataframe, a list, or a list of dicts"""
+        """Transforms query_instance into one-hot-encoded and min-max normalized data. query_instance should be a dict,
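The transformation named in this docstring, one-hot encoding plus min-max normalization from meta data alone, can be illustrated with a small standalone sketch. It is not the method from the PR: `encode_query` is a hypothetical function, it handles only the dict form of a query instance, and the `<feature>_<level>` column naming is an assumption.

```python
def encode_query(query, features):
    """Return a flat dict of min-max normalized and one-hot-encoded values.

    `features` maps each name to [min, max] (continuous) or to a list of
    string levels (categorical), as in the PrivateData docstring.
    """
    encoded = {}
    for name, spec in features.items():
        value = query[name]
        if all(isinstance(level, str) for level in spec):
            # Categorical: one 0/1 column per level (assumed naming scheme).
            for level in spec:
                encoded[f"{name}_{level}"] = 1 if value == level else 0
        else:
            # Continuous: rescale to [0, 1] using the declared range.
            low, high = spec
            encoded[name] = (value - low) / (high - low)
    return encoded


features = {"age": [20, 80], "education": ["School", "Bachelors"]}
row = encode_query({"age": 50, "education": "Bachelors"}, features)
```

Because only the declared ranges and levels are needed, the encoding works without access to any training rows, which is the point of the private data interface.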