Currently, the PPS is already very useful, and I regularly use it for feature selection and general insights whenever I encounter a new data set. Recently I had an idea that might extend the capabilities of the metric.
As mentioned in the article "RIP correlation. Introducing the Predictive Power Score", when using the PPS one should keep in mind that it only captures direct relations between a single feature and the target, not combinations of input features.
To address this weakness, would it make sense to give the underlying decision tree two variables instead of one? This would take significantly longer to compute, but it gives combinations of variables a chance and might also reveal additional information about the input features.
For example, suppose I have target variable 'y' and input features 'x1, x2, x3, x4'. I apply the PPS and find the scores 0.4, 0, 0.4, and 0.6 respectively. As a follow-up, I try all pairwise combinations and discover the following:
- If I combine x1 and x2, I get a predictive power score of 0.5. I now know that this combination increases the PPS by 0.5 - 0.4 = 0.1.
- If I combine x1 and x4, I get a predictive power score of 0.6. The increase is now 0.6 - 0.6 = 0, implying that even though x1 has a PPS of 0.4, I might as well use x4 alone and drop x1.
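To make the proposal concrete, here is a rough sketch of such a multi-feature score. This is not the actual ppscore implementation: it reuses the same normalization idea that ppscore uses for regression tasks (1 - MAE of a cross-validated decision tree divided by MAE of a naive median predictor), but the column names, synthetic data, and restriction to numeric targets are purely illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: y depends on x1 and (more strongly) x4; x2 and x3 are noise.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
    "x4": rng.normal(size=n),
})
df["y"] = df["x1"] + 2 * df["x4"] + rng.normal(scale=0.1, size=n)

def multi_pps(df, features, target, cv=4):
    """PPS-style score for one or more numeric features vs. a numeric
    target: 1 - MAE(tree) / MAE(naive median predictor), floored at 0."""
    X, y = df[list(features)], df[target]
    mae_model = -cross_val_score(
        DecisionTreeRegressor(random_state=0), X, y,
        scoring="neg_mean_absolute_error", cv=cv,
    ).mean()
    mae_naive = np.mean(np.abs(y - y.median()))
    return max(0.0, 1 - mae_model / mae_naive)

# Single-feature scores, then one pairwise combination.
single = {f: multi_pps(df, [f], "y") for f in ["x1", "x2", "x3", "x4"]}
combined = multi_pps(df, ["x1", "x4"], "y")
print("single-feature scores:", single)
print("x1+x4 combined:", combined,
      "gain over best single feature:", combined - max(single.values()))
```

On this toy data, the pair (x1, x4) scores clearly above either feature alone, which is exactly the kind of signal the single-feature PPS cannot surface. A real implementation would also need the classification case and the categorical preprocessing that ppscore already handles.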
This requires a slightly different implementation of the algorithm, and before committing to developing it, I wanted to check whether this train of thought makes sense. Opinions on such an additional feature?