Hello,
I use the Splunk Machine Learning Toolkit and would like to predict a rare event. The target variable has two values: "GOOD" and "BAD". "BAD" represents only 13% of the data.
I use RandomForestClassifier for the prediction, but it has serious difficulty predicting "BAD". The confusion matrix is:
Actual | Predicted GOOD | Predicted BAD |
BAD | 11.9% | 88.1% |
GOOD | 19.4% | 80.6% |
Of course, the model looks good overall, with a precision of 0.87 and an F1 of 0.85, because most of the time the result is GOOD, but it doesn't work for "BAD".
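To illustrate why the overall scores can look fine while "BAD" is poorly predicted, here is a minimal sketch in plain Python. The counts are hypothetical (1000 events, 13% BAD) and are not the figures from my model, and it assumes the reported precision and F1 are support-weighted averages over both classes, which I have not verified for the toolkit.

```python
# Hypothetical counts for 1000 events with 13% BAD (illustration only,
# not the numbers from this post). Keys are (actual, predicted).
counts = {
    ("GOOD", "GOOD"): 850,
    ("GOOD", "BAD"): 20,
    ("BAD", "GOOD"): 100,
    ("BAD", "BAD"): 30,
}
labels = ["GOOD", "BAD"]
total = sum(counts.values())


def per_class_metrics(label):
    """Return (precision, recall, f1, support) for one class."""
    true_pos = counts[(label, label)]
    support = sum(counts[(label, pred)] for pred in labels)
    predicted = sum(counts[(actual, label)] for actual in labels)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / support if support else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1, support


metrics = {label: per_class_metrics(label) for label in labels}

# Assumption: overall scores are averaged with each class weighted by its support.
weighted_precision = sum(m[0] * m[3] for m in metrics.values()) / total
weighted_f1 = sum(m[2] * m[3] for m in metrics.values()) / total
bad_recall = metrics["BAD"][1]

print(f"weighted precision: {weighted_precision:.2f}")
print(f"weighted F1:        {weighted_f1:.2f}")
print(f"recall on BAD:      {bad_recall:.2f}")
```

With these made-up counts the weighted precision and F1 both come out at about 0.86, while recall on "BAD" is only about 0.23, so looking at the per-class numbers for "BAD" is more informative than the overall scores.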
How can I improve my model? Is it possible to use class_weight or something similar?
Thank you in advance for your answer.
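For context on what I mean by class_weight: in scikit-learn, class_weight="balanced" weights each class by n_samples / (n_classes * class_count). The sketch below reproduces that heuristic in plain Python for a 13% / 87% split. The helper name balanced_class_weights is hypothetical, and I do not know whether the toolkit's fit command exposes this parameter for RandomForestClassifier.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count),
    the same heuristic scikit-learn applies for class_weight="balanced"."""
    class_counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(class_counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in class_counts.items()
    }


# Toy label column with the same 13% BAD proportion as my data.
labels = ["BAD"] * 13 + ["GOOD"] * 87
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

With such weights, a misclassified "BAD" event would count several times more than a misclassified "GOOD" event during training.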