Hey DalJeanis! Thank you so much for the reply!
My use case for the ML Toolkit is to predict a threshold range, for every day and hour of the week, for the number of users logging in (or failing to log in) to each portal in my product. I have login logs in Splunk that capture successful and failed login attempts, along with the portal the user intends to visit after authentication. Login volume differs between portals: users heading to portal X generate a different number of login events than users heading to portal Y. So my goal is to measure past login events for each of the several hundred thousand distinct portals, predict the expected number of login events for each portal on an hourly or daily basis, and treat as outliers the cases where portal X receives more (or suspiciously fewer) logins than the model predicts.
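The first step I have in mind is turning raw login events into hourly counts per portal. Here is a minimal Python sketch of that aggregation, done outside Splunk just to make the idea concrete. It assumes each event has already been reduced to a hypothetical (timestamp, portal_id, succeeded) tuple; the tuple layout and the `hourly_counts` helper are my own placeholders, not anything from the ML Toolkit.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def hourly_counts(
    events: Iterable[tuple[datetime, str, bool]], failed_only: bool = True
) -> Counter:
    """Count login events per (portal_id, hour bucket).

    events: hypothetical (timestamp, portal_id, succeeded) tuples, standing in
    for fields extracted from the Splunk login logs.
    """
    counts = Counter()
    for ts, portal_id, succeeded in events:
        # Optionally keep only failed login attempts.
        if failed_only and succeeded:
            continue
        # Truncate to the start of the hour, similar in spirit to bucketing
        # _time into 1-hour spans in a Splunk search.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        counts[(portal_id, bucket)] += 1
    return counts
```

One caveat: hours with no events simply do not appear in the result, so to catch "suspiciously low" hours the missing (portal, hour) slots would need to be filled in with zero before building any baseline.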
To clarify my questions: first, by “outliers in my failed authentication logs”, I mean high counts of failed authentications (as recorded in my logs) within a period of time. Using the Detect Numeric Outliers showcase example in the ML Toolkit, I was able to create an algorithm that detects an anomalous number of failed logins in any 1-hour window across my entire product. Now I want to dig deeper and analyze each portal separately.
Regarding your second question, I initially wanted to split this analysis by IP address (to compare the trend for one IP address against the others). Since my last post, I have realized that splitting by Portal ID is a more efficient way to reach my goal. Every week shows a recurring pattern of login attempts across all customer portals. Instead of hardcoding a threshold number, I ideally want a range of expected failed login attempts for each portal for any given hour of the day, AND to alert when a portal shows a high (or suspiciously low) number of logins.
Lastly, for your third question: time is everything here. I want to identify what is “common” in every time period of the day and week. Using ML and the six months of data in my Splunk instance, I want to establish the usual number of login events for each portal on an hourly or daily basis.
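To show the kind of "range per portal per hour" I mean, here is a minimal Python sketch, assuming I already have hourly counts per portal for the six months of history. It groups the counts by portal and hour-of-week and uses mean ± k standard deviations as the expected range, which is similar in spirit to the standard deviation approach in the Detect Numeric Outliers showcase. The function names, the default k = 3, and the input format are all hypothetical; this is not how the ML Toolkit itself implements anything.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def build_baseline(
    counts: dict[tuple[str, datetime], int], k: float = 3.0
) -> dict[tuple[str, int, int], tuple[float, float]]:
    """Return {(portal_id, weekday, hour): (low, high)} from historical hourly counts.

    counts: hypothetical {(portal_id, hour_start): count} history. Zero-count
    hours must be present explicitly, or quiet hours never enter the baseline.
    """
    samples = defaultdict(list)
    for (portal_id, hour_start), n in counts.items():
        # Group by hour-of-week so Monday 09:00 is only compared with other Monday 09:00s.
        samples[(portal_id, hour_start.weekday(), hour_start.hour)].append(n)
    baseline = {}
    for key, values in samples.items():
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)
        # Mean +/- k standard deviations, clamped at zero because counts cannot be negative.
        baseline[key] = (max(0.0, mu - k * sigma), mu + k * sigma)
    return baseline


def classify(baseline, portal_id: str, hour_start: datetime, count: int) -> str:
    """Return 'high', 'low', 'normal', or 'unknown' (no history for this slot)."""
    key = (portal_id, hour_start.weekday(), hour_start.hour)
    if key not in baseline:
        return "unknown"
    low, high = baseline[key]
    if count > high:
        return "high"
    if count < low:
        return "low"
    return "normal"
```

Over six months each hour-of-week slot only has about 26 observations per portal, so the range for quiet portals may be noisy.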
Do you have any suggestions for how to use ML for this use case? Or do you think I am still trying to hammer something that is not a nail? Again, I appreciate your help and look forward to any input!