I have an application that writes predictable log entries when it starts a series of activities and when it finishes them. I can create transactions, etc. - all good. What I'm struggling with is how to construct a search that tells me which activities didn't complete. Basically, I want to identify that a set of activities was started but never produced the log entries that indicate it finished. I've tried searches like:
index="cloudwatch" | regex "\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12}/\d{6}/\d+/\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12}" | rex "(?<admin>\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12})/(?<ticket>\d{6})/(?<token>\d+)/(?<chunk_id>\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12})" | stats count by admin,ticket,token,chunk_id | where count<7
The regex keeps only the events containing the string that marks them as part of the same transaction, and the rex then extracts the individual parts that are meaningful. I can run this, but over tens of millions of events it is slow. Even as a scheduled search, I end up with phantom records because not every event of a transaction falls within the searched timeframe, so you get some orphans at the edges. I can also search for just the beginning and end events by adding something like
(("received event" manifest.json) OR "writing postdata")
to the base search. That speeds up the search, but only a little, and it will get worse as the data set grows. Ultimately, I want to define a search that finds the 'chunk_id' values that didn't complete, schedule it, and get an alert, because some corrective action is usually needed and it can be time-sensitive.
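One common way to "find the data that isn't there" is to pull only the start and end events, flag each one, and roll them up per chunk with stats. A chunk that has a start flag but no end flag is incomplete. Edge orphans are handled in two ways: requiring started=1 discards end events whose start fell before the search window, and the grace-period filter ignores chunks that began too recently to have finished yet. This is a minimal sketch, not a tested solution. It assumes that both the start and end events contain the same ID path that the rex above matches. The 4-hour window and 30-minute grace period are hypothetical values: the window should exceed the longest normal chunk duration, and the grace period should exceed the longest expected time to finish.

```spl
index="cloudwatch" earliest=-4h (("received event" manifest.json) OR "writing postdata")
| rex "(?<admin>\w{8}-\w{4}-\w{4}-\w{4}-\w{12})/(?<ticket>\d{6})/(?<token>\d+)/(?<chunk_id>\w{8}-\w{4}-\w{4}-\w{4}-\w{12})"
| eval is_start=if(searchmatch("\"received event\" manifest.json"), 1, 0)
| eval is_end=if(searchmatch("\"writing postdata\""), 1, 0)
| stats min(_time) AS first_seen max(is_start) AS started max(is_end) AS finished BY admin, ticket, token, chunk_id
| where started=1 AND finished=0 AND first_seen < relative_time(now(), "-30m")
```

Scheduled with an alert condition of "number of results > 0", each returned row would be a chunk that started but had not finished within the grace period. stats is used here instead of transaction because transaction holds events in memory and gets slow at this volume. On smaller data sets, `transaction chunk_id startswith=... endswith=... keepevicted=true | where closed_txn=0` is an alternative way to surface incomplete transactions.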
I feel like I've struggled with this notion of "finding the data that isn't there" numerous times in the past, never quite getting something that seemed "right" - so, finally posting something up here in case anyone has some pointers.
Thanks!
Have you looked at accelerated data models? From your description, it appears you already have the right query to get the desired results; what you're looking for is a faster way to run it.
http://docs.splunk.com/Documentation/Splunk/6.5.1/Knowledge/Acceleratedatamodels
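With an accelerated data model, the same start/end rollup can be run with tstats against the prebuilt summaries instead of the raw events. The sketch below is illustrative only and rests on several hypothetical assumptions, none of which exist by default. It assumes a data model named Chunk_Tracking with a root dataset Chunk, constrained to the start/end base search. It also assumes the dataset defines chunk_id (extracted with the rex from the question) and two calculated fields: is_start, set to 1 for "received event" manifest.json events, and is_end, set to 1 for "writing postdata" events. The search's time range would be set on the scheduled search itself.

```spl
| tstats summariesonly=true min(_time) AS first_seen max(Chunk.is_start) AS started max(Chunk.is_end) AS finished
    from datamodel=Chunk_Tracking.Chunk
    by Chunk.chunk_id
| where started=1 AND finished=0 AND first_seen < relative_time(now(), "-30m")
```

summariesonly=true restricts tstats to the accelerated summaries, which is where the speedup comes from. It also means that events not yet summarized are skipped, so the grace period should be long enough to cover the acceleration lag.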