(Using Splunk 6.1.2 for...reasons)
Background: We send out a push notification to a third party. The third party sometimes responds quickly with a data pull for that event, and sometimes days later.
The notifications are automatically extracted as the field "pushID"
The data pulls are automatically extracted as the field "pullID"
Several notifications and several pulls can occur for each event. To further complicate matters, there can be several irrelevant pullID events occurring before the relevant pushID event.
What I need to do is get the pushID event time and the very first pullID time that follows it, then calculate the delay (in decimal hours, which makes it a tad easier).
Up to this point, I have been using a memory-hogging join subsearch on pullID against stats earliest(pushID), attempting to return the stats latest(pullID) result. Unfortunately, this is not returning correct results, and I need help building something leaner and more accurate.
Sample Data (all from same host and source):
Jan 18 21:23:32 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:16:06 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:16:05 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:15:58 web event-server: EVENT_NOTIFICATION for pushID: ORG_12345_EVENT12345ABC1
Jan 15 14:14:11 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:13:03 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:13:02 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:13:00 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:12:35 web event-server: EVENT_NOTIFICATION for pushID: ORG_12345_EVENT12345ABC1
Jan 14 13:11:18 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 14 13:11:18 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 14 11:53:58 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Under my current code, the correct pushID time is returned (Jan 15 14:12:35), but the wrong pullID result is returned (Jan 18 21:23:32). Mostly complete code (minus some other evals for other things) below:
host=web source=/var/log/messages earliest=01/14/2019:00:00:00 latest=01/20/2019:24:00:00 pushID!=NULL
...[other evals and rex here]...
| stats earliest(_time) as pushTime by pushID
| fields pushID pushTime
| eval pullID=pushID
| join pullID max=0 usetime=true earlier=false
[search host=web source=/var/log/messages earliest=01/14/2019:00:00:00 latest=01/20/2019:24:00:00
| stats earliest(_time) as pullTime by pullID
| fields pullID pullTime
]
| eval delayTime=pullTime-pushTime
| where delayTime > 0
...[other evals here]...
| eval delayHours = delayTime/3600
| table pushID pushTime pullID pullTime delayHours
After scouring more of the Splunk answers pages, I have cobbled together a result that works reasonably well. Unfortunately, my Karma is too low for me to actually post links to the various answers used, but I will try to add them at the end.
There is only one condition on this search which I can't seem to figure out: if by some chance a pullID is incredibly late, it gets knocked out of the results whenever another pullID occurs before the pushID timestamp. That said, here is the resulting pseudocode:
index=main earliest=-1w@w1 latest=@w1 sourcetype=syslog (pushID=* OR pullID=*)
| eval matchID=coalesce(pushID,pullID)
...[Other evals, rex, and dragons be here]...
| eval eventType=if(isnull(pullID),"Push","Pull")
| eval hour-{eventType}=strftime(_time,"%c")
| stats latest(hour-Push) AS PushTime earliest(hour-Pull) as PullTime distinct_count(eventType) AS Type by matchID
| where Type=2
| eval decimalHours=(strptime(PullTime,"%c")-strptime(PushTime,"%c"))/3600
| where decimalHours>0.0084
| sort decimalHours desc
| fields - Type
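One possible refinement for that edge case (a sketch only; the eval-inside-stats syntax should be verified on your Splunk version) is to stamp every event with its matchID's latest push time via eventstats, then discard pull events that occur before that push, so a very late pull is no longer masked by an early one:

```
index=main earliest=-1w@w1 latest=@w1 sourcetype=syslog (pushID=* OR pullID=*)
| eval matchID=coalesce(pushID,pullID)
| eval eventType=if(isnull(pullID),"Push","Pull")
| eventstats latest(eval(if(eventType="Push",_time,null()))) AS pushEpoch by matchID
| where eventType="Push" OR _time>pushEpoch
| stats latest(eval(if(eventType="Push",_time,null()))) AS pushEpoch earliest(eval(if(eventType="Pull",_time,null()))) AS pullEpoch dc(eventType) AS Type by matchID
| where Type=2
| eval decimalHours=(pullEpoch-pushEpoch)/3600
| where decimalHours>0.0084
| sort - decimalHours
| fields - Type
```

Working in epoch seconds throughout also avoids the strftime/strptime round trip.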
@somesoni2's answer here https://answers.splunk.com/answers/586269/alternative-to-joins.html#answer-586298 for the idea on using distinct_count based on an event type.
@MuS's answer here https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo... for the how-to on making a dynamic field name.
EDIT: The where clause for decimalHours was to limit only those pullIDs that were longer than 5 minutes (give or take).
Make sure that you UpVote those answers in those links that helped you.
Done! Thank you for the reminder to UpVote those comments!
Hello,
I think I have understood your problem statement. Maybe you can tweak your approach: instead of join, use append, and then use transaction to find the time difference between the events.
https://docs.splunk.com/Documentation/Splunk/7.3.0/SearchReference/Transaction
Hopefully this helps you find a solution. Good luck.
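A rough sketch of that suggestion, with field and string values assumed from the sample data above (note that transaction is itself resource-hungry, so maxspan/maxevents may need tuning):

```
host=web source=/var/log/messages (pushID=* OR pullID=*)
| eval matchID=coalesce(pushID,pullID)
| transaction matchID startswith="EVENT_NOTIFICATION" endswith="Requesting pull event"
| eval delayHours=duration/3600
| table matchID _time duration delayHours
```

transaction's built-in duration field gives the span in seconds from the first to the last event in each group.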
Hi katharsys,
the problem is probably related to the 50,000-event limit on subsearches, so you have to use a different approach:
(host=web source=/var/log/messages earliest=01/14/2019:00:00:00 latest=01/20/2019:24:00:00 pushID!=NULL) OR (host=web source=/var/log/messages earliest=01/14/2019:00:00:00 latest=01/20/2019:24:00:00)
| stats earliest(_time) AS pushTime latest(_time) AS pullTime BY pullID
| eval delayTime=pullTime-pushTime
| where delayTime > 0
...[other evals here]...
| eval delayHours = delayTime/3600
| table pushID pushTime pullID pullTime delayHours
If you need other fields, also insert values(your_field) AS your_field in the | stats.
In addition, I suggest always specifying the index in your searches.
Bye.
Giuseppe
Hello,
I am sort of confused by this response. As written, it simply takes the earliest pullID timestamp and the latest pullID timestamp. Using my sample data above, it would return Jan 14 11:53:58 and assign it to pushTime(?!) and Jan 18 21:23:32 as pullTime, which is exactly the issue I am having trouble with.
There is no correlation there between the pushID time and the earliest pullID time that follows it.
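For reference, one join-free way to sketch exactly that correlation (assuming streamstats carries the last non-null value forward, which should be verified on 6.1.x) is to sort chronologically and stamp each pull event with the most recent push time seen for its ID:

```
host=web source=/var/log/messages (pushID=* OR pullID=*)
| eval matchID=coalesce(pushID,pullID)
| eval pushEpoch=if(isnotnull(pushID),_time,null())
| sort 0 _time
| streamstats last(pushEpoch) AS lastPushEpoch by matchID
| where isnotnull(pullID) AND isnotnull(lastPushEpoch)
| stats earliest(_time) AS pullTime earliest(lastPushEpoch) AS pushTime by matchID
| eval delayHours=(pullTime-pushTime)/3600
```

The where clause drops pull events with no preceding push, so only the first pull after a push survives into the stats.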