(Using Splunk 6.1.2 for...reasons)
Background: We send out a push notification to a third party. The third party sometimes responds quickly with a data pull for that event, and sometimes days later.
The notifications are automatically extracted as the field "pushID"
The data pulls are automatically extracted as the field "pullID"
Several notifications and several pulls can occur for each event. To further complicate matters, there can be several irrelevant pullID events occurring before the relevant pushID event.
What I need to do is get the pushID event time and the very first pullID time that follows it, then calculate the delay (in decimal hours, which makes it a tad easier).
Up to this point, I have been using a memory-hogging join subsearch on pullID against stats earliest(pushID), attempting to return the stats latest(pullID) result. Unfortunately, this is not returning correct results, and I need help building something leaner and more accurate.
Sample Data (all from same host and source):
Jan 18 21:23:32 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:16:06 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:16:05 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:15:58 web event-server: EVENT_NOTIFICATION for pushID: ORG_12345_EVENT12345ABC1
Jan 15 14:14:11 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:13:03 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:13:02 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:13:00 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 15 14:12:35 web event-server: EVENT_NOTIFICATION for pushID: ORG_12345_EVENT12345ABC1
Jan 14 13:11:18 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 14 13:11:18 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Jan 14 11:53:58 web event-server: Requesting pull event for pullID: ORG_12345_EVENT12345ABC1
Under my current code, the correct pushID time is returned (Jan 15 14:12:35), but the wrong pullID result is returned (Jan 18 21:23:32). Mostly complete code (minus some other evals for other things) below:
host=web source=/var/log/messages earliest=01/14/2019:00:00:00 latest=01/20/2019:24:00:00 pushID!=NULL
...[other evals and rex here]...
| stats earliest(_time) as pushTime by pushID
| fields pushID pushTime
| eval pullID=pushID
| join pullID max=0 usetime=true earlier=false
[search host=web source=/var/log/messages earliest=01/14/2019:00:00:00 latest=01/20/2019:24:00:00
| stats earliest(_time) as pullTime by pullID
| fields pullID pullTime
]
| eval delayTime=pullTime-pushTime
| where delayTime > 0
...[other evals here]...
| eval delayHours = delayTime/3600
| table pushID pushTime pullID pullTime delayHours
After scouring more of the Splunk answers pages, I have cobbled together a result that works reasonably well. Unfortunately, my Karma is too low for me to actually post links to the various answers used, but I will try to add them at the end.
There is only one condition on this search which I can't seem to figure out: if by some chance a pullID is incredibly late, it gets knocked out of the results whenever another pullID occurs before the pushID timestamp. That said, here is the resulting pseudocode:
index=main earliest=-1w@w1 latest=@w1 sourcetype=syslog (pushID=* OR pullID=*)
| eval matchID=coalesce(pushID,pullID)
...[Other evals, rex, and dragons be here]...
| eval eventType=if(isnull(pullID),"Push","Pull")
| eval hour-{eventType}=strftime(_time,"%c")
| stats latest(hour-Push) AS PushTime earliest(hour-Pull) as PullTime distinct_count(eventType) AS Type by matchID
| where Type=2
| eval decimalHours=(strptime(PullTime,"%c")-strptime(PushTime,"%c"))/3600
| where decimalHours>0.0084
| sort decimalHours desc
| fields - Type
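One possible refinement for that edge case (a sketch only; the eval-inside-stats syntax should be verified on your Splunk version) is to stamp every event with its matchID's latest push time via eventstats, then discard pull events that occur before that push, so a very late pull is no longer masked by an early one:

```
index=main earliest=-1w@w1 latest=@w1 sourcetype=syslog (pushID=* OR pullID=*)
| eval matchID=coalesce(pushID,pullID)
| eval eventType=if(isnull(pullID),"Push","Pull")
| eventstats latest(eval(if(eventType="Push",_time,null()))) AS pushEpoch by matchID
| where eventType="Push" OR _time>pushEpoch
| stats latest(eval(if(eventType="Push",_time,null()))) AS pushEpoch earliest(eval(if(eventType="Pull",_time,null()))) AS pullEpoch dc(eventType) AS Type by matchID
| where Type=2
| eval decimalHours=(pullEpoch-pushEpoch)/3600
| where decimalHours>0.0084
| sort - decimalHours
| fields - Type
```

Working in epoch seconds throughout also avoids the strftime/strptime round trip.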
@somesoni2's answer here https://answers.splunk.com/answers/586269/alternative-to-joins.html#answer-586298 for the idea on using distinct_count based on an event type.
@MuS's answer here https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo... for the how-to on making a dynamic field name.
EDIT: The where clause for decimalHours was to limit only those pullIDs that were longer than 5 minutes (give or take).
Make sure that you UpVote those answers in those links that helped you.
Done! Thank you for the reminder to UpVote those comments!
Hello,
I think I have understood your problem statement. Maybe you can tweak your approach: instead of join, use append, and then use transaction to find the time difference between the events.
https://docs.splunk.com/Documentation/Splunk/7.3.0/SearchReference/Transaction
Hopefully this helps you find a solution. Good luck.
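A rough sketch of that suggestion, with field and string values assumed from the sample data above (note that transaction is itself resource-hungry, so maxspan/maxevents may need tuning):

```
host=web source=/var/log/messages (pushID=* OR pullID=*)
| eval matchID=coalesce(pushID,pullID)
| transaction matchID startswith="EVENT_NOTIFICATION" endswith="Requesting pull event"
| eval delayHours=duration/3600
| table matchID _time duration delayHours
```

transaction's built-in duration field gives the span in seconds from the first to the last event in each group.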
Hi katharsys,
the problem is probably related to the 50,000-event limit on subsearches, so you have to use a different approach:
(host=web source=/var/log/messages earliest=01/14/2019:00:00:00 latest=01/20/2019:24:00:00 pushID!=NULL) OR (host=web source=/var/log/messages earliest=01/14/2019:00:00:00 latest=01/20/2019:24:00:00)
| stats earliest(_time) AS pushTime latest(_time) AS pullTime BY pullID
| eval delayTime=pullTime-pushTime
| where delayTime > 0
...[other evals here]...
| eval delayHours = delayTime/3600
| table pushID pushTime pullID pullTime delayHours
If you need other fields, also insert values(your_field) AS your_field in the | stats.
In addition, I suggest always specifying the index in your searches.
Bye.
Giuseppe
Hello,
I am sort of confused by this response. As written, it simply takes the earliest pullID timestamp and the latest pullID timestamp. Using my sample data above, it would return Jan 14 11:53:58 and assign it to pushTime(?!) and Jan 18 21:23:32 as pullTime, which is exactly the issue I am having trouble with.
There is no correlation there between the pushID time and the earliest pullID time that follows it.
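For reference, one join-free way to sketch exactly that correlation (assuming streamstats carries the last non-null value forward, which should be verified on 6.1.x) is to sort chronologically and stamp each pull event with the most recent push time seen for its ID:

```
host=web source=/var/log/messages (pushID=* OR pullID=*)
| eval matchID=coalesce(pushID,pullID)
| eval pushEpoch=if(isnotnull(pushID),_time,null())
| sort 0 _time
| streamstats last(pushEpoch) AS lastPushEpoch by matchID
| where isnotnull(pullID) AND isnotnull(lastPushEpoch)
| stats earliest(_time) AS pullTime earliest(lastPushEpoch) AS pushTime by matchID
| eval delayHours=(pullTime-pushTime)/3600
```

The where clause drops pull events with no preceding push, so only the first pull after a push survives into the stats.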