Splunk Search

sum field until other associated field was changed.

arjitgoswami
Explorer

Hi All,

I have a scenario where I need to find total time taken by a particular servlet only until other servlet is not called.

for instance, my log file looks like
time Servlet Timetaken
24-05-2017 07:00 Servlet1 Time taken: 3
24-05-2017 07:15 Servlet1 Time taken: 5
24-05-2017 07:30 Servlet2 Time taken: 6
24-05-2017 07:45 Servlet3 Time taken: 7
24-05-2017 08:00 Servlet1 Time taken: 9

so my output should look like:

Servlet name TimeTaken

Servlet1 8 ---> this is sum of all time taken until Servlet2 appear.
Servlet2 6
Servlet3 7
Servlet1 9

using stats sum(TimeTaken) by Servlet is giving me sum of entire Servlet instance (here for Servlet1, I am getting 17) which I don't want. I want how much time did servlet1 took before Servlet2 came.

Can somebody please help me in this scenario?

Thanks and regards,
Arjit Goswami.

Tags (2)
1 Solution

DalJeanis
Legend

Start with this ...

your search that gets the records with _time, ServletName and TimeTaken
| sort 0 _time 

This substitutes for the above to give a run-anywhere test sample

| makeresults 
| eval ServletName="A B A A A B B C C C A C"  
| makemv ServletName 
| mvexpand ServletName 
|streamstats count as addtime 
| eval _time=_time+addtime 
| table _time ServletName 
| eval TimeTaken=1

... then this calculates the time

| streamstats current=f last(ServletName) as PriorName
| eval newServlet=if(PriorName==ServletName,0,1)
| streamstats sum(newServlet) as ServletNumber
| stats sum(TimeTaken) as TimeTaken by ServletNumber ServletName

I'm pretty sure that @somesoni2 or @woodcock has a more elegant way, but this works.

View solution in original post

arjitgoswami
Explorer

@DalJeanis @somesoni2 @woodcock @niketnilay ! Thanks for your responses but the issue here is we don't know the count of the records here. My Bad I should have mentioned it earlier !!! the thing is its a log file so these are just not only 5 records but there would be 1000's of records (and we don't know which all servlets are called in log file) . So records are like:

24-05-2017 07:00 Servlet1 Time taken: 3
24-05-2017 07:15 Servlet1 Time taken: 5
24-05-2017 07:30 Servlet2 Time taken: 6
24-05-2017 07:45 Servlet3 Time taken: 7
24-05-2017 08:00 Servlet1 Time taken: 9
.....
....
....
...
blah blah blah

Can you please suggest approach to this problem which can address this issue?

Thanks in advance

Kind regards,
Arjit.

0 Karma

woodcock
Esteemed Legend

As I understand it, each of the solutions provided should work just fine. Have you tested them? If all fail, you will need to VERY clearly explain the nature of the failure/breakdown.

0 Karma

arjitgoswami
Explorer

@woodcock! Thanks. Issue was related to configuration. It all worked now !!!

@DalJeanis @somesoni2 @woodcock @niketnilay! Thanks for all your help.

0 Karma

niketn
Legend

@DalJeanis @somesoni2 @woodcock, here is another approach using reset_on_change.

Following query creates some time dependent test data using gentimes and makeresults.

| gentimes start=-9
| eval _time=starttime
| table _time
| appendcols [| makeresults  
| eval ServletName="Servlet1"
| eval TimeTaken="3"
| append [| makeresults  
| eval ServletName="Servlet1"
| eval TimeTaken="6"]
| append [| makeresults  
| eval ServletName="Servlet2"
| eval TimeTaken="3"]
| append [| makeresults  
| eval ServletName="Servlet2"
| eval TimeTaken="1"]
| append [| makeresults  
| eval ServletName="Servlet2"
| eval TimeTaken="3"]
| append [| makeresults  
| eval ServletName="Servlet3"
| eval TimeTaken="7"]
| append [| makeresults  
| eval ServletName="Servlet4"
| eval TimeTaken="4"]
| append [| makeresults  
| eval ServletName="Servlet5"
| eval TimeTaken="3"]
| append [| makeresults  
| eval ServletName="Servlet5"
| eval TimeTaken="9"]
]

Following query gets the expected results:
The first streamstats calculates the sum of Time Taken by a Servlet until the ID changes.
The subsequent streamstats filters records where same Servlet is repeated.

| streamstats sum(TimeTaken) as Duration BY ServletName reset_on_change=true
| reverse
| streamstats current=f window=2 last(ServletName) as NextServletName
| where ServletName!=NextServletName OR isnull(NextServletName)
| table _time ServletName Duration
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

woodcock
Esteemed Legend

Like this:

| makeresults 
| eval raw="1,3 1,5 2,6 3,7 1,9" 
| makemv delim=" " raw 
| mvexpand raw 
| rename raw AS _raw 
| rex "^(?<Servlet>[^,]+),(?<Timetaken>.*)$" 
| streamstats count as addtime 
| eval _time=_time+addtime 
| table _time Servlet Timetaken

| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"

| streamstats count AS myPos
| streamstats count AS myServletPos BY Servlet
| eventstats first(myPos) AS firstServletPos BY Servlet
| eval normaizedServletPos = myPos - firstServletPos + 1
| eval goodTimetaken=if((myServletPos = normaizedServletPos), Timetaken, 0)
| stats sum(goodTimetaken) BY Servlet
0 Karma

DalJeanis
Legend

Start with this ...

your search that gets the records with _time, ServletName and TimeTaken
| sort 0 _time 

This substitutes for the above to give a run-anywhere test sample

| makeresults 
| eval ServletName="A B A A A B B C C C A C"  
| makemv ServletName 
| mvexpand ServletName 
|streamstats count as addtime 
| eval _time=_time+addtime 
| table _time ServletName 
| eval TimeTaken=1

... then this calculates the time

| streamstats current=f last(ServletName) as PriorName
| eval newServlet=if(PriorName==ServletName,0,1)
| streamstats sum(newServlet) as ServletNumber
| stats sum(TimeTaken) as TimeTaken by ServletNumber ServletName

I'm pretty sure that @somesoni2 or @woodcock has a more elegant way, but this works.

somesoni2
Revered Legend

My answer would've been same as this (with only diff of using | accum newServlet instead of | streamstats sum(newServlet) as ServletNumber.

0 Karma

DalJeanis
Legend

@woodcock and @somesoni2 - dang, I was assuming there was some clever way to use an eval in the streamstats to save a step, but I couldn't figure it out.

I guess it looks like this, but it isn't any more efficient that I can see...

 | streamstats sum(eval(if(PriorName==ServletName,0,1))) as ServletNumber

... and it would be more confusing to beginners who are likely to be reading the answer for years.

0 Karma

woodcock
Esteemed Legend

I wrote my answer the way that I did mostly for educational purposes; it could have been done more efficiently but would lose clarity.

0 Karma

DalJeanis
Legend

@woodcock - Yep. We're teaching people how to use the tool belt, more than just how to make a "chair".

Still, it's cool sometimes to see you masters do something sideways... and I go, "Wait... what was THAT?"

0 Karma

woodcock
Esteemed Legend

When I see that @DalJeanis has answered already, I typically move on assuming it is answered well (same for @somesoni2).

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...