Splunk Search

How to get stats to set a field from the same event as another max() field

tffishe
New Member

While handling our CAS logs I have a report that calculates the time it takes to validate a CAS service ticket.

I use stats to get the average, high, and low values for the time it takes for tickets to be processed (validated or failed to validate).

Right now I also include the last ticket and user in the stats output in case I want to look further into them.

The ticket and user are set for each event by the previous sort and multiple streamstats.

What I'd like to have instead of the last ticket and user in the stats command is the ticket and user from the event which had the max etime (elapsed time) field.

Can anyone suggest how I could get that?

Here is my search:

sourcetype=caslog action="SERVICE_TICKET*" 
| fields _time,action,user,ticket,service,client_ip 
| eval service=mvjoin(mvindex(split(service,"/"),0,3),"/") 
| eval action=replace(action,"SERVICE_TICKET_","ST_") 
| sort 0 service 
| streamstats first(service) as Service by ticket 
| streamstats first(user) as User by ticket 
| streamstats first(_time) as stime by ticket 
| search action!="SERVICE_TICKET_CREATED" 
| eval etime=_time-stime 
| stats count, avg(etime) as ea, max(etime) as eh, min(etime) as el, last(ticket) as Last_Ticket, last(User) as Last_User by action, Service
| eval Avg_Ticket_Time=round(ea,3) 
| eval Low_Ticket_Time=round(el,3) 
| eval High_Ticket_Time=round(eh,3) 
|table count,action,Service,Avg_Ticket_Time,Low_Ticket_Time,High_Ticket_Time,Last_Ticket,Last_User 
| sort 0 - action,count 
1 Solution

DalJeanis
Legend

first and last are relative to the order in which Splunk sees the records. So, just before your stats command, add this:

| sort 0 etime 

The last record in each group will then be the one with the highest elapsed time, assuming the rest of your code is correct.
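As a quick sanity check, here is a toy run built with makeresults (not part of the original search, and the field names are invented for the illustration) showing that after sorting by etime, last() agrees with max():

```spl
| makeresults count=1 
| eval etime=split("5 12 7", " ") 
| mvexpand etime 
| eval etime=tonumber(etime) 
| sort 0 etime 
| stats max(etime) as eh, last(etime) as last_etime
```

Both eh and last_etime should come out as 12, since sorting ascending by etime puts the slowest event last.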


More notes:

1) Given the results of the replace command, you probably mean...

| search action!="ST_CREATED" 

2) Since first gets the earliest occurrence in the order seen by Splunk, the order will be most recent to oldest unless you use sort or reverse, and first will return the most recent record. That means that in the code above all your etimes would be negative numbers, which I assume you would have noticed. For my sanity, I'm going to assume the format and value of "service" is somehow implicitly causing the events to sort into date-time order.

If I were coding it, I'd probably have the sort as...

| sort 0 ticket _time

All of the above changes leave the code looking somewhat like this...

sourcetype=caslog action="SERVICE_TICKET*" 
| table _time,action,user,ticket,service,client_ip 
| eval service=mvjoin(mvindex(split(service,"/"),0,3),"/") 
| eval action=replace(action,"SERVICE_TICKET_","ST_") 
| sort 0 ticket _time
| streamstats first(service) as Service, first(user) as User, first(_time) as stime  by ticket 
| search action!="ST_CREATED" 
| eval etime=_time-stime 
| sort 0 etime 
| stats count,  min(etime) as el, avg(etime) as ea, max(etime) as eh, 
    last(ticket) as Slowest_Ticket, last(User) as Slowest_User by action, Service
| eval Avg_Ticket_Time=round(ea,3) 
| eval Low_Ticket_Time=round(el,3) 
| eval High_Ticket_Time=round(eh,3) 
| table count, action, Service, Avg_Ticket_Time, Low_Ticket_Time, High_Ticket_Time, Slowest_Ticket, Slowest_User 
| sort 0 - action,count 


somesoni2
Revered Legend

Give this a try:

sourcetype=caslog action="SERVICE_TICKET*" 
| fields _time,action,user,ticket,service,client_ip 
| eval service=mvjoin(mvindex(split(service,"/"),0,3),"/") 
| eval action=replace(action,"SERVICE_TICKET_","ST_") 
| sort 0 service 
| streamstats first(service) as Service first(user) as User first(_time) as stime by ticket 
| search action!="ST_CREATED" 
| eval etime=_time-stime 
| eventstats count avg(etime) as ea, max(etime) as eh, min(etime) as el by action, Service 
| where etime=eh 
| rename ticket as Last_Ticket User as Last_User 
| eval Avg_Ticket_Time=round(ea,3) 
| eval Low_Ticket_Time=round(el,3) 
| eval High_Ticket_Time=round(eh,3) 
| table count,action,Service,Avg_Ticket_Time,Low_Ticket_Time,High_Ticket_Time,Last_Ticket,Last_User 
| sort 0 - action,count 
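The key move here is eventstats plus where: eventstats writes the group-level max back onto every event, and the where clause then keeps only the event(s) whose etime equals that max. A toy run with makeresults (not part of the original search; the values are invented) shows the pattern:

```spl
| makeresults count=1 
| eval etime=split("5 12 7", " ") 
| mvexpand etime 
| eval etime=tonumber(etime) 
| eventstats max(etime) as eh 
| where etime=eh
```

Only the etime=12 event should survive. One caveat: if two events tie for the max within a group, both are kept, so you can get more than one row per action/Service pair.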

tffishe
New Member

Good thought about simply using another sort.
I had already found a working solution myself using this:

sourcetype=caslog action="SERVICE_TICKET*" 
| fields _time,action,user,ticket,service,client_ip 
| eval service=mvjoin(mvindex(split(service,"/"),0,3),"/") 
| eval Action=replace(action,"SERVICE_TICKET_","ST_") 
| sort 0 ticket _time 
| streamstats first(service) as Service by ticket 
| streamstats first(user) as User by ticket 
| streamstats first(_time) as stime by ticket 
| search action!="SERVICE_TICKET_CREATED" 
| eval etime=_time-stime 
| eval stime=substr(strftime(stime, "%m/%d:%H:%M"),1,20) 
| streamstats max(etime) as maxtime by Action,Service 
| eval maxticket=if(maxtime = etime,ticket,maxticket) 
| eval maxuser=if(maxtime = etime,User,maxuser) 
| stats count as Count, avg(etime) as ea, min(etime) as el, max(etime) as eh, last(maxticket) as Max_Ticket, last(maxuser) as Max_User by Action,Service 
| eval "Avg Ticket Time"=round(ea,3) 
| eval "Low Ticket Time"=round(el,3) 
| eval "High Ticket Time"=round(eh,3) 
| table Count,Action,Service,"Avg Ticket Time","Low Ticket Time","High Ticket Time",Max_Ticket,Max_User

This uses streamstats to find the max time, then an eval if to pin the user and ticket matching that max time. It works, but your solution was more efficient, cutting the run time nearly in half.
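To spell out why this pattern works: streamstats max produces a running maximum per group, and the eval if sets maxticket only on events that match that running max (since eval works per event, the else branch referencing maxticket evaluates to null there); stats last() then ignores the null events and picks up the value from the final time the running max was reached, which is the true maximum. A toy sketch with makeresults (invented data, not the original search):

```spl
| makeresults count=1 
| eval row=split("a,5 b,12 c,7", " ") 
| mvexpand row 
| eval ticket=mvindex(split(row,","),0), etime=tonumber(mvindex(split(row,","),1)) 
| streamstats max(etime) as maxtime 
| eval maxticket=if(maxtime=etime, ticket, null()) 
| stats last(maxticket) as Max_Ticket
```

Max_Ticket should come out as "b": the running max reaches 12 at the b event and the c event leaves maxticket null, which last() skips.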

I had also found my flaw in the search filter on action after I added the eval to shorten the action value. I had not caught my flaw in the first sort, which should have been on ticket and time. For CAS service tickets that was key, so thanks for that.

Results are looking good now (although I can't run it over too long a time span, or limits chop my results without telling me).


DalJeanis
Legend

Great! You could probably collect the data to a summary index to get your longer-term view... but you would want to add an additional _time component to ease analysis.

I don't see where you are hitting limits other than overall amount of time. There are no subsearches in evidence, and the streamstats ... by won't hit any limits because of the sort.

Changing | sort 0 ticket _time to | reverse would likely be marginally faster, but it WOULD run into streamstats limits if you had a large number of tickets.

You might do a quick time trial comparing the following lines. They are functionally equivalent, but I don't know enough about the implementation to know if one is any faster than the other.

| eval action=replace(action,"SERVICE_TICKET_","ST_") 

or

| rex mode=sed field=action "s/SERVICE_TICKET_/ST_/g" 

tffishe
New Member

The limits I referred to are (I am assuming) the same as what was affecting this discussion: https://answers.splunk.com/answers/453261/why-is-the-streamstats-command-not-returning-all-e.html. I'm seeing very similar results. If I run this search over the last seven days it is fine (just under 1 million results), but over 12 days it looks like it loses some of the events: my stats show lower counts even though more days are included. It looks like streamstats is just dropping some events without saying a word. I can build summary tables if I find I need to do this regularly, but it's concerning that a search would simply drop some events without any indication.
I will give rex a try for the replace to get the efficiency.

Thanks.
