Splunk Search

How to extract more complex statistics from my sample log file?

lctanlc
New Member

I have a log file with the following content:

2015-05-02 17:07 - :Search time taken | '16,414ms' |, Search count returned| '10,000' |
2015-05-02 18:05 - :Search time taken | '0ms' |, Search count returned| '0' |
2015-05-02 18:05 - :Search time taken | '2,555ms' |, Search count returned| '230' |
2015-05-02 23:51 - :Search time taken | '2,066ms' |, Search count returned| '3' |
2015-05-03 04:28 - :Search time taken | '982ms' |, Search count returned| '61' |
2015-05-05 07:15 - :Search time taken | '9,568ms' |, Search count returned| '5,122' |
2015-05-05 09:15 - :Search time taken | '7,344ms' |, Search count returned| '4,425' |

How can I extract the following statistics from this log file through the search:
- Average, Minimum and Maximum search time taken (in seconds) to return 1 to 100 of search count
- Average, Minimum and Maximum search time taken (in seconds) to return 101 to 1000 of search count
- Average, Minimum and Maximum search time taken (in seconds) to return more than 1000 of search count
- Total number of instances that returned 1 to 100 of search count
- Total number of instances that returned 101 to 1000 of search count
- Total number of instances that returned more than 1000 of search count
- Total number of instances with Absolute search time taken more than 0s but less than or equal to 5s
- Total number of instances with Absolute search time taken more than 5s but less than or equal to 10s
- Total number of instances with Absolute search time taken more than 10s

1 Solution

cmeinco
Path Finder

Extract:

... | rex field=_raw "Search time taken[^\d]+(?<srch_time>[\d,]+)ms.*?Search count returned[^\d]+(?<srch_count>[\d,]+)" | convert rmcomma(srch_time) rmcomma(srch_count)
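To sanity-check the rex pattern outside Splunk, here is a minimal Python sketch of the same extraction against one of the sample log lines (note Python requires the `(?P<name>)` form for named groups, whereas Splunk's rex also accepts `(?<name>)`):

```python
import re

# Python equivalent of the rex pattern above, for testing outside Splunk.
pattern = re.compile(
    r"Search time taken[^\d]+(?P<srch_time>[\d,]+)ms"
    r".*?Search count returned[^\d]+(?P<srch_count>[\d,]+)"
)

line = "2015-05-02 17:07 - :Search time taken | '16,414ms' |, Search count returned| '10,000' |"
m = pattern.search(line)

# Mimic convert rmcomma(): strip the thousands separators and cast to int.
srch_time = int(m.group("srch_time").replace(",", ""))
srch_count = int(m.group("srch_count").replace(",", ""))
print(srch_time, srch_count)  # 16414 10000
```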

Convert time to seconds:

... | eval srch_time_seconds=(srch_time/1000)

I love using case, as it allows me to define my buckets exactly as I want them. The previous answer gave the quickest route to your solution using log10 ranges (bin span=log10 srch_time), but if you need custom bucket sizes, ranges, or labels, use case.

For your second example (<=5, 5-10, 10+): you said >0, but I assume you mean >=0; the default 1=1 catches any negative numbers or error cases.

... | eval srch_time_buckets=case(srch_time_seconds>10,"10+",srch_time_seconds>5,"5-10",srch_time_seconds>=0,"<=5",1=1,"<0")
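The same first-match-wins logic that case() applies can be sketched in Python; as with case, the order of the conditions matters:

```python
def srch_time_bucket(seconds):
    # First-match-wins, like Splunk's case(): order the conditions
    # from the highest threshold down.
    if seconds > 10:
        return "10+"
    if seconds > 5:
        return "5-10"
    if seconds >= 0:
        return "<=5"
    return "<0"  # the 1=1 fallback, catching negatives/error cases

print(srch_time_bucket(16.414))  # 10+
print(srch_time_bucket(7.344))   # 5-10
print(srch_time_bucket(0.982))   # <=5
```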

And for both you can use the same stats:

... | stats count AS total avg(srch_time_seconds) AS averageTime min(srch_time_seconds) AS minTime max(srch_time_seconds) AS maxTime by srch_time_buckets
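As a rough sketch of what that stats aggregation produces, here is the count/avg/min/max per bucket computed in Python over the seconds values from the sample log:

```python
# srch_time_seconds values from the seven sample log lines above.
times = [16.414, 0.0, 2.555, 2.066, 0.982, 9.568, 7.344]

def bucket(s):
    # Same first-match-wins bucketing as the case() expression.
    return "10+" if s > 10 else "5-10" if s > 5 else "<=5" if s >= 0 else "<0"

stats = {}
for t in times:
    stats.setdefault(bucket(t), []).append(t)

# count, avg, min, max per bucket -- what stats ... by srch_time_buckets returns.
for b, vals in sorted(stats.items()):
    print(b, len(vals), round(sum(vals) / len(vals), 3), min(vals), max(vals))
```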

Good Luck!


gyslainlatsa
Motivator

hi lctanlc,
this is a regular expression to extract time_taken:

... | rex field=_raw "^[^'\n]*'(?P<time_taken>[^']+)"

or, to extract all values, use: | rex field=_raw max_match=0 "^[^'\n]*'(?P<time_taken>[^']+)"
and this one to extract search_count:

... | rex field=_raw "^(?:[^'\n]*'){3}(?P<count_search>[^']+)"

or, to extract all values, use: | rex field=_raw max_match=0 "^(?:[^'\n]*'){3}(?P<count_search>[^']+)"
Putting it all together, you write your search like this:

... | rex field=_raw "^[^'\n]*'(?P<time_taken>[^']+)" | rex field=_raw "^(?:[^'\n]*'){3}(?P<count_search>[^']+)" | stats avg(time_taken) min(time_taken) max(time_taken) by count_search
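The two quote-anchored patterns can be sanity-checked outside Splunk with a short Python sketch; the {3} repetition skips past the first three single quotes so the final group lands on the count value:

```python
import re

# Test the quote-anchored patterns against one of the sample log lines.
line = "2015-05-02 17:07 - :Search time taken | '16,414ms' |, Search count returned| '10,000' |"

# First pattern: everything after the first single quote, up to the next quote.
time_taken = re.search(r"^[^'\n]*'(?P<time_taken>[^']+)", line).group("time_taken")

# Second pattern: skip past three quotes, then capture up to the next quote.
count_search = re.search(r"^(?:[^'\n]*'){3}(?P<count_search>[^']+)", line).group("count_search")

print(time_taken, count_search)  # 16,414ms 10,000
```

Note that with this approach time_taken keeps its "ms" suffix and both fields keep their commas, so you would still need to strip those (e.g. with convert rmcomma and an eval) before doing arithmetic.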


martin_mueller
SplunkTrust
SplunkTrust

Extract your fields for the search duration and search count, then run searches like this:

... | bin span=log10 search_count | stats avg(duration) min(duration) max(duration) by search_count
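As an approximation of what bin span=log10 does here, this Python sketch groups the sample events into powers-of-ten search_count ranges and aggregates the durations per range (the "0" label for zero counts is an assumption for the sketch, since log10 is undefined there):

```python
import math
from collections import defaultdict

# (duration_seconds, search_count) pairs from the seven sample log lines.
events = [(16.414, 10000), (0.0, 0), (2.555, 230), (2.066, 3),
          (0.982, 61), (9.568, 5122), (7.344, 4425)]

bins = defaultdict(list)
for duration, count in events:
    if count <= 0:
        label = "0"  # log10 undefined for zero; bucketed separately here
    else:
        lo = 10 ** math.floor(math.log10(count))  # lower power-of-ten bound
        label = f"{lo}-{lo * 10}"
    bins[label].append(duration)

# avg/min/max duration by count range -- the shape of the stats output.
for label, ds in bins.items():
    print(label, len(ds), round(sum(ds) / len(ds), 3), min(ds), max(ds))
```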

That example should answer your first three points; the others can be solved using the same pattern.
