Splunk Search

How to extract more complex statistics from my sample log file?

lctanlc
New Member

I have a log file with the following content:

2015-05-02 17:07 - :Search time taken | '16,414ms' |, Search count returned| '10,000' |
2015-05-02 18:05 - :Search time taken | '0ms' |, Search count returned| '0' |
2015-05-02 18:05 - :Search time taken | '2,555ms' |, Search count returned| '230' |
2015-05-02 23:51 - :Search time taken | '2,066ms' |, Search count returned| '3' |
2015-05-03 04:28 - :Search time taken | '982ms' |, Search count returned| '61' |
2015-05-05 07:15 - :Search time taken | '9,568ms' |, Search count returned| '5,122' |
2015-05-05 09:15 - :Search time taken | '7,344ms' |, Search count returned| '4,425' |

How can I extract the following statistics from this log file through the search:
- Average, Minimum and Maximum search time taken (in seconds) to return 1 to 100 of search count
- Average, Minimum and Maximum search time taken (in seconds) to return 101 to 1000 of search count
- Average, Minimum and Maximum search time taken (in seconds) to return more than 1000 of search count
- Total number of instances that returned 1 to 100 of search count
- Total number of instances that returned 101 to 1000 of search count
- Total number of instances that returned more than 1000 of search count
- Total number of instances with Absolute search time taken more than 0s but less than or equal to 5s
- Total number of instances with Absolute search time taken more than 5s but less than or equal to 10s
- Total number of instances with Absolute search time taken more than 10s

1 Solution

cmeinco
Path Finder

Extract:

... | rex field=_raw "Search time taken[^\d]+(?<srch_time>[\d,]+)ms.*?Search count returned[^\d]+(?<srch_count>[\d,]+)" | convert rmcomma(srch_time) rmcomma(srch_count)
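To sanity-check the rex pattern outside Splunk, here is a minimal Python sketch of the same extraction against one of the sample log lines (note Python requires the `(?P<name>)` form for named groups, whereas Splunk's rex also accepts `(?<name>)`):

```python
import re

# Python equivalent of the rex pattern above, for testing outside Splunk.
pattern = re.compile(
    r"Search time taken[^\d]+(?P<srch_time>[\d,]+)ms"
    r".*?Search count returned[^\d]+(?P<srch_count>[\d,]+)"
)

line = "2015-05-02 17:07 - :Search time taken | '16,414ms' |, Search count returned| '10,000' |"
m = pattern.search(line)

# Mimic convert rmcomma(): strip the thousands separators and cast to int.
srch_time = int(m.group("srch_time").replace(",", ""))
srch_count = int(m.group("srch_count").replace(",", ""))
print(srch_time, srch_count)  # 16414 10000
```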

Convert time to seconds:

... | eval srch_time_seconds=(srch_time/1000)

I love using case, as it allows me to define my buckets exactly as I want them. The previous answer gave the quickest route to your solution using log10 ranges (bin span=log10 srch_time), but if you need custom bucket sizes, ranges, or labels, use case.

For your second example (<=5, 5-10, 10+): you said >0, but I assume you mean >=0; the default 1=1 catches any negative numbers or error cases.

... | eval srch_time_buckets=case(srch_time_seconds>10,"10+",srch_time_seconds>5,"5-10",srch_time_seconds>=0,"<=5",1=1,"<0")
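The same first-match-wins logic that case() applies can be sketched in Python; as with case, the order of the conditions matters:

```python
def srch_time_bucket(seconds):
    # First-match-wins, like Splunk's case(): order the conditions
    # from the highest threshold down.
    if seconds > 10:
        return "10+"
    if seconds > 5:
        return "5-10"
    if seconds >= 0:
        return "<=5"
    return "<0"  # the 1=1 fallback, catching negatives/error cases

print(srch_time_bucket(16.414))  # 10+
print(srch_time_bucket(7.344))   # 5-10
print(srch_time_bucket(0.982))   # <=5
```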

And for both you can use the same stats:

... | stats count AS total avg(srch_time_seconds) AS averageTime min(srch_time_seconds) AS minTime max(srch_time_seconds) AS maxTime by srch_time_buckets
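As a rough sketch of what that stats aggregation produces, here is the count/avg/min/max per bucket computed in Python over the seconds values from the sample log:

```python
# srch_time_seconds values from the seven sample log lines above.
times = [16.414, 0.0, 2.555, 2.066, 0.982, 9.568, 7.344]

def bucket(s):
    # Same first-match-wins bucketing as the case() expression.
    return "10+" if s > 10 else "5-10" if s > 5 else "<=5" if s >= 0 else "<0"

stats = {}
for t in times:
    stats.setdefault(bucket(t), []).append(t)

# count, avg, min, max per bucket -- what stats ... by srch_time_buckets returns.
for b, vals in sorted(stats.items()):
    print(b, len(vals), round(sum(vals) / len(vals), 3), min(vals), max(vals))
```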

Good Luck!


gyslainlatsa
Motivator

hi lctanlc,
this is a regular expression to extract time_taken:

... | rex field=_raw "^[^'\n]*'(?P<time_taken>[^']+)"

or, to extract all values, use: | rex field=_raw max_match=0 "^[^'\n]*'(?P<time_taken>[^']+)"
and this one to extract search_count:

... | rex field=_raw "^(?:[^'\n]*'){3}(?P<count_search>[^']+)"

or, to extract all values, use: | rex field=_raw max_match=0 "^(?:[^'\n]*'){3}(?P<count_search>[^']+)"
Putting it all together, you write your search like this:

... | rex field=_raw "^[^'\n]*'(?P<time_taken>[^']+)" | rex field=_raw "^(?:[^'\n]*'){3}(?P<count_search>[^']+)" | stats avg(time_taken) min(time_taken) max(time_taken) by count_search
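The two quote-anchored patterns can be sanity-checked outside Splunk with a short Python sketch; the {3} repetition skips past the first three single quotes so the final group lands on the count value:

```python
import re

# Test the quote-anchored patterns against one of the sample log lines.
line = "2015-05-02 17:07 - :Search time taken | '16,414ms' |, Search count returned| '10,000' |"

# First pattern: everything after the first single quote, up to the next quote.
time_taken = re.search(r"^[^'\n]*'(?P<time_taken>[^']+)", line).group("time_taken")

# Second pattern: skip past three quotes, then capture up to the next quote.
count_search = re.search(r"^(?:[^'\n]*'){3}(?P<count_search>[^']+)", line).group("count_search")

print(time_taken, count_search)  # 16,414ms 10,000
```

Note that with this approach time_taken keeps its "ms" suffix and both fields keep their commas, so you would still need to strip those (e.g. with convert rmcomma and an eval) before doing arithmetic.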


martin_mueller
SplunkTrust
SplunkTrust

Extract your fields for the search duration and search count, then run searches like this:

... | bin span=log10 search_count | stats avg(duration) min(duration) max(duration) by search_count
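As an approximation of what bin span=log10 does here, this Python sketch groups the sample events into powers-of-ten search_count ranges and aggregates the durations per range (the "0" label for zero counts is an assumption for the sketch, since log10 is undefined there):

```python
import math
from collections import defaultdict

# (duration_seconds, search_count) pairs from the seven sample log lines.
events = [(16.414, 10000), (0.0, 0), (2.555, 230), (2.066, 3),
          (0.982, 61), (9.568, 5122), (7.344, 4425)]

bins = defaultdict(list)
for duration, count in events:
    if count <= 0:
        label = "0"  # log10 undefined for zero; bucketed separately here
    else:
        lo = 10 ** math.floor(math.log10(count))  # lower power-of-ten bound
        label = f"{lo}-{lo * 10}"
    bins[label].append(duration)

# avg/min/max duration by count range -- the shape of the stats output.
for label, ds in bins.items():
    print(label, len(ds), round(sum(ds) / len(ds), 3), min(ds), max(ds))
```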

That example should answer your first three points; the others can be solved using the same pattern.
