Splunk Search

How to extract more complex statistics from my sample log file?

lctanlc
New Member

I have a log file with the following content:

2015-05-02 17:07 - :Search time taken | '16,414ms' |, Search count returned| '10,000' |
2015-05-02 18:05 - :Search time taken | '0ms' |, Search count returned| '0' |
2015-05-02 18:05 - :Search time taken | '2,555ms' |, Search count returned| '230' |
2015-05-02 23:51 - :Search time taken | '2,066ms' |, Search count returned| '3' |
2015-05-03 04:28 - :Search time taken | '982ms' |, Search count returned| '61' |
2015-05-05 07:15 - :Search time taken | '9,568ms' |, Search count returned| '5,122' |
2015-05-05 09:15 - :Search time taken | '7,344ms' |, Search count returned| '4,425' |

How can I extract the following statistics from this log file with a search:
- Average, minimum, and maximum search time taken (in seconds) for searches that returned 1 to 100 results
- Average, minimum, and maximum search time taken (in seconds) for searches that returned 101 to 1000 results
- Average, minimum, and maximum search time taken (in seconds) for searches that returned more than 1000 results
- Total number of instances that returned 1 to 100 results
- Total number of instances that returned 101 to 1000 results
- Total number of instances that returned more than 1000 results
- Total number of instances with absolute search time taken greater than 0s and at most 5s
- Total number of instances with absolute search time taken greater than 5s and at most 10s
- Total number of instances with absolute search time taken greater than 10s

1 Solution

cmeinco
Path Finder

Extract:

... | rex field=_raw "Search time taken[^\d]+(?<srch_time>[\d,]+)ms.*?Search count returned[^\d]+(?<srch_count>[\d,]+)" | convert rmcomma(srch_time) rmcomma(srch_count)
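Outside Splunk, you can sanity-check that the rex pattern matches the sample events. A minimal Python sketch (Python uses `(?P<name>...)` for named groups where Splunk's rex accepts `(?<name>...)`), also emulating `convert rmcomma()` by stripping the thousands separators:

```python
import re

# Same pattern as the rex above, in Python named-group syntax.
pattern = re.compile(
    r"Search time taken[^\d]+(?P<srch_time>[\d,]+)ms"
    r".*?Search count returned[^\d]+(?P<srch_count>[\d,]+)"
)

line = "2015-05-02 17:07 - :Search time taken | '16,414ms' |, Search count returned| '10,000' |"
m = pattern.search(line)

# Emulate `convert rmcomma(...)`: drop commas, then treat as a number.
srch_time = int(m.group("srch_time").replace(",", ""))
srch_count = int(m.group("srch_count").replace(",", ""))
print(srch_time, srch_count)  # 16414 10000
```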

Convert time to seconds:

... | eval srch_time_seconds=(srch_time/1000)

I love using case as it lets me define my buckets exactly how I want them. The previous answer has the quickest route to your solution using log10 ranges (bin span=log10 srch_time), but if you need custom bucket sizes, ranges, or labels, use case.

For your second set of buckets (<=5s, 5-10s, 10s+): you said >0, but I assume you mean >=0; the default 1=1 branch would catch any negative numbers or error cases.

... | eval srch_time_buckets=case(srch_time_seconds>10,"10+",srch_time_seconds>5,"5-10",srch_time_seconds>=0,"<5",1=1,"<0")
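The first-match-wins behavior of case() is easy to check outside Splunk. A small Python sketch of the same bucket logic (the bucket labels are the ones from the eval above):

```python
def bucket(srch_time_seconds):
    # Mirrors Splunk's case(): conditions are tested in order,
    # and the first one that matches wins.
    if srch_time_seconds > 10:
        return "10+"
    elif srch_time_seconds > 5:
        return "5-10"
    elif srch_time_seconds >= 0:
        return "<5"
    else:
        return "<0"

print([bucket(s) for s in [16.414, 0.0, 2.555, 9.568]])
```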

And for both you can use the same stats:

... | stats count AS total avg(srch_time_seconds) AS averageTime min(srch_time_seconds) AS minTime max(srch_time_seconds) AS maxTime by srch_time_buckets
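To see what that stats call computes, here is a rough Python equivalent of the full pipeline (bucket, then count/avg/min/max per bucket), run over the seven sample events from the question (times in ms, as extracted):

```python
from collections import defaultdict

# Search times (ms) from the sample log, after rmcomma.
times_ms = [16414, 0, 2555, 2066, 982, 9568, 7344]

groups = defaultdict(list)
for t in times_ms:
    s = t / 1000  # eval srch_time_seconds=(srch_time/1000)
    if s > 10:
        b = "10+"
    elif s > 5:
        b = "5-10"
    elif s >= 0:
        b = "<5"
    else:
        b = "<0"
    groups[b].append(t)

# stats count AS total avg(...) min(...) max(...) by srch_time_buckets
stats = {
    b: {"total": len(v), "averageTime": sum(v) / len(v),
        "minTime": min(v), "maxTime": max(v)}
    for b, v in groups.items()
}
print(stats)
```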

Good Luck!


gyslainlatsa
Motivator

hi lctanlc,
here is a regular expression to extract time_taken:

... | rex field=_raw "^[^'\n]*'(?P<time_taken>[^']+)"

or, to extract all values, use: | rex field=_raw max_match=0 "^[^'\n]*'(?P<time_taken>[^']+)"
and this one to extract count_search:

... | rex field=_raw "^(?:[^'\n]*'){3}(?P<count_search>[^']+)"

or, to extract all values, use: | rex field=_raw max_match=0 "^(?:[^'\n]*'){3}(?P<count_search>[^']+)"
Putting it together (and stripping the "ms" suffix and commas so the stats are numeric), you write your search like this:

... | rex field=_raw "^[^'\n]*'(?P<time_taken>[^']+)" | rex field=_raw "^(?:[^'\n]*'){3}(?P<count_search>[^']+)" | eval time_taken=replace(time_taken,"(ms|,)","") | stats avg(time_taken) min(time_taken) max(time_taken) by count_search
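A quick Python check that the two quote-anchored patterns pull the right fields out of a sample event. Note the first capture still carries the "ms" suffix and comma, so it needs cleanup before numeric stats:

```python
import re

line = "2015-05-02 17:07 - :Search time taken | '16,414ms' |, Search count returned| '10,000' |"

# First pattern: everything after the first single quote, up to the next quote.
time_taken = re.search(r"^[^'\n]*'(?P<time_taken>[^']+)", line).group("time_taken")

# Second pattern: skip three quote-delimited chunks, then capture up to the next quote.
count_search = re.search(r"^(?:[^'\n]*'){3}(?P<count_search>[^']+)", line).group("count_search")

print(time_taken, count_search)  # 16,414ms 10,000
```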


martin_mueller
SplunkTrust
SplunkTrust

Extract your fields for the search duration and search count, then run searches like this:

... | bin span=log10 search_count | stats avg(duration) min(duration) max(duration) by search_count
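Splunk's `bin span=log10` puts each count into a power-of-ten range (1-10, 10-100, 100-1000, ...), which lines up roughly with the asker's 1-100 / 101-1000 / 1000+ buckets. A hedged Python sketch of that binning on the sample counts (Splunk's actual range labels may be formatted differently):

```python
import math

def log10_bin(count):
    """Assign a count to its power-of-ten range, e.g. 230 -> '100-1000'."""
    if count <= 0:
        return "0"
    lo = 10 ** math.floor(math.log10(count))
    return f"{lo}-{lo * 10}"

print([log10_bin(c) for c in [10000, 230, 3, 61, 5122, 4425]])
```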

That example should answer your first three points; the others can be solved using the same pattern.
