How to get max hits for a field?

sp1711 · ‎06-04-2015

I have a search with a regex that extracts two fields, let's say user and client.

The URL is something like:

{base_url}/user/{user_1}/hello.

The user field can hold hundreds of values: user_1, user_2, user_3, ...

I want to know how many times each user is hit per day for each client (there are 4 clients), and I only want the users with the most hits each day (the top 5).

So: for every day, for every client, the top 5 users, with their counts of course.
How do I do that?

I tried this:

My_search | bucket span=1d _time | stats count by _time client user | head 5

This gives me messed-up output. Any ideas?

MuS · ‎06-04-2015

Hi sp1711,

The obvious search is something like:

My_search | timechart values(client) AS client count by user limit=5

but this shows the top 5 globally, not the top 5 per day.
The problem with "per day" is that every day could have 5 completely different top users, so for a month you may need 150 series.

If you really want to calculate per day, it's something more like:

My_search
| bin span=1d _time
| stats count by _time client user
| sort - _time count
| dedup 5 _time

This will give you, per day, the top 5 client, user, count groups.
Add this to graph / chart it:

| timechart span=1d values(client) AS client sum(count) by user limit=1000

Hope this helps ...

cheers, MuS

sp1711 · ‎06-04-2015

Hi MuS,

That really got me close to what I want. I tried your second search:

My_search
| bin span=1d _time
| stats count by _time client user
| sort - _time count
| dedup 5 _time

This gives me the top 5 users for every day, along with the client each one belongs to. It doesn't give me the top 5 users for every client. How do I tweak this to get the expected result?

MuS · ‎06-04-2015

Just change the stats to something like | stats count by client user _time so it matches your needs. The first field after the by clause is the sorting one.

sp1711 · ‎06-04-2015

Yes, I did try that before posting the comment. It only gives me the top 5 users for each day. It gives me:

| client | user | count |
| A | 1 | 100 |
| A | 2 | 90 |
| A | 3 | 80 |
| A | 4 | 70 |
| A | 5 | 50 |

It doesn't give the stats for the other clients B, C and D.

sp1711 · ‎06-04-2015

OK, instead of dedup 5 _time I used dedup 5 client, and this does the job. But I'm getting data only for today, even if I select a date range of a month in the search. That's weird!
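
A possible reason for seeing only the latest day, offered as a sketch rather than a confirmed diagnosis: dedup 5 client keeps the first 5 rows per client across the whole time range, and because the rows are sorted newest-first, those 5 all come from the most recent day. Deduplicating on both _time and client should keep 5 rows per day per client instead. My_search is the placeholder used earlier in this thread, and the explicit sort signs below are an assumption about the intended order (newest day first, then client, then highest count first).

```spl
My_search
| bin span=1d _time
| stats count by _time client user
| sort 0 -_time client -count
| dedup 5 _time client
```

With 4 clients, this should return up to 20 rows per day.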

MuS · ‎06-04-2015

Use the job inspector to verify what happens with the time range in the base search.

sp1711 · ‎06-04-2015

So I checked that.
The component command.dedup has an input of 10,000 and an output of 10.

Which makes sense, because whatever date range I choose I only get 2 days' worth of results (top 5 each), which makes 10. Is this an issue with a limit?
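
One thing worth checking here, as a hypothesis and not a confirmed cause: the sort command returns at most 10,000 results when no count is given, which matches the 10,000 rows arriving at dedup. Since the rows are sorted newest-first, anything older than the most recent days would be cut off before dedup runs. Passing 0 as the count removes that cap. The sketch below applies this to the per-day search from earlier in the thread; My_search is still a placeholder.

```spl
My_search
| bin span=1d _time
| stats count by _time client user
| sort 0 -_time -count
| dedup 5 _time
```

If the input to command.dedup in the job inspector then goes above 10,000, the sort cap was the limiting factor.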

MuS · ‎06-04-2015

What is the exact search command you're using now?

sp1711 · ‎06-04-2015

This is the search:

index="abc" tag=def sourcetype=access_combined "hello"
| fields correlation_id
| join correlation_id [search index="abc" tag=something sourcetype=access_combined "whatsup"]
| rex "(?i)/users/(?P[^/]+)"
| rex field=req_host "^(?[^.]*)"

sp1711 · ‎06-04-2015

The formatting is screwed up!

One of the regexes has user in it and the other has client.

It eats up some parts when I try to format it.
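
For readers following along, the two rex commands probably looked something like the sketch below. The capture-group names user and client are an assumption based on the description above (the forum stripped everything inside the angle brackets), and which regex carried which name is a guess from the URL and the req_host field.

```spl
| rex "(?i)/users/(?P<user>[^/]+)"
| rex field=req_host "^(?<client>[^.]*)"
```

The first extracts the path segment after /users/ into user; the second takes everything in req_host up to the first dot as client.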

MuS · ‎06-04-2015

Ohhh, you're using a subsearch... I'm no friend of them at all 😉 because you hit limits with them and they are not really fast. This is not related to this question, but look at this answer http://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-joi... and try to adapt your search to a single stats search.

sp1711 · ‎06-04-2015

Thanks for the direction. 🙂
