I have a log that registers search queries with the following format:
_time q MAC
11/13/12 12:46:31.000 AM CA 0000000B4BE8
11/13/12 12:46:31.200 AM CARL 0000000B4BE8
11/13/12 12:46:31.250 AM CARLIE 0000000B4BE8
11/13/12 12:46:31.255 AM CARLIE B 0000000B4BE8
11/13/12 12:46:31.270 AM CARLIE BR 0000000B4BE8
11/13/12 12:46:32.100 AM CHAR 0000000B4BE8
11/13/12 12:46:32.150 AM CHARL 0000000B4BE8
11/13/12 12:46:32.155 AM CHARLI 0000000B4BE8
11/13/12 12:46:32.223 AM CHARLIE 0000000B4BE8
11/13/12 12:46:32.354 AM CHARLIE B 0000000B4BE8
11/13/12 12:46:33.400 AM CHARLIE BR 0000000B4BE8
The result set I need should be:
MAC count q
0000000B4BE8 5 CARLIE BR
0000000B4BE8 6 CHARLIE BR
The recommended query found in below answer is the following:
search construct|
| stats count by q, MAC
| eval len=length(q)
| eventstats max(len) AS longest by MAC| where len=longest
| table MAC q count
The query does not returned the distinct longest q. If this query is ran against the sample log presented before the result set is:
MAC count q
0000000B4BE8 6 CHARLIE BR
It should be:
MAC count q
0000000B4BE8 5 CARLIE BR
0000000B4BE8 6 CHARLIE BR
Any ideas? How to fix it.
Thanks,
Lp
Yes; you'll want to write a field extraction to either capture the "A" "AL" "ALF" "ALL" values as their own field, or include them as part of the "q" field, then you can | stats count by <field>
.
If you capture it as q, the command would look like | stats count by q
; if you kept the contents of q the same (right now, auto-KV extraction would extract it as "UP"), but created a new field (e.g. "otherfield") for the A- values, your command would be | stats count by q, otherfield
I clearly understand you. That is why I modified the original question to show an example.Thanks for your help.
In Splunk context, the word "distinct" is going to mean "values not exactly identical to", so from your main example CHAR and CHARL are distinct.
You seem to be asking "is this a substring of something else?" and that's a trickier question. I'll mull it over and see if I can come up with something.
I might have not laid out my question correctly. I just need the longest (length) distinct q by MAC. I modified the original question to illustrate the requirement.
Thanks for your help.
Now you're contradicting yourself; do you want
I fully tested your recommended query. I found that it fails identifying the longest distinct q. I updated the original question to show the problem.
If this helped, consider accepting it as your answer to the question....
Put the stats first, that should help overall search time.
It seems to work for a small data set but I am not sure for large data set. For large data sets it seems that is not doing its job. I have to fully test it. I will comment this question after I fully tested. For the time being it is a good progress.
Try
<your_query>
| stats count by q, MAC
| eval len=length(q)
| eventstats max(len) AS longest by MAC
| where len=longest
| table MAC q count
Search limitations? As in performance? No, I wouldn't think so. Eventstats can be a little heavy, but given what you're doing, it's necessary, so....
If you find that there are performance problems with the search, you could perform the stats count by q, MAC first, then do the eval, eventstats, where, etc, after.
Stats will reduce the overall number of rows by performing aggregation, and discarding detail (in this case, time stamps go away, but you get the count), so the next search commands have smaller data sets to work with, and are therefore a bit faster.
It is quite slow... Any way to improve the query?
I updated the question. It did the job. thanks I learned from your example. One more question. Should I expect any search limitations with this query?
If you're extracting q automatically by Splunk, you're using the automatic key=value extraction. Because of the whitespace in the q field, Splunk will see that as a separate word, and not as part of the value of "q". Therefore, yes you will need the rex command I've provided above.
I am already extracting q and MAC both are extracted automatically by Splunk. Do I need the rex command?
You asked for the longest q. The search string I've posted in the comment above (combined with the rex
statement to capture all of the "q" field) works and produces the output you've asked for in the original question.
If you're having trouble with it, post the results you get from the above search, so that we can figure out what's going wrong.
It does not find the distinct searches. It just grabs the longest q. any idea?
Out of the box, Splunk will extract q having these values: UP SON HEL HELLO. I've had to use a regex on this sample to extract the full (with spaces) q field.
rex "q\=(?[\w\s]+)(?:\s+MAC=)"
Since you wanted it broken out by MAC, I also had to add this to my eventstats command, and the final stats call.
| eval len=length(q) | eventstats max(len) AS longest by MAC | where len=longest | stats count by MAC, q | table MAC q count
I updated the question. thanks for your help.
In every instance of your content sample, MAC is "1". Can you update that with actual sample data, and a clearer description of what you're after?