Splunk Search

Combining similar log entries and counting as one

robert2138
Engager

I have multiple users making a request to a web server each time they type a character into a search box. User 1 is typing 'please' and user 22 is typing 'cat'. Simplified log entries:

ip=10.10.0.1&q=p
ip=10.10.1.22&q=c
ip=10.10.0.1&q=pl
ip=10.10.0.1&q=ple
ip=10.10.1.22&q=ca
ip=10.10.0.1&q=plea
ip=10.10.1.22&q=cat
ip=10.10.0.1&q=pleas
ip=10.10.0.1&q=please

I would like to:

  • Count the number of requests that are all part of the same 'typing' action. So for 'please' it would be 6 and for 'cat' it would be 3. It can be assumed that the user types a different word each time.

  • Count the number of distinct typing actions - so 1 for both users.

Any ideas?

Thanks
Robert


Gilberto_Castil
Splunk Employee

This is an interesting question. Assuming that the data is broken down in such a way that each occurrence is one event, then consider the following:

You can easily group the queries by ip, which gives you a deterministic list of the queries sent by each user. The problem is that listing them this way does not tell you which queries belong to the same typing action. For instance, let's add “car” to your data. The search below then lumps everything from one IP together: the queries share a source IP, but the queries themselves are not necessarily related.

sourcetype="answers-1372961586" | stats list(q) AS q by ip


Clearly, “cat” and “car” do not belong together in this type of analysis.



Perhaps you might want to calculate the likeness of a match based on the characters each query shares with the others. That gives you an empirical match. Think of it as laying two identical arrays along the axes of a two-dimensional matrix and checking each value against every cell of the other. For instance:

sourcetype="answers-1372961586" 
| stats list(q) AS q 
| eval match=q 
| mvexpand match 
| eval match=match."%" 
| mvexpand q 
| where like(q,match) 
| eval match=rtrim(match,"%") 
| stats list(match) AS match count by q 
| sort - count
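
To check the logic outside Splunk, the prefix self-match in the search above can be sketched in Python. This is only an illustration, not SPL: the function name prefix_matches and the hard-coded sample list are assumptions, with the values taken from the log lines in the question.

```python
from collections import defaultdict

# Sample q values from the question, in log order (assumed already extracted).
QUERIES = ["p", "c", "pl", "ple", "ca", "plea", "cat", "pleas", "please"]


def prefix_matches(queries):
    """For each query, list every logged query that is a prefix of it.

    Mirrors the effect of `where like(q, match)` after "%" is appended
    to match: a (q, match) pair survives when q starts with match.
    """
    matches = defaultdict(list)
    for q in queries:
        for candidate in queries:
            if q.startswith(candidate):
                matches[q].append(candidate)
    return dict(matches)


if __name__ == "__main__":
    result = prefix_matches(QUERIES)
    # Highest match count first, like `sort - count`.
    for q, matched in sorted(result.items(), key=lambda kv: -len(kv[1])):
        print(q, len(matched), matched)
```

One difference to keep in mind: like() treats % and _ inside the pattern as wildcards, while startswith compares the characters literally.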





In addition, you can observe a probabilistic measurement of a query as the user types. For instance, if you isolate the query “please”, you can see how close each partial query comes to the final word.

sourcetype="answers-1372961586" 
| stats list(q) AS q 
| eval match=q 
| mvexpand match 
| eval match=match."%" 
| mvexpand q 
| where like(q,match) 
| eval match=rtrim(match,"%") 
| eval q_length=len(q) 
| eval match_length=len(match) 
| eval likeness=(round(match_length/q_length,2) * 100)." %"
| stats list(match) AS match list(likeness) AS likeness by q 
| search q="please"

Thus, once the end user has typed the first four characters (p, l, e, a), the likeness score against the word “please” is about 67% (4 of 6 characters).
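
The arithmetic behind the likeness column can be sketched in Python as a sanity check. The function name likeness_percent is hypothetical, and the sketch rounds straight to a whole percent instead of rounding the ratio to two decimals and multiplying by 100 as the eval does; for the word “please” both routes give the same numbers.

```python
def likeness_percent(typed, final_query):
    """Share of final_query's characters already typed, as a whole percent."""
    if not final_query:
        raise ValueError("final_query must not be empty")
    if not final_query.startswith(typed):
        raise ValueError(f"{typed!r} is not a prefix of {final_query!r}")
    return round(100 * len(typed) / len(final_query))


if __name__ == "__main__":
    word = "please"
    # Walk through the partial queries a user sends while typing the word.
    for end in range(1, len(word) + 1):
        prefix = word[:end]
        print(f"{prefix:<6} {likeness_percent(prefix, word)} %")
```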


Of course, this is based on this minimal data set and a cursory evaluation of character matches. I am sure there are more intelligent ways to predict this in real life.


gkanapathy
Splunk Employee

You can try starting with ... | stats values(q) as vq by ip and then looking at | eval c=mvcount(vq). Though if you need to deal with multiple lines, you'll probably need the transaction command.
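
As a rough illustration of the two counts asked for in the question, here is a Python sketch that splits each ip's requests into typing actions. It is not SPL and not a replacement for the transaction command. It assumes the events arrive in time order and that a request continues an action only when its q starts with the previous q from the same ip. The names EVENTS and typing_actions are hypothetical.

```python
from collections import defaultdict

# (ip, q) pairs from the question, in log order (assumed already parsed).
EVENTS = [
    ("10.10.0.1", "p"), ("10.10.1.22", "c"), ("10.10.0.1", "pl"),
    ("10.10.0.1", "ple"), ("10.10.1.22", "ca"), ("10.10.0.1", "plea"),
    ("10.10.1.22", "cat"), ("10.10.0.1", "pleas"), ("10.10.0.1", "please"),
]


def typing_actions(events):
    """Split each ip's requests into typing actions.

    A request continues the current action when its q starts with the
    previous q from the same ip; otherwise it opens a new action.
    Returns {ip: [(final_q, request_count), ...]}.
    """
    actions = defaultdict(list)  # ip -> list of [last_q, count]
    for ip, q in events:
        current = actions[ip][-1] if actions[ip] else None
        if current is not None and q.startswith(current[0]):
            current[0] = q
            current[1] += 1
        else:
            actions[ip].append([q, 1])
    return {ip: [tuple(a) for a in acts] for ip, acts in actions.items()}


if __name__ == "__main__":
    for ip, acts in typing_actions(EVENTS).items():
        # Number of distinct typing actions, then requests per action.
        print(ip, len(acts), acts)
```

The length of each list is the number of distinct typing actions for that ip, and each tuple carries the request count for one action. Backspacing or pasted text would break the prefix assumption and start a new action.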
