Splunk Search

Can Splunk compare two strings and return a % likeness/similarity between the two?

Moogz
Splunk Employee

For example, if I have a username of bsmith843 in a field returned by one search, and bsmiths845 in a field from another search, is there any way to gauge the similarity between the two strings? I know I can use wildcards/regex to try to match the strings, but if I can't match everyone, I would like to know how similar they are.

wrangler2x
Motivator

And from even further in the future...

There is an app on Splunkbase which supports Levenshtein distance, Damerau-Levenshtein distance, Jaro distance, Jaro-Winkler, match rating comparison, and Hamming distance comparisons, plus a number of phonetic algorithms, including Soundex. It is called JellyFisher. Here is a sample Levenshtein distance evaluation using this app:

... | jellyfisher levenshtein_distance(sourcetype,source)

What would be returned here is an integer, according to this description of Levenshtein distance.
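
For anyone who wants to see what that integer means, here is a minimal pure-Python sketch of the classic dynamic-programming Levenshtein distance. It is only an illustration and not JellyFisher's implementation. The function names levenshtein and similarity_percent are made up for this example, and scaling the distance to a percentage (1 minus distance over the longer length) is one common convention I'm assuming, not something the app returns.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of a seen so far
    # (before the current character) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity_percent(a, b):
    """Hypothetical helper: scale the distance to a 0-100 score (assumed convention)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return round(100.0 * (1 - levenshtein(a, b) / longest), 2)


print(levenshtein("kitten", "smitten"))
print(levenshtein("bsmith843", "bsmiths845"))
print(similarity_percent("bsmith843", "bsmiths845"))
```

Both pairs come out at a distance of 2 (one insertion plus one substitution), so a lower number means the strings are closer.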

Each of the JellyFisher functions returns the result in a field named after the function (e.g., levenshtein_distance, damerau_levenshtein_distance, soundex).

Here is a link to the JellyFisher app.

Here is a mocked-up use of it:

| makeresults
| eval foo="kitten", bar="smitten" 
| jellyfisher levenshtein_distance(foo, bar) 
| table foo bar levenshtein_distance 

Lowell
Super Champion

There is a Python function that does something very close to this. It returns a number between 0 and 1 based on the similarity of two strings. You can find it in the difflib module (SequenceMatcher.ratio()).
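
As a quick illustration outside of Splunk, here is a small sketch that runs difflib on the two usernames from the question. The wrapper name similarity_ratio is just something I made up for the example; the only real API used is difflib.SequenceMatcher and its ratio() method.

```python
import difflib


def similarity_ratio(a, b):
    """Return difflib's similarity score for two strings, between 0 and 1."""
    # The first argument (None) means no characters are treated as junk.
    return difflib.SequenceMatcher(None, a, b).ratio()


ratio = similarity_ratio("bsmith843", "bsmiths845")
print(round(100 * ratio, 2))
```

ratio() is computed as 2*M/T, where M is the number of matching characters and T is the combined length of both strings. For these two usernames that works out to 16/19, or roughly 84%.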


Here is a really quick example of an app named "fieldcompare" which contains a single Python search command. The app is made up of the following files:

$SPLUNK_HOME/etc/apps/fieldcompare/bin/fieldcompare.py:

import sys
import difflib

import splunk.Intersplunk

# Detect the getinfo probe and strip its marker from the argument list
(isgetinfo, sys.argv) = splunk.Intersplunk.isGetInfo(sys.argv)
args, kwargs = splunk.Intersplunk.getKeywordsAndOptions()

if isgetinfo:
    # streaming, generating, retevs, reqsop, preop
    splunk.Intersplunk.outputInfo(True, False, False, False, None)


(results, dummyresults, settings) = splunk.Intersplunk.getOrganizedResults()

# Field names come from the key=value options given to the command
field1_name = kwargs.get("field1", "field1")
field2_name = kwargs.get("field2", "field2")
output_field = kwargs.get("result", "ratio")


try:
    for result in results:
        try:
            f1 = result[field1_name]
            f2 = result[field2_name]
        except KeyError:
            # If either field is missing, simply ignore
            continue

        # ratio() is a similarity score between 0 and 1
        sm = difflib.SequenceMatcher(None, f1, f2)
        result[output_field] = sm.ratio()

    splunk.Intersplunk.outputResults(results)

except Exception as e:
    splunk.Intersplunk.generateErrorResults("Unhandled exception:  %s" % (e,))

$SPLUNK_HOME/etc/apps/fieldcompare/default/commands.conf:

[fieldcompare]
filename = fieldcompare.py
supports_getinfo = true

$SPLUNK_HOME/etc/apps/fieldcompare/metadata/default.meta:

[commands/fieldcompare]
access = read : [ * ], write : [ admin ]
export = system

[scripts/fieldcompare.py]
access = read : [ * ], write : [ admin ]
export = system


In the example shown above, the search command and app are called "fieldcompare", but you can use any name you want.

Here is a usage example:

... | fieldcompare field1=first_field field2=compare_field result=output | eval percent=round(100*output,2) | sort - percent
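
To sanity-check the logic outside Splunk, here is a rough stand-alone sketch that mimics what the command plus the eval and sort do, using plain dicts in place of search results. The sample rows and the function name compare_fields are invented for illustration and are not part of the app.

```python
import difflib


def compare_fields(rows, field1, field2, result="ratio"):
    """Add a difflib similarity ratio to each row that has both fields."""
    for row in rows:
        if field1 not in row or field2 not in row:
            continue  # like the command: skip rows missing either field
        matcher = difflib.SequenceMatcher(None, row[field1], row[field2])
        row[result] = matcher.ratio()
    return rows


# Made-up sample events
rows = [
    {"first_field": "bsmith843", "compare_field": "bsmiths845"},
    {"first_field": "kitten", "compare_field": "kitten"},
    {"first_field": "orphan"},  # no compare_field, so no output is added
]

compare_fields(rows, "first_field", "compare_field", result="output")

# Rough equivalent of: eval percent=round(100*output,2) | sort - percent
for row in rows:
    if "output" in row:
        row["percent"] = round(100 * row["output"], 2)

# Rows without a percent are placed last (an arbitrary choice for this sketch)
ranked = sorted(rows, key=lambda r: r.get("percent", -1), reverse=True)

for row in ranked:
    print(row.get("first_field"), row.get("compare_field"), row.get("percent"))
```

Identical values score 100, and rows that lack one of the two fields pass through without a score.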

Be sure to look over the Custom search commands docs page for additional details about how to set this up within your Splunk environment.

muralianup
Communicator

I used this script, but it's throwing the error "Error in 'script': Getinfo probe failed for external search command 'fieldcompare'". Any suggestions?

Stephen_Sorkin
Splunk Employee

Yes, this can be done using a custom search script and one of the many Python modules that can compare strings. You can take a look at http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison which discusses using the Levenshtein distance as a measure. With more detail about your use case, I could suggest how to structure a search and custom command, but this should be enough to start with.

dwaddle
SplunkTrust

I bring to you a message from the future! Nimsh wrote a Levenshtein custom command at some point: https://splunkbase.splunk.com/app/1898/
