Splunk Search

Identify specific text differences between two fields

JWBailey
Communicator

I am trying to compare a large text field in two different events to find some very slight differences. I want to identify the specific text that differs, not just the fact that the fields don't match. I need the specific differences for reporting and various other functions.

Specifically, these are Value fields containing Windows Security Descriptor String Format information, generated when permissions are modified in Active Directory. When a change is made, two events are generated: one with the complete old list of permissions and a second with the complete new list of permissions. The delta between these fields is the set of changes, while items that appear in both fields are the ones that were not modified. I want a report that tells me (at minimum) which items were added, removed, or modified.

For example:

Event 1:

4/10/2014 1:00:00 PM

Item = Door1

Type = Old

Value = (12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(12345,SARA)(12345,STEVE)

Event 2:

4/10/2014 1:00:00 PM

Item = Door1

Type = New

Value = (12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(00000,STEVE)(12345,SUE)

Three things happened between event 1 and 2: SARA was removed, STEVE was modified, and SUE was added.
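As an outside-Splunk illustration of the comparison being asked for, here is a minimal Python sketch. It assumes the simplified two-field `(RIGHTS,NAME)` format from the example and that each name appears at most once per Value. Real SDDL ACE strings have six semicolon-separated fields with the trustee SID last, so the parsing would need adapting. `parse_value` and `classify_changes` are hypothetical names. "Modified" here means the same name appears in both events with different rights.

```python
import re

# Simplified entry format from the example: "(RIGHTS,NAME)".
# Assumption: neither part contains parentheses, and RIGHTS has no comma.
ENTRY = re.compile(r"\(([^,()]*),([^()]*)\)")


def parse_value(value: str) -> dict[str, str]:
    """Map each name (e.g. BOB) to its rights field (e.g. 12345)."""
    return {name: rights for rights, name in ENTRY.findall(value)}


def classify_changes(old_value: str, new_value: str) -> dict[str, list[str]]:
    """Compare by name, so the order of entries does not matter."""
    old = parse_value(old_value)
    new = parse_value(new_value)
    return {
        "ADDED": sorted(new.keys() - old.keys()),
        "REMOVED": sorted(old.keys() - new.keys()),
        # Present in both events, but with different rights.
        "MODIFIED": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


old = "(12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(12345,SARA)(12345,STEVE)"
new = "(12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(00000,STEVE)(12345,SUE)"
print(classify_changes(old, new))
# {'ADDED': ['SUE'], 'REMOVED': ['SARA'], 'MODIFIED': ['STEVE']}
```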

My end goal is to generate a report about Door1 that identifies the key changes. This includes using ldapsearch to pull information from Active Directory that is not included in the event (the real data contains SIDs, which are unique identifiers in Active Directory).

For example, I am looking to generate a report to this effect (the exact formatting doesn't matter; this just shows the information I am looking for):

Door1:

4/10/2014 1:00:00 PM Employee 100 – ADDED

4/10/2014 1:00:00 PM Employee 101 – REMOVED

4/10/2014 1:00:00 PM Employee 102 – MODIFIED

I have tried doing this, but I run into issues putting all the pieces together. I can break the Value field up using makemv and then use the rare command to identify the parts that have changed, but then I lose the ability to reference other fields. How can all of these pieces be put together?

I looked at the diff command, but it seems to identify only that the fields are different, not what the differences are.

Any help is appreciated.

Thanks.


martin_mueller
SplunkTrust

You can help the diff command along a bit by splitting the values into their own lines like this:

index=* | head 1 | eval value = "(12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(12345,SARA)(12345,STEVE) (12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(00000,STEVE)(12345,SUE)" | makemv value | mvexpand value | eval value = replace(value, "\)\(", ")
(") | diff attribute=value

Note the newline character in the replace(). Running that gives you this result in typical diff fashion:

@@ -2,5 +2,5 @@
(12345,ADAM)
(12345,KATIE)
(12345,MIKE)
-(12345,SARA)
-(12345,STEVE)
+(00000,STEVE)
+(12345,SUE)
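For readers without a Splunk instance at hand, the same split-then-diff idea can be reproduced with Python's standard `difflib.unified_diff`. This is only an illustration of unified-diff behavior, not Splunk's implementation. The regex split plays the role of the `replace(value, "\)\(", ...)` step and assumes entries never contain parentheses.

```python
import difflib
import re


def split_entries(value: str) -> list[str]:
    # One "(...)" entry per list item, mirroring the newline-per-entry trick.
    return re.findall(r"\([^()]*\)", value)


old = split_entries("(12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(12345,SARA)(12345,STEVE)")
new = split_entries("(12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(00000,STEVE)(12345,SUE)")

# lineterm="" because the entries carry no trailing newline.
for line in difflib.unified_diff(old, new, lineterm=""):
    print(line)
```

After the `---`/`+++` header lines, this prints the same `@@ -2,5 +2,5 @@` hunk shown above.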

However, diff does a textual comparison, not a semantic one. If the entries are merely reordered without any actual change, it will still report many changes.
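To make the textual-versus-semantic distinction concrete, here is a small Python sketch, again outside Splunk. Simply reordering the same entries makes a line-based diff emit `+`/`-` lines, while comparing the entries as sets reports nothing. The set comparison assumes no entry is repeated within a single event.

```python
import difflib
import re

ENTRY = re.compile(r"\([^()]*\)")

old_entries = ENTRY.findall("(12345,BOB)(12345,ADAM)(12345,KATIE)")
new_entries = ENTRY.findall("(12345,KATIE)(12345,BOB)(12345,ADAM)")  # same entries, new order

# Textual diff: reordering alone produces added/removed lines.
textual = [
    line
    for line in difflib.unified_diff(old_entries, new_entries, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Set comparison: order is ignored, so nothing is reported.
removed = set(old_entries) - set(new_entries)
added = set(new_entries) - set(old_entries)

print(len(textual) > 0, removed, added)  # True set() set()
```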


JWBailey
Communicator

martin_mueller
SplunkTrust

I'm not quite sure what you're describing without an example. One thing to know about diff, though: it hides large sections with no change. In the example in my answer, BOB is hidden because diff shows only three lines of context around each change.
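The context-hiding behavior is a general property of unified diffs. The sketch below shows it with Python's `difflib.unified_diff`, whose `n` parameter sets the number of context lines (default 3). This illustrates the concept only; it is not a claim about which options Splunk's diff command exposes.

```python
import difflib

old = ["(12345,BOB)", "(12345,ADAM)", "(12345,KATIE)", "(12345,MIKE)",
       "(12345,SARA)", "(12345,STEVE)"]
new = ["(12345,BOB)", "(12345,ADAM)", "(12345,KATIE)", "(12345,MIKE)",
       "(00000,STEVE)", "(12345,SUE)"]

# Default: 3 lines of context, so the unchanged BOB entry falls outside the hunk.
default = list(difflib.unified_diff(old, new, lineterm=""))
# A context size at least as long as the input keeps every unchanged entry.
full = list(difflib.unified_diff(old, new, lineterm="", n=len(old)))

print(" (12345,BOB)" in default, " (12345,BOB)" in full)  # False True
```

Hidden context lines are unchanged entries, so they never hide an actual `+` or `-` line.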


JWBailey
Communicator

I don't believe this solution is giving me accurate results. The real data and fields I am using are much larger than my examples and can have many more differences between them. I don't think all of the differences are showing up this way.

I can check this (for the two-result case) like this: [ My Search Here | makemv delim=")(" Value | rare limit=0 Value | where count=1 ]

This gives me a table of just the differences between the fields. These are the results I would expect from the diff solution, but the two are not exactly the same.
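For comparison, the `rare limit=0 Value | where count=1` check above amounts to counting each entry across both events and keeping those seen exactly once. Below is a minimal Python sketch of that idea. It splits on whole `(...)` entries rather than on `)(`, so the first and last entries keep both parentheses. Like the search, it assumes no entry repeats within one event, and it does not record which event each surviving entry came from.

```python
import re
from collections import Counter

ENTRY = re.compile(r"\([^()]*\)")

old = "(12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(12345,SARA)(12345,STEVE)"
new = "(12345,BOB)(12345,ADAM)(12345,KATIE)(12345,MIKE)(00000,STEVE)(12345,SUE)"

# Count every entry across both events (roughly what "rare limit=0 Value" tabulates).
counts = Counter(ENTRY.findall(old)) + Counter(ENTRY.findall(new))

# Keep entries that occur in only one of the two events ("where count=1").
only_once = sorted(entry for entry, count in counts.items() if count == 1)
print(only_once)
```

Because counting ignores order, it matches the `+`/`-` lines of a textual diff only when the unchanged entries appear in the same order in both events.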

Are there other factors to diff I am not accounting for?


JWBailey
Communicator

I didn't think this was going to get me where I needed to go, but I have kept working with it and made significant progress. Then I ran into one major problem... uh-oh!

Diff seems to look at only two events and throw away all the rest. What happens if I have more than two results? Is there a way to use diff like this without losing all the other data? Ideally, diff would work iteratively, comparing pairs of events until there are none left.
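One way to picture the "keep comparing pairs" behavior is sketched below in Python, outside Splunk. It pairs each Old event with the next New event for the same Item and reports per-name changes, so any number of events (and items) can be processed. The dict field names and the sample events are hypothetical. It assumes the events arrive in chronological order with each Old before its matching New, and it uses the same simplified `(RIGHTS,NAME)` format as the question.

```python
import re

ENTRY = re.compile(r"\(([^,()]*),([^()]*)\)")


def parse(value):
    """Map name -> rights for the simplified "(RIGHTS,NAME)" format."""
    return {name: rights for rights, name in ENTRY.findall(value)}


def changes_from_pairs(events):
    """Pair each Old event with the next New event for the same item."""
    pending_old = {}  # item -> Value of its most recent unmatched Old event
    for ev in events:
        if ev["type"] == "Old":
            pending_old[ev["item"]] = ev["value"]
        elif ev["type"] == "New" and ev["item"] in pending_old:
            old = parse(pending_old.pop(ev["item"]))
            new = parse(ev["value"])
            for name in sorted(new.keys() - old.keys()):
                yield ev["time"], ev["item"], name, "ADDED"
            for name in sorted(old.keys() - new.keys()):
                yield ev["time"], ev["item"], name, "REMOVED"
            for name in sorted(old.keys() & new.keys()):
                if old[name] != new[name]:
                    yield ev["time"], ev["item"], name, "MODIFIED"


# Hypothetical events for two doors, already in chronological order.
events = [
    {"time": "4/10/2014 1:00:00 PM", "item": "Door1", "type": "Old", "value": "(12345,BOB)(12345,SARA)"},
    {"time": "4/10/2014 1:00:00 PM", "item": "Door1", "type": "New", "value": "(12345,BOB)(12345,SUE)"},
    {"time": "4/10/2014 2:00:00 PM", "item": "Door2", "type": "Old", "value": "(12345,ADAM)"},
    {"time": "4/10/2014 2:00:00 PM", "item": "Door2", "type": "New", "value": "(00000,ADAM)"},
]

for row in changes_from_pairs(events):
    print(*row)
```

Each output row (time, item, name, change) already has the shape of the report lines requested in the question.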

I asked this same question with more details here:

http://answers.splunk.com/answers/132416/multiple-diff-commands-in-same-search
