Comparing data from two files and showing the results.

malag_doval
Engager

I have two files, each containing a simple list of filenames. I'd like to compare one file to the other and remove anything from the first file that also appears in the second. For example:
File 1
foo.exe
bar.exe
car.exe
dar.exe

File 2
car.exe
dar.exe
smar.exe

My desired output would be:
foo.exe
bar.exe

I am trying to use File 2 as a sort of whitelist and filter File 1 through it. Is this even possible? Any help would be greatly appreciated, thanks!

martin_mueller
SplunkTrust

I'm assuming each line in a file is one event in Splunk.

source="file1*" OR source="file2*" | eventstats count by _raw | where count = 1 AND match(source,"file1")
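
For anyone who wants to see the logic of that search outside Splunk, here is a minimal Python sketch of what it does to the sample data from the question. It is not SPL; the `events` list, the `file1.txt`/`file2.txt` source names and the `filter_through_whitelist` helper are all hypothetical stand-ins for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the indexed events: one (source, _raw) pair per line.
events = [
    ("file1.txt", "foo.exe"),
    ("file1.txt", "bar.exe"),
    ("file1.txt", "car.exe"),
    ("file1.txt", "dar.exe"),
    ("file2.txt", "car.exe"),
    ("file2.txt", "dar.exe"),
    ("file2.txt", "smar.exe"),
]

def filter_through_whitelist(events, source_pattern="file1"):
    # Like "eventstats count by _raw": count how many events share each _raw value.
    counts = Counter(raw for _, raw in events)
    # Like "where count = 1 AND match(source, "file1")":
    # keep lines that were seen exactly once and that came from file 1.
    return [raw for source, raw in events
            if counts[raw] == 1 and re.search(source_pattern, source)]

remaining = filter_through_whitelist(events)
print(remaining)  # ['foo.exe', 'bar.exe']
```

One consequence of counting this way: a name listed twice within File 1 would also get a count of 2 and be dropped, so the approach assumes each file lists a name at most once.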

malag_doval
Engager

OK, it turns out you were spot on with that query. I recreated the data sets and imported them again, and it worked first shot. It must have had something to do with the method of importing the data. All I know is that I'm happy I can continue on now, so thank you very much Martin 😄

martin_mueller
SplunkTrust

So you get results after the eventstats but nothing after the where?

There are only two things that can go wrong, and they're unrelated to the version. Either the count filters out everything, in which case the pre-where results have no events with count=1, or the match filters out everything, in which case the matching against the source file name doesn't work.

malag_doval
Engager

Ahh yes, I did have the double backslashes in, but this morning when I was setting it up again I neglected to put them in, sorry.

I also tried moving the WHERE clause into the search statement, since Splunk was very insistent about combining the two, but it didn't make any difference.

One other thing I just realised: I am running an old version. I will update immediately and try it again, as this might be an issue as well. :<

martin_mueller
SplunkTrust

Note, the second argument to match() is treated as a regular expression. In a regular expression, "\s" matches a whitespace character and "\d" matches a digit, so the backslashes in your path are not matched literally. You will have to escape each backslash like so: "\\"

The lazy would just write match(source, "difffile1") 🙂
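
To illustrate why the unescaped path fails as a pattern, here is a small Python sketch. It assumes Python's `re` module treats `\s` and `\d` the same way Splunk's regex engine does, which holds for these two classes. It only models the regular expression layer; any extra escaping that Splunk's own string quoting may require is not covered here. The variable names are made up for the example.

```python
import re

# The source value from the thread, as a raw string so backslashes stay literal.
source = r"c:\splunk data\difffile1.csv"

# Unescaped: "\s" is read as "whitespace" and "\d" as "digit",
# so the pattern never matches the literal backslashes in the path.
unescaped = r"c:\splunk data\difffile1.csv"

# Escaped: "\\" matches one literal backslash, "\." one literal dot.
escaped = r"c:\\splunk data\\difffile1\.csv"

# The "lazy" variant: match only on a fragment with no special characters.
lazy = "difffile1"

unescaped_hit = re.search(unescaped, source) is not None
escaped_hit = re.search(escaped, source) is not None
lazy_hit = re.search(lazy, source) is not None

print(unescaped_hit, escaped_hit, lazy_hit)  # False True True
```

The lazy pattern is the least fragile of the three, since it contains nothing that needs escaping.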

malag_doval
Engager

The query does indeed produce something when I strip off the WHERE command. My query string is thus:

source="c:\splunk data\difffile1.csv*" OR source="c:\splunk data\difffile2.csv*" | eventstats count by _raw | where count = 1 AND match (source, "c:\splunk data\difffile1.csv"

Thanks again for working this through with me, I very much appreciate it!

martin_mueller
SplunkTrust

Does the query without the where command produce outputs?

What's your modified query, and your source field contents?

malag_doval
Engager

Thanks for your help. I've replaced "fileX" with my file names and run the query, but it doesn't produce any output. PS: each line in the source files is its own event, you're correct.
