Splunk Search

Trying to identify duplicate logs

thepocketwade
Path Finder

I've found some logs in our Splunk environment that seem to be duplicates (they differ only by their srcip field, which means one is coming directly from a client while the other comes from a syslog server). So far, the only way I've found to determine whether the entries are actually duplicates is to export the results into different files based on srcip, then remove the srcip field and diff the resulting files. I'd really like to do this comparison in Splunk itself, but I haven't been able to so far. Does anyone have any ideas about how to do this?

EDIT: Here's an example of what I'm dealing with (redacting some stuff, of course).

Aug 19 09:34:36 A.B.C.D srcip=A.B.C.D fac=authpriv pri=notice sudo:      USER : TTY=pts/8 ; PWD=/var/log ; USER=root ; COMMAND=/bin/grep ssh messages
Aug 19 09:34:36 A.B.C.D srcip=W.X.Y.Z fac=authpriv pri=notice sudo:      USER : TTY=pts/8 ; PWD=/var/log ; USER=root ; COMMAND=/bin/grep ssh messages

These are clearly the same event, but the log is coming to Splunk from both A.B.C.D (the client) and W.X.Y.Z (a syslog server).

I initially hypothesized that everything of facility authpriv was being duplicated, but that doesn't seem to be the case; at least, I haven't been able to verify it.

So, again, what I'm looking for is a way to find events like this. "diff" won't work because they differ slightly, but I need to find all of our duplicates so I can take steps to cut out the second instance of the log.

1 Solution

gkanapathy
Splunk Employee

I see. Then this might do it:

... | rex "^(?<text1>.*?srcip=)(?<srcip>\S+)(?<text2>.*)" | eval text=text1.text2 | stats count(srcip) as c values(srcip) by text | where c>1
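
For anyone who wants to sanity-check the same logic outside Splunk (for example, on events exported to a file, as described in the question), here is a minimal Python sketch of what the search above does. It is only an illustration under assumptions: the function name `find_duplicates` and the third sample line are hypothetical, and the regex is the one from the rex command rewritten in Python's named-group syntax.

```python
import re
from collections import defaultdict

# Same pattern as the rex command, using Python's (?P<name>...) group syntax.
PATTERN = re.compile(r"^(?P<text1>.*?srcip=)(?P<srcip>\S+)(?P<text2>.*)")


def find_duplicates(lines):
    """Group events by their text with the srcip value removed.

    Returns a dict mapping each stripped text that occurs more than once
    to the sorted unique srcip values seen with it (roughly what
    count(srcip) ... values(srcip) by text | where c>1 reports).
    """
    groups = defaultdict(list)
    for line in lines:
        m = PATTERN.match(line.rstrip("\n"))
        if m is None:
            continue  # no srcip field on this line, so nothing to compare
        text = m.group("text1") + m.group("text2")
        groups[text].append(m.group("srcip"))
    return {t: sorted(set(ips)) for t, ips in groups.items() if len(ips) > 1}


EVENT = ("fac=authpriv pri=notice sudo:      USER : TTY=pts/8 ; PWD=/var/log ; "
         "USER=root ; COMMAND=/bin/grep ssh messages")
sample = [
    "Aug 19 09:34:36 A.B.C.D srcip=A.B.C.D " + EVENT,
    "Aug 19 09:34:36 A.B.C.D srcip=W.X.Y.Z " + EVENT,
    # Hypothetical non-duplicated event, added only for contrast.
    "Aug 19 09:35:02 A.B.C.D srcip=A.B.C.D fac=authpriv pri=notice example unique event",
]

duplicates = find_duplicates(sample)
for text, srcips in duplicates.items():
    print(srcips, text)
```

Note that, like count(srcip) in the search, this flags any stripped text that occurs more than once, even if every copy came from the same srcip; comparing the number of distinct srcip values instead would be a stricter test.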


Lowell
Super Champion

The transaction approach can work, but don't use maxpause=1s; use maxspan=1s instead. The difference is that maxpause limits the time between consecutive events, whereas maxspan=1s means that the total duration of the transaction cannot exceed 1 second.
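
To make the difference concrete, here is a small Python model of the two time constraints. It is a deliberately simplified sketch, not how Splunk's transaction command is implemented (the real command also groups by fields and honors other options); the function names and the event times are hypothetical.

```python
def group_by_maxpause(times, maxpause):
    """Start a new group only when the gap to the previous event exceeds maxpause."""
    groups = []
    for t in sorted(times):
        if groups and t - groups[-1][-1] <= maxpause:
            groups[-1].append(t)  # close enough to the previous event: chain on
        else:
            groups.append([t])
    return groups


def group_by_maxspan(times, maxspan):
    """Start a new group when the event would stretch the group beyond maxspan."""
    groups = []
    for t in sorted(times):
        if groups and t - groups[-1][0] <= maxspan:
            groups[-1].append(t)  # still within maxspan of the group's first event
        else:
            groups.append([t])
    return groups


# Hypothetical event times in seconds: one event per second for ten seconds.
times = list(range(10))

by_pause = group_by_maxpause(times, 1)
by_span = group_by_maxspan(times, 1)

print(len(by_pause), "group(s) with maxpause=1")
print(len(by_span), "group(s) with maxspan=1")
```

On a steady stream of events, the maxpause rule keeps chaining each event onto the previous one, so everything collapses into one large group, while the maxspan rule caps each group at one second from its first event.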

thepocketwade
Path Finder

I've determined that duplicate lines exist, and I'm trying to determine how many duplicates I have, or to find any information about them that could help reduce them. Also, I'm certain they are duplicates because the timestamps don't differ at all and they log the same activity on the same machine (for example, two logs of a user su'ing to root).

gkanapathy
Splunk Employee

I suppose I also don't understand whether the individual events have timestamps that differ by a second. I should also note that log lines are inherently very similar, often differing only by a field or two. So I ask: are there other fields in your data (some GUID or session ID, for example) that indicate the events are the same? If so, it seems more productive to focus on the identifying field values than on the differing ones.

gkanapathy
Splunk Employee

I don't understand your question. Are you trying to find duplicate lines (and it sounds to me like you've already determined that there are duplicate lines) or are you trying to group together sets of lines and then see if the entire set is the same as another set?

thepocketwade
Path Finder

I tried piping my search to transaction with a maxpause of 1s, since the duplicates seem to come in at the same time. But that led to enormous transactions that didn't really help.
