Getting Data In

How to verify the data held in two apps is the same?

clack
Engager

Hi,

I have a situation where i need to verify that the data held in two apps is the same.
To perform this verification i am logging out the data for each account into two log files: appOneLog and appTwoLog.
These logs contain the field accountId and some other data relating to the account.

I'm trying to find all the accounIds that are present in appOne and not appTwo and vice versa.
I have made various attempts at this which, i think, should work according to the splunk documentation.

First of all, I tried to use the transaction command to pair up events from appOneLog and appTwoLog into transactions and match against any transaction that had less than 2 events i.e.:

sourcetype="app*" | sort 0 str(accountId) | transaction accountId maxspan=10m maxpause=10m maxevents=2 | search eventcount<2

but when checking the results of this query I found it to contain some events for accounts that were actually present in both sets of logs.

I then tried to take a different approach by labelling each result from the appOneLog source with a present in appOne flag, creating a table and then performing a join on a search over the appTwoLog source that labelled each event with a present in appTwo flag. After that I would match on any event that did not have both flags set i.e.:

sourcetype="appOneLog" | eval presentInAppOne="Y" | table accountId presentInAppOne | join accountId type=outer [search sourcetype="appTwoLog" | eval presentInAppTwo= "Y" | table accountId presentInAppTwo ] | where (NOT (presentInAppOne="Y" AND presentInAppTwo="Y"))

however, yet again, this returns some false positive results where the account actually is in both but is getting labelled as just being in appOne.

Can you think of any reason why these false positive results are being returned?
Or any alternative way of retrieving this information?

An important thing to note here is that the number of events i am searching over is very large i.e. it can get up to just over 2,000,000 events.
However, i have tried the above queries on a smaller subset of the results and still get the same problems.

Thanks in advance.

1 Solution

somesoni2
SplunkTrust
SplunkTrust

Try these

Get list of accountId which are NOT present in both the sourcetypes (present in either)

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1

Get list of accountId present in appOneLog but not in appTwoLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appOneLog"

Get list of accountId present in appTwoLog but not in appOneLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appTwoLog"

View solution in original post

somesoni2
SplunkTrust
SplunkTrust

Try these

Get list of accountId which are NOT present in both the sourcetypes (present in either)

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1

Get list of accountId present in appOneLog but not in appTwoLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appOneLog"

Get list of accountId present in appTwoLog but not in appOneLog

sourcetype="appOneLog" OR sourcetype="appTwoLog" | stats values(sourcetype) as st by accountId | where mvcount(st)=1 and mvindex(st,0)="appTwoLog"

clack
Engager

Thanks, that is exactly what i was looking for and is also a lot more efficient then my previous attempts. I think they may have been hitting some memory constraints.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...