Solved: List difference between two csv files

kmattern · ‎08-21-2012

I have three CSV files. One is a list of all customers that have logged into my system in the past 24 hours. The second is the master list of all of my customers. I want to list the difference between the two CSV files or, if you prefer, I want to list the customers that have not logged in in the past 24 hours.

CSV file 1: AllCustomers.csv, a static list containing more fields than CSV file 2.

CSV file 2: InactiveCustomers.csv, a static list of all customers and reasons why they might be inactive. This file has two columns, cs_username, Reason

CSV file 3: SynchedCustomers.csv, a list of customers who have logge4din in the past 24 hours. This file has one column, cs_username.

First Search, this returns an accurate list of all active customers into SynchedCustomers.csv
:



sourcetype="iis" cs_uri_stem=*configs.xml 

| lookup AllCustomers.csv cs_username 

| dedup cs_username 

| fields cs_username 

| table cs_username

| outputlookup SynchedCustomers.csv

Using a second search all I want to do is to list the two fields in InactiveCustomers.csv if they are NOT found in SynchedCustomers.csv.

This search returns more than the inactive customers:

| inputlookup InactiveCustomers.csv 
| search NOT 
[search SynchedCustomers.csv | fields cs_username]

What am I doing wrong.

tfletcher_splun · ‎08-22-2012

You want to use the set command it looks like:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username]

This returns the list of usernames. Then you want to lookup the values from your lookup to add the inactive reasons back in:

... | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason

For a full search of:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username] | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason

Then you can get fancy and start running stats like:

[above search] | stats count values(cs_username) by Reason

More on the set command:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Set

View solution in original post

tfletcher_splun · ‎08-22-2012

You want to use the set command it looks like:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username]

This returns the list of usernames. Then you want to lookup the values from your lookup to add the inactive reasons back in:

... | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason

For a full search of:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username] | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason

Then you can get fancy and start running stats like:

[above search] | stats count values(cs_username) by Reason

More on the set command:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Set

kmattern · ‎08-23-2012

Thanks, that worked. I had to run a lot of simple searches to verify it but it is the best solution yet.

tfletcher_splun · ‎08-22-2012

Sorry I forgot you need to filter out the ones in Synched that are not in Inactive, use this search:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username] | lookup InactiveCustomers.csv cs_username OUTPUT Reason | search Reason=* | table cs_username Reason

chris · ‎08-21-2012

Have you tried the following for the 2nd search: | inputlookup InactiveCustomers.csv
| search NOT
[inputlookup SynchedCustomers.csv ]

Ayn · ‎08-21-2012

It returns MORE than the inactive customers? In that case by definition InactiveCustomers.csv does not in fact only contain inactive customers, as that's what the initial results are loaded from. Is that correct and expected for some reason?

List difference between two csv files

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!

Splunk Custom Visualizations App End of Life