Getting Data In

List difference between two csv files

kmattern
Builder

I have three CSV files. One is a list of all customers that have logged into my system in the past 24 hours. The second is the master list of all of my customers. I want to list the difference between the two CSV files or, if you prefer, I want to list the customers that have not logged in in the past 24 hours.

CSV file 1: AllCustomers.csv, a static list containing more fields than CSV file 2.

CSV file 2: InactiveCustomers.csv, a static list of all customers and reasons why they might be inactive. This file has two columns, cs_username, Reason

CSV file 3: SynchedCustomers.csv, a list of customers who have logge4din in the past 24 hours. This file has one column, cs_username.

First Search, this returns an accurate list of all active customers into SynchedCustomers.csv
:


sourcetype="iis" cs_uri_stem=*configs.xml
| lookup AllCustomers.csv cs_username
| dedup cs_username
| fields cs_username
| table cs_username
| outputlookup SynchedCustomers.csv

Using a second search all I want to do is to list the two fields in InactiveCustomers.csv if they are NOT found in SynchedCustomers.csv.

This search returns more than the inactive customers:

| inputlookup InactiveCustomers.csv 
| search NOT 
[search SynchedCustomers.csv | fields cs_username]

What am I doing wrong.

Tags (3)
1 Solution

tfletcher_splun
Splunk Employee
Splunk Employee

You want to use the set command it looks like:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username]

This returns the list of usernames. Then you want to lookup the values from your lookup to add the inactive reasons back in:

... | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason

For a full search of:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username] | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason

Then you can get fancy and start running stats like:

[above search] | stats count values(cs_username) by Reason

More on the set command:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Set

View solution in original post

tfletcher_splun
Splunk Employee
Splunk Employee

You want to use the set command it looks like:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username]

This returns the list of usernames. Then you want to lookup the values from your lookup to add the inactive reasons back in:

... | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason

For a full search of:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username] | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason

Then you can get fancy and start running stats like:

[above search] | stats count values(cs_username) by Reason

More on the set command:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Set

kmattern
Builder

Thanks, that worked. I had to run a lot of simple searches to verify it but it is the best solution yet.

0 Karma

tfletcher_splun
Splunk Employee
Splunk Employee

Sorry I forgot you need to filter out the ones in Synched that are not in Inactive, use this search:

| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username] | lookup InactiveCustomers.csv cs_username OUTPUT Reason | search Reason=* | table cs_username Reason
0 Karma

chris
Motivator

Have you tried the following for the 2nd search: | inputlookup InactiveCustomers.csv
| search NOT
[inputlookup SynchedCustomers.csv ]

0 Karma

Ayn
Legend

It returns MORE than the inactive customers? In that case by definition InactiveCustomers.csv does not in fact only contain inactive customers, as that's what the initial results are loaded from. Is that correct and expected for some reason?

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...