I have three CSV files. One is a list of all customers that have logged into my system in the past 24 hours. The second is the master list of all of my customers. I want to list the difference between the two CSV files or, if you prefer, I want to list the customers that have not logged in in the past 24 hours.
CSV file 1: AllCustomers.csv, a static list containing more fields than CSV file 2.
CSV file 2: InactiveCustomers.csv, a static list of all customers and reasons why they might be inactive. This file has two columns, cs_username, Reason
CSV file 3: SynchedCustomers.csv, a list of customers who have logge4din in the past 24 hours. This file has one column, cs_username.
First Search, this returns an accurate list of all active customers into SynchedCustomers.csv
:
sourcetype="iis" cs_uri_stem=*configs.xml
| lookup AllCustomers.csv cs_username
| dedup cs_username
| fields cs_username
| table cs_username
| outputlookup SynchedCustomers.csv
Using a second search all I want to do is to list the two fields in InactiveCustomers.csv if they are NOT found in SynchedCustomers.csv.
This search returns more than the inactive customers:
| inputlookup InactiveCustomers.csv | search NOT [search SynchedCustomers.csv | fields cs_username]
What am I doing wrong.
You want to use the set command it looks like:
| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username]
This returns the list of usernames. Then you want to lookup the values from your lookup to add the inactive reasons back in:
... | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason
For a full search of:
| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username] | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason
Then you can get fancy and start running stats like:
[above search] | stats count values(cs_username) by Reason
More on the set command:
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Set
You want to use the set command it looks like:
| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username]
This returns the list of usernames. Then you want to lookup the values from your lookup to add the inactive reasons back in:
... | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason
For a full search of:
| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username] | lookup InactiveCustomers.csv cs_username OUTPUT Reason | table cs_username Reason
Then you can get fancy and start running stats like:
[above search] | stats count values(cs_username) by Reason
More on the set command:
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Set
Thanks, that worked. I had to run a lot of simple searches to verify it but it is the best solution yet.
Sorry I forgot you need to filter out the ones in Synched that are not in Inactive, use this search:
| set diff [inputlookup InactiveCustomers.csv | fields cs_username] [inputlookup SynchedCustomers.csv | fields cs_username] | lookup InactiveCustomers.csv cs_username OUTPUT Reason | search Reason=* | table cs_username Reason
Have you tried the following for the 2nd search: | inputlookup InactiveCustomers.csv
| search NOT
[inputlookup SynchedCustomers.csv ]
It returns MORE than the inactive customers? In that case by definition InactiveCustomers.csv does not in fact only contain inactive customers, as that's what the initial results are loaded from. Is that correct and expected for some reason?