Hi
I have 2 CSV file in lookups directory :
The first CSV is called "host.csv" and has a field called "host" which corresponds to the hostname
The second CSV is called "DCM.csv" and has afield called "Computer Name" which corresponds to the hostname and another field called "Flag"
I need to count host by Flag for hosts which exists in the first CSV and in the second CSV
Could you help me please?
Just to clarify, I understood this as "There can be hosts which are in only one of the two lookups. I want to count all that appear in the first file by Flag". If you need something else, please clarify.
First you'll have to get the lookup file as your search results. This is done with the inputlookup
command. Since it is a generating command, you need to put a pipe before it:
| inputlookup host.csv
Run this and you'll see all hosts from your first file. Next, you need to lookup each line of this data in your second file using the correct fields:
| lookup DCM.csv "Computer Name" as host
With this command, you have the new field Flag
in your data for hosts that were in both lookups and which have a value for Flag
in the second (you'll still have results that are only in the first lookup and also those that are in the first and in the seconds but don't have a value for the field Flag
in the second, either of these will not have a value in Flag
however). Now we can count:
| stats count by host Flag
This will give you a count per host and Flag combination, but hosts without a value in Flag
will be thrown away. If you want them, you'll need to provide a value before stats
, for example with fillnull
:
| fillnull value="no Flag value set" Flag
Final search might look like this:
| inputlookup host.csv
| lookup DCM.csv "Computer Name" as host
| fillnull value="no Flag value set" Flag
| stats count by host Flag
Like this:
|inputcsv host.csv | eval which="host"
| appendpipe [|inputcsv DCM.csv | rename "Computer Name" AS host | eval which="DCM" ]
| stats values(*) AS * dc(which) AS whichCount BY host
| where whichCount=2
| stats count BY Flag
thanks woodcock your code is perfect and works fine but I have to accept jeffland answer which has answered first
have a nice day!!
Thanks. Generally, if you get more than one answer I would recommend choosing the answers which gives a better explanation or links to relevant documentation.
Why use | appendpipe
? Why | inputcsv
? Also, why use a subsearch with and not | inputlookup append=t
?
The user said that he has CSV files
, which I took very literally. I agree that inputlookup
would be better than inputcsv
. This kind of a subsearch is unlimited and I have used it for a decade since I invented it to bypass the subsearch-limit, long before append=t
existed. Maybe that works just as well but I have no reason to switch now.
I would argue that having used something for a decade is not a good reason to continue doing it if better options become available. However, I hadn't realized that inputcsv
also has an append=t
argument, so it hardly makes a difference in this case. Since the question doesn't say where the csv files are kept (in etc/apps/<appname>/lookups
or /var/run/splunk/csv
), either should work depending on lookup location.
Just to clarify, I understood this as "There can be hosts which are in only one of the two lookups. I want to count all that appear in the first file by Flag". If you need something else, please clarify.
First you'll have to get the lookup file as your search results. This is done with the inputlookup
command. Since it is a generating command, you need to put a pipe before it:
| inputlookup host.csv
Run this and you'll see all hosts from your first file. Next, you need to lookup each line of this data in your second file using the correct fields:
| lookup DCM.csv "Computer Name" as host
With this command, you have the new field Flag
in your data for hosts that were in both lookups and which have a value for Flag
in the second (you'll still have results that are only in the first lookup and also those that are in the first and in the seconds but don't have a value for the field Flag
in the second, either of these will not have a value in Flag
however). Now we can count:
| stats count by host Flag
This will give you a count per host and Flag combination, but hosts without a value in Flag
will be thrown away. If you want them, you'll need to provide a value before stats
, for example with fillnull
:
| fillnull value="no Flag value set" Flag
Final search might look like this:
| inputlookup host.csv
| lookup DCM.csv "Computer Name" as host
| fillnull value="no Flag value set" Flag
| stats count by host Flag
perfect jeffland thanks for your clear answer!
What have you tried?