Splunk Search

How to count the number of occurrences of a string in multiple fields?

pfhendr
Explorer

I have a team of Unix admins, each of which manages multiple applications. I created a CSV lookup file that contains the name of the application, the primary admin responsible for it and one or more backup admins in case the primary is unavailable. It is very useful to keep track of who is doing what, and who to assign a new app to when a new one comes in.

I'm able to create a simple bar chart showing the number of apps assigned to each primary admin, but I'm not having any luck in getting the number of admins that are a backup to each project.

Here is an example of my CSV file:

Application,Primary,Backup1,Backup2
App1,Tom,Dick,Harry
App2,Dick,Tom,
App3,Jim,Harry,Fred
App4,Fred,Tom,
App5,Tom,Fred,Harry

What I would like to see is a list of each admin and a count of apps each one is a primary for plus a count of apps they are a backup for, something like:

Admin    Primary  Backup
Tom       2         2
Dick      1         1
Harry     0         3
Fred      1         2

It's pretty easy to get a count of apps each one is a primary for:

| inputlookup AdminAppSupport.csv| stats count as "Primary Apps" by Primary
| sort -"Primary Apps"

But all my attempts to do a count of admins that are in Backup1 or Backup2 fail.

Any pointers would be greatly appreciated.

0 Karma
1 Solution

elliotproebstel
Champion

There has to be a cleaner way to do this, but this at least gets the job done:

| inputlookup AdminAppSupport.csv 
| appendpipe 
    [| stats count AS Primary_Apps BY Primary ] 
| appendpipe 
    [| stats count AS Backup1_Apps BY Backup1] 
| appendpipe 
    [| stats count AS Backup2_Apps BY Backup2] 
| where isnull(Application) 
| eval Admin=coalesce(Primary, Backup1, Backup2) 
| stats max(*_Apps) AS * BY Admin 
| eval Backup=Backup1+Backup2 
| fields - Backup1 Backup2 
| fillnull value=0

View solution in original post

0 Karma

elliotproebstel
Champion

There has to be a cleaner way to do this, but this at least gets the job done:

| inputlookup AdminAppSupport.csv 
| appendpipe 
    [| stats count AS Primary_Apps BY Primary ] 
| appendpipe 
    [| stats count AS Backup1_Apps BY Backup1] 
| appendpipe 
    [| stats count AS Backup2_Apps BY Backup2] 
| where isnull(Application) 
| eval Admin=coalesce(Primary, Backup1, Backup2) 
| stats max(*_Apps) AS * BY Admin 
| eval Backup=Backup1+Backup2 
| fields - Backup1 Backup2 
| fillnull value=0
0 Karma

pfhendr
Explorer

Thanks! That gives me exactly what I'm looking for. Now to digest how it's doing it. 🙂

0 Karma

elliotproebstel
Champion

The best way to peek under the hood would be to start by running this chunk:

| inputlookup AdminAppSupport.csv 
| appendpipe 
    [| stats count AS Primary_Apps BY Primary ] 
| appendpipe 
    [| stats count AS Backup1_Apps BY Backup1] 
| appendpipe 
    [| stats count AS Backup2_Apps BY Backup2] 

The three appendpipe commands are generating the stats per category. The table is all a mess, but the data is there in the rows that don't contain a value for Application. That leads us into the next line of code, where we remove all the lines with values for Application:

| where isnull(Application) 

Now we need a list of all unique Admin names. Given that each row is guaranteed to only contain one Admin's name (regardless of whether they were Primary, Backup1, or Backup2), we use coalesce to grab whichever value is not null:

| eval Admin=coalesce(Primary, Backup1, Backup2)

Next, we need to make this table more readable by pushing it together so that each Admin has a single row, containing the max value from each category (P, B1, B2). We could just as easily have chosen the min value, since there will only be one value - but stats requires some kind of function, and I chose this one.

| stats max(*_Apps) AS * BY Admin 

But you wanted to combine the Backup1 and Backup2 counts, so we need to add them together into a single Backup field and remove the B1 and B2 fields:

| eval Backup=Backup1+Backup2 
| fields - Backup1 Backup2 

Finally, we add in a fillnull so that the table is nicely populated and not full of blank spots.

| fillnull value=0

I hope this helps your digestion. 🙂

0 Karma

pfhendr
Explorer

When I used this on my actual csv file I found that some of the counts for the Backup field wasn't what I expected. I eventually moved the fillnull value=0 line to just above the stats max() line, now it's working as expected. I'm guessing the stats command couldn't handle the null values.

0 Karma

elliotproebstel
Champion

Ahh, good catch!

0 Karma

pfhendr
Explorer

Yep, thanks, that makes it much clearer!

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...