Splunk Search

How to count the number of occurrences of a string in multiple fields?

pfhendr
Explorer

I have a team of Unix admins, each of which manages multiple applications. I created a CSV lookup file that contains the name of the application, the primary admin responsible for it and one or more backup admins in case the primary is unavailable. It is very useful to keep track of who is doing what, and who to assign a new app to when a new one comes in.

I'm able to create a simple bar chart showing the number of apps assigned to each primary admin, but I'm not having any luck in getting the number of admins that are a backup to each project.

Here is an example of my CSV file:

Application,Primary,Backup1,Backup2
App1,Tom,Dick,Harry
App2,Dick,Tom,
App3,Jim,Harry,Fred
App4,Fred,Tom,
App5,Tom,Fred,Harry

What I would like to see is a list of each admin and a count of apps each one is a primary for plus a count of apps they are a backup for, something like:

Admin    Primary  Backup
Tom       2         2
Dick      1         1
Harry     0         3
Fred      1         2

It's pretty easy to get a count of apps each one is a primary for:

| inputlookup AdminAppSupport.csv| stats count as "Primary Apps" by Primary
| sort -"Primary Apps"

But all my attempts to do a count of admins that are in Backup1 or Backup2 fail.

Any pointers would be greatly appreciated.

0 Karma
1 Solution

elliotproebstel
Champion

There has to be a cleaner way to do this, but this at least gets the job done:

| inputlookup AdminAppSupport.csv 
| appendpipe 
    [| stats count AS Primary_Apps BY Primary ] 
| appendpipe 
    [| stats count AS Backup1_Apps BY Backup1] 
| appendpipe 
    [| stats count AS Backup2_Apps BY Backup2] 
| where isnull(Application) 
| eval Admin=coalesce(Primary, Backup1, Backup2) 
| stats max(*_Apps) AS * BY Admin 
| eval Backup=Backup1+Backup2 
| fields - Backup1 Backup2 
| fillnull value=0

View solution in original post

0 Karma

elliotproebstel
Champion

There has to be a cleaner way to do this, but this at least gets the job done:

| inputlookup AdminAppSupport.csv 
| appendpipe 
    [| stats count AS Primary_Apps BY Primary ] 
| appendpipe 
    [| stats count AS Backup1_Apps BY Backup1] 
| appendpipe 
    [| stats count AS Backup2_Apps BY Backup2] 
| where isnull(Application) 
| eval Admin=coalesce(Primary, Backup1, Backup2) 
| stats max(*_Apps) AS * BY Admin 
| eval Backup=Backup1+Backup2 
| fields - Backup1 Backup2 
| fillnull value=0
0 Karma

pfhendr
Explorer

Thanks! That gives me exactly what I'm looking for. Now to digest how it's doing it. 🙂

0 Karma

elliotproebstel
Champion

The best way to peek under the hood would be to start by running this chunk:

| inputlookup AdminAppSupport.csv 
| appendpipe 
    [| stats count AS Primary_Apps BY Primary ] 
| appendpipe 
    [| stats count AS Backup1_Apps BY Backup1] 
| appendpipe 
    [| stats count AS Backup2_Apps BY Backup2] 

The three appendpipe commands are generating the stats per category. The table is all a mess, but the data is there in the rows that don't contain a value for Application. That leads us into the next line of code, where we remove all the lines with values for Application:

| where isnull(Application) 

Now we need a list of all unique Admin names. Given that each row is guaranteed to only contain one Admin's name (regardless of whether they were Primary, Backup1, or Backup2), we use coalesce to grab whichever value is not null:

| eval Admin=coalesce(Primary, Backup1, Backup2)

Next, we need to make this table more readable by pushing it together so that each Admin has a single row, containing the max value from each category (P, B1, B2). We could just as easily have chosen the min value, since there will only be one value - but stats requires some kind of function, and I chose this one.

| stats max(*_Apps) AS * BY Admin 

But you wanted to combine the Backup1 and Backup2 counts, so we need to add them together into a single Backup field and remove the B1 and B2 fields:

| eval Backup=Backup1+Backup2 
| fields - Backup1 Backup2 

Finally, we add in a fillnull so that the table is nicely populated and not full of blank spots.

| fillnull value=0

I hope this helps your digestion. 🙂

0 Karma

pfhendr
Explorer

When I used this on my actual csv file I found that some of the counts for the Backup field wasn't what I expected. I eventually moved the fillnull value=0 line to just above the stats max() line, now it's working as expected. I'm guessing the stats command couldn't handle the null values.

0 Karma

elliotproebstel
Champion

Ahh, good catch!

0 Karma

pfhendr
Explorer

Yep, thanks, that makes it much clearer!

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...