Splunk Search

Is there any way to compare multivalue fields to single value fields?

jamesmoriarty
Explorer

Hello Splunk Community,

I've tried to do my homework on the subject and I'm coming up short, so here I am. I'm a few months new to Splunk and I have a question regarding multivalue fields. The problem I'm working with is calculating the number of federal holidays between two dates by employee while accounting for hire-date. So for example, if I was looking at two employees, one starting in Jan, the other in Feb, if I look at holidays between Jan and Feb by employee, the individual hired in Feb shouldn't have New Years or MLK counted against them.

My current strategy is to reference a lookup table containing several years worth of federal holidays. It's a bit hack-y, as it adds two multivalue fields to each event - the holiday name and date. I've used the 'addinfo' command to get a min/max time from the time selector, and a striptime() command to evaluate the epoch time of each holiday's date, but when I use the mvfilter command to compare the epoch holiday time and the info_min_time/info_max_time I get an error saying the arguments to mvfilter are invalid. I did some digging and found out that mvfilter(X) only works when X is an expression referencing one field, not more than one.

So for instance, this line gives me an error:
| eval in_range=mvfilter(epoch_holiday>=info_min_time AND epoch_holiday <= info_max_time)

While this line does not:
| eval keep=mvfilter(epoch_holiday>=1483228800 AND epoch_holiday <= 1488326400)

So my big question - is there a way to compare a multivalue field to one or more single value field(s)? I've tried using mvexpand/mvcombine but it messes with the events in a weird way. I'm wondering if I'm asking Splunk to do something it's not quite designed to do, but any help would be greatly appreciated. Thanks!

-J

0 Karma
1 Solution

DalJeanis
Legend

Yeah, sometimes you need to learn to think sideways. Can you please step back and describe the REAL report that led you to ask the question? We have about four potential solutions, but sometimes the right one is to change the overall structure a bit. "Calculating the number of federal holidays between two dates" is a subtask... but the entire task might be easier if it was interpreted a different way.


Let me give you this method as a potential start. It might not be the optimal way overall, but it is workable and doesn't require anything unusual.

your query that gets all your employee stuff, including StartDateEpoch and EndDateEpoch and a unique identifier EmpID
| fields EmpID StartDateEpoch EndDateEpoch ... and lots of other useful fields

| appendpipe 
    [| stats count by EmpID StartDateEpoch EndDateEpoch 
     | inputlookup append=t myfedholidays.csv 
     | eventstats values(myDateEpoch)  as HolidayDateEpoch 
     | rename COMMENT as "note that more than 

     | where isnotnull(EmpID)
     | mvexpand HolidayDateEpoch
     | where HolidayDateEpoch>= StartDateEpoch AND HolidayDateEpoch<= EndDateEpoch 
     | stats count as HolidayCount by EmpID
     | eval killme="yes"
     ]
| eventstats values(HolidayCount) as HolidayCount by EmpID 
| where isnotnull(killme)

Another method would be to set up a time-based lookup table, but that might lead to having to reverse the control structure. In essence, you would be asking the question, "what employees were here for each of these holidays?"

View solution in original post

DalJeanis
Legend

Given your specifics, here's how I would proceed:

Calculate, ONCE, the number of work days from any date to any other date within a year. Put it in a lookup table and look up the hire date (and optionally, the end-of-period date if before year-end). Boom, you have your "work days" figure. Or you could do "work hours" if you have any half-day holidays.

You could also do it with "number of holidays", but if you are really trying to calculate "number of work days", then why bother getting the inverse first?


Do you see why I ask these questions?

0 Karma

jamesmoriarty
Explorer

Absolutely, those are fantastic questions to ask and your answers were very helpful. Thank you so much!

DalJeanis
Legend

Yeah, sometimes you need to learn to think sideways. Can you please step back and describe the REAL report that led you to ask the question? We have about four potential solutions, but sometimes the right one is to change the overall structure a bit. "Calculating the number of federal holidays between two dates" is a subtask... but the entire task might be easier if it was interpreted a different way.


Let me give you this method as a potential start. It might not be the optimal way overall, but it is workable and doesn't require anything unusual.

your query that gets all your employee stuff, including StartDateEpoch and EndDateEpoch and a unique identifier EmpID
| fields EmpID StartDateEpoch EndDateEpoch ... and lots of other useful fields

| appendpipe 
    [| stats count by EmpID StartDateEpoch EndDateEpoch 
     | inputlookup append=t myfedholidays.csv 
     | eventstats values(myDateEpoch)  as HolidayDateEpoch 
     | rename COMMENT as "note that more than 

     | where isnotnull(EmpID)
     | mvexpand HolidayDateEpoch
     | where HolidayDateEpoch>= StartDateEpoch AND HolidayDateEpoch<= EndDateEpoch 
     | stats count as HolidayCount by EmpID
     | eval killme="yes"
     ]
| eventstats values(HolidayCount) as HolidayCount by EmpID 
| where isnotnull(killme)

Another method would be to set up a time-based lookup table, but that might lead to having to reverse the control structure. In essence, you would be asking the question, "what employees were here for each of these holidays?"

jamesmoriarty
Explorer

For sure! The overall problem is finding employee utilization. This essentially boils down to finding the number of days possible to work in a given time period, and out of all those dates, how many hours were people working. Holidays comes into play when we have employees who were hired at different times, which is why I need to find the number of holidays by employee.

I could also be approaching all of this way wrong, that's just how my brain broke down the problem: find the time range by employee, count the number of holidays and business days in that time range, and subtract holidays from business days to get workable days.

To do all of that, however, I convert dates (in different formats - yay inconsistencies) from human readable to epoch and go from there. Again, could be way wrong, but since things are broken up by employee I update info_min_time with the hire_epoch value if hire_epoch is larger (hired after the earliest time in time range).

I appreciate your help!

0 Karma

DalJeanis
Legend

@jamesmoriarty - Did you say "update info_min_time"? That is a search-level field, and it should only rarely be used for anything other than comparison. Updating it, rather than creating a new field... just not a good idea.

Okay, I'm adding a second answer based on your specifics.

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...