Splunk Search

Working with dynamic values in rex and multivalue fields

moystard
New Member

Hello all,

I have been banging my head on a problem for the past 24 hours and I am in great need of your help.

I am processing data coming from surveys; I have a tabular data set that contains three multi value fields:
- one multi value field that contains the questions that were shown to the user (example: q1, q2, q3)
- one multi value field that contains the questions that were answered by the user (example: q2, q3)
- one multi value field that contains the duration it took for the respondent to answer those questions in the format: :

All this data is dynamic (including question references), and I cannot make any assumption on the fields content and names.

What I am interested in is reconciliating how long it took for a question that was answered to be answered. Ultimately, I want to check that the duration for every answered question is above a certain static threshold (example: 2000ms).

I am at the stage where, using rex, and mvfilter, I am able to generate the data below from raw events, and know if a question has been answered, and if it has a duration. Unfortunately, I have not been able to extract the duration and compare it with the threshold because rex and match in mvfilter do not support dynamic values. My idea was indeed to mvexpand the shown questions, and then extract for each shown question if it has been answered, and in how long. So far my search looks like this;


| rex field=_raw max_match=0 "question_answer_(?[a-zA-Z_]*)\""
| rex field=_raw max_match=0 "question_duration_(?[a-zA-Z_]*\":\d*),"
| replace "*\"*" with "**" in question_durations
| table shown_questions, answered_questions, question_durations, *
| mvexpand shown_questions
| eval is_answered=if(match(answered_questions, shown_questions), "true", "false")
| eval has_duration=if(match(question_durations, shown_questions), "true", "false")

Here I am a bit lost as it's fairly unclear to me how I can work with dynamic values to extract the question duration either from the multivalue field question_durations or from the individual fields question_duration<question-code>. I thought originally to use mvfilter(match(questions_durations, shown_questions)) to extract the line of the question but it does not work due to the dynamic shown_questions parameter. Putting a static value in there works, but unfortunately that's not an option for me.

Ultimately, my objective is to be able to check if the duration of all questions that have been answered is above a certain threshold.

Would you have an idea on how I could achieve this?

Example of data (before the mvexpand):

Example of Data

0 Karma

to4kawa
Ultra Champion
| makeresults
| eval _raw="is_answered,has_duration,shown_questions,answered_questions,questions_durations
false,false,q1#q2#q4,q1#q2#q4,q1:3478#q2:3046#q4:12691
false,false,,,q1:1890
false,false,q1#q2#q4,q1#q2#q4,q1:2812#q2:3907#q4:4620
false,true,q1,,q1:0
false,false,,,q1:1551
false,false,q1#q3#q4,q1,q1:3488#q3:3399#q4:1791
false,false,q1#q3#q4,q1#q4,q1:8731#q3:1618#q4:8802
false,false,q1#q3#q4,q1#q4,q1:26155#q3:20770#q4:14013"
| multikv forceheader=1
| table is_answered,has_duration,shown_questions,answered_questions,questions_durations
| foreach *
    [eval <<FIELD>>=split(<<FIELD>>,"#")]
`comment("Above is before mvexpand")`
`comment("From here, the logic")`
| streamstats count as session 
| eval counter=if(isnull(shown_questions),0,mvrange(0,mvcount(shown_questions))) 
| stats list(*) as * by session counter 
| rename session as _session ,counter as _counter 
| foreach * 
    [ eval <<FIELD>> = mvindex(<<FIELD>>,_counter)]
| table is_answered,has_duration,shown_questions,answered_questions,questions_durations

Hi, folks.
How about it?
Is mvexpand necessary?

0 Karma

DavidHourani
Super Champion

Hi @moystard,

"Ultimately, my objective is to be able to check if the duration of all questions that have been answered is above a certain threshold." Do you mean per user ? Or the sum total of the durations ?

This could be easier than what you're trying to achieve as a question with duration means an answered question, isn't that right ?

0 Karma

moystard
New Member

Unfortunately not, a question with duration can be skipped and therefore can be excluded.

I found a solution to my issue using a foreach loop. I start with the duration and check if the question has been answered instead of the other way around.

0 Karma

DavidHourani
Super Champion

Nice that way you have the duration from the start and all you have to do is confirm "is answered" or not.

You can also drop all the columns in the table above and keep : answered questions and question duration columns. From there you can MV expand both fields and check for a match between the answered questions and the first part of the duration. If there is then you keep the line. That should work pretty fast as well avoiding foreach loops.

0 Karma

jpolvino
Builder

Could you please post a couple actual events, sanitized, so we can see the format? It would be helpful to then illustrate what kind of analysis you want to do, using a couple of those events as data. This will help connect the dots.

0 Karma

moystard
New Member

Apologies, I should have added an example from the start. All done now.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...