Splunk Search

Search query using spath and mvexpand on multi-value nested JSON doesn't scale

sharad06
Explorer

Hi Splunk Experts,

I am sending events to Splunk Enterprise in the following nested JSON format:

{
     compliance:     Compliance Unknown,    
     ctupdate:   hostinfo,  
     host_properties:   [
            {
             name:   _times,    
              since:     1508907165,    
             value:  last_onsite    
            },  
            {
              name:  compliance_state,  
               since:    1508268020,    
               value:    N/A    
            },  
            {
              name:  engine_seen_packet,    
               since:    1508907165,    
               value:    Yes    
            },  
            {
               name:     guest_corporate_state, 
               since:    1508268020,    
               value:    N/A    
            },  
            {
             name:   linux_operating_system,    
             since:  1508907165,    
             value:  2.0.2  
            },  
            {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue1
            },  
                {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue2
            },
                {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue3
            },
            {
             name:   ssh_open_port, 
             since:  1508959259,    
             value:  0  
            },
            {
             name:   va_netfunc,    
             since:  1508959247,    
             value:  Linux Desktop/Server   
            }   
    ]   
     ip:     192.168.1.17,  
     tenant_id:  acd1034578ef
}

I want to create a dashboard that would go over all such events and display a pie-chart with online values seen so far. To achieve this, I've been using the following query:

`ct_hostinfo` `get_sourcetypes`
| spath output=prop_name path=host_properties{}.name
| spath output=prop_val path=host_properties{}.value
| eval prop_key_val=mvzip(prop_name, prop_val, "---")
| mvexpand prop_key_val
| eval prop_key_val=split(prop_key_val, "---")
| eval prop_name=mvindex(prop_key_val, 0)
| eval prop_val=mvindex(prop_key_val, 1)
| search prop_name=online
| stats count by prop_val

The above query does its work but it doesn't scale. If I let my dashboard collect data for over ~5k events, I start seeing the following error:

command.mvexpand: output will be truncated at 1300 results due to excessive memory usage. Memory threshold of 500MB as configured in limits.conf / [mvexpand] / max_mem_usage_mb has been reached.

I would like to know if I can somehow scale the above search query so that it can handle greater number of events. Would I need to defined field extractions etc. in transforms.conf/props.conf? I don't know much about field extraction configs.

I would be very grateful, if someone can suggest me a better query or field extractions.

Thanks.

0 Karma

DalJeanis
Legend

You are carrying along a ton of fields when you mvexpand your records. That calculates to over 300K per record. You only need prop_key_val at that point, right?

Two fields commands, immediately before the mvexpand, should put you up to millions of records possible.

 | eval prop_key_val=mvzip(prop_name, prop_val, "---")
 | fields prop_key_val
 | fields - _*
 | mvexpand prop_key_val

Looking deeper, you only want ONE of the values for prop_name=online, so you don't need the mvexpand at all. You need mvfilter.

Try this against your existing search for a small time number of records. It should get the same results. If not, then adjust the mvfilter test until it does. (Adjust case sensitivity, like() vs match(), and so on)

`ct_hostinfo` `get_sourcetypes`
 | spath output=prop_name path=host_properties{}.name
 | spath output=prop_val path=host_properties{}.value
 | eval prop_key_val=mvzip(prop_name, prop_val, "---")
 | eval prop_key_val=mvfilter( like(prop_key_val,"online%") 
 | mvexpand prop_key_val
 | eval prop_val=mvindex(split(prop_key_val, "---"), 1)
 | stats count by prop_val

sharad06
Explorer

Hi Dal,

I tried your suggestion to remove internal fields (fields - _*). But it doesn't work 😞

It's always returning 'No results found'.

0 Karma
Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...