Getting Data In

Doing stats on multivalued JSON fields (mxexpand is too slow)

claudio_manig
Communicator

Hi Ninjas

I'm dealing with some deeply nested JSON events like:

"sendTime":"2017-09-21T17:02:06.583+02:00","runningProcess":[{"Name":"_Total","PercentProcessorTime":"100","WorkingSetPrivate":"1557368"},{"Name":"Bananaservice","PercentProcessorTime":"0","WorkingSetPrivate":"593"},{"Name":"Cherryservice","PercentProcessorTime":"0","WorkingSetPrivate":"7671"},{"Name":"Pineappleservice","PercentProcessorTime":"0","WorkingSetPrivate":"466"},{"Name":"Kiwiservice","PercentProcessorTime":"0","WorkingSetPrivate":"442"},{"Name":"Appleservice","PercentProcessorTime":"0","WorkingSetPrivate":"630"},{"Name":"Peachservice","PercentProcessorTime":"0","WorkingSetPrivate":"1470"}

So all i want to do is getting out the avg values over time by each process, something like

| stats avg(runningProcess{}.PercentProcessorTime) as CPU by runningProcess{}.Name, _time
| stats list(*) as * by _time

But without mvexpand and so on, I'm not getting the right data as just takes the value of the first entry of the mv field by each event.
As said, I'm aware of doing it with mvexpand etc. but it slows down the search dramatically and i was wondering whether there is a more elegant solution to get the right data here.

Thanks

0 Karma
1 Solution

kamlesh_vaghela
SplunkTrust
SplunkTrust

Hi claudio.manig,

I have tried to remove mvexpand from your search. Can you please try below search??

| index=skynet sourcetype=t800
| rename runningProcess{}.Name as ProcessName, runningProcess{}.PercentProcessorTime as ProcessCPU, runningProcess{}.WorkingSetPrivate as ProcessMemoryKB
| eval tempField=mvzip(mvzip(ProcessName,ProcessCPU),ProcessMemoryKB) 
| stats count by _time,messageId,tempField 
| eval ProcessName=mvindex(split(tempField,","),0), ProcessCPU=mvindex(split(tempField,","),1), ProcessMemoryKB=mvindex(split(tempField,","),2) 
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle" 
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName 
| sort -AvgProcessCPU -count

I'm not sure about "runningProcess{}.Name" field name I'm assuming from your search. So you can replace it with the original field name.

I hope It will help you.

Thanks

View solution in original post

kamlesh_vaghela
SplunkTrust
SplunkTrust

Hi claudio.manig,

I have tried to remove mvexpand from your search. Can you please try below search??

| index=skynet sourcetype=t800
| rename runningProcess{}.Name as ProcessName, runningProcess{}.PercentProcessorTime as ProcessCPU, runningProcess{}.WorkingSetPrivate as ProcessMemoryKB
| eval tempField=mvzip(mvzip(ProcessName,ProcessCPU),ProcessMemoryKB) 
| stats count by _time,messageId,tempField 
| eval ProcessName=mvindex(split(tempField,","),0), ProcessCPU=mvindex(split(tempField,","),1), ProcessMemoryKB=mvindex(split(tempField,","),2) 
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle" 
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName 
| sort -AvgProcessCPU -count

I'm not sure about "runningProcess{}.Name" field name I'm assuming from your search. So you can replace it with the original field name.

I hope It will help you.

Thanks

claudio_manig
Communicator

Hey kamlesh - looking really good so far! But i was not sure why you used the "messageId" field to group your first stats as this field does not exist.

0 Karma

kamlesh_vaghela
SplunkTrust
SplunkTrust

ooh.. that's just my temporary variable... I forgot to remove it.

Can you please remove it and execute search?

| index=skynet sourcetype=t800
| rename runningProcess{}.Name as ProcessName, runningProcess{}.PercentProcessorTime as ProcessCPU, runningProcess{}.WorkingSetPrivate as ProcessMemoryKB
| eval tempField=mvzip(mvzip(ProcessName,ProcessCPU),ProcessMemoryKB)
| stats count by _time,tempField
| eval ProcessName=mvindex(split(tempField,","),0), ProcessCPU=mvindex(split(tempField,","),1), ProcessMemoryKB=mvindex(split(tempField,","),2)
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle"
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName
| sort -AvgProcessCPU -count

0 Karma

claudio_manig
Communicator

No worries I already thought that and tired it and it works! I have to test it to verify the effective performance impact. Thanks a lot!

0 Karma

kamlesh_vaghela
SplunkTrust
SplunkTrust

great.. If it is working then be sure to click Accept to close this question.

0 Karma

DalJeanis
Legend

@claudio.manig - Please post the rest of the SPL for the searches that you've tried. It's possible that we can optimize the extraction itself, or reduce the impact of the mvexpand with some subtle magic.

0 Karma

claudio_manig
Communicator

Well mvexpand itself already slows it down by its logic - creating an event out of each field value. So in reality i have json events with around 200+ processes running, resulting in tons of events out of one.

But sure, have a go, the query looks something like that:

| index=skynet sourcetype=t800
| spath output=prcNames path=runningProcess{}
| mvexpand prcNames
| rex field=prcNames "\{\"Name\":\"(?<ProcessName>.*)\",\"PercentProcessorTime\":\"(?<ProcessCPU>.*)\",\"WorkingSetPrivate\":\"(?<ProcessMemoryKB>.*)\"\}" 
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle" 
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName 
| sort -AvgProcessCPU -count

The spath is needed so we can group the subcategories of each processname like cpu and mem together.

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...