Converting index query to data model query

north_star · ‎11-18-2023

I have tried to simplify the query for better understanding and removing some unnecessary things.

This query is to find out if the same malware has been found on more than 4 hosts (dest) in a given time span, something like a malware outbreak.

Below is the indexed based query that works fine. I am trying to convert this to a data model based query, but not getting the desired results. I am new to writing data model based queries. Thanks for all the help!

(`cim_Malware_indexes`) tag=malware tag=attack

| eval grouping_signature=if(isnotnull(file_name),signature . ":" . file_name,signature) => trying to create a new field called "grouping_signature" by concatenating signature and file_name fields

| stats count dc(dest) as infected_device_count BY grouping_signature => trying to calculate the distinct count of hosts the have the same malware found on them by "grouping_signature" field

| where infected_device_count > 4 => trying to find events where number of infected devices is greater than 4

| stats sum(count) AS "count" sum(infected_device_count) AS infected_device_count BY grouping_signature => trying to find the total number of infected hosts by "grouping_signature" field

north_star · ‎11-19-2023

Ok, I have tried to simplify the query for better understanding and removing some unnecessary things.

This query is to find out if the same malware has been found on more than 4 hosts (dest) in a given time span, something like a malware outbreak.

Below is the indexed based query that works fine. I am trying to convert this to a data model based query.

(`cim_Malware_indexes`) tag=malware tag=attack

| eval grouping_signature=if(isnotnull(file_name),signature . ":" . file_name,signature) => trying to create a new field called "grouping_signature" by concatenating signature and file_name fields

| stats count dc(dest) as infected_device_count BY grouping_signature => trying to calculate the distinct count of hosts the have the same malware found on them by "grouping_signature" field

| where infected_device_count > 4 => trying to find events where number of infected devices is greater than 4

| stats sum(count) AS "count" sum(infected_device_count) AS infected_device_count BY grouping_signature => trying to find the total number of infected hosts by "grouping_signature" field

PickleRick · ‎11-19-2023

OK. I never remember the proper syntax from tstatsing from datamodel so you might need to correct this a bit but you'd probably want something like this

| tstats values(Malware.dest) as hosts from datamodel=Malware.something by Malware.signature

This will give you list of hosts by each signature from a given period. Now you might want to put it through

| where mvcount(hosts)>4

or something like that.

You can't do complicated aggregations with tstats - that's why you should normalize your data. That's what the whole datamodel is for.

north_star · ‎11-19-2023

Thank you.

That works for fields (like signature in this example) which are directly available from the data model.

But if we want to create new fields within the search (like grouping_signature in this example) to perform some calculations using eval or string concatenations and use them to do a group by, how could we accomplish that in the tstats query?

In this example, I want to use eval to concatenate signature and file_name fields into a new field called grouping_signature and then use the new field for the group by. If the file_name is not present, then only use signature for the group by(that's why eval to perform that check).

PickleRick · ‎11-19-2023

No. You can't do that using tstats.

You can do the search using

| datamodel Malware search

or

| from datamodel=Malware

and then do normal stats but this way you won't be able to leverage the acceleration of summaries.

You could try to append two separate tstats (one with filenames and one without) using tstats in prestats=t and append=t but that's some very confusing functionality.

north_star · ‎11-19-2023

Ok, thank you.

PickleRick · ‎11-19-2023

1. Why do the | eval dest=lower(dest) ? CIM is for normalizing your data. Do it properly - unify the case of your names.

2. if(isnotnull ... can be expressed more clearly with coalesce().

3. You're searching from CIM indexes but you're manually doing things like | rename computerDnsName as dest. You should have done that as calculated field to make your data CIM-compliant.

So first you should make your data CIM-compliant then tell us what you want to achieve.

Converting index query to data model query

count

eval

stats

tstats

Join Us for Splunk University and Get Your Bootcamp Game On!

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

Announcing Scheduled Export GA for Dashboard Studio