Correlation features: differentiate the possibilities

ctaf
Contributor

Hello,

I am having some trouble understanding Splunk's correlation features. I think it is really important to understand all of Splunk's possibilities before setting up a good environment, which is why I am asking for help.
I will try to describe what I understand right now; please tell me if I'm wrong:

  1. When indexing data, you can do it manually or with an add-on. The add-on will set up everything automatically.
  2. If the add-on uses CIM, is the indexed data searchable with the CIM data model across all the indexes?
  3. If CIM is not applied automatically to your data, you can do it manually. But how can you do that?
  4. The CIM is not mandatory to correlate data between different sources. It is also possible to:
    • Do a "field extraction" that always creates the same fields, in order to normalize them (e.g. always src_ip for "source IP address")
    • Apply a transformation to the logs to normalize the fields
    • Do a "subsearch"
    • Use the "map" command to use results from a first search in a second search

I realize I am getting confused about the CIM and its possibilities for normalizing data.
If a good soul could help me, that would be great!

Thanks!

rpille_splunk
Splunk Employee

Hi ctaf.

Your question 2 depends on how you have set things up. You can limit the indexes that the data model searches across on the CIM's setup page. That's described in step 4 here: http://docs.splunk.com/Documentation/CIM/latest/User/Install

Here's a great place to start to answer #3:
http://docs.splunk.com/Documentation/CIM/latest/User/UsetheCIMtonormalizedataatsearchtime

Richfez
SplunkTrust

To add to that, Splunk can search across a variety of disparate data sources at once when appropriate, like Syslog inputs from your firewall, Active Directory login information and your anti-spam devices. All it takes is being able to get the data into Splunk.

Yes, you are right that having an add-on or app provide CIM compliance will help that a lot because the CIM data models will aggregate all sorts of similar information together. But it can be done without those, too, either by making your own data models or by just using searches across multiple indexes.

Field extraction is a separate but related thing, and a simple example may help show how it makes correlation searches easy "by default". Suppose you have various applications/logs that have the "user" in differently named fields: "User Name" or "user" or "clientName" or whatever. And one that just lists the user's name somewhere in it, but doesn't have it set like "user=myusername" or anything, so Splunk will see "myusername" but won't know it's actually the user.

In that case, searching for just "myusername" across all indexes will show all events where myusername shows up, regardless of what the field is called. Now, if you extract "myusername" into a named field - let's call it "user" - and then alias the names the other logs use for the user to the field "user" as well, you have a single field name you can search against, such as "user=myusername OR user=anotheruser" or "user!=thatuser". When run, that search finds all events where something has created the field "user" and where its value matches what you asked for.
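
To make the aliasing idea concrete outside of Splunk, here is a minimal Python sketch. It is only an illustration: the sample events, the `ALIASES` table and the `normalize_user` helper are all made up for this example, and in Splunk the same job is done by field extractions and field aliases, not by code like this.

```python
# Hypothetical sample events from three sources that name the user differently.
events = [
    {"source": "app_a", "User Name": "myusername"},
    {"source": "app_b", "user": "anotheruser"},
    {"source": "app_c", "clientName": "thatuser"},
]

# Hypothetical alias table: source-specific field name -> common field name.
ALIASES = {"User Name": "user", "clientName": "user"}


def normalize_user(event):
    """Return a copy of the event with aliased fields also exposed under the common name."""
    normalized = dict(event)
    for original, common in ALIASES.items():
        # Keep the original field and add the common one only if it is not already set.
        if original in event and common not in normalized:
            normalized[common] = event[original]
    return normalized


normalized_events = [normalize_user(e) for e in events]

# Similar in spirit to the search: user=myusername OR user=anotheruser
wanted = {"myusername", "anotheruser"}
matches = [e for e in normalized_events if e.get("user") in wanted]

# Similar in spirit to the search: user!=thatuser
# (only events that actually have a "user" field are considered)
not_thatuser = [e for e in normalized_events if "user" in e and e["user"] != "thatuser"]
```

The original field names are kept alongside the common one, so nothing is lost by normalizing; the single name "user" is just one more way to reach the same value.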

This is essentially what CIM compliance does - it maps tags and eventtypes, then aliases/creates the fields all mapped into one fieldname for you. Usually. 🙂

Tags and eventtypes - think of those as logical labels on an event. So, you can have events tagged "malware" so you can search for all malware-related items across all events you have, regardless of where they originally came from: AV, Cisco FireSIGHT, Nessus scans, whatever - if it's tagged right, those events will be associated into the CIM data model in the right places and can be searched on as one group.
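
A rough way to picture the eventtype-plus-tag mechanism is the small Python sketch below. Everything in it is an assumption made for illustration: the `EVENTTYPES` predicates stand in for saved eventtype searches, the `TAGS` table stands in for tag assignments, and the sourcetype, action and signature values are invented. It is not how Splunk implements tagging, only the general shape of it.

```python
# Hypothetical eventtypes: a name plus a predicate that plays the role of a saved search.
EVENTTYPES = {
    "av_detection": lambda e: e.get("sourcetype") == "antivirus"
    and e.get("action") == "blocked",
    "scanner_finding": lambda e: e.get("sourcetype") == "vuln_scan"
    and "trojan" in e.get("signature", "").lower(),
}

# Hypothetical tag assignments: eventtype name -> set of tags.
TAGS = {
    "av_detection": {"malware", "attack"},
    "scanner_finding": {"malware"},
}


def tags_for(event):
    """Collect the tags of every eventtype whose predicate matches the event."""
    tags = set()
    for name, predicate in EVENTTYPES.items():
        if predicate(event):
            tags |= TAGS.get(name, set())
    return tags


# Hypothetical events from three unrelated sources.
events = [
    {"sourcetype": "antivirus", "action": "blocked", "host": "pc1"},
    {"sourcetype": "vuln_scan", "signature": "Trojan.Generic", "host": "srv2"},
    {"sourcetype": "firewall", "action": "allowed", "host": "fw1"},
]

# Similar in spirit to searching for everything tagged "malware", whatever the source.
malware_events = [e for e in events if "malware" in tags_for(e)]
malware_hosts = [e["host"] for e in malware_events]
```

The point is that the search for "malware" never mentions a sourcetype: the per-source knowledge lives in the eventtype definitions, and the tag is the common handle.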

Anyway, good questions! Did the answers help?

I might add: keep reading the docs from Splunk - they're awesome!

ctaf
Contributor

Thank you guys.
I checked the following link; it helped me a lot in understanding CIM usage: http://docs.splunk.com/Documentation/CIM/latest/User/UsetheCIMtonormalizeOSSECdata

So if I understand correctly, to search through CIM-compliant data, I have to use the tags in order to correlate? And because all the fields will be normalized, I can use the field names with the guarantee that they are the same across different sources.

rpille_splunk
Splunk Employee

Yes, you apply tags and you also normalize any fields that do not already match the field names expected by the model, so you can guarantee that the fields will be the same across sources because you have done the normalization. Step 5 on that page shows some examples of that. And, as you correctly pointed out before, there might be add-ons available for your data sources that take care of the tagging and field normalization for you.

Best of luck!
