Merging different log formats

jgauthier
Contributor

All,

I am wondering if it's possible to take two entirely different source file formats (containing the same data) and report against them. Real-life scenario: I have a mail server that writes logs in a tab-delimited format. I have another mail server that uses CSV. They contain the same general fields.

I would like to take these two different sources, consolidate them into one sourcetype in Splunk, and do logical analysis from that point.

Is it possible, and what are some broad guidelines to achieve this?

My concerns are:
Parsing the timestamps, as they are different. Report-time extractions: how will Splunk extract the fields when the formats are completely different?

Thanks for any tips!

southeringtonp
Motivator

Be sure to configure your sourcetypes properly -- after that the rest should fall into place.

Configuration Steps:

  • Make sure that each of your formats is assigned a unique sourcetype. You can either assign the sourcetype per input or per source, or you can use a transform to assign it based on a value in the event data. For more information on sourcetypes, see the Splunk documentation.

  • Timestamping will usually work out-of-the-box, even for unknown data formats. If it doesn't, you can customize the timestamp extraction.

  • Create field extractions for each sourcetype. You'll need one extraction per field, per sourcetype, and the extracted field names should match across sourcetypes so that a single search can report on both. Take a look at the Common Information Model for suggested naming conventions. For delimited data, you may also want to look at FIELDS and DELIMS in transforms.conf.

  • Search across all of your sourcetypes together, and pipe them to reporting commands to get values based on both.
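As a rough sketch of how the steps above could fit together, the stanzas below set up two sourcetypes that extract the same field names from differently delimited files. Everything specific here is an assumption for illustration: the sourcetype names (mail_tab, mail_csv), the monitored paths, the field order, and the timestamp format are hypothetical and would need to match your actual logs.

```
# inputs.conf: assign a sourcetype per input (paths are hypothetical)
[monitor:///var/log/mailserver_a/mail.log]
sourcetype = mail_tab

[monitor:///var/log/mailserver_b/mail.csv]
sourcetype = mail_csv

# props.conf: bind a search-time extraction to each sourcetype
[mail_tab]
REPORT-mail_fields = mail_tab_fields
# Only needed if automatic timestamp detection gets it wrong.
# The format string is an assumed example, not taken from real logs.
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S

[mail_csv]
REPORT-mail_fields = mail_csv_fields

# transforms.conf: same field names, different delimiters and column order
[mail_tab_fields]
DELIMS = "\t"
FIELDS = "log_time", "user", "src_ip", "action"

[mail_csv_fields]
DELIMS = ","
FIELDS = "log_time", "src_ip", "user", "action"
```

Because both transforms emit user, src_ip, and action, a search that spans both sourcetypes sees one consistent set of fields even though the underlying files differ.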

Example searches:

sourcetype=format1 OR sourcetype=format2 | table user, src_ip, action
sourcetype=format1 OR sourcetype=format2 | stats count by user
sourcetype=format* | stats count by user

The first example just gives you nice formatting. The second and third examples both gather basic usage stats. Note the third example: if you name your sourcetypes appropriately, you can use wildcards to catch all of the variations at once. More examples of the various reporting commands are in the manual or the Cheat Sheet.
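Keeping the sourcetypes separate also lets you compare the two servers side by side. A small sketch, assuming hypothetical sourcetypes named mail_tab and mail_csv that both extract an action field:

```
sourcetype=mail_* | stats count by sourcetype, action
sourcetype=mail_* | timechart count by sourcetype
```

The first search breaks the counts down per format, and the second charts event volume over time for each one. This can help confirm that both extractions are producing the fields you expect.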

jgauthier
Contributor

Okay, so I do want different sourcetypes. I just need to use the same field names in the extractions and then things should work. Okay, thanks! Makes perfect sense!
