Getting Data In

How to merge logs that share a common field value before indexing?

jrodriguezap
Contributor

Hello
Maybe someone can give me an idea about this case.
I have an antispam appliance that sends messages like this:

Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|EHLO|mail.netsol.com
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|MSG_SIZE|56702
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|MSGID|35845b9268841243
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SUBJECT| rv: informe
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SOURCE|external
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SENDER|acueva@netsol.com
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|CLIENT|209.17.115.10

For every email, seven or more log lines are sent, and their only common field is sessionid. I know that transaction sessionid can be used in a search (sketched below), but I would like these logs to be merged into one event at index time so that transaction is no longer necessary.
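
For context, a minimal version of the search-time approach I mean (index and sourcetype names are placeholders):

    index=antispam sourcetype=antispam
    | transaction sessionid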

Thank you in advance.
Regards,
Jorge

0 Karma

martin_mueller
SplunkTrust

That's beyond the power of regular expressions.

If the only way to tell that one batch of lines has finished and the next is starting is a change in sessionid, then you could do this:

  • Index the events as-is into a temporary index
  • Schedule a summary search that assembles each batch into one event (see the sketch after this list)
  • Index the assembled events into the actual index for this data
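
A minimal sketch of such a summary search, assuming a temporary index antispam_tmp, a target index antispam, and a sourcetype antispam (all names are placeholders, not taken from this thread):

    index=antispam_tmp sourcetype=antispam
    | transaction sessionid maxspan=5m
    | collect index=antispam

collect writes the assembled events with sourcetype stash by default, which is what keeps them free of license volume, as noted below.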

It's not going to cost additional license volume as long as the entire target index only contains the sourcetype stash as produced by summary indexing. If you transform that into the old sourcetype it'll count twice.

0 Karma

joshua_hart1
Path Finder

@jrodriguezap: I have the exact same antispam solution and have run into the exact same issue you've been dealing with. I haven't tried the summary-index trick, though I admit I had thought of it. I'd hate to waste the storage space just to get the events from this log source merged into single events, so I was hoping you may have figured out another way. Any luck on this front?

0 Karma

martin_mueller
SplunkTrust

If an index contains both summary-indexed and regular data, then the entire index is counted against your license volume. I never actually tested that myself, though, so don't hold me accountable 😛 I'm also not quite sure where I read that... Hmm. The docs aren't really helpful; maybe someone with the right link can shed some light.

0 Karma

jrodriguezap
Contributor

I understand. I'm using that index to store logs from other teams, and it consumes X amount of license. Will consumption stay at X, or will the summary indexing I am applying add to it?

0 Karma

martin_mueller
SplunkTrust

Summary indexing doesn't consume license volume as long as the target index is only used for summary indexing.

0 Karma

jrodriguezap
Contributor

Thank you very much, Martin.
I'm reviewing the leads you have given me and will report back on how it goes. What concerns me is license usage: I currently have two scheduled searches that generate summaries, the first writing to the "summary index" and the second to "mydbindex", adding the field host="172.x.x.x".
I have not checked the license consumption yet, but will I be consuming extra because of that?

0 Karma

martin_mueller
SplunkTrust

Let's assume your maximum transaction duration is less than X minutes and your maximum indexing delay is D. Schedule a search every X minutes to run from -2X-D to -D. Assemble your transaction events, then filter for transactions starting between -2X-D and -X-D. That way you catch transactions that start in the first half of the window and end in the second half; transactions that start in the second half will be caught by the next scheduled run.
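
A sketch with hypothetical values X=5 minutes and D=1 minute, so the search is scheduled every 5 minutes over the window -11m to -1m and keeps only transactions that started before -6m (index and sourcetype names as in the earlier sketch):

    index=antispam_tmp sourcetype=antispam earliest=-11m@m latest=-1m@m
    | transaction sessionid maxspan=5m
    | where _time < relative_time(now(), "-6m@m")
    | collect index=antispam

transaction assigns each assembled event the time of its earliest member, so the where clause filters on the transaction's start time.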

0 Karma

jrodriguezap
Contributor

Hi Martin
Thanks. I was looking at the option of using the summary index, running "transaction sessionid" every minute; however, I have seen that many logs with the same sessionid fall outside the one-minute window. For example:

    Aug 18 21:22:02 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|EHLO|mail.netsol.com
    Aug 18 21:22:02 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|MSG_SIZE|56702
--------------------- end window 1 minute---------------------------------------------
    Aug 18 21:21:47 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|MSGID|35845b9268841243
    Aug 18 21:21:31 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SUBJECT| rv: informe
    Aug 18 21:21:05 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SOURCE|external
    Aug 18 21:21:03 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SENDER|acueva@netsol.com
 --------------------- start window 1 minute---------------------------------------------
    Aug 18 21:20:52 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|CLIENT|209.17.115.10

And when I look at the summary index, there are many events with incomplete fields.
Have you ever run into this case? Maybe the transaction search can be improved.
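
For reference, a minimal version of what I am running now (index and sourcetype names are placeholders); sessions that straddle the one-minute boundary get split into partial events:

    index=antispam_tmp sourcetype=antispam earliest=-1m@m latest=@m
    | transaction sessionid
    | collect index=antispam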

0 Karma

jrodriguezap
Contributor

Hi Martin, thanks for the tip. I was reviewing the SHOULD_LINEMERGE and BREAK_ONLY_BEFORE options.
I think the most promising approach is to break when the value of sessionid changes: the first line of a session is not always the same (it often varies), but the sessionid value is always maintained.
What I had in mind is something like:

SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = sessionid\: IS DIFFERENT

But that is not valid syntax, and I don't see how to express "is different" with a regex that Splunk would understand.
Do you think that is possible?

Jorge

0 Karma

martin_mueller
SplunkTrust

Is the first event known?

If so, you can set BREAK_ONLY_BEFORE to match the first event and let Splunk use that to break events rather than the time stamp.
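
For illustration, if every session reliably began with, say, the CLIENT line, a props.conf stanza along these lines could do it (the sourcetype name and the choice of CLIENT are assumptions, not taken from this thread):

    [antispam]
    # merge consecutive lines into one event...
    SHOULD_LINEMERGE = true
    # ...breaking only before a line that contains |CLIENT|
    BREAK_ONLY_BEFORE = \|CLIENT\|
    # allow more than the default 256 merged lines per event
    MAX_EVENTS = 512

BREAK_ONLY_BEFORE is a plain regex matched against each line, so this only helps if some regex matches every session's first line.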
