Hello
Maybe someone can give me an idea about this case.
I have an anti-spam appliance sending log messages like this:
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|EHLO|mail.netsol.com
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|MSG_SIZE|56702
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|MSGID|35845b9268841243
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SUBJECT| rv: informe
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SOURCE|external
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SENDER|acueva@netsol.com
Aug 18 21:21:41 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|CLIENT|209.17.115.10
For every email, seven or more log lines are sent, and their only common field is the sessionid. I know that "transaction sessionid" can be used in a search, but I would like these logs to be merged into one event at indexing time so that the transaction command won't be necessary.
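For reference, the ad-hoc search I mean is roughly this (the sourcetype name is just a placeholder for whatever these appliance logs are indexed as):

```
sourcetype=antispam_log
| transaction sessionid
```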
Thanks in advance.
Greetings
Jorge
That's beyond the power of regular expressions.
If the only way to determine that one bunch of lines is finished and the next bunch is starting is a changed sessionid, then you could build the merged events with a scheduled summary-indexing search instead:
It's not going to cost additional license volume as long as the entire target index only contains the sourcetype stash
as produced by summary indexing. If you transform that into the old sourcetype it'll count twice.
@jrodriguezap: I have the exact same spam solution that you have and I have run into the exact same issue you've been dealing with. I haven't tried the summary index trick, though I admit I had thought of it. I'd hate to waste the storage space just to get the events from this log source into a single event, so I was hoping you may have figured out a way to do this. Any luck on this front?
If an index contains both summary indexing and regular data then the entire index is counted against your license volume. I never actually tested that myself though, so don't hold me accountable 😛 I'm also not quite sure where I read that... Hmm. The docs aren't really helpful, maybe someone with the right link can shed some light.
I understand. I'm also using that index to store logs from other teams, which already consumes X amount of license. Will my license usage stay at X, or will the summary searches I'm applying add to it?
Summary indexing doesn't consume license volume as long as the target index is only used for summary indexing.
Thank you very much, Martin.
I'm reviewing the hints you've given me and will comment on how it goes when I'm done. What concerns me is license usage: I currently have two scheduled searches that generate summaries, the first writing to the "summary" index and the second to "mydbindex", adding the field host="172.x.x.x".
I haven't checked license consumption yet, but will I be consuming extra because of that?
Let's assume your maximum transaction duration is less than X minutes, and that your maximum indexing delay is D. Schedule a search every X minutes to run from -2X-D to -D. Assemble your transaction events and then filter for transactions starting between -2X-D and -X-D. That way you catch transactions that start in the first half and end in the second half. Transactions that start in the second half will be caught in the next scheduled execution.
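For example, with X = 5 minutes and D = 1 minute, a scheduled search along these lines could write the assembled transactions to a summary index (the index and sourcetype names here are assumptions on my part):

```
index=mail sourcetype=antispam_log earliest=-11m@m latest=-1m@m
| transaction sessionid
| where _time < relative_time(now(), "-6m@m")
| collect index=antispam_summary
```

The where clause keeps only transactions whose first event (which sets _time) falls in the older half of the window, -11m to -6m; sessions starting after -6m will be picked up complete by the next run.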
Hi Martin
Thanks, I was looking at the option of using a summary index, running "transaction sessionid" every minute. However, I have seen that many logs for a sessionid fall outside the one-minute window. For example:
Aug 18 21:22:02 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|EHLO|mail.netsol.com
Aug 18 21:22:02 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|MSG_SIZE|56702
--------------------- end window 1 minute---------------------------------------------
Aug 18 21:21:47 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|MSGID|35845b9268841243
Aug 18 21:21:31 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SUBJECT| rv: informe
Aug 18 21:21:05 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SOURCE|external
Aug 18 21:21:03 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|SENDER|acueva@netsol.com
--------------------- start window 1 minute---------------------------------------------
Aug 18 21:20:52 172.24.20.35|sessionid:f79a66d0000002d5-cf-53f2b45b0526|CLIENT|209.17.115.10
And when I look at the summary index, there are many events with incomplete fields.
Have you ever dealt with a case like this? Maybe I can improve the transaction search?
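For example, I wonder whether widening the search window and constraining the transaction would help, something like this (the 5-minute span is just a guess at the maximum session duration, and the index name is a placeholder):

```
index=mail sourcetype=antispam_log earliest=-10m@m latest=now
| transaction sessionid maxspan=5m maxpause=2m
```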
Hi Martin, thanks for the tip. I was reviewing the SHOULD_LINEMERGE and BREAK_ONLY_BEFORE options, and I think the best approach would be to break whenever the value of sessionid changes, since the first event is not always the same (it often varies), but the sessionid value is always present.
What I had in mind was something like:
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = sessionid\: IS DIFFERENT
but I don't know how to express "IS DIFFERENT" as a regex that Splunk understands. Do you think that's possible?
Jorge
Is the first event known?
If so, you can set BREAK_ONLY_BEFORE to match the first event and let Splunk use that to break events rather than the timestamp.
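For instance, if the EHLO line reliably starts each session (that's an assumption; your sample suggests it, but you said the first event varies), a props.conf stanza on the indexer could look like this, with the sourcetype name as a placeholder:

```
[antispam_log]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = \|EHLO\|
MAX_EVENTS = 20
TIME_PREFIX = ^
TIME_FORMAT = %b %d %H:%M:%S
```

MAX_EVENTS caps how many lines get merged into one event, as a safety net in case the EHLO line is ever missing.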