Getting Data In

How do I group lines into a single event at index-time?

hulahoop
Splunk Employee

What I'm trying to do: at index time, create a multiline event based on a unique ID. In the data sample below, I need to group by the unique identifier in bold. All entries have this unique ID and, as far as I can tell, are written in clumps (i.e., they are not interleaved). Is this possible in Splunk 4.0, or is the transaction command my only option? I believe in the Splunk 3.0 days this could be accomplished via metaevents.

[Thu May 13 11:54:33 2004] **00032066** - PLUGIN: Plugins loaded.
[Thu May 13 11:54:33 2004] **00032066** - PLUGIN: ---------------SystemInformation------------------
[Thu May 13 11:54:33 2004] **00032066** - PLUGIN: Bld date: Nov 11 2002, 21:26:39
[Thu May 13 11:54:33 2004] **00032066** - PLUGIN: Webserver: IBM_HTTP_SERVER/1.3.26  Apache/1.3.26 (Unix)
[Thu May 13 11:54:33 2004] **00032066** - PLUGIN: Hostname = hulahoop
[Thu May 13 11:54:33 2004] **00032066** - PLUGIN: --------------------------------------------------

[Fri May 21 16:14:30 2004] **0003c0f2** - ERROR: ws_config_parser: handleLogEnd: Failed to open log file: '/do/the/hulahoop'
[Fri May 21 16:14:30 2004] **0003c0f2** - ERROR: lib_security: logSSLError: str_security (gsk error 408): GSK_ERROR_BAD_KEYFILE_PASSWORD
[Fri May 21 16:14:30 2004] **0003c0f2** - ERROR: ws_transport: transportInitializeSecurity: Failed to initialize security

[Fri May 21 16:14:30 2004] **00033040** - ERROR: ws_config_parser: handleLogEnd: Failed to open log file: '/foo/bar/baz'
[Fri May 21 16:14:30 2004] **00033040** - ERROR: lib_security: logSSLError: str_security (gsk error 408): GSK_ERROR_BAD_KEYFILE_PASSWORD
[Fri May 21 16:14:30 2004] **00033040** - ERROR: set the WAS_HOME environment variable to the appropriate directory
[Fri May 21 16:14:30 2004] **00033040** - ERROR: ws_common: websphereBeginRequest: Config reloading FAILED; using old config
1 Solution

gkanapathy
Splunk Employee

Are the extra blank lines in the input file, or did you add them when you posted? If they are in the input, you should be able to set LINE_BREAKER = ([\r\n]{2}).
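
A minimal props.conf sketch of that approach (the sourcetype name here is made up for illustration):

[was_plugin]
# break events only at blank lines, so each clump becomes one event
LINE_BREAKER = ([\r\n]{2})
# don't let the line-merging pass recombine what LINE_BREAKER splits
SHOULD_LINEMERGE = false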

Otherwise, the answer is no. If you did want to do this, you could use the transaction command to group the events first, then write the results to a "summary" index. (You will probably need a custom pre-summarizing script and some props.conf settings to do this, though, and no, this isn't a great solution.)
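
Roughly, the search half of that workaround might look like the following. The msg_id extraction and the summary index name are made up, and this assumes the asterisks around the ID are just the post's bold formatting; collect writes the grouped results into the named index:

sourcetype=was_plugin
| rex "\]\s(?<msg_id>[0-9a-f]{8})\s-"
| transaction msg_id
| collect index=was_summary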

The transaction command works well, and will be very low cost if your data is clumped and you use (say) connected=t and maxopentxn=5.
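
For example, with a hypothetical msg_id field extracted as above:

sourcetype=was_plugin
| rex "\]\s(?<msg_id>[0-9a-f]{8})\s-"
| transaction msg_id connected=t maxopentxn=5

A small maxopentxn keeps the pool of in-flight transactions tiny, which is what makes this cheap when the data arrives in clumps.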

msor001
New Member

I don't know what it will do to your indexer's resources, but I suppose a multiline negative lookbehind might work as a line breaker. \1 might need to be $1 depending on the flavor of regex used at this layer of props.conf.

LINE_BREAKER = ([\n\r]+)(?m)\[[^\s]+\s[^\s]+\s[\d]{2}\s(?:[\d:]{3}){2}[^\s]+\s[\d]{4}\]\s([*]{2}[^\*]+[*]{2})(?<!\1).+

djones
Engager

Since that's a WebSphere plugin log file, and the WebSphere plugin can handle many simultaneous requests, those lines would not have to come in "chunks". You'll see those events interleaved more often in a WebSphere environment that has a higher number of simultaneous users.

As a result, the transaction command is likely the most reliable way to process this data.

You can use search macros and/or saved searches to help make it easier for end users to deal with those WAS plugin events. A nice example is in the [Splunk docs].
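
For instance, a hypothetical macro in macros.conf that hides the plumbing (all names here are made up, and the msg_id extraction is the same guess as above):

[was_plugin_txn]
definition = sourcetype=was_plugin | rex "\]\s(?<msg_id>[0-9a-f]{8})\s-" | transaction msg_id connected=t maxopentxn=5

End users would then just invoke `was_plugin_txn` in their searches.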

hulahoop
Splunk Employee

Dave, thank you. You are correct. The events are interleaved, and there are sometimes hundreds of events with the same message ID across a span of days or more. I only have a small sample, and the transaction command works beautifully on it, but I can't tell how it will scale to millions of events in a production environment. Without using the transaction command, it is possible to group these events by the message ID field, which I imagine is more efficient. For most purposes, I believe correlation by the message ID will suffice.
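
That lighter-weight correlation could be a plain stats over the extracted ID, which generally scales far better than transaction (the msg_id extraction is again hypothetical):

sourcetype=was_plugin
| rex "\]\s(?<msg_id>[0-9a-f]{8})\s-"
| stats min(_time) AS first_seen max(_time) AS last_seen values(_raw) AS events BY msg_id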

hulahoop
Splunk Employee

The reason I'm uncomfortable with the transaction approach is that it's not obvious to an end user that this is what needs to be done for Splunk to understand this log file intelligently.

hulahoop
Splunk Employee

Thank you, G. I was afraid of that. 😞 The data is clumped, but I added the extra blank lines for illustration; there are none in the log file itself. Remind me again why metaevents were deprecated.
