Getting Data In

How can we create multiline events based on the value of a field?

heath
Path Finder

We have JSON source data with a MESSAGE field that contains the actual log entry we want to collect. Each event also has a CONTAINER_ID that we would like to add to the events as metadata at index time.

Sample input log data:
{ "CONTAINER_ID" : "abc", "MESSAGE" : "10/Jul/2017:22:32:36 first line of multiline log entry" }
{ "CONTAINER_ID" : "abc", "MESSAGE" : "second line of a multiline log entry" }
{ "CONTAINER_ID" : "xyz", "MESSAGE" : "10/Jul/2017:22:33:29 different log entry" }

The end result we would want is two Splunk events:
* Event one
container_id = "abc" (container_id is added as metadata similar to host, source, sourcetype)
_raw = "10/Jul/2017:22:32:36 first line of multiline log entry
second line of a multiline log entry"

* Event two
container_id = "xyz"
_raw = "10/Jul/2017:22:33:29 different log entry"

Note that the first event is a multiline event containing log lines that have the same CONTAINER_ID, abc.

I'm able to generate the container_id as metadata and I'm able to overwrite _raw with the value of MESSAGE. However, I don't see a way to create a multiline event based on the original CONTAINER_ID. There doesn't seem to be a way to combine log lines based on the value of a field. Is this possible?

cpetterborg
SplunkTrust

Yes, you can do this, but can you guarantee that the first and second lines will always be consecutive (that is, that the first line of the second event never comes BEFORE the second line of the first event)? If not, then you should use transactions. The other good reason for using transactions is that each JSON string can be parsed properly if you break the events at each string, which lets Splunk display the JSON in a nicely formatted way.

If you insist on putting the events together, you can do it (if your data follows the form above) with the following config line in props.conf for the sourcetype:

BREAK_ONLY_BEFORE=\d\d/\w\w\w/\d\d\d\d:\d\d:\d\d:\d\d
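
To sanity-check that pattern against the sample data before touching props.conf, something like the following Python sketch may help. It only tests the regex against each raw line; it is not how Splunk implements line merging, and the helper name starts_new_event is made up for illustration.

```python
import re

# The pattern from the BREAK_ONLY_BEFORE line above, unchanged.
BREAK_PATTERN = re.compile(r"\d\d/\w\w\w/\d\d\d\d:\d\d:\d\d:\d\d")

# The three sample lines from the original question.
SAMPLE_LINES = [
    '{ "CONTAINER_ID" : "abc", "MESSAGE" : "10/Jul/2017:22:32:36 first line of multiline log entry" }',
    '{ "CONTAINER_ID" : "abc", "MESSAGE" : "second line of a multiline log entry" }',
    '{ "CONTAINER_ID" : "xyz", "MESSAGE" : "10/Jul/2017:22:33:29 different log entry" }',
]


def starts_new_event(line: str) -> bool:
    """Hypothetical helper: True if the timestamp pattern occurs anywhere in the line."""
    return BREAK_PATTERN.search(line) is not None


# One flag per sample line: True means "a new event would begin here".
flags = [starts_new_event(line) for line in SAMPLE_LINES]

for line, flag in zip(SAMPLE_LINES, flags):
    print("BREAK" if flag else "MERGE", line)
```

On the sample, the first and third lines match and the second does not, which is the grouping the question asks for as long as the lines arrive in order.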

I would seriously consider using transactions, however. This would make it easy to do if you use the following in your config:

KV_MODE = json

which will give you a nice formatting for each line, and field extractions for all JSON string fields.

heath
Path Finder

Thanks for the reply. Unfortunately the data can come in out of order:
{ "CONTAINER_ID" : "abc", "MESSAGE" : "10/Jul/2017:22:32:36 first line of multiline log entry" }
{ "CONTAINER_ID" : "xyz", "MESSAGE" : "10/Jul/2017:22:32:36 different log entry" }
{ "CONTAINER_ID" : "abc", "MESSAGE" : "second line of a multiline log entry" }

The result would be that the "second line..." from abc would be incorrectly added to xyz's log entry.
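
The mis-merge can be reproduced outside Splunk with a rough Python sketch. It assumes a simplified rule (a line containing the timestamp pattern opens a new event, and any other line is appended to the most recent event), which only approximates pattern-based line merging; merge_by_pattern is a hypothetical name.

```python
import re

# Same timestamp pattern as the BREAK_ONLY_BEFORE suggestion in this thread.
BREAK_PATTERN = re.compile(r"\d\d/\w\w\w/\d\d\d\d:\d\d:\d\d:\d\d")

# The out-of-order lines from this post.
OUT_OF_ORDER = [
    '{ "CONTAINER_ID" : "abc", "MESSAGE" : "10/Jul/2017:22:32:36 first line of multiline log entry" }',
    '{ "CONTAINER_ID" : "xyz", "MESSAGE" : "10/Jul/2017:22:32:36 different log entry" }',
    '{ "CONTAINER_ID" : "abc", "MESSAGE" : "second line of a multiline log entry" }',
]


def merge_by_pattern(lines):
    """Hypothetical helper: group consecutive lines, opening a new event on each pattern match."""
    events = []
    for line in lines:
        if BREAK_PATTERN.search(line) or not events:
            events.append([line])    # pattern found: start a new event
        else:
            events[-1].append(line)  # no pattern: glue onto the previous event
    return events


events = merge_by_pattern(OUT_OF_ORDER)

for number, event in enumerate(events, start=1):
    print(f"Event {number}:")
    for line in event:
        print("   ", line)
```

The second event ends up holding the xyz line followed by the abc continuation line, because the rule only looks at line order and never at CONTAINER_ID.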

We are trying to do this at index time but yeah, the alternative is using '| transaction' at search time.

cpetterborg
SplunkTrust

somesoni2's comment above is 100% correct. If the data doesn't come in in order, then you can't easily (or even with only a little difficulty) do it at index time, because the data is processed serially. Your options are search time or (seriously difficult) processing of the data before it is indexed.

somesoni2
Revered Legend

The event grouping can't be done based on values in the raw data, but it can be done with patterns. If, based on the example logs you posted in the question, the first line of each event contains MESSAGE" : "<<timestamp here>> and the second line doesn't, then it will be possible to merge those entries into one event.

You can define an index-time field for your container ID by following this:

http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/Data/Configureindex-timefieldextraction#Index...

heath
Path Finder

Thanks for the reply. Unfortunately if the data came in out of order:
{ "CONTAINER_ID" : "abc", "MESSAGE" : "10/Jul/2017:22:32:36 first line of multiline log entry" }
{ "CONTAINER_ID" : "xyz", "MESSAGE" : "10/Jul/2017:22:32:36 different log entry" }
{ "CONTAINER_ID" : "abc", "MESSAGE" : "second line of a multiline log entry" }

The result would be that the "second line..." from abc would be incorrectly added to xyz's log entry.

somesoni2
Revered Legend

Yeah... Splunk can't hold events in memory to find matches at index time. The recommended way would be to get your logging fixed so that all related events are logged together, with some pattern to group on. The other, bad, not-so-easy option would be to create a custom pre-processing script that runs periodically, groups your data as you need, and writes sanitized output to a new file that Splunk can monitor and group easily.
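
A minimal sketch of what such a pre-processing step could look like in Python is below. It assumes every input line is a JSON object with CONTAINER_ID and MESSAGE keys, and that a MESSAGE starting with the timestamp shape from the samples opens a new log entry. The function name group_by_container and the choice to emit one JSON object per merged entry are illustrative choices, not anything Splunk provides.

```python
import json
import re
from typing import Dict, Iterable, Iterator, List

# Assumption: a MESSAGE that starts with this timestamp shape opens a new log entry.
TIMESTAMP_AT_START = re.compile(r"\d\d/\w\w\w/\d\d\d\d:\d\d:\d\d:\d\d")


def group_by_container(json_lines: Iterable[str]) -> Iterator[dict]:
    """Hypothetical helper: merge MESSAGE lines per CONTAINER_ID, even when interleaved."""
    open_entries: Dict[str, List[str]] = {}  # one pending entry per container
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        cid = record["CONTAINER_ID"]
        message = record["MESSAGE"]
        # A new timestamped line closes the pending entry for the same container.
        if TIMESTAMP_AT_START.match(message) and cid in open_entries:
            yield {"CONTAINER_ID": cid, "MESSAGE": "\n".join(open_entries.pop(cid))}
        open_entries.setdefault(cid, []).append(message)
    # End of input: flush whatever is still pending.
    for cid, parts in open_entries.items():
        yield {"CONTAINER_ID": cid, "MESSAGE": "\n".join(parts)}


SAMPLE = [
    '{ "CONTAINER_ID" : "abc", "MESSAGE" : "10/Jul/2017:22:32:36 first line of multiline log entry" }',
    '{ "CONTAINER_ID" : "xyz", "MESSAGE" : "10/Jul/2017:22:32:36 different log entry" }',
    '{ "CONTAINER_ID" : "abc", "MESSAGE" : "second line of a multiline log entry" }',
]

merged = list(group_by_container(SAMPLE))

for entry in merged:
    print(json.dumps(entry))  # one JSON object per merged log entry
```

An entry is only emitted once the next timestamped line for the same container shows up or the input ends, so a real script would also need some timeout or size limit to flush long-idle containers, and the output order across containers may not follow the timestamps.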
