How to configure Splunk to split a single large file into 2 sourcetypes based on a keyword in the log file?

rakesh_498115 · ‎03-26-2015

Hi ,

I have a single source with a huge number of events. These events fall broadly into two groups, and all of them are in the same file. My requirement is to index the file into a single index called "myindex" with two different sourcetypes, "group1" and "group2". The two groups are distinguished in the log file by the keywords XXX and YYY: XXX denotes group1 and YYY denotes group2.

Here is a sample of the log file.

// mylog_sample.txt

24-08-2014 10:23:34  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:35  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:36  1w2,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:37  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:39  122,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
25-08-2014 10:23:34  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:35  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:36  1w2,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:37  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:39  122,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456

All the data is present in the same file. Now I want to split it into two different sourcetypes, "group1" and "group2", in a single index.

So if I search the data with:

index="myindex" sourcetype="group1"

it should list the following data:

24-08-2014 10:23:34  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:35  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:36  1w2,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:37  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:39  122,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456

and if I search with the following:

index="myindex" sourcetype="group2"

it should list the following data:

25-08-2014 10:23:34  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:35  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:36  1w2,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:37  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:39  122,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456

Any help with this use case would be appreciated. I tried using transforms.conf, but had no luck separating the events. Please post a configuration that meets this requirement.

Many thanks.
Rakesh.

stephanefotso · ‎03-26-2015

Yes, it is possible, but it has to happen before the indexing phase of the data pipeline, because a sourcetype override occurs at parse time.
I hope this helps: http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Advancedsourcetypeoverrides

SGF

rakesh_498115 · ‎03-27-2015

Thanks for the update, stephan, but this does not seem to be working. Below is my configuration.

// inputs.conf

[monitor:///opt/splunk/splunkInput/mylog_sample.txt]
disabled = false
followTail = 0
recursive = false
sourcetype = temp
index = myindex

// transforms.conf

[set_group1_routing]
REGEX = XXX
FORMAT = sourcetype::group1
DEST_KEY = MetaData:Sourcetype

[set_group2_routing]
REGEX = YYY
FORMAT = sourcetype::group2
DEST_KEY = MetaData:Sourcetype

// props.conf

[group1]
TRANSFORMS-350_routing=set_group1_routing
DATETIME_CONFIG = CURRENT
MAX_TIMESTAMP_LOOKAHEAD = 150
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false

[group2]
TRANSFORMS-350_routing=set_group2_routing
DATETIME_CONFIG = CURRENT
MAX_TIMESTAMP_LOOKAHEAD = 150
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false

Please let me know if I am missing something. Thanks in advance 🙂

maciep · ‎03-27-2015

It looks like your data will initially be sent with a sourcetype of temp, so your props.conf should probably look more like this:

[temp]
DATETIME_CONFIG = CURRENT
MAX_TIMESTAMP_LOOKAHEAD = 150
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TRANSFORMS-350_routing=set_group1_routing, set_group2_routing

[group1]

[group2]

The data will come in with a sourcetype of "temp" and match the [temp] stanza in props.conf. Along with the timestamp and line-breaking settings, your transforms will then be applied, and they will set the new sourcetype accordingly.
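
For reference, the TRANSFORMS line in the [temp] stanza points at the transforms.conf stanzas from the earlier post. Here is a slightly tighter sketch of them, assuming the keyword always appears as a complete comma-delimited field as in the sample lines. The surrounding commas in the regexes are my assumption and are not part of the original configuration:

```ini
# transforms.conf (sketch; stanza names taken from the earlier post)

[set_group1_routing]
# Assumption: match XXX only when it is a whole comma-delimited field
REGEX = ,XXX,
FORMAT = sourcetype::group1
DEST_KEY = MetaData:Sourcetype

[set_group2_routing]
# Assumption: match YYY only when it is a whole comma-delimited field
REGEX = ,YYY,
FORMAT = sourcetype::group2
DEST_KEY = MetaData:Sourcetype
```

With the bare patterns XXX and YYY, an event that happens to contain both strings anywhere would match both transforms. Anchoring on the commas narrows that. Either way, index-time settings like these only affect data indexed after the change, and Splunk typically needs a restart to pick them up.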

Also, if you decide to create field extractions or other search-time settings, configure them in the stanzas for the new sourcetypes you created, group1 and group2.
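
As a rough illustration of what such a search-time setting could look like, here is a minimal sketch that extracts the keyword into a field. The field name group_marker is hypothetical and only there for the example:

```ini
# props.conf (search-time sketch; the field name group_marker is hypothetical)

[group1]
# Applies only to events whose sourcetype was rewritten to group1
EXTRACT-group_marker = ,(?<group_marker>XXX),

[group2]
# Applies only to events whose sourcetype was rewritten to group2
EXTRACT-group_marker = ,(?<group_marker>YYY),
```

A search such as index="myindex" sourcetype="group1" group_marker=XXX should then return only the group1 events, assuming the index-time override is working.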

vincenteous · ‎03-27-2015

Are you using a heavy forwarder? Where did you put this configuration? Is it on your indexer?

rakesh_498115 · ‎03-30-2015

No, vincenteous, I am using this configuration on the indexer.
