Getting data into Splunk

srajanbabu
Explorer

How can I change the format of the input data to suit our needs before it is indexed in Splunk? My original log is in this format:
SNM4 YAHOO3SN.#### :: 03/03/13 00:00:07 :: User yahoo3sn logged in
SNM4 YAHOO3SN.871F :: 03/03/13 00:00:07 :: User logged off, Processing will begin
SNM4 YAHOO3SN.871F :: 03/03/13 00:00:07 :: Autoforward profile found for site YAHOO3SN
I want to change the format of the above log, before indexing starts in Splunk, to the format below:

YAHOO3SN.871F|logged in|03/03/2013|00:00:07

norbert_hamel
Communicator

Well, if you are not familiar with regexes, you should use a tool like QuickREx, which is also available as a portable version. The pattern breaks down as follows:

^         find the beginning of the line.
(.*?)\s   find some text followed by a space and store this to variable $1
(.*?)\s   find some text followed by a space and store this to variable $2
::\s      find two colons followed by a space
(\d\d)\/  find 2 numbers followed by a slash and store this to variable $3 (day)
(\d\d)\/  find 2 numbers followed by a slash and store this to variable $4 (month)
(\d\d)\s  find 2 numbers followed by a space and store this to variable $5 (2-digit year)
(\d\d\:\d\d\:\d\d)\s::\s  
          find the hour, minutes and seconds, followed by space, colon, colon, space and store this to variable $6 
(.*?)((logged off)|(logged in))(.*)
          find some text ($7) followed by either "logged off" or "logged in", and store the matched term to variable $8

Write the following text to the _raw event:

$2|$3/$4/20$5|$6|$8
This is the content of variable $2 followed by a pipe, then the day ($3) followed by a slash, the month ($4) followed by a slash, and "20" followed by the 2-digit year ($5) to give a proper 4-digit year, then a pipe and the time ($6), then a pipe and "logged in" or "logged off" ($8).
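
Before deploying, the pattern and the output format described above can be tried outside Splunk. The following is only a rough sketch in Python, not Splunk itself: the function name `rewrite` is made up for illustration, Python's `re` module is assumed to treat this particular pattern the same way Splunk's regex engine does, and it is an assumption here that Splunk leaves an event untouched when the regex does not match.

```python
import re

# Same pattern as in the walkthrough above, written as a Python raw string.
PATTERN = re.compile(
    r"^(.*?)\s(.*?)\s::\s(\d\d)\/(\d\d)\/(\d\d)\s(\d\d\:\d\d\:\d\d)\s::\s"
    r"(.*?)((logged off)|(logged in))(.*)"
)


def rewrite(line: str) -> str:
    """Hypothetical helper: mimic the $2|$3/$4/20$5|$6|$8 output format."""
    m = PATTERN.match(line)
    if m is None:
        # Assumption: a non-matching event is passed through unchanged.
        return line
    return (
        f"{m.group(2)}|{m.group(3)}/{m.group(4)}/20{m.group(5)}"
        f"|{m.group(6)}|{m.group(8)}"
    )


samples = [
    "SNM4 YAHOO3SN.871F :: 03/03/13 00:00:07 :: User logged off, Processing will begin",
    "SNM4 YAHOO3SN.871F :: 03/03/13 00:00:07 :: Autoforward profile found for site YAHOO3SN",
]

for sample in samples:
    print(rewrite(sample))
# First line prints: YAHOO3SN.871F|03/03/2013|00:00:07|logged off
# Second line has neither term, so it is printed unchanged.
```

Note that this output puts the "logged in"/"logged off" term last, whereas the question asked for it in second position; reordering the terms in the format string would address that.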

srajanbabu
Explorer

Thanks a lot.

norbert_hamel
Communicator

You can use a combination of props.conf and transforms.conf on your Indexer for that. In this example, props.conf tells Splunk to use the transformation called "rewrite_MyLogs" for the sourcetype "MySourceType". The transformation applies a regular expression to the input, finds the term "logged in" or "logged off", and creates the new data for the Indexer. For the date, "20" is prepended to the 2-digit year. The case where neither of the two terms is found is not yet covered by this snippet.

Note that rewriting logs this way requires additional system resources and may therefore impact the performance of your Splunk installation. To mitigate that, you could instead place this configuration on a Heavy Forwarder in front of the Indexer(s).

Note 2: Since you are rewriting the date/time anyway, you should consider using a standard time format such as ISO 8601; this may avoid trouble in the future 🙂

##### props.conf
[MySourceType]
TRANSFORMS-MyLogs = rewrite_MyLogs

##### transforms.conf
[rewrite_MyLogs]
REGEX = ^(.*?)\s(.*?)\s::\s(\d\d)\/(\d\d)\/(\d\d)\s(\d\d\:\d\d\:\d\d)\s::\s(.*?)((logged off)|(logged in))(.*)
FORMAT = $2|$3/$4/20$5|$6|$8
DEST_KEY = _raw
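
To illustrate the ISO 8601 suggestion from the second note, here is a rough Python sketch of what the rewritten event could look like. It is not Splunk configuration: the function name `rewrite_iso` is hypothetical, and it assumes, as the answer does, that the first date field is the day and the second is the month. In transforms.conf the corresponding FORMAT would presumably be along the lines of `$2|20$5-$4-$3T$6|$8`, but that variant is untested.

```python
import re

# Same regular expression as in transforms.conf, minus the optional escapes.
PATTERN = re.compile(
    r"^(.*?)\s(.*?)\s::\s(\d\d)/(\d\d)/(\d\d)\s(\d\d:\d\d:\d\d)\s::\s"
    r"(.*?)((logged off)|(logged in))(.*)"
)


def rewrite_iso(line: str) -> str:
    """Hypothetical helper: emit the timestamp as YYYY-MM-DDThh:mm:ss."""
    m = PATTERN.match(line)
    if m is None:
        # Assumption: a non-matching event is left as it is.
        return line
    # Assumption carried over from the answer: group 3 is the day, group 4 the month.
    site, day, month, year, time, action = m.group(2, 3, 4, 5, 6, 8)
    return f"{site}|20{year}-{month}-{day}T{time}|{action}"


print(rewrite_iso(
    "SNM4 YAHOO3SN.871F :: 03/03/13 00:00:07 :: User logged off, Processing will begin"
))
# prints: YAHOO3SN.871F|2013-03-03T00:00:07|logged off
```

A single combined timestamp field like this avoids the day/month ambiguity of 03/03/13 once the event is indexed, provided the assumption about the original field order is right.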

srajanbabu
Explorer

Can you explain the format and regex pattern in detail?

kristian_kolb
Ultra Champion

Well, there are some ways to 'change' data prior to indexing, as described in:

http://docs.splunk.com/Documentation/Splunk/6.0/Data/Anonymizedatausingconfigurationfiles

The steps described there are mostly for removing unwanted pieces of information, such as credit card numbers etc. For more extensive rewriting of log data, it might be better to look at the logging application, and see what output options it provides.

/K
