Getting Data In

Splunk creates multiple indexes for a single batch file execution

anshumandas
New Member

We are forwarding a directory containing hundreds of batch job execution logs. However, Splunk reindexes each log by splitting it into multiple events (3, 4, sometimes 10). As a result, the number of events, and with it the volume of indexed data, grows many times over. The logs differ in content and size, but the header and footer details follow similar formats. Below is a sample log file, followed by how Splunk splits and indexes it:

Actual Log File:

===============================================================
= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
Machine Job Starting...........
Waiting for job...

Finished waiting for job
Job Status : (1)

Status code = 1
Job submitted successfully
MachineJob Ending.............

xyz.sh; Message; Program ended successfully at: 03/24/2016 00:44:54

= Exit Status : 0
= System Time (Seconds) : 0 Elapsed Time (Minutes) : 1
= User Time (Seconds) : 0

= Thu 03/24/16 00:44:54 EDT

How Splunk indexes the log file:

Event-1:

3/24/16

12:43:18.000 AM

= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
DataStage Job Starting...........
Waiting for job...

Event-2:

3/24/16

12:43:18.000 AM

= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
Machine Job Starting...........
Waiting for job...

Event-3:

3/24/16

12:44:54.000 AM

= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
Machine Job Starting...........
Waiting for job...

Finished waiting for job
Job Status : (1)

Status code = 1
Job submitted successfully
MachineJob Ending.............

xyz.sh; Message; Program ended successfully at: 03/24/2016 00:44:54

= Exit Status : 0
= System Time (Seconds) : 0 Elapsed Time (Minutes) : 1
= User Time (Seconds) : 0

= Thu 03/24/16 00:44:54 EDT

Question: We would like Splunk to index the data as a single event instead of multiple events. Can you please suggest an approach for this scenario? We were not able to find a solution in the blogs and would really love to hear from you.


esix_splunk
Splunk Employee

What you are sending is considered a multiline event. You will need to set up line breaking and timestamp recognition for each sourcetype. In my experience, events like this most likely come from AIX or mainframe sources. You will need to find all the different formats and create a sourcetype for each. (Splunk currently can't handle a single file with multiple sourcetypes.)

For each of those sourcetypes, you'll need to configure event breaking and timestamping. The GUI is a great way to start on this.

Here is a great place to start: http://docs.splunk.com/Documentation/Splunk/6.3.3/Data/Configureeventlinebreaking
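
As a rough illustration of the line breaking and timestamp settings mentioned above, here is a minimal props.conf sketch based only on the sample log in the question. The sourcetype name batch_job_log is hypothetical, the regexes assume every job log starts with a row of "=" followed by a "= JOB :" line and has its first timestamp right after "= Job Number:", and the TRUNCATE value is an arbitrary guess. Treat it as a starting point to test against real data, not a verified configuration:

```
# props.conf on the instance that parses the data (indexer or heavy forwarder)
# "batch_job_log" is a hypothetical sourcetype name; use your own.
[batch_job_log]

# Rely on LINE_BREAKER alone; do not merge lines back together afterwards.
SHOULD_LINEMERGE = false

# Start a new event only where a row of "=" is followed by the "= JOB :" header.
# The newlines in the first capture group are discarded; the "====" row
# becomes the first line of the next event.
LINE_BREAKER = ([\r\n]+)={20,}[\r\n]+=\s*JOB\s*:

# Skip the date inside the "= JOB :" line and read the timestamp that follows
# "= Job Number: NN", e.g. "Thu 03/24/16 00:43:18 EDT".
TIME_PREFIX = Job Number:\s*\d+\s+=\s+
TIME_FORMAT = %a %m/%d/%y %H:%M:%S %Z
MAX_TIMESTAMP_LOOKAHEAD = 30

# Assumption: some job logs exceed the default 10000-byte event limit.
# Adjust or remove once you know your largest log.
TRUNCATE = 100000
```

With settings along these lines, each job log that matches the assumed header should arrive as one event stamped with the job start time. Logs in other formats would need their own sourcetype and patterns, as described above.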
