Getting Data In

What can I do with a non-trivial log file format?

dholecki
Engager

I have Splunk Universal Forwarder installed on one machine and Splunk Enterprise installed on another machine.

On the machine with the Universal Forwarder I have a process running which produces a log file in this rather strange format:

2019-05-03      nodes     signals    requests
23:50:11            7          56         348 
23:51:02            7          31         784 
23:52:13            8          24        1022 
23:53:15            8          12          98 
23:54:11            8          17          34 
2019-05-03      nodes     signals    requests
23:55:07            8          24         123 
23:56:10            8          33         211 
23:57:03            5         101         215 
23:58:11            5           9         213 
23:59:01            5           6         211 
2019-05-04      nodes     signals    requests
00:00:06            3          21         115 
00:01:12            3          31         304 
00:02:03            3          98         215 
00:03:19            5           9         213 
00:04:01            5           6          34 

I want the forwarder to send this log to the Splunk Enterprise instance on the other machine, and there I want the log parsed into reasonable events. For instance, I want this line:

23:52:13            8          24        1022 

to produce an event like this:

timestamp: 2019-05-03 23:52:13
nodes: 8
signals: 24
requests: 1022

How can I achieve this effect?

I can quite easily write a Python script which converts this strange format into CSV, but I have no good idea how to make Splunk Enterprise or the Universal Forwarder use it. I know that I can configure a scripted input on the forwarder: the forwarder would run my script periodically, the script would read the whole log file, convert it to CSV and print it, and the forwarder would send the CSV to Splunk Enterprise. However, as the log file grows, its beginning stays the same, and my script would re-read that beginning on every run, so the same events would be sent to Splunk Enterprise multiple times and I would end up with duplicates. I could improve the script so that it remembers somewhere (for instance, in a database) which events it has already printed and does not print them again, but then the script becomes complicated.
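To make the scripted-input idea more concrete, the script would have to look roughly like this (just a sketch, with made-up paths, and it still does not deal with rotation):

#!/usr/bin/env python3
# Rough sketch of the scripted-input variant: convert the odd format to CSV
# and remember in a small state file how far we already got, so re-runs do
# not emit the same events again. All paths below are made up.
import csv
import re
import sys

LOG_PATH = "/var/log/myapp/stats.log"       # hypothetical location of the raw log
STATE_PATH = "/var/log/myapp/stats.offset"  # byte offset + last seen date from the previous run

HEADER_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+nodes\s+signals\s+requests")
DATA_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(\d+)\s+(\d+)\s+(\d+)")

def load_state():
    # Returns (byte_offset, last_seen_date); (0, None) on the very first run.
    try:
        with open(STATE_PATH) as f:
            offset, date = f.read().split("\t")
            return int(offset), (date or None)
    except (FileNotFoundError, ValueError):
        return 0, None

def save_state(offset, date):
    with open(STATE_PATH, "w") as f:
        f.write("%d\t%s" % (offset, date or ""))

def main():
    offset, current_date = load_state()
    writer = csv.writer(sys.stdout, lineterminator="\n")

    with open(LOG_PATH) as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line:
                break
            header = HEADER_RE.match(line)
            if header:
                current_date = header.group(1)     # keep the date for the rows that follow
                continue
            data = DATA_RE.match(line)
            if data and current_date:
                time_of_day, nodes, signals, requests = data.groups()
                writer.writerow(["%s %s" % (current_date, time_of_day),
                                 nodes, signals, requests])
        save_state(log.tell(), current_date)       # next run continues from here

if __name__ == "__main__":
    main()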

I could also use my script in another way: it could run non-stop, tail the log, convert it to CSV and write the result to another file, and then I would configure the forwarder to monitor that other file produced by my script. But then my script would have to take care of log file rotation, and I would need some mechanism to restart it if it gets killed, so this solution is complicated as well.
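The tailing part itself would not be hard, something like this (again only a sketch, and it deliberately ignores rotation, which is exactly the part that worries me):

import time

def follow(path):
    # Yield lines as they are appended to path, roughly like `tail -f`.
    # No rotation handling here.
    with open(path) as f:
        f.seek(0, 2)             # 2 = relative to end: skip what is already there
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)    # nothing new yet, wait for the writer
                continue
            yield line

# idea: feed each raw line through the same parsing logic as in the sketch above
# and append the resulting CSV row to the file that the forwarder monitors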

Is there any better way to achieve my goal?

0 Karma

skalliger
SplunkTrust

Hi,

KV_MODE = multi is your friend here. 🙂

Skalli

0 Karma

dholecki
Engager

An important part of the problem is that the date is in the header rows, while the hours, minutes and seconds are in the normal rows, so I somehow have to combine them to get the timestamp. So I don't think KV_MODE=multi will help me; I think I have to parse the file with some code (for instance, a Python script).

0 Karma

adonio
Ultra Champion

imho, go with the Python script to modify the data; it should be easy to create a very simple CSV / TSV / PSV file here with the full timestamp.
while you are at it, you can quietly grunt in annoyance about this weird log format and the poor choices its developers made

0 Karma