Getting Data In

How to configure Splunk to parse a log file with tables separated by horizontal lines?

hkmurali
New Member

I have log files in the format below: 8–9 tables of key machine data (top CPU-consuming processes, top memory-consuming processes, top disk utilization, server processes, etc.), each separated by two horizontal lines, as you can see below.
Splunk is unable to split these 8–9 tables into separate events; it ingests each file as one big event and gives me incorrect patterns.
(I have thousands of log files in exactly this pattern, each from a different timestamp.)
How should I configure Splunk to parse and search such files so that I can perform analytics on each process separately?

-----------------------------------------------------------------
-----------------------------------------------------------------
                    Top 20 CPU Consuming Processes               
-----------------------------------------------------------------
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
prodems  15100 96.2 24.5 8789508 8048972 ?     Sl   Nov28 848:31 /opt/java1.6_64/jdk1.6.0_26/bin/java -server com.redprairie.moca.server.MocaServerMain
prodwms  12817 91.2  6.4 13761360 2106460 ?    Sl   Nov28 805:25 /opt/java1.6_64/jdk1.6.0_26/bin/java -server -Xmx12288m -XX:MaxPermSize=192m com.redprairie.moca.server.MocaServerMain
oracle    9171  9.9  9.5 6545108 3139724 ?     Ss   Nov28  85:17 oracleprod (LOCAL=NO)
oracle   15580  9.3  9.8 6547152 3236888 ?     Ss   Nov28  82:05 oracleprod (LOCAL=NO)
oracle    3471  9.0  9.6 6545140 3162072 ?     Ss   Nov28  77:55 oracleprod (LOCAL=NO)
oracle   17994  8.5  9.7 6545124 3213692 ?     Rs   Nov28  74:50 oracleprod (LOCAL=NO)
oracle   13446  5.3 10.7 6554744 3529236 ?     Ss   Nov28  47:01 oracleprod (LOCAL=NO)
-----------------------------------------------------------------
-----------------------------------------------------------------
                    Top 20 Memory Consuming Processes            
-----------------------------------------------------------------
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

oracle   13452  6.0 10.9 6552032 3590740 ?     Ss   Nov28  52:56 oracleprod (LOCAL=NO)
oracle   13342  6.0 10.9 6551532 3587824 ?     Ss   Nov28  53:39 oracleprod (LOCAL=NO)
oracle   13415  6.0 10.8 6558572 3554244 ?     Ss   Nov28  53:06 oracleprod (LOCAL=NO)

richgalloway
SplunkTrust

I think I'd write a scripted input to process the files into something a little easier for Splunk to ingest. The script could take the timestamp from the filename, skip the horizontal lines, extract the metric name from the "Top 20" line, and put everything from the "USER" header up to the next horizontal line into a single event, ignoring blank lines. It would do this 8–9 times, once for each table. The result would look something like:

1/9/2016 12:27:00 CPU=" USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
prodems  15100 96.2 24.5 8789508 8048972 ?     Sl   Nov28 848:31 /opt/java1.6_64/jdk1.6.0_26/bin/java -server com.redprairie.moca.server.MocaServerMain
prodwms  12817 91.2  6.4 13761360 2106460 ?    Sl   Nov28 805:25 /opt/java1.6_64/jdk1.6.0_26/bin/java -server -Xmx12288m -XX:MaxPermSize=192m com.redprairie.moca.server.MocaServerMain
...
1/9/2016 12:27:00 Memory=" USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
oracle   13452  6.0 10.9 6552032 3590740 ?     Ss   Nov28  52:56 oracleprod (LOCAL=NO)
oracle   13342  6.0 10.9 6551532 3587824 ?     Ss   Nov28  53:39 oracleprod (LOCAL=NO)
..."

Then I'd use the multikv command at search time to process each event.

---
If this reply helps you, Karma would be appreciated.

hkmurali
New Member

Can you help me with the script, please?


richgalloway
SplunkTrust

I don't have the time to write one for you, but any programmer should be able to put together a Python or Java program that parses that file. The important thing is to write to stdout whatever you want Splunk to index. Once you have that, the script can be set up as a scripted input.
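For reference, a scripted input is registered in `inputs.conf` roughly like this; the app name, script path, interval, index, and sourcetype below are illustrative, not specific to this setup:

```ini
# $SPLUNK_HOME/etc/apps/<your_app>/local/inputs.conf
[script://./bin/parse_top_report.py]
interval = 300           ; run the script every 5 minutes
sourcetype = top_report  ; sourcetype assigned to the emitted events
index = main             ; adjust to your target index
disabled = 0
```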

---
If this reply helps you, Karma would be appreciated.