I have a log file in the following format: 8–9 tables of key machine data (top CPU-consuming processes, top memory-consuming processes, top disk utilization, server processes, etc.), each separated by two horizontal lines, as you can see below.
The parser is unable to fetch these 8–9 tables separately; it processes each file as one big event and gives me incorrect patterns.
(I have thousands of log files in exactly this pattern, each from a different timestamp.)
Please help me with parsing such files in Splunk so that I can perform analytics on each process separately.
How should I go about parsing and searching this to get meaningful data?
-----------------------------------------------------------------
-----------------------------------------------------------------
Top 20 CPU Consuming Processes
-----------------------------------------------------------------
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
prodems 15100 96.2 24.5 8789508 8048972 ? Sl Nov28 848:31 /opt/java1.6_64/jdk1.6.0_26/bin/java -server com.redprairie.moca.server.MocaServerMain
prodwms 12817 91.2 6.4 13761360 2106460 ? Sl Nov28 805:25 /opt/java1.6_64/jdk1.6.0_26/bin/java -server -Xmx12288m -XX:MaxPermSize=192m com.redprairie.moca.server.MocaServerMain
oracle 9171 9.9 9.5 6545108 3139724 ? Ss Nov28 85:17 oracleprod (LOCAL=NO)
oracle 15580 9.3 9.8 6547152 3236888 ? Ss Nov28 82:05 oracleprod (LOCAL=NO)
oracle 3471 9.0 9.6 6545140 3162072 ? Ss Nov28 77:55 oracleprod (LOCAL=NO)
oracle 17994 8.5 9.7 6545124 3213692 ? Rs Nov28 74:50 oracleprod (LOCAL=NO)
oracle 13446 5.3 10.7 6554744 3529236 ? Ss Nov28 47:01 oracleprod (LOCAL=NO)
-----------------------------------------------------------------
-----------------------------------------------------------------
Top 20 Memory Consuming Processes
-----------------------------------------------------------------
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
oracle 13452 6.0 10.9 6552032 3590740 ? Ss Nov28 52:56 oracleprod (LOCAL=NO)
oracle 13342 6.0 10.9 6551532 3587824 ? Ss Nov28 53:39 oracleprod (LOCAL=NO)
oracle 13415 6.0 10.8 6558572 3554244 ? Ss Nov28 53:06 oracleprod (LOCAL=NO)
I think I'd write a scripted input to process the files into something a little easier for Splunk to ingest. The script could take the timestamp from the filename, skip the horizontal lines, extract the metric name from the "Top 20" line, and put everything from "USER" through the next horizontal line into an event, ignoring blank lines. It would do this 8 times, once for each table. The result would look something like:
1/9/2016 12:27:00 CPU=" USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
prodems 15100 96.2 24.5 8789508 8048972 ? Sl Nov28 848:31 /opt/java1.6_64/jdk1.6.0_26/bin/java -server com.redprairie.moca.server.MocaServerMain
prodwms 12817 91.2 6.4 13761360 2106460 ? Sl Nov28 805:25 /opt/java1.6_64/jdk1.6.0_26/bin/java -server -Xmx12288m -XX:MaxPermSize=192m com.redprairie.moca.server.MocaServerMain
...
1/9/2016 12:27:00 Memory="USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
oracle 13452 6.0 10.9 6552032 3590740 ? Ss Nov28 52:56 oracleprod (LOCAL=NO)
oracle 13342 6.0 10.9 6551532 3587824 ? Ss Nov28 53:39 oracleprod (LOCAL=NO)
..."
Then I'd use the multikv command at search time to process each event.
Can you help me with the script, please?
I don't have the time to write one for you. Any programmer should be able to put together a Python or Java program that can parse that file. The important thing is to write to stdout what you want Splunk to index. Once you have that, it can be set up as a scripted input.
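For reference, a scripted input is registered in inputs.conf. A minimal sketch, assuming your script lives at a hypothetical path under $SPLUNK_HOME and you pick the sourcetype and interval yourself:

```ini
# $SPLUNK_HOME/etc/apps/<your_app>/local/inputs.conf
# Path, sourcetype, index, and interval below are placeholders, not fixed names.
[script://$SPLUNK_HOME/bin/scripts/parse_machine_stats.py]
interval = 300
sourcetype = machine_stats
index = main
disabled = false
```

Splunk runs the script every interval seconds and indexes whatever the script prints to stdout, so the script itself is responsible for finding the new log files each time it runs.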