Extract multivalue fields from multiline events

marcoscala · 12-11-2014

Hi All,
I'm trying to parse multiline structured tabular events like this:

CPU              Schedule           Job                                    State Pr Start  Elapse  Dependencies  Return Code
APP10CNHL       #CF14330AAAAAAAAG *LIM  5*                                 STUCK 10 11/27         [NHIPPK600S90    ,11/26/14]
                                         SPK6009001                               SUCC  10 11/27  00:01  #J10175                0
                                         SPK60090CP                               ABEND 10 11/27  00:01  #J10416                5
                                         IPK6009003                               HOLD  10(11/26)        SPK60090CP
                                         IPK6009004                               HOLD  10(11/26)        SPK60090CP
                                         IPK6009005                               HOLD  10(11/26)        SPK60090CP
                                         IPK6009006                               HOLD  10(11/26)        SPK60090CP
                                         FPK60090ZZ                               HOLD  10(11/26)        IPK6009003; IPK6009004; IPK6009005
                                                                                                         IPK6009006
    APP10CNHL       #NHIPPK605GCL     *LIM  4*                                 SUCC  10 15:00  00:19  [Carry]
                                     GPK605_RMDIR                             SUCC  10 15:00  00:01  #J32208                0
                                     GPK605_RMDIR_GC1                         SUCC  10 15:00  00:01  #J32210                0
                          (IVPRPTEC#)GPK605_STOP_LSNR_U                       SUCC  10 15:00  00:01  #J17236184             0
                          (IVPRPTEC#)GPK605_STOP_LSNR                         SUCC  10 15:00  00:01  #J39714998             0
                          (IVPRPTEC#)GPK6_SHOW_SESSION                        SUCC  10 15:00  00:01  #J57409632             0
                          (IVPRPTEC#)GPK605_KILL_SESSION                      SUCC  10 15:00  00:01  #J57409644             0
                          (IVPRPTEC#)SLEEP_60                                 SUCC  10 15:01  00:02  #J57409672             0
                                     GPK605_MAIL_OFF                          SUCC  10 15:02  00:01  #J1133                 0
                          (IVPRPTEC#)GPK605_START_LSNR                        SUCC  10 15:02  00:01  #J39714862             0
                          (IVPRPTEC#)SLEEP_300                                SUCC  10 15:02  00:06  #J39714878             0
                          (IVPRPTEC#)GPK6BK0001_EXPORT                        SUCC  10 15:07  00:11  #J14352512             0
                                     GPK605_MKDIR                             SUCC  10 15:18  00:01  #J30532                0
                                     FINERETE                                 SUCC  10 15:18  00:01  #J1647

Each event starts with a ScheduleID line, followed by one or more JOB lines.

Event breaking works fine, as does field extraction for the first line. My problem is extracting fields from the JOB lines that follow, which needs some sort of recurring regex.

Here's the field extraction for the ScheduleID first line:

[batch_prp]
EXTRACT-batch-schedule-header = (?mi)^(?<CPU>\w+)\s+(?<SCHEDULE>#\w+)\s+\*\w+\s+\d\*\s+(?<STATE>\w+)\s+(?<PR>\d+)(\s|\()(?<START>\d\d\S\d\d)?.*(\[(?<RETURN>.*)\])?

I then tried to extract the JOB lines as a multivalue field with this rex:

sourcetype=batch_prp  |rex "(?m)^((\s+(?<Line>.*)$))+"

but this extracts only the last occurrence of Line in each event.

Any idea how to write the regex so that it extracts all the values of the "Line" field?

Thanks a lot!!!

Marco Scala

stephane_cyrill · 12-16-2014

Hello Marco,

I have indexed your file email-prp-xxxx-batch.txt and tried to write some regexes to extract the fields. I still don't fully understand the data, but for the moment I think this can help you.

NOTE: to run this on your server, make sure you use the appropriate sourcetype and index. mar is the sourcetype I created during data input, and marco is the index in which I placed the data.

index=marco sourcetype="mar"
| head 10000
| rex "(?im)^[^\\-\\n]*\\-(?P<CPU>[^#]+)"
| rex "(?i)9CNHL       (?P<SCHEDULE>[^ ]+)"
| rex "(?i)(?P<SCHEDULE>[^ ]+)\\s+\\*\\w+\\s+\\d+\\*\\s+\\w+\\s+\\d+\\s+\\d+:\\d+\\s+\\d+:\\d+\\s+\\[\\w+\\]"
| rex "(?im)^\\s+(?P<JOB>[^ ]+)"
| rex "(?im)^\\s+\\w+\\s+(?P<STATE>[^ ]+)"
| rex "(?i) SUCC  (?P<PR>[^ ]+)"
| rex "(?i)^\\s+[a-z_-]+\\w+\\s+\\w+\\s+\\d+\\s+(?P<START>[^ ]+)"
| rex "(?i)^[^/]*/\\d+\\s+\\d+:\\d+\\s+(?P<DEPENDENCIES>[^ ]+)"
| rex "(?i)^[^/]*/\\d+\\s+(?P<ELAPSE>[^ ]+)"
| table SCHEDULE JOB STATE PR START ELAPSE DEPENDENCIES

marcoscala · 12-16-2014

Hi Stephane, and thanks for trying. You basically split the rex into several rex commands, each capturing a single piece of the data, but the same problem remains: your search captures only the first JOB line. I need to capture ALL the JOB lines under the header line that contains CPU, SCHEDULE and the other fields. If you try, for instance, with Schedule CF14330AAAAAAAAF (the first in the file), you'll extract only the first JOB line out of 20.

Thanks,
Marco

fleboho · 07-12-2015

Hi Marcoscala, have you managed to solve this problem? I am facing the same kind of problem.
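
For anyone landing here with the same problem: the behaviour described above is general to regex engines of this family, not specific to this data. When a capturing group sits inside a repetition such as `(...)+`, each pass overwrites the previous capture, so only the last line is kept. The usual workaround is to write a pattern for a single JOB line and apply it repeatedly. In Splunk, `rex` has a `max_match` option for this (`max_match=0` means no limit), and the field then comes back as multivalue. The Python sketch below only illustrates the difference. The shortened `event` string and the variable names are made up for the example, and `[ \t]+` is used instead of `\s+` so that the leading whitespace cannot run across line breaks.

```python
import re

# Shortened, hypothetical event modelled on the sample in the question:
# one schedule header line followed by indented JOB lines.
event = (
    "APP10CNHL  #CF14330AAAAAAAAG *LIM  5*  STUCK 10 11/27\n"
    "      SPK6009001   SUCC  10 11/27  00:01  #J10175  0\n"
    "      SPK60090CP   ABEND 10 11/27  00:01  #J10416  5\n"
    "      IPK6009003   HOLD  10(11/26)        SPK60090CP\n"
)

# Approach from the question: a single match whose group repeats.
# Each repetition overwrites the previous capture, so only the
# last JOB line is left in the group.
repeated = re.search(r"(?m)(?:^[ \t]+(?P<Line>.*)\n?)+", event)
last_only = repeated.group("Line")

# Alternative: a pattern for ONE indented line, applied repeatedly.
per_line = re.compile(r"(?m)^[ \t]+(?P<Line>\S.*)$")
all_lines = [m.group("Line") for m in per_line.finditer(event)]

print(last_only)
for line in all_lines:
    print(line)
```

Here `last_only` holds just the IPK6009003 line, while `all_lines` holds all three JOB lines. An untested Splunk equivalent would be something like `... | rex max_match=0 "(?m)^[ \t]+(?<Line>\S.*)$"`, possibly followed by `mvexpand Line` and a second `rex` to split each line into JOB, STATE and the other columns.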

marcoscala · 12-11-2014

PS: if anybody wants to play with this file, please let me know and I'll email it!

Marco

stephane_cyrill · 12-15-2014

Hi, I think I can help you. Just send me the file to work on at cyrilleko@gmail.com
