Splunk Search

Regex formation for source type and field extraction

harsh1734
New Member

Hi,
Here is a sample of my log file:

Jul 15 23:48:33 Phase 0 running (1132 seconds)
CPU Time Status Skew Vertex
0.046 [ : 1] 0% Audit.XYX
0.135 [ : 1] 0% Audit.PQR

7.955 [ :12] 0% LMNOP

   Data Bytes         Records      Status      Flow                     
          712               4 [    :    :   1]   0% Audit.Flow1         
          712               4 [    :    :   1]   0% Audit.Flow2         
            0               0 [    :    :  12]   0% Flow_1              
            0               0 [    :    :  12]   0% Flow_10             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
1,746,882,294       3,158,255 [    :    :  12]   0% Flow_12             

Jul 15 23:48:33 Phase 0 running (1132 seconds)
CPU Time Status Skew Vertex
0.046 [ : 1] 0% Audit.XYX
0.135 [ : 1] 0% Audit.PQR

7.955 [ :12] 0% LMNOP

   Data Bytes         Records      Status      Flow                     
          712               4 [    :    :   1]   0% Audit.Flow1         
          712               4 [    :    :   1]   0% Audit.Flow2         
            0               0 [    :    :  12]   0% Flow_1              
            0               0 [    :    :  12]   0% Flow_10             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
1,746,882,294       3,158,255 [    :    :  12]   0% Flow_12             

Jul 15 23:48:33 Phase 0 ended (1132 seconds)
CPU Time Status Skew Vertex
0.046 [ : 1] 0% Audit.XYX
0.135 [ : 1] 0% Audit.PQR

7.955 [ :12] 0% LMNOP

   Data Bytes         Records      Status      Flow                     
          712               4 [    :    :   1]   0% Audit.Flow1         
          712               4 [    :    :   1]   0% Audit.Flow2         
            0               0 [    :    :  12]   0% Flow_1              
            0               0 [    :    :  12]   0% Flow_10             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
1,746,882,294       3,158,255 [    :    :  12]   0% Flow_12             


Jul 15 23:48:33 Phase 1 running (1132 seconds)
CPU Time Status Skew Vertex
0.046 [ : 1] 0% Audit.XYX
0.135 [ : 1] 0% Audit.PQR

7.955 [ :12] 0% LMNOP

   Data Bytes         Records      Status      Flow                     
          712               4 [    :    :   1]   0% Audit.Flow1         
          712               4 [    :    :   1]   0% Audit.Flow2         
            0               0 [    :    :  12]   0% Flow_1              
            0               0 [    :    :  12]   0% Flow_10             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
1,746,882,294       3,158,255 [    :    :  12]   0% Flow_12             

The log consists of phases (0, 1) in the running, started, and ended states.
I want to calculate the maximum CPU time taken by a particular vertex when the phase ended, and the maximum data bytes consumed by a flow.

So I have created a regex for the field extraction. Our log files come from a Unix environment.

DOS expression for required data set :
.* Phase \d ended.\r\n(.\r\n)*-{80}\r\n-{80}

Unix expression for required data set :
.* Phase \d ended.\n(.\n)*-{80}\n-{80}

But this is not working as expected: Splunk does not extract the given pattern as a record.
The record we are interested in is described by the regex above.
The result looks like:
Jul 15 23:48:33 Phase 0 ended (1132 seconds)
CPU Time Status Skew Vertex
0.046 [ : 1] 0% Audit.XYX
0.135 [ : 1] 0% Audit.PQR

7.955 [ :12] 0% LMNOP

   Data Bytes         Records      Status      Flow                     
          712               4 [    :    :   1]   0% Audit.Flow1         
          712               4 [    :    :   1]   0% Audit.Flow2         
            0               0 [    :    :  12]   0% Flow_1              
            0               0 [    :    :  12]   0% Flow_10             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
   41,417,795         264,261 [    :    :  12]   1% Flow_11             
1,746,882,294       3,158,255 [    :    :  12]   0% Flow_12             
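
The Unix expression above can be tried outside Splunk to see where it stops matching. This is a minimal sketch using Python's `re` module, whose handling of `.`, `\n`, and `{80}` agrees with PCRE for this pattern. The sample is shortened from the log above, and `RELAXED` is a hypothetical alternative written only for illustration, not something Splunk requires. In the posted pattern, `ended.` allows exactly one character before the newline, `(.\n)*` only accepts lines that are one character long, and `-{80}` needs a line of 80 hyphens, which the sample as posted does not contain.

```python
import re

# Unix variant of the expression from the post, copied verbatim.
POSTED = r".* Phase \d ended.\n(.\n)*-{80}\n-{80}"

# Shortened sample in the same shape as the log above.
SAMPLE = "\n".join([
    "Jul 15 23:48:33 Phase 0 ended (1132 seconds)",
    "CPU Time Status Skew Vertex",
    "0.046 [ : 1] 0% Audit.XYX",
    "",
    "7.955 [ :12] 0% LMNOP",
    "Jul 15 23:48:33 Phase 1 running (1132 seconds)",
    "",
])

# The header continues with " (1132 seconds)" after "ended", so "ended.\n"
# cannot match and the whole search fails.
posted_match = re.search(POSTED, SAMPLE)

# Hypothetical alternative: start at an "ended" header and lazily take
# everything up to the next timestamped header or the end of the input.
STAMP = r"^\w{3} +\d+ [\d:]{8} Phase "
RELAXED = r"(?ms)" + STAMP + r"\d+ ended .*?(?=" + STAMP + r"|\Z)"
relaxed_match = re.search(RELAXED, SAMPLE)

print(posted_match)
print(relaxed_match.group(0))
```

The relaxed pattern only shows the shape of the record; in Splunk itself the record boundary is decided by event-breaking settings rather than by a search-time regex.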


Also, when we try to extract fields for the CPU time of ended phases, Splunk's expression generator randomly picks up numbers across different phases.

What is the best way to
1) ensure Splunk considers only ended phases as distinct records
2) for every ended record, extract fields like CPU time, vertex, flow, etc.?

0 Karma

lguinn2
Legend

I don't recognize your regular expression syntax. Splunk uses PCRE. I would start with the following and see what happens.

props.conf

[yoursourcetype]
TRUNCATE = 50000
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %b %d %H:%M:%S
EXTRACT-e1 =Phase\s(?<Phase>\d+)\s(?<PhaseStatus>\S+)\s\((?<PhaseSeconds>\d+) seconds\)
REPORT-r1 = extract_fields1,extract_fields2

transforms.conf

[extract_fields1]
REGEX = (?m)([\d\,]+)\s+([\d\,]+)\s\[\s*(\d*\:\s*\d* \:\s*\d*)\]\s+(\d{1,3})%\s(\S+)\s*$
FORMAT = FlowDataBytes::$1 FlowRecords::$2 FlowStatus::$3 Flow::$4 FlowName::$5
MV_ADD = true

[extract_fields2]
REGEX = (?m)([\d\.]+)\s\[\s*(\d*\:\s*\d*)\]\s+(\d{1,3})%\s(\S+)\s*$
FORMAT = CPUTime::$1 Status::$2 Skew::$3 Vertex::$4
MV_ADD = true
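
Patterns like the two REGEX lines and the EXTRACT line can be checked against the sample outside Splunk before reindexing. This is a minimal sketch using Python's `re` module, under these assumptions: the percentage capture is meant to be `(\d{1,3})%`, trailing padding on a line is tolerated with `\s*$`, and named groups use Python's `(?P<name>...)` spelling instead of PCRE's `(?<name>...)`. The event text is shortened from the sample in the question.

```python
import re

# Header pattern from the EXTRACT line, with Python-style named groups.
HEADER = re.compile(
    r"Phase\s(?P<Phase>\d+)\s(?P<PhaseStatus>\S+)\s\((?P<PhaseSeconds>\d+) seconds\)"
)
# Flow rows: data bytes, records, status, percentage, flow name.
FLOW = re.compile(
    r"(?m)([\d,]+)\s+([\d,]+)\s\[\s*(\d*:\s*\d* :\s*\d*)\]\s+(\d{1,3})%\s(\S+)\s*$"
)
# Vertex rows: CPU time, status, skew, vertex name.
VERTEX = re.compile(r"(?m)([\d.]+)\s\[\s*(\d*:\s*\d*)\]\s+(\d{1,3})%\s(\S+)\s*$")

# One event, shortened from the sample; flow rows keep their trailing padding.
EVENT = "\n".join([
    "Jul 15 23:48:33 Phase 0 ended (1132 seconds)",
    "CPU Time Status Skew Vertex",
    "0.046 [ : 1] 0% Audit.XYX",
    "0.135 [ : 1] 0% Audit.PQR",
    "",
    "7.955 [ :12] 0% LMNOP",
    "",
    "   Data Bytes         Records      Status      Flow      ",
    "          712               4 [    :    :   1]   0% Audit.Flow1         ",
    "   41,417,795         264,261 [    :    :  12]   1% Flow_11             ",
])

header = HEADER.search(EVENT)
vertex_rows = VERTEX.findall(EVENT)  # one tuple per vertex line
flow_rows = FLOW.findall(EVENT)      # one tuple per flow line

print(header.groupdict())
for row in vertex_rows:
    print("vertex", row)
for row in flow_rows:
    print("flow", row)
```

If a pattern returns no rows here, it is unlikely to produce multivalue fields in Splunk either, which narrows the problem to the regex rather than the event breaking.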

TRUNCATE should be set to the maximum number of bytes in any single event. To test these settings, try this search

sourcetype=yoursourcetype 
| table Phase PhaseStatus PhaseSeconds CPUTime Status Skew Vertex FlowDataBytes FlowRecords FlowStatus FlowName

Note that you have to reindex the data to create the events properly. I hope that you are using a test instance of Splunk or at least a test index for this...
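
To make the intended result concrete, here is a rough Python sketch of the same idea outside Splunk: start a new event at every timestamped header, keep only the events whose phase status is "ended", and take the maximum CPU time per vertex and the maximum data bytes per flow. The function names and the simplified patterns are illustrative assumptions, and the 9.999 row in the running event is made up to show that non-ended events are ignored; the other values come from the sample in the question.

```python
import re
from collections import defaultdict

HEADER = re.compile(r"^\w{3} +\d+ \d\d:\d\d:\d\d Phase (\d+) (\S+) \((\d+) seconds\)")
VERTEX = re.compile(r"^([\d.]+)\s\[\s*\d*:\s*\d*\]\s+\d{1,3}%\s(\S+)\s*$")
FLOW = re.compile(r"^\s*([\d,]+)\s+[\d,]+\s\[[\s\d:]*\]\s+\d{1,3}%\s(\S+)\s*$")


def split_events(text):
    """Start a new event at every timestamped header line."""
    events = []
    for line in text.splitlines():
        if HEADER.match(line):
            events.append([line])
        elif events:
            events[-1].append(line)
    return events


def max_for_ended(text):
    """Return (max CPU time per vertex, max data bytes per flow) for ended phases."""
    cpu = defaultdict(float)
    data = defaultdict(int)
    for event in split_events(text):
        if HEADER.match(event[0]).group(2) != "ended":
            continue  # skip running/started events
        for line in event[1:]:
            v = VERTEX.match(line)
            if v:
                cpu[v.group(2)] = max(cpu[v.group(2)], float(v.group(1)))
                continue
            f = FLOW.match(line)
            if f:
                size = int(f.group(1).replace(",", ""))  # drop thousands separators
                data[f.group(2)] = max(data[f.group(2)], size)
    return dict(cpu), dict(data)


LOG = "\n".join([
    "Jul 15 23:48:33 Phase 0 running (1132 seconds)",
    "CPU Time Status Skew Vertex",
    "9.999 [ :12] 0% LMNOP",  # made-up value; must not appear in the result
    "",
    "Jul 15 23:48:33 Phase 0 ended (1132 seconds)",
    "CPU Time Status Skew Vertex",
    "0.046 [ : 1] 0% Audit.XYX",
    "7.955 [ :12] 0% LMNOP",
    "",
    "   Data Bytes         Records      Status      Flow",
    "          712               4 [    :    :   1]   0% Audit.Flow1",
    "   41,417,795         264,261 [    :    :  12]   1% Flow_11   ",
])

cpu_max, data_max = max_for_ended(LOG)
print(cpu_max)
print(data_max)
```

The comma stripping matters: values such as 41,417,795 are text until the separators are removed, so a numeric maximum over the raw strings would not be meaningful.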

More info on linebreaking here and more info on field extraction here

0 Karma

harsh1734
New Member

Hi lguinn, thanks for your valuable reply; it helps us to some extent.
But when I run it, the result looks like this:

phase phasestatus phaseseconds cputime status skew vertex
1 ended 1174

0 running 465

It displays only Phase, PhaseStatus, and PhaseSeconds; the rest of the fields are missing, mainly CPUTime, which is the one we are most interested in.
The main part is to extract the CPU time.

I pasted the code you gave into props.conf and transforms.conf as is.
So I need to know: in transforms.conf you have written
[extract_fields1]. Do we have to put a field value there?

0 Karma

Ayn
Legend

Could you please fix your formatting? Code blocks should be indented by 4 spaces on each line.

0 Karma