How to extract a particular string with the lowest respective value in a single event

like2splunk
Explorer

Hello everyone,
I am trying to identify the resultant ERROR from a given event. My search is below, followed by an example event:

index="logs" process=beamCommonProcess
"Transitioned to Error State" OR "Timeslice:"
| sort _time

2017-03-03 06:45:21,754 [ WARN] {Application Queue} (com.iba.tcs.beam.bds.devices.impl.gateway.rpc.ScanningControllerProxy) - ScanningController failure: NECU Transitioned to Error State
NECU Error: [0x0] _SynchronizationSGCUTimeout : Timeslice: 13589 Submap: 4280
FCU Error: [0x0] _SynchronizationSGCUTimeout : Timeslice: 13589 Submap: 4280
RCU Error: [0x2] Threshold Violation : Timeslice: 13587 Submap: 4280
(Y_VOLT_SEC_FB: -0.243739 V MapThresholdLow: -1.047e-01 MapThresholdHigh: 1.782e-01)
SGCU Error: [0x10] _FilteringAbsolute : Timeslice: 13585 Submap: 4280
(MIN_CHARGE_PRIM: 1.386e-11 C AbsoluteThresholdLow: 1.955e-09 AbsoluteThresholdHigh: 2.000e-09)


Notice the device error lines at the end of the event. There are four possible sources of the error: NECU Error, FCU Error, RCU Error, and SGCU Error. The string "Timeslice" occurs on each of these lines. The root cause is the source with the LOWEST Timeslice. What I need to do is extract that line and identify the actual error name.
For the given example above, my desired output is "MIN_CHARGE_PRIM" because it has the lowest "Timeslice" value. Then I simply need to place that in a table by _time. That way I can see what type of error occurred and when.

Your help is much appreciated!

lguinn2
Legend

In your example event, this is a little unclear - can more than one of the sources (NECU, FCU, RCU, SGCU) appear in the same event? Or does each event have only one of these sources?

If the answer is "one source per event," then the solution depends on combining the various events.
For my reply, I will assume that all of these sources appear in a single event.

First, you will need a set of field extractions to make this analysis possible. I will show them as a set of rex commands, but you could also make these field extractions permanent by adding them to props.conf or by using the Field Extractor (but feed the FE the regular expressions, don't let the FE generate them).

index="logs" process=beamCommonProcess "Transitioned to Error State" OR "Timeslice:"
| rex "\sNECU Error: \[0x\d+\].*?Timeslice: (?<necu_timeslice>\d+)"
| rex "\sFCU Error: \[0x\d+\].*?Timeslice: (?<fcu_timeslice>\d+)"
| rex "\sRCU Error: \[0x\d+\].*?Timeslice: (?<rcu_timeslice>\d+)\sSubmap: \d+\s*\((?<rcu_msg>.*?)\)"
| rex "\sSGCU Error: \[0x\d+\].*?Timeslice: (?<sgcu_timeslice>\d+)\sSubmap: \d+\s*\((?<sgcu_msg>.*?)\)"
| eval minTS = min(necu_timeslice,fcu_timeslice,rcu_timeslice,sgcu_timeslice)
| eval output = case(minTS==necu_timeslice,"NECU Error",
    minTS==fcu_timeslice,"FCU Error",
    minTS==rcu_timeslice,"RCU Error: " . rcu_msg,
    minTS==sgcu_timeslice,"SGCU Error: " . sgcu_msg,
    1==1,"Unknown")
| table _time minTS output
| sort _time
| rename minTS as "Minimum Timeslice" output as "Root Cause"
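
The rex patterns above can be sanity-checked outside Splunk. The following is a minimal Python sketch, not part of the original search. It folds the four per-device patterns into one (an assumption made for brevity), accepts hex digits in the error code, and runs against the error lines of the sample event from the question. The names LINE and root_cause are made up for illustration.

```python
import re

# Error lines copied from the sample event in the question.
EVENT = """ScanningController failure: NECU Transitioned to Error State
NECU Error: [0x0] _SynchronizationSGCUTimeout : Timeslice: 13589 Submap: 4280
FCU Error: [0x0] _SynchronizationSGCUTimeout : Timeslice: 13589 Submap: 4280
RCU Error: [0x2] Threshold Violation : Timeslice: 13587 Submap: 4280
(Y_VOLT_SEC_FB: -0.243739 V MapThresholdLow: -1.047e-01 MapThresholdHigh: 1.782e-01)
SGCU Error: [0x10] _FilteringAbsolute : Timeslice: 13585 Submap: 4280
(MIN_CHARGE_PRIM: 1.386e-11 C AbsoluteThresholdLow: 1.955e-09 AbsoluteThresholdHigh: 2.000e-09)
"""

# Hypothetical combined pattern: device name, timeslice, and an optional
# "(detail)" part. Python spells named groups (?P<name>...), while rex
# uses (?<name>...).
LINE = re.compile(
    r"\s(?P<device>[A-Z]+) Error: \[0x[0-9a-fA-F]+\].*?Timeslice: (?P<ts>\d+)"
    r"\sSubmap: \d+(?:\s*\((?P<msg>.*?)\))?"
)


def root_cause(event):
    """Return (timeslice, device, detail) for the lowest timeslice, or None."""
    hits = [(int(m["ts"]), m["device"], m["msg"]) for m in LINE.finditer(event)]
    if not hits:
        return None  # no "Timeslice" in the event at all
    # On a tie, min() keeps the first device listed in the event.
    return min(hits, key=lambda hit: hit[0])


ts, device, msg = root_cause(EVENT)
print(ts, device, msg.split(":")[0])  # prints: 13585 SGCU MIN_CHARGE_PRIM
```

The detail is None for devices that report no parenthesised line (NECU and FCU in the sample), which is why the search above falls back to a fixed label for those two.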

like2splunk
Explorer

lguinn, your code works well! I have one follow-up question though:
There are two different error types that give this search a problem. The first "TSS Enable" occurs when the "Timeslice" for each Error Device is equal to each other, e.g. necu_timeslice=fcu_timeslice=rcu_timeslice=sgcu_timeslice. The second "TimeoutEnablingBeam" is an error that doesn't have any "Timeslice" displayed at all in the event - to be clear the word "Timeslice" does not occur in this type of error event.

How do I make the output different for that scenario?
Can I add an equation in the "case" command possibly?
I tried something like the following but it's not differentiating between the two errors:

index="logs" process=beamCommonProcess
"Transitioned to Error State" OR "SET_RANGE activity requested for beam supply point" OR "DISABLE_BEAM activity is complete" OR "Timeslice:"
| transaction startswith="SET_RANGE activity requested for beam supply point" endswith="DISABLE_BEAM activity is complete"
| search "Transitioned to Error State"
| sort _time
| rex "\sNECU Error: \[0x\d+\].*?Timeslice: (?<necu_timeslice>\d+)\sSubmap: \d+\s*\((?<necu_msg>.*?)\)"
| rex "\sFCU Error: \[0x\d+\].*?Timeslice: (?<fcu_timeslice>\d+)\sSubmap: \d+\s*\((?<fcu_msg>.*?)\)"
| rex "\sRCU Error: \[0x\d+\].*?Timeslice: (?<rcu_timeslice>\d+)\sSubmap: \d+\s*\((?<rcu_msg>.*?)\)"
| rex "\sSGCU Error: \[0x\d+\].*?Timeslice: (?<sgcu_timeslice>\d+)\sSubmap: \d+\s*\((?<sgcu_msg>.*?)\)"
| eval minTS = min(necu_timeslice,fcu_timeslice,rcu_timeslice,sgcu_timeslice)
| eval minTS = if(isnull(minTS), 0, minTS)
| eval output = case(minTS==necu_timeslice,necu_msg,
minTS==fcu_timeslice,fcu_msg,
minTS==rcu_timeslice,rcu_msg,
minTS==sgcu_timeslice,sgcu_msg,
minTS=0, "TSS Enable",
1==1,"TimeoutEnablingBeam")
| table _time minTS output
| sort _time

lguinn2
Legend

Try this:

index="logs" process=beamCommonProcess
"Transitioned to Error State" OR "SET_RANGE activity requested for beam supply point" OR "DISABLE_BEAM activity is complete" OR "Timeslice:"
| transaction startswith="SET_RANGE activity requested for beam supply point" endswith="DISABLE_BEAM activity is complete"
| search "Transitioned to Error State"
| rex "\sNECU Error: \[0x\d+\].*?Timeslice: (?<necu_timeslice>\d+)"
| rex "\sFCU Error: \[0x\d+\].*?Timeslice: (?<fcu_timeslice>\d+)"
| rex "\sRCU Error: \[0x\d+\].*?Timeslice: (?<rcu_timeslice>\d+)\sSubmap: \d+\s*\((?<rcu_msg>.*?)\)"
| rex "\sSGCU Error: \[0x\d+\].*?Timeslice: (?<sgcu_timeslice>\d+)\sSubmap: \d+\s*\((?<sgcu_msg>.*?)\)"
| eval minTS = min(necu_timeslice,fcu_timeslice,rcu_timeslice,sgcu_timeslice)
| eval output = case(isnull(minTS),"TimeoutEnablingBeam",
    necu_timeslice==fcu_timeslice AND fcu_timeslice==rcu_timeslice AND rcu_timeslice==sgcu_timeslice,"TSS Enable",
    minTS==necu_timeslice,"NECU Error",
    minTS==fcu_timeslice,"FCU Error",
    minTS==rcu_timeslice,rcu_msg,
    minTS==sgcu_timeslice,sgcu_msg,
    1==1,"Unknown")
| eval minTS = if(isnull(minTS), "No Timeslice", minTS)
| table _time minTS output
| sort _time

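Note that the case statement stops when it finds the first match. So I put the test for "no timeslice" (TimeoutEnablingBeam) up front, followed by the test for all four timeslices being equal (TSS Enable). I also moved the eval that replaces a null minTS to after the case statement, so that the isnull(minTS) test still sees the null.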
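
The effect of that ordering can be illustrated outside Splunk. This is a minimal Python sketch under stated assumptions, not a translation of the search: classify and DEVICES are hypothetical names, and the inputs are plain dicts standing in for the extracted fields (a missing key means the field was not extracted).

```python
DEVICES = ("NECU", "FCU", "RCU", "SGCU")


def classify(timeslices, messages):
    """Mimic a first-match-wins case(): the earliest true test decides the output."""
    present = {dev: ts for dev, ts in timeslices.items() if ts is not None}

    # Test 1: no timeslice extracted at all.
    if not present:
        return "TimeoutEnablingBeam"

    # Test 2: all four devices report the same timeslice. This has to come
    # before the per-device tests, which would otherwise match NECU first.
    if len(present) == len(DEVICES) and len(set(present.values())) == 1:
        return "TSS Enable"

    # Tests 3-6: the first device (in a fixed order) holding the minimum.
    min_ts = min(present.values())
    for dev in DEVICES:
        if present.get(dev) == min_ts:
            return messages.get(dev) or dev + " Error"

    # Only reached if a device outside DEVICES was passed in.
    return "Unknown"


print(classify({}, {}))
print(classify({"NECU": 5, "FCU": 5, "RCU": 5, "SGCU": 5}, {}))
print(classify({"NECU": 13589, "FCU": 13589, "RCU": 13587, "SGCU": 13585},
               {"SGCU": "MIN_CHARGE_PRIM"}))
```

If the "all equal" test were moved below the per-device tests, the second call would report the NECU branch instead of "TSS Enable", which is the same problem the reordered case avoids.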

like2splunk
Explorer

Thank you lguinn for the prompt response!
In the example I have, the displayed result is one single event.
Each event lists "NECU Error, FCU Error, etc." every time.
The issue is figuring out which one of those DEVICES is the root-cause (I probably should not have used the word "source").
That's where the timeslice comes in: the earliest timeslice is the actual error, and the rest just respond to that unit being in Error State and report an error as well.

Does that answer your question? In the meantime, I'm going to work with your Answer to see how it fits with my search. Thank you again for your help!
