How to use regex and format strings for an XML sample without using KV_MODE=XML?

gaurav_ramteke
Explorer

Hi,

I want to extract fields from the XML sample below using REGEX and FORMAT settings, without using KV_MODE = xml.
I have tried several regexes, but none of them extracts the fields.
The sample log and my configuration are below. Please help.

        <Interceptor>
            <AttackCoords>-80.03107887624853,25.351308629611</AttackCoords>
            <Outcome>Interdiction</Outcome>
            <Infiltrators>6</Infiltrators>
            <Enforcer>Assured</Enforcer>
            <ActionDate>2013-11-03</ActionDate>
            <ActionTime>04:40:00</ActionTime>
            <RecordNotes>Infiltrators: 
                Savanna&#32;Carrera,
                Gregoria&#32;Far&#237;as,
                Julina&#32;Abeyta,
                Mariquita&#32;Alonso,
                Urbano&#32;Brise&#241;o,
                Victoro&#32;Montano     </RecordNotes>
            <NumEscaped>3</NumEscaped>
            <LaunchCoords></LaunchCoords>
            <AttackVessel>Raft</AttackVessel>
        </Interceptor>
        <Interceptor>
            <AttackCoords>-80.33045250710296,24.93574264936793</AttackCoords>
            <Outcome>Interdiction</Outcome>
            <Infiltrators>9</Infiltrators>
            <Enforcer>Pompano</Enforcer>
            <ActionDate>2013-05-04</ActionDate>
            <ActionTime>04:22:00</ActionTime>
            <RecordNotes></RecordNotes>
            <NumEscaped>0</NumEscaped>
            <LaunchCoords>-80.30497342463124,24.07890526980327</LaunchCoords>
            <AttackVessel>Rustic</AttackVessel>
        </Interceptor>
        <Interceptor>
            <AttackCoords>-79.94720757796837,24.82172611548247</AttackCoords>
            <Outcome>Interdiction</Outcome>
            <Infiltrators>12</Infiltrators>
            <Enforcer>Barracuda</Enforcer>
            <ActionDate>2013-01-01</ActionDate>
            <ActionTime>05:22:00</ActionTime>
            <RecordNotes>Infiltrators: 
                Cristian&#32;Caballero,
                Vicenta&#32;Olivares,
                Leonides&#32;Cintr&#243;n,
                Ascencion&#32;Betancourt,
                Alanzo&#32;Arenas,
                Primeiro&#32;S&#225;nchez,
                Serena&#32;Monroy,
                Madina&#32;Mojica,
                Consolacion&#32;Cordero,
                Faqueza&#32;Serrano,
                Grazia&#32;Quesada,
                Ivette&#32;Partida      </RecordNotes>
            <NumEscaped>0</NumEscaped>
            <LaunchCoords></LaunchCoords>
            <AttackVessel>Rustic</AttackVessel>
        </Interceptor>

Props.conf

[dreamcrusher]
LINE_BREAKER = (\<Interceptor\>)
TIME_PREFIX = <ActionDate>
TIME_FORMAT = %Y-%m-%d<\/ActionDate>[\r\n]\t+<ActionTime>%H:%M:%S
SHOULD_LINEMERGE = false
MAX_DAYS_AGO = 2500
SEDCMD-aremoveheader = s/\<\?xml.*\s*\<dataroot\>\s*//g
SEDCMD-bremovefooter = s/\<\/dataroot\>//g
REPORT-f = dream_attack
KV_MODE = none

transforms.conf

[dream_attack]
REGEX = (?m)^[^<]+.(.*?)\>([\S\s]*?)\<(?=[^\s])
FORMAT = $1::$2

Can you suggest why this is failing?
Thanks

moon92
New Member

Use this transforms.conf instead

[dream_attack]
REGEX = \>\s+\<([^\>]+)\>([^\<]+)\<
FORMAT = $1::$2
REPEAT_MATCH = true
WRITE_META = true

sudosplunk
Motivator

Hello there,

Try adding ..| spath at the end of your search.

gaurav_ramteke
Explorer

Hi nittala_surya,

I get the same error.
Search string used:
index=* sourcetype="dreamcrusher" | rex field=_raw "^\s*<([^>])>([^<\/])" | spath

Error string
Error in 'rex' command: The regex '^\s*<([^>])>([^<\/])' does not extract anything. It should specify at least one named group. Format: (?<name>...).
The search job has failed due to an error. You may be able view the job in the Job Inspector.


sudosplunk
Motivator

Get rid of rex: index=* sourcetype="dreamcrusher" | spath

You can find more info about spath in the Splunk documentation.

On a side note: your regex doesn't have a named capturing group, hence the error.
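
To see what "named capturing group" means in practice, here is a minimal Python sketch. It is only an analogy: Python's re module stands in for the regex engine behind rex, Python spells a named group (?P<name>...) where rex expects (?<name>...), and the group names tag and value are made up for the illustration.

```python
import re

# One line taken from the sample event.
LINE = "<Outcome>Interdiction</Outcome>"

# Plain (unnamed) groups: the pattern matches, but nothing tells the
# caller what the captured pieces should be called.
unnamed = re.compile(r"<([^>]*)>([^<]*)")

# Named groups: each capture carries its own (hypothetical) field name.
named = re.compile(r"<(?P<tag>[^>]*)>(?P<value>[^<]*)")

unnamed_names = dict(unnamed.groupindex)       # no group names at all
named_fields = named.search(LINE).groupdict()  # captures keyed by group name

print(unnamed_names)
print(named_fields)
```

In SPL the same idea would be to give the group a name, for example rex field=_raw "<Outcome>(?<Outcome>[^<]*)".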


gaurav_ramteke
Explorer

Thanks nittala_surya,

It worked 🙂

However, just for my understanding: is "| spath" mandatory for extracting the fields when REGEX and FORMAT are already set up in the configuration files? Or should props.conf and transforms.conf parse the _raw events and extract the fields on their own? Please advise.


sudosplunk
Motivator

No. spath only performs search-time field extraction. To extract the fields with props.conf and transforms.conf instead, the settings in your configuration files need a few adjustments.

Give this a try:

Props.conf:

[dreamcrusher]
## Optional: your setting discards <Interceptor> from your events. To keep <Interceptor>, use the setting below
## (the capture group also consumes any indentation before the tag)
LINE_BREAKER = ([\r\n]+\s*)\<Interceptor\>
## Escape the angle brackets in TIME_PREFIX
TIME_PREFIX = \<ActionDate\>
## TIME_FORMAT takes strptime syntax, not regex; use %n for the line break
TIME_FORMAT = %Y-%m-%d</ActionDate>%n<ActionTime>%H:%M:%S
SHOULD_LINEMERGE = false
## Use this to improve efficiency while extracting timestamps
MAX_TIMESTAMP_LOOKAHEAD = 50
MAX_DAYS_AGO = 2500
SEDCMD-aremoveheader = s/\<\?xml.*\s*\<dataroot\>\s*//g
SEDCMD-bremovefooter = s/\<\/dataroot\>//g
REPORT-f = dream_attack
KV_MODE = none

Transforms.conf:

[dream_attack]
REGEX = (?m)^[^<]+\<+(.*?)\>([\S\s]*?)\<(?=[^\s])
FORMAT = $1::$2
MV_ADD = true

diogofgm
SplunkTrust

Use this regex instead

REGEX = ^\s*\<([^\>]*)\>([^\<\/]*)
------------
Hope I was able to help you. If so, some karma would be appreciated.
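
One thing worth checking with this regex is the ^ anchor: without the (?m) flag it matches only at the very start of an event, so at most one tag can match per event. The sketch below shows that behaviour in Python, whose re module is used here as a stand-in for Splunk's PCRE engine. The event text is trimmed from the sample above and assumes <Interceptor> has already been stripped by the original LINE_BREAKER.

```python
import re

# Trimmed event as it might look after line breaking (an assumption:
# the leading <Interceptor> tag is gone, the closing tag is still there).
EVENT = """
            <AttackCoords>-80.33045250710296,24.93574264936793</AttackCoords>
            <Outcome>Interdiction</Outcome>
            <Infiltrators>9</Infiltrators>
        </Interceptor>"""

# The regex from the post above, unchanged.
PATTERN = r"^\s*\<([^\>]*)\>([^\<\/]*)"

# Without (?m), ^ matches only at the start of the event.
anchored_once = re.findall(PATTERN, EVENT)

# With (?m), ^ also matches after every newline, so each line is tried.
per_line = re.findall("(?m)" + PATTERN, EVENT)

print(anchored_once)
print([name for name, _ in per_line])
```

Note that in multiline mode the closing tag also matches, as a field named /Interceptor with an empty value. Whether the anchor is the reason no fields showed up in Splunk is an assumption; the sketch only demonstrates how it limits the matches.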

gaurav_ramteke
Explorer

Thanks, I tried it, but no fields were extracted.
For your information, I am using Splunk Enterprise on Windows 10.


diogofgm
SplunkTrust

Can you try using it in a search?
your index|rex "^\s*<([^>])>([^<\/])"


gaurav_ramteke
Explorer

index=* sourcetype="dream" | rex field=_raw "^\s*<([^>])>([^<\/])"

I get the error below in the search:

Error in 'rex' command: The regex '^\s*<([^>])>([^<\/])' does not extract anything. It should specify at least one named group. Format: (?<name>...).
The search job has failed due to an error. You may be able view the job in the Job Inspector.


diogofgm
SplunkTrust

What is failing? Extracting all the fields? Extracting the fields with multiple values (e.g. RecordNotes)?


gaurav_ramteke
Explorer

extracting all the fields using multivalues


santiagoaloi
Path Finder

[REPORT-dreamcrusher_extractions]
REGEX = <(\w+)>([^<]+)
FORMAT = $1::$2
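
A quick way to sanity-check this regex outside Splunk is to run it over one sample event. The Python sketch below does that, with re.findall standing in for Splunk's repeated matching (an approximation, not Splunk's actual extraction code). The event is trimmed from the sample above.

```python
import re

# One event, trimmed from the sample in the question.
EVENT = """<Interceptor>
            <AttackCoords>-80.03107887624853,25.351308629611</AttackCoords>
            <Outcome>Interdiction</Outcome>
            <RecordNotes>Infiltrators:
                Savanna&#32;Carrera,
                Victoro&#32;Montano     </RecordNotes>
            <NumEscaped>3</NumEscaped>
            <LaunchCoords></LaunchCoords>
            <AttackVessel>Raft</AttackVessel>
        </Interceptor>"""

# The regex from the post above: an opening tag name, then everything up to the next "<".
pairs = re.findall(r"<(\w+)>([^<]+)", EVENT)

# Drop whitespace-only values (the <Interceptor> wrapper captures only
# the newline and indentation that follow it) and trim the rest.
fields = {name: value.strip() for name, value in pairs if value.strip()}

for name, value in fields.items():
    print(f"{name} = {value!r}")
```

Two things to be aware of: empty elements such as <LaunchCoords></LaunchCoords> produce no field at all, because [^<]+ needs at least one character, and the multi-line RecordNotes value comes back as a single string rather than one value per name.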
