Splunk Search

How to use regex and format strings for an XML sample without using KV_MODE=XML?

gaurav_ramteke
Explorer

Hi,

I want to use REGEX and FORMAT strings to extract fields from the XML sample below without using KV_MODE=xml.
I have tried several regexes to parse the fields, but none of them work.
Please find the sample log below for reference.

   <Interceptor>
            <AttackCoords>-80.03107887624853,25.351308629611</AttackCoords>
            <Outcome>Interdiction</Outcome>
            <Infiltrators>6</Infiltrators>
            <Enforcer>Assured</Enforcer>
            <ActionDate>2013-11-03</ActionDate>
            <ActionTime>04:40:00</ActionTime>
            <RecordNotes>Infiltrators: 
                Savanna&#32;Carrera,
                Gregoria&#32;Far&#237;as,
                Julina&#32;Abeyta,
                Mariquita&#32;Alonso,
                Urbano&#32;Brise&#241;o,
                Victoro&#32;Montano     </RecordNotes>
            <NumEscaped>3</NumEscaped>
            <LaunchCoords></LaunchCoords>
            <AttackVessel>Raft</AttackVessel>
        </Interceptor>
        <Interceptor>
            <AttackCoords>-80.33045250710296,24.93574264936793</AttackCoords>
            <Outcome>Interdiction</Outcome>
            <Infiltrators>9</Infiltrators>
            <Enforcer>Pompano</Enforcer>
            <ActionDate>2013-05-04</ActionDate>
            <ActionTime>04:22:00</ActionTime>
            <RecordNotes></RecordNotes>
            <NumEscaped>0</NumEscaped>
            <LaunchCoords>-80.30497342463124,24.07890526980327</LaunchCoords>
            <AttackVessel>Rustic</AttackVessel>
        </Interceptor>
        <Interceptor>
            <AttackCoords>-79.94720757796837,24.82172611548247</AttackCoords>
            <Outcome>Interdiction</Outcome>
            <Infiltrators>12</Infiltrators>
            <Enforcer>Barracuda</Enforcer>
            <ActionDate>2013-01-01</ActionDate>
            <ActionTime>05:22:00</ActionTime>
            <RecordNotes>Infiltrators: 
                Cristian&#32;Caballero,
                Vicenta&#32;Olivares,
                Leonides&#32;Cintr&#243;n,
                Ascencion&#32;Betancourt,
                Alanzo&#32;Arenas,
                Primeiro&#32;S&#225;nchez,
                Serena&#32;Monroy,
                Madina&#32;Mojica,
                Consolacion&#32;Cordero,
                Faqueza&#32;Serrano,
                Grazia&#32;Quesada,
                Ivette&#32;Partida      </RecordNotes>
            <NumEscaped>0</NumEscaped>
            <LaunchCoords></LaunchCoords>
            <AttackVessel>Rustic</AttackVessel>
        </Interceptor>

Props.conf

[dreamcrusher]
LINE_BREAKER = (\<Interceptor\>)
TIME_PREFIX = <ActionDate>
TIME_FORMAT = %Y-%m-%d<\/ActionDate>[\r\n]\t+<ActionTime>%H:%M:%S
SHOULD_LINEMERGE = false
MAX_DAYS_AGO = 2500
SEDCMD-aremoveheader = s/\<\?xml.*\s*\<dataroot\>\s*//g
SEDCMD-bremovefooter = s/\<\/dataroot\>//g
REPORT-f = dream_attack
KV_MODE = none

transforms.conf

[dream_attack]
REGEX = (?m)^[^<]+.(.*?)\>([\S\s]*?)\<(?=[^\s])
FORMAT = $1::$2

Could you please suggest why this is failing?
Thanks

moon92
New Member

Use this transforms.conf instead

[dream_attack]
REGEX = \>\s+\<([^\>]+)\>([^\<]+)\<
FORMAT = $1::$2
REPEAT_MATCH = true
WRITE_META = true
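
As a rough check of what this pattern captures, the sketch below applies the same regex to one abbreviated event using Python's re module. Python is only standing in for Splunk's PCRE engine here, and the shortened EVENT string is an assumption for illustration.

```python
import re

# One event, abbreviated from the sample in the question.
EVENT = """<Interceptor>
    <AttackCoords>-80.03107887624853,25.351308629611</AttackCoords>
    <Outcome>Interdiction</Outcome>
    <LaunchCoords></LaunchCoords>
</Interceptor>"""

# Same pattern as the REGEX above; < and > need no escaping in Python.
PAIR = re.compile(r">\s+<([^>]+)>([^<]+)<")

pairs = PAIR.findall(EVENT)
for name, value in pairs:
    print(f"{name} = {value}")
```

In this sketch only AttackCoords and Outcome come back: [^<]+ needs at least one character, so the empty LaunchCoords element is skipped.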

sudosplunk
Motivator

Hello there,

Try adding | spath at the end of your search.
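
To illustrate why parsing the XML structure avoids the regex problems, here is a minimal Python sketch. It is an analogy only, not Splunk code: it uses Python's xml.etree.ElementTree, and the abbreviated EVENT string and the helper name extract_fields are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# One event, abbreviated from the sample in the question.
EVENT = """<Interceptor>
    <AttackCoords>-80.03107887624853,25.351308629611</AttackCoords>
    <Outcome>Interdiction</Outcome>
    <RecordNotes>Infiltrators:
        Savanna&#32;Carrera,
        Gregoria&#32;Far&#237;as     </RecordNotes>
    <LaunchCoords></LaunchCoords>
    <AttackVessel>Raft</AttackVessel>
</Interceptor>"""


def extract_fields(event: str) -> dict:
    """Return {tag: text} for each child element of the event's root."""
    root = ET.fromstring(event)
    fields = {}
    for child in root:
        # Empty elements have text None; normalise that to an empty string.
        text = child.text or ""
        # Collapse the line breaks and indentation inside multi-line values.
        fields[child.tag] = " ".join(text.split())
    return fields


fields = extract_fields(EVENT)
print(fields["Outcome"])
print(fields["RecordNotes"])
```

A parser handles empty elements, multi-line values, and character references such as &#237; without any special cases, which is where the hand-written regexes in this thread tend to struggle.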

gaurav_ramteke
Explorer

Hi nittala_surya,

I get the same error.
Search string used:
index=* sourcetype="dreamcrusher" | rex field=_raw "^\s*<([^>])>([^<\/])" | spath

Error string:
Error in 'rex' command: The regex '^\s*<([^>])>([^<\/])' does not extract anything. It should specify at least one named group. Format: (?<name>...).
The search job has failed due to an error. You may be able view the job in the Job Inspector.

sudosplunk
Motivator

Get rid of rex: index=* sourcetype="dreamcrusher" | spath

You can find more info about spath here.

On a side note: your regex doesn't have a named capturing group, hence the error.
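
The difference between unnamed and named groups can be sketched in Python. This is an illustration under assumptions, not Splunk code: the one-line sample is shortened from the question, and Python spells a named group (?P<name>...), whereas PCRE, which rex uses, also accepts (?<name>...).

```python
import re

line = "<Outcome>Interdiction</Outcome>"

# Unnamed groups: the pattern matches, but nothing says what to call the captures.
unnamed = re.search(r"^\s*<([^>]+)>([^<]*)", line)
print(unnamed.groups())      # positional captures only
print(unnamed.groupdict())   # empty dict: no named groups

# Named group: the capture carries its own field name.
named = re.search(r"<Outcome>(?P<Outcome>[^<]*)</Outcome>", line)
print(named.groupdict())
```

rex has no FORMAT setting to map $1 and $2 onto a field name and value, so it relies on the group name to decide which field to create.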

gaurav_ramteke
Explorer

Thanks nittala_surya,

It worked 🙂

However, just for my own understanding: is "| spath" mandatory to extract the fields when a REGEX/FORMAT transform is already defined in the configuration files? Or should props and transforms alone be enough to parse the fields out of the _raw events? Please advise.

sudosplunk
Motivator

No. spath works only for search-time field extractions and does not depend on props and transforms. To extract the fields with props and transforms instead, the settings in your configuration files need a few adjustments.

Give this a try:

Props.conf:

[dreamcrusher]
## Optional: your setting discards <Interceptor> from your events. To keep <Interceptor>, use the line below
LINE_BREAKER = ([\r\n])\<Interceptor\>
## Escape angle brackets in TIME_PREFIX
TIME_PREFIX = \<ActionDate\>
## TIME_FORMAT doesn't honor regex syntax; use this instead
TIME_FORMAT = %Y-%m-%d</ActionDate>%n<ActionTime>%H:%M:%S
SHOULD_LINEMERGE = false
## Use this to improve efficiency while extracting timestamps
MAX_TIMESTAMP_LOOKAHEAD = 50
MAX_DAYS_AGO = 2500 
SEDCMD-aremoveheader = s/\<\?xml.*\s*\<dataroot\>\s*//g
SEDCMD-bremovefooter = s/\<\/dataroot\>//g
REPORT-f = dream_attack
KV_MODE = none

Transforms.conf:

[dream_attack]
REGEX = (?m)^[^<]+\<+(.*?)\>([\S\s]*?)\<(?=[^\s])
FORMAT = $1::$2
MV_ADD = true
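
The event breaking and timestamp settings above can be sanity-checked outside Splunk with a small Python sketch. It is an approximation under stated assumptions: the RAW string is shortened from the sample, the tab indentation is inferred from the \t+ in the original TIME_FORMAT, and the names STAMP and event_time are hypothetical helpers. The lookahead keeps <Interceptor> inside each event, which mirrors putting only the newline in the LINE_BREAKER capture group.

```python
import re
from datetime import datetime

# Two events, abbreviated from the sample; indentation assumed to be tabs.
RAW = """<Interceptor>
\t<ActionDate>2013-11-03</ActionDate>
\t<ActionTime>04:40:00</ActionTime>
\t<AttackVessel>Raft</AttackVessel>
</Interceptor>
<Interceptor>
\t<ActionDate>2013-05-04</ActionDate>
\t<ActionTime>04:22:00</ActionTime>
\t<AttackVessel>Rustic</AttackVessel>
</Interceptor>"""

# Break before each <Interceptor>. The newline run (plus any indentation)
# is consumed as the delimiter; the tag itself stays in the event.
events = [e for e in re.split(r"[\r\n]+\s*(?=<Interceptor>)", RAW) if e.strip()]

# Date and time sit in separate elements, so capture both and join them.
STAMP = re.compile(
    r"<ActionDate>([\d-]+)</ActionDate>\s*<ActionTime>([\d:]+)</ActionTime>"
)


def event_time(event: str) -> datetime:
    """Combine ActionDate and ActionTime into one timestamp."""
    date_part, time_part = STAMP.search(event).groups()
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")


for event in events:
    print(event_time(event).isoformat())
```

If the raw file does have indentation between the newline and <Interceptor>, a breaker that only allows a single [\r\n] directly before the tag may not match, so that is worth checking against the real data.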

diogofgm
SplunkTrust

Use this regex instead

REGEX = ^\s*\<([^\>]*)\>([^\<\/]*)
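
One thing to watch with this pattern is the leading ^ anchor. The Python sketch below is an illustration under assumptions, using a shortened event and Python's re module in place of Splunk's PCRE. It shows that without a multiline flag the anchor allows only one match per event, starting at the outer <Interceptor> tag. The UNANCHORED variant is a hypothetical alternative added for comparison, not part of the suggestion above.

```python
import re

# One event, abbreviated from the sample in the question.
EVENT = """<Interceptor>
    <AttackCoords>-80.03107887624853,25.351308629611</AttackCoords>
    <Outcome>Interdiction</Outcome>
</Interceptor>"""

# Same pattern as above, written without the optional backslashes.
ANCHORED = re.compile(r"^\s*<([^>]*)>([^</]*)")
anchored_hits = ANCHORED.findall(EVENT)
print(anchored_hits)  # a single pair: the outer tag and the whitespace after it

# Hypothetical variant: no anchor, and closing tags excluded from the name.
UNANCHORED = re.compile(r"<([^>/]+)>([^<]*)")
names = [name for name, _ in UNANCHORED.findall(EVENT)]
print(names)
```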
------------
Hope I was able to help you. If so, some karma would be appreciated.

gaurav_ramteke
Explorer

Thanks, I have tried it, but no fields were extracted.
For reference, I am using Splunk Enterprise on Windows 10.

diogofgm
SplunkTrust

Can you try using it in a search?
your index | rex "^\s*<([^>])>([^<\/])"

gaurav_ramteke
Explorer

index=* sourcetype="dream" | rex field=_raw "^\s*<([^>])>([^<\/])"

I am getting the error below in the search:

Error in 'rex' command: The regex '^\s*<([^>])>([^<\/])' does not extract anything. It should specify at least one named group. Format: (?<name>...).
The search job has failed due to an error. You may be able view the job in the Job Inspector.

diogofgm
SplunkTrust

What is failing? Extracting all the fields? Extracting the fields with multiple values (e.g. RecordNotes)?

gaurav_ramteke
Explorer

extracting all the fields using multivalues

santiagoaloi
Path Finder

[REPORT-dreamcrusher_extractions]
REGEX = <(\w+)>([^<]+)
FORMAT = $1::$2
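
A quick way to see what this pattern yields is to run it over one event in Python. This is a minimal sketch under assumptions: the EVENT string is abbreviated from the sample, Python's re stands in for Splunk's PCRE, and the entity decoding and comma split are extra illustrative steps that the transform itself does not perform.

```python
import html
import re

# One event, abbreviated from the sample in the question.
EVENT = """<Interceptor>
    <Outcome>Interdiction</Outcome>
    <RecordNotes>Infiltrators:
        Savanna&#32;Carrera,
        Gregoria&#32;Far&#237;as     </RecordNotes>
    <LaunchCoords></LaunchCoords>
</Interceptor>"""

# Same pattern as the REGEX above.
PAIR = re.compile(r"<(\w+)>([^<]+)")

fields = {}
for name, value in PAIR.findall(EVENT):
    # Decode character references such as &#32; and trim the indentation.
    value = html.unescape(value).strip()
    if value:  # drops the whitespace-only capture after <Interceptor>
        fields[name] = value

# Split the note into separate names, similar in spirit to a multivalue field.
notes = fields["RecordNotes"].split(":", 1)[1]
names = [n.strip() for n in notes.split(",") if n.strip()]
print(fields["Outcome"])
print(names)
```

Because \w+ cannot match the / in a closing tag, closing tags are never captured as field names, and [^<]+ means empty elements such as LaunchCoords produce no field at all.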
