Splunk Search

Is there a way to have Splunk automatically extract XML key-value pairs?

lyndac
Contributor

I'm using Splunk to index an XML file. Is there a way to have Splunk automatically extract the key-value pairs for each event (as xmlkv does) on every search? I don't want users to have to type | xmlkv in the search bar each time. I see that props.conf lets you set KV_MODE, but none of the settings indicate XML extraction.

1 Solution

gkanapathy
Splunk Employee

Edited for version 4.3:

As of version 4.3, while the accepted answer below still works, you can also use this props.conf setting:

KV_MODE = xml

This performs spath-type extraction on the events.
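For readers unfamiliar with spath-type extraction: as I understand it, fields are named by their full path in the XML document rather than by the leaf tag alone. The Python sketch below only illustrates that naming idea and is not how Splunk implements it. The helper name `flatten_xml` is hypothetical, and the sketch ignores XML attributes.

```python
import xml.etree.ElementTree as ET


def flatten_xml(event: str) -> dict[str, list[str]]:
    """Map dotted element paths to the text values found at each path."""
    fields: dict[str, list[str]] = {}

    def walk(node: ET.Element, path: str) -> None:
        # Record non-empty text for this element under its dotted path.
        text = (node.text or "").strip()
        if text:
            fields.setdefault(path, []).append(text)
        # Recurse into children, extending the path with each child tag.
        for child in node:
            walk(child, f"{path}.{child.tag}")

    root = ET.fromstring(event)
    walk(root, root.tag)
    return fields


# Shortened version of the Nagios XML posted in this thread.
sample = (
    "<nagios><info><created>1299121157</created>"
    "<version>3.2.1</version></info></nagios>"
)
print(flatten_xml(sample))
```

With path-style names, a field such as nagios.info.created stays distinct from a same-named element elsewhere in the document, which a leaf-tag regex cannot guarantee.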


Maybe. As it turns out, the xmlkv command does not do real XML parsing; it just applies a regular expression, and Splunk configuration can probably apply that regex better than the xmlkv command itself. (See $SPLUNK_HOME/etc/apps/search/bin/xmlkv.py.)

Just define a search-time extraction for your sourcetype (or source or whatever) in props.conf:

[mysourcetype]
REPORT-xmlkv = xmlkv-alternative

and in transforms.conf:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2
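To see what this regex captures, here is a small Python sketch that applies the same pattern outside Splunk. It is only an illustration: Python's `re` engine stands in for Splunk's, the helper name `extract_pairs` is hypothetical, and the sample text is a fragment of the Nagios XML posted in this thread.

```python
import re

# Same pattern as the REGEX line in the transforms.conf stanza.
# Group 1 is the tag name, group 2 is the text up to the matching close tag.
XMLKV_PATTERN = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")


def extract_pairs(event: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for every element that contains only text."""
    return XMLKV_PATTERN.findall(event)


sample = """<info>
    <created>1299121157</created>
    <version>3.2.1</version>
    <global_host_event_handler></global_host_event_handler>
</info>"""

# Print the pairs in the same name::value shape as FORMAT = $1::$2.
for name, value in extract_pairs(sample):
    print(f"{name}::{value}")
```

Note that the enclosing <info> element produces no pair: its content contains other tags, so `([^<]*)` cannot reach the matching close tag. Only leaf elements are matched.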


woodcock
Esteemed Legend

Try this:

LINE_BREAKER = ([\r\n]{2})

lukeh
Contributor

hey gkanapathy 🙂

I used your mad skillz regex in my transforms.conf but it negates the line breaker in my props.conf 😞

Any ideas on how to ensure the line breaker still works in this example?

props.conf:

[nagiosstatus]
MAX_EVENTS = 500000
TIME_PREFIX = \<created\>
MAX_TIMESTAMP_LOOKAHEAD = 500
SHOULD_LINEMERGE = false
LINE_BREAKER = (\n\n)
REPORT-xmlkv = xmlkv-alternative

transforms.conf:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

sample xml log:

<nagios>

<info>
    <created>1299121157</created>
    <version>3.2.1</version>
    <last_update_check>1299108670</last_update_check>
    <update_available>1</update_available>
    <last_version>3.2.1</last_version>
    <new_version>3.2.3</new_version>
</info>

<programstatus>
    <modified_host_attributes>1</modified_host_attributes>
    <modified_service_attributes>1</modified_service_attributes>
    <nagios_pid>15961</nagios_pid>
    <daemon_mode>1</daemon_mode>
    <program_start>1299103468</program_start>
    <last_command_check>1299121108</last_command_check>
    <last_log_rotation>0</last_log_rotation>
    <enable_notifications>1</enable_notifications>
    <active_service_checks_enabled>1</active_service_checks_enabled>
    <passive_service_checks_enabled>1</passive_service_checks_enabled>
    <active_host_checks_enabled>1</active_host_checks_enabled>
    <passive_host_checks_enabled>1</passive_host_checks_enabled>
    <enable_event_handlers>1</enable_event_handlers>
    <obsess_over_services>0</obsess_over_services>
    <obsess_over_hosts>0</obsess_over_hosts>
    <check_service_freshness>1</check_service_freshness>
    <check_host_freshness>0</check_host_freshness>
    <enable_flap_detection>0</enable_flap_detection>
    <enable_failure_prediction>1</enable_failure_prediction>
    <process_performance_data>1</process_performance_data>
    <global_host_event_handler></global_host_event_handler>
    <global_service_event_handler></global_service_event_handler>
    <next_comment_id>94586</next_comment_id>
    <next_downtime_id>35813</next_downtime_id>
    <next_event_id>1185528</next_event_id>
    <next_problem_id>532761</next_problem_id>
    <next_notification_id>1337020</next_notification_id>
    <total_external_command_buffer_slots>4096</total_external_command_buffer_slots>
    <used_external_command_buffer_slots>11</used_external_command_buffer_slots>
    <high_external_command_buffer_slots>128</high_external_command_buffer_slots>
    <active_scheduled_host_check_stats>21,132,401</active_scheduled_host_check_stats>
    <active_ondemand_host_check_stats>33,278,834</active_ondemand_host_check_stats>
    <passive_host_check_stats>0,0,0</passive_host_check_stats>
</programstatus>

</nagios>

Thanks in advance,

Luke 🙂


richgalloway
SplunkTrust

This answer is still helpful 12 years later.  Thanks, @gkanapathy !

---
If this reply helps you, Karma would be appreciated.

gkanapathy
Splunk Employee

As of version 4.3, you can now use the setting in props.conf:

KV_MODE = xml

which will perform spath extraction.


jangid
Builder

Very Nice 🙂


lyndac
Contributor

Worked perfectly! Thanks!


Lowell
Super Champion

Nice trick. You could also add MV_ADD = True to your xmlkv-alternative stanza if you want to capture repeating XML elements as a multi-value field, for example if your XML represents a list of items. This is something that you can't do with the default xmlkv command. Pretty cool.
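The effect of MV_ADD on repeating elements can be sketched in Python. This is a simulation under my understanding of the setting (with MV_ADD off, a later value for an already-extracted field is discarded; with it on, the value is appended), not Splunk code. The helper `extract_fields` and the `<items>` sample are hypothetical.

```python
import re

# Same pattern as the REGEX line in the xmlkv-alternative stanza.
XMLKV_PATTERN = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")


def extract_fields(event: str, mv_add: bool) -> dict[str, list[str]]:
    """Collect field values, keeping repeats only when mv_add is True."""
    fields: dict[str, list[str]] = {}
    for name, value in XMLKV_PATTERN.findall(event):
        if name not in fields:
            fields[name] = [value]       # first value always kept
        elif mv_add:
            fields[name].append(value)   # repeat becomes another value
        # Otherwise the repeated value is dropped.
    return fields


sample = "<items><item>disk</item><item>cpu</item><item>memory</item></items>"
print(extract_fields(sample, mv_add=False))
print(extract_fields(sample, mv_add=True))
```

In the first call only the first item survives; in the second, all three values end up under the single field name item.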
