Is there a way to have Splunk automatically extract XML key-value pairs?

lyndac
Contributor

I'm using Splunk to index an XML file. Is there a way to have Splunk automatically extract the key-value pairs for each event (as xmlkv does) on every search? I don't want users to have to type | xmlkv in the search bar each time. I see that you can set KV_MODE in props.conf, but none of its settings indicate XML extraction.

1 Solution

gkanapathy
Splunk Employee

Edited for version 4.3:

As of version 4.3, while the accepted answer below still works, you can also use the props.conf setting:

KV_MODE = xml

This performs spath-style extraction on the events.
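As a rough illustration of what path-based extraction means, here is a hypothetical Python sketch (not Splunk's implementation): it walks the XML and stores each leaf element's text under its dotted path from the root, such as nagios.info.created. The dotted naming is assumed to resemble what spath produces; attributes are ignored, and the function name path_extract is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_extract(xml_text):
    """Hypothetical approximation of path-style XML field extraction.

    Each leaf element's text is stored under its dotted path from the root.
    Repeated paths accumulate values in a list (like a multivalue field).
    Attributes are ignored in this sketch.
    """
    fields = defaultdict(list)

    def walk(elem, prefix):
        path = f"{prefix}.{elem.tag}" if prefix else elem.tag
        children = list(elem)
        if children:
            for child in children:
                walk(child, path)
        else:
            # Empty elements such as <global_host_event_handler></...> yield "".
            fields[path].append((elem.text or "").strip())

    walk(ET.fromstring(xml_text), "")
    return dict(fields)


sample = "<nagios><info><created>1299121157</created><version>3.2.1</version></info></nagios>"
print(path_extract(sample))
# {'nagios.info.created': ['1299121157'], 'nagios.info.version': ['3.2.1']}
```

Unlike a flat regex, this keeps the nesting in the field name, so two elements with the same tag under different parents stay distinct.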


Maybe. As it turns out, the xmlkv command does not do real XML parsing; it just applies a regular expression, which Splunk configuration can do, probably better than the xmlkv command itself. (See $SPLUNK_HOME/etc/apps/search/bin/xmlkv.py.)

Just define a search-time extraction for your sourcetype (or source or whatever) in props.conf:

[mysourcetype]
REPORT-xmlkv = xmlkv-alternative

and in transforms.conf:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2
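To see what that regex actually matches, here is a small Python sketch that applies the same pattern to every match in an event and returns (name, value) pairs, roughly mimicking FORMAT = $1::$2. It assumes Python's re module treats this pattern the same way Splunk's PCRE engine does and that the search-time extraction applies to every match in the event; the function name xmlkv_alternative is hypothetical. Note that only leaf elements match: a container such as <info> fails because its content contains '<'.

```python
import re

# Same pattern as the [xmlkv-alternative] stanza in transforms.conf.
# $1 is the element name, $2 is the text between the open and close tags.
XMLKV_REGEX = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")


def xmlkv_alternative(event):
    """Return (field_name, value) pairs, one per regex match ($1::$2)."""
    return [(m.group(1), m.group(2)) for m in XMLKV_REGEX.finditer(event)]


event = """<info>
    <created>1299121157</created>
    <version>3.2.1</version>
</info>"""
print(xmlkv_alternative(event))
# [('created', '1299121157'), ('version', '3.2.1')]
```

The [^\>]* part after the tag name is what lets elements with attributes, such as <item id="7">x</item>, still yield item::x.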

woodcock
Esteemed Legend

Try this:

LINE_BREAKER = ([\r\n]{2})
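LINE_BREAKER is an index-time props.conf setting: wherever its regex matches, the text of the first capturing group is discarded, the current event ends where that group starts, and the next event begins where it ends. A hypothetical Python sketch of that rule (not Splunk's code; dropping whitespace-only chunks is a choice made for this sketch):

```python
import re


def break_events(raw, line_breaker):
    """Approximate LINE_BREAKER: split raw text at each match's first group."""
    pattern = re.compile(line_breaker)
    events = []
    start = 0
    for m in pattern.finditer(raw):
        events.append(raw[start:m.start(1)])  # current event ends where group 1 starts
        start = m.end(1)                      # next event starts right after group 1
    events.append(raw[start:])
    return [e for e in events if e.strip()]   # sketch choice: skip blank chunks


print(break_events("<info>\n<a>1</a>\n</info>\n\n<status>\n<b>2</b>\n</status>", r"([\r\n]{2})"))
# ['<info>\n<a>1</a>\n</info>', '<status>\n<b>2</b>\n</status>']
```

One thing the sketch makes visible: [\r\n]{2} matches any two consecutive CR/LF characters, so in a file with Windows \r\n line endings a single line ending already matches, and under this model every line would become its own event.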

lukeh
Contributor

Hey gkanapathy 🙂

I used your mad skillz regex in my transforms.conf, but it negates the line breaker in my props.conf 😞

Any ideas on how to ensure the line breaker still works in this example?

props.conf:

[nagiosstatus]
MAX_EVENTS = 500000
TIME_PREFIX = \<created\>
MAX_TIMESTAMP_LOOKAHEAD = 500
SHOULD_LINEMERGE = false
LINE_BREAKER = (\n\n)
REPORT-xmlkv = xmlkv-alternative

transforms.conf:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

Sample XML log:

<nagios>

<info>
    <created>1299121157</created>
    <version>3.2.1</version>
    <last_update_check>1299108670</last_update_check>
    <update_available>1</update_available>
    <last_version>3.2.1</last_version>
    <new_version>3.2.3</new_version>
</info>

<programstatus>
    <modified_host_attributes>1</modified_host_attributes>
    <modified_service_attributes>1</modified_service_attributes>
    <nagios_pid>15961</nagios_pid>
    <daemon_mode>1</daemon_mode>
    <program_start>1299103468</program_start>
    <last_command_check>1299121108</last_command_check>
    <last_log_rotation>0</last_log_rotation>
    <enable_notifications>1</enable_notifications>
    <active_service_checks_enabled>1</active_service_checks_enabled>
    <passive_service_checks_enabled>1</passive_service_checks_enabled>
    <active_host_checks_enabled>1</active_host_checks_enabled>
    <passive_host_checks_enabled>1</passive_host_checks_enabled>
    <enable_event_handlers>1</enable_event_handlers>
    <obsess_over_services>0</obsess_over_services>
    <obsess_over_hosts>0</obsess_over_hosts>
    <check_service_freshness>1</check_service_freshness>
    <check_host_freshness>0</check_host_freshness>
    <enable_flap_detection>0</enable_flap_detection>
    <enable_failure_prediction>1</enable_failure_prediction>
    <process_performance_data>1</process_performance_data>
    <global_host_event_handler></global_host_event_handler>
    <global_service_event_handler></global_service_event_handler>
    <next_comment_id>94586</next_comment_id>
    <next_downtime_id>35813</next_downtime_id>
    <next_event_id>1185528</next_event_id>
    <next_problem_id>532761</next_problem_id>
    <next_notification_id>1337020</next_notification_id>
    <total_external_command_buffer_slots>4096</total_external_command_buffer_slots>
    <used_external_command_buffer_slots>11</used_external_command_buffer_slots>
    <high_external_command_buffer_slots>128</high_external_command_buffer_slots>
    <active_scheduled_host_check_stats>21,132,401</active_scheduled_host_check_stats>
    <active_ondemand_host_check_stats>33,278,834</active_ondemand_host_check_stats>
    <passive_host_check_stats>0,0,0</passive_host_check_stats>
</programstatus>

</nagios>

Thanks in advance,

Luke 🙂
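For the props.conf above, this hypothetical Python sketch models the index-time part: it splits the raw text with LINE_BREAKER = (\n\n) and then looks for a timestamp within MAX_TIMESTAMP_LOOKAHEAD characters after TIME_PREFIX = \<created\>. It assumes the timestamp is a 10-digit epoch value in UTC; Splunk's real timestamp recognition is more general. Run on a trimmed copy of the sample, it shows that with this breaker only the <info> chunk contains <created>.

```python
import re
from datetime import datetime, timezone

LINE_BREAKER = re.compile(r"(\n\n)")
TIME_PREFIX = re.compile(r"\<created\>")
MAX_TIMESTAMP_LOOKAHEAD = 500


def break_events(raw):
    """Split at each LINE_BREAKER match, discarding the captured newlines."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return [e for e in events if e.strip()]


def event_timestamp(event):
    """Assumption: the value after TIME_PREFIX is a 10-digit epoch in UTC."""
    prefix = TIME_PREFIX.search(event)
    if prefix is None:
        return None
    # Only look MAX_TIMESTAMP_LOOKAHEAD characters past the prefix.
    window = event[prefix.end():prefix.end() + MAX_TIMESTAMP_LOOKAHEAD]
    epoch = re.match(r"\d{10}", window)
    if epoch is None:
        return None
    return datetime.fromtimestamp(int(epoch.group()), tz=timezone.utc)


raw = ("<nagios>\n\n<info>\n    <created>1299121157</created>\n</info>\n\n"
       "<programstatus>\n    <nagios_pid>15961</nagios_pid>\n</programstatus>\n\n</nagios>\n")
for ev in break_events(raw):
    print(repr(ev.splitlines()[0]), event_timestamp(ev))
# '<nagios>' None
# '<info>' 2011-03-03 02:59:17+00:00
# '<programstatus>' None
# '</nagios>' None
```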

richgalloway
SplunkTrust

This answer is still helpful 12 years later. Thanks, @gkanapathy!

gkanapathy
Splunk Employee

As of version 4.3, you can now use the setting in props.conf:

KV_MODE = xml

which will perform spath extraction.

jangid
Builder

Very nice 🙂

lyndac
Contributor

Worked perfectly! Thanks!

Lowell
Super Champion

Nice trick. You could also add MV_ADD = True to your xmlkv-alternative stanza if you want to capture repeating XML elements as a multi-value field, for example if your XML represents a list of items. This is something that you can't do with the default xmlkv command. Pretty cool.
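To illustrate the MV_ADD behaviour, here is a hypothetical Python sketch: when a field name is extracted again, MV_ADD = true appends the new value to a multivalue field, while the default (false) discards it and keeps the first value. The regex is the xmlkv-alternative pattern; the function and sample data are illustrative only.

```python
import re

XMLKV_REGEX = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")


def extract(event, mv_add=False):
    """Collect $1::$2 matches, modelling transforms.conf MV_ADD."""
    fields = {}
    for m in XMLKV_REGEX.finditer(event):
        name, value = m.group(1), m.group(2)
        if name not in fields:
            fields[name] = [value]
        elif mv_add:
            fields[name].append(value)  # MV_ADD = true: keep every value
        # MV_ADD = false: a repeated field's new value is discarded
    return fields


items = "<items><item>a</item><item>b</item><item>c</item></items>"
print(extract(items))               # {'item': ['a']}
print(extract(items, mv_add=True))  # {'item': ['a', 'b', 'c']}
```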
