Splunk Search

Do we have any best practices for field extraction command usage for XML inputs?

sundarrajan
Path Finder

The reason for this question is to understand the relative performance of each command, such as rex, xmlkv, spath, and multikv. In my experience, xmlkv takes quite a long time to fetch the relevant fields. I am also finding it challenging to use rex commands. How can I improve Splunk's performance while still using xmlkv?


niketn
Legend

Try using xpath (XML) or spath (XML and JSON), which are designed for reading data in a tree-like structure. Alternatively, you can use rex, provided your XML schema is known to you so that you can define a specific pattern.

Example XML node: field XMLData = <NodeName>Sample Data</NodeName>

1) Using rex command -> rex field=XMLData "<NodeName>(?<MyNodeName>[^<]+)</NodeName>"
Using spath command -> spath input=XMLData output=MyNodeName path=NodeName
If NodeName is nested under a parent element such as RequestType, give the full path instead, e.g. path=RequestType.NodeName.

The spath command extracts every matching value if several exist at the path provided. The rex command, by default, returns only the first match; to get multiple values you must set the max_match argument (any number sets the maximum number of matches, 0 means all matches, and the default is 1 for a single match).
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Rex
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath
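
To compare the two approaches side by side, here is a minimal sketch that you can run without any indexed data. It assumes a made-up event in which two NodeName elements sit under a RequestType root; the field names NodeFromSpath and NodeFromRex are only illustrative. Both fields should come back as multivalue fields, and removing max_match=0 should leave rex with only the first value.

```spl
| makeresults
| eval XMLData="<RequestType><NodeName>Sample Data</NodeName><NodeName>More Data</NodeName></RequestType>"
| spath input=XMLData output=NodeFromSpath path=RequestType.NodeName
| rex field=XMLData max_match=0 "<NodeName>(?<NodeFromRex>[^<]+)</NodeName>"
| table NodeFromSpath NodeFromRex
```

The pattern [^<]+ is used rather than \w+ so that values containing spaces, such as "Sample Data", are still captured.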

2) If you are searching for the element NodeName in your events, always make sure it is present in your base search filter, i.e.
index=YourIndexName sourcetype=YourSourceType AND "<NodeName>" AND "</NodeName>"

3) It is also preferable to include in the base search any XML header fields that stay the same across the events you plan to search, such as "<RequestType>AddProductToCart</RequestType>". This way you only look at AddProductToCart XMLs and ignore all others that are not required in your search.
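
Points 2 and 3 combined could look roughly like the sketch below. The index, sourcetype, and element names are placeholders carried over from the examples in this answer, so adjust them to your own data. The idea is that the quoted strings discard unrelated events before any extraction runs, so rex (or spath) only has to process the events you actually need.

```spl
index=YourIndexName sourcetype=YourSourceType "<RequestType>AddProductToCart</RequestType>" "<NodeName>" "</NodeName>"
| rex "<NodeName>(?<MyNodeName>[^<]+)</NodeName>"
| stats count BY MyNodeName
```

With no field argument, rex runs against _raw.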

4) Finally, if a large volume of XML events is written frequently, you should extract the fields, generate stats, and push the aggregated results to a summary index as key-value pairs for faster searches. Refer to the si<stats> family of commands, where si stands for summary index and <stats> can be stats, chart, timechart, etc. (i.e. sistats, sichart, sitimechart). There is also the collect command, which gives you more control and is typically used through scheduled searches.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect
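
As a rough sketch of the collect approach, a scheduled search along the following lines could populate the summary index. The name my_summary_index is hypothetical and the index must already exist; the one-hour span and the field names event_count and MyNodeName are likewise just assumptions for illustration.

```spl
index=YourIndexName sourcetype=YourSourceType "<RequestType>AddProductToCart</RequestType>" "<NodeName>"
| rex "<NodeName>(?<MyNodeName>[^<]+)</NodeName>"
| bin _time span=1h
| stats count AS event_count BY _time MyNodeName
| collect index=my_summary_index
```

Reports can then read the much smaller summary index instead of parsing the raw XML again:

```spl
index=my_summary_index
| stats sum(event_count) AS total BY MyNodeName
```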
