Getting Data In

Removing line breaks within XML log data

cparham
Explorer

This is related to http://answers.splunk.com/questions/2141/xml-log-source-type

How would I remove line breaks found inside elements <Sql_Text>...</Sql_Text>. Many of the SQL statements inside <Sql_Text> tags have hard line breaks often at very undesirable places.

For example:

<Sql_Text>UPDATE APP_USR SET CONTACT_ID = 2 WHERE APP_  
ID = 64  
</Sql_Text>

I need it to look like this to make searching possible:

<Sql_Text>  
UPDATE APP_USR SET CONTACT_ID = 2 WHERE APP_ID = 64  
</Sql_Text>  

or even this

<Sql_Text>UPDATE APP_USR SET CONTACT_ID = 2 WHERE APP_ID = 64</Sql_Text>  

Thanks.

Tags (1)
1 Solution

Lowell
Super Champion

I don't think this is possible to do within a single regular expression. So you'll have to either:

  1. Fix this externally using a scripted input. (This is the most complicated approach and the only one that will yield correct index-time results.)
  2. Fix your data a search time using a sequence of commands (which could be consolidated using a macro for simplicity.

Here is a search-time solution that would get reformat your text the way you are looking for:

| rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post

So for your example above, if you want to search for APP_ID 64, then you would need a search like so:

sourcetype=oracle_audit_xml | rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post | search "APP_ID = 64"

View solution in original post

Lowell
Super Champion

I don't think this is possible to do within a single regular expression. So you'll have to either:

  1. Fix this externally using a scripted input. (This is the most complicated approach and the only one that will yield correct index-time results.)
  2. Fix your data a search time using a sequence of commands (which could be consolidated using a macro for simplicity.

Here is a search-time solution that would get reformat your text the way you are looking for:

| rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post

So for your example above, if you want to search for APP_ID 64, then you would need a search like so:

sourcetype=oracle_audit_xml | rex "(?s)^(?<_sqltxt_pre>.*?<Sql_Text>)(?<Sql_Text>.*?)(?<_sqltxt_post></Sql_Text>.*)$" | rex field=Sql_Text mode=sed "s/[\r\n]//g" | eval _raw=_sqltxt_pre . trim(Sql_Text) . _sqltxt_post | search "APP_ID = 64"

Lowell
Super Champion

(?s) tells the regular expression engine that . (dot) can match anything. (Without this the expression .* will not match across the end of a line, which we need in this case since your event spans multiple lines.) Basically we are breaking each event into 3 pieces, modifying the middle (e.g. your "Sql_Text") and then joining the 3 pieces back into one big event. I thought I added a basic explanation, but would appearer that I forgot. 😉 Sorry about that. Best of luck! I recommend http://www.regular-expressions.info/ for better understanding regular expressions.

0 Karma

cparham
Explorer

Thanks. The search-time solution appears to work.
[although I'm a little confused by the first rex expression 🙂 I think I follow most of it except the first group match (?s)]

0 Karma

Lowell
Super Champion

Is there a constant-width wrapping going on here? The problem with completely stripping out end-of-line characters like this is that there could be times where there are legitimate line breaks in the actual SQL statement; so you could end up with values lines being mashed together that shouldn't be.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...