Splunk Search

Regex Help: Parse CSV with whatever it has got rather than failing on entire line

koshyk
Super Champion

Hi
We have a requirement to extract col1, col2, col3, and col4 every time, but the data may not always contain col3 onwards.
How can I write the regex so that it is forgiving and extracts whatever is present, rather than failing for the entire line?

(?<col1>[^\"]*?)\",\"(?<col2>[^\"]*?)\",\"(?<col3>[^\"]*?)\",\"(?<col4>[^\"]*?)\"

Below is the dataset:

"r1col1","r1col2"
"r1col1","r1col2","r1col3"
"r3col1","r3col2","r3col3","r3col4"
"r4col1","r4col2","r4col3","r4col4","r4col5","r4col6","r4col7"

With the above regex, the match fails for lines 1 and 2, but I would prefer to get at least col1 and col2 when the other columns are missing.

https://regex101.com/r/Bkle5V/1


elliotproebstel
Champion

How about this:
(?<col1>[^\"]*?)\",(\"(?<col2>[^\"]*?)\",)?(\"(?<col3>[^\"]*?)\",)?(\"(?<col4>[^\"]*?)\")?

This makes col2, col3, and col4 optional by wrapping each one in parentheses and appending a question mark, which means the group may occur 0 or 1 times.

https://regex101.com/r/Bkle5V/2
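
One thing to watch for, as far as I can tell: in the pattern above the comma sits at the end of each optional group, so on a short row the last value (which has no comma after it) can only be picked up by the col4 group. For example, "r1col1","r1col2" would give col1=r1col1 and col4=r1col2, with col2 left empty. If that matters, nesting the optional groups is one way around it. Below is a rough sketch in Python rather than Splunk, so it uses (?P<name>...) instead of (?<name>...); PATTERN and extract are hypothetical names made up for this illustration.

```python
import re

# Hypothetical nested variant: each later column is optional only as a
# continuation of the previous one, so a value cannot skip ahead to col4.
PATTERN = re.compile(
    r'"(?P<col1>[^"]*)"'
    r'(?:,"(?P<col2>[^"]*)"'
    r'(?:,"(?P<col3>[^"]*)"'
    r'(?:,"(?P<col4>[^"]*)")?)?)?'
)


def extract(line):
    """Return col1..col4 for one row; columns that are absent come back as None."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None


rows = [
    '"r1col1","r1col2"',
    '"r1col1","r1col2","r1col3"',
    '"r3col1","r3col2","r3col3","r3col4"',
    '"r4col1","r4col2","r4col3","r4col4","r4col5","r4col6","r4col7"',
]
for row in rows:
    print(extract(row))
```

The same nesting should carry over to the (?<name>...) syntax with escaped quotes, but I have only checked the idea in Python, not in Splunk itself. Columns beyond col4 are simply ignored.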

koshyk
Super Champion

Cheers, it works.
