Splunk Search

Regex Help: Parse CSV with whatever it has got rather than failing on entire line

koshyk
Super Champion

Hi,
We have a requirement to extract col1, col2, col3 and col4 with a regex every time, but the data may not always contain col3 onwards.
How can we write the regex so that it is forgiving and extracts whatever is present, rather than failing for the entire line?

(?<col1>[^\"]*?)\",\"(?<col2>[^\"]*?)\",\"(?<col3>[^\"]*?)\",\"(?<col4>[^\"]*?)\"

Below is the dataset:

"r1col1","r1col2"
"r1col1","r1col2","r1col3"
"r3col1","r3col2","r3col3","r3col4"
"r4col1","r4col2","r4col3","r4col4","r4col5","r4col6","r4col7"

The above regex fails for line 1 and line 2, but we would prefer to get at least col1 and col2 if the other columns are not found.

https://regex101.com/r/Bkle5V/1
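
For anyone reproducing this outside regex101, here is a minimal Python sketch of the behaviour described above. It assumes Python's `re` module, which spells named groups `(?P<name>...)` rather than `(?<name>...)` and needs no backslash before the quotes in a raw string. The names `STRICT`, `ROWS` and `strict_extract` are made up for illustration.

```python
import re

# Python spelling of the four-column pattern from the post.
STRICT = re.compile(
    r'(?P<col1>[^"]*?)","(?P<col2>[^"]*?)","(?P<col3>[^"]*?)","(?P<col4>[^"]*?)"'
)

# The sample rows from the post.
ROWS = [
    '"r1col1","r1col2"',
    '"r1col1","r1col2","r1col3"',
    '"r3col1","r3col2","r3col3","r3col4"',
    '"r4col1","r4col2","r4col3","r4col4","r4col5","r4col6","r4col7"',
]


def strict_extract(row):
    """Return the named groups as a dict, or None when the row does not match."""
    match = STRICT.search(row)
    return match.groupdict() if match else None


for row in ROWS:
    # Rows with fewer than four columns yield None; the others yield col1..col4.
    print(strict_extract(row))
```

Because all four columns are mandatory in the pattern, a row with two or three columns produces no match at all, which is the "failing for the entire line" symptom.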


elliotproebstel
Champion

How about this:
(?<col1>[^\"]*?)\",(\"(?<col2>[^\"]*?)\",)?(\"(?<col3>[^\"]*?)\",)?(\"(?<col4>[^\"]*?)\")?

This makes col2, col3, and col4 optional by wrapping each of them in parentheses and appending a question mark, which indicates that the group may occur 0 or 1 times.

https://regex101.com/r/Bkle5V/2
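
One thing that may be worth checking with the pattern above: the comma sits at the end of the col2 and col3 groups, and the last value in a row has no comma after it. In Python's `re`, at least, that means the last value of a short row falls through to the col4 group instead of col2 or col3. The sketch below compares the posted pattern with a variant that puts the comma at the start of each optional group. It is only an illustration under stated assumptions: Python's `(?P<name>...)` spelling is used, and `POSTED`, `VARIANT` and `extract` are hypothetical names, not part of the original answer.

```python
import re

# The posted pattern in Python spelling (comma at the END of the optional groups).
POSTED = re.compile(
    r'(?P<col1>[^"]*?)",("(?P<col2>[^"]*?)",)?("(?P<col3>[^"]*?)",)?("(?P<col4>[^"]*?)")?'
)

# Variant: comma at the START of each optional, non-capturing group.
VARIANT = re.compile(
    r'"(?P<col1>[^"]*)"'
    r'(?:,"(?P<col2>[^"]*)")?'
    r'(?:,"(?P<col3>[^"]*)")?'
    r'(?:,"(?P<col4>[^"]*)")?'
)


def extract(pattern, row):
    """Return only the named groups that actually matched."""
    match = pattern.search(row)
    if match is None:
        return {}
    return {name: value for name, value in match.groupdict().items() if value is not None}


rows = [
    '"r1col1","r1col2"',
    '"r1col1","r1col2","r1col3"',
    '"r3col1","r3col2","r3col3","r3col4"',
    '"r4col1","r4col2","r4col3","r4col4","r4col5","r4col6","r4col7"',
]

for row in rows:
    print("posted :", extract(POSTED, row))
    print("variant:", extract(VARIANT, row))
```

In Splunk's `(?<name>...)` spelling, with the quotes escaped as in the thread, the variant would read:
\"(?<col1>[^\"]*)\"(?:,\"(?<col2>[^\"]*)\")?(?:,\"(?<col3>[^\"]*)\")?(?:,\"(?<col4>[^\"]*)\")?
Columns beyond the fourth are ignored by both patterns.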

koshyk
Super Champion

Cheers, it works.
