Splunk Search

How to edit my regex to extract fields from an imported CSV file with extra commas in some rows?

banderson7
Communicator

So this is fun...
I need to import a CSV on a regular basis, and I have no control over the format or data in the csv. Contents include:

name1-name2-Uptime,N. California [RealBrowser],8419,100,0,23
name1-name2-Uptime,N. Virginia [RealBrowser],10062,100,0,25
name1-name2-Uptime,Chicago, IL [RealBrowser],6882,95.83300018,1,24
name1-name2-Uptime,Dallas, TX [RealBrowser],9558,100,0,23

Some of the second columns have commas, but not all, and my regex is getting thrown by them.

I've tried the following regex, but it combines the second, third and fourth columns.

^(?<check_name>[^,]+),(?<location>.*),(?<avg_response_time>[^,]+)
0 Karma
1 Solution

woodcock
Esteemed Legend

Like this:

^(?<check_name>[^,]+),(?<location>.*),(?<num1>[^,]*),(?<num2>[^,]*),(?<num3>[^,]*),(?<num4>[^,]*)$

View solution in original post

landen99
Motivator

From the data I am seeing, the fields are not separated by commas and need regex that looks for things other than commas:

 name1-name2-Uptime    N. California [RealBrowser]    8419    100    0    23
 name1-name2-Uptime    N. Virginia [RealBrowser]    10062    100    0    25
 name1-name2-Uptime    Chicago, IL [RealBrowser]    6882    95.83300018    1    24
 name1-name2-Uptime    Dallas, TX [RealBrowser]    9558    100    0    23

Based on the form of your regex for the value patterns for each field, I suggest this regex:

(?<check_name>[A-Za-z]\S+)\s+(?<location>[^\]]+\])\s+(?<avg_response_time>\d+)

But I recommend breaking the fields out a little more:

(?<check_name1>[A-Za-z]\w+)\-(?<check_name2>[A-Za-z]\w+)\-(?<state>\w+)\s+(?<location>[^\[]+) \[(?<browser>[^\]]+)\]\s+(?<avg_response_time>\d+)

https://regex101.com/r/zK5yT5/1

0 Karma

woodcock
Esteemed Legend

Like this:

^(?<check_name>[^,]+),(?<location>.*),(?<num1>[^,]*),(?<num2>[^,]*),(?<num3>[^,]*),(?<num4>[^,]*)$

banderson7
Communicator

That did it!
What's the difference between the first field and the rest? What's the * do that the + didn't?

0 Karma

landen99
Motivator

"*" is zero or more allowing the possibility of no matches for the pattern. But your data lacks the commas for those regex to work so I don't think it should work. As if you and Gregg are seeing different data than what myself and others are seeing. I'll post mine below, just in case.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Is your sample data before or after it's been indexed by Splunk? If the latter, please post the data in it's original format. I ask because your regex is expecting a comma between fields, but the data has white space between fields.

---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...