Splunk Search

How to edit my regex to extract fields from an imported CSV file with extra commas in some rows?

banderson7
Communicator

So this is fun...
I need to import a CSV on a regular basis, and I have no control over the format or data in the csv. Contents include:

name1-name2-Uptime,N. California [RealBrowser],8419,100,0,23
name1-name2-Uptime,N. Virginia [RealBrowser],10062,100,0,25
name1-name2-Uptime,Chicago, IL [RealBrowser],6882,95.83300018,1,24
name1-name2-Uptime,Dallas, TX [RealBrowser],9558,100,0,23

Some of the second columns have commas, but not all, and my regex is getting thrown by them.

I've tried the following regex, but it combines the second, third and fourth columns.

^(?<check_name>[^,]+),(?<location>.*),(?<avg_response_time>[^,]+)
0 Karma
1 Solution

woodcock
Esteemed Legend

Like this:

^(?<check_name>[^,]+),(?<location>.*),(?<num1>[^,]*),(?<num2>[^,]*),(?<num3>[^,]*),(?<num4>[^,]*)$

View solution in original post

landen99
Motivator

From the data I am seeing, the fields are not separated by commas and need regex that looks for things other than commas:

 name1-name2-Uptime    N. California [RealBrowser]    8419    100    0    23
 name1-name2-Uptime    N. Virginia [RealBrowser]    10062    100    0    25
 name1-name2-Uptime    Chicago, IL [RealBrowser]    6882    95.83300018    1    24
 name1-name2-Uptime    Dallas, TX [RealBrowser]    9558    100    0    23

Based on the form of your regex for the value patterns for each field, I suggest this regex:

(?<check_name>[A-Za-z]\S+)\s+(?<location>[^\]]+\])\s+(?<avg_response_time>\d+)

But I recommend breaking the fields out a little more:

(?<check_name1>[A-Za-z]\w+)\-(?<check_name2>[A-Za-z]\w+)\-(?<state>\w+)\s+(?<location>[^\[]+) \[(?<browser>[^\]]+)\]\s+(?<avg_response_time>\d+)

https://regex101.com/r/zK5yT5/1

0 Karma

woodcock
Esteemed Legend

Like this:

^(?<check_name>[^,]+),(?<location>.*),(?<num1>[^,]*),(?<num2>[^,]*),(?<num3>[^,]*),(?<num4>[^,]*)$

banderson7
Communicator

That did it!
What's the difference between the first field and the rest? What's the * do that the + didn't?

0 Karma

landen99
Motivator

"*" is zero or more allowing the possibility of no matches for the pattern. But your data lacks the commas for those regex to work so I don't think it should work. As if you and Gregg are seeing different data than what myself and others are seeing. I'll post mine below, just in case.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Is your sample data before or after it's been indexed by Splunk? If the latter, please post the data in it's original format. I ask because your regex is expecting a comma between fields, but the data has white space between fields.

---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

Enter the Dashboard Challenge and Watch the .conf24 Global Broadcast!

The Splunk Community Dashboard Challenge is still happening, and it's not too late to enter for the week of ...

Join Us at the Builder Bar at .conf24 – Empowering Innovation and Collaboration

What is the Builder Bar? The Builder Bar is more than just a place; it's a hub of creativity, collaboration, ...

Combine Multiline Logs into a Single Event with SOCK - a Guide for Advanced Users

This article is the continuation of the “Combine multiline logs into a single event with SOCK - a step-by-step ...