Splunk Search

How to extract the value of a field when the value contains quotes (") inside?

icquintos
New Member

I have an index with multiple fields; however, one of my fields can contain multiple quotes.

Id="0001",
Message="The data "test" is not present",
Result="This is a result"

When I check the fields, I have Id, Message and Result. However, the value of Message is only The data (it is cut off at the inner quote). I want to extract the whole The data "test" is not present as the value of Message.

I have checked many questions :
- https://answers.splunk.com/answers/29961/how-can-i-extract-a-quoted-field-value-that-includes-a-quot...
- https://answers.splunk.com/answers/210504/how-to-deal-with-quotes-in-field-values-that-are-c.html?ut...

but could not find the answer I was looking for. Thank you.

Richfez
SplunkTrust

One way, if it's always formatted like the above (Message=<stuff up to the comma>, with no comma INSIDE the quotes), is shown in this run-anywhere example.

| gentimes start=1/1/2016 end=1/2/2016 | eval _raw="Id=\"0001\",
Message=\"The data \"test\" is not present\",
Result=\"This is a result\""
|rex field=_raw "Message=\"(?<MyValue>[^,]*)\","

Which outputs, among other things, MyValue as The data "test" is not present

For your needs, you'll probably just need the |rex ...

icquintos
New Member

Hi rich;

Thank you for your answer.

However, I don't understand the syntax of the |rex ... part. Is there a guide on how to write it?

Richfez
SplunkTrust

There are a variety of places to look for help. I'll point out a few at the bottom of this post after I've at least explained what I had done.

Let me first explain how the _raw field was created (the eval). This will be pertinent to the explanation of the rex.

When I created the _raw field, every quote that shows up between the opening quote and the closing quote had to be escaped. That's just a way to tell the system you want an actual quote character inside the string you are making instead of "closing" the string off. So you put a backslash in front of it: "This string has a quote mark \" right in the middle of it". If you turn that into a field, you'll get

This string has a quote mark " right in the middle of it

If I hadn't used that backslash, the field would be

This string has a quote mark

and the system would be confused about what in the world the "right" and "in" and "the" ... were all about.
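If you want to see the escaping rule in action outside Splunk, here is a minimal sketch in Python. Python is my assumption for illustration only (the search above is SPL), but its double-quoted string literals treat \" the same way, and the variable names are made up.

```python
# Illustration only: Python string literals, not SPL.

# The backslash keeps the inner quote mark as part of the string.
escaped = "This string has a quote mark \" right in the middle of it"
print(escaped)

# Without the backslash, the literal would end at the second quote mark,
# leaving only this much in the string:
truncated = "This string has a quote mark "
print(truncated)
```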

OK, so on to the rex. The rex is |rex field=_raw "Message=\"(?<MyValue>[^,]*)\","

The Message= is a literal string which says to search through the field _raw and look for the string "Message=". That's my anchor - it's me telling the rex where in the entire _raw field to start paying attention. Likewise, the very tail end has a comma (,). That is a string literal, just the same as Message=. So, whatever part we're telling rex to look at starts with Message= and ends in a comma. (The VERY last quote mark closes the entire string, matching the one in front of the M.)

Now, if you look closely, you'll see the escaped quotes. It's actually Message=\" (something) \"," Well, we know what the escaping does - this will match Message=" (something) ", right?

Now, what's (something)? It's a named capture group. That's the (?<MyValue> ... ) part - the parentheses, the question mark and the name in angle brackets. What the rex captures is determined by the contents of this capture group, and <MyValue> is the name we're giving it. Now, [...] means a set of characters (a character class). So if you wrote [abc] here, it would match an a, a b or a c character. Sometimes you'll see [a-zA-Z0-9], which means all characters from a to z, from A to Z, and 0 to 9. If the first character inside the brackets is a ^ though, it means NOT those. So [^a] means a character that is NOT a. So what I've written is [^,] which means a character that isn't a comma. Follow so far? There's only one tiny piece left: the * after the [^,], which means to match zero or more of them. (+ means one or more.)
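Those pieces can be tried one at a time. This is a small sketch using Python's re module, which is my assumption for illustration (rex is not involved here); the sample strings and variable names are made up.

```python
import re

# [abc] matches any single a, b or c.
abc_hits = re.findall(r"[abc]", "cabbage")

# [^,]* matches zero or more non-comma characters,
# so it stops right before the first comma.
up_to_comma = re.match(r"[^,]*", "one,two").group()

# * is satisfied by zero characters; + needs at least one.
star_on_comma = re.match(r"[^,]*", ",two").group()
plus_on_comma = re.match(r"[^,]+", ",two")

print(abc_hits, repr(up_to_comma), repr(star_on_comma), plus_on_comma)
```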

So the English translation of "Message=\"(?<MyValue>[^,]*)\"," is to find a string starting with Message=" and read after it all (zero or more) characters that aren't commas, assigning what you read into the field "MyValue". What was read must be followed by a quote mark and a comma.
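To check that reading of the pattern outside Splunk, here is a sketch in Python's re module. That is an assumption for illustration: Python spells the named group (?P<MyValue>...) where rex uses (?<MyValue>...), and the second sample event with a comma inside the message is made up to show the "no comma INSIDE the quotes" limitation.

```python
import re

# Same pattern as the rex, in Python's named-group spelling.
pattern = re.compile(r'Message="(?P<MyValue>[^,]*)",')

# The sample event from the question.
raw = 'Id="0001",\nMessage="The data "test" is not present",\nResult="This is a result"'
my_value = pattern.search(raw).group("MyValue")
print(my_value)

# Hypothetical event with a comma inside the message:
# the capture stops before the quote-comma pair in the middle.
raw_with_comma = 'Message="The data "test", is not present",'
cut_short = pattern.search(raw_with_comma).group("MyValue")
print(cut_short)
```

The inner quotes are harmless because [^,] accepts them; only a comma ends the capture.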

TADA!

Now, some links.

I have found regexone.com to be fabulous. My 10 year old daughter and I worked through most of the lessons in probably 30 minutes split over two evenings; she enjoyed it and picked up quite a bit.

Another great tutorial - probably best AFTER you've gone through a bunch of the first one (if not all, at least the first 6 or 8 sessions), is rexegg.com. It's a bit more work to start out in, that's why I recommend it second.

Once you've gotten the hang of it, you can start playing around with somewhere like Regex 101. I have found this most useful when you have "real" problems to solve, like creating semi-bogus extractions for random text you find in your Splunk. Paste it into the TEST STRING portion and start fiddling away!

icquintos
New Member

Hi rich;

I have tried this using Splunk Enterprise; however, it is not working.

Search Bar :
SPLUNKDATA 1006 | rex field=_raw "Message=\"(?[^,]*)\","

Data :
Message=" SPLUNKDATA 1006 " Test101",

Field :
Message
Value :
SPLUNKDATA 1006

It does not seem to extract SPLUNKDATA 1006 " Test101

Please help, Thank you.

Richfez
SplunkTrust
| gentimes start=1/1/2016 end=1/2/2016 | eval _raw="Id=\"0001\", Message=\" SPLUNKDATA 1006 \" Test101\", Result=\"This is a result\"" |rex field=_raw "Message=\"(?<MyValue>[^,]*)\","

That returns a _raw that looks like

Id="0001", Message=" SPLUNKDATA 1006 " Test101", Result="This is a result" 

Which definitely has an extra quote mark right in the middle just like your example.

And it also returns MyValue of

SPLUNKDATA 1006 " Test101 

So I'm not sure what's wrong.
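As a cross-check that does not depend on Splunk at all, the same pattern can be run against that event text in Python. This is only a sketch: Python's re module and its (?P<MyValue>...) spelling are my assumptions for illustration, not what rex uses.

```python
import re

# The event text from the run-anywhere search above.
raw = 'Id="0001", Message=" SPLUNKDATA 1006 " Test101", Result="This is a result"'

pattern = re.compile(r'Message="(?P<MyValue>[^,]*)",')
match = pattern.search(raw)

# The capture keeps the leading space that follows the opening quote.
my_value = match.group("MyValue")
print(repr(my_value))
```

So the pattern itself copes with the extra quote mark; if your result differs, the difference is probably in the event text or in the search as it was actually typed.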

So two things here please: First, paste in my search at the top of this comment as it is, and tell me if you get my results with respect to what _raw looks like and what MyValue looks like.

Second, please paste in the _raw event you get (for that one, anyway) when you search for just SPLUNKDATA 1006. Perhaps there's something else I'm missing. Please be SURE to use the code (101010) button to format it!

Also, do you have control over the logging? It is strongly discouraged, and possibly outright invalid, to have an unescaped quote character in the middle of a quoted string like that. If you are using "quotes" to enclose strings inside what appears to be comma-delimited text, then you have to escape any quotes inside them - it's only right and proper and prevents problems like this. Nothing I know of will handle this situation properly out of the box, though several things (like Splunk) can be modified to handle it. The right way to write that portion of the log line is one of:
a) Do NOT use the quote mark in the middle.
b) Use a DIFFERENT character to quote with (like a single quote?).
c) Failing either of those, escape it like
SPLUNKDATA 1006 \" Test101
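To make option c) concrete, here is a sketch of what escaping on the writing side and parsing on the reading side could look like. It is in Python, which is an assumption since I don't know what generates your logs, and quote_value is a hypothetical helper, not part of Splunk or any logging library.

```python
import re

def quote_value(value: str) -> str:
    """Wrap value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

message = 'SPLUNKDATA 1006 " Test101'
line = 'Id="0001", Message=' + quote_value(message) + ', Result="This is a result"'
print(line)

# A quoted value is a run of "anything except quote or backslash"
# or "backslash followed by any character", up to the closing quote.
pattern = re.compile(r'Message="(?P<value>(?:[^"\\]|\\.)*)"')
captured = pattern.search(line).group("value")

# Drop the escaping backslashes to recover the original text.
recovered = re.sub(r"\\(.)", r"\1", captured)
print(recovered)
```

With the quotes escaped this way, the value can safely contain commas as well, because the pattern no longer relies on the comma to find the end.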

icquintos
New Member

Hi Rich;

I have inserted

SPLUNKDATA 1006 | rex field=_raw "Message=\"(?[^,]*)\","

in the search bar. Is there something wrong with that? hmm.

I'm getting my logs from a directory on my PC, so " marks are common.

Is there any concern with that?

Richfez
SplunkTrust

The editor has likely stripped characters out of what you pasted. The 5th button from the left is the code button: 10101. If you use that to paste your snippets in then the editor won't break all your special characters.
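One reason I suspect the editor: the pattern as it shows up in your post, with (?[^,]*) and no group name, is not a valid regular expression in at least one common engine. A quick check in Python's re module follows. Python is an assumption for illustration, the compiles helper is made up, and what Splunk itself reports for such a pattern may differ.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The pattern as it appears in the post, with the group name missing.
as_pasted = compiles(r'Message="(?[^,]*)",')

# The pattern with the group name present (Python spelling of the named group).
with_name = compiles(r'Message="(?P<MyValue>[^,]*)",')

print(as_pasted, with_name)
```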

Let's back up a step. Can you please - making sure to use the code button as described - paste in a few of these events (both ones in Splunk and a few from the original logs) that you have that you are trying to work with? I don't think we need a lot of them, but if you have a representative sampling of ones with no quotes and ones with quotes, that would be great. So open your original log files and grab a few lines out of there, then go to Splunk and grab a few there as well.

Also, what program, batch file or script is this that is generating the log files in the first place?

Your statement about "getting logs from a directory on your PC so quote marks are common" is unclear and I don't understand what one has to do with the other.

Now, a lot of this confusion may simply be that, without the code button described above, the formatting and special characters disappeared from all of your examples, thus confusing me and anyone else who's trying to help. I would appreciate you re-pasting all those examples using that code button to capture the formatting and special characters properly.

I know that will take a bit of time, but if you could do this and double-check that after you post it all still appears to be the same as what you were pasting, that would help us IMMENSELY in getting this solved for you!
