Splunk Search

Why is my regex not matching for a multivalue field?

spike021
Explorer

I looked through quite a few posts on here and couldn't find an appropriate answer, so please bear with me.

I have events coming into Splunk in JSON format. The top-level fields are extracted fine. However, a nested map/dictionary is giving me issues. When I run a search to get the values from that inner dictionary, it works in that I get a resulting table like:

 A       B
---     ---   
 x       y
         z
         y
         z

 s       m
         n

 u       -  (- means None)

So y and z both belong to x, and occasionally there are more than two items per x. This happens for any x in A.

Since the cell in the table makes the values in B look separated by a newline, I created a regular expression that I've verified to correctly grab the logical groups for each y and z, if, for instance, they were just in a text box like this:

y
z
y
z
y
z

So the regex would properly grab the two as many times as necessary, separately.

What I want to do is pull out each pair and separate the two items into two new fields, say C and D, and then later have a table where I have C and D grouped to field A.

The regex part of the command:

rex field="A{}{}"  "(?<C>[\da-z\.-]+\.[a-z\.]{2,6})\n(?<D>\d{1,3})"

Note: A{}{} is the multivalue field B, and A is just A as in the earlier part of my example.

The issue I'm running into is that when I pipe what should be the output from that statement into the table command, I don't get anything. I have confirmed that the regex works on a site like http://regexr.com/, just as a sanity check.

So there must be something I'm missing. Maybe the initial table in my example only looks like newlines separate the two values into rows when they actually don't. With that in mind, I tried using \s as the separator rather than \n, and it still doesn't work.

Or maybe there's a super simple explanation for an obvious mistake I'm making.

Regardless I would appreciate some help very much.

Thanks in advance.

1 Solution

alemarzu
Motivator

Spike,

Are these results acceptable for you?

http://postimg.org/image/5ofc2b29v/

alemarzu
Motivator

I just realized that the regex you gave us has an invalid structure. Do you mind sharing some sample data so I can build the proper regex?

spike021
Explorer

I just added a comment with a sample event.

jkat54
SplunkTrust

It only turns invalid when he quotes like this versus

  like this<><\><><><><><><><

But yes, please provide a sample event.

spike021
Explorer

Odd formatting.

So a typical event looks something like this. The priority is to get the keys from the "IMPORTANT" dictionary, but having the values in their own field as well would be very useful if I can get this to work properly.

{
    "timestamp": "2016-01-21T14:44:28", 
    "SOME_FIELD": "etc.",
    "ANOTHER_FIELD": "...", 
    "IMPORTANT": {
        "a_string": 3,
        "another_strong": 44,
        "maybe_another...":95
    }, 
    "test": [
        [
            "something", 
            1.0
        ]
    ]
}

alemarzu
Motivator

You were right, thanks 😉

jkat54
SplunkTrust

Please provide a full search or at least the table command you are using.

spike021
Explorer

Mentioned it below, but it looks something like:

   index="myindex" | rex max_match=0 field="A" "(?<C>[\da-z\.-]+\.[a-z\.]{2,6})\n(?<D>\d{1,3})" | table "A", "C", "D"

So nothing particularly complicated, just to get data output, which isn't happening at all yet.

jkat54
SplunkTrust
   index="myindex"| rex max_match=0 field="A"  "(?<C>[\da-z\.-]+\.[a-z\.]{2,6})\n(?<D>\d{1,3})" | table "A", "C", "D"

jkat54
SplunkTrust

So you're looking for a newline in field A? What's the \n for? Are you turning the JSON into one large event using SHOULD_LINEMERGE=true? Are you using KV_MODE=json? A sample event and your props/transforms would be most helpful.
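
One way to check the newline question outside Splunk is a short Python sketch. It assumes, as speculated earlier in the thread, that the values of the multivalue field are stored as separate entries rather than as one newline-joined string. The sample values are hypothetical, and the pattern is the one from the rex command rewritten with Python's (?P<name>...) group syntax. This only mimics the matching logic; it is not a statement of how rex itself handles multivalue fields.

```python
import re

# Pattern from the rex command, using Python's named-group syntax.
PATTERN = re.compile(r"(?P<C>[\da-z\.-]+\.[a-z\.]{2,6})\n(?P<D>\d{1,3})")

# Hypothetical values of the multivalue field B (not real data).
values = ["example.com", "12", "test.org", "7"]

# Case 1: the "text box" view, where values are joined by newlines.
joined = "\n".join(values)
pairs_joined = [(m.group("C"), m.group("D")) for m in PATTERN.finditer(joined)]

# Case 2: each value is matched on its own, so no value contains "\n".
pairs_separate = [
    (m.group("C"), m.group("D"))
    for value in values
    for m in PATTERN.finditer(value)
]

print(pairs_joined)    # pairs found in the newline-joined text
print(pairs_separate)  # nothing found when values are separate
```

If the values really are separate entries, the pattern never sees a \n between them, which would be consistent with the empty table.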

spike021
Explorer

My props/transforms are default right now since it seemed like Splunk could already pull out the top-level fields, as mentioned in my other comment a moment ago.

Maybe that's the problem.

spike021
Explorer

So I actually added an example event that you might have missed.

{
     "timestamp": "2016-01-21T14:44:28", 
     "SOME_FIELD": "etc.",
     "ANOTHER_FIELD": "...", 
     "IMPORTANT": {
         "a_string": 3,
         "another_strong": 44,
         "maybe_another...":95
     }, 
     "test": [
         [
             "something", 
             1.0
         ]
     ]
 }

Originally, my idea for the absolute minimum (to at least show I'm able to retrieve that part of the JSON data) was to use | table "timestamp", IMPORTANT.key, IMPORTANT.values.

Maybe that isn't a good way to go about this?

Splunk already recognizes the rest of the fields at the top level. So if I do | table "timestamp", "ANOTHER_FIELD" then it works fine.
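
For reference, here is a minimal Python sketch of the target shape described above: one row per key/value pair of the "IMPORTANT" dictionary, each tagged with the event timestamp. It runs outside Splunk on the sample event from this thread, and the names raw_event and rows are illustrative only.

```python
import json

# Sample event copied from the thread.
raw_event = """
{
    "timestamp": "2016-01-21T14:44:28",
    "SOME_FIELD": "etc.",
    "ANOTHER_FIELD": "...",
    "IMPORTANT": {
        "a_string": 3,
        "another_strong": 44,
        "maybe_another...": 95
    },
    "test": [["something", 1.0]]
}
"""

event = json.loads(raw_event)

# The nested dictionary has arbitrary keys, so iterate over its items
# instead of addressing fixed names such as IMPORTANT.key.
important = event.get("IMPORTANT", {})
rows = [(event["timestamp"], key, value) for key, value in important.items()]

for timestamp, key, value in rows:
    print(timestamp, key, value)
```

The point of the sketch is that the keys of "IMPORTANT" vary per event, so a fixed field name like IMPORTANT.key has nothing to refer to; the keys have to be enumerated.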
