Splunk Search

How do I extract and separate an arbitrary number of field values with regex?

andrew207
Path Finder

input:

myCommand -myArgs taska taskb taskc
myCommand -myArgs taska
myCommand -myArgs taska taskb taskc taskd

What's the best way to capture this? At the moment I'm using the regex

myCommand (?P<args>\-\w+)(\s(?P<tasks>[A-Za-z0-9\s]+))

It results in

1. tasks: "taska taskb taskc"
2. tasks: "taska"
3. tasks: "taska taskb taskc taskd"

How would I go about separating these or making them individual? I want to aggregate by "taska" and draw some nice graphs more easily.

1 Solution

jeffland
SplunkTrust

I don't think this is possible with regex, in the sense of having an arbitrary number of capturing groups. You could of course define a large number of capturing groups and make them optional, something like

(?<task1>\w+)(?:[ ])?(?<task2>\w+)?(?:[ ])?(?<task3>\w+)?(?:[ ])?(?<task4>\w+)?(?:[ ])?(?<task5>\w+)?

It shouldn't matter if the later groups don't match.
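To make the optional-group idea concrete, here is a minimal sketch in Python rather than Splunk. It assumes the sample events from the question; the `myCommand -\w+ ` prefix and the `extract` helper are additions for illustration, and the groups are written as `(?P<name>...)` because that is the named-group syntax Python's `re` module accepts.

```python
import re

# Hypothetical prefix "myCommand -\w+ " added so the match starts at the tasks.
# Python's re module needs (?P<name>...) where the post uses (?<name>...).
PATTERN = re.compile(
    r"myCommand -\w+ "
    r"(?P<task1>\w+)(?:[ ])?(?P<task2>\w+)?(?:[ ])?(?P<task3>\w+)?"
    r"(?:[ ])?(?P<task4>\w+)?(?:[ ])?(?P<task5>\w+)?"
)


def extract(line):
    """Return the named groups as a dict; groups that did not match are None."""
    match = PATTERN.search(line)
    return match.groupdict() if match else {}


fields = extract("myCommand -myArgs taska taskb taskc")
print(fields)
```

Groups beyond the last task present in the event simply come back empty (`None` in Python), which is the sense in which the unmatched groups don't matter.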

But do you really need all those tasks as unique fields? What about the order they appear in, will that be fixed? Or, what if one event has task1=ab task2=cd and another event has task1=cd? With positional fields you won't directly see that these two events both have a task "cd", but it is up to you to judge your needs.

I would suggest you keep them as one field the way you have them now, and run makemv tasks to turn it into a multivalue field. That will allow you to compare events more easily and check which events contain which tasks.
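The effect of treating tasks as a multivalue field can be sketched outside Splunk. The Python below is only an illustration of the idea, not Splunk's implementation: it captures the whole task list as one field, splits it on whitespace, and counts each task across events. The `EVENTS` list and `task_counts` helper are hypothetical names, and the character class is written as `A-Za-z` (a range of `A-z` would also admit a few punctuation characters).

```python
import re
from collections import Counter

# Sample events taken from the question.
EVENTS = [
    "myCommand -myArgs taska taskb taskc",
    "myCommand -myArgs taska",
    "myCommand -myArgs taska taskb taskc taskd",
]

# One capture for the argument, one for the whole space-separated task list.
PATTERN = re.compile(r"myCommand (?P<args>-\w+)\s(?P<tasks>[A-Za-z0-9\s]+)")


def task_counts(events):
    """Count how many events mention each task."""
    counts = Counter()
    for event in events:
        match = PATTERN.search(event)
        if match is None:
            continue
        # Splitting on whitespace plays the role of the multivalue field.
        counts.update(match.group("tasks").split())
    return counts


counts = task_counts(EVENTS)
print(counts)
```

Because every task ends up as a value of the same field, "taska" is counted the same way regardless of its position in the event.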

andrew207
Path Finder

makemv suited my purposes.

An arbitrary number of fields should be scalable. Your solution would be a bit messy if there were thousands of "tasks". Thanks though!

jeffland
SplunkTrust

And that's where regular expressions hit their limit - they don't support an arbitrary number of capturing groups (not as far as I know, and a quick search appears to confirm this).
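The limit is easy to see in a small experiment. The sketch below uses Python's `re` module as a stand-in (other regex engines may behave differently): a capturing group placed inside a repetition keeps only its last match, so the usual workaround is to capture the whole list once and then match repeatedly over it. The variable names are illustrative.

```python
import re

line = "myCommand -myArgs taska taskb taskc taskd"

# A named group inside a repetition: the group is overwritten on each pass,
# so only the final iteration's text is kept.
repeated = re.search(r"-\w+(?:\s(?P<task>\w+))+", line)
last_only = repeated.group("task")

# Workaround: capture the whole tail once, then match repeatedly over it.
tail = re.search(r"-\w+\s(?P<tasks>.*)", line).group("tasks")
all_tasks = re.findall(r"\w+", tail)

print(last_only)
print(all_tasks)
```

This is the same two-step shape as capturing one tasks field and splitting it afterwards.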

dineshraj9
Builder

If you always have exactly 4 tasks, you can extract them separately into 4 columns -

(?P<args>\-\w+)\s+(?P<taskA>\w+)\s+(?P<taskB>\w+)\s+(?P<taskC>\w+)\s+(?P<taskD>\w+)
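As a quick check of how this fixed-width pattern behaves on the sample events from the question, here is a small Python sketch (the `extract_four` helper is a hypothetical name, and the hyphen is left unescaped, which is equivalent here). Since all four task groups are mandatory, events with fewer tasks do not match at all.

```python
import re

# Same pattern as above; every task group is required.
FOUR_TASKS = re.compile(
    r"(?P<args>-\w+)\s+(?P<taskA>\w+)\s+(?P<taskB>\w+)"
    r"\s+(?P<taskC>\w+)\s+(?P<taskD>\w+)"
)


def extract_four(line):
    """Return the named groups, or None when the event has fewer than 4 tasks."""
    match = FOUR_TASKS.search(line)
    return match.groupdict() if match else None


four = extract_four("myCommand -myArgs taska taskb taskc taskd")
three = extract_four("myCommand -myArgs taska taskb taskc")
one = extract_four("myCommand -myArgs taska")

print(four)
print(three, one)
```

So this approach fits data where the number of tasks is fixed, and the first two sample events would yield no fields.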