How to edit my regular expression for a multivalue field extraction with new lines?

johnmvang
Path Finder

Hello,

I need regex help. I've spent almost all day on this and only came up with the expression below, which is very sloppy. I feel it could be more efficient, and it doesn't work: when I plug it into the "I'll define my own regular expression" section of the Splunk field extractor, it doesn't do anything.

My Regex:

^Job Dependencies:\s*[([]*(\w+_\w+_\w+_\w+_\w+)[)\]]*|,\s+[([]*(\w+_\w+_\w+_\w+_\w+)[)\]]*,\n|\G\s*[([]*(\w+_\w+_\w+_\w+_\w+)[)\]]*,*

I only need the job dependencies. I know I need to turn them into a multivalue field so that the expected Splunk stats list output looks like this:

Job Name                             Job Dependencies
ABC_Job                              ABC_ABC_AB2_123_ABC123
                                     ABC_ABC_AB2_123_123ABC
                                     BCA_BCA_12A_ABC_123ABC
                                     DDD_AAA_CCC_12_123ABC

(I don't need help with the Splunk search; I'm just showing this so you know what I'm trying to achieve.)

Since the data also has a "Job Prerequisites:" section with similarly formatted data, my regex would capture that data as well, but I don't want it.

Please help. Sample data below:

Job Name :          Job ID:
ABC_Job              ADF123

Job Prerequisites: (ABC_ABC_AB2_123_ABC123, AB1_ABC_AB2_123_123ABC)

Job Dependencies: (ABC_ABC_AB2_123_ABC123, ABC_ABC_AB2_123_123ABC,
                  BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)

There's a catch: sometimes the "Job Dependencies" list contains square brackets, or has just one dependency. For example:

Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC],
                  BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)

OR

Job Dependencies: (DDD_AAA_CCC_12_123ABC)

Pretty much, I am trying to find the values containing underscores (_) that follow "Job Dependencies:". I can't get my regex to handle the line wrap or work correctly.

Any help is greatly appreciated.

Thanks,

John


gokadroid
Motivator

Ignoring all the other pieces and focusing just on the troublesome multivalued Job Dependencies, here is something you can try to see if it works for you.

Assuming each event has only one Job Dependencies: entry, which is the multivalued field, first use rex to extract the whole list into a single field jd, then split it into multiple values in multiJD. After that, mvexpand gives each value in its own row:

your query to filter the events
| rex "your rex to get the job name"
| rex field=_raw "Job Dependencies:\s*\((?<jd>[^\)]+)"
| eval multiJD=split(jd, ",")
| mvexpand multiJD
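
For anyone who wants to check the regex outside Splunk, here is a minimal Python sketch of the same two-step idea (capture the parenthesized list, then split it). It is an illustration, not SPL: the function name job_dependencies and the strip step for whitespace and square brackets are my own assumptions, added because the sample data wraps across lines and sometimes brackets a value.

```python
import re

# Sample event text copied from the question (wrapped list, one bracketed value).
event = """Job Prerequisites: (ABC_ABC_AB2_123_ABC123, AB1_ABC_AB2_123_123ABC)

Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC],
                  BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)
"""

def job_dependencies(text):
    """Hypothetical helper: return the dependency names listed in one event."""
    # Step 1: capture everything between the parentheses after the label.
    # [^)]+ also matches newlines, so a wrapped list is captured whole.
    match = re.search(r"Job Dependencies:\s*\(([^)]+)", text)
    if match is None:
        return []
    # Step 2: split on commas, then strip whitespace and square brackets
    # (the strip is an assumption added for the bracketed/wrapped samples).
    parts = (piece.strip(" \t\r\n[]") for piece in match.group(1).split(","))
    return [p for p in parts if p]

deps = job_dependencies(event)
```

Because the pattern is anchored on the literal label "Job Dependencies:", the "Job Prerequisites:" list is never captured. Note that a plain split on "," leaves the leading whitespace, line breaks, and brackets attached to each value, so some equivalent cleanup is probably needed on the Splunk side as well.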


woodcock
Esteemed Legend

Try this; it will create a multivalued field:

... | rex max_match=4 "(?ms)(?<Job_Dependency>[^\(\),\[\]\s]+)"
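
A small Python sketch of what that character class matches, in case it helps when tuning it. The class matches any run of characters that are not parentheses, commas, square brackets, or whitespace, so on a line that still contains the label, the label words are matched first. The underscore filter at the end is my own assumption, based on the question saying every dependency contains underscores.

```python
import re

# Same character class as in the rex above: a run of non-delimiter characters.
TOKEN = re.compile(r"[^(),\[\]\s]+")

line = "Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC])"

# Every run of non-delimiter characters is a match, including the label words.
tokens = TOKEN.findall(line)

# Hypothetical post-filter: keep only tokens that contain an underscore.
deps_only = [t for t in tokens if "_" in t]
```

Here tokens starts with "Job" and "Dependencies:" before the two dependency names, so a match limit such as max_match=4 can be used up partly by label text. Run against a whole event, the filter would also keep the "Job Prerequisites:" values and "ABC_Job", so the pattern still needs to be scoped to the dependencies line.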

DalJeanis
SplunkTrust

To expand on woodcock's code: here is a way to generate test data, followed by his rex and a slightly more complicated rex that you can modify as you like to eliminate any text before the dependencies.

| makeresults
| eval MyDeps = mvappend(
 "Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC], BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)",
 "Job Dependencies: ([ABC_ABC_AB2_123_123ABC], BCA_BCA_12A_ABC_123ABC, [DDD_AAA_CCC_12_123ABC])",
 "Job Dependencies: (DDD_AAA_CCC_12_123ABC)",
 "Job Dependencies: ([DDD_AAA_CCC_12_123ABC])",
 "Job Dependencies: (ABC_ABC_AB2_123_ABC123, ABC_ABC_AB2_123_123ABC, BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)")
| mvexpand MyDeps
| rename MyDeps as _raw

Everything above this point just makes some test data.

| rex max_match=10 "(?ms)(?<Job_Dep_Rex1>[^\(\),\[\]\s]+)"
| rex max_match=10 "(?ms)((?:Job Dependencies: )|(?<Job_Dep_Rex2>[^\(\),\[\]\s]+))"
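
To show why the second rex drops the label, here is a rough Python equivalent of its alternation. Assumptions: Python spells named groups as (?P<name>...), and the group name dep and the function extract_deps are mine, not from the thread. The first alternative consumes the literal label without capturing anything; only the second alternative fills the named group.

```python
import re

# First alternative swallows the label; second captures a run of
# non-delimiter characters into the named group.
PATTERN = re.compile(r"(?:Job Dependencies: )|(?P<dep>[^(),\[\]\s]+)")

def extract_deps(raw):
    """Hypothetical helper: collect the named-group captures, skipping label matches."""
    return [m.group("dep") for m in PATTERN.finditer(raw) if m.group("dep") is not None]

sample = "Job Dependencies: ([ABC_ABC_AB2_123_123ABC], BCA_BCA_12A_ABC_123ABC, [DDD_AAA_CCC_12_123ABC])"
result = extract_deps(sample)
```

This works on the test lines above because each one starts with the label. On a full event, any other text (for example "Job Name" or the prerequisites list) would still fall through to the second alternative and be captured, so further alternatives or a tighter anchor would be needed there.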
