Splunk Search

Help with regex to extract words before colon

snallam123
Path Finder

Events:
com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:

| rex field=_raw "^(?:[^ \n] ){6}(?P[^ ]+)" and "^(.\w?):"

I tried the above, but it isn't correct.

I need to extract these values into a new field: ServerAuditDetailAssertion, Applications, paymentRedirects, Permission, Application, assertion.

Can someone help me with this?

1 Solution

triest
Communicator

I'm not completely sure this is what you're asking for, but if I'm reading your question correctly, does this work for you?

| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

For example, the following search:

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"

produced:

Search results showing _raw with the text from above, a _time field, and foo as a multivalue field containing ServerAuditDetailAssertion, Applications, paymentRedirects, Permission, Application, and assertion

NOTE: In the above example, foo is a multivalue field. You could use an eval to join the values with a delimiter.
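One way to do that join, as a sketch: mvjoin is a standard eval function that concatenates the values of a multivalue field with a delimiter. The field name foo_joined and the ", " delimiter are illustrative choices only, and the sample is shortened to two lines.

```spl
| makeresults
| eval _raw="com.texh.log.custom.Applications: 9999:
com.texh.log.custom.Permission: 9999:"
| rex max_match=0 "\.(?<foo>[^.]+):\s+\d+:"
| eval foo_joined=mvjoin(foo, ", ")
```

With this two-line sample, foo_joined should come out as the single string "Applications, Permission", while foo itself stays multivalue.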

For those who don't know regex, here is what \.(?<foo>[^.]+):\s+\d+: says.

As a general strategy, the unique thing about the values we want is that each one starts right after a . and ends with a : followed by a number, so we use that to match just what we want.

  1. \. looks for a literal . (an unescaped . matches any character, so we have to escape it).
  2. (?<foo>...) is a named capture group; the text it matches goes into the field called foo.
  3. [^.] matches any character except a literal . Here [ ... ] is a character class, the ^ negates it, and the class itself contains a literal .
  4. The + means one or more times, so we're looking for one or more non-. characters. This is what matches the actual values.
  5. The : right after the group is what prevents earlier parts of the string from matching (e.g. texh, which is followed by a . rather than a :).
  6. \s+\d+: then requires whitespace, one or more digits, and another :, which matches the number after each name (e.g. " 9879:").
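Because [^.] also matches colons, spaces, and newlines, the pattern above relies on backtracking to stop at the right colon. A slightly stricter variant, offered only as a sketch, also excludes : and whitespace from the character class so the capture cannot run past the end of the name. The mvexpand and table commands are added here just to show one value per row; they are not part of the original answer, and the sample is shortened to two lines.

```spl
| makeresults
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>[^.:\s]+):\s+\d+:"
| mvexpand foo
| table foo
```

On this sample, the first assertion (in policy.assertion.ServerAuditDetailAssertion) is skipped because it is followed by a . rather than a :, so the rows should be ServerAuditDetailAssertion and assertion.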

The key here is that I am assuming Applications, paymentRedirects, etc. could be anything and that you want all of them. If you only wanted these specific values, you could change the regex to look for hard-coded values instead of the generic "not a ." class:

| makeresults 
| eval _raw="com.texh.servers.policy.assertion.ServerAuditDetailAssertion: 9879:
com.texh.log.custom.Applications: 9999:
com.texh.log.custom.paymentRedirects: 8800:
com.texh.log.custom.Permission: 9999:
com.texh.logs.system.Application: 8877:
com.texh.logs.policy.assertion: 0880:"
| rex max_match=0 "\.(?<foo>(?:ServerAuditDetailAssertion|Applications|paymentRedirects|Permission|Application|assertion)):\s+\d+:"


gcusello
SplunkTrust

Hi,
try this regex:

(?ms).*ServerAuditDetailAssertion:\s+(?<ServerAuditDetailAssertion>[^:]*):\s*.*Applications:\s+(?<Applications>[^:]*):\s*.*paymentRedirects:\s+(?<paymentRedirects>[^:]*):\s*.*Permission:\s+(?<Permission>[^:]*):\s*.*Application:\s+(?<Application>[^:]*):\s*.*assertion:\s+(?<assertion>[^:]*)

that you can test at https://regex101.com/r/6Xa7NE/1

So you'll have, e.g. a stat for each Application:

index=my_index
| rex "(?ms).*ServerAuditDetailAssertion:\s+(?<ServerAuditDetailAssertion>[^:]*):\s*.*Applications:\s+(?<Applications>[^:]*):\s*.*paymentRedirects:\s+(?<paymentRedirects>[^:]*):\s*.*Permission:\s+(?<Permission>[^:]*):\s*.*Application:\s+(?<Application>[^:]*):\s*.*assertion:\s+(?<assertion>[^:]*)"
| stats values(ServerAuditDetailAssertion) AS ServerAuditDetailAssertion values(paymentRedirects) AS paymentRedirects values(Permission) AS Permission values(Applications) AS Applications values(assertion) AS assertion BY Application

You can of course use other functions such as sum, avg, etc. instead of values, but I don't know what you need.
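As a sketch of swapping in other functions: the search below assumes the numbers after each name are meaningful numeric measures, which this thread does not confirm. For brevity it uses two short rex calls rather than the long regex above, and the output names avg_Permission and max_Permission are illustrative only.

```spl
index=my_index
| rex "Permission:\s+(?<Permission>\d+):"
| rex "Application:\s+(?<Application>\d+):"
| stats avg(Permission) AS avg_Permission max(Permission) AS max_Permission BY Application
```

Note that "Application:" does not match inside "Applications:" because the s sits between the name and the colon, so the second rex picks up the number from the Application line only.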

Bye.
Giuseppe

