Splunk Search

Conditional Rex Extraction with multiple extractions

brajaram
Communicator

I have events where each event outputs a large string of text.

Sample Text:

{"userDetails":{"uuid": "Lots of different values and fields" ,"offlineString":"firstName:NAME|lastName:NAME|OTHER FIELDS","much more info",\"subscriberFirstName\":\"NAME\",\"subscriberLastName\":\"NAME\","tons more data"}

A more general explanation of our structure is: (FirstName......FirstName.....SubscriberName.....FirstName....FirstName.....FirstName.....FirstName....SubscriberName)

There can be any number of FirstName entries between each subscriberName, and any number of subscriberName entries in a single Splunk event. We've identified the cause: missing line-break definitions in our props.conf, which cause multiple JSON events to be joined together in Splunk.

As a result, I'm trying to use regex solutions to answer questions while these events are connected together. For the problem I'm tackling right now, I'm trying to find a count/percentage of errors, where a FirstName and subscriberFirstName that aren't equal count as an error. By extracting the fields I can compare them for equality and see what ratio of events are throwing this error, or at least that's my thought process.

I believe the following two rex commands should capture the fields properly:

rex "\"firstName:(?<firstName>.*?)\|"

rex "\"subscriberFirstName\\\":\\\"(?<subscriberFirstName>.*?)\\\""

The second rex doesn't work properly but does work when I put it into an online regex tool. The first one captures name properly.

I'm trying to find a rex that can capture both first names that exist. I can write the individual rex extractions for each field, but I want to get it as a pair - and I only want subscriberfirstname IF firstname is prior to it.

The challenge is that events can have multiple firstName values, but every subscriberFirstName has a prior firstName. So while I can capture each one separately, is there a way to capture both together but as separate fields?


nick405060
Motivator

Here. _ fields are a little tricky so I would eval/rename them like I did here.

index=myindex
| eval no_referrer_regex="MYREGEX1"
| eval referrer_regex="MYREGEX2"
| eval regex=if(_time < 1579250700, no_referrer_regex, referrer_regex)
| eval raw=_raw
| map maxsearches=10000 search="| makeresults | eval mapped_raw=\"$$raw$$\" | rex field=mapped_raw \"$$regex$$\""
| table pst pst_epoch id action path num desc browser referrer

A second approach would just be to use ad-hoc searches in SimpleXML to set token values.


niketn
Legend

@brajaram, there might be an easier way to extract the fields, since your data seems to be JSON. However, since you have changed the data instead of anonymizing it, we cannot confirm whether spath will be able to extract fields from it.

<YourBaseSearch>
| eval _raw=replace(_raw,"\\\\\"","\"")
| spath
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

brajaram
Communicator

So normally spath would be a good idea. However, our logs are currently in a problematic state where multiple JSON events are connected together in a single Splunk event, which makes it much more difficult. We've identified the issue (a lack of proper line-break definitions in props.conf) and we're working on setting up proper line breaks to split the events up, but in the meantime I'm trying to use regex solutions as a workaround.


niketn
Legend

Sure whatever works... Seems like you have found your workaround until then 🙂


mayurr98
Super Champion

hey @brajaram

Try this regex:

| rex field=_raw "firstName:(?P<firstName>[^|]+).*subscriberFirstName\\\\\":\\\\\"(?<subscriberFirstName>[^\\\"]+)\\\\"

Let me know if this helps!

brajaram
Communicator

That worked great, thanks! Still need to modify it more to make it work properly but it definitely has been helpful - the messy nature of our logs means any answer will only be a start, but this is exactly what I was looking for.


micahkemp
Champion

Do you want this to capture firstName only if subscriberFirstName also exists? It seemed to me you wanted to capture firstName always, and subscriberFirstName when available.


brajaram
Communicator

So the way our logs are structured is: ( FirstName......FirstName.....SubscriberName.....FirstName....FirstName.....FirstName.....FirstName....SubscriberName)

There can be any number of FirstName entries between each subscriberName, and any number of subscriberName entries in a single Splunk event. We've identified the cause: missing line-break definitions in our props.conf, which cause multiple JSON events to be joined together in Splunk.

As a result, I'm trying to use regex solutions to answer questions while these events are connected together. For the problem I'm tackling right now, I'm trying to find a count/percentage of errors, where a FirstName and subscriberFirstName that aren't equal count as an error. By extracting the fields I can compare them for equality and see what ratio of events are throwing this error, or at least that's my thought process.


micahkemp
Champion

Excellent description. You might consider adding it to the question so that others have an easier time determining what the solution means and why it was needed.


brajaram
Communicator

What I was looking for is the firstname::subscriber firstname pairs, which I am able to get from that query. I can filter the initial search to always have the subscriberName show in events.

Events can be structured very oddly. An event can have a structure like: "First Name...First Name...Subscriber Name...First Name...First Name...Subscriber Name". There can never be a subscriber name without a first name, but the inverse is possible. What I wanted to do is pull out the pairs of values, then compare each specific pair for equality and generate statistics from that (inequality is an error and we want to track it). Your modification has been a huge first step in that direction for us.
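
For anyone following along, the pair-and-compare idea described above could be sketched roughly like this. It is only a sketch under several assumptions: the sample names (Ann, Bob, Cat, Dan) and the helper fields cleaned, pair, first, subscriber, is_error, pairs, errors and error_pct are made up for illustration; each subscriberFirstName is assumed to pair with the nearest firstName before it; and it assumes that stripping every backslash from the event is acceptable, which sidesteps most of the escaping in rex.

```spl
| makeresults
| eval _raw="{\"offlineString\":\"firstName:Ann|lastName:A\"}{\"offlineString\":\"firstName:Bob|lastName:B\",\\\"subscriberFirstName\\\":\\\"Bob\\\"}{\"offlineString\":\"firstName:Cat|lastName:C\",\\\"subscriberFirstName\\\":\\\"Dan\\\"}"
``` Assumption: dropping all backslashes is safe for this data ```
| eval cleaned=replace(_raw, "\\\\", "")
``` max_match=0 keeps every match; the negative lookahead stops a match from skipping over a later firstName ```
| rex field=cleaned max_match=0 "(?s)firstName:(?<firstName>[^|]+)(?:(?!firstName:).)*?subscriberFirstName\":\"(?<subscriberFirstName>[^\"]+)"
``` Both groups are captured in the same match, so the two multivalue fields line up index by index ```
| eval pair=mvzip(firstName, subscriberFirstName, "::")
| where isnotnull(pair)
| mvexpand pair
| eval first=mvindex(split(pair, "::"), 0), subscriber=mvindex(split(pair, "::"), 1)
| eval is_error=if(first!=subscriber, 1, 0)
| stats count AS pairs, sum(is_error) AS errors
| eval error_pct=round(100 * errors / pairs, 2)
```

A firstName with no subscriberFirstName after it (Ann in the sample) is skipped rather than counted, which matches the "pairs only" requirement. A name that itself contains "::" would break the split step, so a different delimiter may be needed on real data.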


micahkemp
Champion

Your regexes get gross in rex, because of slashies, but they are doable. I'm not sure what you mean by "capture both together but in different fields", but this will capture both (when they exist), and put both values in one field after rex, in SPL:

| makeresults | eval _raw="{\"userDetails\":{\"uuid\": \"Lots of different values and fields\" ,\"offlineString\":\"firstName:NAME|lastName:NAME|OTHER FIELDS\",\"much more info\",\\\"subscriberFirstName\\\":\\\"NAME\\\",\\\"subscriberLastName\\\":\\\"NAME\\\",\"tons more data\"}"
| rex "firstName:(?<firstName>[^|]+)"
| rex "subscriberFirstName\\\\\":\\\\\"(?<subscriberFirstName>[^\\\]+)"
| eval firstNames=mvappend(firstName, subscriberFirstName)
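
One caveat with the search above: by default rex keeps only the first match in each event, so with several JSON objects joined together only the first firstName is returned. A minimal sketch of collecting all of them with max_match=0 follows; the sample text and the firstNameCount field are made up for illustration.

```spl
| makeresults
| eval _raw="firstName:Ann|lastName:A ... firstName:Bob|lastName:B ... firstName:Cat|lastName:C"
``` max_match=0 removes the default limit of one match per event ```
| rex max_match=0 "firstName:(?<firstName>[^|]+)"
``` firstName is now a multivalue field; mvcount reports how many values were captured ```
| eval firstNameCount=mvcount(firstName)
| table firstName firstNameCount
```

Running the two rex commands separately this way gives two independent lists, and they will not line up as pairs when an event has more firstName values than subscriberFirstName values.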