How to get 3rd word using regex?

saranyaa21
Path Finder

Hi,

City:{city1: 4, city2: 3, city3: 2, city4: 5}

I used this regex to get the 3rd word from the above line: (?<"City_count">(?<=City:)(?:\S+\s+){2}(\S+))
But I get the 1st, 2nd and 3rd words as a result.

Please help with a regex that returns only the 3rd word: city2: 3

1 Solution

niketn
Legend

@saranyaa21, based on the sample data provided, it seems you are traversing JSON data. Ideally, if you are only interested in the JSON data for sourcetype=server_log, you should set either INDEXED_EXTRACTIONS=JSON or KV_MODE=json in your props.conf, but not both. To test this at search time, you can try the following run-anywhere search, which uses spath to reveal all the nodes in the JSON. If you refer to the documentation, you can apply spath to a specific node as well.

| makeresults
| eval _raw="{\"City\":{\"city1\": 4, \"city2\": 3, \"city3\": 2, \"city4\": 5}}"
| spath

Or in your case

sourcetype=server_log "Class_Name" 
| spath

The regular expressions provided by the experts here might not be working for you because strings in properly formatted JSON data are always placed inside double quotes, which are missing in the sample data provided.
Syntactically correct JSON for your sample data should look something like the following:

{"City":{"city1": 4, "city2": 3, "city3": 2, "city4": 5}}
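As a rough illustration of why the quoting matters, the sketch below uses Python's json module as a stand-in for a strict JSON parser. It is not Splunk's spath, and the variable names are made up for the example. The quoted form parses and city2 can be read by path, while the unquoted form from the question is rejected.

```python
import json

# Valid JSON: keys are double-quoted strings.
quoted = '{"City":{"city1": 4, "city2": 3, "city3": 2, "city4": 5}}'
# The sample as originally posted: keys are bare words, which JSON forbids.
unquoted = '{City:{city1: 4, city2: 3, city3: 2, city4: 5}}'

# Walk the path City -> city2, roughly what a path-based extraction does.
city2 = json.loads(quoted)["City"]["city2"]

try:
    json.loads(unquoted)
    unquoted_is_valid = True
except json.JSONDecodeError:
    unquoted_is_valid = False
```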

The following is a regular-expression-based approach. However, for it to work with your data, I would need the hierarchy of the JSON data that you currently have; otherwise, two levels of rex are required. The following blindly takes key-value pairs from the JSON data:

| makeresults
| eval _raw="{\"City\":{\"city1\": 4, \"city2\": 3, \"city3\": 2, \"city4\": 5}}"
| rex "(?<city>[^\"]+)\"\:\s(?<count>[^\,\}]+)(,|\})" max_match=0
| eval city2=mvindex(city,1),count2=mvindex(count,1)
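
For readers who want to check the pattern outside Splunk, here is a minimal sketch in Python. It assumes Python's re module behaves like rex for this pattern, uses re.finditer to mimic max_match=0, and uses list indexing in place of mvindex. The (?P<name>...) spelling is Python's, and the variable names are illustrative.

```python
import re

raw = '{"City":{"city1": 4, "city2": 3, "city3": 2, "city4": 5}}'

# Same idea as the rex above: a key up to its closing quote, a colon and one
# whitespace character, then the value up to the next comma or closing brace.
pair = re.compile(r'(?P<city>[^"]+)":\s(?P<count>[^,}]+)(?:,|\})')

# finditer returns every non-overlapping match, like max_match=0.
matches = list(pair.finditer(raw))
cities = [m.group("city") for m in matches]
counts = [m.group("count") for m in matches]

# Index 1 is the second pair, like mvindex(city,1) and mvindex(count,1).
city2, count2 = cities[1], counts[1]
```

Note that the extracted values are strings, so count2 holds "3" rather than the number 3.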

Or the following with two rex, first to get all Cities and second to pull specific one needed:

| makeresults
| eval _raw="{\"City\":{\"city1\": 4, \"city2\": 3, \"city3\": 2, \"city4\": 5}}"
| rex "(?<=\"City\"\:)(?<Cities>[^\}]+\})"
| rex field=Cities "(?<city>[^\"]+)\"\:\s(?<count>[^\,\}]+)(,|\})" max_match=0
| eval city2=mvindex(city,1),count2=mvindex(count,1)
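
The two-step idea can also be sketched in Python, again assuming re is a fair stand-in for rex here. Stage 1 narrows the event to the City block and stage 2 parses pairs inside that block only. The sketch keeps the closing brace in the block so that the last pair still finds its terminator; the variable names are illustrative.

```python
import re

raw = '{"City":{"city1": 4, "city2": 3, "city3": 2, "city4": 5}}'

# Stage 1: take everything after "City": up to and including the first "}".
stage1 = re.search(r'(?<="City":)(?P<Cities>[^}]+\})', raw)
cities_blob = stage1.group("Cities")

# Stage 2: pull key/value pairs out of that narrower string only.
pair = re.compile(r'(?P<city>[^"]+)":\s(?P<count>[^,}]+)(?:,|\})')
pairs = [(m.group("city"), m.group("count")) for m in pair.finditer(cities_blob)]

# Second pair, the equivalent of mvindex(...,1).
second_pair = pairs[1]
```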
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

saranyaa21
Path Finder

Hello Mr. Niketnilay,

Yes. My complete data looks like this :

Details: {Employee:{employee1:100,employee2:101,employee3:103},Company:{company1:001,company2:002,company3:003},City:{city1:4,city2:3,city3:5}}

From this I would like to pick only the value 3 from city2:3.
Can you please help me with a Splunk query?

saranyaa21
Path Finder

In your comment above, you are hard-coding the raw values of city1, city2 and city3 as
| eval _raw="{\"City\":{\"city1\": 4, \"city2\": 3, \"city3\": 2, \"city4\": 5}}"

But my entire data is dynamic:
Details: {Employee:{employee1:100,employee2:101,employee3:103},Company:{company1:001,company2:002,company3:003},City:{city1:4,city2:3,city3:5}}
These values keep changing.
In that case, I wish to extract only the value of city2 every time.

Can you please help with this?

niketn
Legend

@saranyaa21, have you tried the spath command below, as suggested? Read about the spath command in the documentation link provided above. As mentioned earlier, strings should be in double quotes for the JSON to be valid. Also, while you have posted the Details section, I think even Details is not the root node of the JSON data.

 sourcetype=server_log "Class_Name" 
 | spath

Ideally, spath is the command for traversing JSON and XML data. If your _raw data has JSON embedded within additional non-JSON data, you would need to extract the complete JSON data first. You would need to give us this pattern if you need assistance with that scenario.

| makeresults is a dummy command that lets us cook up data as per the question and demo the functionality as a run-anywhere search (because we do not have access to the actual data in your environment). In your case, please try the following SPL and confirm:

sourcetype=server_log "Class_Name"
| rex "(?<=\"City\"\:)(?<Cities>[^\}]+\})"
| rex field=Cities "(?<city>[^\"]+)\"\:\s(?<count>[^\,\}]+)(,|\})" max_match=0
| eval city2=mvindex(city,1),count2=mvindex(count,1)

saranyaa21
Path Finder

Hello @niketnilay,

Thank you for your answer. Your idea of feeding only the city-related data as a field to the next rex, instead of the raw event, helped me solve my problem.

I figured out a way to get it. I fed the city values alone as a field to the next rex and extracted the value after city2 using this regular expression:

|rex "(?<"City">(?<=City:).?(?:(?!city4).))"| rex field=City "(?<"second_City">(?<=city2:).*?([^\s\,]+))"
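
The approach described above (isolate the City block, then read the value after city2) can be sketched in Python against the unquoted, unspaced layout posted in this thread. This is not the exact pair of expressions shown above. The two patterns and the helper name city2_value are assumptions written for illustration, and Python's re is used as a stand-in for rex.

```python
import re

# The event layout posted in the thread: no quotes, no spaces after colons.
raw = ("Details: {Employee:{employee1:100,employee2:101,employee3:103},"
       "Company:{company1:001,company2:002,company3:003},"
       "City:{city1:4,city2:3,city3:5}}")

def city2_value(event):
    """Return the value after city2: inside the City block, or None."""
    # Stage 1: keep only what sits between "City:{" and the next "}".
    block = re.search(r"City:\{(?P<City>[^}]*)\}", event)
    if block is None:
        return None
    # Stage 2: within that block, take the token that follows "city2:".
    value = re.search(r"city2:\s*(?P<second_City>[^\s,]+)", block.group("City"))
    return value.group("second_City") if value else None

result = city2_value(raw)
```

Because the second step keys on the name city2 rather than on position, it keeps working when the values change or when pairs are reordered.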

thank you once again.

niketn
Legend

@saranyaa21 if this has resolved your issue, you would need to unaccept the previous answer and accept this one instead. You should also upvote the answer/comments that helped 🙂

saranyaa21
Path Finder

Done Both 🙂 Thank You @niketnilay

ddrillic
Ultra Champion

Maybe the following would work for you -

City:{(?:\S+\s+){2}(\S+\s+\d)
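
One way to see what this pattern expects is to run it against both data layouts that appear in this thread. The sketch below uses Python's re module as a stand-in for rex (an assumption), with the brace escaped for clarity. The pattern depends on whitespace between tokens, so it matches the spaced sample but finds nothing in the layout without spaces.

```python
import re

pattern = re.compile(r"City:\{(?:\S+\s+){2}(\S+\s+\d)")

# Layout from the original question: a space after each colon and comma.
spaced = "City:{city1: 4, city2: 3, city3: 2, city4: 5}"
# City block from the full event posted in the thread: no spaces at all.
compact = "City:{city1:4,city2:3,city3:5}}"

spaced_match = pattern.search(spaced)
compact_match = pattern.search(compact)
```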

saranyaa21
Path Finder

Hello ddrillic,

Your answer works perfectly on regex101 and gives me the output city2: 3.
But when I use it in Splunk with real-time logs, I get an empty result.

Below is the complete Splunk query:

sourcetype=server_log "Class_Name" | rex field=_raw "(?<"City_count">City:{(?:\S+\s+){2}(\S+\s+\d))" | stats count by City_count

Can you please help in figuring out the flaw here?

Thanks in advance

saranyaa21
Path Finder

Hello Mr. ddrillic,

Details: {Employee:{employee1:100,employee2:101,employee3:103},Company:{company1:001,company2:002,company3:003},City:{city1:4,city2:3,city3:5}} is my complete data set.

I wish to extract only 3 from city2:3
Please help me with it

woodcock
Esteemed Legend

Try this:

City:{(?:[^:]+:\s+\d+,\s*){2}(?<Third>[^:]+:\s+\d+)
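
A small Python sketch (re standing in for rex, which is an assumption) shows how the repeat count in this pattern selects a pair. The helper nth_pair and its skip parameter are hypothetical names for the example. With a count of 2 the capture lands on the third key/value pair, a count of 1 lands on city2, and the required whitespace after each colon means the layout without spaces does not match at all.

```python
import re

spaced = "City:{city1: 4, city2: 3, city3: 2, city4: 5}"
compact = "City:{city1:4,city2:3,city3:5}}"

def nth_pair(skip, event):
    """Skip `skip` leading 'key: number,' pairs, then return the next pair."""
    pattern = r"City:\{(?:[^:]+:\s+\d+,\s*){%d}(?P<pair>[^:]+:\s+\d+)" % skip
    m = re.search(pattern, event)
    return m.group("pair") if m else None

third = nth_pair(2, spaced)        # the count used in the post above
second = nth_pair(1, spaced)       # one pair skipped instead of two
no_spaces = nth_pair(2, compact)   # no whitespace after the colons
```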

saranyaa21
Path Finder

Hello Mr. Woodcock,

Thank you for your reply.
I did not get any matching result with the regex; I got an empty value in Splunk. Can you please help with this?

woodcock
Esteemed Legend

You cripple the people who are trying to help you when you do not give us your actual data/structure.

saranyaa21
Path Finder

@woodcock Sorry! was my mistake!

cpetterborg
SplunkTrust

Given the limited nature of your example data, it's hard to come up with a definitive regex that will do the trick on your data, but here is a potential regex that you can use:

^[^:]+:\{[^:]+:[^:,]+,\s(?P<City_count>[^:]+:[^,]+)

This starts at the beginning of the line ( ^), accepts the first text up to the colon ( [^:]+), matches the colon and curly brace ( :\{), matches the next data pair, comma, and space ( [^:]+:[^:,]+,\s), and then finally captures your desired data in the field City_count ( (?P<City_count>[^:]+:[^,]+)). As I said, this may not work on all your data, but it does work in this case and in similar cases.

Your version had a couple of problems. The double quotes around the field name break the syntax, because quotes are not allowed in field names. Then, after skipping the City:, it takes the first two runs of non-space characters followed by spaces (that would be the {city1: 4, data) and combines them with the next run of non-spaces (that would be city2:), so that you end up with {city1: 4, city2: in the field. Your named capture grouping is off and incomplete.
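
The behaviour described above can be reproduced with a short Python sketch, assuming re matches rex here. The quotes are removed from the group name and the name is written in Python's (?P<name>...) form. The named group wraps the whole expression, so it receives everything from the brace through city2:, while only the inner unnamed group holds the third token.

```python
import re

sample = "City:{city1: 4, city2: 3, city3: 2, city4: 5}"

# The original pattern with the quotes removed from the group name.
original = re.compile(r"(?P<City_count>(?<=City:)(?:\S+\s+){2}(\S+))")
m = original.search(sample)

whole_field = m.group("City_count")  # what the named group receives
third_token = m.group(2)             # the inner, unnamed group
```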

When you say that the third "word" that you want is city2: 3, that is exactly what my regex will give you with the data set you have provided. If you just want 3 as the field value, then I would use the following regex:

^[^:]+:\{[^:]+:[^:,]+,\s[^:]+:\s(?P<City_count>[^,]+)

There are simpler ways as well. For example:

^([^:]+:){3}\s(?P<City_count>[^,]+)

This only looks for the third colon, followed by a space, then takes the number(s) to the comma. Example data is always good to post so that it can be matched properly to a result, particularly in the case of regular expressions.
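
To compare the three expressions side by side, here is a minimal Python sketch (re standing in for rex, which is an assumption; the dictionary keys are labels invented for the example). It runs each pattern against the spaced sample from the question, and also tries the third-colon pattern on the full event posted elsewhere in this thread, which has a Details: prefix and no spaces.

```python
import re

sample = "City:{city1: 4, city2: 3, city3: 2, city4: 5}"
full_event = ("Details: {Employee:{employee1:100,employee2:101,employee3:103},"
              "Company:{company1:001,company2:002,company3:003},"
              "City:{city1:4,city2:3,city3:5}}")

patterns = {
    "pair": r"^[^:]+:\{[^:]+:[^:,]+,\s(?P<City_count>[^:]+:[^,]+)",
    "value": r"^[^:]+:\{[^:]+:[^:,]+,\s[^:]+:\s(?P<City_count>[^,]+)",
    "third_colon": r"^([^:]+:){3}\s(?P<City_count>[^,]+)",
}

# Each pattern is anchored at the start of the line with ^.
results = {name: re.search(p, sample).group("City_count")
           for name, p in patterns.items()}

# On the full event the third colon is followed by a digit, not a space.
full_event_match = re.search(patterns["third_colon"], full_event)
```

All three are positional and anchored to the start of the line, so they depend on City: being the first key and on the exact spacing of the sample.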

saranyaa21
Path Finder

Hello Mr. cpetterborg,

I appreciate your complete explanation and have given you 2 reward points for it.

This is how my entire data set will look like.
Details: {Employee:{employee1:100,employee2:101,employee3:103},Company:{company1:001,company2:002,company3:003},City:{city1:4,city2:3,city3:5}}

For the above data set, I would like to fetch only the value 3 from city2:3.

Below is the Splunk query I used with your regex above: ^([^:]+:){3}\s(?P<City_count>[^,]+)

sourcetype=server_log "Class Name" | rex field=_raw "((?<"City_count">(City:^([^:]+:){3}\s(?P[^,]+)))" | stats count by City_count

But I did not get any output. Can you please help me with it?

Thanks,
