Splunk Search

How do I extract a field from a URL and group by that field?

maddy1011
Explorer

How do I group data and get a usage count per customer? My data has two columns, Time and Event. The event data is a URL, and the customer name is somewhere in that URL. How do I group by customer to get a count per customer?
It looks something like this, and customer_name is what I want to group by:

Time Event
1/7/15 5:12:44.469 PM 7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?

Model=iphone&language=ge&


vasanthmss
Motivator

Hi Maddy,

Since your URLs do not all follow the same pattern, you need to use two regular expressions:

 | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

Explanation:

 | rex field=s "(?<customer_name>[^\/]*)\?" 

The first one grabs the customer name that sits just before the query string, e.g. /.../.../customer_name?query

| rex field=s "(?<customer_name>[^\/]*)\/localization"

The second one grabs the customer name that sits just before /localization. (Add similar expressions if you find other patterns like localization.)
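
For anyone who wants to check the two-pass idea outside Splunk, here is a minimal Python sketch. It assumes that a rex which does not match leaves the existing field value in place, which is what the sample searches rely on. The function name and the shortened sample strings are made up for illustration, and Python writes named groups as (?P<name>...) where rex uses (?<name>...).

```python
import re

# Python spells named groups (?P<name>...); Splunk's rex uses (?<name>...).
BEFORE_QUERY = re.compile(r"(?P<customer_name>[^/]*)\?")
BEFORE_LOCALIZATION = re.compile(r"(?P<customer_name>[^/]*)/localization")


def extract_customer(event):
    """Apply both patterns in order; a later match overrides an earlier one."""
    customer = None
    for pattern in (BEFORE_QUERY, BEFORE_LOCALIZATION):
        match = pattern.search(event)
        if match:
            customer = match.group("customer_name")
    return customer


# Shortened, hypothetical versions of the events in this thread.
plain = "INFO :end: /XYZ/api/assets/customer_name?Model=iphone&language=ge&"
localized = "INFO :end: /XYZ/api//dasset/customer_name3/localization?language=ge"

print(extract_customer(plain))
print(extract_customer(localized))
```

On the localized URL the first pattern picks up "localization", and the second pattern then replaces it with the real customer name, which is why the order of the two expressions matters.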

Sample searches:

|stats c | eval s="7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name? 
Model=iphone&language=ge&" | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

|stats c | eval s="2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: /XYZ/api//dassets/customer_name2?deviceModel=iphone&language=ge&pageSize=1000&screenSize=0640x1136&assetQuality=hq" | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

|stats c | eval s="2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: /XYZ/api//dasset/customer_name3/localization?language=ge&deviceModel=iphone&assetQuality=hq&assetVersion=160&screenSize=0640x1136&api=2" | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

Hope this helps!

Cheers!!!

Thanks,
V


vasanthmss
Motivator

Did that help?

V

alemarzu
Motivator

Try this, maddy, and let me know if it works.

^(?:.*[\\\/])(?<customer_name>.*)(?:\?\sModel)

maddy1011
Explorer

This gives me an error:
Error in 'SearchParser': Missing a search command before '\'. Error at position '132' of search query

I'm not sure which "\" it is complaining about.


javiergn
SplunkTrust

Assuming your URLs look like the one you mentioned:

yoursearch
| rex field=_raw "\/(?<customer_name>[^\?\/]+)\?"
| stats count by customer_name

maddy1011
Explorer

This works, but it omits certain results. Can you help explain the expression "\/(?<customer_name>[^\?\/]+)\?"


javiergn
SplunkTrust

Sure. What the regex is doing:

Find forward slash but don't capture it (needs to be escaped): \/
Start a capturing group (parenthesis with label customer_name)
Find 1 or many characters (plus symbol) different (^) from forward slash or question mark (escape needed again): [^\?\/]+
Then find a question mark but do not capture this in your token (outside the parenthesis)

If you give me an example that is not being captured I can help you with the regex.

You can also use regex101.com to test everything. It's a very intuitive page.
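
If you would rather try it locally, the same steps can be sketched in Python with a verbose-mode regex, so that each part carries its own comment. This is only an illustration: the constant name is invented, the sample event is the one from the question joined onto a single line, and Python needs (?P<name>...) where rex accepts (?<name>...).

```python
import re

# Same pattern as the rex above, written in verbose mode so each step
# can carry a comment. Python needs (?P<name>...) for named groups.
CUSTOMER_BEFORE_QUERY = re.compile(
    r"""
    /                     # a forward slash, matched but not captured
    (?P<customer_name>    # start the named capturing group
        [^?/]+            # one or more characters other than question mark or slash
    )                     # end of the group
    \?                    # a literal question mark, outside the group
    """,
    re.VERBOSE,
)

# Sample event from the question, joined onto one line (an assumption).
sample = (
    "7 Jan 2015 17:12:44,500 RequestLog INFO :end: "
    "/XYZ/api/assets/customer_name?Model=iphone&language=ge&"
)

match = CUSTOMER_BEFORE_QUERY.search(sample)
print(match.group("customer_name") if match else "no match")
```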

maddy1011
Explorer

I dug in a little and figured out that the endpoint in the URL comes in different formats, and the one with // is not being captured. I was trying to list all the unique endpoints, but I'm still struggling.

Here are some more examples.

7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?
Model=iphone&language=ge&

and
2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: /XYZ/api//dassets/customer_name2?deviceModel=iphone&language=ge&pageSize=1000&screenSize=0640x1136&assetQuality=hq

This is the one not being captured:
2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: /XYZ/api//dasset/customer_name3/localization?language=ge&deviceModel=iphone&assetQuality=hq&assetVersion=160&screenSize=0640x1136&api=2


javiergn
SplunkTrust

OK, if you know for sure that your customer name is going to be after the third block then you can try the following too:

yoursearch
 | rex field=_raw ":end: (?:\/\/?[^\/]+){3}\/(?<customer_name>[^\?\/]+)"
 | stats count by customer_name

See the following link, which I used to test this regex:

https://regex101.com/r/tE9xQ9/1
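
As a rough offline check, here is a small Python sketch that runs the same pattern over the three example events from this thread and tallies the matches, loosely mimicking stats count by customer_name. The helper name is hypothetical, the first event is joined onto one line, and the named group is written (?P<name>...) because that is what Python's re module expects.

```python
import re
from collections import Counter

# Skip three path blocks after ":end: " (each may start with / or //),
# then capture the next block up to the following '/' or '?'.
AFTER_THIRD_BLOCK = re.compile(
    r":end: (?:\/\/?[^\/]+){3}\/(?P<customer_name>[^\?\/]+)"
)

events = [
    "7 Jan 2015 17:12:44,500 RequestLog INFO :end: "
    "/XYZ/api/assets/customer_name?Model=iphone&language=ge&",
    "2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: "
    "/XYZ/api//dassets/customer_name2?deviceModel=iphone&language=ge"
    "&pageSize=1000&screenSize=0640x1136&assetQuality=hq",
    "2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: "
    "/XYZ/api//dasset/customer_name3/localization?language=ge"
    "&deviceModel=iphone&assetQuality=hq&assetVersion=160"
    "&screenSize=0640x1136&api=2",
]


def count_by_customer(lines):
    """Count events per extracted customer name, skipping non-matching lines."""
    counts = Counter()
    for line in lines:
        match = AFTER_THIRD_BLOCK.search(line)
        if match:
            counts[match.group("customer_name")] += 1
    return counts


counts = count_by_customer(events)
for name, count in sorted(counts.items()):
    print(name, count)
```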

Hope that helps.

Thanks,
J
