Splunk Search

How do I extract a field from a URL and group by that field?

maddy1011
Explorer

How do I group data and get a count for usage per customer? My data is Time and Event. The event data is a URL and the customer name is somewhere in the URL. How do I group by customer to get a count per customer?
It's something like this and customer_name is what I want to group by

Time Event
1/7/15 5:12:44.469 PM 7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?

Model=iphone&language=ge&

0 Karma

vasanthmss
Motivator

Hi Maddy,

Since your URLs don't all follow the same pattern, you need to use two regular expressions:

 | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

Explanation:

 | rex field=s "(?<customer_name>[^\/]*)\?" 

The expression above grabs the customer name that sits right before the query string, e.g. /.../.../customer_name?query

| rex field=s "(?<customer_name>[^\/]*)\/localization"

The second one grabs the customer name that sits right before /localization. (Add similar expressions if you find other patterns like localization.)

Sample searches:

|stats c | eval s="7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name? 
Model=iphone&language=ge&" | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

|stats c | eval s="2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: /XYZ/api//dassets/customer_name2?deviceModel=iphone&language=ge&pageSize=1000&screenSize=0640x1136&assetQuality=hq" | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"

|stats c | eval s="2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: /XYZ/api//dasset/customer_name3/localization?language=ge&deviceModel=iphone&assetQuality=hq&assetVersion=160&screenSize=0640x1136&api=2" | rex field=s "(?<customer_name>[^\/]*)\?" | rex field=s "(?<customer_name>[^\/]*)\/localization"
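For anyone who wants to check the two-expression approach outside Splunk, here is a minimal Python sketch. It assumes that a later rex overwrites customer_name only when it matches and otherwise leaves the earlier value in place. The function name extract_customer is made up for illustration, and Python writes named groups as (?P<name>...) where rex uses (?<name>...).

```python
import re

# Same two patterns as the rex commands, in Python's named-group syntax.
BEFORE_QUERY = re.compile(r"(?P<customer_name>[^/]*)\?")
BEFORE_LOCALIZATION = re.compile(r"(?P<customer_name>[^/]*)/localization")


def extract_customer(event):
    """Apply both patterns in order; a later match replaces an earlier one."""
    customer = None
    for pattern in (BEFORE_QUERY, BEFORE_LOCALIZATION):
        match = pattern.search(event)
        if match:
            customer = match.group("customer_name")
    return customer


# The three sample events from this thread.
events = [
    "7 Jan 2015 17:12:44,500 RequestLog INFO :end: "
    "/XYZ/api/assets/customer_name?\nModel=iphone&language=ge&",
    "2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: "
    "/XYZ/api//dassets/customer_name2?deviceModel=iphone&language=ge"
    "&pageSize=1000&screenSize=0640x1136&assetQuality=hq",
    "2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: "
    "/XYZ/api//dasset/customer_name3/localization?language=ge"
    "&deviceModel=iphone&assetQuality=hq&assetVersion=160"
    "&screenSize=0640x1136&api=2",
]

for event in events:
    print(extract_customer(event))
```

In the third event the first pattern alone picks up "localization"; the second pattern then replaces it with the segment in front of /localization.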

Hope this helps!

Cheers!!!

Thanks,
V

0 Karma

vasanthmss
Motivator

Did that help?

V
0 Karma

alemarzu
Motivator

Try this maddy and let me know if it works.

^(?:.*[\\\/])(?<customer_name>.*)(?:\?\sModel)
0 Karma

maddy1011
Explorer

This gives me an error:
Error in 'SearchParser': Missing a search command before '\'. Error at position '132' of search query

not sure which "\" it's missing.

0 Karma

javiergn
Super Champion

Assuming your URLs look like the one you mentioned:

yoursearch
| rex field=_raw "\/(?<customer_name>[^\?\/]+)\?"
| stats count by customer_name
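
As a rough illustration of what the extract-then-count pipeline does, here is a small Python sketch of the same logic. The customer names acme and globex and the helper count_by_customer are hypothetical, not taken from the real data. Events where the pattern does not match are left out of the tally, which is how I understand stats count by to treat events without the field.

```python
import re
from collections import Counter

# Python spelling of the rex pattern: (?P<name>...) instead of (?<name>...).
CUSTOMER = re.compile(r"/(?P<customer_name>[^?/]+)\?")


def count_by_customer(events):
    """Count events per extracted customer name, skipping non-matching events."""
    counts = Counter()
    for event in events:
        match = CUSTOMER.search(event)
        if match:
            counts[match.group("customer_name")] += 1
    return counts


# Hypothetical events; "acme" and "globex" are made-up customer names.
sample = [
    "RequestLog INFO :end: /XYZ/api/assets/acme?Model=iphone&language=ge&",
    "RequestLog INFO :end: /XYZ/api/assets/globex?Model=iphone&language=ge&",
    "RequestLog INFO :end: /XYZ/api/assets/acme?Model=ipad&language=en&",
    "RequestLog INFO :end: /XYZ/api/assets/acme",  # no query string, no match
]

for name, count in sorted(count_by_customer(sample).items()):
    print(name, count)
```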

maddy1011
Explorer

This works but it omits certain results. Can you help explain the expression "\/(?<customer_name>[^\?\/]+)\?"

0 Karma

javiergn
Super Champion

Sure. What the regex is doing:

Find a forward slash but don't capture it (it needs to be escaped): \/
Start a capturing group (parentheses, labelled customer_name): (?<customer_name>
Match one or more characters (the plus sign) that are not (^) a forward slash or a question mark (escaping needed again): [^\?\/]+
Then match a question mark, outside the parentheses so it is not captured in your token: \?
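
The same breakdown can be written as a commented pattern and tried outside Splunk. This is only a sketch in Python, which writes the named group as (?P<customer_name>...); the helper name customer_of is made up for illustration.

```python
import re

CUSTOMER = re.compile(
    r"""
    /                    # a literal forward slash, matched but not captured
    (?P<customer_name>   # start of the capturing group named customer_name
        [^?/]+           # one or more characters that are neither ? nor /
    )                    # end of the capturing group
    \?                   # a literal question mark, outside the group
    """,
    re.VERBOSE,
)


def customer_of(event):
    """Return the path segment directly before the first matching '?'."""
    match = CUSTOMER.search(event)
    return match.group("customer_name") if match else None


print(customer_of(":end: /XYZ/api/assets/customer_name?Model=iphone"))
print(customer_of(":end: /XYZ/api//dasset/customer_name3/localization?language=ge"))
```

The pattern always takes the segment immediately before the question mark, so a URL that has an extra segment between the customer name and the query string returns that extra segment instead of the customer name.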

If you give me an example that is not being captured I can help you with the regex.

You can also use regex101.com to test everything. It's a very intuitive page.

maddy1011
Explorer

So I had to dig in a little bit and figured that the endpoint in the URL has different formats. And the one with // is not being captured. I was trying to see if I can list all unique endpoints, but still struggling.

Here are some more examples.

7 Jan 2015 17:12:44,500 RequestLog INFO :end: /XYZ/api/assets/customer_name?
Model=iphone&language=ge&

and
2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: /XYZ/api//dassets/customer_name2?deviceModel=iphone&language=ge&pageSize=1000&screenSize=0640x1136&assetQuality=hq

this is the one not being captured.
2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: /XYZ/api//dasset/customer_name3/localization?language=ge&deviceModel=iphone&assetQuality=hq&assetVersion=160&screenSize=0640x1136&api=2

0 Karma

javiergn
Super Champion

OK, if you know for sure that your customer name is going to be after the third block then you can try the following too:

yoursearch
 | rex field=_raw ":end: (?:\/\/?[^\/]+){3}\/(?<customer_name>[^\?\/]+)"
 | stats count by customer_name
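
If it helps, here is a quick Python check of this expression against the three sample events from this thread. It is a sketch under the same assumption as above, namely that the customer name is always the fourth path segment after ":end: ". Python needs (?P<customer_name>...) for the named group, and the helper name customer_of is made up.

```python
import re

# Skip three path segments (each may start with / or //), then capture the fourth.
CUSTOMER = re.compile(r":end: (?://?[^/]+){3}/(?P<customer_name>[^?/]+)")


def customer_of(event):
    """Return the fourth path segment after ':end: ', or None if absent."""
    match = CUSTOMER.search(event)
    return match.group("customer_name") if match else None


events = [
    "7 Jan 2015 17:12:44,500 RequestLog INFO :end: "
    "/XYZ/api/assets/customer_name?\nModel=iphone&language=ge&",
    "2 Jan 2015 14:57:45,121 RequestLogFilter INFO :end: "
    "/XYZ/api//dassets/customer_name2?deviceModel=iphone&language=ge"
    "&pageSize=1000&screenSize=0640x1136&assetQuality=hq",
    "2 Jan 2015 15:18:29,913 RequestLogFilter INFO :end: "
    "/XYZ/api//dasset/customer_name3/localization?language=ge"
    "&deviceModel=iphone&assetQuality=hq&assetVersion=160"
    "&screenSize=0640x1136&api=2",
]

names = [customer_of(event) for event in events]
print(names)
```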

See the following link, which I used to test this regex:

https://regex101.com/r/tE9xQ9/1

Hope that helps.

Thanks,
J

0 Karma