Hi all, I am looking for help with the following use case.
I have a series of endpoints, logged as full URLs across a few sources, which I am trying to normalize and then aggregate on.
I am looking for the resource path, without any optional query parameters. That is, I want to capture everything after the double slash [//], the domain name, and the first single slash [/], and end the capture at the optional query-string marker [?].
https://answers.splunk.com/answers/ask.html?foo=bar --> Becomes --> answers/ask.html
https://answers.splunk.com/answers/ask.html --> Becomes --> answers/ask.html
http://docs.splunk.com/Documentation --> Becomes --> Documentation
If your URL is already in the field myUrl, try this:
yourBaseQuery to get myUrl field
| rex field=myUrl "https?\:\/\/([^\/]+)\/(?<uri>[^\?\s]+)"
Or, to extract directly from _raw:
yourBaseQuery
| rex field=_raw "https?\:\/\/([^\/]+)\/(?<uri>[^\?\s]+)"
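| rex field=urlField "^[^\/]+\/\/[^\/]+\/(?P<wantedField>[^\s\?;]+).*"
This should pick up all three of your example use cases in a new extracted field named 'wantedField'. Note that the character class has to exclude [?]; otherwise the query string in the first example would be captured along with the path.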
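If you want to sanity-check the capture outside Splunk, here is a minimal sketch in Python. It assumes the same pattern shape as above with [?] excluded from the captured class; the function name extract_path is made up for illustration. Python's re module needs the (?P<name>...) form for named groups, and the forward slashes do not need escaping there.

```python
import re

# Skip the scheme and host, then capture the path up to the first
# whitespace, '?' or ';'. Same idea as the rex patterns above.
PATH_RE = re.compile(r"^[^/]+//[^/]+/(?P<wantedField>[^\s?;]+)")


def extract_path(url):
    """Return the resource path without the query string, or None if no path is present."""
    match = PATH_RE.match(url)
    return match.group("wantedField") if match else None


samples = [
    "https://answers.splunk.com/answers/ask.html?foo=bar",
    "https://answers.splunk.com/answers/ask.html",
    "http://docs.splunk.com/Documentation",
]
for url in samples:
    print(url, "-->", extract_path(url))
```

A URL with no path after the host (for example http://docs.splunk.com) does not match, so the sketch returns None for it. In Splunk, the equivalent outcome is that rex simply leaves the field unset for that event.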
You can also use the replace or rex-sed method to rewrite the url field in place, per your guideline. (Run-anywhere samples below.)
| gentimes start=-1 | eval url="https://answers.splunk.com/answers/ask.html?foo=bar https://answers.splunk.com/answers/ask.html http://docs.splunk.com/Documentation" | makemv url | table url | mvexpand url
| eval url=replace(url,"^[^\/]+\/\/[^\/]+\/([^\s\?;=]+).*","\1") | ...rest of the query
OR
| gentimes start=-1 | eval url="https://answers.splunk.com/answers/ask.html?foo=bar https://answers.splunk.com/answers/ask.html http://docs.splunk.com/Documentation" | makemv url | table url | mvexpand url
| rex mode=sed field=url "s/^[^\/]+\/\/[^\/]+\/([^\s\?;=]+).*/\1/" | ...rest of the query
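To see what the in-place rewrite plus aggregation would produce, here is a rough Python sketch of the same substitution. The names normalize and counts are made up for illustration, and Counter stands in for whatever stats command you run afterwards in Splunk.

```python
import re
from collections import Counter

# Match the whole URL and keep only the path, stopping at whitespace,
# '?', ';' or '='. Same pattern shape as the replace / sed examples above.
STRIP_RE = re.compile(r"^[^/]+//[^/]+/([^\s?;=]+).*")


def normalize(url):
    """Replace the URL with its resource path; non-matching input is returned unchanged."""
    return STRIP_RE.sub(r"\1", url)


urls = [
    "https://answers.splunk.com/answers/ask.html?foo=bar",
    "https://answers.splunk.com/answers/ask.html",
    "http://docs.splunk.com/Documentation",
]

# Aggregate on the normalized path.
counts = Counter(normalize(u) for u in urls)
for path, n in counts.items():
    print(path, n)
```

Two things to keep in mind: a value that does not match the pattern is left as it was, and because [=] is in the excluded class, a path that legitimately contains an equals sign would be cut off at that point.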