I'm new-ish to Splunk, so forgive me if I'm not sure of the best way to do this.
Basically, I'm trying to find out two things:
The total number of hits to URLs with the format /shop/product/{product-name}?ID={product-id}
The number of hits to those URLs that don't also include the CategoryID parameter
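The two quantities can be pinned down outside Splunk with a minimal Python sketch. It parses the query string with the standard library (`urllib.parse`) instead of using a regex, so parameter order doesn't matter. The sample URLs and the helper names `classify` and `summarize` are hypothetical and only for illustration. It also assumes the logged URL includes the query string, as the rex in the searches below implies.

```python
from urllib.parse import urlsplit, parse_qs

PREFIX = "/shop/product/"

def classify(url):
    """Return (product_name, product_id, has_category_id), or None if the URL
    is not a /shop/product/{product-name}?ID={product-id} hit."""
    parts = urlsplit(url)
    if not parts.path.startswith(PREFIX):
        return None
    params = parse_qs(parts.query)  # parameter order does not matter here
    if "ID" not in params:
        return None
    return parts.path[len(PREFIX):], params["ID"][0], "CategoryID" in params

def summarize(urls):
    """Return (total product hits, hits without a CategoryID parameter)."""
    hits = [c for c in map(classify, urls) if c is not None]
    without_category = sum(1 for _, _, has_category in hits if not has_category)
    return len(hits), without_category

# Hypothetical sample URLs, for illustration only.
sample = [
    "/shop/product/blue-widget?ID=101&CategoryID=7",
    "/shop/product/blue-widget?ID=101",
    "/shop/product/red-widget?CategoryID=3&ID=202",
    "/shop/product/red-widget?ID=202&ref=email",
]
print(summarize(sample))  # (4, 2)
```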
I currently have the following search:
sourcetype=ihs_log "GET /shop/product/" | rex field=url "/shop/product/(?<productname>.+?)\?ID=(?<productid>.+?)\&CategoryID=(?<categoryid>.+?)[\&\#$]" | stats count by productid | sort count | reverse
This gives me the total number of hits that include both the ID and CategoryID parameters (in that order, one immediately after the other). However, if I run the same search without the CategoryID part:
sourcetype=ihs_log "GET /shop/product/" | rex field=url "/shop/product/(?<productname>.+?)\?ID=(?<productid>.+?)[\&\#$]" | stats count by productid | sort count | reverse
I get the same result. I would expect the second query to give a higher count, since it should also include the cases where ID is passed but CategoryID is not. Or am I misunderstanding something?
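A likely explanation for the identical counts is the terminator `[\&\#$]`. Inside a character class, `$` is a literal dollar sign, not an end-of-string anchor. The ID value must therefore be followed by `&`, `#`, or `$`, so a URL that ends right after `ID=...` never matches. When rex doesn't match, productid stays null, and `stats count by productid` skips events whose productid is null. This explanation assumes the CategoryID-less URLs in the log end at the ID value. The Python sketch below uses a hypothetical helper `extract` and writes named groups in Python's `(?P<name>...)` form instead of the PCRE `(?<name>...)` form that rex uses. It compares the original pattern with one that moves `$` out of the class.

```python
import re

# Pattern from the second search; Python spells named groups (?P<name>...).
# Inside [...], "$" is a literal dollar sign, not an end-of-string anchor.
ORIGINAL = re.compile(r"/shop/product/(?P<productname>.+?)\?ID=(?P<productid>.+?)[\&\#$]")
# Moving "$" out of the class lets the ID value end at &, #, or end of string.
FIXED = re.compile(r"/shop/product/(?P<productname>.+?)\?ID=(?P<productid>.+?)(?:[\&\#]|$)")

def extract(pattern, url):
    """Return the captured productid, or None when the pattern does not match
    (in Splunk, rex would leave the field null and stats count by productid
    would skip the event)."""
    m = pattern.search(url)
    return m.group("productid") if m else None

for url in ["/shop/product/blue-widget?ID=101&CategoryID=7",
            "/shop/product/blue-widget?ID=101"]:
    print(url, extract(ORIGINAL, url), extract(FIXED, url))
```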
At any rate, what do I specify to get only the URLs that don't include the CategoryID parameter (regardless of whether it appears before or after ID in the query string), and then sort by productid?
Basically, what's the regex for "does not include"?
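One common regex answer to "does not include" is a negative lookahead anchored at the start, `^(?!.*[?&]CategoryID=)`. It fails whenever a CategoryID parameter appears anywhere in the URL, before or after ID. The minimal Python sketch below combines it with an ID capture that doesn't care about parameter position. It then counts hits per productid, highest count first, as the original `sort count | reverse` does. The sample URLs and the helper `counts_without_category` are hypothetical.

```python
import re
from collections import Counter

# Negative lookahead: reject the URL if a CategoryID parameter appears
# anywhere in it, whether before or after ID.
NO_CATEGORY = re.compile(r"^(?!.*[?&]CategoryID=)")
# ID may be the first parameter or follow an "&"; its value runs up to the
# next & or # (or the end of the string).
PRODUCT_ID = re.compile(r"/shop/product/[^?]+\?(?:.*&)?ID=(?P<productid>[^&#]+)")

def counts_without_category(urls):
    """Count hits per productid for URLs lacking CategoryID, most hits first."""
    counter = Counter()
    for url in urls:
        if not NO_CATEGORY.search(url):
            continue  # URL contains CategoryID; skip it
        m = PRODUCT_ID.search(url)
        if m:
            counter[m.group("productid")] += 1
    return counter.most_common()

# Hypothetical sample URLs, for illustration only.
sample = [
    "/shop/product/blue-widget?ID=101&CategoryID=7",
    "/shop/product/blue-widget?ID=101",
    "/shop/product/red-widget?CategoryID=3&ID=202",
    "/shop/product/red-widget?ID=202&ref=email",
    "/shop/product/red-widget?ref=email&ID=202",
]
print(counts_without_category(sample))
```

In Splunk itself, a filter may be simpler than a lookahead. For example, `| regex url!="[?&]CategoryID="` could go before `stats count by productid | sort - count`. This is untested against the ihs_log sourcetype and assumes the url field contains the query string, as the original rex does.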
Like I said, it's probably a trivial question...
Thanks!