Solved: How to search the total number of hits to URL's that do NOT include CategoryID?

roryhewitt · ‎04-08-2015

I'm new-ish to Splunk, so forgive me if I'm not sure of the best way to do this.

Basically, I'm trying to find out two things:

The total number of hits to URL's with a format of /shop/product/{product-name}?ID={product-id}
The number of hits to those URL's which don't also include the CategoryID parameter

I currently have the following search:

sourcetype=ihs_log "GET /shop/product/" | rex field=url "/shop/product/(?<productname>.+?)\?ID=(?<productid>.+?)\&CategoryID=(?<categoryid>.+?)[\&\#$]" | stats count by productid | sort count | reverse

which gives me the total number of hits which include both ID and CategoryID parameters (in that order, one after another), but if I run the same search without the categoryID bit, e.g.:

sourcetype=ihs_log "GET /shop/product/" | rex field=url "/shop/product/(?<productname>.+?)\?ID=(?<productid>.+?)[\&\#$]" | stats count by productid | sort count | reverse

I get the same result. I would expect the second query to give a higher count, since it should include those cases where ID is passed, but where CategoryID is not passed. Or am I misunderstanding?

At any rate, what do I specify to get only the URL's which don't include the CategoryID parameter (no matter whether it appears before or after ID in the query parameters)? and then sort by productid?

Basically what's the regex for does-not-include?

Like I said, it's probably a trivial question...

Thanks!
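
One possible explanation for the identical counts, offered as a sketch rather than a confirmed diagnosis: inside a character class, `$` is a literal dollar sign and not an end-of-string anchor, so `[\&\#$]` requires a real `&`, `#` or `$` after the ID value. A URL that ends right after `ID=...` would then fail the second pattern as well, and no productid would be extracted for it. The snippet below checks this with Python's `re` module on made-up URLs. Python writes named groups as `(?P<name>...)`; Splunk's PCRE should treat `$` inside a class the same way, but that is an assumption here. The `ANCHORED` variant is a hypothetical fix, not something proposed in the thread.

```python
import re

# The second pattern from the post, with Python-style named groups.
ORIGINAL = re.compile(
    r"/shop/product/(?P<productname>.+?)\?ID=(?P<productid>.+?)[\&\#$]"
)
# Hypothetical variant: the end-of-string anchor is moved out of the class.
ANCHORED = re.compile(
    r"/shop/product/(?P<productname>.+?)\?ID=(?P<productid>.+?)(?:[&#]|$)"
)

# Made-up sample URLs, not taken from the real logs.
urls = [
    "/shop/product/red-shoe?ID=123&CategoryID=9",
    "/shop/product/red-shoe?ID=123",
]

def product_id(pattern, url):
    """Return the extracted productid, or None when the pattern does not match."""
    m = pattern.search(url)
    return m.group("productid") if m else None

for url in urls:
    print(url, product_id(ORIGINAL, url), product_id(ANCHORED, url))
```

If the real logs behave like the second sample URL, the events without CategoryID never get a productid from either search, which would explain why both counts match.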

stephanefotso · ‎04-08-2015

Try this for the first query:

sourcetype=ihs_log "GET /shop/product/" | rex field=url "/shop/product/(?<productname>.+?)\?ID=(?<productid>.+?)\&CategoryID=(?<categoryid>.+?)[\&\#$]" | stats count by productid,categoryid | sort count | reverse

And leave the second as it is, and let me know if the results are still the same.

SGF

roryhewitt · ‎04-08-2015

Hmmm. They do give different values. So how come?

And how can I get the number of URL's which don't include CategoryID (or is it simpler just to look at the difference between the two queries)?

stephanefotso · ‎04-08-2015

No. What I mean is that to get the total number of hits which include both the ID and CategoryID parameters, you must count by both productid and categoryid: | stats count by productid,categoryid

But if you just want the total number per productid, you must count only by productid: | stats count by productid

SGF
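
To illustrate why the by clause matters: `stats count by` only counts events in which every by-field has a value, so an event where a field was not extracted drops out of the result. The following is a rough Python imitation of that grouping, not Splunk's implementation, run on hypothetical events that have already been through field extraction.

```python
from collections import Counter

def stats_count_by(events, *fields):
    """Count events per combination of `fields`, skipping events that lack any of them."""
    counts = Counter()
    for event in events:
        if all(event.get(f) is not None for f in fields):
            counts[tuple(event[f] for f in fields)] += 1
    return counts

# Hypothetical events after field extraction; None means "not extracted".
events = [
    {"productid": "123", "categoryid": "9"},
    {"productid": "123", "categoryid": "7"},
    {"productid": "123", "categoryid": None},
    {"productid": "456", "categoryid": "9"},
]

by_product = stats_count_by(events, "productid")
by_both = stats_count_by(events, "productid", "categoryid")
print(by_product)  # one row per productid
print(by_both)     # one row per (productid, categoryid) pair
```

Counting by both fields produces more rows, and the event with no categoryid is left out of them entirely.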

roryhewitt · ‎04-09-2015

Aha - so "count by a,b,c" will return only those which include a, b and c.

So is there an example of how to use either regex or rex to return URL's which explicitly don't include a particular query parameter?
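
For the "does not include" part, one common regex idiom is a negative lookahead. The sketch below uses Python's `re` with made-up URLs and assumes the url value starts with the path and carries the query string. The pattern sticks to PCRE-compatible syntax, so it should carry over to Splunk. In SPL it may be simpler to negate a plain match, for example `NOT "CategoryID="` in the base search or `| regex url!="[?&]CategoryID="`, though neither has been tested against this data.

```python
import re

# Assumption: the url value starts with the path and includes the query string.
NO_CATEGORY = re.compile(
    r"^(?!.*[?&]CategoryID=)"   # fail if CategoryID appears anywhere as a parameter
    r"/shop/product/[^?]+"       # product name
    r"\?(?:.*&)?ID="             # ID as the first or a later parameter
)

def lacks_category(url):
    """True for product URLs that have an ID parameter but no CategoryID parameter."""
    return NO_CATEGORY.search(url) is not None

# Hypothetical URLs for illustration.
samples = [
    "/shop/product/red-shoe?ID=123",
    "/shop/product/red-shoe?color=red&ID=123",
    "/shop/product/red-shoe?ID=123&CategoryID=9",
    "/shop/product/red-shoe?CategoryID=9&ID=123",
]

for url in samples:
    print(lacks_category(url), url)
```

Because the lookahead scans the whole URL, the position of CategoryID relative to ID does not matter.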

stephanefotso · ‎04-09-2015

You use the regular expression to extract fields (parameters) from your events. For example, when you write | rex field=url "/shop/product/(?<productname>.+?)\?ID=(?<productid>.+?)\&CategoryID=(?<categoryid>.+?)[\&\#$]", you have just extracted three fields (productname, productid and categoryid). Those fields have no effect on the search criteria; you decide which fields to use in your search criteria only after the extraction. That is why, when you write | stats count by productid | sort count | reverse, you are not taking productname or categoryid into account. You have extracted them, but you didn't use them in your search criteria.

SGF

roryhewitt · ‎04-10-2015

So I admit that I'm still a bit lost 😞

I get that the basic query bit is just the stuff before the first pipe, and then I'm trying to get specific data out from that. That seems to be why the total number of matching events is the same.

What if I don't care about the specifics at all? What if I simply want a count of all events which do (or don't) match certain URL formats (without caring about what the actual values are)? Do I even need to get stats for this?

Basically, I want a total count of all URL's which match this regex

/shop/product/any-value?ID=any-value

which don't include the CategoryID parameter. Can I include that in the basic query?

So I don't need to know what values the ID or CategoryID parameters have, just whether the CategoryID parameter exists in the URL. Am I being too complicated?
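
If only the presence of the parameter matters, no value needs to be extracted at all. As an illustration of that idea (written in Python with hypothetical URLs, not SPL), the sketch below parses each query string and tallies product URLs by whether a CategoryID key exists. The function name and labels are invented for this example.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def classify(url):
    """Return 'with', 'without', or None for URLs that are not product hits with an ID."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    if not parts.path.startswith("/shop/product/") or "ID" not in params:
        return None
    return "with" if "CategoryID" in params else "without"

# Hypothetical URLs for illustration.
urls = [
    "/shop/product/red-shoe?ID=123",
    "/shop/product/red-shoe?ID=123&CategoryID=9",
    "/shop/product/blue-hat?CategoryID=4&ID=77",
    "/shop/product/blue-hat?ID=77#reviews",
    "/shop/cart?ID=5",
]

# Tally only the URLs that are product hits with an ID parameter.
tally = Counter(label for label in map(classify, urls) if label)
print(tally)
```

Parsing the query string into keys avoids any dependence on parameter order or on what follows the ID value.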
