Background
I'm creating a list of bad domains based on the 2nd, 3rd, or 4th level of a URL.
Here's a sample list, which I created using event types:
[Bad_Domain_Red]
search = sourcetype="bcoat_proxysg" dest_host="*2o7.net" OR dest_host="*123.ddns.org"
[Bad_Domain_Orange]
search = sourcetype="bcoat_proxysg" dest_host="*abc.net"
After creating the event types above, the matching results appear tagged with them, e.g.:
sourcetype="bcoat_proxysg" eventtype="Bad_Domain_Red"
1) dest_host=abc.2o7.net
2) dest_host=splunk.2o7.net
3) dest_host=manwin.2o7.net
4) dest_host=myportal.123.ddns.org
5) dest_host=yourportal.123.ddns.org
etc.
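To make the event type matching concrete, here is a small Python sketch (not Splunk code) that classifies hostnames with the same wildcard patterns as the stanzas above. The names EVENTTYPES and eventtypes_for are hypothetical, and the sketch assumes matching is case-insensitive.

```python
from fnmatch import fnmatchcase

# Hypothetical Python mirror of the eventtype stanzas above:
# event type name -> list of dest_host wildcard patterns.
EVENTTYPES = {
    "Bad_Domain_Red": ["*2o7.net", "*123.ddns.org"],
    "Bad_Domain_Orange": ["*abc.net"],
}


def eventtypes_for(dest_host: str) -> list[str]:
    """Return every event type with a pattern that matches dest_host."""
    host = dest_host.lower()  # assumption: case-insensitive matching
    return [
        name
        for name, patterns in EVENTTYPES.items()
        if any(fnmatchcase(host, pattern.lower()) for pattern in patterns)
    ]
```

For example, eventtypes_for("splunk.2o7.net") returns ["Bad_Domain_Red"]. A leading * is not anchored at a dot, so *abc.net would also match a host such as xyzabc.net.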
When I try to create a report of counts per domain, I can't, because the hostnames have multiple levels.
I'm trying to create a report like this:
dest_host      Count
2o7.net        50
123.ddns.org   100
As you can see, the report is supposed to consolidate the counts for subdomains (e.g. abc.2o7.net, splunk.2o7.net) into 2o7.net.
Is there any way to do this? I have about 200 different domains split into 4 categories, color-coded red, orange, yellow, and green.
Some are monitored at the 2nd level, while others are monitored at the 3rd or 4th level.
Just to update this very old thread: I found a workaround to get this working.
I added field extractions for 2-, 3-, and 4-level domains and tagged the domains that are supposed to be grouped at each level.
As a result, my reports can display each domain at its specified level.
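The workaround described above uses one extraction per level, plus tags that say which level each domain is grouped at. It can be sketched in Python as follows. The field names domain_l2/domain_l3/domain_l4 and the TAGGED table are hypothetical stand-ins for the actual extractions and tags, which the post doesn't show.

```python
from typing import Optional

# Hypothetical tags: which domains are grouped at which extracted level.
# Dict order matters: the most specific level is checked first.
TAGGED = {
    "domain_l4": set(),
    "domain_l3": {"123.ddns.org"},
    "domain_l2": {"2o7.net"},
}


def domain_levels(dest_host: str) -> dict[str, str]:
    """Extract the last 2, 3 and 4 labels of dest_host as separate fields."""
    labels = dest_host.lower().split(".")
    return {
        f"domain_l{n}": ".".join(labels[-n:])
        for n in (2, 3, 4)
        if len(labels) >= n  # a field is only set if the name is deep enough
    }


def grouped_domain(dest_host: str) -> Optional[str]:
    """Return the extracted value whose level is tagged for this domain."""
    fields = domain_levels(dest_host)
    for field, tagged in TAGGED.items():
        if fields.get(field) in tagged:
            return fields[field]
    return None
```

For example, grouped_domain("splunk.2o7.net") gives "2o7.net" and grouped_domain("yourportal.123.ddns.org") gives "123.ddns.org". Both can then be counted side by side.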
You can use rex to extract a new field containing the second-level domain, and run your report based on that.
For example:
sourcetype="bcoat_proxysg" eventtype="Bad_Domain_Red" | rex field=dest_host "(?<xdomain>([^\.]+\.)?[^\.]+$)" | stats count by xdomain
This should pull out a new field named xdomain, which will contain the last two levels of the hostname (e.g. 2o7.net for splunk.2o7.net). The second-level part is optional, in case of unqualified names.
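Here is a Python sketch of the same last-two-labels extraction, followed by a count per xdomain, roughly what a rex extraction plus a count by xdomain would give. It only illustrates the logic. Python writes named groups as (?P<name>...) where rex uses (?<name>...), and the function names are hypothetical.

```python
import re
from collections import Counter

# Last two labels of a hostname; the second-to-last label is optional.
XDOMAIN_RE = re.compile(r"(?P<xdomain>(?:[^.]+\.)?[^.]+)$")


def xdomain(dest_host: str) -> str:
    """Return the last two labels (a dotless name is returned as-is)."""
    match = XDOMAIN_RE.search(dest_host)
    return match.group("xdomain") if match else dest_host


def count_by_xdomain(hosts: list[str]) -> Counter:
    """Count hosts per extracted xdomain value."""
    return Counter(xdomain(host) for host in hosts)
```

With the sample hosts from the question, the 2o7.net subdomains consolidate as intended. However, myportal.123.ddns.org and yourportal.123.ddns.org collapse to ddns.org, which is the 2-versus-3-level problem raised later in the thread.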
If you want something fancier, this might work:
| rex field=dest_host "^(?:(?<xhost>[^\.]+)\.(?=[^\.]+\.))?(?<xdomain>.+)$"
For hostnames with only one or two components (segments), xdomain will contain the entire string. When there are at least three components in the name, the first will go into xhost and xdomain will contain everything else.
It works because the lookahead (?=[^\.]+\.) only lets the optional xhost group match when at least two more components follow the first one; otherwise the group is skipped and xdomain takes the whole name. The lookahead is needed because ? is greedy: without it, a two-component name such as splunk.com would be split into xhost=splunk and xdomain=com.
It's still a heuristic and can probably be refined with more effort, but it should get you going.
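The split described above can be checked with a short Python sketch. It puts the first label into xhost only when the name has at least three labels; otherwise everything goes into xdomain. The pattern expresses that rule with a lookahead. split_host is a hypothetical name, and Python spells named groups (?P<name>...).

```python
import re
from typing import Optional

# xhost only matches when at least two more labels follow it (lookahead),
# so one- and two-label names go entirely into xdomain.
SPLIT_RE = re.compile(r"^(?:(?P<xhost>[^.]+)\.(?=[^.]+\.))?(?P<xdomain>.+)$")


def split_host(dest_host: str) -> tuple[Optional[str], str]:
    """Split dest_host into (xhost, xdomain); xhost is None if not split."""
    match = SPLIT_RE.match(dest_host)
    if match is None:  # e.g. an empty string
        return None, dest_host
    return match.group("xhost"), match.group("xdomain")
```

split_host("a.b.2o7.net") returns ("a", "b.2o7.net"). This shows the limitation of the heuristic: only one label is ever peeled off, so deeper names keep their extra levels.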
See also - http://www.splunk.com/base/Documentation/4.1.5/SearchReference/Rex
Yeah, the heuristic of assuming that names with three or more levels start with a hostname might not work in some cases. If you're only looking at a fixed list, the more elegant solution may be to skip the field extraction entirely and use a lookup table instead.
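The lookup-table idea can be sketched in Python as a longest-suffix match against a fixed list of monitored domains, each written at the level it is monitored. This models only the matching logic, not Splunk's lookup configuration. LOOKUP, match_domain, and report are hypothetical names, and the table entries are taken from the examples in the question.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical lookup table: monitored domain -> color category, each
# entry written at the level (2, 3 or 4 labels) it is monitored at.
LOOKUP = {
    "2o7.net": "red",
    "123.ddns.org": "red",
    "abc.net": "orange",
}


def match_domain(dest_host: str) -> Optional[str]:
    """Return the most specific LOOKUP entry that dest_host falls under."""
    labels = dest_host.lower().split(".")
    for i in range(len(labels)):  # longest suffix first
        candidate = ".".join(labels[i:])
        if candidate in LOOKUP:
            return candidate
    return None


def report(hosts: Iterable[str]) -> Counter:
    """Count hosts per matched domain; unmatched hosts are dropped."""
    return Counter(d for d in map(match_domain, hosts) if d is not None)
```

Trying the longest suffix first means the most specific entry wins. That lets 2-level entries like 2o7.net and 3-level entries like 123.ddns.org sit in the same table. Matching on whole labels also avoids counting xyzabc.net as abc.net. LOOKUP[match_domain(host)] then gives the color category.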
Thanks, I will test that out. At the same time, I'm thinking of using the collect command together with more granular tagging to differentiate the types of domains (2/3/4-level). I would send them into different indexes and apply different regular expressions in each index to pull out the wanted domains.
The difficulty is knowing which domains are supposed to be 2-, 3-, or 4-level, as these are all human-defined.
Ah, now I understand. I've edited the answer above; it's more readable there.
As in my example, I'm looking at reports that may include both.
The need to identify 3-level domains arises from domains that come from dynamic DNS.
Thanks, I've tried that, but the challenge is that some of these domains are 2-level while others are 3- or 4-level.
That's where I'm having the problem.
Any suggestions?