I am trying to pull in Windows DNS logs but drop all internal requests. I have been able to get the logs in, and I have used a SEDCMD in props.conf to change "(7)outlook(7)company(3)com(0)" to ".outlook.company.com.", but the internal requests are not dropped whether my regex targets the pre-SEDCMD or the post-SEDCMD form of the event. Am I doing something wrong here? From what I have read in the links below, this should be working.
answers.splunk(.)com/answers/35259/best-method-for-pulling-microsoft-dns-logs-with-splunk.html#answer-37702
stratumsecurity(.)com/2012/07/03/splunk-security/#more-896
# props.conf
[win_dns_logs]
TRANSFORMS-set = dropline
SHOULD_LINEMERGE=false
TIME_PREFIX = ^
TIME_FORMAT=%m/%d/%y %H:%M:%S
TZ = US/Eastern
#Fixing url formatting
SEDCMD-win_dns_index = s/\(\d+\)/./g
# transforms.conf
[dropline]
REGEX = \(.\)[Cc][Oo][Mm][Pp][Aa][Nn][Yy]\(.\)[Cc][Oo][Mm]
#REGEX = \(9\)[Cc][Oo][Mm][Pp][Aa][Nn][Yy]\(3\)[Cc][Oo][Mm]
DEST_KEY = queue
FORMAT = nullQueue
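One way to separate a regex problem from an ordering problem is to test the pattern outside Splunk against both forms of the event. The sketch below uses Python's re module as a stand-in for Splunk's regex engine (an assumption; the two agree on patterns this simple), and apply_sedcmd is a hypothetical helper that mimics the SEDCMD above. The sample string is the one from the question.

```python
import re

def apply_sedcmd(raw: str) -> str:
    """Hypothetical stand-in for SEDCMD-win_dns_index = s/\\(\\d+\\)/./g"""
    return re.sub(r"\(\d+\)", ".", raw)

# The REGEX from the [dropline] stanza, copied unchanged.
DROPLINE = re.compile(r"\(.\)[Cc][Oo][Mm][Pp][Aa][Nn][Yy]\(.\)[Cc][Oo][Mm]")

raw = "(7)outlook(7)company(3)com(0)"
rewritten = apply_sedcmd(raw)

print(rewritten)                          # .outlook.company.com.
print(bool(DROPLINE.search(raw)))         # True
print(bool(DROPLINE.search(rewritten)))   # False
```

So this pattern can only match the raw form, because the literal parentheses are gone once the SEDCMD has run. Note also that \(.\) allows exactly one character between the parentheses, so a label length of 10 or more, such as (10), would not match; \(\d+\) is more forgiving.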
General regex advice: Never do this for matching:
^.*something.*$
That will blow up your regex processor's CPU time, especially for non-matching events, because it'll keep trying to apply the starting .*, fail, backtrack, try again, fail, backtrack, ... and you gain nothing, because ^.*something.*$ matches exactly the same stuff as something.
The only case I know of where it makes sense is if you're extracting/replacing strings and need the bit before the something in further processing.
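A small sketch of that equivalence, using Python's re module rather than Splunk itself (an assumption) and a few made-up single-line sample strings. For a yes/no match on a single-line event, the anchored and the bare pattern give the same verdict:

```python
import re

anchored = re.compile(r"^.*something.*$")
bare = re.compile(r"something")

# Made-up single-line samples: two that contain the word, two that do not.
samples = ["has something inside", "something", "nothing to see", ""]

def same_verdict(text: str) -> bool:
    """True when both patterns agree on whether the text matches."""
    return bool(anchored.search(text)) == bool(bare.search(text))

for s in samples:
    print(repr(s), bool(bare.search(s)), same_verdict(s))
```

The bare pattern gets there without the leading .* having to consume and give back the whole line at every attempt.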
Small general advice: Prefix your Splunk regexes with (?i) to make them case-insensitive so you can write ...company.com... instead of that mess of character classes.
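To illustrate, here is a sketch comparing the character-class version with a (?i) version, again in Python's re module as a stand-in for Splunk (an assumption). The test strings are made up for the example:

```python
import re

# Character-class version, as posted in the question.
verbose = re.compile(r"\(.\)[Cc][Oo][Mm][Pp][Aa][Nn][Yy]\(.\)[Cc][Oo][Mm]")
# Same pattern with the inline case-insensitive flag.
compact = re.compile(r"(?i)\(.\)company\(.\)com")

cases = ["(7)company(3)com", "(7)COMPANY(3)COM", "(7)CoMpAnY(3)cOm", "(6)google(3)com"]

def agree(text: str) -> bool:
    """True when both patterns give the same match/no-match answer."""
    return bool(verbose.search(text)) == bool(compact.search(text))

for c in cases:
    print(c, bool(compact.search(c)), agree(c))
```

Both patterns accept the three company variants and reject the google one; the (?i) form is just easier to read and to get right.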
You do restart that HF after every change, right?
Can you post a sample event?
The prefix for case-insensitivity is a great idea. I know the regex should be as limited as possible; I just wanted to expand its "reach" to see whether it was a regex issue or something else. I am restarting the splunkd service on the HF. Would I have to reboot the actual box itself? Raw events are in the pastebin below.
http://pastebin(.)com/aVUjhEhZ
Thank you.
Check the sourcetype win_dns_logs in props.conf on the forwarder to see if it is pre-parsed at the forwarder level.
This is usually the case for CSV-like formats; look for INDEXED_EXTRACTIONS settings.
See http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Extractfieldsfromfileheadersatindextime
Both the props and transforms files are on the heavy forwarders, which is where the pre-parsing is done. Unfortunately the Windows DNS logs are not CSV, so this does not apply.
Give this REGEX a shot without the SEDCMD:
(?i)\(\d*\)company\(\d*\)com
Your example has (7), but your regex has (9) so that might fix that.
Update - Still not working even after using the regex below.
Regex = ^.*[Cc][Oo][Mm][Pp][Aa][Nn][Yy](.)[Cc][Oo][Mm].*$/g
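If the trailing /g is really in transforms.conf and not just in this post, it may be part of the problem: as far as I know, REGEX takes a bare pattern with no sed-style delimiters or flags, so /g would be read as two literal characters required after the end-of-line anchor. The sketch below checks that in Python's re module (a stand-in for Splunk, so treat it as an assumption) against the raw and the rewritten form of the sample query from the question.

```python
import re

# Pattern as posted, including the trailing /g.
with_flag = re.compile(r"^.*[Cc][Oo][Mm][Pp][Aa][Nn][Yy](.)[Cc][Oo][Mm].*$/g")
# Same pattern with the /g removed.
without_flag = re.compile(r"^.*[Cc][Oo][Mm][Pp][Aa][Nn][Yy](.)[Cc][Oo][Mm].*$")

raw = "(7)outlook(7)company(3)com(0)"   # before the SEDCMD
rewritten = ".outlook.company.com."     # after the SEDCMD

for name, pattern in [("with /g", with_flag), ("without /g", without_flag)]:
    print(name, bool(pattern.search(raw)), bool(pattern.search(rewritten)))
```

With /g the pattern matches neither form, since nothing can follow the end of the line. Without it, the unescaped (.) is a capture group for exactly one character, so it matches the rewritten .company.com. form but not the raw (3) form, which has three characters between "company" and "com".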
Sorry, the regex is correct in the transforms file; I merely changed it to say "company" here. I would rather keep the sed command because otherwise the events would look like the example below. Also, I have tried both (9) and (.), which should match the events before or after the sed command. I will try the regex with a wider reach though.
Example of DNS query before sed command
(5)drive(6)google(3)com(0)
Example after sed command
.drive.google.com.