I'm trying to extract fields from the syslog output of a Barracuda Spam Firewall. For those deeply interested, Barracuda has helpfully documented the format here.
I've gotten as far as the following regex:
(?:[^\s\n]*\s){5}(?<barracuda_process>[\w/]*)\[(?<barracuda_pid>\d*)\]:\s(?<client_ip>.*\]|127\.0\.0\.1)\s(?<message_id>[\w\d-]*)\s(?<start_time>\d*)\s(?<end_time>\d*)\s(?<service>RECV|SCAN|SEND)\s(?<info>.*)
The problem is that my info field should really be split into multiple fields, and which fields it contains depends on the value of the service field.
For example, if service="SCAN", the subsequent fields should be:
Encrypted Sender Recipient Score Action Reason ReasonExtra "SUBJ:"Subject
While, if service="SEND", the subsequent fields should be:
Encrypted Action QueueID Response
What's the best way to get these fields extracted?
Thanks, all y'all!
I believe the more "modern" way to do what dwaddle suggested would be to use the EXTRACT keyword in props.conf alone, with no transforms.conf stanzas. Every EXTRACT is run against every event, so you can split the extraction into pieces that each match only the events they apply to. Something like:
[cuda]
EXTRACT-common = (?:[^\s\n]*\s){5}(?<barracuda_process>[\w/]*)\[(?<barracuda_pid>\d*)\]:\s(?<client_ip>.*\]|127\.0\.0\.1)\s(?<message_id>[\w\d-]*)\s(?<start_time>\d*)\s(?<end_time>\d*)
EXTRACT-scan_msg = (?<service>SCAN)\s(?<encrypted>\w+)\s(?<sender>\S+)
EXTRACT-send_msg = (?<service>SEND)\s(?<encrypted>\w+)\s(?<action>\w+)
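To see how those three EXTRACTs combine, here is a rough emulation in Python. It is a minimal sketch, not Splunk itself: it assumes each EXTRACT regex is searched against the raw event independently and that every regex that matches contributes its named groups, as described above. Python spells named groups `(?P<name>...)` instead of `(?<name>...)`, `\S` and `[\w-]` are equivalent rewrites of `[^\s\n]` and `[\w\d-]`, and `\S+` is used for the sender because `\w+` stops at the `@` in an email address. The sample lines are made up in the shape the regex expects; they are not real Barracuda output.

```python
import re

# Hypothetical stand-in for the props.conf stanza: each entry mimics one
# EXTRACT-<class> line, converted to Python's (?P<name>...) syntax.
EXTRACTS = {
    "common": re.compile(
        r"(?:\S*\s){5}(?P<barracuda_process>[\w/]*)\[(?P<barracuda_pid>\d*)\]:\s"
        r"(?P<client_ip>.*\]|127\.0\.0\.1)\s(?P<message_id>[\w-]*)\s"
        r"(?P<start_time>\d*)\s(?P<end_time>\d*)"
    ),
    # \S+ rather than \w+ so the whole address, including "@" and ".", is kept
    "scan_msg": re.compile(r"(?P<service>SCAN)\s(?P<encrypted>\w+)\s(?P<sender>\S+)"),
    "send_msg": re.compile(r"(?P<service>SEND)\s(?P<encrypted>\w+)\s(?P<action>\w+)"),
}


def extract_all(event):
    """Run every regex against the event and merge the named groups of those that match."""
    fields = {}
    for rx in EXTRACTS.values():
        m = rx.search(event)
        if m:  # a regex that does not match simply contributes no fields
            fields.update(m.groupdict())
    return fields


# Made-up sample events (hypothetical, not real Barracuda output)
SCAN_LINE = ("Mar 15 10:00:00 mailhost local0.info scan[12345]: "
             "mail.example.com[192.0.2.10] 1300183200-0a1b2c3d-0001 "
             "1300183200 1300183201 SCAN 0 alice@example.com bob@example.org "
             "0.52 0 0 - SUBJ:Quarterly report")
SEND_LINE = ("Mar 15 10:00:02 mailhost local0.info outbound/smtp[23456]: "
             "127.0.0.1 1300183200-0a1b2c3d-0001 1300183201 1300183202 "
             "SEND 0 1 4F2A1B 250 2.0.0 Ok: queued")

if __name__ == "__main__":
    for line in (SCAN_LINE, SEND_LINE):
        print(extract_all(line))
```

On the SCAN line only the common and scan_msg regexes match, so the result has a sender but no action; on the SEND line it is the other way around.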
In props.conf you can define two different REPORT rules, one for service=SCAN and the other for service=SEND. Something like this:
(props.conf)
[cuda]
REPORT-scan=cudascan
REPORT-send=cudasend
(transforms.conf)
[cudascan]
REGEX=(?:[^\s\n]*\s){5}([\w/]*)\[(\d*)\]:\s(.*\]|127\.0\.0\.1)\s([\w\d-]*)\s(\d*)\s(\d*)\s(SCAN)\s(.*)
FORMAT= barracuda_process::$1 barracuda_pid::$2 client_ip::$3 message_id::$4 start_time::$5 end_time::$6 service::$7 info::$8
[cudasend]
REGEX=(?:[^\s\n]*\s){5}([\w/]*)\[(\d*)\]:\s(.*\]|127\.0\.0\.1)\s([\w\d-]*)\s(\d*)\s(\d*)\s(SEND)\s(.*)
FORMAT= barracuda_process::$1 barracuda_pid::$2 client_ip::$3 message_id::$4 start_time::$5 end_time::$6 service::$7 info::$8
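In FORMAT, `$1`, `$2`, and so on refer to the regex's capturing groups, numbered by the order of their opening parentheses; the leading `(?:...)` is non-capturing, so it gets no number. A quick Python check makes the numbering visible, since Python numbers groups the same way PCRE (the regex engine Splunk uses) does. The sample line is made up, and `FIELD_NAMES` is just the field list from the FORMAT line, in order.

```python
import re

# The cudascan REGEX from transforms.conf, with the dots in 127.0.0.1 escaped.
# (?:...) is non-capturing, so group 1 is the process name.
CUDASCAN = re.compile(
    r"(?:[^\s\n]*\s){5}([\w/]*)\[(\d*)\]:\s(.*\]|127\.0\.0\.1)\s"
    r"([\w\d-]*)\s(\d*)\s(\d*)\s(SCAN)\s(.*)"
)

# Field names in the order their capturing groups appear in the regex.
FIELD_NAMES = ["barracuda_process", "barracuda_pid", "client_ip", "message_id",
               "start_time", "end_time", "service", "info"]

# Made-up sample event (hypothetical, not real Barracuda output)
SCAN_LINE = ("Mar 15 10:00:00 mailhost local0.info scan[12345]: "
             "mail.example.com[192.0.2.10] 1300183200-0a1b2c3d-0001 "
             "1300183200 1300183201 SCAN 0 alice@example.com bob@example.org "
             "0.52 0 0 - SUBJ:Quarterly report")


def format_groups(line):
    """Return {'$N': (field_name, captured_text)} for a matching line, else {}."""
    m = CUDASCAN.search(line)
    if m is None:
        return {}
    return {f"${i}": (name, m.group(i))
            for i, name in enumerate(FIELD_NAMES, start=1)}


if __name__ == "__main__":
    print(CUDASCAN.groups)  # 8 capturing groups, so there is no $9
    for key, value in format_groups(SCAN_LINE).items():
        print(key, value)
```

With only eight groups there is no `$9`, so a FORMAT that starts numbering at `$2` shifts every field name onto the wrong value.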
On each event, both of these regexes will be tested, but only one will fire: SCAN or SEND. My regexes obviously need some work to be entirely correct for the SCAN and SEND cases, but this should let you tell the two apart and extract fields accordingly. (You would also need a third rule for RECV.)
If you have a couple of (sanitized) samples of each event type, go ahead and edit your question to include them; someone can probably help you nail down the regex. Also, have you seen RegExr? http://gskinner.com/RegExr/
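The "branch on service" idea can also be sketched outside Splunk, which is a handy way to check the field layouts from the question against sample lines. This minimal Python sketch captures everything after the service keyword as info, then splits it using a per-service list of field names; the last name in each list takes the rest of the line, because the subject and the SMTP response can contain spaces. The lowercase field names and the sample lines are made up for illustration, and RECV is left unsplit because its layout isn't listed here.

```python
import re

# The question's regex in Python syntax ((?P<name>...)); \S and [\w-] are
# equivalent to [^\s\n] and [\w\d-], and the dots in 127.0.0.1 are escaped.
COMMON = re.compile(
    r"(?:\S*\s){5}(?P<barracuda_process>[\w/]*)\[(?P<barracuda_pid>\d*)\]:\s"
    r"(?P<client_ip>.*\]|127\.0\.0\.1)\s(?P<message_id>[\w-]*)\s"
    r"(?P<start_time>\d*)\s(?P<end_time>\d*)\s"
    r"(?P<service>RECV|SCAN|SEND)\s(?P<info>.*)"
)

# Hypothetical names for the "info" fields, per service, following the question.
# The last name in each list absorbs the remainder of the line.
INFO_FIELDS = {
    "SCAN": ["encrypted", "sender", "recipient", "score", "action",
             "reason", "reason_extra", "subject"],
    "SEND": ["encrypted", "action", "queue_id", "response"],
}


def parse(line):
    """Extract the common fields, then split info according to the service."""
    m = COMMON.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    names = INFO_FIELDS.get(fields["service"])
    if names:  # RECV has no layout here, so its info is left as one field
        values = fields["info"].split(None, len(names) - 1)
        fields.update(zip(names, values))
        subject = fields.get("subject", "")
        if subject.startswith("SUBJ:"):
            fields["subject"] = subject[len("SUBJ:"):]
    return fields


# Made-up sample events (hypothetical, not real Barracuda output)
SCAN_LINE = ("Mar 15 10:00:00 mailhost local0.info scan[12345]: "
             "mail.example.com[192.0.2.10] 1300183200-0a1b2c3d-0001 "
             "1300183200 1300183201 SCAN 0 alice@example.com bob@example.org "
             "0.52 0 0 - SUBJ:Quarterly report")
SEND_LINE = ("Mar 15 10:00:02 mailhost local0.info outbound/smtp[23456]: "
             "127.0.0.1 1300183200-0a1b2c3d-0001 1300183201 1300183202 "
             "SEND 0 1 4F2A1B 250 2.0.0 Ok: queued")
```

Keeping the layouts in one lookup table means adding RECV later is a one-line change once its field order is known.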
Your method makes sense, but my implementation has some suck in it. I've posted my failed attempt as a separate question here.
I like mw's idea of using the EXTRACT keyword, but I'm failing in the same way when I try to implement it.
Hey, did you ever get this figured out? I don't want to reinvent the wheel, and I'd like to get field extraction working for these logs as well.