I've scoured the Splunk Answers site for all the regex/rex/transforms/props threads and still can't figure this out. The data is the syslog output from a pfSense firewall, with the extraneous newline character filtered out on the pfSense side to make parsing easier. I'm convinced Splunk is somehow not finding a regex match in my log files. I can tell it's picking up my props.conf and transforms.conf files because other elements of these files work as expected.
My transforms.conf file:
[sourcetype_pfsense_by_proto]
DEST_KEY = MetaData:Sourcetype
REGEX = proto\s(\S+)
FORMAT = sourcetype::pfsense_$1
[pfsenseCommonFields]
REGEX = pf: (?P<duration>\d{2}:\d{2}:\d{2}\.\d{6}) rule (?P<rulenum>\d+/\d+)\((?P<reason>\w+)\): (?P<action>\w+) (?P<direction>\w+) on (?P<interface>[A-Za-z0-9]+): \((?P<ipheader>[A-Za-z0-9, ]*\[[A-Za-z0-9, ]*\][A-Za-z0-9, ]*\([A-Za-z0-9, ]*\)[A-Za-z0-9, ]*)\)\s+(?P<srcip>(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?))\.(?P<srcport>\d{1,5}) > (?P<dstip>(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?))\.(?P<dstport>\d{1,5}):
Don't be thrown off by all the number groupings in the middle; they are simply the IP address fields, with the definition of an octet (from the transforms.conf documentation page) pasted in for each octet. Here is my props.conf file, although I've tried all three ways (EXTRACT, TRANSFORMS and REPORT) with the same results:
[host::10.11.12.13]
TRUNCATE = 0
REPORT-pfsenseCommonFields = pfsenseCommonFields
[source::udp:514]
TRANSFORMS-pfsense_by_proto = sourcetype_pfsense_by_proto
Sample of the data that this is meant to parse:
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:08.012132 rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3013, offset 0, flags [DF], proto UDP (17), length 76) 10.11.12.101.123 > 198.55.111.5.123: NTPv4, length 48
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:03.050708 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14314, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61077 > 54.247.105.180.443: Flags [S], cksum 0x4730 (correct), seq 2560336237, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:02.062553 rule 3/0(match): block in on re0: (tos 0x0, ttl 128, id 10232, offset 0, flags [none], proto UDP (17), length 229) 10.11.12.50.138 > 10.11.12.255.138: NBT UDP PACKET(138)
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:00.133234 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14291, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61068 > 54.247.105.180.443: Flags [S], cksum 0xe445 (correct), seq 2412318003, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:02.742183 rule 82/0(match): pass in on re0: (tos 0x0, ttl 128, id 31516, offset 0, flags [none], proto UDP (17), length 73) 10.11.12.50.50363 > 10.11.12.13.53: 56921+ A? mcs1-870f.broker.sophos.com. (45)
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:02.742183 rule 82/0(match): pass in on re0: (tos 0x0, ttl 128, id 31516, offset 0, flags [none], proto UDP (17), length 73) 10.11.12.50.50363 > 10.11.12.13.53: 56921+ A? mcs1-870f.broker.sophos.com. (45)
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:01.999673 rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3012, offset 0, flags [DF], proto UDP (17), length 76) 10.11.12.101.123 > 198.55.111.5.123: NTPv4, length 48
Dec 10 21:38:17 10.11.12.13 Dec 11 05:38:07 pf: 00:00:00.429468 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14341, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61199 > 54.247.105.180.443: Flags [S], cksum 0x5232 (correct), seq 977794117, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Dec 10 21:38:17 10.11.12.13 Dec 11 05:38:07 pf: 00:00:10.162735 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14322, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61192 > 54.247.105.180.443: Flags [S], cksum 0x340c (correct), seq 1407056092, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Dec 10 21:38:17 10.11.12.13 Dec 11 05:38:07 pf: 00:00:08.012132 rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3013, offset 0, flags [DF], proto UDP (17), length 76) 10.11.12.101.123 > 198.55.111.5.123: NTPv4, length 48
Dec 10 21:38:17 10.11.12.13 Dec 11 05:38:07 pf: 00:00:03.050708 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14314, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61077 > 54.247.105.180.443: Flags [S], cksum 0x4730 (correct), seq 2560336237, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
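For anyone who wants to reproduce the match outside Splunk, here is a rough sketch in Python that runs the pfsenseCommonFields pattern against the first sample line above. It assumes Python's re module treats the constructs used here (named groups, character classes, non-capturing groups) the same way Splunk's PCRE engine does, which I believe holds for this pattern but is not a guarantee about Splunk's behavior. The names OCTET, IPV4, PATTERN and extract are made up for the sketch.

```python
import re

# One IPv4 octet, copied from the REGEX in transforms.conf.
OCTET = r"(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)"
# Four octets joined by literal dots.
IPV4 = r"\.".join([OCTET] * 4)

# The pfsenseCommonFields REGEX, assembled from its pieces for readability.
PATTERN = (
    r"pf: (?P<duration>\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"rule (?P<rulenum>\d+/\d+)\((?P<reason>\w+)\): "
    r"(?P<action>\w+) (?P<direction>\w+) on (?P<interface>[A-Za-z0-9]+): "
    r"\((?P<ipheader>[A-Za-z0-9, ]*\[[A-Za-z0-9, ]*\][A-Za-z0-9, ]*"
    r"\([A-Za-z0-9, ]*\)[A-Za-z0-9, ]*)\)\s+"
    r"(?P<srcip>" + IPV4 + r")\.(?P<srcport>\d{1,5}) > "
    r"(?P<dstip>" + IPV4 + r")\.(?P<dstport>\d{1,5}):"
)

# First sample event from the post.
SAMPLE = (
    "Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:08.012132 "
    "rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3013, "
    "offset 0, flags [DF], proto UDP (17), length 76) "
    "10.11.12.101.123 > 198.55.111.5.123: NTPv4, length 48"
)


def extract(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = re.search(PATTERN, line)
    return match.groupdict() if match else None


fields = extract(SAMPLE)
print(fields)
```

If this prints a dict with all eleven fields, the pattern itself matches the raw event text, which would point away from the regex and toward how or where the extraction is applied.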
(I'm planning to parse more fields later after I get these common fields working.) When I copy/paste this same regex into a rex command, it parses all my fields just fine. Based on a hint from another Splunk Answers thread, I suspect the issue is a difference in how the regex is parsed (possibly in how white space is trimmed), but the other thread wasn't clear on the solution. Here is the rex line that successfully parses the fields:
sourcetype=pfsens* | rex field=_raw "pf: (?P<duration>\d{2}:\d{2}:\d{2}\.\d{6}) rule (?P<rulenum>\d+/\d+)\((?P<reason>\w+)\): (?P<action>\w+) (?P<direction>\w+) on (?P<interface>[A-Za-z0-9]+): \((?P<ipheader>[A-Za-z0-9, ]*\[[A-Za-z0-9, ]*\][A-Za-z0-9, ]*\([A-Za-z0-9, ]*\)[A-Za-z0-9, ]*)\)\s+(?P<srcip>(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?))\.(?P<srcport>\d{1,5}) [\>] (?P<dstip>(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?))\.(?P<dstport>\d{1,5})"
Without the rex portion, this search shows three sourcetypes (pfsense, pfsense_TCP and pfsense_UDP) as expected, but none of my named fields.
I know it's not an issue with the host:: or source:: stanzas because I've swapped the labels with the TRANSFORMS line that adjusts the sourcetype based on protocol, and the sourcetype rename per protocol continues to work just fine.
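To make the working part concrete, here is a small Python sketch of what the sourcetype_pfsense_by_proto transform is meant to do: capture the token after "proto" and substitute it for $1 in FORMAT. This only imitates the REGEX/FORMAT pair and is not how Splunk implements it. The function name rewritten_sourcetype and the fallback value "pfsense" for events without a proto field are assumptions for illustration.

```python
import re

# REGEX and FORMAT from the [sourcetype_pfsense_by_proto] stanza.
PROTO_REGEX = re.compile(r"proto\s(\S+)")
FORMAT = "sourcetype::pfsense_$1"


def rewritten_sourcetype(raw, default="pfsense"):
    """Imitate the transform: return the new sourcetype, or the default
    (an assumed base sourcetype) when the REGEX does not match."""
    match = PROTO_REGEX.search(raw)
    if match is None:
        return default
    # Substitute the first capture group for $1, then drop the "sourcetype::" key.
    return FORMAT.replace("$1", match.group(1)).split("::", 1)[1]


# Fragments trimmed from the sample events, plus one hypothetical line with no proto field.
udp_event = "pf: 00:00:08.012132 rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3013, offset 0, flags [DF], proto UDP (17), length 76)"
tcp_event = "pf: 00:00:03.050708 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14314, offset 0, flags [DF], proto TCP (6), length 52)"
other_event = "Dec 10 21:38:20 10.11.12.13 some other message"

for event in (udp_event, tcp_event, other_event):
    print(rewritten_sourcetype(event))
```

That lines up with the three sourcetypes I see in search, so this transform is clearly being applied.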
More history: I exhausted the 60-day Enterprise trial but did not change any of the Enterprise settings. This instance has since been reverted to a Free license, so if it's an embedded/leftover permissions issue I'll need guidance on how to fix it under the hood. My field extractions settings page shows:
Name                                            Type            Extraction/Transform  Owner     App     Sharing               Status   Actions
host::10.11.12.13 : REPORT-pfsenseCommonFields  Uses transform  pfsenseCommonFields   No owner  system  Global | Permissions  Enabled  Move | Delete
And my field transformations settings page lists the following:
Name                         Owner     App     Sharing               Status             Actions
pfsenseCommonFields          No owner  system  Global | Permissions  Enabled | Disable  Clone | Move | Delete
sourcetype_pfsense_by_proto  No owner  system  Global | Permissions  Enabled | Disable  Clone | Move | Delete
In other words, Splunk seems to be recognizing the elements of the props.conf and transforms.conf files just fine, and permissions are Global all around.
Any help would be greatly appreciated.