In \etc\apps\search\local\transforms.conf
I have the following entry - I have checked it against the log file and believe the regex is now correct:
[registrants]
REGEX = /^([0-9\.]+) ([0-9\-]*) ([0-9\-]*) (\[[^\]]+\]) ("[^"]+") ([0-9\-]+) ([0-9\-]+) ("[^"]+") ("[^"]+") ([0-9\-]+) ("[^"]+") ([0-9\.\-]+)/
FORMAT = client_ip::$1 user::$2 profile::$3 timestamp::$4 url::$5 http_status::$6 bytes::$7 junk::$8 user_agent::$9 processing_time_ms::$10 registrant::$11 forward_for::$12
In \etc\apps\search\local\props.conf
I have the following entry
[Apache-registrant-forward]
REPORT-registrants = registrants
SHOULD_LINEMERGE = false
TIME_PREFIX = \[
maxDist = 28
pulldown_type = 1
In the search app I have
sourcetype="Apache-registrant-forward"
The data looks like
1.1.1.1 - - [24/Apr/2013:15:47:11 +0200] "GET /somerest HTTP/1.1" 200 12345 "-" "some useragent" 123 "1234" 111.222.333.444
1.1.1.2 - - [24/Apr/2013:15:47:11 +0200] "GET /somerest HTTP/1.1" 200 78910 "-" "some useragent" 223 "5678" 222.333.444.555
1.1.1.1 - - [24/Apr/2013:15:47:11 +0200] "GET /somerest HTTP/1.1" 200 28356 "-" "some useragent" 323 "2345" 333.444.555.666
e.g. the client_ip is the proxy and the forward_for is the original IP
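One way to sanity-check this outside Splunk is to run the pattern against one of the sample events, e.g. with Python's re module. A minimal sketch (note: Splunk's REGEX setting expects a bare PCRE, so the enclosing / delimiters from the transforms.conf entry above are dropped here, as they would otherwise be matched literally):

```python
import re

# The REGEX from transforms.conf, minus the enclosing "/" delimiters
pattern = re.compile(
    r'^([0-9\.]+) ([0-9\-]*) ([0-9\-]*) (\[[^\]]+\]) ("[^"]+") '
    r'([0-9\-]+) ([0-9\-]+) ("[^"]+") ("[^"]+") ([0-9\-]+) '
    r'("[^"]+") ([0-9\.\-]+)'
)

sample = ('1.1.1.1 - - [24/Apr/2013:15:47:11 +0200] '
          '"GET /somerest HTTP/1.1" 200 12345 "-" "some useragent" '
          '123 "1234" 111.222.333.444')

m = pattern.match(sample)
print(m.group(1), m.group(12))  # client_ip and forward_for
```

If the match succeeds with all twelve groups, the pattern itself is fine and any remaining problem is in the configuration, not the regex.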
When I load the log file, I give it a type from the dropdown, which shows Apache-registrant-forward
- I am not sure whether the type it shows is taken from the props.conf file I saved.
Questions
UPDATE
Trying Ayn's code
source="C:\\..." | rex "^(?<client_ip>[0-9\.]+) (?<user>[0-9\-]*) (?<profile>[0-9\-]*) (\[[^\]]+\]) (?<url>\"[^\"]+\") (?<http_status>[0-9\-]+) (?<bytes>[0-9\-]+) (?<user_agent>\"[^\"]+\") (?<processing_time_ms>\"[^\"]+\") (?<registrant>[0-9\-]+) (?<forward_for>\"[^\"]+\") ([0-9\.\-]+)"
which ALMOST works, BUT there is a "-" in the source before the user agent, so I added (\"[^\"]+\") for it
and instantly it fails finding the field names - here is my regex with each group on a new line (in real life it is all on one line):
source="C:\\..." | rex "
^(?<client_ip>[0-9\.]+)
(?<user>[0-9\-]*)
(?<profile>[0-9\-]*)
(?<timestamp>\[[^\]]+\])
(?<url>\"[^\"]+\")
(?<http_status>[0-9\-]+)
(?<bytes>[0-9\-]+)
(\"[^\"]+\")
(?<user_agent>\"[^\"]+\")
(?<processing_time_ms>\"[^\"]+\")
(?<registrant>[0-9\-]+)
(?<forward_for>[0-9\.\-]+)
"
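For what it's worth, the mismatch in the version above seems to be that processing_time_ms in the sample events is an unquoted number (123) while the registrant is the quoted value ("1234"), so a quoted group in the processing_time_ms position makes the whole pattern fail. A sketch of the reordered named-group pattern, testable with Python's re (Python spells named groups (?P<name>...) where Splunk's rex uses (?<name>...)):

```python
import re

# Named groups reordered to fit the sample events: after the quoted
# user agent comes an unquoted number (processing time), then a quoted
# value (registrant), then the forwarded-for IP.
pattern = re.compile(
    r'^(?P<client_ip>[0-9\.]+) '
    r'(?P<user>[0-9\-]*) '
    r'(?P<profile>[0-9\-]*) '
    r'(?P<timestamp>\[[^\]]+\]) '
    r'(?P<url>"[^"]+") '
    r'(?P<http_status>[0-9\-]+) '
    r'(?P<bytes>[0-9\-]+) '
    r'("[^"]+") '  # junk field, e.g. "-"
    r'(?P<user_agent>"[^"]+") '
    r'(?P<processing_time_ms>[0-9\-]+) '
    r'(?P<registrant>"[^"]+") '
    r'(?P<forward_for>[0-9\.\-]+)'
)

sample = ('1.1.1.1 - - [24/Apr/2013:15:47:11 +0200] '
          '"GET /somerest HTTP/1.1" 200 12345 "-" "some useragent" '
          '123 "1234" 111.222.333.444')

m = pattern.match(sample)
print(m.group('processing_time_ms'))  # -> 123
print(m.group('registrant'))          # -> "1234" (quotes included)
```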
The REGEX and FORMAT should not be in the props.conf file, but in the transforms.conf, along these lines.
props.conf
[your sourcetype]
REPORT-xyz = my_extractions
transforms.conf
[my_extractions]
REGEX =
FORMAT =
UPDATE:
Another way of extracting the fields is to use DELIMS and FIELDS in transforms.conf (instead of REGEX and FORMAT). The props.conf is the same (REPORT-somename = my_extractions), but in transforms.conf you put:
[my_extractions]
DELIMS = " "
FIELDS = field1 field2 field3 field4 fieldx
DELIMS can take one or two parameters; the first is the delimiter between values (or key/value pairs), and the (optional) second parameter is the delimiter between key and value. FIELDS specifies the fields in the order they appear in the events. In your case that is probably a simpler approach, since you don't really need to do regex extractions.
Examples:
event format 1: key1:value1; key2:value2; key3:value3
DELIMS = "; ", ":"
event format 2: value1;value2;value3
DELIMS = ";"
event format 3: key1=value1|key2=value2|key3=value3
DELIMS = "|", "="
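The two forms can be sketched in Python to show the difference (a rough emulation of the behaviour, not Splunk's actual implementation; note also that for the Apache-style events in this question a bare space delimiter would split inside quoted values like the request string, so the regex approach may still be needed there):

```python
def extract_delims(event, pair_delim, kv_delim=None, fields=None):
    """Rough emulation of transforms.conf DELIMS/FIELDS behaviour."""
    parts = event.split(pair_delim)
    if kv_delim is not None:
        # Two-parameter form: each part is key<kv_delim>value
        return dict(p.split(kv_delim, 1) for p in parts)
    # One-parameter form: values only; names come from FIELDS
    return dict(zip(fields, parts))

# event format 1: "; " between pairs, ":" between key and value
print(extract_delims("key1:value1; key2:value2; key3:value3", "; ", ":"))
# -> {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

# event format 2: values only, names supplied separately (like FIELDS)
print(extract_delims("value1;value2;value3", ";",
                     fields=["field1", "field2", "field3"]))
# -> {'field1': 'value1', 'field2': 'value2', 'field3': 'value3'}
```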
Also, since your events seem to be single line, you should probably set SHOULD_LINEMERGE = false in props.conf.
/K
Well, just switch it around as you see fit - the inline rex
suggestion was more for troubleshooting purposes than anything else, though, so if you got it working I think you should go back to trying the regex in transforms.conf.
lookup can be run inline as well - if using inline rex,
the lookup command needs to come after it, because otherwise there will be no fields to look up 😉
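To illustrate the ordering point in plain Python: the extracted field has to exist before it can be looked up (the table contents and field names below are made up for the example):

```python
# Hypothetical lookup table mapping registrant codes to names
registrant_lookup = {"1234": "Registrant A", "5678": "Registrant B"}

# Fields as they exist AFTER the rex extraction has run;
# without the extraction there is no "registrant" field to look up.
event = {"client_ip": "1.1.1.1", "registrant": "1234"}
event["registrant_name"] = registrant_lookup.get(event["registrant"], "unknown")
print(event["registrant_name"])  # -> Registrant A
```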
Your regex has the user agent in the record extracted into processing_time_ms, your registrant holds the processing time and the forward_for holds the registrant - we are so close I can taste it 😮
Next thing is to look up the registrant in a lookup table. Can I do that when I am using an inline rex?
I understand. It is frustrating. I added (\"[^\"]+\") in front of the user agent and it stopped showing the field names. It did show data; the field names just disappeared from the list on the left.
If it doesn't work, your regex is no longer matching properly. You need to play around with it.
So I added the junk code and it no longer works. - I have
source="C:\..." | rex "^(?
Ah, there is a "-" between the bytes and the user agent
Thanks - it initially gave an error due to the cut and paste from the email. It looks like it works when I copy from your comment instead - (except some of the fields are swapped, I think I can fix that)
If you move your extractions into an inline rex
statement, do you see fields then? E.g.
<yourbasesearch> | rex "^(?<client_ip>[0-9\.]+) (?<user>[0-9\-]*) (?<profile>[0-9\-]*) (\[[^\]]+\]) (?<url>\"[^\"]+\") (?<http_status>[0-9\-]+) (?<bytes>[0-9\-]+) (?<user_agent>\"[^\"]+\") (?<processing_time_ms>\"[^\"]+\") (?<registrant>[0-9\-]+) (?<forward_for>\"[^\"]+\") ([0-9\.\-]+)"
Please re-read my question. I believe all regex issues are solved but none of the fieldnames show up in my search
What do you mean by "test of the regex"?
Right, so now you have your configuration directives in the right places, but your regex is off. It's usually a good idea to test your regex using something like regexpal.com, RegExr (http://gskinner.com/RegExr/) or for that matter Splunk's own rex
command inline in a search.
Your regex currently "breaks" at the user agent. You're not looking for quotation marks there even though there are quotation marks in the log. A working regex (at least against the sample data you supplied here) would be something like
^([0-9\.]+) ([0-9\-]*) ([0-9\-]*) (\[[^\]]+\]) ("[^"]+") ([0-9\-]+) ([0-9\-]+) ("[^"]+") ("[^"]+") ([0-9\-]+) ("[^"]+") ([0-9\.]+)
OK, as far as I know everything is the way it should be. I STILL do not see my custom fields. Also, when I click on "Show Source" I get the same 5 records that are the odd ones out.
Ahh - thanks. I was staring myself blind at this.
It's not me wanting you to fix things, I'm just trying to help you get things working 🙂
I added the fourth group from the end - ("[^"]+") - because without it your regex wouldn't work. The regex I pasted should work, so...
I can understand it can be overwhelming at first 🙂
You don't need to restart anything, changes to search-time extractions take effect immediately so the next time you issue a search your new settings will be used.
I tested your regex at regexpal.com and saw quickly that it wouldn't match your sample data.
What did you find exactly? I tested with javascript
Also, how do I restart the extraction? Sorry for all the questions. Splunk is a bit overwhelming when there is a custom thing going on. Just the fact that I can have props and transforms in several directories and cannot see which is picked up is a problem of its own.
Extractions take place at search-time though, so if it's for the sake of the extractions you don't need to reindex your data.
"Why would you need to reindex them?" because I have new files with new data in a changed format.