Splunk Search

Field names specified in props.conf do not show in search app

mplungjan
Path Finder

In \etc\apps\search\local\transforms.conf I have the following entry - I have checked it against the file and the regex is now correct:

[registrants]
REGEX = /^([0-9\.]+) ([0-9\-]*) ([0-9\-]*) (\[[^\]]+\]) ("[^"]+") ([0-9\-]+) ([0-9\-]+) ("[^"]+") ("[^"]+") ([0-9\-]+) ("[^"]+") ([0-9\.\-]+)/
FORMAT = client_ip::$1 user::$2 profile::$3 timestamp::$4 url::$5 http_status::$6 bytes::$7 junk::$8 user_agent::$9 processing_time_ms::$10 registrant::$11 forward_for::$12

In \etc\apps\search\local\props.conf I have the following entry

[Apache-registrant-forward]
REPORT-registrants = registrants
SHOULD_LINEMERGE = false
TIME_PREFIX = \[
maxDist = 28
pulldown_type = 1

In the search app I have

sourcetype="Apache-registrant-forward"

The data looks like

1.1.1.1 - - [24/Apr/2013:15:47:11 +0200] "GET /somerest HTTP/1.1" 200 12345 "-" "some useragent" 123 "1234" 111.222.333.444
1.1.1.2 - - [24/Apr/2013:15:47:11 +0200] "GET /somerest HTTP/1.1" 200 78910 "-" "some useragent" 223 "5678" 222.333.444.555
1.1.1.1 - - [24/Apr/2013:15:47:11 +0200] "GET /somerest HTTP/1.1" 200 28356 "-" "some useragent" 323 "2345" 333.444.555.666

i.e. the client_ip is the proxy and the forward_for is the original IP

When I load the log file, I give it a type from the dropdown, which shows Apache-registrant-forward - I am not sure whether the type it shows is taken from the file I saved.

Questions

  1. I want the regex to be used for all log files I add - I would expect it to go in my system/local folder - is that correct? It is currently (due to suggestions here) in my apps/search/local folder
  2. How do I tell the search app to use my regex and show me the registrant field?

UPDATE

Trying Ayn's code

source="C:\\..."  | rex "^(?<client_ip>[0-9\.]+) (?<user>[0-9\-]*) (?<profile>[0-9\-]*) (\[[^\]]+\]) (?<url>\"[^\"]+\") (?<http_status>[0-9\-]+) (?<bytes>[0-9\-]+) (?<user_agent>\"[^\"]+\") (?<processing_time_ms>\"[^\"]+\") (?<registrant>[0-9\-]+) (?<forward_for>\"[^\"]+\") ([0-9\.\-]+)"

which ALMOST works, BUT there is a "-" in the source before the user agent, so I added (\"[^\"]+\") and instantly it fails to find the field names - here is my regex with each group on a new line (but in real life it is on one line):

 source="C:\\..."   | rex "
 ^(?<client_ip>[0-9\.]+) 
  (?<user>[0-9\-]*) 
  (?<profile>[0-9\-]*) 
  (?<timestamp>\[[^\]]+\]) 
  (?<url>\"[^\"]+\") 
  (?<http_status>[0-9\-]+) 
  (?<bytes>[0-9\-]+) 
  (\"[^\"]+\") 
  (?<user_agent>\"[^\"]+\") 
  (?<processing_time_ms>\"[^\"]+\") 
  (?<registrant>[0-9\-]+) 
  (?<forward_for>[0-9\.\-]+)
  "
1 Solution

kristian_kolb
Ultra Champion

The REGEX and FORMAT should not be in the props.conf file, but in the transforms.conf, along these lines.

props.conf

[your sourcetype]
REPORT-xyz = my_extractions

transforms.conf

[my_extractions]
REGEX = 
FORMAT =

http://docs.splunk.com/Documentation/Splunk/5.0.2/Knowledge/Createandmaintainsearch-timefieldextract...
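
Filled in with the sourcetype and field names from the question, a sketch of the two files could look like this (assuming the events really are indexed with that sourcetype, and noting that REGEX in transforms.conf is read as a plain PCRE pattern, so the surrounding /.../ delimiters shown in the question would be dropped):

props.conf

[Apache-registrant-forward]
REPORT-registrants = registrants

transforms.conf

[registrants]
REGEX = ^([0-9\.]+) ([0-9\-]*) ([0-9\-]*) (\[[^\]]+\]) ("[^"]+") ([0-9\-]+) ([0-9\-]+) ("[^"]+") ("[^"]+") ([0-9\-]+) ("[^"]+") ([0-9\.\-]+)
FORMAT = client_ip::$1 user::$2 profile::$3 timestamp::$4 url::$5 http_status::$6 bytes::$7 junk::$8 user_agent::$9 processing_time_ms::$10 registrant::$11 forward_for::$12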


UPDATE:

Another way of extracting the fields is to use DELIMS and FIELDS in transforms.conf (instead of REGEX and FORMAT). The props.conf is the same (REPORT-somename = my_extractions), but in transforms.conf you put:

[my_extractions]
DELIMS = " "
FIELDS = field1 field2 field3 field4 fieldx

DELIMS can take one or two parameters; the first is the delimiter between values (or key/value pairs), and the (optional) second parameter is the delimiter between key and value. FIELDS specifies the fields in the order they appear in the events. In your case that is probably a simpler approach, since you don't really need to do regex extractions.

Examples:

event format 1: key1:value1; key2:value2; key3:value3
DELIMS = "; ", ":"

event format 2: value1;value2;value3
DELIMS = ";"

event format 3: key1=value1|key2=value2|key3=value3
DELIMS = "|", "="

Also, since your events seem to be single line, you should probably set SHOULD_LINEMERGE = false in props.conf.

/K


Ayn
Legend

Well, just switch it around as you see fit - the inline rex suggestion was more for troubleshooting purposes than anything else, though, so if you got it working I think you should go back to trying the regex in transforms.conf.

lookup can be run inline as well - if you're using an inline rex, the lookup command needs to come after it, because otherwise there will be no fields to look up 😉
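
A sketch of what that pipeline could look like (the lookup name registrant_info and the output field registrant_name are made-up placeholders here; a matching lookup table and definition would have to exist in Splunk first):

<yourbasesearch> | rex "<your extraction regex>" | lookup registrant_info registrant OUTPUT registrant_name

The lookup matches on the registrant field that rex has just created, which is why it has to come after the rex in the pipeline.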


mplungjan
Path Finder

Your regex extracts the user agent in the record into processing_time_ms, your registrant holds the processing time, and the forward_for holds the registrant - we are so close I can taste it 😮


mplungjan
Path Finder

The next thing is to look up the registrant in a lookup table. Can I do that when I am using an inline rex?


mplungjan
Path Finder

I understand. It is frustrating. I added (\"[^\"]+\") in front of the user agent and it stopped showing the field names. It did show data; just the field names disappeared from the list on the left.


Ayn
Legend

If it doesn't work, your regex is no longer matching properly. You need to play around with it.


mplungjan
Path Finder

So I added the junk group and it no longer works. I have:
source="C:\..." | rex "^(?[0-9.]+) (?[0-9-]) (?[0-9-]) ([[^]]+]) (?\"[^\"]+\") (?[0-9-]+) (?[0-9-]+) (\"[^\"]+\") (?\"[^\"]+\") (?\"[^\"]+\") (?[0-9-]+) (?\"[^\"]+\") ([0-9.-]+)"


mplungjan
Path Finder

Ah, there is a "-" between the bytes and the user agent.


mplungjan
Path Finder

Thanks - it initially gave an error due to the cut and paste from the email. It looks like it works when I copy from your comment instead (except some of the fields are swapped; I think I can fix that).


Ayn
Legend

If you move your extractions into an inline rex statement, do you see fields then? E.g.

<yourbasesearch> | rex "^(?<client_ip>[0-9\.]+) (?<user>[0-9\-]*) (?<profile>[0-9\-]*) (\[[^\]]+\]) (?<url>\"[^\"]+\") (?<http_status>[0-9\-]+) (?<bytes>[0-9\-]+) (?<user_agent>\"[^\"]+\") (?<processing_time_ms>\"[^\"]+\") (?<registrant>[0-9\-]+) (?<forward_for>\"[^\"]+\") ([0-9\.\-]+)"

mplungjan
Path Finder

Please re-read my question. I believe all regex issues are solved, but none of the field names show up in my search.


Ayn
Legend

What do you mean by "test of the regex"?


Ayn
Legend

Right, so now you have your configuration directives in the right places, but your regex is off. It's usually a good idea to test your regex using something like regexpal.com, RegExr (http://gskinner.com/RegExr/) or for that matter Splunk's own rex command inline in a search.

Your regex currently "breaks" at the user agent. You're not looking for quotation marks there even though there are quotation marks in the log. A working regex (at least against the sample data you supplied here) would be something like

^([0-9\.]+) ([0-9\-]*) ([0-9\-]*) (\[[^\]]+\]) ("[^"]+") ([0-9\-]+) ([0-9\-]+) ("[^"]+") ("[^"]+") ([0-9\-]+) ("[^"]+") ([0-9\.]+)

mplungjan
Path Finder

OK, all is, as far as I know, the way it should be. I STILL do not see my custom fields. Also, when I click on "Show Source" I get the same 5 records that are the odd ones out.


mplungjan
Path Finder

Ahh - thanks. I was staring myself blind at this.


Ayn
Legend

It's not me wanting you to fix things, I'm just trying to help you get things working 🙂

I added the fourth group from the end - ("[^"]+") - because without it your regex wouldn't work. The regex I pasted should work, so...


Ayn
Legend

I can understand it can be overwhelming at first 🙂

You don't need to restart anything; changes to search-time extractions take effect immediately, so the next time you issue a search your new settings will be used.

I tested your regex at regexpal.com and saw quickly that it wouldn't match your sample data.


mplungjan
Path Finder

What did you find exactly? I tested with JavaScript.
Also, how do I restart the extraction? Sorry for all the questions. Splunk is a bit overwhelming when there is a custom thing going on. Just the fact that I can have props and transforms in several directories and cannot see which one is picked up is a problem of its own.
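
On the "cannot see which is picked up" point: one way to check (a sketch, assuming a default Windows install location) is Splunk's btool utility, which with --debug prints each effective setting together with the file it was read from:

cd "C:\Program Files\Splunk\bin"
splunk btool props list Apache-registrant-forward --debug
splunk btool transforms list registrants --debug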


Ayn
Legend

Extractions take place at search-time though, so if it's for the sake of the extractions you don't need to reindex your data.


mplungjan
Path Finder

"Why would you need to reindex them?" because I have new files with new data in a changed format.
