Hi,
I'm trying to index Urchin processing logs and am having trouble with timestamp recognition and line breaking. I'd like each file to be indexed as a single event, timestamped from the second line of the file, from the file date (local time, preferred), or from the filename (UTC, which would need to be converted to local time). That is the direction I've been heading, but I keep ending up with multiple events.
Sample file: /opt/urchin6/data/history/%28NONE%29/splunktest/20110915_134600.log (file date 2011-09-15 09:47)
------------------------------------------------------
Urchin 6.5.00 (linux2.6_kernel) starting: 20110915 09:46:34
------------------------------------------------------
Processing profile: Winter (on urchin1 6404)
[09:46:37] Logfile: /opt/urchin/remote-logs/web0/urchin_log-20110914
data lines: 904129 (100%)
data hits: 342
data proc: 391.39 MB in 00:00:14 (27.956 MB/sec)
data range: 2011-09-14 03:33 (-0400) - 2011-09-14 23:39 (-0400)
[09:46:51] Post processing data for 201109
sessions: 623 (100%)
[09:46:52] Backing up database files for 201109: /opt/urchin6/data/reports/%28NONE%29/Winter/201109-backupv6-20110915051652.zip
[09:46:52] Removing outdated backup for 201109: /opt/urchin6/data/reports/%28NONE%29/Winter/201109-backupv6-20110913051816.zip
------------------------------------------------------
Urchin 6.5.00 (linux2.6_kernel) finishing: 20110915 09:46:52
------------------------------------------------------
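The filename option (UTC, converted to local) can be sanity-checked outside Splunk. This is a minimal Python sketch, assuming the filename encodes the run start in UTC as YYYYMMDD_HHMMSS and that local time is a fixed UTC-04:00 offset (the "-0400" shown on the data range line). A real setup should use the server's actual time zone rather than a hard-coded offset, and `filename_to_local` is a hypothetical helper name.

```python
import os
from datetime import datetime, timedelta, timezone

# Assumption: local time is a fixed UTC-04:00 offset, matching "-0400" in the sample's data range line.
LOCAL_TZ = timezone(timedelta(hours=-4))

def filename_to_local(path: str) -> datetime:
    """Hypothetical helper: parse a UTC YYYYMMDD_HHMMSS filename and convert it to local time."""
    # "/.../20110915_134600.log" -> "20110915_134600"
    stem = os.path.splitext(os.path.basename(path))[0]
    utc_time = datetime.strptime(stem, "%Y%m%d_%H%M%S").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(LOCAL_TZ)

local = filename_to_local("/opt/urchin6/data/history/%28NONE%29/splunktest/20110915_134600.log")
print(local.isoformat())  # 2011-09-15T09:46:00-04:00
```

The converted time, 09:46 local, lines up with the "starting: 20110915 09:46:34" header inside the file, which is consistent with the filename being UTC.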
inputs.conf:
[monitor:///opt/urchin6/data/history]
disabled=false
sourcetype = urchin_history
props.conf:
[urchin_history]
LINE_BREAKER = (?!)
SHOULD_LINEMERGE = true
TRUNCATE = 0
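The `LINE_BREAKER = (?!)` setting relies on an empty negative lookahead, a pattern that can never match, so no position in the data is ever treated as a line break. A quick Python check illustrates this; it uses Python's `re` rather than Splunk's PCRE engine, but PCRE also treats `(?!)` as an always-failing assertion.

```python
import re

# Empty negative lookahead: "the empty string must not follow here", which is always false.
NEVER = re.compile(r"(?!)")

sample = (
    "Urchin 6.5.00 (linux2.6_kernel) starting: 20110915 09:46:34\n"
    "[09:46:37] Logfile: /opt/urchin/remote-logs/web0/urchin_log-20110914\n"
)

print(NEVER.search(sample))              # None: no match anywhere
print(NEVER.split(sample) == [sample])   # True: splitting leaves the text in one piece
```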
This gets indexed as five events, timestamped 9/14 9:46:51 (1 event), 9/15 9:46:34 (1 event), 9/15 9:46:52 (2 events), and 9/15 9:47:10 (1 event, which I think is the index time). When I stitch them back together with "| transaction source", the events are out of sequence.
Thanks in advance,
Andy
It could be that gkanapathy was right and I simply had the .conf files in the wrong places. In any case, Michael Wegener at Splunk support gave me the following solution, and it is working well. With his permission, I'm sharing it here:
On the indexer etc/system/local/inputs.conf:
[monitor:///var/log/urchin]
disabled = 0
followTail = 0
index = test
sourcetype = urchin_history
On the forwarder etc/system/local/props.conf:
[urchin_history]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = -*\rUrchin\s+\d+\.\d+\.\d+\s+\(linux\d+\.\d+_kernel\)\s+starting:\s+\d+\s+\d+:\d+:\d+
MUST_BREAK_AFTER = (Urchin\s+\d+\.\d+\.\d+\s+\(linux\d+\.\d+_kernel\)\s+finishing:\s+\d+\s+\d+:\d+:\d+\r-*|DETAIL:\s+:.*)
Note that while troubleshooting, I changed the monitor location from the one in the original question.
You're almost there. I suspect that your sourcetype may not be getting applied. Put this props.conf on all servers (forwarders and indexers, and search head for good measure) and you'll be fine. (If you want to know more: http://www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F )
The right set of settings should be:
[urchin_history]
SHOULD_LINEMERGE=false
LINE_BREAKER = (?!)
TIME_PREFIX = starting:
TIME_FORMAT = %Y%m%d %H:%M:%S
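To see what TIME_PREFIX and TIME_FORMAT ask for, here is a rough Python emulation: find the text after "starting:" and parse it with the same strftime-style format string. This is only an approximation of Splunk's timestamp extraction, not its actual implementation, and `extract_start_time` is a hypothetical name.

```python
import re
from datetime import datetime

# Same strftime-style format as TIME_FORMAT in the stanza above.
TIME_FORMAT = "%Y%m%d %H:%M:%S"

def extract_start_time(event: str) -> datetime:
    """Rough emulation of TIME_PREFIX = starting: followed by TIME_FORMAT."""
    # Locate the prefix, then skip any whitespace before the timestamp.
    match = re.search(r"starting:\s*", event)
    if match is None:
        raise ValueError("TIME_PREFIX 'starting:' not found")
    rest = event[match.end():]
    # "YYYYmmdd HH:MM:SS" is exactly 17 characters long.
    return datetime.strptime(rest[:17], TIME_FORMAT)

sample = (
    "------------------------------------------------------\n"
    "Urchin 6.5.00 (linux2.6_kernel) starting: 20110915 09:46:34\n"
    "------------------------------------------------------\n"
    "[09:46:37] Logfile: /opt/urchin/remote-logs/web0/urchin_log-20110914\n"
)
result = extract_start_time(sample)
print(result)  # 2011-09-15 09:46:34
```

In this emulation the search is anchored on "starting:", so the later bracketed "[HH:MM:SS]" stamps are never considered, which is the behavior a one-event-per-file setup needs.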
Thanks gkanapathy, but no dice so far.
I think my sourcetype is getting applied because I see it in my indexed events. My props.conf is in /opt/splunk/etc/deployment-apps/urchin/default on the search head/indexer (all one box) and in /opt/splunkforwarder/etc/apps/urchin/default on the forwarder. Otherwise, the stanza is as you defined it.
I'm still getting four events; unfortunately I don't have the space to post them all here. Referring to the sample file above (excluding blank lines), the events start at lines 1 (-----), 2 (Urchin start), 9 (data range), and 15 (Urchin finish).
You could probably set the LINE_BREAKER to recognize the timestamp. I believe this would do it:
LINE_BREAKER = ^\[\d\d:\d\d:\d\d\]
Alternatively, you can tell Splunk how far to look into the event for the timestamp as well as the timestamp format:
MAX_TIMESTAMP_LOOKAHEAD = 12
TIME_FORMAT = [%H:%M:%S]
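One way to see what this LINE_BREAKER would do to the sample file is to run the same pattern over it. The Python sketch below is an approximation: it uses Python's `re` instead of Splunk's PCRE engine, assumes `re.MULTILINE` to mimic matching at each line start, and runs over an abridged copy of the sample. It finds four bracketed timestamps, so breaking at each match would give several events per file rather than one, and each stamp carries only a time of day.

```python
import re
from datetime import datetime

# The LINE_BREAKER pattern suggested above; re.MULTILINE lets ^ match at every line start.
BRACKET_STAMP = re.compile(r"^\[\d\d:\d\d:\d\d\]", re.MULTILINE)

# Abridged copy of the sample file (some data lines dropped, backup paths shortened).
sample = """\
------------------------------------------------------
Urchin 6.5.00 (linux2.6_kernel) starting: 20110915 09:46:34
------------------------------------------------------
Processing profile: Winter (on urchin1 6404)
[09:46:37] Logfile: /opt/urchin/remote-logs/web0/urchin_log-20110914
data lines: 904129 (100%)
data range: 2011-09-14 03:33 (-0400) - 2011-09-14 23:39 (-0400)
[09:46:51] Post processing data for 201109
sessions: 623 (100%)
[09:46:52] Backing up database files for 201109
[09:46:52] Removing outdated backup for 201109
------------------------------------------------------
Urchin 6.5.00 (linux2.6_kernel) finishing: 20110915 09:46:52
------------------------------------------------------
"""

stamps = BRACKET_STAMP.findall(sample)
print(stamps)  # ['[09:46:37]', '[09:46:51]', '[09:46:52]', '[09:46:52]']

# TIME_FORMAT = [%H:%M:%S] parses the stamp, but the stamp itself holds no date.
stamp_time = datetime.strptime(stamps[0], "[%H:%M:%S]")
print(stamp_time.time())  # 09:46:37
```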
He doesn't want multiple events. He wants one event for the whole file, using the timestamp at the top and ignoring the internal ones.