Indexing Urchin data, specifying timestamps, line breaks

andyspusm
Explorer

Hi,

I am trying to index processing logs from Urchin and am having trouble with timestamp recognition and line breaking. I would be happy for each file to be treated as a single event, with the timestamp taken from the second line of the file, from the file date (local time, preferred), or from the filename (UTC, so it would need to be converted to local time). That is the direction I've been heading, but I'm winding up with multiple events.

Sample file: /opt/urchin6/data/history/%28NONE%29/splunktest/20110915_134600.log (file date 2011-09-15 09:47)

------------------------------------------------------
Urchin 6.5.00 (linux2.6_kernel) starting: 20110915 09:46:34
------------------------------------------------------
Processing profile: Winter (on urchin1 6404)

[09:46:37] Logfile: /opt/urchin/remote-logs/web0/urchin_log-20110914
   data lines: 904129 (100%)
   data hits:  342
   data proc:  391.39 MB in 00:00:14  (27.956 MB/sec)
   data range: 2011-09-14 03:33 (-0400) - 2011-09-14 23:39 (-0400)

[09:46:51] Post processing data for 201109
   sessions: 623 (100%)

[09:46:52] Backing up database files for 201109: /opt/urchin6/data/reports/%28NONE%29/Winter/201109-backupv6-20110915051652.zip

[09:46:52] Removing outdated backup for 201109:  /opt/urchin6/data/reports/%28NONE%29/Winter/201109-backupv6-20110913051816.zip

------------------------------------------------------
Urchin 6.5.00 (linux2.6_kernel) finishing: 20110915 09:46:52
------------------------------------------------------
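
As a side note on the filename option: below is a minimal Python sketch of the UTC-to-local conversion it would require. It assumes a fixed -04:00 offset, taken from the "(-0400)" in the data range line above; a real conversion would need the host's time zone rules. The helper name filename_to_local is hypothetical, and none of this is Splunk configuration.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath

# Path of the sample file; its name encodes a UTC timestamp.
path = "/opt/urchin6/data/history/%28NONE%29/splunktest/20110915_134600.log"

# Assumption: fixed -04:00 local offset, as shown in the sample's data range line.
LOCAL_TZ = timezone(timedelta(hours=-4))


def filename_to_local(p: str) -> datetime:
    """Parse the UTC timestamp in the file name and convert it to local time."""
    stem = PurePosixPath(p).stem  # "20110915_134600"
    utc = datetime.strptime(stem, "%Y%m%d_%H%M%S").replace(tzinfo=timezone.utc)
    return utc.astimezone(LOCAL_TZ)


local_time = filename_to_local(path)
print(local_time.isoformat())  # 2011-09-15T09:46:00-04:00
```

The result is 09:46 local, which lines up with the 09:47 file date and the 09:46:34 start time inside the file.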

inputs.conf:

[monitor:///opt/urchin6/data/history]
disabled=false
sourcetype = urchin_history

props.conf:

[urchin_history]
LINE_BREAKER = (?!)
SHOULD_LINEMERGE = true
TRUNCATE = 0

This gets indexed as five events, with times of 9/14 9:46:51 (1 event), 9/15 9:46:34 (1 event), 9/15 9:46:52 (2 events), and 9/15 9:47:10 (1 event; index time, I think). When they are put back together with "| transaction source", the events are out of sequence.

Thanks in advance,
Andy

andyspusm
Explorer

Could be that gkanapathy was right and I just had the .conf files in the wrong places. Anyway, I got the following from Michael Wegener at Splunk support and this solution is working well. With his permission, I'm sharing here:

On the indexer, in etc/system/local/inputs.conf:

[monitor:///var/log/urchin]
disabled = 0
followTail = 0
index = test
sourcetype = urchin_history

On the forwarder, in etc/system/local/props.conf:

[urchin_history]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = -*\rUrchin\s+\d+\.\d+\.\d+\s+\(linux\d+\.\d+_kernel\)\s+starting:\s+\d+\s+\d+:\d+:\d+
MUST_BREAK_AFTER = (Urchin\s+\d+\.\d+\.\d+\s+\(linux\d+\.\d+_kernel\)\s+finishing:\s+\d+\s+\d+:\d+:\d+\r-*|DETAIL:\s+:.*)

Note that in the process of troubleshooting, I changed the monitor location from that mentioned in the original question.

gkanapathy
Splunk Employee

You're almost there. I suspect that your sourcetype may not be getting applied. Put this props.conf on all servers (forwarders and indexers, and search head for good measure) and you'll be fine. (If you want to know more: http://www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F )

The right set of settings should be:

[urchin_history]
SHOULD_LINEMERGE = false
LINE_BREAKER = (?!)
TIME_PREFIX = starting:
TIME_FORMAT = %Y%m%d %H:%M:%S
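
To illustrate what the TIME_PREFIX and TIME_FORMAT values in this stanza are meant to pick out, here is a rough Python sketch run against the header of the sample file. It is only an approximation for checking the format string, not Splunk's actual timestamp processor; the function name extract_timestamp and the whitespace skipping after the prefix are assumptions of this sketch.

```python
import re
from datetime import datetime

# First three lines of the sample file from the question.
header = (
    "------------------------------------------------------\n"
    "Urchin 6.5.00 (linux2.6_kernel) starting: 20110915 09:46:34\n"
    "------------------------------------------------------\n"
)

TIME_PREFIX = r"starting:"       # regex that precedes the timestamp
TIME_FORMAT = "%Y%m%d %H:%M:%S"  # strptime-style format from the stanza


def extract_timestamp(event: str) -> datetime:
    """Find TIME_PREFIX, then parse the text right after it with TIME_FORMAT."""
    match = re.search(TIME_PREFIX, event)
    if match is None:
        raise ValueError("TIME_PREFIX not found in event")
    # Assumption: skip whitespace after the prefix, then read the
    # 17-character "YYYYMMDD HH:MM:SS" timestamp.
    rest = event[match.end():].lstrip()
    return datetime.strptime(rest[:17], TIME_FORMAT)


print(extract_timestamp(header))  # 2011-09-15 09:46:34
```

Because the prefix "starting:" appears only on the second line, the bracketed times further down the file and the "finishing:" line are never candidates.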

andyspusm
Explorer

Thanks gkanapathy, but no dice so far.

I think my sourcetype is getting applied because I see it in my indexed events. My props.conf is in /opt/splunk/etc/deployment-apps/urchin/default on the search head/indexer (all one box) and in /opt/splunkforwarder/etc/apps/urchin/default on the forwarder. Otherwise, the stanza is as you defined it.

I'm still getting four events - unfortunately I don't have the space to post them all here. With reference to the file above (excluding whitespace lines), the events start at lines 1 (-----), 2 (Urchin start), 9 (data range), and 15 (Urchin finish).

Simeon
Splunk Employee

You could probably set the LINE_BREAKER to recognize the timestamp. I believe this would do it:

LINE_BREAKER = ^\[\d\d:\d\d:\d\d\]

Alternatively, you can tell Splunk how far to look into the event for the timestamp as well as the timestamp format:

MAX_TIMESTAMP_LOOKAHEAD = 12
TIME_FORMAT = [%H:%M:%S]

gkanapathy
Splunk Employee

He does not want multiple events. He only wants one single event for the whole file, and to just use the timestamp at the top and ignore the internal ones.
