extracting timestamp from log with one date and multiple time fields

imrago
Contributor

Hi,

I am unable to extract a valid _time from the following log:

0168 004 07:59:03 09:01:35 0062 asdfghj ee bonfanyti Y                                             P1233443P       443386 0012 07:59:17    dial_in  1                                  1234 N N                                                       34567654555 000523456778 0000 09/20/10  0                                                                                                                                                                                                 1624443                                          01
0344 003 07:58:33 09:01:36 0063 Ssdfas Fd asdfffftim Y                                             P5243343P       455483 0032 07:58:48    dial_in  1                                  7950 N N                                                                   000234234218 0000 09/20/10  0                                                                                                                                                                                                 1624443                                          01
0433 007 08:00:14 09:01:36 0061 ewrwreerer asdfsdfff N                                             P5243443P       451333 0061 08:00:30    dial_in 19                                  7952 N N                                                       58916588270 000522349181 0000 09/20/10  0                                                                                                                                                                                                 5673443                                          01

The timestamps I would like to extract are:

1) 09/20/10 07:59:03

2) 09/20/10 07:58:33

3) 09/20/10 08:00:14

Reading the documentation, I have figured out that I can only extract it using a custom datetime.xml.

I have tried to construct a datetime.xml:

<datetime>
<define name="ccm_1_date" extract="month,day,year,">
    <text><![CDATA[\s+\d+\s(\d+)/(\d+)/(\d+)]]></text>
  </define>
  <define name="ccm_1_time" extract="second,minute,hour,">
    <text><![CDATA[\s\d+:\d+:\d+\s]]></text>
  </define>

  <timePatterns>
    <use name="ccm_1_time"/>
  </timePatterns>
  <datePatterns>
    <use name="ccm_1_date"/>
  </datePatterns>

</datetime>

The date pattern is probably good, but the time pattern is suspicious.

props.conf:


[host::ccm]

SHOULD_LINEMERGE = false

DATETIME_CONFIG = /etc/apps/search/local/datetime.xml

MAX_TIMESTAMP_LOOKAHEAD = 300

Any help would be appreciated.

1 Solution

elusive
Splunk Employee

Try the following datetime.xml:

<datetime>
    <define name="ccm_1_date" extract="month,day,year">
        <text><![CDATA[0000\s(\d{2})/(\d{2})/(\d{2})]]></text>
    </define>
    <define name="ccm_1_time" extract="hour,minute,second">
        <text><![CDATA[\*\*\*\s(\d{2}):(\d{2}):(\d{2})]]></text>
    </define>
    <define name="ccm_2_time" extract="hour,minute,second">
        <text><![CDATA[\d{3}\s(\d{2}):(\d{2}):(\d{2})]]></text>
    </define>
    <timePatterns>
      <use name="ccm_1_time"/> 
      <use name="ccm_2_time"/>
    </timePatterns>
    <datePatterns>
      <use name="ccm_1_date"/> 
    </datePatterns>
</datetime>      

Also, I don't recommend updating the default datetime.xml: during an upgrade your configuration will be overwritten. Name it something like datetime2.xml and point to it in your props.conf with DATETIME_CONFIG, e.g.:

[extracttime]
SHOULD_LINEMERGE = false
DATETIME_CONFIG = \etc\garfield.xml
MAX_TIMESTAMP_LOOKAHEAD = 1000
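
A quick way to sanity-check patterns like these outside Splunk is a short Python sketch. It assumes the ccm_2_time pattern is meant to read `\d{3}\s(\d{2}):(\d{2}):(\d{2})` (three digits, whitespace, then the time), and it models MAX_TIMESTAMP_LOOKAHEAD as a plain slice of the event text, which is a simplification and not Splunk's implementation. The event is made up, and the 400-character gap is arbitrary rather than measured from the real log; `build_event` and `date_within_lookahead` are hypothetical helper names.

```python
import re

# Patterns from the suggested datetime.xml (ccm_2_time assumed to use \s).
DATE_RE = re.compile(r"0000\s(\d{2})/(\d{2})/(\d{2})")
TIME_RE = re.compile(r"\d{3}\s(\d{2}):(\d{2}):(\d{2})")


def build_event(gap):
    """Hypothetical event: time near the start, date after `gap` spaces."""
    return "0168 004 07:59:03 09:01:35 0062" + " " * gap + "0000 09/20/10  0"


def date_within_lookahead(event, lookahead):
    """Return the date groups if the whole match fits in the first `lookahead` chars."""
    match = DATE_RE.search(event[:lookahead])
    return match.groups() if match else None


event = build_event(gap=400)
time_groups = TIME_RE.search(event).groups()
short_window = date_within_lookahead(event, 300)
long_window = date_within_lookahead(event, 1000)

print(time_groups)   # ('07', '59', '03')
print(short_window)  # None: the date lies beyond 300 characters
print(long_window)   # ('09', '20', '10')
```

If the date sits further into the event than the lookahead allows, the date pattern never gets a chance to match, which is one reason to raise the value for wide, fixed-width records like these.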

southeringtonp
Motivator

Your regex in ccm_1_time does not capture any groups.

Try adding parentheses to capture each value, and make sure that the timestamp regex only matches the first set of colon-delimited digits...

<datetime>
    <define name="ccm_1_date" extract="day,month,year,">
        <text><![CDATA[\s+\d+\s(\d+)/(\d+)/(\d+)]]></text>
    </define>
    <define name="ccm_1_time" extract="hour,minute,second,">
        <text><![CDATA[^(?:\d+\s)+(\d+):(\d+):(\d+)\s]]></text>
    </define>

    <timePatterns>
        <use name="ccm_1_time"/>
    </timePatterns>
    <datePatterns>
        <use name="ccm_1_date"/>
    </datePatterns>
</datetime>
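
To see the capture-group difference outside Splunk, here is a minimal Python sketch. It runs the original and the corrected time patterns against the start of the first sample event; Python's `re` module is used only as a stand-in for Splunk's regex engine, which is an assumption that holds for these simple patterns.

```python
import re

# Start of the first sample event from the question.
sample = "0168 004 07:59:03 09:01:35 0062 asdfghj ee bonfanyti Y"

# Original pattern: matches a time, but has no parentheses, so nothing is captured.
original = re.compile(r"\s\d+:\d+:\d+\s")
# Corrected pattern: anchored at the start, skips the leading numeric columns,
# and captures hour, minute and second of the first time field only.
corrected = re.compile(r"^(?:\d+\s)+(\d+):(\d+):(\d+)\s")

original_match = original.search(sample)
corrected_match = corrected.search(sample)

print(original.groups)           # 0
print(original_match.groups())   # ()
print(corrected.groups)          # 3
print(corrected_match.groups())  # ('07', '59', '03')
```

The `^(?:\d+\s)+` prefix is what pins the match to the first time field: it can only consume whole numeric columns followed by whitespace, so it stops right before `07:59:03`.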

southeringtonp
Motivator

The order is still wrong; it would need to be "day,month,year,". The regex looks like it should match, but in your sample data the second part is 20, which isn't a valid month.

imrago
Contributor

Found a typo; the correct time line is:
<![CDATA[^(?:\d+\s)+(\d+):(\d+):(\d+)\s]]>

Now the time part is recognised correctly, but the date part is still not working as it should. What could be the problem with:


<![CDATA[\s+\d+\s(\d+)/(\d+)/(\d+)]]>

imrago
Contributor

Modified accordingly; _time is again 9/23/10 9:50:47.000 PM

southeringtonp
Motivator

Didn't notice the field order - modified above to correct.

gkanapathy
Splunk Employee

Also, the extract attribute must list the fields in the order of the capture groups. Use hour,minute,second instead of second,minute,hour.
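
A small Python sketch of why the order matters, assuming the names in `extract` are paired with the capture groups purely by position. `extract_fields` and `to_datetime` are hypothetical helpers, the event is a shortened version of the first sample line, and treating the two-digit year as 20xx is an assumption.

```python
import re
from datetime import datetime


def extract_fields(pattern, extract, text):
    """Pair each name in `extract` with the capture group at the same position."""
    names = [name for name in extract.split(",") if name]
    match = re.search(pattern, text)
    if match is None:
        return None
    return dict(zip(names, match.groups()))


def to_datetime(date_fields, time_fields):
    """Combine the extracted fields; the 20xx century is an assumption."""
    return datetime(
        2000 + int(date_fields["year"]),
        int(date_fields["month"]),
        int(date_fields["day"]),
        int(time_fields["hour"]),
        int(time_fields["minute"]),
        int(time_fields["second"]),
    )


# Shortened sample event (long whitespace runs collapsed).
event = "0168 004 07:59:03 09:01:35 0062 dial_in 0000 09/20/10  0"
time_re = r"^(?:\d+\s)+(\d+):(\d+):(\d+)\s"
date_re = r"\s+\d+\s(\d+)/(\d+)/(\d+)"

date_fields = extract_fields(date_re, "month,day,year", event)
swapped = extract_fields(time_re, "second,minute,hour,", event)
ordered = extract_fields(time_re, "hour,minute,second", event)

print(to_datetime(date_fields, swapped))  # 2010-09-20 03:59:07
print(to_datetime(date_fields, ordered))  # 2010-09-20 07:59:03
```

With the names reversed the regex still matches, so nothing fails outright; the hour and second values are simply swapped.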

gkanapathy
Splunk Employee

Because if Splunk fails to get a date or time from the data, it next tries the file/source name, and then the modification time of the file. http://www.splunk.com/base/Documentation/latest/Admin/HowSplunkextractstimestamps#Precedence_rules_f...
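
The fallback described here can be pictured as a simple chain. This Python sketch is not Splunk code: `pick_timestamp` is a hypothetical function, and it models only the three sources named in this post (event data, source name, file modification time).

```python
from datetime import datetime


def pick_timestamp(event_ts, source_ts, file_mtime):
    """Return (origin, value) for the first timestamp source that produced a value."""
    candidates = (
        ("event data", event_ts),
        ("source name", source_ts),
        ("file mod time", file_mtime),
    )
    for origin, value in candidates:
        if value is not None:
            return origin, value
    return None


# Extraction from the event failed and the source name holds no date,
# so every event ends up with the file's modification time.
mtime = datetime(2010, 9, 23, 21, 50, 47)
failed = pick_timestamp(None, None, mtime)
worked = pick_timestamp(datetime(2010, 9, 20, 7, 59, 3), None, mtime)

print(failed[0])  # file mod time
print(worked[0])  # event data
```

Seeing the same modification time on every event is therefore a hint that the date or time pattern did not match at all.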

imrago
Contributor

9/23/10 9:50:47.000 PM is the time of the last modification of the log file. Why is it used instead of the intended fields?

imrago
Contributor

No, it does not. Every event has the same _time field :
9/23/10 9:50:47.000 PM
