Splunk Search

Which approach is best when custom extracted fields cannot be searched?

sunrise
Contributor

Hi Splunkers,

I've run into the same problem again: I cannot get search results for my custom extracted fields.
I investigated this situation previously and concluded that the log records did not give Splunk enough to recognize the fields. Sample records and reference information follow.

input record : 20130624090015008SOMEWORDS_A20130624090016009SOMEWORDS_B

Here, I want to extract "20130624090015" as the record's timestamp and "20130624090016" as a field.
I set the parameters in props.conf and transforms.conf properly, but I got no results. For details, refer to the question "How to treat the concecutive numbers event ?" linked below.

Cannot search based on an extracted field | Splunk Blogs
http://blogs.splunk.com/2011/10/07/cannot-search-based-on-an-extracted-field/

How to treat the concecutive numbers event ? - Splunk Community
http://splunk-base.splunk.com/answers/92483/how-to-treat-the-concecutive-numbers-event

According to the Splunk blog, this problem seems to have been solved in Splunk 4.3 and later.
But I hit this problem in Splunk 5.0.3 on 64-bit Linux,
so I don't know what that means.

Anyway, I think there are three solutions for this problem.
1. Treat as other special fields (like host, source, and so on)
2. Search as "search sourcetype=MyEvents | search Myfield=ValidValue".
3. Specify INDEXED_VALUE = false for the field in fields.conf

Actually, 1 is not recommended by Splunk Inc, and I don't want to do it either.
What do you think of 2 and 3?
Which one is best? Or are they the same in Splunk's internal processing?

Thank you for your help.

1 Solution

Gilberto_Castil
Splunk Employee

I do not think you need to worry about specifying the time format for the type of event you've provided. By default Splunk will look through the message and recognize the date and time and will adopt the first instance in the message as the event timestamp.

This behaves the same way regardless of the format of the event.


Since your objective is to capture the first and second time values as fields, independently of the event timestamp, a field extraction is necessary.

sourcetype="answers-1372987051" | rex "(?<First_Time>\d{14})\d{3}[a-zA-Z\s]+\_[A-Z]{1}(?<Second_Time>\d{14})\d{3}[a-zA-Z\s]+\_[A-Z]{1}?$"
  • (?<First_Time>\d{14}) Capture the first 14 digits and assign the capture to "First_Time".

  • \d{3} Account for the millisecond digits.

  • [a-zA-Z\s]+ A run of words, possibly separated by spaces.

  • \_[A-Z]{1} Ends with an underscore and a single capital letter (I don't know that you need to escape the underscore, but ... whatever).

... then repeat the same pattern for the second field, immediately after the first.
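If you want to sanity-check the pattern outside Splunk, here is a minimal Python sketch that runs an equivalent regex against the sample record from the question. It is only an approximation of what rex does: Splunk uses PCRE, whereas Python's re module needs the (?P<name>...) spelling for named groups. The names RECORD, PATTERN, and fields are my own, and the underscore is left unescaped.

```python
import re

# Sample record from the question.
RECORD = "20130624090015008SOMEWORDS_A20130624090016009SOMEWORDS_B"

# Python spelling of the rex pattern: named groups are written
# (?P<name>...) here, and both timestamps are matched with \d{14}.
PATTERN = re.compile(
    r"(?P<First_Time>\d{14})\d{3}[a-zA-Z\s]+_[A-Z]"
    r"(?P<Second_Time>\d{14})\d{3}[a-zA-Z\s]+_[A-Z]$"
)

match = PATTERN.search(RECORD)
# groupdict() maps each named group to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

If the dictionary comes back empty on your real events, the pattern does not fit them and needs adjusting before it goes anywhere near props.conf.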


Of course, you can now automate the extraction by copying the regular expression into your props.conf:

#props.conf
[answers-1372987051]
EXTRACT-my_fields = (?<First_Time>\d{14})\d{3}[a-zA-Z\s]+\_[A-Z]{1}(?<Second_Time>\d{14})\d{3}[a-zA-Z\s]+\_[A-Z]{1}?$




The data sample may not match your real-life data exactly, in which case you may need to adjust the regular expression to match the actual pattern in your events.

If it helps, you can also match the fields with simpler regular expressions.

#props.conf
[answers-1372987051]
EXTRACT-my_alt_field_1 = (?<alt_first_time>\d{14})\d{3}
EXTRACT-my_alt_field_2 = _[A-Z]{1}(?<alt_second_time>\d{14})\d{3}

These give you results similar to the first approach.
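As a quick check of the two simpler patterns, the sketch below applies each one to the sample record on its own, which is roughly how separate EXTRACT- entries behave. It is Python rather than Splunk, so the named groups use the (?P<name>...) form, and the helper name extract is hypothetical.

```python
import re

# Sample record from the question.
RECORD = "20130624090015008SOMEWORDS_A20130624090016009SOMEWORDS_B"

# One pattern per field, mirroring the two EXTRACT- entries.
PATTERNS = [
    r"(?P<alt_first_time>\d{14})\d{3}",
    r"_[A-Z](?P<alt_second_time>\d{14})\d{3}",
]


def extract(record):
    """Apply each pattern independently and merge the named captures."""
    found = {}
    for pattern in PATTERNS:
        m = re.search(pattern, record)
        if m:
            found.update(m.groupdict())
    return found


print(extract(RECORD))
```

Because neither pattern is anchored to the whole event, each one still matches when the text around the timestamps changes, which is the appeal of the simpler form.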


At this point you can use the search function along with your fields.

sourcetype="answers-1372987051" First_Time="20130624090015*"


sunrise
Contributor

Thank you for your help.
I understood completely.


sunrise
Contributor

Thank you for your response, linu1988.
But I don't want to split the record into two events, which is what you suggested.


linu1988
Champion

A note on timestamp extraction. In props.conf we can set
TIME_PREFIX =
MAX_TIMESTAMP_LOOKAHEAD =
TIME_FORMAT =

which are really useful. If a field holds the same kind of value as the timestamp, we have to extract it deliberately as a separate field; a proper regex will take care of the second field extraction. Whatever TIME_FORMAT you specify in props.conf, given the proper values, will extract the time field. It is better to test in the GUI first and then set it in the conf files.
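To illustrate what such a time format describes, here is a small Python sketch that parses the leading timestamp of the sample record with strptime-style directives. This is only an analogy, not Splunk's parser: the format string, the 17-character slice (standing in for a lookahead limit), and the reading of the three trailing digits as milliseconds are assumptions drawn from the sample in the question.

```python
from datetime import datetime

# Sample record from the question.
RECORD = "20130624090015008SOMEWORDS_A20130624090016009SOMEWORDS_B"

# Assumed layout: YYYYmmddHHMMSS followed by 3 millisecond digits.
LOOKAHEAD = 17              # only look at the first 17 characters
FORMAT = "%Y%m%d%H%M%S%f"   # Python's %f accepts 1 to 6 fractional digits

# Parse only the leading timestamp; the second one is left for a field extraction.
event_time = datetime.strptime(RECORD[:LOOKAHEAD], FORMAT)
print(event_time.isoformat())
```

In props.conf the same idea would be expressed through TIME_FORMAT and MAX_TIMESTAMP_LOOKAHEAD; check the subsecond directive in the Splunk documentation before relying on it, since it is not the same as Python's %f.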


sunrise
Contributor

Thank you for your response, TimMc. But that is not the problem I am asking about. What do you think of 2 and 3?


TimMc
Explorer

Configure timestamp recognition for the first timestamp so that it becomes the time of the event.

Extract the second timestamp as a field so that the event pretty much just has one field.

Then consider using transactions to combine events (if that's what you want).

Tim.
