Set index-time settings (timestamp, line breaking, etc.) for a sourcetype set by transforms

salem34
Path Finder

Hi Ninjas

I'm struggling with the following scenario:

I have a heavy forwarder that collects a merged data stream with the sourcetype "generic_sourcetype". For example, this stream consists of events in the following formats:

Event 1

Sep 24 18:22:16 - 209.160.24.63 - - [24/Sep/2017:18:22:16.885] "GET /product.screen?productId=WC-SH-A02&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" 200 3878 "http://www.google.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 349 sourcetype:a

Event 2

Sep 24 00:15:03 - 209.160.24.63 - - Thu Sep 24 2017 00:15:02.554 www1 sshd[4747]: Failed password for invalid user jabber from 118.142.68.222 port 3187 ssh2 sourcetype:b

This comes in as one merged data stream (no, I can't influence that), so I built a "routing" with transforms.conf on the heavy forwarder like this:

props.conf

[generic_sourcetype]
TRANSFORMS-route_st = route_st_a, route_st_b

transforms.conf

[route_st_a]
REGEX = sourcetype:a
FORMAT = sourcetype::a
DEST_KEY = MetaData:Sourcetype

[route_st_b]
REGEX = sourcetype:b
FORMAT = sourcetype::b
DEST_KEY = MetaData:Sourcetype

So far so good: this config works fine and the two sourcetypes are indexed properly. Now the problem I have is the following:
Both events have a detailed timestamp with milliseconds after the header, which I want to use as the indexed timestamp. So I configured parsing settings in props.conf for both sourcetypes (a and b) on the heavy forwarder like this:

props.conf

[a]
TIME_PREFIX = $ProperSetting
TIME_FORMAT = $ProperSetting
MAX_TIMESTAMP_LOOKAHEAD = $ProperSetting

[b]
TIME_PREFIX = $ProperSetting
TIME_FORMAT = $ProperSetting
MAX_TIMESTAMP_LOOKAHEAD = $ProperSetting
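
For illustration only, the placeholders could be filled in along the following lines for the two sample events above. These values are assumptions derived from those samples, not the settings actually used in this setup, and would need testing against real data:

```
# props.conf (hypothetical values derived from the sample events)

[a]
# Skip ahead to the "- - [" that precedes the detailed timestamp.
TIME_PREFIX = -\s-\s\[
# Matches 24/Sep/2017:18:22:16.885 (%3N = milliseconds).
TIME_FORMAT = %d/%b/%Y:%H:%M:%S.%3N
# The sample timestamp is 24 characters long.
MAX_TIMESTAMP_LOOKAHEAD = 24

[b]
# Skip ahead to the "- - " that precedes the detailed timestamp.
TIME_PREFIX = -\s-\s
# Matches Thu Sep 24 2017 00:15:02.554.
TIME_FORMAT = %a %b %d %Y %H:%M:%S.%3N
# The sample timestamp is 28 characters long.
MAX_TIMESTAMP_LOOKAHEAD = 28
```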

Testing those settings with a oneshot input, with the dedicated sourcetype set at input time, shows that my configs are correct and the correct timestamp is extracted for both events.

But somehow it does not work with my generic stream: the events are split into the two sourcetypes, but my timestamp configuration is ignored and the first timestamp keeps being indexed for both events.

So it seems that the heavy forwarder assigns a timestamp automatically for generic_sourcetype, then processes the transforms for the sourcetype routing, and then sends the events on directly instead of re-parsing them with the settings given for the new sourcetype.

Is this the way Splunk handles this kind of data? Or am I missing something (or somewhere)?

Thanks as always

maciep
Champion

I'm pretty sure that is how Splunk handles the data. Timestamp recognition happens before your transforms are called, and it won't be re-evaluated after the new sourcetype is assigned. Typically, the sourcetype assignment is more of a last step to prepare for specific field extractions over on the search head (or indexed extractions).

There's a very nice flow chart here:
https://wiki.splunk.com/Community:HowIndexingWorks
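
As a rough sketch of what that ordering implies for the configuration in the question: timestamp settings are read from the stanza of the sourcetype the event arrives with, so they would have to live under [generic_sourcetype]. The TIME_PREFIX regex and the lookahead value below are assumptions derived from the two sample events. Leaving out TIME_FORMAT relies on automatic timestamp detection, because the two layouts differ. Whether detection picks up both layouts, including the milliseconds, is untested here.

```
# props.conf on the heavy forwarder (annotated sketch, values are assumptions)

[generic_sourcetype]
# Timestamp extraction runs while the event still carries this sourcetype,
# so timestamp settings are read from this stanza.
# The prefix regex skips the syslog-style header and an optional "[".
TIME_PREFIX = ^\w+\s+\d+\s+[\d:]+\s-\s\S+\s-\s-\s\[?
MAX_TIMESTAMP_LOOKAHEAD = 30
# No TIME_FORMAT: the two layouts differ, so this relies on automatic detection.

# The sourcetype rewrite is applied afterwards.
TRANSFORMS-route_st = route_st_a, route_st_b

[a]
# Timestamp settings here apply only to data that arrives already tagged
# as sourcetype a (for example the oneshot test), not to rewritten events.
```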
