Getting Data In

Best possible way to create sourcetypes - server role, location, and config file?

thisissplunk
Builder

Forgive me if this has been answered before, but my googling has failed me -

I have a forwarder that batches log files to our indexer. The sourcetypes are set on the forwarder in the inputs.conf file. I need to drastically change this and split one sourcetype into many based on log file name. This will require around 6 sourcetype entries PER INDEX (we have about 20) in the inputs.conf file.
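For context, today it's essentially one stanza per index, something like this (the path, index, and sourcetype names are placeholders, not my real config):

inputs.conf
# current setup: everything in the directory gets one catch-all sourcetype
[batch:///data/logs/app_one]
move_policy = sinkhole
index = app_one
sourcetype = app_one_logs

The change would mean replacing each of those with roughly six stanzas, one per log file name pattern, for every one of the ~20 indexes.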

Before I make any big changes, I was wondering if there was an easier or better way of doing this. I simply do not understand all of the places where a sourcetype can be created, or why there are so many. For instance, when I google how to make a sourcetype, it tells me to edit props.conf... what?

The data in question is very large. I think it's much too large for an index-time transform on the indexer side, but I don't understand what strain the transforms cause, if any, in the first place. What are my options here?

0 Karma
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

In inputs.conf you tell the data what sourcetype it should take; in props.conf you define settings for that sourcetype, such as event breaking, timestamp extraction, etc.
The settings made in inputs.conf can be overridden during parsing; those overrides live in props.conf and transforms.conf - for example, consider this:

props.conf
[source::.../my_awesome_file.log*]
TRANSFORMS-set_awesome_sourcetype = set_awesome_sourcetype

transforms.conf
[set_awesome_sourcetype]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::awesome

That'll set sourcetype=awesome for any file name starting with my_awesome_file.log - depending on your situation, setting this during parsing can be an option. That said, I'd recommend fully exploring whether you can set things properly during input to begin with.
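If you go that route, it's one stanza per log file name pattern on the forwarder - a rough sketch (shown as monitor stanzas; the paths, index, and sourcetype names are placeholders, not anything from your environment):

inputs.conf
# one stanza per file name pattern, each assigning its own sourcetype
[monitor:///data/logs/app_one/access*.log]
index = app_one
sourcetype = app_one:access

[monitor:///data/logs/app_one/error*.log]
index = app_one
sourcetype = app_one:error

More stanzas to maintain, but the sourcetype assignment lives in one obvious place and costs nothing at parse time.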
If you're worried about indexing performance, don't be. First, a single reference machine can easily sustain a 20MB/s indexing rate - search load is what kills you down the line, rarely indexing. Second, given that you probably don't know much about props.conf yet, there's a lot of performance to be gained from defining sourcetype settings such as event breaking efficiently - much more than any sourcetype rewriting would usually cost. Here's a good overview: http://docs.splunk.com/Documentation/Splunk/6.3.3/Data/Overviewofeventprocessing
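For example, explicit per-sourcetype settings along these lines (the values here are illustrative only - they'd have to match what your data actually looks like) save Splunk a lot of per-event guessing:

props.conf
[awesome]
# events are single lines, so don't try to merge lines back together
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# timestamp sits at the start of each event, e.g. 2016-03-14 12:34:56
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19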

Remember, already-indexed data won't change.

martin_mueller
SplunkTrust
SplunkTrust

Yeah, the sheer amount of possibilities in Splunk can be overwhelming.

The default unit for indexing performance is GB per day per indexer; 500GB/day usually overwhelms one indexer but bores ten indexers.
However, the biggest impact on how much any given hardware can take is search load. Each event is indexed once but searched, or at least considered for searching, countless times by the searches you run. Doing a bit more at index time is rarely an issue.
That being said, it's of course possible to shoot yourself in the foot at any time - for example with less-than-ideal regular expressions running over large sets of data. Matching on your source path isn't one of those cases; you'll be fine (until proven otherwise by actually trying it). After making the change, check out the indexing performance dashboards in the distributed management console to look for changes in CPU usage by the various processes.

0 Karma

thisissplunk
Builder

Thanks again for the extra clarification! I'll keep an eye on the performance dashboards.

0 Karma

thisissplunk
Builder

Thanks. I have added line breaking and transforms before. I was mostly confused about all of the places where sourcetypes can truly be set, since there seem to be many different ways.

If I'm hearing you correctly, are you saying index-time transforms are really not an issue at about 500GB a day? If that's the case, it would be easiest for me to add the new, chopped-up sourcetypes in props.conf and transforms.conf like you showed. Would this have any effect on search-time performance one way or the other?

0 Karma