Getting Data In

Sourcetypes keep on multiplying.

marquiselee
Path Finder

Anyone know how I can remove the excess sourcetypes and prevent this from happening in the future?




1 Solution

sideview
SplunkTrust

This is a by-product of how Splunk indexes CSV files when you give the file a sourcetype of "csv". It uses a Splunk feature called "check-for-header": Splunk reads the header row of the CSV file and then generates config for the data input so that the fields all get extracted and named correctly at search time.

The problem is that although you give it the sourcetype "csv", the data ends up with a sourcetype of "csv-N". The reason is that two unrelated CSV files being indexed will in general have different header rows, and when that happens, one gets assigned the sourcetype "csv-7" and the next one "csv-8".

If and when it comes across a CSV file whose header row matches one it has already assigned a number to, it'll of course assign that pre-existing sourcetype to the data.
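To make the numbering behaviour concrete, here is a toy model of it in Python. This is only a sketch of the behaviour described above, not Splunk's actual code: the class name `HeaderSourcetypes`, the naive comma split, and the choice to start numbering at 1 are all assumptions made for illustration.

```python
# Toy model only: this is NOT Splunk's implementation.
# It mimics the described behaviour: each distinct header row gets its
# own "csv-N" sourcetype, and a header seen before reuses its old one.

class HeaderSourcetypes:
    def __init__(self, base="csv"):
        self.base = base
        self._by_header = {}  # header columns (tuple) -> assigned sourcetype

    def assign(self, header_row):
        # Naive split; real CSV headers may contain quoted commas.
        key = tuple(col.strip() for col in header_row.split(","))
        if key not in self._by_header:
            # Unseen header: mint the next numbered sourcetype.
            number = len(self._by_header) + 1
            self._by_header[key] = f"{self.base}-{number}"
        return self._by_header[key]


registry = HeaderSourcetypes()
first = registry.assign("host,status,bytes")
second = registry.assign("user,action")
repeat = registry.assign("host,status,bytes")  # same header as the first file
print(first, second, repeat)
```

Every new header shape mints a new name, which is why an instance that has indexed many unrelated CSV files accumulates so many of them.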

But anyway, it looks like on your instance you've indexed quite a lot of mutually distinct CSV files, so there's a big pile of sourcetypes.

I'm afraid that as far as preventing the proliferation of "csv-N" sourcetypes goes, the answer is to just not use the "csv" sourcetype at all. You can manually configure the "AutoHeader" rules yourself, and once you get the hang of it, there's not much more to it than pasting the header row plus a little config into props.conf and transforms.conf.
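As a rough sketch of what "pasting the header row plus a little config" can look like, the Python helper below turns a header row into a props.conf stanza and a transforms.conf stanza. The sourcetype name `web_export`, the helper `build_csv_config`, and the sample header are made up for illustration. The settings it emits (`CHECK_FOR_HEADER`, `REPORT-<class>`, `DELIMS`, `FIELDS`) are standard props.conf and transforms.conf settings, but check them against the spec files for your Splunk version before relying on this.

```python
# Hypothetical helper: generates manual field-extraction config for one
# CSV layout. The sourcetype name and header passed in are examples only.

def build_csv_config(sourcetype, header_row):
    # Naive split; a header with quoted commas needs the csv module instead.
    fields = [col.strip() for col in header_row.split(",")]
    transform = f"{sourcetype}_fields"  # assumed naming convention

    props = "\n".join([
        f"[{sourcetype}]",
        "CHECK_FOR_HEADER = false",           # do not auto-generate csv-N
        f"REPORT-{transform} = {transform}",  # search-time extraction
    ])
    transforms = "\n".join([
        f"[{transform}]",
        'DELIMS = ","',
        "FIELDS = " + ", ".join(f'"{name}"' for name in fields),
    ])
    return props, transforms


props_conf, transforms_conf = build_csv_config("web_export", "host,status,bytes")
print("# props.conf")
print(props_conf)
print("# transforms.conf")
print(transforms_conf)
```

You would then assign your own sourcetype name (here `web_export`) to the data input instead of "csv", so the field names come from config you control.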

There are lots of other less-than-ideal things about check-for-header indexing. For one thing, it doesn't work with any kind of forwarding - the "csv-N" config ends up stranded on the forwarder, where it does no good. For another, if you actually assign one of the "csv-N" sourcetypes to a data input, you'll get the wrong field names (the ones from the previously indexed "csv-N" file). You have to remember to assign it the "csv" sourcetype, and resign yourself to Splunk possibly autogenerating a "csv-N" for it. It's deeply unsatisfying.

