Getting Data In

best practice question around CHECK_FOR_HEADER

sideview
SplunkTrust

So I've been using CHECK_FOR_HEADER=true for various csv data in some apps I'm building. I've learned a great deal about it recently, but I still have a lot to learn and I wonder if anyone can help me with advice about the following problem.

I'm using guided setup so that the user setting up the app can tell Splunk up front which column to use as the timestamp. Specifically, the guided setup writes a value to TIME_PREFIX, and all is well. (I can't really let Splunk figure it out because there are a couple of other epochTime values in there and I can't allow the ambiguity.)
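
For illustration, the stanza the guided setup ends up writing might look something like the sketch below. It assumes the timestamp is an epoch value in the third comma-separated column. The stanza name [foo] comes from this question; the column position and the TIME_FORMAT and MAX_TIMESTAMP_LOOKAHEAD values are hypothetical, and the regex does not account for quoted fields that contain commas.

```ini
# props.conf (sketch; values are assumptions, not the actual app config)
[foo]
CHECK_FOR_HEADER = true
# Skip the first two columns so timestamp parsing starts at column three
TIME_PREFIX = ^(?:[^,]*,){2}
# The value at that position is epoch seconds
TIME_FORMAT = %s
# Only read the ten digits right after the prefix
MAX_TIMESTAMP_LOOKAHEAD = 10
```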

Now the data comes in, and CHECK_FOR_HEADER does its really weird thing where it looks at the props stanza [foo], looks at the data, writes another stanza to etc/apps/learned, and calls the sourcetype [foo-2].

( http://www.splunk.com/base/Documentation/4.1.7/Admin/Extractfieldsfromfileheadersatindextime#props.c... )
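
As a rough sketch of what ends up in the learned app, based on my reading of the 4.x documentation linked above: the learned props.conf stanza points at an auto-generated transform that lists the header fields. The field names below are made up, and the exact numbering and file locations may differ on a real install.

```ini
# etc/apps/learned/local/props.conf (sketch; generated by Splunk, not by the app)
[foo-2]
REPORT-AutoHeader = AutoHeader-1

# etc/apps/learned/local/transforms.conf (sketch; field names are hypothetical)
[AutoHeader-1]
FIELDS = "event_time", "user", "bytes"
DELIMS = ","
```

Note that nothing written to [foo] after the fact shows up in [foo-2], which is the heart of the problem described next.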

Another key ingredient is that I leave links back to the setup page -- the user can always run the app setup again later. The problem is that the CHECK_FOR_HEADER magic has meant that the real config is now hidden in etc/apps/learned. My guided setup's custom handler can write to the main props stanza to its heart's content, but it'll never affect the behavior of this 'learned' sourcetype.

This would maybe be OK if there were any way for the user to go edit etc/apps/learned/props.conf stanzas in Manager, but it looks like there isn't. (That is question #1.)

So I'm facing a choice of various evils, and I don't know much about any of them:

1) Try to make a custom manager that can actually dredge up the learned stanzas. OK, the custom manager XML side of this is fine, but the fact that etc/apps/learned is totally invisible in the normal Manager pages makes me think that EAI won't even give the stanzas back to me, or that it might not let me edit them, or that there might be evil consequences thereof. (That is question #2.)

2) In my custom REST endpoint, pull out any and all 'learned' stanzas and push config changes to them too as necessary (possibly the same problem as above).

3) Tell the user that they have to go dig around in etc/apps/learned and hand-edit props.conf. Sadness.

4) Abandon CHECK_FOR_HEADER, switch to setting up the app after the data is indexed, and have some crazy system on setup where I retrieve the first events and turn that text into an extraction. (Doable, but nasty. Any paths-less-taken out there?)

Any advice, EAI lore, and/or cautionary tales?

tia

1 Solution

alacercogitatus
SplunkTrust

I'm going to answer this in the best way possible.

alt text

But seriously, from http://docs.splunk.com/Documentation/Splunk/5.0.4/releasenotes/Deprecatedfeatures:

CHECK_FOR_HEADER props.conf attribute (for index-time field extractions): This feature is deprecated and might be removed in a future release.

sideview
SplunkTrust

The answer these days is to use INDEXED_EXTRACTIONS=csv. This has been around since basically 6.0, it's pretty simple to use, and some reference docs are here: http://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Extractfieldsfromfileswithstructureddata
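
A minimal sketch of what that can look like in props.conf, assuming the header row is the first line of the file and that the timestamp column is named event_time (a hypothetical name; use whatever your header actually calls it). The stanza name [foo] is borrowed from the original question.

```ini
# props.conf (sketch; column name and time format are assumptions)
[foo]
# Read field names from the file's header row and index them with each event
INDEXED_EXTRACTIONS = csv
# Line of the file that holds the header row
HEADER_FIELD_LINE_NUMBER = 1
# Header column to take the event time from, which removes the ambiguity
# when several columns contain epoch values
TIMESTAMP_FIELDS = event_time
TIME_FORMAT = %s
```

Because the timestamp column is named rather than located by regex, a setup page can rewrite TIMESTAMP_FIELDS in this one stanza and there is no separate learned sourcetype to chase. As far as I know, when the file is read by a universal forwarder these settings need to be in place on the forwarder, since that is where the structured parsing happens.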

There was a strange time in which this question was asked, in between 5.0.4 and 6.0, when Splunk had deprecated the old CHECK_FOR_HEADER feature but had not actually introduced anything else that could do the same thing without the admin knowing the header fields in advance.

bhawkins1
Communicator

None of the examples in that document actually use INDEXED_EXTRACTIONS. I will try this feature, but so far the only option that has worked for me is a custom regex transform to remove the header.

sideview
SplunkTrust

Well, I just noticed that if you click the link on a phone, the nav loads all collapsed and thus the page is quite different. The "reference docs" I was referring to are indeed on this page though. If you click the link on a PC or a Mac, look under "Props.conf attributes for structured data".

As far as I know there are no examples of INDEXED_EXTRACTIONS anywhere on the Splunk site (which is sad and strange). I meant to post only the link to the cell in the reference table.

alacercogitatus
SplunkTrust

I'm going to answer this in the best way possible.

alt text

But seriously, from http://docs.splunk.com/Documentation/Splunk/5.0.4/releasenotes/Deprecatedfeatures:

CHECK_FOR_HEADER props.conf attribute (for index-time field extractions): This feature is deprecated and might be removed in a future release.

bhawkins1
Communicator

I downvoted this post because: meme answer with a broken image and a reference to a deprecation document that doesn't describe the preferred alternative.

sideview
SplunkTrust

@alacercogitatus's answer was certainly correct at the time. Since then, however, INDEXED_EXTRACTIONS=csv shipped in 6.0, and that is what has really made CHECK_FOR_HEADER obsolete. So I've added a separate answer here, just to clean up this old question a bit.

sideview
SplunkTrust

Yes. I switched over as soon as possible and it's a lot better. I wish there was still some way for it to index the header lines, because the header "events" can be quite useful from the app layer. And it didn't actually work on 6.0 for some of my sourcetypes (fixed in 6.0.1), but it has been a vast vast improvement.

ogdin
Splunk Employee

sideview
SplunkTrust

Also, the far larger problem with CHECK_FOR_HEADER is not mentioned in my original question: it is incompatible with all forms of Splunk forwarding and all forms of distributed search.

sideview
SplunkTrust

I'm all for deprecation, but I prefer the higher bar for deprecation, where you can only deprecate something if some other, better thing exists that can do what the first thing does. In this case there is not. AutoHeader-* rules can be used to wire up extractions for specific, known-in-advance headers, but to automatically work with all possible headers and do it on the fly, only CHECK_FOR_HEADER can do that today. Possibly there is some secret hotness coming in 6.0.