1) Let's say we have a text log file with 50 existing events in it, and the file has NOT been indexed by Splunk.
2) When we add that file as a data input, no events are indexed. The source type is NOT seen on the main search screen. The host is NOT seen on the main search screen.
3) Then we turn on our application and it writes a new event to that SAME EXACT FILE. The source type IS then seen on the main search screen. The host IS then seen on the main search screen. Splunk shows 1 event indexed. As our application continues to write new events, only the new events seem to be indexed.
The Splunk people that we were working with thought that the existing events would have been indexed as well.
The timestamps and formats are the same. This is occurring across our system, not just for one.
Our expectation is that the existing 50 events (#1) and the new events would be indexed and searchable. How can we do that?
Splunk may have seen that file before (or thinks it has). Does the file have large headers in it?
You can force Splunk to re-index the entire file with the oneshot command.
/opt/splunk/bin/splunk add oneshot -sourcetype test -index main -source /tmp/some_file.txt
NB: This will index the entire file, so lines from the file already in Splunk will be duplicated.
We need to see your input stanza. If followTail is set to true, then Splunk will start reading at the end of any new file it sees and will not index the historical data in it. This is also explained here: http://splunk-base.splunk.com/answers/57819/when-is-it-appropriate-to-set-followtail-to-true
I think we resolved this somehow by doing a few things. We removed the custom index reference in the stanza. followTail was already false (0), so that setting was not the cause of the problem.
We are new to Splunk and it seems pretty finicky. We are having trouble getting consistent results.
Yes, good points. Can you edit your post to include your inputs.conf settings for that sourcetype?
Make sure you don't have properties such as MAX_DAYS_AGO defined incorrectly in your props.conf.
Also make sure you don't have anything like the ignoreOlderThan setting in the inputs.conf file.
Either of these cases could explain why you aren't processing the older events but you are processing the newer events.
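As a rough way to check for the settings mentioned in this thread, here is a minimal sketch that scans a single .conf file for followTail, ignoreOlderThan, and MAX_DAYS_AGO. The function name `find_skip_settings` is hypothetical, and the example paths assume a default /opt/splunk installation; adjust them to wherever your configuration actually lives.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) any settings in a Splunk
# .conf file that can cause older, pre-existing events to be skipped.
# followTail and ignoreOlderThan belong to inputs.conf; MAX_DAYS_AGO to props.conf.
find_skip_settings() {
    conf_file="$1"
    grep -n -i -E '^[[:space:]]*(followTail|ignoreOlderThan|MAX_DAYS_AGO)[[:space:]]*=' "$conf_file"
}

# Example usage (paths are assumptions, not taken from the original post):
# find_skip_settings /opt/splunk/etc/system/local/inputs.conf
# find_skip_settings /opt/splunk/etc/system/local/props.conf
```

This only looks at the one file you point it at. Splunk merges settings from several directories, so running `/opt/splunk/bin/splunk cmd btool inputs list --debug` is probably a better way to see the effective stanza and which file each setting comes from.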
All of the new events come in no problem?
I assume you tried it a few times and cleaned the index?
Cleaning an index will delete all of its data and you'll never get it back, so be careful. http://docs.splunk.com/Documentation/Splunk/latest/admin/RemovedatafromSplunk
You could create a separate index to test it again if you have everything in the main index. That way, cleaning the index is not really a big deal. I would recommend a call to support if this doesn't get you anything.
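The separate-index approach could look something like the sketch below. The helper names and the index name in the usage lines are made up for illustration, and the /opt/splunk path is an assumption based on the oneshot command earlier in the thread. `clean eventdata` only runs while Splunk is stopped, and it should ask for confirmation before deleting anything.

```shell
#!/bin/sh
# Path to the Splunk CLI. /opt/splunk is an assumption; override by
# setting SPLUNK before sourcing this file.
SPLUNK="${SPLUNK:-/opt/splunk/bin/splunk}"

# Hypothetical helper: create a throwaway index for testing inputs.
create_scratch_index() {
    "$SPLUNK" add index "$1"
}

# Hypothetical helper: wipe ONLY the named index.
# Splunk has to be stopped for "clean eventdata", so stop, clean, start.
reset_scratch_index() {
    "$SPLUNK" stop &&
    "$SPLUNK" clean eventdata -index "$1" &&
    "$SPLUNK" start
}

# Example usage (index name is illustrative):
# create_scratch_index scratch_test
# reset_scratch_index scratch_test
```

Pointing the input at the scratch index while testing means a clean never touches the data in main.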
That's what the Splunk people are saying as well... we tried that... we even tried it with a Splunk Sales Engineer.
When you add an input for a file and choose - 'Continuously index data from a file or directory this Splunk instance can access' - it should index the existing file and then add new events as well.