Getting Data In

Splunk stops indexing my data at 42 lines no matter how I try to input it.

vectorsc
Explorer

Good times....good times. Splunk is refusing to index past 42 lines of my data regardless of what I do.

This is the output of wc on my file (lines, words, bytes):

5929  86549  2439613 report.csv

I have tried the following:

1. Set linemerge to false. This allowed Splunk to break my data down into separate events properly. It is 5929 lines, one event per line (see the configuration sketch after this list).

2. Set this item up as a monitor input on my universal forwarder. 42 great lines of everything being OK, then CRUNK, it stops.

3. Looked at the file - there are no special characters whatsoever other than EOL at the breaking point.

4. Netcatted the file to a TCP port on the indexer itself. Gave me 42 beautiful events before exploding.

5. Oneshot that thing. No help.
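
For reference, here is a minimal sketch of the configuration and commands described above (the sourcetype name, file path, and CLI invocation are placeholders for illustration, not copied from my real setup):

# props.conf - custom sourcetype with line merging disabled
[report_csv]
SHOULD_LINEMERGE = false

# inputs.conf on the universal forwarder - monitor the file
[monitor:///data/report.csv]
sourcetype = report_csv

# one-shot the same file from the Splunk CLI
./splunk add oneshot /data/report.csv -sourcetype report_csv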

I KNOW that 42 is the meaning of life...but....but...

0 Karma
1 Solution

kristian_kolb
Ultra Champion

Well, I'm more than a bit curious. Either you've found a bug, or - my guess - there is something wrong with the parsing of the timestamps. Did you try searching over 'All Time', just to make sure the missing events aren't sitting in the index with unexpected timestamps?

One thing that might be worth testing is to create a new test-index, just to ensure that there is nothing else in it.

Then import the data again into that index. Take note of any DateParserVerbose errors/warnings in splunkd.log.
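
One way to look for those, assuming the indexer's internal logs are searchable in _internal as usual, is a search along these lines (a generic example, not specific to this environment):

index=_internal sourcetype=splunkd component=DateParserVerbose

or simply grep splunkd.log on the indexer for DateParserVerbose.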

Have a look at the Manager -> indexes. How many events are there in the new index?

Take a close look at the events you see in the search app. Are the timestamps parsed correctly? In some cases, where the timestamp in an event is ambiguous or partly missing, Splunk will make a best-effort attempt at finding timestamp information elsewhere in the event data.

e.g. 2012-09-25 1300hours 14.38 kb transferred blah blah

could be interpreted as 2012-09-25 14:38
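
If misparsed timestamps do turn out to be the cause, the usual remedy is to pin down timestamp recognition for the sourcetype in props.conf, roughly like this (the sourcetype name and time format are placeholders, and this assumes the timestamp sits at the start of each line):

[report_csv]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19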


Also, it'd be good to see your props.conf from the indexer.


And yes, a few sample events, even if you have to edit them, would really help.

Hope this helps,

Kristian


vectorsc
Explorer

It was the timestamp algo...but in a super strange fashion. It locked on to another timestamp and everything blew up.

0 Karma

vectorsc
Explorer

No - 'exploding' is just the term that popped to mind after spending a week building the event extraction for this data. The indexer still runs afterwards.

Unfortunately, I cannot provide a sample data set. This data is highly confidential (instant-firing-level confidential), and scrubbing it of proprietary information would likely kill any usefulness, since it would change the data too much.

I can tell you:

a) it is a very large text file, CSV formatted.

b) field extraction does not occur on the indexer, only on the search head.

c) the file usually has a header and then 5000+ lines of single line/single event data.

4) The sourcetype is custom, and the only modifier on the sourcetype is no linemerging.

V) I have piped the data in over a network port with identical results.

f) No clue what goes here, but the differences in line numbering schemes make this entry in my list utterly necessary.

0 Karma

yannK
Splunk Employee
Splunk Employee

Very strange. Did you check the splunkd logs?
The questions are:

  • Is the forwarder tailing only part of the file?
  • Or is the indexer skipping the end of the file?
    • Check the splunkd logs.
  • Are the events from the file extracted with a timestamp far outside the expected time range?
    • Search index=* source=myfile over All Time to see whether the events ended up somewhere else (see the example search after this list).
  • Are some events skipped because of transforms rules?
    • Do you have nullqueue filtering?
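
For example, an all-time check along those lines could look like this (the source path is a placeholder):

index=* source="*report.csv" earliest=0
| stats count min(_time) AS first_event max(_time) AS last_event BY index, sourcetype
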
0 Karma

yannK
Splunk Employee
Splunk Employee

So this is an index-time issue.

Please provide the sourcetype applied to your file, and a test sample.

  • Is there a timestamp in your events, and are they in chronological order?
  • Any errors/warnings in splunkd.log?
0 Karma

kristian_kolb
Ultra Champion

You say 'exploding' - does splunkd on the indexer actually stop working?

0 Karma

vectorsc
Explorer

No NQF (nullqueue filtering). I have moved from using the forwarder to netcatting straight to the indexer, so I know it's dying at the indexing engine and not at the forwarder.
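
For reference, the direct-to-indexer path looks roughly like this (port, host, sourcetype, and index are placeholders, not my actual values):

# inputs.conf on the indexer - open a raw TCP input
[tcp://9999]
sourcetype = report_csv
index = main

# from the sending host
nc indexer.example.com 9999 < report.csv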

0 Karma