Getting Data In

Splunk stops indexing my data at 42 lines no matter how I try to input it.

vectorsc
Explorer

Good times....good times. Splunk is refusing to index past 42 lines of my data regardless of what I do.

This is the output of wc on my file (lines, words, bytes):

5929  86549  2439613 report.csv

I have tried the following:

1. Set linemerge to false. This allowed Splunk to break my data down into separate events properly. It is 5929 lines, one event per line (see the configuration sketch after this list).

2. Set this item up as a monitor input on my universal forwarder. 42 great lines of everything being OK, then CRUNK, it stops.

3. Looked at the file - there are no special characters whatsoever other than EOL at the breaking point.

4. Netcatted the file to a TCP port on the indexer itself. Gave me 42 beautiful events before exploding.

5. Oneshot that thing. No help.
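
For reference, here is a minimal sketch of the configuration and commands described above (the sourcetype name, file path, and CLI invocation are placeholders for illustration, not copied from my real setup):

# props.conf - custom sourcetype with line merging disabled
[report_csv]
SHOULD_LINEMERGE = false

# inputs.conf on the universal forwarder - monitor the file
[monitor:///data/report.csv]
sourcetype = report_csv

# one-shot the same file from the Splunk CLI
./splunk add oneshot /data/report.csv -sourcetype report_csv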

I KNOW that 42 is the meaning of life...but....but...

0 Karma
1 Solution

kristian_kolb
Ultra Champion

Well, I'm more than a bit curious. Either you've found a bug, or - my guess - there is something wrong with the parsing of the timestamps. Did you try searching over 'All Time', just to make sure the missing events aren't sitting in the index with unexpected timestamps?

One thing that might be worth testing is to create a new test-index, just to ensure that there is nothing else in it.

Then import the data again into that index. Take note of any DateParserVerbose errors/warnings in splunkd.log.
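
One way to look for those, assuming the indexer's internal logs are searchable in _internal as usual, is a search along these lines (a generic example, not specific to this environment):

index=_internal sourcetype=splunkd component=DateParserVerbose

or simply grep splunkd.log on the indexer for DateParserVerbose.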

Have a look at the Manager -> indexes. How many events are there in the new index?

Take a close look at the events you see in the search app. Are the timestamps parsed correctly? In some cases, where the timestamp in an event is ambiguous or partly missing, Splunk will make a best-effort attempt at finding timestamp information elsewhere in the event data.

e.g. 2012-09-25 1300hours 14.38 kb transferred blah blah

could be interpreted as 2012-09-25 14:38
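
If misparsed timestamps do turn out to be the cause, the usual remedy is to pin down timestamp recognition for the sourcetype in props.conf, roughly like this (the sourcetype name and time format are placeholders, and this assumes the timestamp sits at the start of each line):

[report_csv]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19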


Also, it'd be good to see your props.conf from the indexer.


And yes, a few sample events, even if you have to edit them, would really help.

Hope this helps,

Kristian


vectorsc
Explorer

It was the timestamp algo...but in a super strange fashion. It locked on to another timestamp and everything blew up.

0 Karma

vectorsc
Explorer

No - 'exploding' is just the term that popped to mind after spending a week building the event extraction for this data. The indexer still runs afterwards.

Unfortunately, I cannot provide a sample data set. This data is highly confidential (instant-firing-level confidential), and scrubbing it of proprietary information would likely kill any usefulness, since it would change the data too much.

I can tell you:

a) it is a very large text file, CSV formatted.

b) field extraction does not occur on the indexer, only on the search head.

c) the file usually has a header and then 5000+ lines of single line/single event data.

4) The sourcetype is custom, and the only modifier on the sourcetype is no linemerging.

V) I have piped the data in over a network port with identical results.

f) No clue what goes here, but the differences in line numbering schemes make this entry in my list utterly necessary.

0 Karma

yannK
Splunk Employee
Splunk Employee

Very strange. Did you check the splunkd logs?
The questions are:

  • Is the forwarder tailing only part of the file?
  • Or is the indexer skipping the end of the file?
    • Check the splunkd logs.
  • Are the events from the file extracted with a timestamp far outside the expected time range?
    • Search index=* source=myfile over All Time to see whether the events ended up somewhere else (see the example search after this list).
  • Are some events skipped because of transforms rules?
    • Do you have nullqueue filtering?
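
For example, an all-time check along those lines could look like this (the source path is a placeholder):

index=* source="*report.csv" earliest=0
| stats count min(_time) AS first_event max(_time) AS last_event BY index, sourcetype
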
0 Karma

yannK
Splunk Employee
Splunk Employee

So this is an index-time issue.

Please provide the sourcetype applied to your file, and a test sample.

  • Is there a timestamp in your events, and are they in chronological order?
  • Any errors/warnings in splunkd.log?
0 Karma

kristian_kolb
Ultra Champion

You say 'exploding' - does splunkd on the indexer actually stop working?

0 Karma

vectorsc
Explorer

No NQF (nullqueue filtering). I have moved from using the forwarder to netcatting straight to the indexer, so I know it's dying at the indexing engine and not at the forwarder.
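
For reference, the direct-to-indexer path looks roughly like this (port, host, sourcetype, and index are placeholders, not my actual values):

# inputs.conf on the indexer - open a raw TCP input
[tcp://9999]
sourcetype = report_csv
index = main

# from the sending host
nc indexer.example.com 9999 < report.csv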

0 Karma