How to find the number of files that failed to ingest for a specific index

athorat
Communicator

How do I find the number of files that failed to ingest for a specific index?
I am trying to compare files ingested vs. files that failed to ingest for a specific index in Splunk.

woodcock
Esteemed Legend

If you have a list of files and how many events are in them, then you can do something like this and cross-reference:

| tstats count AS EventsInThisFile WHERE index=YourIndexNameHere BY source

richgalloway
SplunkTrust

Are you looking for files that were once successfully ingested and are no longer being read or files that were never ingested at all?

The former case is a matter of searching some long period (like 30 days) to build a list of expected files then searching a short period (like today) to build a list of current files and comparing the two.
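A minimal sketch of that comparison, assuming a 30-day lookback and that "current" means any event since midnight today. YourIndexNameHere is a placeholder, and the lastSeen field name is made up for illustration. It lists every source seen in the last 30 days that has sent nothing today:

```
| tstats max(_time) AS lastSeen WHERE index=YourIndexNameHere earliest=-30d latest=now BY source
| where lastSeen < relative_time(now(), "@d")
| sort lastSeen
| convert ctime(lastSeen)
```

Doing it in a single tstats pass avoids running two searches and joining them; adjust the lookback and the "@d" cutoff to match how often the files are expected to update.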

The latter case is more challenging. A source that was never read will not be in Splunk, but you may find an error message in _internal for files that could not be read, perhaps because of permissions. It's possible, of course, for a file to be silently skipped if it's not part of the monitor pattern, for instance.
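A rough sketch of such an _internal search. It assumes that the forwarder reports unreadable files under the TailReader, TailingProcessor, or WatchedFile components and that the message contains the path in the form file='...'. Both the component names and the message format are assumptions, so check them against your own splunkd.log before relying on the results. The failed_file, messages, and last_message field names are hypothetical:

```
index=_internal sourcetype=splunkd (log_level=ERROR OR log_level=WARN)
    (component=TailReader OR component=TailingProcessor OR component=WatchedFile)
| rex "file='(?<failed_file>[^']+)'"
| where isnotnull(failed_file)
| stats count AS messages latest(_raw) AS last_message BY host failed_file
```

Files that are skipped silently, for example because they do not match the monitor pattern, will not appear here at all.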

Please clarify your requirements and we'll try to help.

---
If this reply helps you, Karma would be appreciated.

athorat
Communicator

Thanks for replying, @richgalloway, and apologies for the delayed reply. Yes, we are trying to find the files that never reached Splunk, either because of:
1. a permissions issue, or
2. the files not matching the whitelist pattern, or unknown reasons.

We already have the count of files per index that are being read and indexed by the Splunk UF:
| tstats dc(source) WHERE host=10a*pd-* OR host=ew1a-* OR host=dub01pd-* OR host=uw2*-* OR host=ue1-* index=prod-online* by index

Now we want to list the number of files that errored out or were not read by the Splunk UF. The same hosts are used as the filter, but how do we get that count by index name and build a bar chart comparing it with the query above?

Failed attempt below:
index=_internal sourcetype=splunkd splunk_server=usw* host=10a*pd-* OR host=ew1a-* OR host=dub01pd-* OR host=uw2*-* OR host=ue1-* log_level=ERROR
| rex field=message "((?.*))"
| stats dc(message) by host
| sort - message

richgalloway
SplunkTrust

Sources that fail are not indexed so you can't get stats by index. I suggest generating stats by source or host.
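One possible way to get a per-host comparison that can be rendered as a bar chart is sketched below. It reuses the prod-online* index pattern from the question. It also assumes that read failures are logged by the TailReader component with the path in the form file='...', which you should verify in your own splunkd.log. The ingestedFiles, failedFiles, and failed_file names are made up for illustration:

```
| tstats dc(source) AS ingestedFiles WHERE index=prod-online* BY host
| append
    [ search index=_internal sourcetype=splunkd component=TailReader log_level=ERROR
      | rex "file='(?<failed_file>[^']+)'"
      | stats dc(failed_file) AS failedFiles BY host ]
| stats sum(ingestedFiles) AS ingestedFiles sum(failedFiles) AS failedFiles BY host
| fillnull value=0 ingestedFiles failedFiles
```

Add the same host filters to both halves as needed. The host in _internal is the forwarder's own host name, so the two counts only line up if the indexed data keeps that same host value.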
