
Splunk search stuck

omesh4sv
New Member

Since upgrading from version 6.3.2 to 6.4 we have been seeing this problem: searches get stuck at a point in time and do not progress. From the logs I found that a search stalls at the exact timestamp at which buckets roll over from hot to warm. This does not happen on every bucket rollover, but wherever the log message below appears, searches over buckets in that timespan get stuck.

INFO DatabaseDirectoryManager - idx=networks_syslog Writing a bucket manifest in hotWarmPath='/opt/splunk/datastore/networks_syslog/db', pendingBucketUpdates=0 . Reason='Buckets were rebuilt or tsidx-minified (bucket_count=1).'

And we DO NOT have tsidx reduction enabled. Also, '| dbinspect index=networks_syslog' shows the tsidx state as 'full'.

I suspect there is some inconsistency in the index/tsidx files that is causing the searches to get stuck.
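
For reference, a dbinspect query along these lines is how I checked the per-bucket tsidx state (tsidxState and the epoch fields are standard dbinspect output; the column list here is just a trimmed sketch):

| dbinspect index=networks_syslog
| table bucketId state tsidxState startEpoch endEpoch sizeOnDiskMB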


sjohnson_splunk
Splunk Employee

Your hot/warm storage should always be on direct-attached storage (disk, SSD) or a SAN.

NFS is fine for cold storage, and it gives you the option to expand the available space if you need longer retention times or you add more indexing volume.

I'm not sure why 6.4.x is giving you more problems, but not using NFS for hot/warm has been a recommendation for a looooong time.

The usual recommendation is to get enough hot/warm storage to hold 2-4 weeks of data, since that window will probably cover most user searches (on the faster disks). After that it can roll to cold, since relatively few searches run over data older than 30 days (your mileage may vary!).
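
Roughly, that split would look something like the following in indexes.conf (the volume names, paths, and size here are placeholders for illustration, not a drop-in config):

# hot/warm on local or SAN storage, cold on the existing NFS mount
[volume:hotwarm_local]
path = /data/splunk/hotwarm
maxVolumeDataSizeMB = 2000000

[volume:cold_nfs]
path = /mnt/nfs/splunk/cold

[networks_syslog]
homePath = volume:hotwarm_local/networks_syslog/db
coldPath = volume:cold_nfs/networks_syslog/colddb
thawedPath = $SPLUNK_DB/networks_syslog/thaweddb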


omesh4sv
New Member

Thank you, sjohnson, for the reply!

Could you please help me with the steps to move from NFS to local storage, and a recommendation for the local storage type?


jkat54
SplunkTrust

Yeah, so that's your problem right there... you're using NFS.

Not only that, but you've got terabytes of data on NFS. Splunk will probably be OK with small amounts of data on NFS, but at your level... that's just flirting with disaster.

The inconsistency you speak of is most likely due to NFS latency. Run a ping -t to your NFS server and I bet its latency is through the roof when you're doing heavy searching. When Splunk times out connecting to your NFS, the inconsistency occurs and you need to fix your buckets again.

If you really are using NFS, then you're basically trying to push an elephant through the eye of a needle.
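
To put numbers on it, something like the following from the indexer gives a rough picture of NFS latency (tool names assume a typical Linux box with nfs-utils installed; ping -t is the Windows continuous-ping flag, plain ping on Linux already runs continuously):

ping nfs-server.example.com     # watch for RTT spikes while a heavy search is running
nfsiostat 5                     # per-mount NFS RTT/exe times, sampled every 5 seconds
cat /proc/self/mountstats       # raw per-operation NFS client counters and timings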


omesh4sv
New Member

We are already planning to move to local storage. I would appreciate help on the best method to achieve this.

But as I mentioned earlier, everything was fine before the upgrade; we started facing this problem only after upgrading.

The NFS is a hard mount, it is in the same subnet, and the latency is around 200 ms.

It is a standalone Splunk indexer receiving network syslog. Daily indexing volume is around 60-65 GB. Hot/warm/cold are all stored on NFS. Below are the indexes.conf settings:

maxMemMB = 20
maxConcurrentOptimizes = 6
maxHotIdleSecs = 86400
maxTotalDataSizeMB = 5000000
maxDataSize = auto_high_volume
frozenTimePeriodInSecs = 12960000
bucketRebuildMemoryHint = 0
compressRawdata = 1
enableDataIntegrityControl = 0
enableOnlineBucketRepair = 1
enableTsidxReduction = 0
syncMeta = 1
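
As a sanity check, btool can show how these settings actually resolve for the index and which file each one comes from (standard Splunk CLI; the path assumes the /opt/splunk install location used above):

/opt/splunk/bin/splunk btool indexes list networks_syslog --debug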


jkat54
SplunkTrust

What does your file system look like? Is your disk full, etc?


omesh4sv
New Member

Thank you, jkat, for your reply!

No, the file system is not full.

The thing is, the index is on an NFS drive and is 7 TB in size. We have a retention policy of frozenTimePeriodInSecs = 12960000, which is 5 months. The current index size is 4.4 TB, so there is plenty of room to breathe.

One thing I would like to bring to your notice: after a rebuild of the bucket in question the search works fine, so I suspect there is some problem with tsidx consistency.
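
For context, by "rebuild" I mean a bucket rebuild along these lines (the bucket directory name is an example, not the actual one; check the docs for your version on whether splunkd should be stopped first):

/opt/splunk/bin/splunk rebuild /opt/splunk/datastore/networks_syslog/db/db_1461100000_1461000000_123 networks_syslog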


sjohnson_splunk
Splunk Employee

Please tell me that your hot/warm location is NOT on an NFS mount. From the installation guide:

If you use NFS, note the following:

- Do not use NFS to host hot or warm index buckets, as a failure in NFS can cause data loss. NFS works best with cold or frozen buckets.
- Do not use NFS to share cold or frozen index buckets amongst an indexer cluster, as this potentially creates a single point of failure.
- Splunk Enterprise does not support "soft" NFS mounts (mounts that cause a program attempting a file operation on the mount to report an error and continue in case of a failure). Only "hard" NFS mounts (mounts where the client continues to attempt to contact the server in case of a failure) are reliable with Splunk Enterprise.
- Do not disable attribute caching. If you have other applications that require disabling or reducing attribute caching, then you must provide Splunk Enterprise with a separate mount with attribute caching enabled.
- Do not use NFS mounts over a wide area network (WAN). Doing so causes performance issues and can lead to data loss.
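
If cold data does stay on NFS, the hard-mount guidance above would look roughly like this in /etc/fstab (server name, export path, and mount point are placeholders; attribute caching stays enabled simply by not passing noac):

# hard mount, default attribute caching, same-LAN server only
nfs-server:/export/splunk_cold  /mnt/splunk_cold  nfs  rw,hard,timeo=600,retrans=2,rsize=65536,wsize=65536  0 0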


omesh4sv
New Member

I will clarify some more.

The search gets stuck at exactly the same timestamp at which the hot bucket moved to warm. In that warm bucket I checked the Hosts.data file; the first line, which I believe is the summary line, shows an end timestamp equal to the timestamp of the bucket move.

The Splunk install directory has plenty of space, with only 17% used.
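
To show what I mean, listing the warm buckets' time ranges makes the overlap with the stall timestamp visible (a sketch using standard dbinspect fields):

| dbinspect index=networks_syslog
| search state=warm
| eval endTime=strftime(endEpoch, "%Y-%m-%d %H:%M:%S")
| table path state startEpoch endEpoch endTime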
