Getting Data In

Why am I seeing indexing lag?

rjyetter
Path Finder

I have 11 indexing servers, all with 16 CPUs, RAID 10 storage, and 1Gb full-duplex networking, with no swap usage, and they all sit at about 60-80% idle. Something is not right: I am seeing indexing lag of up to 5 hours on most of the servers. Are there any tuning parameters I need to check within Splunk to get better throughput on the indexer side?

Thanks,

Rick

hexx
Splunk Employee
Splunk Employee

Some suggestions:

1 - Install the Splunk on Splunk app on your search-head. Take a look at the "Indexing Performance" view. Are all queues blocked down to the indexer queue or is there blockage upstream of that?

2 - In the SoS app "Indexing Performance" view, do you see latency across the board in the "Measured indexing latency" table at the top of the page, or is it only affecting a subset of hosts/sourcetypes/sources/splunk_server?

3 - In the SoS app "Errors" view, do you see any reports of indexing throttling because some buckets contain too many tsidx files?

4 - We might want to check the size of your metadata files, particularly your Sources.data. Run the following command against your $SPLUNK_DB and report the output:

find "$SPLUNK_DB" -maxdepth 3 -name "*.data" -size +25M | xargs ls -lh

You can get $SPLUNK_DB from $SPLUNK_HOME/etc/splunk-launch.conf (see the shell sketch below). By default, $SPLUNK_DB is set to $SPLUNK_HOME/var/lib/splunk.

If this command finds any metadata files larger than 25MB, that could be one of the reasons for your indexing performance degradation.
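
If $SPLUNK_DB isn't exported in your shell, here is a minimal sketch (assuming a default splunk-launch.conf layout) that resolves it before running the same metadata-size check:

# Resolve SPLUNK_DB from splunk-launch.conf, falling back to the default location
SPLUNK_DB=$(grep '^SPLUNK_DB' "$SPLUNK_HOME/etc/splunk-launch.conf" | cut -d= -f2)
SPLUNK_DB=${SPLUNK_DB:-$SPLUNK_HOME/var/lib/splunk}

# Same metadata-size check as above
find "$SPLUNK_DB" -maxdepth 3 -name "*.data" -size +25M | xargs ls -lh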

rjyetter
Path Finder

Stupid stupid stupid!!

One of my sources is from across the pond in a different timezone - hence the massive apparent indexing lag.
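
The usual fix is to declare the source's timezone in props.conf on the indexer or heavy forwarder that parses it - a minimal sketch, with a placeholder sourcetype name and example timezone:

# props.conf on the parsing tier; "my_remote_sourcetype" is a placeholder
[my_remote_sourcetype]
TZ = Europe/London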

JuhiSaxena
Explorer

Hi rjyetter, how was the timezone issue resolved? I'm also facing a similar issue.

gjanders
SplunkTrust
SplunkTrust

@JuhiSaxena, you will be better served by creating a new post; this one is from 2012.

hexx
Splunk Employee
Splunk Employee

Aha! Good catch 🙂 Are you all set, then?

wwhitener
Communicator

Might also check to make doubly sure that the times are synced up. I had one that was set for a different time and it made everything wonky.
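
A quick way to eyeball the clock offsets, assuming ntpd is what keeps the hosts in sync:

# Query the local NTP daemon's peers and offsets on each server
ntpq -pn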

rjyetter
Path Finder

Time sync is fine - all servers are within a second of each other, and write I/O wait is less than 1 second on each server.

Simeon
Splunk Employee
Splunk Employee

Rick, this could be due to a few things.

First, check the index time to confirm whether Splunk is seeing and indexing the data later than expected. The search below will find the delay in seconds:

source=<your delayed source> | eval delay=_indextime-_time | fields delay
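
To see where the delay is concentrated, a variant of the same search (a sketch; host and sourcetype are the default fields) summarizes it per host and sourcetype:

source=<your delayed source> | eval delay=_indextime-_time | stats avg(delay) AS avg_delay_sec max(delay) AS max_delay_sec by host, sourcetype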

Typically, delays of hours will mean that the indexer is backed up OR the data is being read in at a slower pace than expected. To see if the indexer is backed up (we call it blocked), search as follows:

index=_internal source=*metrics.log blocked

If this returns events, this means the system is being blocked. If there are a lot of them, that is not a good sign and you should contact support. Support can determine if it's disk speed or something else by identifying the particular part of the queue system that is backed up.

Another thing to check is the maximum thruput for the indexers and forwarders. There is a maximum thruput setting within limits.conf that will be set to 256 KB per second on lightweight forwarders:

[thruput]
maxKBps = 256
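
If the forwarder-side cap turns out to be the bottleneck, a minimal sketch of the change (placed in a local limits.conf on the forwarder; 0 removes the limit, so raise it with care):

# $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder
[thruput]
# 0 = unlimited; any positive value is a KB/s cap
maxKBps = 0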

Simeon
Splunk Employee
Splunk Employee

The thruput limit applies to indexers and forwarders alike, meaning any Splunk instance.

The distribution of the blocked events and which queue they are from will tell us where to look next. If you are constantly adding new data sets and they are very large, then I suspect you need to tune some of the new inputs so they are parsed faster. It is also possible that your disks just can't keep up with your thruput (particularly if it is running well over 3 MB/s of indexing thruput per indexer).
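
To estimate per-indexer indexing thruput, here is a sketch against metrics.log (field names as they appear in the thruput group); values consistently above roughly 3,000 kbps per indexer are in the territory described above:

index=_internal source=*metrics.log group=thruput name=thruput | timechart span=10m avg(instantaneous_kbps) AS avg_kbps by host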

tskinnerivsec
Contributor

You could run a search like this:

index=_internal host=* source=*metrics.log group=queue blocked=true | rename host as Indexer | chart count(blocked) as "Queue Blocks" by Indexer, name

to create a chart of the count of blocks by indexer and queue.
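
A timechart variant of the same search shows when, not just where, the queues block:

index=_internal host=* source=*metrics.log group=queue blocked=true | timechart span=15m count by name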

rjyetter
Path Finder

So what I have now is 800 events for index=_internal source=*metrics.log blocked - is there a setting for index throughput? The problem is we're adding new sources daily, and we need to figure this out before the lag starts affecting more real-time searches. Calling support now!
