Monitoring Splunk

Best Practices to Measure Performance Improvement After a Splunk Migration

Esky73
Builder

Hi,

We are moving a 3-tier clustered Splunk environment from on-prem to a cloud instance, where we have been told we will get much better performance all round.
My question is: how do we measure this? What KPIs should we measure before and after the migration, and what would be the best way to do it?
My initial thoughts are disk I/O, search response time, and memory and CPU usage.

Any recommendations gratefully received.

1 Solution

woodcock
Esteemed Legend

Run 4 searches on each system and use the Job Inspector (Job -> Inspect Job) to examine how long each step took and the overall response time. Run these (concrete examples follow the list):

1: A long search, like for Last 2 years, that uses something complicated like |timechart span=1mon avg(_time) AS junk.
2: A short search, like for Last 24 hours, that uses something complicated like |timechart span=1h avg(_time) AS junk.
3: A long search, like for Last 2 years, that uses something easy and reducible like dedup host.
4: A short search, like for Last 24 hours, that uses something easy and reducible like dedup host.
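
For example, spelled out in full against a fixed index so the comparison is apples-to-apples (a sketch only; _internal is used here because it exists in both environments, and you should substitute an index and time ranges that match your own data):

index=_internal earliest=-2y latest=now | timechart span=1mon avg(_time) AS junk
index=_internal earliest=-24h latest=now | timechart span=1h avg(_time) AS junk
index=_internal earliest=-2y latest=now | dedup host
index=_internal earliest=-24h latest=now | dedup host

Run each on both environments and compare the per-command timings and total runtime reported by the Job Inspector.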

Also, you should use the DMC to see what your "worst" search is and run it in both places. You obviously have some idea of what "isn't working," so just run that in both places and compare the Job Inspector output.
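
If the DMC dashboards don't surface it, one rough way to find your longest-running searches is to query the audit log directly. A minimal sketch, assuming the default _audit index is searchable and the usual audit field names (these can vary by version):

index=_audit action=search info=completed total_run_time=*
| stats max(total_run_time) AS run_time_sec BY user search
| sort - run_time_sec
| head 10

Take the worst offenders from this list, run the same search on both environments, and compare.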

Esky73
Builder

Hi, I cannot find "worst search" within the DMC - any pointers?

The DMC only appears on the indexers, and the Long-running searches panel returns "No results found."

There is no DMC on the SH cluster.

Thanks.


hmclaren_splunk
Splunk Employee

I agree - use some of the dashboards and searches built into the DMC (Distributed Management Console) to give you some information on searches, indexing pipelines, etc.
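
If you want to see the raw data behind those DMC panels, most of them are driven by the _introspection and _internal indexes, so you can baseline the same numbers with ad-hoc searches. A minimal sketch, assuming the _introspection index is populated and the usual resource-usage field names (which can differ by Splunk version):

index=_introspection sourcetype=splunk_resource_usage component=Hostwide
| timechart span=5m avg(data.cpu_user_pct) AS cpu_user_pct avg(data.mem_used) AS mem_used

Capture a window of this on-prem before the migration and the same window in the cloud afterwards.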


esix_splunk
Splunk Employee

I don't see disk I/O, memory, or CPU usage as good KPIs, mainly because in a cloud environment these should be watched by the SaaS provider. On premises, yes, these are good metrics, but it's hard to compare them to SaaS (different types of storage and compute tiers).

Better metrics to watch would be (example baseline searches follow below):

1) Search performance: get a baseline of your on-prem searches versus how they run in your cloud environment
2) Index vs. ingest times (latency)
3) Queues: backed-up indexing queues can indicate potential I/O bottlenecks, while the typing and parsing queues point to related Splunk bottlenecks
4) Skipped / deferred searches

Those are a few of the major indicators to look out for and compare between instances. Hope that helps.
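
A minimal sketch of ad-hoc searches that can baseline items 2-4 before and after the move, assuming the default _internal index and the usual metrics/scheduler field names (these can differ between versions; your_index is a placeholder):

Indexing latency (index time minus event time):
index=your_index earliest=-1h | eval latency_sec=_indextime - _time | stats avg(latency_sec) AS avg_latency perc95(latency_sec) AS p95_latency

Queue fill percentage:
index=_internal source=*metrics.log group=queue
| eval fill_pct=round(current_size_kb / max_size_kb * 100, 2)
| timechart span=5m avg(fill_pct) BY name

Skipped searches:
index=_internal sourcetype=scheduler status=skipped
| stats count BY savedsearch_name, reason

Capture the same windows on-prem and in the cloud and compare the numbers side by side.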
