Deployment Architecture

Hitting limit: maximum number of concurrent auto-summarization searches - with 0 events and idle CPU

MichalG1
Explorer

Hello Team,

Pre-staging environment (not production): a single server with 12 CPUs + 24 GB of memory + RAID 0 NVMe (2.5 GB/s write, 5 GB/s read). All-in-one deployment (SH + indexer). CPU cores with HT on a dedicated server (6 cores with HT = 12 CPUs, not shared with any other VM).

Splunk 9.1.1 and ES 7.1.1, fresh install. No data ingested (0 events in most of the indexes, including main, notable, risk, etc.), so basically there is no data to be processed yet.

Default ES configuration; I have not yet tuned any correlation searches, etc. All defaults. And there are already performance problems:

1. The MC Scheduler Activity: Instance dashboard shows 22% of scheduled searches skipped.

2. ESX reports minimal CPU usage (the same for memory).

3. The MC shows more details: many different accelerated DM searches are being skipped, all the time.

Questions:

1. Obviously the first recommendation would be to disable many of the correlation searches / accelerated DMs, but that is not what I would like to do, because the aim is to test complete ES functionality (by generating a small number of different types of events). Why do I have these problems in the first place?

I can see that all the tasks are very short; most finish in 1 second, and just a few take several seconds. That is expected, since I have 0 events everywhere and I always expect to have only a small number of events on this test deployment. What should I do to tune it and make sure there are no problems with skipped jobs?

Should I increase

max_searches_per_cpu
base_max_searches

in limits.conf? Any other ideas? Overall this seems weird.
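
In case it is useful, this is roughly how I am counting the skips outside the MC - a sketch against the scheduler logs in _internal (I am assuming the default scheduler sourcetype and its status/savedsearch_name fields):

index=_internal sourcetype=scheduler earliest=-24h
| stats count AS total, count(eval(status=="skipped")) AS skipped BY app, savedsearch_name
| eval skip_ratio_pct = round(skipped / total * 100, 1)
| sort - skip_ratio_pct

The accelerated DM summary searches should show up there with names starting with _ACCELERATE_DM_, which matches what the MC panels report.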

(Three screenshots attached, showing the MC Scheduler Activity panel, ESX resource usage, and the skipped accelerated DM searches.)


meetmshah
Contributor

Hello @MichalG1, ES requires 16 CPUs and 32 GB of memory (https://docs.splunk.com/Documentation/ES/7.2.0/Install/DeploymentPlanning). However, if the ask is to update max_searches_per_cpu and base_max_searches on a pre-prod environment (and not prod), you can go ahead and try that.
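
For reference, those two settings live under the [search] stanza in limits.conf, and the concurrent historical search ceiling works out to roughly base_max_searches + (max_searches_per_cpu x number_of_cpus). A minimal sketch of raising them (the values below are illustrative, not sized recommendations):

# $SPLUNK_HOME/etc/system/local/limits.conf
[search]
# default is 6; raises the fixed floor of concurrent searches
base_max_searches = 10
# default is 1; allows more concurrent searches per CPU core
max_searches_per_cpu = 2

A splunkd restart is typically needed for limits.conf changes to take effect. On a 12-CPU box, the defaults give a ceiling of 6 + 1x12 = 18, and the sketch above would move it to 10 + 2x12 = 34.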

 

I would also suggest disabling the Data Model Accelerations, as well as reviewing the correlation searches which are enabled by default, because the issue seems to be the scheduler getting a lot of searches to execute at any given time (and not a resources issue). You can also review the alert actions and cron schedules through this search (and stagger the cron schedules if needed):

| rest splunk_server=local count=0 /servicesNS/-/SplunkEnterpriseSecuritySuite/saved/searches
| where match('action.correlationsearch.enabled', "1|[Tt]|[Tt][Rr][Uu][Ee]")
| where disabled=0
| eval actions=split(actions, ",")
| rename title AS "Correlation Search", cron_schedule AS "Cron Schedule", "dispatch.earliest_time" AS "Earliest Time", "dispatch.latest_time" AS "Latest Time", actions AS "Actions"
| table "Correlation Search" "Cron Schedule" "Earliest Time" "Latest Time" "Actions"

 

Please accept the solution and hit Karma, if this helps!
