Query on Splunk DLTK scalability in training and i...

indranilr · ‎10-11-2023

The splunk DLTK 5.1.0 documentation suggests below :

No indexer distribution

Data is processed on the search head and sent to the container environment. Data cannot be processed in a distributed manner, such as streaming data in parallel from indexers to one or many containers. However, all advantages of search in a distributed Splunk platform deployment still exist.

Does the above imply that data from splunk are not distributed (such as data parallelism) among multiple containers in the Kubernetes execution environment during training or inference phase ?

Further, is the distribution only vertical in nature (multi CPU or multi GPU in a single container) or the jobs can scale horizontally as well (multiple containers) with each container working on a partition of data ?

Further, for executing Tensorflow, PyTorch, Spark or Dask jobs do we need to have required operators/services pre-installed prior to (Spark K8s operator for example) submitting the jobs from Splunk Jupyter notebook ? Or are these services setup during DLTK app installation and configuration in Splunk ?

Appreciate any inputs on above query.

Thanks in advance !

Query on Splunk DLTK scalability in training and inferencing

other

Join Us for Splunk University and Get Your Bootcamp Game On!

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

Announcing Scheduled Export GA for Dashboard Studio