
Query on Splunk DLTK scalability in training and inference

indranilr
Observer

The Splunk DLTK 5.1.0 documentation states the following:

No indexer distribution: Data is processed on the search head and sent to the container environment. Data cannot be processed in a distributed manner, such as streaming data in parallel from indexers to one or many containers. However, all advantages of search in a distributed Splunk platform deployment still exist.


Does the above imply that data from Splunk is not distributed (e.g., via data parallelism) among multiple containers in the Kubernetes execution environment during the training or inference phase?
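
To make the question concrete, here is my current understanding in code terms: a minimal sketch, assuming the standard DLTK notebook interface of fit(model, df, param) with param["feature_variables"] naming the feature columns. The point is that the container-side code appears to receive the entire result set as a single DataFrame shipped from the search head:

    import pandas as pd

    def fit(model, df: pd.DataFrame, param: dict):
        # df already holds ALL rows sent over from the search head in one
        # piece; there are no per-indexer partitions left to parallelize over.
        X = df[param["feature_variables"]].values
        model["mean"] = X.mean(axis=0)  # placeholder training step
        return model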

Further, is the distribution only vertical in nature (multiple CPUs or GPUs within a single container), or can the jobs also scale horizontally (multiple containers), with each container working on a partition of the data? The vertical case is sketched below.
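
For reference, the vertical case seems straightforward; a hedged sketch of what I mean inside one container, assuming the container image ships PyTorch with CUDA support:

    import torch
    import torch.nn as nn

    model = nn.Linear(128, 10)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    if torch.cuda.device_count() > 1:
        # Vertical scaling: replicate the model across every GPU visible
        # to this single container and split each batch among them.
        model = nn.DataParallel(model)
    model = model.to(device)

What I cannot tell from the documentation is whether an equivalent horizontal pattern (for example, torch.distributed across several containers) is supported out of the box.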

Further, to execute TensorFlow, PyTorch, Spark, or Dask jobs, do we need the required operators/services pre-installed (the Spark Kubernetes operator, for example) before submitting jobs from the Splunk Jupyter notebook? Or are these services set up during DLTK app installation and configuration in Splunk?
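
For instance, in the Dask case, scaling out from the notebook would seem to presuppose a scheduler service that is already reachable in the cluster. A minimal sketch of what I mean; the service address tcp://dask-scheduler:8786 is purely an assumption on my part, not something I know DLTK to provision:

    from dask.distributed import Client

    # Connect to a (presumed) pre-provisioned Dask scheduler service; this
    # raises quickly if no such service exists in the cluster.
    client = Client("tcp://dask-scheduler:8786", timeout=10)
    print(client.scheduler_info()["workers"])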

Appreciate any inputs on the above query.

Thanks in advance!
