
Query on Splunk DLTK scalability in training and inference

indranilr
Observer

The Splunk DLTK 5.1.0 documentation states the following:

No indexer distribution: Data is processed on the search head and sent to the container environment. Data cannot be processed in a distributed manner, such as streaming data in parallel from indexers to one or many containers. However, all advantages of search in a distributed Splunk platform deployment still exist.


Does the above imply that data from Splunk is not distributed (for example, via data parallelism) among multiple containers in the Kubernetes execution environment during the training or inference phase?
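
For context, my mental model is that the container-side code receives the staged search results as one pandas DataFrame, along the lines of the barebones DLTK notebook template. A rough sketch of my understanding (not verbatim DLTK code; the exact signatures may differ):

    import pandas as pd

    # Sketch of the DLTK notebook entry points as I understand them: the
    # search head stages the search results, and the container receives
    # them as a single pandas DataFrame -- no per-indexer partitioning.
    def init(df: pd.DataFrame, param: dict):
        model = {}
        return model

    def fit(model, df: pd.DataFrame, param: dict):
        # the whole result set is trained on inside this one container
        print(f"fit called with {len(df)} rows in a single container")
        return {"message": "model trained"}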

Further, is the distribution only vertical in nature (multi-CPU or multi-GPU within a single container), or can jobs scale horizontally as well (multiple containers), with each container working on a partition of the data?
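
By vertical scaling I mean something like using all GPUs visible inside one container, for example with TensorFlow's MirroredStrategy (a minimal generic sketch, not DLTK-specific code):

    import tensorflow as tf

    # MirroredStrategy replicates the model across all GPUs that this
    # single container can see (falls back to CPU if none are visible).
    strategy = tf.distribute.MirroredStrategy()
    print("Replicas in this container:", strategy.num_replicas_in_sync)

    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")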

Further, to execute TensorFlow, PyTorch, Spark, or Dask jobs, do we need the required operators/services (the Spark K8s operator, for example) pre-installed before submitting jobs from the Splunk Jupyter notebook? Or are these services set up during DLTK app installation and configuration in Splunk?
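
By pre-installed services I mean, for example, a Dask scheduler/worker deployment that the notebook would merely attach to, along these lines (the scheduler address is a placeholder, not something I expect DLTK to create):

    from dask.distributed import Client
    import dask.array as da

    # Assumes a scheduler service is already reachable at this address;
    # "dask-scheduler:8786" is a hypothetical in-cluster service name.
    client = Client("tcp://dask-scheduler:8786")

    # Each worker container computes a chunk of the array in parallel,
    # which is the horizontal scaling behaviour I am asking about.
    x = da.random.random((100_000, 100), chunks=(10_000, 100))
    print(x.mean().compute())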

Appreciate any inputs on the above queries.

Thanks in advance!
