The Splunk DLTK 5.1.0 documentation states the following:
"No indexer distribution: Data is processed on the search head and sent to the container environment. Data cannot be processed in a distributed manner, such as streaming data in parallel from indexers to one or many containers. However, all advantages of search in a distributed Splunk platform deployment still exist."
Does this imply that data from Splunk is not distributed (for example, through data parallelism) among multiple containers in the Kubernetes execution environment during the training or inference phase?
Further, is the distribution only vertical (multiple CPUs or GPUs within a single container), or can jobs also scale horizontally (multiple containers), with each container working on its own partition of the data?
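To be clear about what I mean by horizontal data parallelism, here is a minimal, hypothetical Python sketch. The dataset is split into partitions and each worker processes only its own partition, with the partial results combined at the end. The worker threads merely stand in for containers; the function names (`partition`, `score_partition`, `run_horizontal`) are my own illustrations and not part of DLTK, Splunk or Kubernetes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    # Round-robin split of the rows into n partitions (one per hypothetical container).
    return [rows[i::n] for i in range(n)]


def score_partition(part):
    # Placeholder for per-container work such as inference on one data shard.
    return sum(x * 2 for x in part)


def run_horizontal(rows, workers):
    # Each worker receives a different partition; partial results are then summed.
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(score_partition, parts))


if __name__ == "__main__":
    data = list(range(10))
    # Horizontal (3 workers) and single-worker runs produce the same combined result.
    print(run_horizontal(data, 3), score_partition(data))
```

By contrast, purely vertical scaling would correspond to a single `score_partition(data)` call running on a larger machine (more CPUs or GPUs) inside one container.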
Also, to run TensorFlow, PyTorch, Spark or Dask jobs, do the required operators/services (the Spark K8s operator, for example) need to be installed before submitting jobs from the Splunk Jupyter notebook? Or are these services set up during DLTK app installation and configuration in Splunk?
I would appreciate any input on the above questions.
Thanks in advance!