Comments and answers for "Splunk Machine Learning App / Toolkit - Using DBSCAN Clustering Algorithm"
https://answers.splunk.com/answers/557827/splunk-machine-learning-app-toolkit-using-dbscan-c.html
The latest comments and answers for the question "Splunk Machine Learning App / Toolkit - Using DBSCAN Clustering Algorithm"
Answer by nryabykh
https://answers.splunk.com/answering/591046/view.html
You need to modify the file $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos/DBSCAN.py. In the ```__init__``` function, replace the line
out_params = convert_params(options.get('params', {}), floats=['eps'])
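with this one (`min_samples` is an integer count in scikit-learn, so it goes in the `ints` list rather than `floats`):
out_params = convert_params(options.get('params', {}), floats=['eps'], ints=['min_samples'])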
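To illustrate why the parameter has to be listed at all, here is a rough sketch. It assumes the helper rejects parameter names that are not listed and converts the listed ones from the strings that arrive from SPL. `coerce_params` is a hypothetical stand-in written for this example, not the toolkit's `convert_params`, whose actual behavior may differ. It also shows that a name listed as a float comes out as 2.0, while one listed as an integer comes out as 2; scikit-learn documents `min_samples` as an integer.

```python
def coerce_params(params, floats=(), ints=()):
    """Hypothetical stand-in for a convert_params-style helper (not MLTK code)."""
    allowed = set(floats) | set(ints)
    out = {}
    for name, raw in params.items():
        if name not in allowed:
            # Unlisted names are treated as errors in this sketch.
            raise ValueError("Unexpected parameter: %s" % name)
        out[name] = int(raw) if name in ints else float(raw)
    return out


# SPL passes option values as strings.
spl_params = {"eps": "0.1", "min_samples": "2"}

# Only eps listed: min_samples is rejected.
try:
    coerce_params(spl_params, floats=["eps"])
    rejected = False
except ValueError:
    rejected = True

# min_samples listed as a float: accepted, but becomes 2.0.
as_float = coerce_params(spl_params, floats=["eps", "min_samples"])

# min_samples listed as an integer: accepted and becomes 2.
as_int = coerce_params(spl_params, floats=["eps"], ints=["min_samples"])
```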
After this you can write something like ```fit DBSCAN eps=0.1 min_samples=2 *``` in your SPL queries.
Wed, 15 Nov 2017 10:56:28 GMT, nryabykh
Comment by niketnilay
https://answers.splunk.com/comments/557880/view.html
@hbrandt84, I concur. scikit-learn also documents two parameters, `min_samples` and `eps` (http://scikit-learn.org/stable/modules/clustering.html#dbscan).
However, the algorithm description and the class details indicate that both parameters are optional:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
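For context, a minimal pure-Python sketch of what the two parameters control: a point counts as a core point when at least `min_samples` points (itself included) lie within distance `eps` of it. The `core_points` function and the sample data are made up for illustration; this is not scikit-learn's implementation and it skips the cluster-expansion step entirely.

```python
import math


def core_points(points, eps=0.5, min_samples=5):
    """Return indices of 2-D points that have at least min_samples
    neighbors (the point itself included) within distance eps."""
    core = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for q in points
            if math.hypot(p[0] - q[0], p[1] - q[1]) <= eps
        )
        if neighbors >= min_samples:
            core.append(i)
    return core


# Three points close together and one far away (illustrative data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

# With eps=0.5 and min_samples=5, no point has enough neighbors.
default_core = core_points(points)

# With a lower min_samples, the three nearby points become core points.
relaxed_core = core_points(points, eps=0.2, min_samples=2)
```

On a small dataset like this one, a `min_samples` of 5 leaves every point without enough neighbors, which is why being able to set the parameter from SPL matters.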
Based on the following code for the DBSCAN algorithm, I would expect the default at initialization to be `min_samples=5` (https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/cluster/dbscan_.py#L156):
def dbscan(X, eps=0.5, min_samples=5, metric='minkowski',
           algorithm='auto', leaf_size=30, p=2, sample_weight=None, n_jobs=1):
And:
def __init__(self, eps=0.5, min_samples=5, metric='euclidean',
             algorithm='auto', leaf_size=30, p=None, n_jobs=1):
    self.eps = eps
    self.min_samples = min_samples
    self.metric = metric
    self.algorithm = algorithm
    self.leaf_size = leaf_size
    self.p = p
    self.n_jobs = n_jobs
However, this needs to be confirmed, and the Machine Learning Toolkit could possibly be enhanced to expose a `min_samples` input parameter for DBSCAN.
Tue, 25 Jul 2017 18:31:52 GMT, niketnilay