Hi Everyone,
I have data like this in my lookup table (fitcommandexample.csv):
fields
A   | B   | C
10  | 2   | red
4   | 6   | red
9   | 1   | red
110 | 102 | blue
104 | 106 | blue
109 | 101 | blue
If I use the fit command with a by clause:
| inputlookup fitcommandexample.csv | fit KMeans k=2 "A" "B" by C
Results
A B C cluster cluster_distance
10 2 red 1 6.44444444444
4 6 red 1 22.4444444444
9 1 red 1 5.77777777778
110 102 blue 0 6.44444444444
104 106 blue 0 22.4444444444
109 101 blue 0 5.77777777778
But if I filter down to a single value of C first:
| inputlookup fitcommandexample.csv | where C like "blue"| fit KMeans k=2 "A" "B"
Result
A B C cluster cluster_distance
110 102 blue 0 0.5
104 106 blue 1 0.0
109 101 blue 0 0.5
Likewise
| inputlookup fitcommandexample.csv | where C like "red"| fit KMeans k=2 "A" "B"
yields
A B C cluster cluster_distance
10 2 red 1 0.5
4 6 red 0 0.0
9 1 red 1 0.5
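As a sanity check on how I am reading these numbers, here is a small sketch in plain Python (outside Splunk). It assumes that cluster_distance is the squared Euclidean distance from a point to the centroid of its cluster; that is my guess, not something I have confirmed in the docs. The helper names are made up for illustration.

```python
# Assumption: cluster_distance = squared Euclidean distance to the cluster centroid.

def centroid(points):
    """Mean of a list of (A, B) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def squared_distances(points):
    """Squared distance of each point to the centroid of the whole list."""
    cx, cy = centroid(points)
    return [(x - cx) ** 2 + (y - cy) ** 2 for x, y in points]

red = [(10, 2), (4, 6), (9, 1)]
blue = [(110, 102), (104, 106), (109, 101)]

# "by C" run: treat every colour as one single cluster.
by_c_red = squared_distances(red)
by_c_blue = squared_distances(blue)

# Filtered run on red: (10, 2) and (9, 1) share a cluster, (4, 6) is alone.
red_pair = squared_distances([(10, 2), (9, 1)])
red_single = squared_distances([(4, 6)])

print([round(d, 4) for d in by_c_red])
print([round(d, 4) for d in by_c_blue])
print(red_pair, red_single)
```

Under that assumption, the distances from the by C run match each colour being treated as one whole cluster, while the filtered runs match two clusters inside a single colour.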
What I was hoping for was that the by clause would make fit run on each subset (red and blue) in isolation, so that
| inputlookup fitcommandexample.csv | fit KMeans k=2 "A" "B" by C
would yield
A B C cluster cluster_distance
10 2 red 1 0.5
4 6 red 0 0.0
9 1 red 1 0.5
110 102 blue 0 0.5
104 106 blue 1 0.0
109 101 blue 0 0.5
Instead, with the by clause, blue and red were essentially treated as two separate clusters. Otherwise, I am not sure how to quickly break up the data and apply fit to each subset without writing an external script via the API. Any ideas?
Thanks
Tim