Answers for "Similarity "score" of similar values of fields"
https://answers.splunk.com/answers/519385/similarity-score-of-similar-values-of-fields.html
Answer by DalJeanis
https://answers.splunk.com/answering/520882/view.html
Answering this question in general (as opposed to answering it for a specific application) requires roughly two semesters of graduate statistics.
Basically, you have to define and measure Similarity, which also requires that you define and measure Difference, or Variability. All of that requires some kind of scoring methodology, which **usually** would be determined in conjunction with understanding what the underlying measures are.
As a first, awfully simplistic way of looking at the question, you could take the measures that the two items have in common, calculate the stdev of each measure across the entire population of items, and then calculate how many stdevs apart the two items are on each measure. You could initially do that in terms of z-score or percentile or whatever... finding the "right" choice will take successive approximation, until the answers come out sensibly for reference items you KNOW to be similar and items you KNOW to be different. The only requirement is that all the measures are scored the same way, relative to their baselines (which is why you use z-scores or stdevs or percentiles rather than the gross score difference).
Any measures that the two do NOT have in common must be treated as differences, and assigned some arbitrarily high distance/zscore/percentile.
Their gross geometric difference score then becomes the square root of the sum of the squares of their per-measure differences (that is, the Euclidean distance across all the measures)... which may yield some information or may not.
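For illustration only, here is a rough Python sketch of the simplistic approach described above. It is not Splunk SPL and not a vetted method. The function name `zscore_distance`, the `MISSING_PENALTY` value of 3 stdevs, and the sample numbers are all assumptions made up for the example; the penalty in particular is exactly the kind of arbitrary choice you would have to tune against items you KNOW to be similar or different.

```python
import math
from statistics import pstdev

# Assumed, arbitrary penalty (in stdevs) for a measure that only one item has.
MISSING_PENALTY = 3.0


def zscore_distance(a, b, population, missing_penalty=MISSING_PENALTY):
    """Euclidean distance between two items, measured in stdevs per measure.

    a, b: dicts mapping measure name -> numeric value.
    population: list of such dicts, used to compute each measure's stdev.
    """
    squared = 0.0
    for measure in set(a) | set(b):
        if measure not in a or measure not in b:
            # Not shared by both items: count it as a large, fixed difference.
            squared += missing_penalty ** 2
            continue
        values = [item[measure] for item in population if measure in item]
        sd = pstdev(values) if len(values) > 1 else 0.0
        if sd == 0:
            # No variability in the population, so the measure tells us nothing.
            continue
        # Difference of the two z-scores; the population mean cancels out.
        squared += ((a[measure] - b[measure]) / sd) ** 2
    return math.sqrt(squared)


# Made-up sample values, purely to show the call shape.
small_car = {"weight": 1.0, "mpg": 30.0}
large_car = {"weight": 3.0, "mpg": 10.0}
population = [small_car, large_car]

print(zscore_distance(small_car, large_car, population))
```

Dividing by the stdev is what puts every measure on the same scale; swapping in percentiles instead would mean replacing that one line with a rank-based difference.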
----------
One of the basic problems with the strategy behind your source data is that you've ALREADY extracted various statistical information which identifies relationships between things, but then you've deleted the metaknowledge that relates those statistics to each other. Assuming items were different models of car, you have twelve numbers that represent, in no particular order and in no particular standard of measurement, the car's wheel base, miles per gallon, horsepower, weight of car, number of passengers, recommended mileage for first maintenance, sticker price, overall length, turning radius, number of cylinders, customer satisfaction rating, number of such cars produced and sold per year, and so on.
A proper treatment analyzing differences between car models would have to be cognizant of which variables were expected to move together. Smaller cars get better gas mileage; therefore, as weight drops, MPG goes up and length and wheel base drop. A car which violates this rule is likely to be an outlier of some sort, and "different" from those that track the rule. However, following the rule does not make two cars at different points on the curve "similar" to each other; they are just exemplars of their portion of the weight-performance curve.
Wed, 12 Apr 2017 16:08:42 GMT, DalJeanis