Purity and Confidence

Supervised clustering groups rows in the data set by class. However, it performs the clustering using the relative weighted factors, “the Whys.” As a result, a cluster may contain rows that share similar predictive factors and weights but belong to different classes. The Purity value is the fraction of data points in a cluster that belong to that cluster's majority class. For example, if a cluster contains 100 rows, of which 90 are of one class (the majority class for that cluster) and 10 are from other classes, then the Purity value is 0.9. The Purity value is greater than 0 and less than or equal to 1. In general, a Purity value of 1 is most desirable, but for some applications it may also be useful to know the relative weighted factors in a cluster together with the fact that it is not pure (Purity < 1.0).
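
The Purity calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the product's implementation; the function name cluster_purity and the class labels "A", "B", and "C" are hypothetical.

```python
from collections import Counter


def cluster_purity(labels):
    """Return the fraction of rows that belong to the cluster's majority class.

    `labels` holds the class label of every row in one cluster
    (a hypothetical input format chosen for this sketch).
    """
    if not labels:
        raise ValueError("a cluster must contain at least one row")
    # most_common(1) returns [(majority_label, count)] for the largest class.
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


# The example from the text: 100 rows, 90 of them in the majority class.
labels = ["A"] * 90 + ["B"] * 7 + ["C"] * 3
purity = cluster_purity(labels)
print(purity)
```

Because every non-empty cluster has a majority class with at least one row, the result is always greater than 0, and it equals 1 only when every row shares the same class.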

In addition to Purity, supervised clustering displays a Predictive Probability for each cluster. This is the mean predictive probability of the rows in that cluster. Typically, the higher a cluster's Predictive Probability, the more strongly that cluster is related to the class prediction.
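
As a rough sketch of that averaging step, the snippet below takes the mean of per-row predictive probabilities for each cluster and ranks the clusters by the result. It assumes each row already carries a single predictive probability from the model; the cluster names, the probability values, and the function name mean_predictive_probability are made up for illustration.

```python
from statistics import mean


def mean_predictive_probability(row_probabilities):
    """Average the predictive probabilities of the rows in one cluster."""
    if not row_probabilities:
        raise ValueError("a cluster must contain at least one row")
    return mean(row_probabilities)


# Hypothetical per-row predictive probabilities, grouped by cluster.
cluster_probs = {
    "cluster_1": [0.95, 0.90, 0.85],
    "cluster_2": [0.60, 0.55, 0.50],
}

# One Predictive Probability per cluster.
summary = {
    name: mean_predictive_probability(probs)
    for name, probs in cluster_probs.items()
}

# Clusters ordered from highest to lowest Predictive Probability.
ranked = sorted(summary, key=summary.get, reverse=True)
print(ranked)
```

Under these made-up values, the first cluster would be read as the one more strongly related to the class prediction.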
