In addition to batch queries for classification models, you can batch query a cluster model. Where simClassify/simClassify+ produce predictions, simCluster/simCluster+ produce cluster membership results. The query does not update the cluster centroids; the results indicate which cluster each query object would belong to.
The conceptual diagram below shows three clusters produced by running simCluster or simCluster+. Call them red, green, and blue, matching their colors in the diagram. Suppose a query file contains three objects, 1, 2, and 3, positioned relative to the clustered objects as shown in the diagram.
In this example, the results of the batch query would be: 1: red, 2: blue, and 3: green. The original clusters would not be modified by the batch query.
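The idea can be illustrated with a minimal Python sketch. It assumes that membership is decided by the nearest centroid under Euclidean distance, which is an assumption made for illustration; this article does not state the similarity measure that simCluster/simCluster+ actually use. The function name `assign_to_clusters`, the centroid coordinates, and the query coordinates are all hypothetical.

```python
import math

def assign_to_clusters(centroids, queries):
    """Label each query object with its nearest centroid.

    Assumption: nearest-centroid assignment with Euclidean distance.
    The centroids are only read, never updated.
    """
    results = {}
    for query_id, point in queries.items():
        # Pick the cluster label whose centroid is closest to this point.
        results[query_id] = min(
            centroids, key=lambda label: math.dist(centroids[label], point)
        )
    return results

# Hypothetical centroids from an earlier clustering run.
centroids = {"red": (1.0, 1.0), "green": (5.0, 1.0), "blue": (3.0, 5.0)}

# Hypothetical query file contents: object id -> coordinates.
queries = {1: (1.2, 0.8), 2: (3.1, 4.6), 3: (4.7, 1.3)}

print(assign_to_clusters(centroids, queries))
# {1: 'red', 2: 'blue', 3: 'green'}
```

The sketch returns a new mapping and leaves `centroids` untouched, mirroring the behavior described above: the query labels objects without changing the clusters.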
There are several cases where this is useful. For example, if you have a very large data set to cluster, you can build the clusters from a sample and then use Cluster Batch Query to label the remaining objects with the appropriate cluster. As another example, you can cluster one set of data, say the results for a specific month, and then run Cluster Batch Query on the next month's data to determine which objects (customers, prospects, etc.) changed cluster membership.
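For the month-over-month case, comparing the two sets of membership results might look like the following sketch. It assumes the results for each month have already been loaded into dictionaries keyed by object id; the function name `changed_membership` and the customer ids are hypothetical, and the actual output format of Cluster Batch Query is not described here.

```python
def changed_membership(previous, current):
    """Return {object_id: (old_cluster, new_cluster)} for objects that
    appear in both months and whose cluster label differs."""
    return {
        obj: (previous[obj], current[obj])
        for obj in previous.keys() & current.keys()
        if previous[obj] != current[obj]
    }

# Hypothetical membership results: object id -> cluster label.
march = {"cust-001": "red", "cust-002": "green", "cust-003": "blue"}
april = {"cust-001": "red", "cust-002": "blue", "cust-004": "green"}

print(changed_membership(march, april))
# {'cust-002': ('green', 'blue')}
```

Objects that appear in only one month (here `cust-003` and `cust-004`) are skipped, since they have no earlier or later label to compare against.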