simClassify can process any file that meets the Platform File Specifications as long as it has an “ID” column, which assigns a unique value to each row, and a “Class” column, which holds the value the engine will attempt to predict.
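The two column requirements can be checked before a file is uploaded. The following is a minimal sketch, not part of simClassify: it assumes the file is a CSV with a header row, and the function name `check_training_columns` is hypothetical. It does not check anything else in the Platform File Specifications.

```python
import csv
import io


def check_training_columns(csv_text):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: only verifies that "ID" and "Class" columns exist
    and that the "ID" values are unique. Assumes CSV input with a header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    problems = []

    # Both columns are required by the description above.
    for required in ("ID", "Class"):
        if required not in fieldnames:
            problems.append(f"missing required column: {required}")

    # Each row must carry a unique ID value.
    if "ID" in fieldnames:
        seen = set()
        for row in reader:
            if row["ID"] in seen:
                problems.append(f"duplicate ID: {row['ID']}")
            seen.add(row["ID"])

    return problems


good = "ID,Class,age\n1,F,30\n2,N,41\n"
bad = "ID,age\n1,30\n1,31\n"
print(check_training_columns(good))
print(check_training_columns(bad))
```

The sample data and the "age" column are made up for illustration only.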
simClassify accepts queries in the form of an object with identical columns to the training data and returns the predicted result for the (previously unspecified) Class column, along with the weighted factors behind that prediction and the nearest neighbors to the queried object.
simClassify also accepts Batch Queries. A Batch Query will take a file of objects and return the predicted class for each object in the file. The output of a Batch Query will be a CSV file with the object ID, its predicted class(es), and the confidence of that prediction.
simClassify Sample Output
The first value is the predicted classification.
The second value is the probability that the specified class is correct amongst the returned classes. If there are multiple likely classes simClassify will show each and present their respective probabilities. As can be seen in the example above, the total confidence may not sum to 1. (This is because the confidence is based on the query’s distance from its nearest neighbors.)
The query also shows the predicted class based upon the threshold. In the above case the confidence level of the “F” class was above the threshold of 0.8 and was labeled the “winner”.
If the parameters used to create the model came from a selection out of grid results, the threshold will correspond to the value that gives you the test accuracy shown in the grid results table. As a result, some confidence levels that appear high may not be high enough to meet the recall and accuracy measures you used to select the model parameters from the grid results. For example, if the model that gave you the class “F” recall you desired had a threshold of 0.92, then the winning class in the above query would be “N” and not “F”, because the confidence level of 0.90799, although high, was not high enough to pass the threshold of 0.92.
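The effect of the threshold on the winning class can be sketched in a few lines. This is an illustration of the example described here, not simClassify's actual implementation: it assumes a two-class case in which the threshold applies to class “F” and “N” is chosen otherwise, and it assumes a confidence equal to the threshold counts as passing. The function name `pick_winner` is hypothetical.

```python
def pick_winner(confidences, target_class, fallback_class, threshold):
    """Return target_class if its confidence meets the threshold,
    otherwise return fallback_class.

    Assumption: ">=" is used for "passing" the threshold; the source
    does not say how an exact tie is handled.
    """
    if confidences.get(target_class, 0.0) >= threshold:
        return target_class
    return fallback_class


# Confidence for class "F" taken from the example in the text.
confidences = {"F": 0.90799}

print(pick_winner(confidences, "F", "N", threshold=0.8))   # F
print(pick_winner(confidences, "F", "N", threshold=0.92))  # N
```

With the same confidence of 0.90799, raising the threshold from 0.8 to 0.92 changes the winner from “F” to “N”, which matches the case described above.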
simClassify’s results are supported by a weighted list of the most important values used in reaching the predicted outcome. This information can be viewed in the “Justification” tab.
The [+] indicates that a weighted factor is important for considering the query object to be a similar member of the predicted class. A [ - ] indicates a factor that is not part of the query object and whose absence therefore makes the object more similar to the predicted class. For simClassify, all weighted factors will be [+], but simClassify+ uses both indications.
In the Hypothesis tab you can see the strength of the factors for each possible classification. This information is presented as both a Radar and Bar graph.
In the Neighbors tab you can see the objects that were identified as the nearest neighbors to the query and how their underlying factors compare to each other.