simRecommend Data Type Specifications

Data Type Specifications tell the model what form the data in each column takes, so that it knows how to compare values properly. Unlike in other models, the class column in simRecommend is set to “CLASS_ITEM_SET”. This item set must also be duplicated in the data set, with the duplicate column's type set to “ITEM_SET”. The item set is what tells the model each individual’s historical preferences. The format of the item set is covered in more detail in later sections.

ID

A mandatory field which uniquely identifies each object.

CLASS_ITEM_SET

A mandatory field which specifies the field to be classified. It uses the item set format: a series of values with weights, formatted as item1:weight1;item2:weight2;item3:weight3.
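To make the item set format concrete, here is a minimal Python sketch that reads and writes a cell of the form item1:weight1;item2:weight2. The function names `parse_item_set` and `format_item_set` are hypothetical and not part of simRecommend. The sketch assumes that weights are numeric and that a trailing semicolon can be ignored; neither is confirmed by this article.

```python
def parse_item_set(cell: str) -> dict[str, float]:
    """Parse 'item1:weight1;item2:weight2' into {item: weight}.

    Assumption: weights are numeric. Hypothetical helper for illustration.
    """
    items: dict[str, float] = {}
    for pair in cell.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # assumption: tolerate an empty cell or trailing ';'
        # Split on the last ':' so the weight is always the final part.
        name, sep, weight = pair.rpartition(":")
        if not sep or not name:
            raise ValueError(f"expected 'item:weight', got {pair!r}")
        items[name.strip()] = float(weight)
    return items


def format_item_set(items: dict[str, float]) -> str:
    """Serialize {item: weight} back into the item set format."""
    return ";".join(f"{name}:{weight:g}" for name, weight in items.items())


if __name__ == "__main__":
    parsed = parse_item_set("shoes:3;hats:1.5")
    print(parsed)
    print(format_item_set(parsed))
```

Parsing a cell before loading the data is one way to catch malformed pairs early, for example a pair with a missing weight.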

ITEM_SET

A duplicate of the column of type CLASS_ITEM_SET.
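As an illustration of the duplication requirement, the sketch below copies one column of a data set into a new column so that one copy can be typed CLASS_ITEM_SET and the other ITEM_SET. It assumes the data is comma-separated text with a header row, which this article does not state. The function `duplicate_column` and the column names `purchases` and `purchases_copy` are hypothetical.

```python
import csv
import io


def duplicate_column(csv_text: str, source: str, copy_name: str) -> str:
    """Return csv_text with column `source` duplicated as `copy_name`.

    Assumption: the data set is CSV with a header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if source not in fieldnames:
        raise KeyError(f"column {source!r} not found")
    if copy_name in fieldnames:
        raise ValueError(f"column {copy_name!r} already exists")
    fieldnames.append(copy_name)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[copy_name] = row[source]  # exact copy of the item set cell
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "id,purchases\n1,shoes:3;hats:1\n"
    print(duplicate_column(sample, "purchases", "purchases_copy"))
```

In this sketch, `purchases` would then be assigned CLASS_ITEM_SET and `purchases_copy` would be assigned ITEM_SET.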

REAL

Numerical values.

NOMINAL

Values that do not bear a quantitative relationship with each other (i.e., strings and numbers which represent non-numerical information).

MULTI_PLAIN

Multiple NOMINAL values separated by spaces. Non-language specific.

MULTI_ENGLISH

Multiple NOMINAL values separated by spaces. The text is English language.

MULTI_SPANISH

Multiple NOMINAL values separated by spaces. The text is Spanish language.

MULTI_JAPANESE

Multiple NOMINAL values separated by spaces. The text is Japanese language.

IGNORE

The column shall be ignored by the program.

NULL_INDICATOR

This column type identifies the presence or absence of non-numerical data, assigning one weight to cells that contain data and a different weight to cells that do not.
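The type names above can be checked before a data set is submitted. The following Python sketch validates a mapping from column name to type name. The mapping itself, the function `check_column_types`, and the column names are hypothetical; how simRecommend actually receives the specification is not described here. The rule of exactly one ID column and exactly one CLASS_ITEM_SET column is an assumption based on both being described as mandatory.

```python
from collections import Counter

# Type names taken from the list in this article.
KNOWN_TYPES = {
    "ID", "CLASS_ITEM_SET", "ITEM_SET", "REAL", "NOMINAL",
    "MULTI_PLAIN", "MULTI_ENGLISH", "MULTI_SPANISH", "MULTI_JAPANESE",
    "IGNORE", "NULL_INDICATOR",
}


def check_column_types(column_types: dict[str, str]) -> list[str]:
    """Return a list of problems found; an empty list means none found."""
    problems = []
    for column, type_name in column_types.items():
        if type_name not in KNOWN_TYPES:
            problems.append(f"{column}: unknown type {type_name!r}")

    counts = Counter(column_types.values())
    # Assumption: "mandatory" means exactly one column of each type.
    for required in ("ID", "CLASS_ITEM_SET"):
        if counts[required] != 1:
            problems.append(
                f"expected exactly one {required} column, "
                f"found {counts[required]}"
            )
    # The CLASS_ITEM_SET column must be duplicated as an ITEM_SET column.
    if counts["ITEM_SET"] < 1:
        problems.append("expected an ITEM_SET column duplicating CLASS_ITEM_SET")
    return problems


if __name__ == "__main__":
    spec = {
        "customer_id": "ID",
        "purchases": "CLASS_ITEM_SET",
        "purchases_copy": "ITEM_SET",
        "age": "REAL",
        "region": "NOMINAL",
        "notes": "IGNORE",
    }
    print(check_column_types(spec))
```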
