Clustering (K-means) feature
Datameer's Clustering feature allows you to include columns of "string" type. I am new to clustering and data science, but from what I have read and seen about the K-means algorithm, the data points need to be numeric (since averages are calculated, etc.). My question is:
1) What values does Datameer assign to a column that is of a string type? I am doing the Bank Analytics example from the app market and the CampaignSuccess column has 4 values (success, failure, unknown, other). What values are assigned to these?
thanks & regards,
Hello Rahul and thanks for your question!
We use indicator values to allow K-means to represent STRING columns. The indicator values convert the STRINGs into INTEGERs so that they can be used for clustering. The number of indicator values is a setting that can be changed in the Advanced tab of the Clustering Wizard dialog box.
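To make the idea concrete, here is a minimal sketch of indicator (one-hot) encoding in plain Python. It illustrates the general technique only and is not Datameer's actual implementation: the function name `indicator_encode`, the `max_indicators` parameter, and the choice to keep the most frequent values and encode everything else as all zeros are assumptions made for this example.

```python
from collections import Counter

def indicator_encode(values, max_indicators=100):
    """One-hot encode strings, keeping at most max_indicators distinct values.

    Hypothetical helper for illustration; not a Datameer API.
    """
    # Assumption: when there are more distinct values than indicators,
    # keep the most frequent ones.
    counts = Counter(values)
    kept = [category for category, _ in counts.most_common(max_indicators)]
    position = {category: i for i, category in enumerate(kept)}

    rows = []
    for value in values:
        row = [0] * len(kept)
        # Assumption: values that did not get an indicator stay all zeros.
        if value in position:
            row[position[value]] = 1
        rows.append(row)
    return kept, rows

# The four CampaignSuccess values from the question, with one repeat.
campaign_success = ["success", "failure", "unknown", "other", "failure"]
categories, encoded = indicator_encode(campaign_success)
for value, row in zip(campaign_success, encoded):
    print(value, row)
```

Each string becomes a row of 0/1 integers with a single 1 in the position of its category, so K-means can compute averages and distances on it like any other numeric column.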
You always want to make sure that you are using the right number of indicator values to best represent your data. To do this, examine the cardinality (the number of distinct values) of the columns you wish to use. This can be easily accomplished using the Flipside in Datameer.
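Outside of Datameer, the same check can be sketched in a few lines of Python. The helper names `cardinality` and `suggested_indicator_count` are hypothetical, and `max_allowed` defaults to 100 only because that is the maximum number of indicator values Datameer allows.

```python
def cardinality(values):
    """Number of distinct values in a column."""
    return len(set(values))

def suggested_indicator_count(values, max_allowed=100):
    """Match the indicator count to the column's cardinality, up to the cap."""
    return min(cardinality(values), max_allowed)

# CampaignSuccess from the question has four distinct values.
campaign_success = ["success", "failure", "unknown", "other", "failure"]
print(suggested_indicator_count(campaign_success))
```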
If you find that you have a cardinality below 100 (the maximum allowable number of indicator values), you can adjust the setting to match the cardinality of your included columns.
I hope that clears up your question!