Decision tree algorithm
What algorithm does Datameer use for the 'decision tree' feature under Smart Analytics?
What method does it employ to prevent overfitting?
Does Datameer provide an option to specify 'stopping rules' for decision tree classification?
-
Hello enaven!
The Smart Analytics Decision Tree module uses a Classification and Regression Trees (CART) algorithm. Datameer offers pruning and validates against a fraction of the sample data when building the Decision Tree, but doesn't offer a way to provide stopping criteria.
The point of the Smart Analytics modules is to offer pretty powerful algorithms to business analysts, out of the box. If you need more advanced functionality, you could write what you need in R or PMML and use the appropriate plugin to integrate it into your Datameer workflow.
I hope this helps!
- Jason -
Hi Enaven,
The decision tree algorithm prevents overfitting by pruning the tree. Pruning is done on a validation set, whose size is specified by the user in the advanced tab. When pruning is switched off, the only way to reduce overfitting is to limit the maximum tree depth, though this is a less accurate way of avoiding overfitting.
I assume that by "stopping rules" you mean when to stop pruning? The rule is: prune as long as it improves the accuracy of the tree on the validation set. Pruning stops when accuracy on the validation set gets worse.
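To make the rule concrete, here is a minimal Python sketch of validation-set pruning (often called reduced-error pruning). It is only an illustration of the idea, not Datameer's actual implementation: the `Node` class, the helper names, the toy tree and data, and the choice to keep pruning when accuracy is unchanged are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical tree node. `label` is the majority class at this node;
    # a node with feature=None is a leaf.
    label: int
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict(node, row):
    # Walk down the tree until a leaf is reached.
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


def accuracy(tree, rows, labels):
    hits = sum(predict(tree, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def internal_nodes(node):
    # Post-order list of all non-leaf nodes (children before parents).
    if node.feature is None:
        return []
    return internal_nodes(node.left) + internal_nodes(node.right) + [node]


def prune(tree, rows, labels):
    """Collapse subtrees into leaves while validation accuracy does not drop."""
    best = accuracy(tree, rows, labels)
    while True:
        choice = None
        for node in internal_nodes(tree):
            # Temporarily turn this node into a leaf and measure the effect.
            saved = (node.feature, node.left, node.right)
            node.feature, node.left, node.right = None, None, None
            acc = accuracy(tree, rows, labels)
            node.feature, node.left, node.right = saved
            if acc >= best and (choice is None or acc > choice[0]):
                choice = (acc, node)
        if choice is None:
            # Every remaining collapse would make accuracy worse: stop.
            return tree
        best, node = choice
        node.feature, node.left, node.right = None, None, None


# Toy tree: the right-hand subtree has an extra split that fits noise.
tree = Node(label=1, feature=0, threshold=5.0,
            left=Node(label=0),
            right=Node(label=1, feature=1, threshold=2.0,
                       left=Node(label=1), right=Node(label=0)))

# Held-out validation rows (feature 0, feature 1) and their true classes.
val_rows = [(3, 1), (4, 3), (7, 1), (8, 3), (9, 4)]
val_labels = [0, 0, 1, 1, 1]

before = accuracy(tree, val_rows, val_labels)
prune(tree, val_rows, val_labels)
after = accuracy(tree, val_rows, val_labels)
print(before, after)
```

In this toy case the noisy split on the right is collapsed because that raises validation accuracy, while the root split is kept because removing it would lower accuracy.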
-
Hello Enaven,
You will find some publicly available information about the R plugin under How to Install the R Plug-in.
After installing R on the cluster and the R plugin in Datameer, you have new functions available. One example is:
=RMapWrapper("<RCode>";#Column1;#Column2;...)
Please take note that R-Integration and PMML Support are not part of the Enterprise Edition. It is recommended to speak with your sales person or Customer Success Manager (CSM) about obtaining the R or PMML plug-in ZIP file for your Datameer version.