What does random_state mean in the decision tree algorithm?

Source: 12-1 What Is a Decision Tree

烈焰卡卡

2018-03-19

In DecisionTreeClassifier(max_depth=2, criterion="entropy", random_state=42), what role does the random seed play in the algorithm?

1 answer

liuyubobobo

2018-03-19

Accepted

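In sklearn's decision tree implementation, the features are visited in a shuffled order when searching for the best split (with criterion="entropy", the split that reduces entropy the most). So when several features tie for the best split, the split that gets chosen can differ from run to run (this is very common for data with many 0/1 binary attributes), and as a result the tree tends to spread its splits across different features at each level. For details, see this passage in the sklearn documentation: http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html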

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
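To make the tie-breaking concrete, here is a minimal sketch. The toy data is made up for illustration: the two columns are identical copies, so splitting on either one gives exactly the same entropy reduction. The helper name root_feature is hypothetical, and which column a given seed picks may depend on the sklearn version.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for illustration): columns 0 and 1 are identical,
# so both offer exactly the same improvement of the criterion.
rng = np.random.RandomState(0)
col = rng.randint(0, 2, size=200)
X = np.column_stack([col, col])
y = col


def root_feature(seed):
    """Fit a depth-1 tree and return the index of the feature used at the root."""
    clf = DecisionTreeClassifier(max_depth=1, criterion="entropy", random_state=seed)
    clf.fit(X, y)
    return int(clf.tree_.feature[0])


# Same seed: the tie is broken the same way on every fit.
same_seed = {root_feature(42) for _ in range(5)}

# Different seeds: the tie may be broken differently (feature 0 or feature 1).
by_seed = {seed: root_feature(seed) for seed in range(10)}

print("root feature with seed 42:", same_seed)
print("root feature per seed:", by_seed)
```

The predictions are the same either way here, since the two columns carry the same information. What changes is only which feature the tree reports using, and fixing random_state is what makes that reproducible.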

Also, sklearn's DecisionTreeClassifier has a splitter parameter that can be set to "random", which introduces randomness as well.
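A rough sketch of the splitter="random" case, again on made-up data with a hypothetical helper root_threshold: here the randomness affects the split thresholds themselves, not just tie-breaking, so the root threshold will usually differ between seeds while staying fixed for a given seed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy continuous data (an assumption for illustration).
rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)


def root_threshold(seed):
    """Fit a small tree with random splits and return the root node's threshold."""
    clf = DecisionTreeClassifier(max_depth=2, splitter="random", random_state=seed)
    clf.fit(X, y)
    return float(clf.tree_.threshold[0])


# Fixing random_state makes the randomly drawn split reproducible.
t_a = root_threshold(42)
t_b = root_threshold(42)

# Without a fixed seed the threshold would typically change from fit to fit;
# here we vary the seed explicitly to see that.
thresholds = [root_threshold(seed) for seed in range(5)]

print("seed 42, two fits:", t_a, t_b)
print("seeds 0..4:", thresholds)
```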

烈焰卡卡

Thank you very much!

2018-03-19

1 reply in total
