Hi teacher, why does running grid search on the MNIST dataset raise a MemoryError?
Source: 9-8 OvR and OvO
scientist272
2018-09-12
Here is the code:
import numpy as np
from sklearn.datasets import fetch_mldata
# PCA is used below to reduce the data's dimensionality
minst = fetch_mldata('MNIST original')
X,y = minst['data'],minst['target']
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y)
from sklearn.decomposition import PCA
pca = PCA(0.9)
pca.fit(X_train)
X_train_reduction = pca.transform(X_train)
X_test_reduction = pca.transform(X_test)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
def PolynomialLogisticRegression(degree=1, C=0.1):
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('std_scaler', StandardScaler()),
        ('log_reg', LogisticRegression(C=C))
    ])
# The estimator to run grid search over
poly_log_reg = PolynomialLogisticRegression()
# Parameter values to search
C_PARM = [0.1,0.2,0.3,0.4,0.5]
param_grid = [
    {
        'poly__degree': list(range(1, 11)),
        'log_reg__C': C_PARM
    }
]
# Instantiate GridSearchCV and run the grid search
from sklearn.model_selection import GridSearchCV  # this import was missing from the snippet
grid_search = GridSearchCV(poly_log_reg, param_grid)
grid_search.fit(X_train_reduction, y_train)
It ran for about 17 minutes and then raised a MemoryError.
1 Answer
MNIST is 28*28 = 784-dimensional data. With polynomial features and poly__degree going up to 10, that is on the order of 784^10 = 87732524600823436081182539776 features. Even a single sample would have that many features. Assuming each feature takes only 8 bits, work out roughly how much memory that needs.
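For intuition: scikit-learn's PolynomialFeatures with degree d on n input columns actually produces C(n + d, d) output columns (including the bias term) — somewhat fewer than n^d, but still astronomically large here. A minimal sketch:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Sanity check on a tiny case: degree-2 features on 3 inputs give
# C(3 + 2, 2) = 10 columns (1 bias, 3 linear, 6 quadratic terms).
poly = PolynomialFeatures(degree=2)
poly.fit(np.zeros((1, 3)))
assert poly.n_output_features_ == comb(3 + 2, 2) == 10

# Degree-10 polynomial features on the 784 raw MNIST pixels:
n_cols = comb(784 + 10, 10)
print(f"{n_cols:.2e}")  # on the order of 10**22 columns
```

Even this tighter combinatorial count is far beyond anything that fits in memory, so the conclusion is the same.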
======
I did a quick estimate: something like 2x10^17 GB. Never mind RAM — even your disk is nowhere near enough :)
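The back-of-the-envelope arithmetic, using the 784^10 upper bound from the answer and a deliberately optimistic 1 byte (8 bits) per feature for a single sample, can be checked directly:

```python
# Upper-bound feature count for degree-10 polynomial features
# on 784 raw pixels, as estimated in the answer above.
n_features = 784 ** 10

bytes_needed = n_features * 1        # one sample, 1 byte per feature
gb_needed = bytes_needed / 2 ** 30   # convert bytes to GB

print(f"{gb_needed:.2e} GB")  # on the order of 10**19 GB
```

Whichever way you round, the required storage exceeds any machine by many orders of magnitude — hence the MemoryError long before the grid search finishes.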
2018-09-12