Grid search error: ValueError: Expected

Source: 4-6 Grid Search and More Hyperparameters in the k-Nearest Neighbors Algorithm

哈哈笑笑9632300

2021-02-06

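I wrote my own function that applies kNN to stock data, but it throws an error during the grid search and I can't figure out why. Could you please explain? The error message is as follows: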

joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\18211\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\18211\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\18211\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\18211\anaconda3\lib\site-packages\joblib\parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\18211\anaconda3\lib\site-packages\joblib\parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 560, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 607, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py", line 90, in __call__
    score = scorer(estimator, *args, **kwargs)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py", line 372, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\base.py", line 499, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py", line 175, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\neighbors\_base.py", line 616, in kneighbors
    raise ValueError(
ValueError: Expected n_neighbors <= n_samples,  but n_samples = 35, n_neighbors = 36
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\18211\Desktop\股票\knn\myknn.py", line 34, in my_knn_gp
    grid_search.fit(X_train_standard, y_train)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 736, in fit
    self._run_search(evaluate_candidates)
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 1188, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "C:\Users\18211\anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 708, in evaluate_candidates
    out = parallel(delayed(_fit_and_score)(clone(base_estimator),
    self.retrieve()
  File "C:\Users\18211\anaconda3\lib\site-packages\joblib\parallel.py", line 940, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\18211\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\18211\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\18211\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: Expected n_neighbors <= n_samples,  but n_samples = 35, n_neighbors = 36
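The check that fails at the bottom of this traceback lives in kneighbors(), which predict() called in the traceback above. A minimal sketch that reproduces it in isolation, assuming only NumPy and scikit-learn, with random data standing in for the stock data and 35 fitted samples to mirror "n_samples = 35". The helper neighbors_ok is hypothetical, written just for this illustration; the exact error wording differs between scikit-learn versions, so only the exception type is relied on.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Random stand-in data: 35 fitted samples, as in the traceback (not real stock data)
rng = np.random.default_rng(0)
X_fit = rng.normal(size=(35, 4))
y_fit = rng.integers(0, 2, size=35)
X_query = rng.normal(size=(9, 4))


def neighbors_ok(n_neighbors):
    """Hypothetical helper: True if kneighbors() accepts n_neighbors for the 35-sample fit."""
    # fit() itself does not compare n_neighbors with the sample count
    clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_fit, y_fit)
    try:
        clf.kneighbors(X_query)  # raises ValueError if n_neighbors > fitted samples
        return True
    except ValueError as exc:
        print("ValueError:", exc)
        return False


print(neighbors_ok(35))  # True: k equal to the number of fitted samples is allowed
print(neighbors_ok(36))  # False: one more neighbor than there are fitted samples
```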

My function:

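from cxgp.cxgp import cxgp, cxjg  # load the dataset (the asker's own module)
from sklearn.model_selection import train_test_split  # split into training and test data
from sklearn.neighbors import KNeighborsClassifier  # kNN classifier
from sklearn.model_selection import GridSearchCV  # search for the best hyperparameters
from sklearn.preprocessing import StandardScaler  # mean-variance standardization


def my_knn_gp(gpdm, k=11, p=6, cv=5):
    # Given a stock code, return the best hyperparameters and accuracy for that stock
    t = cxgp(gpdm)
    T = cxjg(t)
    X = t[:-1]  # features: every row except the last
    y = T[1:]   # labels: every row except the first, so X row i pairs with T row i + 1
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # Fit the scaler on the training set only, then transform both sets
    skl = StandardScaler()
    skl.fit(X_train)
    X_train_standard = skl.transform(X_train)
    X_test_standard = skl.transform(X_test)

    # During cross-validation each candidate is fitted on only about (cv - 1) / cv
    # of X_train, so n_neighbors must not exceed that fold size.
    # (Approximate bound: stratified folds can differ by a sample.)
    max_k = len(X_train) * (cv - 1) // cv
    k = min(k, max_k + 1)  # range(1, k) below stops at k - 1

    param_grid = [
        {
            'weights': ['uniform'],
            'n_neighbors': list(range(1, k))
        },
        {
            'weights': ['distance'],
            'n_neighbors': list(range(1, k)),
            'p': list(range(1, p))
        }
    ]
    knn_clf = KNeighborsClassifier()
    grid_search = GridSearchCV(knn_clf, param_grid, cv=cv, n_jobs=-1, verbose=2)
    grid_search.fit(X_train_standard, y_train)
    sj = {
        "超参数": grid_search.best_params_,  # best hyperparameters
        "准确度": grid_search.best_score_,   # best cross-validated accuracy
        '测试数据x': X_test_standard,        # standardized test features
        '测试数据y': y_test                  # test labels
    }
    return sj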

1 answer

liuyubobobo

2021-02-06

ValueError: Expected n_neighbors <= n_samples,  but n_samples = 35, n_neighbors = 36


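This error means that the k in kNN (n_neighbors) cannot be larger than the number of samples the model was fitted on. Your k (n_neighbors) is 36, but there are only 35 samples. Note that these 35 samples are one cross-validation training fold, not the whole training set: GridSearchCV fits each candidate on only part of the training data, so the largest k in param_grid has to fit within that fold.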
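The 35 in the message is the size of a cross-validation training split, which is smaller than the full training set. A rough sketch of the arithmetic, assuming KFold-style splitting in which the first n % cv test folds get one extra sample; GridSearchCV actually uses StratifiedKFold for classifiers, which can size folds slightly differently, so treat the result as approximate. The function names and the 44-row figure are hypothetical, chosen only for illustration.

```python
def kfold_train_sizes(n_samples, n_splits=5):
    """Training-set size of each split under KFold-style partitioning.

    Assumption: the first n_samples % n_splits test folds hold one extra sample
    (how scikit-learn's KFold sizes folds; StratifiedKFold may differ slightly).
    """
    base, extra = divmod(n_samples, n_splits)
    test_sizes = [base + 1 if i < extra else base for i in range(n_splits)]
    return [n_samples - size for size in test_sizes]


def max_safe_n_neighbors(n_samples, n_splits=5):
    """Largest n_neighbors that every training split can support."""
    return min(kfold_train_sizes(n_samples, n_splits))


# Hypothetical example: 44 rows in the training set, 5-fold cross-validation
print(kfold_train_sizes(44))     # [35, 35, 35, 35, 36]
print(max_safe_n_neighbors(44))  # 35
```

Since the asker's param_grid uses range(1, k), which stops at k - 1, a training set of that hypothetical size would allow k up to 36.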


Keep it up! :)

哈哈笑笑9632300
Thank you for the answer!
2021-02-06

Python3 Introduction to Machine Learning: Classic Algorithms and Applications
