Extremely low prediction accuracy
Source: 4-4 Classification accuracy

MrZLeo
2020-01-21
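Teacher, following the course material I wrote my own score function, tested it on the handwritten digits dataset, and compared it with the equivalent function in sklearn. The results differ greatly: my prediction accuracy is extremely low.
Here is the code I wrote:
- This is the kNN class; the score function is defined in it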
import numpy as np
from math import sqrt
from collections import Counter


class KNNClassifier:

    def __init__(self, k):
        """Initialize the kNN classifier."""
        assert k >= 1, "k must be valid"
        self.k = k
        self._X_train = None
        self._y_train = None

    def fit(self, X_train, y_train):
        """Fit the kNN classifier on the training set X_train and y_train."""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        assert self.k <= X_train.shape[0], \
            "the size of X_train must be at least k."
        self._X_train = X_train
        self._y_train = y_train
        return self

    def predict(self, X_predict):
        """Given the data set X_predict, return the vector of predicted labels."""
        assert self._X_train is not None and self._y_train is not None, \
            "must fit before predict"
        assert X_predict.shape[1] == self._X_train.shape[1], \
            "the feature number of X_predict must be equal to X_train"
        y_predict = [self._predict(x) for x in X_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Given a single sample x, return its predicted label."""
        assert x.shape[0] == self._X_train.shape[1], \
            "the feature number of x must be equal to X_train"
        distance = [sqrt(np.sum(x_train - x) ** 2)
                    for x_train in self._X_train]
        nearest = np.argsort(distance)
        topK_y = [self._y_train[i] for i in nearest[:self.k]]
        votes = Counter(topK_y)
        return votes.most_common(1)[0][0]

    def score(self, X_test, y_test):
        """Compute the prediction accuracy from the predicted and true labels."""
        test = self.predict(X_test)
        num = np.sum(test == y_test) / len(y_test)
        return num

    def __repr__(self):
        return "KNN(k = %d)" % self.k
- This is the function that splits the data into train and test sets
import numpy as np


def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Split X and y into X_train, X_test, y_train, y_test according to test_ratio."""
    assert X.shape[0] == y.shape[0], \
        "the size of X must be equal to the size of y"
    assert 0.0 <= test_ratio <= 1.0, \
        "test_ratio must be valid"
    if seed:
        np.random.seed(seed)
    # Get a random permutation of the indexes 0 .. len(X) - 1
    shuffled_indexes = np.random.permutation(len(X))
    # Pick the test and train indexes at random, to be used with fancy indexing
    test_size = int(len(X) * test_ratio)
    test_indexes = shuffled_indexes[:test_size]
    train_indexes = shuffled_indexes[test_size:]
    # Split the original arrays
    X_test = X[test_indexes]
    X_train = X[train_indexes]
    y_test = y[test_indexes]
    y_train = y[train_indexes]
    return X_test, X_train, y_test, y_train
I can't tell what went wrong…
2 answers
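I tested with the course code, also using k = 4, and did not see this problem. So the issue is most likely in your own code.
I added my test code to the official course repository: https://git.imooc.com/coding-169/coding-169/src/master/04-kNN/Optional-03-kNN-for-digits
Please run that code in your environment and check whether you get the same problem.
If it runs fine, compare it carefully against your own code and debug to find where yours goes wrong.
Keep it up! :)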
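One thing that can be checked mechanically while comparing: the `train_test_split` posted in the question returns `X_test, X_train, y_test, y_train`, while its docstring lists `X_train, X_test, y_train, y_test`. The call site is not shown in the post, so the following is only a sketch under the assumption that the result is unpacked in the docstring's order. The toy arrays and the seed value are made up for illustration.

```python
import numpy as np


def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Same logic as the function posted in the question (asserts omitted)."""
    if seed:
        np.random.seed(seed)
    shuffled_indexes = np.random.permutation(len(X))
    test_size = int(len(X) * test_ratio)
    test_indexes = shuffled_indexes[:test_size]
    train_indexes = shuffled_indexes[test_size:]
    # Return order as posted: test set first, then training set
    return X[test_indexes], X[train_indexes], y[test_indexes], y[train_indexes]


# Toy data: 10 samples with 2 features each (illustrative only)
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Assumed call site: unpacking in the order the docstring describes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_ratio=0.2, seed=666)

# With this unpacking the names are swapped: "X_train" holds the 20% slice
first_shape, second_shape = X_train.shape, X_test.shape
print("X_train:", first_shape, "X_test:", second_shape)
```

If the shapes printed in your own environment show the smaller slice under the training name, the model is being fitted on the test portion and scored on the training portion.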
2020-01-22
月明否
2020-04-26
This line is wrong: distance = [sqrt(np.sum(x_train - x) ** 2) for x_train in self._X_train]
It should be: distance = [sqrt(np.sum((x_train - x) ** 2)) for x_train in self._X_train]
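To see why the parenthesis placement matters: the original expression sums the differences first and squares the total, so positive and negative differences cancel and the result is just the absolute value of the summed difference. The corrected expression squares each difference before summing, which gives the Euclidean distance. A minimal sketch with made-up vectors (the helper names `buggy_distance` and `euclidean_distance` are only for illustration):

```python
import numpy as np
from math import sqrt


def buggy_distance(a, b):
    # Sums the differences first, then squares: equals abs(sum(a - b))
    return sqrt(np.sum(a - b) ** 2)


def euclidean_distance(a, b):
    # Squares each difference first, then sums: the Euclidean distance
    return sqrt(np.sum((a - b) ** 2))


x = np.array([0.0, 0.0])
near = np.array([1.0, 1.0])   # genuinely close to x
far = np.array([3.0, -3.0])   # farther away, but its differences cancel out

buggy_near = buggy_distance(near, x)
buggy_far = buggy_distance(far, x)
true_near = euclidean_distance(near, x)
true_far = euclidean_distance(far, x)

print("buggy:    ", buggy_near, buggy_far)
print("euclidean:", true_near, true_far)
```

With the original expression the farther point is ranked as the nearest neighbour, so the vote in `_predict` is taken over the wrong samples.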
2021-09-23