Extremely low prediction accuracy

Source: 4-4 Classification Accuracy

MrZLeo

2020-01-21

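Hi, I followed the course and wrote my own score function, tested it on the handwritten digits data set, and compared it with the equivalent function packaged in sklearn. The two results are far apart: my prediction accuracy is extremely low.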

My code is as follows:

  1. This is the kNN class; the score function is defined in it
import numpy as np
from math import sqrt
from collections import Counter


class KNNClassifier:

    def __init__(self, k):
        """Initialize the kNN classifier"""
        assert k >= 1, "k must be valid"
        self.k = k
        self._X_train = None
        self._y_train = None

    def fit(self, X_train, y_train):
        """Train the kNN classifier on the training data X_train and y_train"""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        assert self.k <= X_train.shape[0], \
            "the size of X_train must be at least k."

        self._X_train = X_train
        self._y_train = y_train
        return self

    def predict(self, X_predict):
        """Given the data set X_predict to classify, return the vector of predicted labels"""
        assert self._X_train is not None and self._y_train is not None, \
            "must fit before predict"
        assert X_predict.shape[1] == self._X_train.shape[1], \
            "the feature number of X_predict must be equal to X_train"

        y_predict = [self._predict(x) for x in X_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Given a single sample x, return its predicted label"""
        assert x.shape[0] == self._X_train.shape[1], \
            "the feature number of x must be equal to X_train"

        distance = [sqrt(np.sum(x_train - x) ** 2)
                    for x_train in self._X_train]
        nearest = np.argsort(distance)

        topK_y = [self._y_train[i] for i in nearest[:self.k]]
        votes = Counter(topK_y)

        return votes.most_common(1)[0][0]

    def score(self, X_test, y_test):
        """Compute the model's accuracy from the predicted labels and the true labels"""
        test = self.predict(X_test)
        num = np.sum(test == y_test) / len(y_test)

        return num

    def __repr__(self):
        return "KNN(k = %d)" % self.k

  2. This is the function that splits the data into train and test sets
import numpy as np


def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Split X and y into X_train, X_test, y_train, y_test according to test_ratio"""
    assert X.shape[0] == y.shape[0], \
        "the size of X must be equal to the size of y"
    assert 0.0 <= test_ratio <= 1.0, \
        "test_ratio must be valid"

    if seed is not None:
        np.random.seed(seed)

    # Get a random permutation of the indexes 0 .. len(X) - 1
    shuffled_indexes = np.random.permutation(len(X))

    # Pick the test and train indexes from the shuffled indexes (applied below with fancy indexing)
    test_size = int(len(X) * test_ratio)
    test_indexes = shuffled_indexes[:test_size]
    train_indexes = shuffled_indexes[test_size:]

    # Split the original arrays into the two groups
    X_test = X[test_indexes]
    X_train = X[train_indexes]

    y_test = y[test_indexes]
    y_train = y[train_indexes]

    return X_train, X_test, y_train, y_test

I can't figure out what went wrong…


2 Answers

liuyubobobo

2020-01-22

I ran a test with the course code, also with k set to 4, and did not see this problem. So the issue is most likely in your code.


I have added my test code to the course's official code repository: https://git.imooc.com/coding-169/coding-169/src/master/04-kNN/Optional-03-kNN-for-digits


Please run this code in your environment and check whether you see the same problem.


If you don't, compare it carefully against your own code and debug to find where your code goes wrong.
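One possible way to do such a comparison, offered only as a sketch: check each small building block of the classifier against a trusted reference. The helper name `euclidean`, the random test vectors, and the choice of 64 features (picked to resemble flattened 8x8 digit images) are assumptions made for illustration and are not part of the course code. The reference used here is `np.linalg.norm`.

```python
import numpy as np
from math import sqrt


def euclidean(u, v):
    """Hand-written Euclidean distance: the piece being checked (hypothetical helper)."""
    return sqrt(np.sum((u - v) ** 2))


# Two random vectors, used only as illustrative test data
rng = np.random.default_rng(0)
u = rng.random(64)
v = rng.random(64)

# Reference value computed by NumPy's built-in vector norm
reference = np.linalg.norm(u - v)

# If the hand-written distance is correct, the two values agree
assert np.isclose(euclidean(u, v), reference)
```

If an assertion like this fails for your own distance expression, the problem is in the distance computation rather than in the voting or scoring steps.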


Good luck! :)

MrZLeo
Thank you very much!
2020-01-22
1 reply

月明否

2020-04-26

This line is wrong: distance = [sqrt(np.sum(x_train - x) ** 2) for x_train in self._X_train]

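It should be: distance = [sqrt(np.sum((x_train - x) ** 2)) for x_train in self._X_train]. Each difference has to be squared before summing; the original sums the differences first and only then squares the total.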
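To see why the position of the parentheses matters so much, here is a minimal sketch comparing the two expressions on a pair of made-up points (the arrays `a` and `b` are assumptions chosen for illustration, not data from the thread). When the signed differences cancel, the original expression reports a distance of zero between two clearly different points.

```python
import numpy as np
from math import sqrt

# Two hypothetical points, used only for illustration
a = np.array([0.0, 0.0])
b = np.array([3.0, -3.0])

# Original form: sums the signed differences first, then squares the total.
# The differences here are -3 and +3, so they cancel and the result is 0.
buggy = sqrt(np.sum(a - b) ** 2)

# Corrected form: squares each difference first, then sums the squares.
fixed = sqrt(np.sum((a - b) ** 2))

print(buggy)  # 0.0
print(fixed)  # sqrt(18), about 4.243
```

With the original expression, many unrelated training samples can look like near neighbours, which is consistent with the very low accuracy reported in the question.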

慕尼黑7051737
You could work as a tester. Your knack for spotting bugs is impressive.
2021-09-23
1 reply
