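If the KNN performance evaluation produces no output, the code may contain a logic error, or the output may not be captured at runtime. Let's check for and fix the likely problems.
First, make sure your code actually includes the parts that evaluate KNN performance and draw the plots. Also, if you call warnings.filterwarnings("ignore"), all warnings are suppressed, including ones that could help with debugging. I recommend filtering warnings only after you have confirmed that the code is correct.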
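If some warnings really do need to be silenced, one option is to suppress them only around the call that produces them rather than for the whole script. The sketch below uses the standard-library warnings.catch_warnings context manager; noisy_step is a hypothetical stand-in for whatever call emits the warning, not part of your code.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a call that emits a warning."""
    warnings.warn("something worth knowing", UserWarning)
    return 42


# Warnings are suppressed only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    quiet_result = noisy_step()

# Outside that block the filter is restored; here we record the
# warning to show that it is raised again.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loud_result = noisy_step()

print(f"warnings seen outside the quiet block: {len(caught)}")
```

This keeps the rest of the script's warnings visible while you are still debugging.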
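Below is the revised code. I added print statements so that the performance evaluation output is visible, and removed the warning filter so that warnings are shown: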
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Perceptron
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, precision_score, recall_score, f1_score
import numpy as np
import matplotlib.pyplot as plt
# Load the data
df = pd.read_csv('./Data/SMSSpamCollection.txt', sep='\t', names=['label', 'message'])
# Show the first five rows
print("First five rows:")
print(df.head())
# Encode the labels as integers
le = LabelEncoder()
df['label'] = le.fit_transform(df['label'])
# Extract bag-of-words features from the text
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['message'])
y = df['label']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# ...
# Evaluate the KNN model for each k with 5-fold cross-validation
k_values = list(range(1, 31))
knn_scores = {
    'accuracy': [],
    'precision': [],
    'recall': [],
    'f1': []
}
for k in k_values:
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    accuracy = cross_val_score(knn_clf, X_train, y_train, cv=5, scoring='accuracy').mean()
    precision = cross_val_score(knn_clf, X_train, y_train, cv=5, scoring=make_scorer(precision_score, average='macro', zero_division=0)).mean()
    recall = cross_val_score(knn_clf, X_train, y_train, cv=5, scoring=make_scorer(recall_score, average='macro', zero_division=0)).mean()
    f1 = cross_val_score(knn_clf, X_train, y_train, cv=5, scoring=make_scorer(f1_score, average='macro', zero_division=0)).mean()
    # Store the mean scores for this k
    knn_scores['accuracy'].append(accuracy)
    knn_scores['precision'].append(precision)
    knn_scores['recall'].append(recall)
    knn_scores['f1'].append(f1)
# Print the evaluation results
print("KNN performance evaluation results:")
for i, k in enumerate(k_values):
    print(f"k = {k}, Accuracy: {knn_scores['accuracy'][i]:.3f}, Precision: {knn_scores['precision'][i]:.3f}, "
          f"Recall: {knn_scores['recall'][i]:.3f}, F1: {knn_scores['f1'][i]:.3f}")
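# Determine the optimal k (using accuracy as the example criterion)
optimal_k = k_values[np.argmax(knn_scores['accuracy'])]
# Plot accuracy, precision, recall and F1 against k
plt.figure(figsize=(14, 10))
for metric, values in knn_scores.items():
    # Label each curve with the k that maximizes that particular metric
    best_k = k_values[np.argmax(values)]
    plt.plot(k_values, values, label=f'{metric} (best k={best_k})')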
plt.title('Performance Metrics of KNN vs. Number of Neighbors')
plt.xlabel('Number of Neighbors (k)')
plt.ylabel('Scores')
plt.legend()
plt.grid(True)
plt.show()
print(f"The optimal k value for KNN is {optimal_k} based on the highest accuracy.")
Note that I corrected the typo in the file path (./Date/ changed to ./Data/) and made sure the evaluation results are printed after the loop finishes.
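As a side note, the loop above calls cross_val_score four times per k, so the model is refit for every metric. A possible alternative, sketched here, is scikit-learn's cross_validate, which scores several metrics in one pass. The synthetic data from make_classification and the names X_demo, y_demo, k_values_demo and results are assumptions for illustration only. The built-in 'precision_macro' scorer also uses the default zero_division behavior rather than zero_division=0, so this is not a drop-in replacement.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption); replace with X_train, y_train.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

scoring = ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro']
k_values_demo = [1, 3, 5]
results = {name: [] for name in scoring}

for k in k_values_demo:
    # One cross-validation run per k computes all four metrics.
    cv_result = cross_validate(KNeighborsClassifier(n_neighbors=k),
                               X_demo, y_demo, cv=5, scoring=scoring)
    for name in scoring:
        results[name].append(cv_result[f'test_{name}'].mean())
    summary = ", ".join(f"{name}: {results[name][-1]:.3f}" for name in scoring)
    print(f"k = {k}, {summary}")
```

Each k is then fit five times instead of twenty, which matters when the range of k values is large.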
If the code still produces no output after running, check the following:
SMSSpamCollection.txt exists at the specified path and is correctly formatted.
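The file check can be done before loading the data. Below is a minimal sketch, assuming the file is UTF-8 text with one tab-separated label and message per line; check_tsv_file is a hypothetical helper written for this answer, not a pandas or scikit-learn function.

```python
from pathlib import Path


def check_tsv_file(path, expected_columns=2, max_lines=5):
    """Return a list of problems found; an empty list means the file looks fine."""
    p = Path(path)
    if not p.is_file():
        return [f"File not found: {p.resolve()}"]
    problems = []
    with p.open(encoding='utf-8') as fh:
        # Only the first few lines are inspected to keep the check cheap.
        for line_no, line in enumerate(fh, start=1):
            if line_no > max_lines:
                break
            fields = line.rstrip('\n').split('\t')
            if len(fields) != expected_columns:
                problems.append(
                    f"Line {line_no}: expected {expected_columns} "
                    f"tab-separated fields, found {len(fields)}"
                )
    return problems


issues = check_tsv_file('./Data/SMSSpamCollection.txt')
for issue in issues or ["No problems found in the first lines."]:
    print(issue)
```

If this reports a missing file, compare the printed absolute path with where the file actually is; a relative path such as ./Data/ is resolved against the directory the script is run from.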