WEKA linear regression: error rate too high

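I am trying to run linear regression on a dataset of books and predict the rating from all the other attributes. Here is how I formatted the data in Excel before exporting it to CSV and loading it into WEKA: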

Book  Author  Genre  Publisher  Year  Rating
1     1       5      1          2008  5
1     1       5      1          2008  5
1     1       5      1          2008  5
1     1       5      1          2008  5
1     1       5      1          2008  5
(every remaining row of this excerpt is the same: 1 1 5 1 2008 5)

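I made a list of 25 books, with 2431 instances in total. In WEKA, I converted the first four attributes with the NumericToNominal filter and then ran Linear Regression. Here are my results: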

Scheme:       weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
Relation:     Books WEKA-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-4
Instances:    2430
Attributes:   6
              Book
              Author
              Genre
              Publisher
              Year
              Rating
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

Linear Regression Model

Rating =

      0.2267 * Book=18,15,25,13,8,24,20,17,16,19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
      0.4458 * Book=8,24,20,17,16,19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
     -0.1527 * Book=24,20,17,16,19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
     -0.314  * Book=20,17,16,19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
      0.6751 * Book=19,11,4,6,21,3,7,23,12,1,9,10,14,2 +
      0.475  * Book=4,6,21,3,7,23,12,1,9,10,14,2 +
     -0.4018 * Book=3,7,23,12,1,9,10,14,2 +
      0.2522 * Book=7,23,12,1,9,10,14,2 +
     -0.4505 * Book=23,12,1,9,10,14,2 +
     -0.2583 * Book=12,1,9,10,14,2 +
      0.4949 * Book=10,14,2 +
     -0.3875 * Author=1,6,2,4,11,12,9,3,13,10,15 +
     -0.7318 * Author=6,2,4,11,12,9,3,13,10,15 +
      0.594  * Author=2,4,11,12,9,3,13,10,15 +
      0.379  * Author=4,11,12,9,3,13,10,15 +
      0.6818 * Author=11,12,9,3,13,10,15 +
      0.4396 * Author=12,9,3,13,10,15 +
      1.0057 * Author=9,3,13,10,15 +
     -1.4347 * Author=3,13,10,15 +
     -0.4547 * Author=13,10,15 +
      0.3638 * Author=10,15 +
     -0.4921 * Author=15 +
      0.2706 * Genre=7,5,2,1,6,4,8 +
     -0.4036 * Genre=5,2,1,6,4,8 +
     -0.7927 * Genre=2,1,6,4,8 +
     -0.4448 * Genre=1,6,4,8 +
      0.5731 * Genre=6,4,8 +
      0.5519 * Genre=8 +
      0.4517 * Publisher=21,9,8,2,20,10,3,22,5,11,1,18 +
     -0.4474 * Publisher=2,20,10,3,22,5,11,1,18 +
     -0.3018 * Publisher=10,3,22,5,11,1,18 +
      0.474  * Publisher=5,11,1,18 +
      0.6567 * Publisher=1,18 +
     -0.492  * Publisher=18 +
      3.5816

Time taken to build model: 0.28 seconds

=== Cross-validation ===
=== Summary ===

Correlation coefficient                  0.2415
Mean absolute error                      0.7883
Root mean squared error                  0.9772
Relative absolute error                 98.4114 %
Root relative squared error             97.0741 %
Total Number of Instances             2430
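For reading the summary figures: WEKA's two relative errors compare the model against a baseline that always predicts the mean rating, so values close to 100 % mean the model is barely better than that baseline. A minimal Python sketch of the two ratios follows. The ratings, predictions and function names are made up for illustration and are not taken from the dataset above. The sketch also simplifies the baseline to the mean of the actual values, whereas WEKA derives it from the training data.

```python
import math

def relative_absolute_error(actual, predicted):
    """Total absolute error of the model as a percentage of the
    total absolute error of a predict-the-mean baseline."""
    mean = sum(actual) / len(actual)
    model = sum(abs(a - p) for a, p in zip(actual, predicted))
    baseline = sum(abs(a - mean) for a in actual)
    return 100.0 * model / baseline

def root_relative_squared_error(actual, predicted):
    """Square root of (model squared error / baseline squared error), in percent."""
    mean = sum(actual) / len(actual)
    model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    baseline = sum((a - mean) ** 2 for a in actual)
    return 100.0 * math.sqrt(model / baseline)

# Hypothetical ratings; a "model" that always predicts the mean (3) scores exactly 100 %.
actual = [5, 4, 3, 2, 1]
mean_only = [3, 3, 3, 3, 3]
print(relative_absolute_error(actual, mean_only))      # 100.0
print(root_relative_squared_error(actual, mean_only))  # 100.0
```

Read this way, 98.4 % and 97.1 % suggest the fitted model predicts ratings only slightly better than guessing the average every time.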

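Instead of showing a single coefficient for each attribute, the model shows several terms per attribute, and as you can see the error rate is quite high. Is there something wrong with the way I prepared the data that is causing this?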
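One possible reading of the repeated terms, offered as an assumption rather than a confirmed diagnosis: for a numeric class, WEKA's supervised NominalToBinary filter (which LinearRegression appears to apply to nominal attributes) orders an attribute's values by their average class value and builds k-1 indicator attributes over progressively smaller subsets. A term such as Author=10,15 would then mean "Author is 10 or 15". The Python sketch below imitates that encoding on made-up data. The function name and sample values are hypothetical, and this is not WEKA's actual code.

```python
from collections import defaultdict

def nested_indicator_sets(values, targets):
    """Order categories by mean target and return the k-1 nested subsets.

    Each subset stands for one 0/1 indicator: 1 if the value is in the subset.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    # Categories sorted from lowest to highest average target.
    ordered = sorted(sums, key=lambda v: sums[v] / counts[v])
    # Drop the lowest-mean category, then the next one, and so on.
    return [ordered[i:] for i in range(1, len(ordered))]

# Hypothetical data: author 1 averages 2.5, author 3 averages 4, author 2 averages 5.
authors = [1, 1, 2, 2, 3, 3]
ratings = [2, 3, 5, 5, 4, 4]
for subset in nested_indicator_sets(authors, ratings):
    print("Author=" + ",".join(str(v) for v in subset))
# Author=3,2
# Author=2
```

Under that reading, several terms per attribute would be expected once the ID columns are nominal, and the model lists fewer than k-1 of them presumably because the default attribute selection (-S 0) drops some.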