song.yz@foxmail.com wechat: math-box

Statistical Machine Learning



Wine Variety Inference



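The goal is to identify the variety of a wine from a constituent analysis obtained by physicochemical measurement. The wines in the data come from the same region of Italy but are made from slightly different cultivars. The values of 13 constituents present in all 3 varieties were determined chemically and are used to classify the wines.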

The original data were provided by Forina, M. et al., PARVUS - An Extendible Package for Data Exploration, Classification and Correlation, Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.

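The main features include:
  • a) Wine class (the label to predict)
  • b) Malic acid
  • c) Ash (mineral content)
  • d) Alkalinity of ash
  • e) Magnesium
  • f) Total phenols
  • g) Flavonoids
  • h) Non-flavonoid phenols
  • i) Proanthocyanidins
  • j) Color intensity
  • k) Hue
  • l) OD280/OD315 of diluted wine
  • m) Proline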
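
As a rough illustration of how one row of such data could be read, here is a minimal Go sketch. It assumes a comma-separated line with the class label first and the numeric measurements after it. The `Sample` type, the `parseLine` helper, and the sample values are hypothetical and are not taken from mknn.go or naivebayes.go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is a hypothetical record: one class label plus numeric features.
type Sample struct {
	Class    int
	Features []float64
}

// parseLine parses "label,f1,f2,..." into a Sample.
// It returns an error instead of skipping bad fields, because a misread
// column would quietly distort any distance or likelihood computed later.
func parseLine(line string) (Sample, error) {
	fields := strings.Split(strings.TrimSpace(line), ",")
	if len(fields) < 2 {
		return Sample{}, fmt.Errorf("expected a label and at least one feature, got %d fields", len(fields))
	}
	class, err := strconv.Atoi(strings.TrimSpace(fields[0]))
	if err != nil {
		return Sample{}, fmt.Errorf("bad class label %q: %w", fields[0], err)
	}
	features := make([]float64, 0, len(fields)-1)
	for i, f := range fields[1:] {
		v, err := strconv.ParseFloat(strings.TrimSpace(f), 64)
		if err != nil {
			return Sample{}, fmt.Errorf("bad feature %d (%q): %w", i+1, f, err)
		}
		features = append(features, v)
	}
	return Sample{Class: class, Features: features}, nil
}

func main() {
	// Made-up values, for illustration only.
	s, err := parseLine("1, 1.5, 2.25, 100")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s.Class, len(s.Features), s.Features)
}
```

Checking the number of parsed features per row against the expected column count is a cheap way to rule out a data-reading problem.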


Because the sample is small, the MKNN method is used here for the classification test. The results are as follows:
$ go run mknn.go
the 1 object predicion class is     1
 , the real object is     1

the 2 object predicion class is     1
 , the real object is     1

the 3 object predicion class is     1
 , the real object is     1

the 4 object predicion class is     1
 , the real object is     1

the 5 object predicion class is     1
 , the real object is     1

the 6 object predicion class is     1
 , the real object is     1

the 7 object predicion class is     1
 , the real object is     1

the 8 object predicion class is     1
 , the real object is     1

the 9 object predicion class is     1
 , the real object is     1

the 10 object predicion class is     1
 , the real object is     1

the 11 object predicion class is     2
 , the real object is     2

the 12 object predicion class is     2
 , the real object is     2

the 13 object predicion class is     1
 , the real object is     2

the 14 object predicion class is     2
 , the real object is     2

the 15 object predicion class is     1
 , the real object is     2

the 16 object predicion class is     2
 , the real object is     2

the 17 object predicion class is     2
 , the real object is     2

the 18 object predicion class is     2
 , the real object is     2

the 19 object predicion class is     2
 , the real object is     2

the 20 object predicion class is     2
 , the real object is     2

the 21 object predicion class is     2
 , the real object is     2

the 22 object predicion class is     2
 , the real object is     3

the 23 object predicion class is     3
 , the real object is     3

the 24 object predicion class is     3
 , the real object is     3

the 25 object predicion class is     3
 , the real object is     3

the 26 object predicion class is     2
 , the real object is     3

the 27 object predicion class is     3
 , the real object is     3

the 28 object predicion class is     3
 , the real object is     3

the 29 object predicion class is     3
 , the real object is     3

the 30 object predicion class is     3
 , the real object is     3

the 31 object predicion class is     3
 , the real object is     3

The prediction accuracy is 0.870968	
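
For reference, here is a minimal Go sketch of a plain k-nearest-neighbor classifier with Euclidean distance and majority voting. This is the basic k-NN baseline, not the MKNN variant used in mknn.go, whose source is not shown here. The `point` type, the function names, and the toy data are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// point is a hypothetical labeled training example.
type point struct {
	class    int
	features []float64
}

// euclidean returns the Euclidean distance between two equal-length vectors.
func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// knnPredict returns the majority class among the k training points
// closest to query. Ties are broken in favor of the smaller class label.
func knnPredict(train []point, query []float64, k int) int {
	type neighbor struct {
		dist  float64
		class int
	}
	ns := make([]neighbor, len(train))
	for i, p := range train {
		ns[i] = neighbor{euclidean(p.features, query), p.class}
	}
	sort.Slice(ns, func(i, j int) bool { return ns[i].dist < ns[j].dist })
	if k > len(ns) {
		k = len(ns)
	}
	votes := map[int]int{}
	for _, n := range ns[:k] {
		votes[n.class]++
	}
	best, bestVotes := 0, -1
	for class, v := range votes {
		if v > bestVotes || (v == bestVotes && class < best) {
			best, bestVotes = class, v
		}
	}
	return best
}

func main() {
	// Toy two-feature data, made up for illustration only.
	train := []point{
		{1, []float64{1, 1}},
		{1, []float64{1.2, 0.9}},
		{2, []float64{5, 5}},
		{2, []float64{5.1, 4.8}},
		{3, []float64{9, 1}},
	}
	fmt.Println(knnPredict(train, []float64{1.1, 1.0}, 3))
}
```
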

Testing with naive Bayes gives the following classification results:
$ go run naivebayes.go
the 1 object predicion class is     2
 , the real object is     1

the 2 object predicion class is     2
 , the real object is     1

the 3 object predicion class is     2
 , the real object is     1

the 4 object predicion class is     1
 , the real object is     1

the 5 object predicion class is     2
 , the real object is     1

the 6 object predicion class is     2
 , the real object is     1

the 7 object predicion class is     2
 , the real object is     1

the 8 object predicion class is     2
 , the real object is     1

the 9 object predicion class is     2
 , the real object is     1

the 10 object predicion class is     2
 , the real object is     1

the 11 object predicion class is     2
 , the real object is     2

the 12 object predicion class is     2
 , the real object is     2

the 13 object predicion class is     2
 , the real object is     2

the 14 object predicion class is     2
 , the real object is     2

the 15 object predicion class is     2
 , the real object is     2

the 16 object predicion class is     2
 , the real object is     2

the 17 object predicion class is     2
 , the real object is     2

the 18 object predicion class is     2
 , the real object is     2

the 19 object predicion class is     2
 , the real object is     2

the 20 object predicion class is     2
 , the real object is     2

the 21 object predicion class is     2
 , the real object is     2

the 22 object predicion class is     2
 , the real object is     3

the 23 object predicion class is     3
 , the real object is     3

the 24 object predicion class is     3
 , the real object is     3

the 25 object predicion class is     3
 , the real object is     3

the 26 object predicion class is     3
 , the real object is     3

the 27 object predicion class is     3
 , the real object is     3

the 28 object predicion class is     3
 , the real object is     3

the 29 object predicion class is     3
 , the real object is     3

the 30 object predicion class is     3
 , the real object is     3

the 31 object predicion class is     3
 , the real object is     3

The prediction accuracy is 0.677419
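
For comparison, here is a minimal Go sketch of a Gaussian naive Bayes classifier. It assumes each continuous feature is normally distributed within a class, which may or may not match what naivebayes.go does. The `gaussianNB` type, the `fit` and `predict` names, and the toy data are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// gaussianNB holds per-class priors, feature means, and feature variances.
type gaussianNB struct {
	classes []int
	prior   map[int]float64
	mean    map[int][]float64
	vari    map[int][]float64
}

// fit estimates the parameters from feature rows x and labels y.
func fit(x [][]float64, y []int) *gaussianNB {
	m := &gaussianNB{prior: map[int]float64{}, mean: map[int][]float64{}, vari: map[int][]float64{}}
	count := map[int]int{}
	d := len(x[0])
	for i, row := range x {
		c := y[i]
		if _, ok := m.mean[c]; !ok {
			m.classes = append(m.classes, c)
			m.mean[c] = make([]float64, d)
			m.vari[c] = make([]float64, d)
		}
		count[c]++
		for j, v := range row {
			m.mean[c][j] += v
		}
	}
	for _, c := range m.classes {
		for j := range m.mean[c] {
			m.mean[c][j] /= float64(count[c])
		}
		m.prior[c] = float64(count[c]) / float64(len(x))
	}
	// Second pass: population variance of each feature within each class.
	for i, row := range x {
		c := y[i]
		for j, v := range row {
			diff := v - m.mean[c][j]
			m.vari[c][j] += diff * diff / float64(count[c])
		}
	}
	return m
}

// predict returns the class with the highest log-posterior.
func (m *gaussianNB) predict(row []float64) int {
	const eps = 1e-9 // variance floor to avoid division by zero
	best, bestScore := m.classes[0], math.Inf(-1)
	for _, c := range m.classes {
		score := math.Log(m.prior[c])
		for j, v := range row {
			s2 := m.vari[c][j] + eps
			diff := v - m.mean[c][j]
			score += -0.5*math.Log(2*math.Pi*s2) - diff*diff/(2*s2)
		}
		if score > bestScore {
			best, bestScore = c, score
		}
	}
	return best
}

func main() {
	// Toy two-feature data, made up for illustration only.
	x := [][]float64{{1.0, 2.0}, {1.2, 1.8}, {0.8, 2.2}, {3.0, 0.5}, {3.2, 0.7}, {2.8, 0.3}}
	y := []int{1, 1, 1, 2, 2, 2}
	m := fit(x, y)
	fmt.Println(m.predict([]float64{1.1, 2.0}), m.predict([]float64{3.0, 0.4}))
}
```

The scores are summed in log space so that multiplying many small densities does not underflow.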
	
Learning algorithm    Accuracy
MKNN                  87.10%
Naive Bayes           67.74%
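
The per-object naive Bayes output above can be tallied into a confusion matrix to show where the errors fall. The Go sketch below does this with the 31 predicted and real labels copied from that output. The function names are assumptions made for illustration.

```go
package main

import "fmt"

// confusion counts how often each real class (row) was predicted as each
// class (column). Labels are assumed to be 1..numClasses.
func confusion(actual, predicted []int, numClasses int) [][]int {
	m := make([][]int, numClasses)
	for i := range m {
		m[i] = make([]int, numClasses)
	}
	for i := range actual {
		m[actual[i]-1][predicted[i]-1]++
	}
	return m
}

// accuracy is the share of predictions that lie on the diagonal.
func accuracy(m [][]int) float64 {
	correct, total := 0, 0
	for i, row := range m {
		for j, n := range row {
			total += n
			if i == j {
				correct += n
			}
		}
	}
	return float64(correct) / float64(total)
}

// nbLabels returns the real and predicted classes of the 31 test objects,
// copied from the naive Bayes output listed above.
func nbLabels() (actual, predicted []int) {
	actual = []int{
		1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // objects 1-10
		2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, // objects 11-21
		3, 3, 3, 3, 3, 3, 3, 3, 3, 3, // objects 22-31
	}
	predicted = []int{
		2, 2, 2, 1, 2, 2, 2, 2, 2, 2, // objects 1-10
		2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, // objects 11-21
		2, 3, 3, 3, 3, 3, 3, 3, 3, 3, // objects 22-31
	}
	return actual, predicted
}

func main() {
	actual, predicted := nbLabels()
	m := confusion(actual, predicted, 3)
	for i, row := range m {
		fmt.Printf("real class %d: %v\n", i+1, row)
	}
	fmt.Printf("accuracy: %.6f\n", accuracy(m))
	// Prints:
	// real class 1: [1 9 0]
	// real class 2: [0 11 0]
	// real class 3: [0 1 9]
	// accuracy: 0.677419
}
```

Nine of the ten errors are class 1 wines predicted as class 2.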


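When classifying with naive Bayes, quite a few class 1 wines (9 of the 10 in the output above) were classified as class 2. Researchers elsewhere have reported QDA 99.4%, LDA 98.9%, and 1NN 96.1% on this data, so there is still room for optimization here. I have not yet looked into the cause; it may be a problem with how the program itself reads the data.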
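
One thing that may be worth checking, offered only as a guess and not as a diagnosis of the programs above, is feature scaling. If the measurements are on very different numeric scales, Euclidean distances in a nearest-neighbor method are dominated by the largest-valued features. A Gaussian naive Bayes is essentially unaffected by per-feature rescaling, so this would not by itself explain the naive Bayes result. The Go sketch below standardizes each column to zero mean and unit standard deviation. The `standardize` name and the toy values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// standardize returns a copy of x in which every column has mean 0 and
// population standard deviation 1. Constant columns are mapped to 0.
func standardize(x [][]float64) [][]float64 {
	n, d := len(x), len(x[0])
	mean := make([]float64, d)
	variance := make([]float64, d)
	for _, row := range x {
		for j, v := range row {
			mean[j] += v
		}
	}
	for j := range mean {
		mean[j] /= float64(n)
	}
	for _, row := range x {
		for j, v := range row {
			diff := v - mean[j]
			variance[j] += diff * diff
		}
	}
	for j := range variance {
		variance[j] /= float64(n)
	}
	out := make([][]float64, n)
	for i, row := range x {
		out[i] = make([]float64, d)
		for j, v := range row {
			if s := math.Sqrt(variance[j]); s > 0 {
				out[i][j] = (v - mean[j]) / s
			}
		}
	}
	return out
}

func main() {
	// Made-up values: the second column is 100 times larger than the first.
	z := standardize([][]float64{{1, 100}, {2, 200}, {3, 300}})
	for _, row := range z {
		fmt.Printf("%.4f %.4f\n", row[0], row[1])
	}
}
```

In a real train/test split, the means and standard deviations would normally be computed on the training rows only and then applied to the test rows.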