The Glowing Python: K-means clustering with SciPy
Published: 2019-06-28



K-means clustering is a method for finding clusters and cluster centers in a set of unlabeled data. Intuitively, we can think of a cluster as a group of data points whose inter-point distances are small compared with the distances to points outside the cluster. Given an initial set of K centers, the K-means algorithm alternates between two steps:
  • for each center we identify the subset of training points (its cluster) that is closer to it than any other center;
  • the means of each feature for the data points in each cluster are computed, and this mean vector becomes the new center for that cluster.
These two steps are iterated until the centers no longer move or the assignments no longer change. A new point x can then be assigned to the cluster of the closest prototype (center).
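The two steps above can be sketched directly in NumPy. This is a minimal illustration, not SciPy's implementation: the function name `kmeans_sketch`, the `n_iter` cap, the initial centers, and the sample data are all assumptions made for the example.

```python
import numpy as np


def kmeans_sketch(points, init_centers, n_iter=100):
    """Plain K-means: alternate the assignment and mean-update steps."""
    centers = np.asarray(init_centers, dtype=float)
    k = len(centers)
    for _ in range(n_iter):
        # Step 1: assign every point to its closest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: move each center to the mean of its assigned points.
        # Assumption: an empty cluster simply keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the centers no longer move.
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, labels


# Illustrative data: two overlapping uniform blobs, one shifted by (0.5, 0.5).
rng = np.random.default_rng(0)
points = np.vstack((rng.random((150, 2)) + 0.5, rng.random((150, 2))))

# Hypothetical initial centers, one near each corner of the data range.
centers, labels = kmeans_sketch(points, [[0.0, 0.0], [1.5, 1.5]])
print(centers)
```

The final result depends on the initial centers, which is why library implementations usually try several random starts and keep the best one.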
SciPy provides an implementation of the K-means algorithm in its scipy.cluster.vq module. Let's see how to use it:
    from pylab import plot,show
    from numpy import vstack,array
    from numpy.random import rand
    from scipy.cluster.vq import kmeans,vq

    # data generation
    data = vstack((rand(150,2) + array([.5,.5]),rand(150,2)))

    # computing K-Means with K = 2 (2 clusters)
    centroids,_ = kmeans(data,2)
    # assign each sample to a cluster
    idx,_ = vq(data,centroids)

    # some plotting using numpy's logical indexing
    plot(data[idx==0,0],data[idx==0,1],'ob',
         data[idx==1,0],data[idx==1,1],'or')
    plot(centroids[:,0],centroids[:,1],'sg',markersize=8)
    show()
The result should be as follows:
In this case we split the data into 2 clusters: the blue points have been assigned to the first and the red ones to the second. The squares are the cluster centers.
Let's try to split the data into 3 clusters:
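As noted in the description of the algorithm, a new point can be assigned to the cluster of the closest center. A minimal sketch of that step using `vq` follows; the `new_points` values are made up for illustration, and the data is regenerated so that the snippet runs on its own.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Same construction as the data above: two overlapping uniform blobs.
rng = np.random.default_rng(0)
data = np.vstack((rng.random((150, 2)) + np.array([0.5, 0.5]),
                  rng.random((150, 2))))
centroids, _ = kmeans(data, 2)

# Hypothetical new observations that were not used to fit the centroids.
new_points = np.array([[0.1, 0.2], [1.4, 1.3]])

# vq returns, for each point, the index of the nearest centroid
# and the distance to that centroid.
new_idx, new_dist = vq(new_points, centroids)
print(new_idx, new_dist)
```

Which cluster is numbered 0 and which is numbered 1 can change from run to run, because `kmeans` starts from randomly chosen centers.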
    # now with K = 3 (3 clusters)
    centroids,_ = kmeans(data,3)
    idx,_ = vq(data,centroids)

    plot(data[idx==0,0],data[idx==0,1],'ob',
         data[idx==1,0],data[idx==1,1],'or',
         data[idx==2,0],data[idx==2,1],'og') # third cluster points
    plot(centroids[:,0],centroids[:,1],'sm',markersize=8)
    show()
This time the result is as follows:
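The examples above discard the second value returned by `kmeans`. In SciPy that value is the distortion, the mean (non-squared) Euclidean distance between the observations and their nearest centroid, and it gives a rough way to compare different choices of K. The sketch below is only an illustration: the `distortions` dictionary and the range of K values are assumptions, and the data is regenerated so the snippet is self-contained.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Same construction as the data above: two overlapping uniform blobs.
rng = np.random.default_rng(0)
data = np.vstack((rng.random((150, 2)) + np.array([0.5, 0.5]),
                  rng.random((150, 2))))

# Hypothetical comparison: fit K-means for several values of K
# and record the distortion reported for each fit.
distortions = {}
for k in (2, 3, 4):
    centroids, distortion = kmeans(data, k)
    distortions[k] = distortion
    print(k, distortion)
```

The distortion generally shrinks as K grows, so a lower value alone does not mean a better clustering; it is more useful to look at how quickly the improvement levels off.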

Reposted from: http://ghzhl.baihongyu.com/
