Aug 29, 2019
Nicely written article, but not technically sound.
k-means shouldn’t be used for one-dimensional data; it is a multivariate technique, and in this case it is neither efficient nor optimal. Especially when you’re talking about introductory techniques, why not just bin the data?
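Binning here means cutting the value range into fixed intervals and labelling each value by its interval. The following is a minimal sketch that assumes equal-width bins; quantile bins would be an equally simple alternative. The function name and sample values are illustrative and are not taken from the article.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of an extra one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Illustrative data only.
print(equal_width_bins([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))
```

This needs one pass over the data. It requires no iteration and no random initialisation.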
If more accuracy is required, a technique like Jenks Natural Breaks Optimisation can be used.
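Jenks natural breaks looks for class boundaries that minimise within-class variance, that is, the total sum of squared deviations from each class mean. The following is a minimal sketch of that objective, solved exactly by dynamic programming over the sorted values (the Fisher-Jenks formulation). It is not Jenks's original iterative procedure and not any particular library's implementation. The function name and data are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each of n_classes contiguous classes that
    minimise total within-class sum of squared deviations (Fisher-Jenks DP)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= len(values)")
    # Prefix sums let us compute the SSD of any slice xs[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        # Sum of squared deviations of xs[i:j] from its mean.
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest total SSD for splitting xs[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is xs[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i
    # Walk back through the split table to recover each class's upper bound.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(xs[j - 1])
        j = split[k][j]
    return bounds[::-1]


# Illustrative data only.
print(jenks_breaks([22, 1, 12, 3, 20, 10, 2, 21, 11], 3))
```

The search runs in O(k·n²) time. That is fine for modest n, and it is still deterministic and exactly optimal for the variance objective in one dimension.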
Alternatively, run k-means clustering across the three dimensions you specified, with three clusters! Then you don’t need the adjustment at the end to merge the categories.
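Clustering all three dimensions at once with k = 3 makes each cluster a final category directly. The following is a minimal sketch of Lloyd's k-means. It assumes each observation is a tuple of its three feature values and uses a deterministic farthest-first seeding, which is an assumption made for reproducibility and can be sensitive to outliers. The points below are made up, since the article's actual features are not reproduced here.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 3-D observations: three well-separated groups.
points = [
    (0, 0, 0), (0.5, 0.2, 0.1), (0.1, 0.4, 0.3),
    (10, 10, 10), (10.3, 9.8, 10.1), (9.9, 10.2, 9.7),
    (0, 10, 0), (0.2, 9.7, 0.4), (0.3, 10.1, 0.1),
]
labels, centroids = kmeans(points, k=3)
print(labels)
```

In practice, scikit-learn's `sklearn.cluster.KMeans(n_clusters=3)` does the same job, using k-means++ seeding by default. Features on very different scales should usually be standardised first.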