{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Clustering\n", "\n", "A common goal in data analysis is to learn some kind of structure from data. One way to do so is to apply a clustering analysis, which typically means trying to learn 'groups', or clusters, in the data. \n", "\n", "This notebook is a minimal example of clustering, adapted from the `sklearn` tutorials, that introduces the key points of a clustering analysis. \n", "\n", "As with many other topics in data analysis, there are many resources and tutorials available on clustering. A good place to start is the extensive coverage in the `sklearn` documentation. If you are interested in clustering analyses, we recommend exploring some of these other resources once you have worked through the basic concepts here. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Clustering is the process of trying to find structure (clusters) in data.\n", "
\n", "\n", "
\n", "See the Clustering\n", "article on Wikipedia. The sklearn \n", "user guide \n", "has a detailed introduction to, and tutorial on,\n", "clustering. \n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Imports\n", "%matplotlib inline\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn import cluster, datasets\n", "from sklearn.cluster import KMeans\n", "from scipy.cluster.vq import whiten" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load an Example Dataset\n", "\n", "Scikit-learn includes example datasets that can be loaded and used for demonstrations.\n", "\n", "Here, we'll use the iris dataset. This dataset contains data about different species of iris plants, with measurements of several features for each species. Our task will be to attempt to cluster the data, to see if we can learn a meaningful grouping from the data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Load the iris data\n", "iris = datasets.load_iris()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `iris`, as loaded by `sklearn`, is an object. The data is stored in `iris.data`, and the species labels in `iris.target`. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 150 samples of data, with 4 features and 3 labels.\n" ] } ], "source": [ "# Let's check how much data there is\n", "[n_samples, n_features] = np.shape(iris.data)\n", "n_labels = len(set(iris.target))\n", "print(\"There are {} samples of data, with {} features and {} labels.\".format(\\\n", " n_samples, n_features, n_labels))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sepal length (cm)\n", "sepal width (cm)\n", "petal length (cm)\n", "petal width (cm)\n" ] } ], "source": [ "# Check out the available features\n", "print('\\n'.join(iris.feature_names))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "setosa\n", "versicolor\n", "virginica\n" ] } ], "source": [ "# Check out the species ('clusters')\n", "print('\\n'.join(iris.target_names))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Let's set up some indexes, so we know what data we're using\n", "sl_ind = 0 # Sepal Length\n", "sw_ind = 1 # Sepal Width\n", "pl_ind = 2 # Petal Length\n", "pw_ind = 3 # Petal Width" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYYAAAEcCAYAAADDfRPAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3de7wcdX3/8dc7hwDhIjESEEIgSPmhSIzYSMBQDVK8oD9NEYQIFmgrrZaK8hMrlSpa8NKoFa8UFAHBaAWMaKlIlYvcogkXoyIKEsiFS7iEa4AQPr8/ZjaZ3TN7zszZ3dndc97Px2Mf2f3Od2Y+Mzm737l8P/NVRGBmZlYzrtsBmJlZb3HDYGZmddwwmJlZHTcMZmZWxw2DmZnVccNgZmZ13DDkkHSqpEhfp5aY79zMfNPaFMuyzDKfl/S0pHslXSfp45K2b3H5c9PtPVXSxHbEXHC9czLblX09Jul6Sce2uOzaNk1rU4ynFqi/YTtGus4qSTqmtp9yppXa9l4j6Y2Z+P+zYdo5mWmnNUy7NDNtz7Ss9h1cVmC92d+OYzLlTb9nkqZl5jm3hc1um026HYCVImAz4MXp6zXACZIOj4grRrjMucDR6ftzgTWtBtmirYH9gP0kvSIiPjiCZcwBPp6+vwpY1pbIRp9jgNel70/tXhgdcQPwPMnB7+yGaa/JvG827RHgtjbG02vfsyH5jKENJG0OEBHHRITS17IOrGpXkoZhOskfF8ALgR9IelkH1leVqyNCwOYkP1Y1J7TrzMvGloh4DPhN+nFPSS8EkLQtsEem6j6SNkmnvRR4UVp+faTZvxExLf1OT6sk+B7ghqGEhktFfyHpIkmPkh5ZNLuUJOk9khZLeljSM5JWSrpC0tFNVtVURDwbEb+JiGOBH6bFW7LxCBlJh6fLXy7pqXSdf5J0ZvbSU3rJIxvDXdn4JW0l6TxJSyU9JGmdpDWSrpF0eM7+qc27rOx2pdv2TEScByytLRKYmVn+vpJ+IOn+NJZV6T6flqmzLLsvgCszcc1J63xd0k2SVqfLeVzSLyW9V5JGEntZkt6V7sdH0/+fP0g6XdIWDfVqsV8l6WBJv5K0VtKdkj7cGG9a51Yllxxvl/Tuhr/LObVLF2w8Wxj2Mpik49MYn5J0i6Q3D7N9kyU9my7zJw3T3pxZ3yfTshmSLkm/G8+kf283SfpPSeNL7t6a62qrJDkLhY1nBIuB1cAWwN5pWfbs4fpMvLmXkiTtJ+mGdF8vk3RiXhDDfc9y6r9T0q/T/+fbRvI70bKI8KvhRXJaHenr1Ez5uZnyBzPvl+VMn5aWHZYpa3xdVCCWZY3LzEzbLzPtMWBcWn7mEOv8PbBp7WBoiNc0kstVQ9U5uiGeuv0xzHbNydS/qmHabzLTDk3L3gk81ySOh4A9cvZX42tOWufpIep8vEmMpxbYpg3LGabel4dY/6+ACTnLfJTk0khj/aMa4s3bRyuz+yD9v236/5qz7ffl1HsWeMkw23lJWncdMDlTfn5a/jzJWfAWJD/SzWLaaoTf4yMzyzg9LftM+vmLwA/S9yek087J1H9dzndwWaZsT+DJnFhXZd4fU/B7lv3/yNvXAexf5W+gzxhG7jGSH+YJwMFD1Htt+u8TJKewmwG7kPzQ/aTZTAX9PvN+azaeBn8HmAVsC4wHtge+lU7boxZvJJdvzsssY9eovxT2OHA4yR/uFiSXel4DPJXWH8n1/6YkbZoeHb08LQrgV+lR9NeBAeAm4KUk+/EAkh+oScD8dJumAZ/ILPaAzDZdlZb9DbA7yT7bFHgFsCKddkInzxok7Qscn348l6Tx3QI4KS2bCbw3Z9YXAJ8muXR4fKb83Zn3p5PsI0jOmrYB5gE7ZhcUEcvS//urM2W1fZS37ZOAQ4CJwIVp2XiSv42hnJP+uwnJ3zuSJpBcb4fkgOAu4GUkf6sAHyb5O5sM7J9u83PDrKeZ6zLv90//nZ2Zdl1
DWa3OOpIGeij/SvL/BvA1kn3zepJ9XqfA9yxre+B96fI+myl/N1WqshXqlxfFzhjelTNfdvq0tOxENh4dnQ+cALwB2KZgLMsal5mZ9iLqjyomp+W7A98G7iH54Ww8+vjnoWLOTBPJH+mN5B+xrm1hH8/JiavxdUZa96ACdddmlp39/5uTs+7DSW5KPwysz1nW9jkxnlpgm+qOupvUOb3AtvwkZ5n3AQNp2VaZ8t+nZVtktuWhWt102nV5+yPdB7nxNmz7RZnyt2bKzxxmfwyw8Qj62rQsewZ9VFq2HcmPcQBLgI+RNCR/1obv8op0uU+l+21t+nlHYN/0/UqShqgW1y+bfAeXZcruz9TfJlP+7Uz5MQW/Z9My0xZnyvfK+5uo4uUzhpG7uWC9rwHfJ/lRfTfJKezlwP2SPtJiDC/NvH8MeEjSNsC1wFHAVJIju0YTCi7/n4Gvkpx9vICkocjavFS0xTwJLAKOAz6Qlm1XYL7NJW05XCVJRwDfJbm+/kLy77MV3T8jUWRbXpRTdmdErE/fP5kpr/0fTGLjtqzM1IXkAKEVt2fe5607VxrD+enH10jaBXhX+vlR4OK03gPAP5Bcnn0VyRnf94A/SvqFpBe0EHvtXsEE4O/SmJdFxCqSRmgtSSNxZGae6xhe7f/o8Yh4NFO+Iq9yCSPa1+3mhmHk1hapFBFPR8Q7Sb64+5NcxlhEcinkU5KmtBDDyZn3/x0Rz5NcXqn9+PwM2CGSU9n3NwtxiOUfkXk/F9gsXdZDI4y3matj46n1VhGxb0ScHelhE/BApu7ZmbrZyx/jIqL2RSq6Tf9Ecj1fJJeoqpDdliObbMs+OfOtq73J7Jesh0kOPgB2kJT9bk9tEstQ+yl33SXmqflW+q9IfvxrN60XRMSG71BEfJPkstp04B3Al9JJ+wP/WHKdWdkf+dqlz+vTdWYvGX2wyTzNPJj+u3V6MFazU5P6VezrtnHD0GGS3iHpeGAKcCvJ2cOttck0/0NqtrzxkvZSkgjzlrT4SeCT6fvs9dingSclvZzkRzBP9kd+RsP19eyy1gDjJf0r+Ue0LfdKGsL1JP3KAY5Oe/RsnfZ8mS1pPsmZWE12m17R8COZ3abHkrB1LBt7prSFpDflvLYDfpypdloa/+aSpqa9db5D/dFrIRHxFMklP0iu138o3UdHUN9vP2vDfpL0yrLrLBjX7Ww8av8QyQERbLz/gKRtJX2O5J7dA8CP2NjjDmDnTN2yf2PZH/mdc8qubVxHJt6hXJl5/ylJ20g6gOReTJ6hvme9p8rrVv3yotg9hmk58w2aDpySKWt8rSLTA6VJLMuGmD9IjhQPytR/IcmXq7HeH5ps06E5dZel0z6aM201yY/0oGvTjfMPs11zMvWvKlB/Hvn3A2qvczN1Z+bVSacdmTPtKWB5zv9dNsZTC8Q41P9TAHPTel8bpt4xOcu8qsm6ljXs07xeSdmeMq/L1P9QTt2rhtr2hvJzh9sn6Tx/27COpQ3Tdyqy38r+jaX1NyHp+JFd3ozM9IMbpt01xHcwu6+b9UrK9q7K/j8O9T2blrdPG8qH/Y608+Uzhs77GUkvoTtI/kDXA/eSXueOzOl0AQE8Q3Ij8nqSnicvi0zWc0Q8QnK6fi3JD94qkobuM02WeTHJDdF70tiyPgt8iuTm3FqSXiyvJ7k+XKmIWEByWeFikht/z5F8CRencX4+U3cxyaWzO6k/NSciLiS5bHAXyRnVYpL9dWfHN2JjDO8juQd0Ncm+XEdybfpKkl45/zPC5V4FvJ0kD+RZkoOBo6m/TJY9cv0qSdfme6Gjly2+R/318nMapj8C/AfJZZ0HSf4OHyf5G39XRCwc6Yoj4jngl5mix9mYJ0O6jucbPhdZ7u9IOkUsItnXy0kOpL7WZJahvmc9R2nLZGZ9Lk0EOwC4MpLr50h6E7CQ5BLOvcBOkdyLMmvKz0oyGz02I+nxtk7S/SR5GrUbo88B/+BGwYrwpSSz0eMZkkS
qu0l6wU0guXRxAfDqiLi0i7FZH/GlJDMzq+MzBjMzq9P39xi23XbbmDZtWrfDMDPrK0uWLHkwIibnTev7hmHatGksXry422GYmfUVSXc3m+ZLSWZmVscNg5mZ1XHDYGZmddwwmJlZHTcMZmZWxw2DmZnVqay7qqSpJKM5vZjkaYZnRcQZDXXmkDyH/a606JKI+CRmZjkW3ryS+Zffzqo1a9lx4gROeuMezN272NhXefMCI17eaFJlHsNzwP+LiJskbQ0skXRF+vjarF9ExFsrjMvM+tDCm1dy8iVLWbsueYr1yjVrOfmS5Inaw/2Y58170vdvBcG69VF6eaNNZZeSIuLeiLgpff84cBvJqGZmZqXNv/z2DT/sNWvXrWf+5bc3mWPoedc9HxsahbLLG226co9B0jSSoRQX5UzeT9Ktkv4nHZIyb/7jJC2WtHj16tUdjNTMetWqNfljXDUrL1tnJHVHi8obBklbkYxm9IGIeKxh8k3ALhExA/gyyQAjg0TEWRExMyJmTp6c+6gPMxvldpw4oVR52TojqTtaVNowpCNMXQxcGBGXNE6PiMci4on0/WUkg89vW2WMZtYfTnrjHkwYP1BXNmH8wIabyGXnHT9OjB/QiJY32lTZK0nAN4HbIuILTeq8GLg/IkLSPiQN10N5dc1sbKvdEB5JL6Jm8450eaNNZQP1SNof+AXJQNy14QX/BdgZICLOlHQ88F6SHkxrgRMjYsjBuWfOnBl+uqqZWTmSlkTEzLxplZ0xRMS1gIap8xXgK9VEZGZmefp+PAYz665WksyKOmXhUhYsWs76CAYk5s2aymlzp7d1HbaRGwYzG7FWksyKOmXhUi648Z4Nn9dHbPjsxqEz/KwkMxuxVpLMilqwaHmpcmudGwYzG7FWksyKWt+kg0yzcmudGwYzG7FWksyKGlB+n5Vm5dY6NwxmNmKtJJkVNW/W1FLl1jrffDazEWslyayo2g1m90qqTmUJbp3iBDczs/J6IsHNzEanogPeFC0rc7ZRNIei3fW6qYoYfcZgZiPWmMcAycPosgPeAMnD6SIZ82CoehPGD/DpQ6YX+qHLW3fe/O2u103tjHGoMwbffDazESs64M269VHXKDSrVyYHomgORbvrdVNVMbphMLMR68QgNkWXWTSHot31uqmqGN0wmNmIdWIQm6LLLJpD0e563VRVjG4YzGzEig54M35AyT2FYeqVyYEomkPR7nrdVFWM7pVkZiNWZsCbomVFb6IWzaFod71uqipG90oyMxuDnMdgZn2tTN/9fshFyNNLcbthMLOeVmbMhyrGh+iEXovbN5/NrKeV6bvfD7kIeXotbjcMZtbTyvTd74dchDy9FrcbBjPraWX67vdDLkKeXovbDYOZ9bQyfff7IRchT6/F7ZvPZtbTyvTd74dchDy9FrfzGMzMxiDnMZhZT2plLIe5e0/p23EWei2eRj5jMLOuyB3LocS4De/48ylcvGRl342z0CvxeDwGM+s5uWM5lBi3YcGi5X05zkKvxZPHDYOZdUWrffTXN7na0evjLPRaPHncMJhZV7TaR39Ayi3v9XEWei2ePG4YzKwrcsdyKDFuw7xZU/tynIVeiyePeyWZWVe0OpbD3L2nMHOXSX03zkKvxZPHvZLMzMagnshjkDQVOB94MfA8cFZEnNFQR8AZwMHAU8AxEXFTVTGaWaJZP/tW8w5Gu1byE3opt6GyMwZJOwA7RMRNkrYGlgBzI+J3mToHA/9E0jDMAs6IiFlDLddnDGbt1ayffV7eQJm8g27lDVSllfyEbuQ29EQeQ0TcWzv6j4jHgduAxi1+O3B+JG4EJqYNiplVpFk/+7y8gTJ5B73UT78TWslP6LXchq70SpI0DdgbWNQwaQqwPPN5BYMbDyQdJ2mxpMWrV6/uVJhmY1Kz/vTN8gZaXe5o0Up+Qq/lNpRqGCTtKukASQdLerWkzcuuUNJWwMXAByLiscbJObMM+muMiLMiYmZEzJw8eXL
ZEMxsCM360zfLG2h1uaNFK/kJvZbbMGzDIGmapM9Kuge4A/gZ8GOSo/01kq6QdJikIssaT9IoXBgRl+RUWQFMzXzeCVhVYDvMrE2a9bPPyxsok3fQS/30O6GV/IRey20Y8sdc0hnArcBLgI8CewLbAJuS9C46GLgW+Dfg15JePcSyBHwTuC0ivtCk2qXAXyuxL/BoRNxbbpPMrBVz957Cpw+ZzpSJExAwZeIEPn3IdE6bO31Q+fxDZzD/sBn1ZYfNYP6hMwbNP5pvPEPz/VZku1uZtxOG7JUkaT7w2Yh4cNgFJT2KtoiIi5pM3x/4BbCUpLsqwL8AOwNExJlp4/EV4E0k3VWPjYghuxy5V5KZWXkjzmOIiJOKriQiLhtm+rXk30PI1gngH4uu08yqldfXfvHdD7Ng0XLWRzAgMW/WVE6bO73QvL12FtEPMVbBj8Qws0Ia+9qvXLOW//f9W1mf6a66PoILbrwHoK5xyJv35EuWAvTMD28/xFiVwr2SJL1Q0hmSfi3pPkkPZF+dDNLMui+vr/365/MvRS9YtLzuc6/108/TDzFWpcwZw/nAy4HzgPvJ6UZqZqNXmT71jTkPvdZPP08/xFiVMg3DHOB1fnaR2di048QJrCz4I9mY89Bs3l7KbeiHGKtSJsHtzpL1zWwUyetrPzAuvz/JvFlT6z73Wj/9PP0QY1XK/NCfAHxa0gxJA8PWNrNRJa+v/ecPm8FR++684QxhQOKofXce1Cup1/rp5+mHGKtS+OmqkqYA3wP2y5seEV1pLJzHYGZWXrvGY1hAkvX8fnzz2WzUOPLsG7juzoc3fJ692yR2nbxVodwEaH/f/1MWLh207iIjtZWNZ7SMndAJZc4YngL2iYjfdDakcnzGYDZyjY3CUPIuEbV7HIFTFi7dkAeRNY6Nj0sYah1F4+m3sRM6oV3jMfwOeEF7QjKzXlC0UYDBuQnQ/r7/eeuA+kZhqHUUjWc0jZ3QCWUahlOAL0j6S0nbS5qUfXUqQDPrDXnjMbS773+ZMR/y1lE0ntE0dkInlGkYLgP2AX5K8ijs1enrwfRfMxvF8sZjaPc4AmXGfMhbR9F4RtPYCZ1QpmE4IPN6feZV+2xmfWb2bsVP9htzE6D9ff/z1gGDf6iaraNoPKNp7IROKNwrKSKu7mQgZla9C9+zX0u9kmo3W9vVQ6e2jpH2SioaTytxt3ube1GZXknHA2si4oKG8qOAF0TE1zoQ37DcK8nMrLx29Ur6AJDXZWAZ8MERxGVmZj2oTILbTsDdOeUr0mlm1iVVJVyN9sQuS5RpGO4DXklyhpD1KpKeSWbWBVUNMOOBbMaOMpeSvgN8SdJBksanrzcAXwQu7Ex4ZjacqhKuxkJilyXKnDF8HNgVuByo/XWMA74P/Gub4zKzgqpKuBoLiV2WKHzGEBHrImIe8H+AdwFHAntExBERsa5TAZrZ0KpKuBoLiV2WKD3wTkTcERHfj4j/iog7OhGUmRVXVcLVWEjsssSQDYOkUyRtWWRBkmZL+r/tCcvMiqpqgBkPZDN2DJngJulc4P8CFwOXAosj4r502ubAnsD+wFHAi4CjI+LaDsdcxwluZmbljXignog4RtJ04Hjg28ALJAWwDtgUEHATcBZwbkQ829bIzayQKganaXXdVem1ePrRsL2SImIp8PeS3gu8AtgFmECSu3BLRDiHwayLiuYXdCIPoddyG3otnn5VplfS8xFxS0T8MCK+GxH/60bBrPuqGJym1XVXpdfi6VeleyWZWW+pYnCaVtddlV6Lp1+5YTDrc1UMTtPquqvSa/H0KzcMZn2uisFpWl13VXotnn5V5pEYZtaDqhicptV1V6XX4ulXhQfq6VXOYzAzK2/EeQw5CzocOBDYjobLUBHxtmHmPQd4K/BAROyVM30O8EPgrrTokoj4ZJn4zPpVq33vZ51+Bfc/vjGNaPutN+Xkg/cctEwodjR9ysKluUN75sVZdJnWP8oM7TmfZBS3K4FVQN2
MEXHsMPO/FngCOH+IhuFDEfHWQgGlfMZg/a6x7z0k18WLPm6isVFoZvw4gWDd+o1f3bz1nLJwKRfceM+g+WfvNomb7nm0Ls6iy7Te064zhr8G5kXERSMJIiKukTRtJPOajWZD9b0v8uNapFEAWPf84IPAvPUsWJQ3gi9cd+fDI16m9ZcyvZLGAbd0KpDUfpJulfQ/kl7erJKk4yQtlrR49erVHQ7JrLO63fe+cT3r23Df0XkD/a1Mw3AWycPyOuUmYJeImAF8GVjYrGJEnBURMyNi5uTJkzsYklnndbvvfeN6BqS2L9P6y3CP3f5S7QVsA5wg6TpJX89OS6e3JCIei4gn0veXAeMlbdvqcs16Xat977ffetNC9caPE+MH6n/089Yzb9bU3Pln7zZpUJxFl2n9ZbgzhumZ18tJLiU9C7y0Ydr0VgOR9GIpOVSRtE8a20OtLtes17U6zsGijx40qHHYfutN+eLhr6xb5vzDZjD/0BnDrue0udM5at+dN5w5DEgcte/OXPie/QbFWXSZ1l8qy2OQtACYA2wL3E8yhvR4gIg4U9LxwHuB54C1wIkRcf1wy3WvJDOz8trSKynNQzghIh5vKN8S+HJE/M1Q86fjRQ81/SvAV4rGY9ZrqhoHoFmOQZF4Ft/98KB5Z+4yqefzEDzGQrXK5DGsB3aIiAcayrcF7ouIrjxew2cM1gtazUUoqlmOwVH77lzXOOTFMzBOrM/pXtpY3mt5CFXt27FmqDOGYXslSZok6UUko7W9MP1ce00myWa+v70hm/WXqsYBaJZj0FieF09eo5BX3mvjF3iMheoVOcp/kCTLOYDf5UwPkvsFZmNWVbkIzXIMGstbXW8v5SF0O89jLCrSMBxAcrbwc+AdQDb98Vng7ohY1YHYzPrGjhMnsDLnh6rd/fkHpNzGoTH3oFk8RfVSHkJV+9Y2GvZSUkRcHRFXAbsCC9PPtdcNbhTMqhsHoFmOQWN5XjwD4/IT1xrLey0PwWMsVG/IM4b0wXdZu6hJVmREXNOuoMz6TVXjANRuMA/XK6lZPP3YK8ljLFRvyF5Jkp4nuYdQaw1qlRs/ExH1TXpF3CvJzKy8VvIYsg8imgV8DjgduCEt2w/4F+DDrQZp1gva3V/+yLNvqHsq6ezdJrHr5K0GHbVD/llAXs5C3hE+DD6izivzUbYVUSaPYQnwkYi4oqH8IODfI2LvDsQ3LJ8xWLu0u798Y6NQ1u7bbckfH3hyUPk4QbaH6fgBQdQ/AtvjJNhwWspjyNgTWJFTvpLk2Ulmfa3d/eVbaRSA3EYB6hsFSH78G8dFWPd81DUK4L7/VlyZhuG3wMclbegjlr7/WDrNrK+Nhf7yo2lbrHPKPMbivcCPgZWSfp2WTQfWA29pd2BmVRsL/eVH07ZY5xQ+Y4iIX5HkMnyEZFCdm9P3u6bTzPpau/vLz95tUkvx7L7dlrnljekI4weU3FPIlnmcBGtBmUtJRMRT6ehpJ0bEByPi7IjIvxBq1mdaHReh0YXv2W9Q4zB7t0m5Yx3klV1x4pzc8i+8s2GchUNnMP+wGR4nwdpmuDyGQ4AfRcS69H1TEXFJu4Mrwr2SzMzKayWP4SLgxcAD6ftmAuhKgptZr8jLgYBiuQRl8idaybXwuAZWRGUjuHWKzxisF+TlQBTNJSiTP9FKroXHNbCsVsdj2Kz9IZmNLnk5EEVzCcrkT7SSa+FxDayoIt1VH5V0A8ljt68EboyI5zoblll/KZMf0Fi3TP5EK7kWYyFPw9qjSK+kfyLJbv574BrgEUk/kfTPkl4tqVTPJrPRqEx+QGPdZvPmlZep2855bWwpMh7D2RFxVETsBLwMOAlYA3wAuBF4WNIPOxumWW/Ly4EomktQJn+ilVwLj2tgRZXJfCYibgduB86UtAPwPuD9JOM+m41ZzcYMyCtrvNFbZryBVsYm8LgGVlSZp6tuC8whGerzAOAlwBKSy0tXRcTlHYpxSO6VZGZWXit5DEg6g6Q
h2J2kIbgaOAG4LiKeamegZtAffe1byVkw63XDnjGko7jdTTJIz2URcVcVgRXlM4bRpR/62ufmLOSMidBrcZtltToew2uBbwKHAL+VdLek8yQdK2nXdgZq1g997XNzFnLGROi1uM2KKtIr6dqIOC0iDgQmAkcDd6X/1hqKczsbpo0V/dDXvpWcBbN+UPbpqs9GxFXAp4CPA18maSze3f7QbCzqh772reQsmPWDQg2DpE0kzZZ0iqSfkeQx/Bw4jOThekd3MEYbQ/qhr31uzkLOmAi9FrdZUUV6Jf0UeA2wBUkG9JXAPwI/j4i7OxuejTX90Ne+lZwFs35QpFfSAtLnJEXEHZVEVYJ7JZmZlddSr6SImJc+FqOlRkHSOZIekPSbJtMl6UuS7pD0a0mvamV9ZmY2MqUeidGic4GvAOc3mf5mkiS63YFZwNfTf80KO2XhUhYsWs76CAYk5s2aymlzp4+4HrR/YBzwJSfrbZU1DBFxjaRpQ1R5O3B+JNe2bpQ0UdIOEXFvJQFa3ztl4VIuuPGeDZ/XR2z4nP3RL1oPBiezrVyzlpMvWQpQemCclWvWctL3b60bvKfM8syq0kuPzJ4CLM98XpGWmRWyYNHyQuVF60H7B8YpOniPWTf1UsOgnLLcO+OSjpO0WNLi1atXdzgs6xfrm3SkaCwvWg86MzBOq3XNOq2XGoYVwNTM552AVXkVI+KsiJgZETMnT55cSXDW+waUd2wxuLxoPejMwDit1jXrtCEbBkmPS3qsyKsNsVwK/HXaO2lf4FHfX7Ay5s2aWqi8aD1o/8A4RQfvMeum4W4+H9+uFaX5EHOAbSWtIHmkxniAiDgTuAw4GLgDeAo4tl3rtrGhduN4uN5GRetBZwbGGenyzKpSeKCeXuUENzOz8lp97LaZmY0hhRsGSZtK+oSkP0h6WtL67KuTQZqZWXXKnDH8G8lTVD8PPA+cBHwVeAh4X/tDMzOzbijTMLwT+IeI+E9gPfDDiHg/yU3kgzoRnJmZVa9Mw7A98Lv0/RMkA/QA/AR4QzuDMjOz7inTMNwD7Ji+vwN4Y/p+P8Bpm2Zmo0SZhuEHwIHp+zOAT0i6i+Spqd9oc1xmZtYlhZ+uGhEnZ9Oj2DMAABBBSURBVN5fJGk5MBv4Q0T8uBPBmZlZ9Qo3DJJeC1wfEc8BRMQiYFE6HvRrI+KaTgVpZmbVKXMp6UpgUk75Nuk0MzMbBco0DCL/MdgvAp5sTzhmZtZtw15KknRp+jaACyQ9k5k8AOwFXN+B2MzMrAuK3GN4KP1XwCPUd019FrgWOLvNcZmZWZcM2zBExLEAkpYBn4sIXzYyMxvFCt9jiIhPRMSTkmZKOlzSlgCStpRUuHeTmZn1tjLdVbcnGWXt1ST3G3YH/gR8AXgaOKETAZqZWbXK9Er6D+A+kl5IT2XKv4+flWRmNmqUuQR0IHBgRDyi+kHT7wR2bmtUZmbWNWXOGCaQ9EJqNJnkUpKZmY0CZRqGa4BjMp9D0gDwz8DP2hmUmZl1T5lLSR8Grpb0amAzkpHcXk7ySIzZHYjNzMy6oEx31d8BrwBuAH4KbE5y43nviLizM+GZmVnVSuUfRMS9wMc6FIuZmfWAYc8YJG0h6auSVkp6QNJ3JG1bRXBmZla9ImcMnyC56XwhSe+jecDXgcM6F9botvDmlcy//HZWrVnLjhMncNIb92Du3lO6HZaZGVCsYTgE+NuI+C6ApAuA6yQNRMT6jkY3Ci28eSUnX7KUteuSXbdyzVpOvmQpgBsHM+sJRW4+TwV+UfsQEb8EngN27FRQo9n8y2/f0CjUrF23nvmX396liMzM6hVpGAYYnNj2HCVvXFti1Zq1pcrNzKpW5MddDB6gZ3PgbEkbnpkUEW9rd3Cj0Y4TJ7AypxHYceKELkRjZjZYkTOG84BVJAP21F4XAMsbyqyAk964BxPGD9SVTRg/wElv3KNLEZmZ1Ss8UI+1R+0Gs3slmVmv8n2CLpi79xQ3BGbWs8o
8RK9lkt4k6XZJd0j6SM70YyStlnRL+vq7KuPrpoU3r2T2Z37Orh/5b2Z/5ucsvHllt0MyszGqsjOG9EmsXwUOAlYAv5J0afoMpqzvRcTxVcXVC5zbYGa9pMozhn2AOyLiTxHxLPBd4O0Vrr9nObfBzHpJlQ3DFJKeTDUr0rJG75D0a0kXSZqatyBJx0laLGnx6tWrOxFrpZzbYGa9pMqGQTll0fD5R8C0iHgF8L8kXWUHzxRxVkTMjIiZkydPbnOY1WuWw+DcBjPrhiobhhUkj9eo2YkkP2KDiHgoImqJdGcDf15RbF3l3AYz6yVVNgy/AnaXtKukTYEjgEuzFSTtkPn4NuC2CuPrmrl7T+HTh0xnysQJCJgycQKfPmS6bzybWVdU1ispIp6TdDxwOcnzl86JiN9K+iSwOCIuBd4v6W0kz2J6mPoxpkc15zaYWa9QRONl/v4yc+bMWLx4caXrLDOewpFn38B1dz684fPs3SZx2MydB80PxbKhPZaDmbWDpCURMTN3mhuGchpzDiC5H5B36aexUagR9Xfdxw8IAtY9v7E0b5ll1m1mNpShGoZKM59HgzI5B3mNAgzuirVufdQ1Cs2W6XwHM6uCG4aSqsw5aFym8x3MrApuGEqqMuegcZnOdzCzKrhhKKlMzsHs3SblLqMx02/8gBg/rr40b5nOdzCzKrhhKKlMzsGF79lvUOMwe7dJ/Mfhr6ybf/6hM5h/2Ixhl+l8BzOrgnslmZmNQe6VZGZmhXkEtxE4ZeFSFixazvoIBiTmzZrKXaufGJTIduF79iuckObENTPrFb6UVNIpC5dywY33FKq7+3ZbsuKRp4dNSHPimplVzZeS2mjBouXDV0r98YEnCyWkOXHNzHqJG4aS1rfhDMuJa2bWy9wwlDSgvPGGynHimpn1MjcMJc2blTvaaK7dt9uyUEKaE9fMrJe4YSjptLnTOWrfnTecOQxIHLXvzrmJbFecOKdQQpoT18ysl7hXkpnZGDRUr6Qxm8dQNG8gL2dh0Z8e4o8PPLmhzu7bbcldq5/kuUwbu4ngjk+/hZd+9DKeXr9xwuYDYpstxnP/489uKNt+6005+eA9ne9gZj1hTJ4xFM0bKJOz0G7OdzCzTnIeQ4OieQNlchbazfkOZtYtY7JhKJo30I6chVY438HMumFMNgxF8wbakbPQCuc7mFk3jMmGoWjeQJmchXZzvoOZdcuYbBiK5g00y1nYfbst6+rtvt2WbNJwcrGJYNln3sLmA/UTNh8Q22+9aV3Z9ltvyhcbBu9xvoOZdcuY7JVkZjbWOY+hoFZyBPLyHU6bO71puZlZr3LDkGrMEVi5Zi0nX7IUYNjGoTHfYX0EF9x4z6BEuFo54MbBzHrWmLzHkKeVHIFm+Q7ZRqFIfTOzXuCGIdVKjkDZfIdu50eYmQ3FDUOqlRyBsvkO3c6PMDMbihuGVCs5As3yHRq7tQ5X38ysF7hhSLWSI9As3+GKE+fklvvGs5n1MucxmJmNQT3zdFVJb5J0u6Q7JH0kZ/pmkr6XTl8kaVqV8ZmZWYUNg6QB4KvAm4E9gXmS9myo9rfAIxHxZ8B/AJ+tKj4zM0tUecawD3BHRPwpIp4Fvgu8vaHO24Hz0vcXAQdK7sJjZlalKhuGKUA2s2tFWpZbJyKeAx4FXtS4IEnHSVosafHq1as7FK6Z2dhUZcOQd+TfeOe7SB0i4qyImBkRMydPntyW4MzMLFFlw7ACyHbg3wlY1ayOpE2AbYCHK4nOzMyAah+i9ytgd0m7AiuBI4B3NdS5FDgauAE4FPh5DNOfdsmSJQ9KuruFuLYFHmxh/l4ymrYFRtf2jKZtgdG1PWN1W3ZpNqGyhiEinpN0PHA5MACcExG/lfRJYHFEXAp8E/i2pDtIzhSOKLDclq4lSVrcrC9vvxlN2wKja3tG07bA6Noeb8tglT52OyIuAy5rKPtY5v3TwGFVxmRmZvX8SAwzM6vjhgHO6nYAbTSatgVG1/aMpm2
B0bU93pYGff+sJDMzay+fMZiZWR03DGZmVmfMNgySzpH0gKTfdDuWVkmaKulKSbdJ+q2kE7od00hJ2lzSLyXdmm7LJ7odU6skDUi6WdKPux1LqyQtk7RU0i2S+v5595ImSrpI0u/T789+3Y5pJCTtkf6f1F6PSfrAiJc3Vu8xSHot8ARwfkTs1e14WiFpB2CHiLhJ0tbAEmBuRPyuy6GVlj40ccuIeELSeOBa4ISIuLHLoY2YpBOBmcALIuKt3Y6nFZKWATMjYlQkhEk6D/hFRHxD0qbAFhGxpttxtSJ9kvVKYFZEjCj5d8yeMUTENYySx21ExL0RcVP6/nHgNgY/oLAvROKJ9OP49NW3Ry+SdgLeAnyj27FYPUkvAF5LklhLRDzb741C6kDgzpE2CjCGG4bRKh3caG9gUXcjGbn00sstwAPAFRHRt9sCfBH4MPB8twNpkwB+KmmJpOO6HUyLXgKsBr6VXur7hqT8gdr7yxHAglYW4IZhFJG0FXAx8IGIeKzb8YxURKyPiFeSPGhxH0l9ealP0luBByJiSbdjaaPZEfEqkgG3/jG9JNuvNgFeBXw9IvYGngQGjSzZT9LLYW8Dvt/KctwwjBLp9fiLgQsj4pJux9MO6Wn9VcCbuhzKSM0G3pZel/8u8HpJF3Q3pNZExKr03weAH5AMwNWvVgArMmekF5E0FP3szcBNEXF/KwtxwzAKpDdsvwncFhFf6HY8rZA0WdLE9P0E4C+B33c3qpGJiJMjYqeImEZyev/ziDiqy2GNmKQt084NpJdc3gD0ba++iLgPWC5pj7ToQKDvOmw0mEeLl5Gg4ofo9RJJC4A5wLaSVgAfj4hvdjeqEZsNvBtYml6bB/iX9KGF/WYH4Ly0Z8U44L8iou+7eY4S2wM/SEfb3QT4TkT8pLshteyfgAvTSzB/Ao7tcjwjJmkL4CDg71te1ljtrmpmZvl8KcnMzOq4YTAzszpuGMzMrI4bBjMzq+OGwczM6rhhsFFP0jGSnhi+Zm9qJX5Jr5P0h7T7b0dImi5p5Sh5nIThhsEqIulcSZG+1kn6k6TPlfkxSZfRkZyG9HHSH+rEsrscx3zg9IhY38Zl1omIpcCNwImdWodVyw2DVel/SRLYXgKcArwP+FxXIxrFJL0GeCktPjenoG8B75U0ZpNmRxM3DFalZyLivohYHhHfAS4E5tYmStpT0n9LejwdRGmBpBen004FjgbekjnzmJNO+4yk2yWtTY+4/13S5u0MfKjY0unnSvqxpBPSyyqPSPpWmo1aq7OlpPMlPSHpfkknp/Ocm06/CtgFmF/bxoYYDpT0G0lPKhmYaddhwn4X8L8R8VTDct4iaVG6vx6S9KPa/kr338fS7Xlc0nJJhysZ0Oa7aex/lPSGhnX9FJhE8jQB63NuGKyb1pKMt1AbbOgakmfv7EPyjKStgEsljSM5s/gvNp517ABcny7nSeBvgJeRnIUcAXy0XUEWiK3mL4C90umHA38FZEfT+zzwurT89cCMdJ6aQ0ge7PbJzDbWbAacTLKd+wETgTOHCf0vgLpR1iS9CfghcAXw58ABwNXU/xZ8APglyQPl/gs4D/gOcBnwynRfXJBtfCPiWeCWdPus30WEX351/AWcC/w483kf4EHge+nnTwI/a5jnhSTP/98nbxlDrOsfgDsyn48BnhhmnmXAh5pMKxrbcmCTTJ2zSY7YIWlIngWOyEzfEngEOHeoONL4A9gjU3ZkurxxQ2zTGuDYhrLrgO8Osx8WZD5vla77S5myaWnZzIZ5LwG+3e2/Nb9af/l6oFXpTWnvmk1IzhR+SPIQM0iOXl/bpPfNbiRHsLkkHUpylPtnJD9kA+mrXYrG9ruIeC4zbRUwK1NvfKYuEfGkio85/kxE3N6w7PEkZw7NRiKcADzdULY3SSM2lF9nYnxC0lPA0sz02iOdt2uYb226TutzbhisStcAxwHrgFURsS4zbRzw30Bej5ymz5aXtC/JWAefAD5IcpT
8Ntp7U7tobOsapgUbL9EoUzYSzzV8ri1nqMvBD5Kc2ZSVtx3rGj7nrXsSyRmH9Tk3DFalpyLijibTbgLeCdzd0GBkPcvgM4HZwMqI+LdagaRdWo60fGzDuYPkx3Uf4C7Y8JjkvYA7M/XytnGkbgb2zCk7kOQyV7vtRXI5yfqcbz5br/gqsA3wPUmzJL1E0l9KOqs2OAzJ0ehekvaQtG06at0fgCmSjkzneS/JYCUjsaOkVza8ti0Y25Ai4gngHOCzae+iPYFvkHwHs2cRy4C/kDQlXXcrLgf2byg7HThM0mlpT6uXS/pgtvfUSCgZa3wKSe8k63NuGKwnRDJk5GzgeeAnwG9JfpCfSV+QHOXeRtLTZjXJ+MM/Ikni+iLJtfGDgI+NMIwPkhxRZ19HFIytiA8BvwAuBa5M411M/X2AjwFTSc4iVo9wO2ouAP6PpJfXCiIZvOmvSIaAvJmkR9IBJNvWinnATyPi7haXYz3AA/WYdYmkzYC7gfkR8fkOreMzwOSI+NtOLD9dx2bAH4F5EXFdp9Zj1fEZg1lFJO0t6V2S/kzS3iT5AVsD3+vgaj8F/EkdfFYSSVLe6W4URg+fMZhVJG0Mzgb2IOlldAtJzsKSrgZm1sANg5mZ1fGlJDMzq+OGwczM6rhhMDOzOm4YzMysjhsGMzOr8/8B0ksJLM4NI0AAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Let's start looking at some data - plotting petal length vs. petal width\n", "plt.scatter(iris.data[:, pl_ind], iris.data[:, pw_ind])\n", "\n", "# Add title and labels\n", "plt.title('Iris Data: Petal Length vs. Width', fontsize=16, fontweight='bold')\n", "plt.xlabel('Petal Length (cm)', fontsize=14);\n", "plt.ylabel('Petal Width (cm)', fontsize=14);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just from plotting the data, we can see that there seems to be some kind of structure in the data. \n", "\n", "In this case, we do know that there are different species in our dataset, which will be useful information for comparing to our clustering analysis. \n", "\n", "Note that we are not going to use these labels in the clustering analysis itself. Clustering, as we will apply it here, is an unsupervised method, meaning we are not going to use any labels to try and learn the structure of the data. \n", "\n", "To see the structure that is present in the data, let's replot the data, color coding by species." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot the data color coded by species\n", "for ind in range(n_labels):\n", " plt.scatter(iris.data[:, pl_ind][iris.target==ind],\n", " iris.data[:, pw_ind][iris.target==ind],\n", " label=iris.target_names[ind])\n", "\n", "# Add title, labels and legend\n", "plt.title('Iris Data: Petal Length vs. Width', fontsize=16, fontweight='bold')\n", "plt.xlabel('Petal Length (cm)', fontsize=14);\n", "plt.ylabel('Petal Width (cm)', fontsize=14);\n", "plt.legend(scatterpoints=1, loc='upper left');\n", "\n", "# Note that the data, colored by label, can also be plotted like this:\n", "# plt.scatter(iris.data[:, pl_ind], iris.data[:, pw_ind], c=iris.target)\n", "# It is, however, more difficult to add labelled legend when plotted this way" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this data, we know we have 3 different 'groups' of data, which are the different species. \n", "\n", "As we can see in the plots above, these different species seem to be fairly distinct in terms of their feature values.\n", "\n", "The question then is whether we can learn a clustering approach, based on the feature data and without using the labels, that can learn a meaningful grouping of this data. \n", "\n", "If this approach works, then we might be able to try to use it on other data, for which we have feature data, but might not be sure about the groupings present in the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply K-Means Clustering\n", "\n", "Clustering is the process of trying to learn groups algorithmically. \n", "\n", "For this example, we are going to use the K-means clustering algorithm. \n", "\n", "K-means attempts to group the data into `k` clusters, and does so by labeling each data point to be in the cluster with the nearest mean. 
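To learn the cluster centers, the algorithm starts from a random initialization, then iteratively assigns each point to the cluster with the nearest center, updates each center to the mean of its assigned points, and repeats until the cluster centers stop changing (or an iteration limit is reached). \n", "\n", "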
\n", "K-means is a clustering algorithm that attempts to learn k clusters by assigning each data point to the cluster with the nearest center (the cluster mean). \n", "
\n", "\n", "
\n", "For more information on K-means, see the article on \n", "Wikipedia\n", "or the sklearn\n", "user guide. \n", "
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Pull out the data of interest - Petal Length & Petal Width\n", "d1 = np.array(iris.data[:, pl_ind])\n", "d2 = np.array(iris.data[:, pw_ind])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Whitening Data\n", "\n", "In this example, we are using two features (or two dimensions) of data. \n", "\n", "One thing to keep in mind for clustering analyses is that if different dimensions use different units (or have very different variances), then these differences can greatly impact the clustering. \n", "\n", "This is because K-means is isotropic, which means that it treats distances in each direction as equally important. Because of this, if the units or variances of different features are very different, this is equivalent to weighting certain features / dimensions as more or less important.\n", "\n", "To correct for this, it is common, and sometimes necessary, to 'whiten' the data. 'Whitening' data means normalizing each dimension by its respective standard deviation. By transforming the data to be on the same scale (have the same variance), we can ensure that the clustering algorithm treats each dimension with the same importance. " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Check out the whiten function\n", "whiten?" 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Whiten the data (scale each feature by its standard deviation)\n", "d1w = whiten(d1)\n", "d2w = whiten(d2)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Combine data into an array to use with sklearn\n", "data = np.vstack([d1w, d2w]).T" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Initialize K-means object, and set it to fit 3 clusters\n", "km = KMeans(n_clusters=3, random_state=13)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,\n", " n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',\n", " random_state=13, tol=0.0001, verbose=0)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fit the data with K-means\n", "km.fit(data)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Let's check out the clusters that KMeans found\n", "plt.scatter(d1, d2, c=km.labels_);\n", "\n", "# Add title and labels\n", "plt.title('Iris Data: PL vs. PW Clustered', fontsize=16, fontweight='bold')\n", "plt.xlabel('Petal Length (cm)', fontsize=14);\n", "plt.ylabel('Petal Width (cm)', fontsize=14);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the plot above, each data point is colored by its cluster assignment, which we learned from the data. \n", "\n", "In this case, since we already know the species label of the data, we can see that this clustering analysis seems to be doing pretty well! There are some discrepancies, but overall a K-means clustering approach is able to reconstruct a grouping of the datapoints, using only information from a couple of the features. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other Clustering Approaches\n", "\n", "Clustering is a general task, and there are many different algorithms that can be used to attempt to solve it. \n", "\n", "For example, printed below are some of the different clustering algorithms and approaches that are available in sklearn." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Clustering approaches in sklearn:\n", " AffinityPropagation\n", " AgglomerativeClustering\n", " Birch\n", " DBSCAN\n", " FeatureAgglomeration\n", " KMeans\n", " MeanShift\n", " MiniBatchKMeans\n", " OPTICS\n", " SpectralBiclustering\n", " SpectralClustering\n", " SpectralCoclustering\n" ] } ], "source": [ "print('Clustering approaches in sklearn:')\n", "for name in dir(cluster):\n", " if name[0].isupper():\n", " print(' ', name)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }