Unsupervised learning refers to the process in which a computer finds patterns, often nonintuitive ones, in unlabeled data. It differs from supervised learning in that the datasets are not labeled and the computer is not given a specific question to answer.
There are many types of unsupervised learning, including K-means clustering, hierarchical clustering, anomaly detection, and principal component analysis, to name a few. The most commonly discussed uses are clustering and anomaly detection.
Clustering is used to find natural groups, or clusters, within a dataset. These clusters can be analyzed to group similar customers together (e.g., customer segmentation), identify products that are purchased together (e.g., peanut butter and jelly), or better understand the attributes of successful executives (e.g., technical skills, personality profile, education).
In our dogs and cats example, suppose you input pictures of dogs and cats but don’t label them. Using clustering, the computer looks for common traits (body type, floppy ears, whiskers, etc.) and groups the photos accordingly. However, while you might expect the computer to group the photos into dogs and cats, it could instead group them by fur color, coat length, or size. The benefit of clustering is that the computer can find nonintuitive ways of looking at data that lead to the discovery of new trends (e.g., there are twice as many long-coated animals as short-coated ones), which in turn open up new marketing opportunities (e.g., increased marketing of dry pet shampoo and brushes).
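To make the clustering idea concrete, here is a minimal sketch of K-means (one of the methods named above) in plain Python. It assumes each photo has already been reduced to two hypothetical numeric features, coat length and body size, both scaled to the 0 to 1 range; turning real images into such features is a separate step not shown here. The function name `kmeans` and the sample data are illustrative only and do not come from any particular library.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group points into k clusters using Lloyd's algorithm (standard K-means)."""
    rng = random.Random(seed)
    # Pick k distinct points at random as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical features per unlabeled photo: (coat length, body size), scaled to 0-1.
photos = [(0.10, 0.40), (0.15, 0.60), (0.20, 0.50),
          (0.90, 0.45), (0.85, 0.55), (0.95, 0.50)]
labels, centroids = kmeans(photos, k=2)
print(labels)
```

Because the starting centroids are chosen at random, which group gets label 0 and which gets label 1 can change with `seed`; what stays stable is that photos with similar features land together. In this made-up data the split falls along coat length simply because that is where the points are separated, echoing the point above that the computer may not group photos the way you expect.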
In anomaly detection, by contrast, the computer looks for rare differences rather than commonalities. For example, if we ran anomaly detection on our dog and cat photos, the computer might flag a photo of a Sphynx cat because it is hairless, or of an albino dog because it lacks pigment.
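One simple way to look for rare differences is a z-score test: measure how many standard deviations each value sits from the average and flag the extreme ones. The sketch below assumes a hypothetical per-photo "fur coverage" score (1.0 meaning fully furred) has already been computed; real anomaly detectors usually combine many features rather than one.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return indices of values whose z-score magnitude exceeds the threshold."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # every value is identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / spread > threshold]


# Hypothetical fur-coverage score per photo; the last photo is a hairless Sphynx cat.
fur_coverage = [0.95, 0.90, 0.92, 0.97, 0.93, 0.91, 0.96, 0.05]
print(zscore_outliers(fur_coverage))  # → [7]
```

The threshold of 2.0 is deliberately low: with only n values, no population z-score can exceed sqrt(n - 1), so the common cutoff of 3 could never fire on eight photos.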
Here are some other applications of anomaly detection.
Banks analyze all sorts of transactions: deposits, withdrawals, loan repayments, and so on. Unsupervised learning can group these data points and flag outlier transactions (i.e., transactions that don’t align with the majority of data points) that may indicate fraud.
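The fraud case can be sketched with a distance-based outlier score, a common unsupervised approach swapped in here for illustration (the text does not name a specific method): a transaction whose nearest neighbours are all far away does not align with the majority. The features (amount in hundreds of dollars and hour of day divided by 24), the data, `k=2`, and the "three times the median score" cutoff are all hypothetical choices.

```python
import math
import statistics


def knn_outlier_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances from this point to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical transactions: (amount in hundreds of dollars, hour of day / 24).
transactions = [(0.5, 0.50), (0.6, 0.55), (0.4, 0.45), (0.7, 0.60),
                (0.5, 0.40), (0.6, 0.50), (9.5, 0.13)]
scores = knn_outlier_scores(transactions, k=2)
# Flag transactions whose score is more than three times the typical (median) score.
cutoff = 3 * statistics.median(scores)
flagged = [i for i, s in enumerate(scores) if s > cutoff]
print(flagged)  # → [6]
```

In practice the features would need to be scaled so that no single one dominates the distance; in this toy data the unusually large, late-night amount is what sets the last transaction apart.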
Companies also use anomaly detection to identify and understand actions competitors may be taking in the marketplace. For example, a retailer may expect to gain three share points during the first month of operations in every new market where it opens a store; however, it may notice that certain new stores are underperforming without knowing why. Anomaly detection can help identify likely competitive activity that is preventing share growth: specifically, the anomalous absence of common products (e.g., bread, milk, eggs, chicken breast) from shoppers’ baskets, which may indicate covert competitor incentives that are reducing the retailer’s shopper frequency and average order size.
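The retailer scenario can be sketched in the same spirit: compute, for each store, the share of baskets containing at least one of the staples listed above, then flag stores whose share is unusually low compared with the rest. This sketch assumes a robust modified z-score (median and median absolute deviation, with the conventional 0.6745 scaling and 3.5 cutoff) as the anomaly test; the store names and rates are hypothetical.

```python
import statistics

# The staple items named in the retailer example.
STAPLES = {"bread", "milk", "eggs", "chicken breast"}


def staple_rate(baskets):
    """Fraction of baskets that contain at least one staple item."""
    return sum(1 for basket in baskets if STAPLES & set(basket)) / len(baskets)


def underperforming_stores(rates, threshold=3.5):
    """Flag stores whose staple rate is unusually LOW, using a modified z-score."""
    values = list(rates.values())
    med = statistics.median(values)
    # Median absolute deviation: a spread measure that one extreme store cannot inflate.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread at all, so no store stands out
    return [store for store, rate in rates.items()
            if 0.6745 * (rate - med) / mad < -threshold]


print(staple_rate([["bread", "soda"], ["chips"], ["eggs", "milk"], ["candy"]]))  # → 0.5

# Hypothetical staple rates per new store (e.g., each computed with staple_rate).
rates = {"store_a": 0.82, "store_b": 0.79, "store_c": 0.85,
         "store_d": 0.80, "store_e": 0.83, "store_f": 0.41}
print(underperforming_stores(rates))  # → ['store_f']
```

The median-based score is used instead of a plain mean and standard deviation so that a single very weak store cannot widen the measured spread enough to hide itself.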
Computers also use unsupervised learning for all sorts of image recognition tasks, including facial recognition to unlock your mobile phone and medical imaging, where identifying cell-structure anomalies can assist in cancer diagnosis and treatment.