Data mining is vital to business operations across many industries. Companies use data mining to manage risk, anticipate demands for resources, project customer sales, detect fraud, and increase response rates to their marketing efforts.
According to a MicroStrategy report on the Global State of Enterprise Analytics, 60 percent of respondents used analytics to save money, 57 percent used it to drive strategy and change, and 52 percent sought to improve financial performance.
Perhaps the best-known data mining process is CRISP-DM, the Cross-Industry Standard Process for Data Mining.
This is a six-step procedure for turning data into insight. The model works like this:
Business Understanding
This is the starting point. What questions do you have? What do you want to learn from your data? Companies and organizations first must identify their objectives, including what insights they want to extract or problems they want to solve using their collected data. Determining project goals is important for collecting the right data to be analyzed.
Data Understanding
Once the objective is defined, it’s time to define the data. Not every data point stored on a server or in the cloud is appropriate for every project. Determining the right data to be sourced saves time and the potential hassle of retracing steps later.
In this phase, data is collected from multiple sources based on the problem being addressed. Is the company looking for historical sales of a certain item? The type of credit card used to make a purchase? Whether items were bought in store or online? Each type of data may be relevant — or not — depending on the project.
This part of the process is important for verifying data quality as well. Missing, errant, or duplicate data can be corrected before moving to the next phase.
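The quality checks described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record layout (`order_id`, `item`, `amount`), the sample values, and the rule that a negative amount counts as errant are all hypothetical and not part of any particular tool.

```python
# Hypothetical sales records; field names and values are illustrative only.
records = [
    {"order_id": 1, "item": "roses", "amount": 40.0},
    {"order_id": 2, "item": "tulips", "amount": None},   # missing value
    {"order_id": 3, "item": "lilies", "amount": -15.0},  # errant value
    {"order_id": 1, "item": "roses", "amount": 40.0},    # duplicate of order 1
]


def check_quality(rows):
    """Sort rows into clean, missing, errant, and duplicate buckets."""
    report = {"clean": [], "missing": [], "errant": [], "duplicate": []}
    seen_ids = set()
    for row in rows:
        if row["order_id"] in seen_ids:
            report["duplicate"].append(row)
        elif any(value is None for value in row.values()):
            report["missing"].append(row)
        elif row["amount"] < 0:  # assumed rule: amounts cannot be negative
            report["errant"].append(row)
        else:
            report["clean"].append(row)
        seen_ids.add(row["order_id"])
    return report


report = check_quality(records)
for label, rows in report.items():
    print(label, len(rows))
```

Only the rows in the `clean` bucket would move forward unchanged; the others would be corrected or set aside first.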
Data Preparation
Data preparation is considered the most demanding phase of data mining, often consuming at least half of the project’s time and effort. It’s in this step that the most helpful data is selected, cleaned, and sorted to account for errors or coding inconsistencies. Data from multiple sources can be merged, organized, or adjusted in different ways to prepare for the next phase: modeling.
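As a rough sketch of what merging and cleaning can look like, the Python below combines two hypothetical sales exports that spell the same items and sales channels differently. The field names, the sample rows, and the `CHANNEL_CODES` mapping are assumptions made for illustration.

```python
# Hypothetical exports from two systems; all names and values are illustrative.
store_sales = [
    {"item": "Roses ", "channel": "In-Store", "units": 12},
    {"item": "tulips", "channel": "in store", "units": 5},
]
online_sales = [
    {"item": "ROSES", "channel": "Online", "units": 7},
    {"item": "Tulips", "channel": "web", "units": 3},
]

# Map the different spellings each system uses onto one agreed code.
CHANNEL_CODES = {
    "in-store": "store",
    "in store": "store",
    "online": "online",
    "web": "online",
}


def clean(row):
    """Return a copy of the row with consistent item and channel coding."""
    return {
        "item": row["item"].strip().lower(),
        "channel": CHANNEL_CODES[row["channel"].strip().lower()],
        "units": int(row["units"]),
    }


# Merge both sources, clean each row, then sort for easier inspection.
prepared = sorted(
    (clean(row) for row in store_sales + online_sales),
    key=lambda row: (row["item"], row["channel"]),
)

# Total units per item across all channels.
totals = {}
for row in prepared:
    totals[row["item"]] = totals.get(row["item"], 0) + row["units"]

print(totals)
```

Once every source uses the same coding, totals and comparisons across sources become straightforward, which is why this step tends to pay for the time it takes.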
Modeling
Now the data begins to take shape. Data miners can run a variety of models (ways of organizing data) to generate solutions. For instance, models can seek to detect patterns or anomalies in the data or use the data to predict an outcome. Companies will choose the model based on the type of data they’re analyzing, the project’s specific requirements, and the goals being pursued.
Several modeling techniques can be used on the same set of data to derive different results. Rarely do companies answer their data mining question with just one model.
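To make the idea of running more than one model on the same data concrete, here is a small Python sketch. It applies two deliberately simple techniques to one hypothetical series of weekly sales: a standard-deviation rule to flag anomalies and a least-squares line to predict the next value. Both techniques, the threshold of two standard deviations, and the sample numbers are illustrative choices, not a recommendation.

```python
from statistics import mean, pstdev

# Hypothetical weekly unit sales; the sixth week contains an unusual spike.
weekly_sales = [20, 22, 21, 24, 23, 60, 26, 27]


def find_anomalies(values, threshold=2.0):
    """Return indexes of values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]


def fit_line(values):
    """Fit y = slope * x + intercept by ordinary least squares, with x = 0, 1, 2, ..."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Model 1: anomaly detection.
anomalies = find_anomalies(weekly_sales)

# Model 2: prediction of the next week's sales from the fitted trend.
slope, intercept = fit_line(weekly_sales)
next_week_forecast = slope * len(weekly_sales) + intercept

print("Anomalous weeks (0-based index):", anomalies)
print("Forecast for next week:", round(next_week_forecast, 1))
```

The two models answer different questions about the same data: one points to the week that looks out of place, the other estimates where sales are heading.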
Evaluation
At this point, data miners assess whether the models have produced a satisfactory answer to the question asked and whether the results contain any unexpected or unique findings.
If the initial question remains unanswered, a new model might be required, or the data might need to be changed. If the results meet their criteria, the project moves to its final phase.
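One simple way to check results against criteria is to compare each model's predictions with what actually happened. The Python sketch below uses mean absolute error as the measure; the model names, the numbers, and the acceptance threshold are all hypothetical.

```python
# Hypothetical actual outcomes and the predictions of two candidate models.
actual = [30, 32, 35, 31]
predictions = {
    "last_year_same_week": [24, 25, 27, 26],
    "trend_model": [29, 33, 34, 32],
}

# Assumed business criterion: an average miss of no more than 2 units.
MAX_ACCEPTABLE_ERROR = 2.0


def mean_absolute_error(y_true, y_pred):
    """Average size of the prediction error, ignoring its direction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


results = {
    name: mean_absolute_error(actual, preds) for name, preds in predictions.items()
}
accepted = [name for name, err in results.items() if err <= MAX_ACCEPTABLE_ERROR]

for name, err in results.items():
    verdict = "meets criterion" if name in accepted else "needs rework"
    print(f"{name}: MAE={err:.2f} ({verdict})")
```

If `accepted` came back empty, that would be the signal to return to modeling or data preparation rather than move on.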
Deployment
At this point, companies have answered the question they asked. In the flower shop example, perhaps the model suggested an increased order due to past sales and expected customer demand. The florist can deploy that knowledge to ensure they have enough flowers on hand when a major event arrives.
Why Is Data Mining Important for Businesses?
Put simply, data mining improves business; it can save money, drive competitive advantage, improve the customer experience, and identify new customers and revenue streams.
According to the MicroStrategy survey, 63 percent of respondents said analytics had improved their company’s efficiency and productivity, 57 percent said it had helped them make faster decisions, and 51 percent cited improved financial performance.
Data mining is about discovery, hence the term and its relation to mining for precious materials. In a consumer world overwhelmed by data, companies need efficient ways to sift through that data to find relevant, actionable points. They can analyze the data they generate to learn who’s buying their products, where they’re buying them, and how to sell more.
One of the primary benefits of data mining is speed. Decades ago, large data sets required weeks or months to analyze. Banks and credit card companies had to sift through millions of records to detect fraud or errors. With advances in neural networks, machine learning, and artificial intelligence, those huge data sets can now be analyzed in hours or minutes. More advanced data mining tools and techniques have helped to bring together disparate data into usable groups like never before.
Data can be divided into two main formats: structured and unstructured. Structured data consists of the numbers we recognize in a table or Excel spreadsheet, such as last month’s sales records and this month’s inventory. Unstructured data, meanwhile, exists in different formats, such as text or video. It’s included in emails, social media posts, photos, and even satellite images.
Companies certainly need to evaluate structured data, but mining for insight in unstructured data is a booming enterprise. According to a Forbes survey, more than 95 percent of businesses say they need stronger ways to manage unstructured data.
How Is Data Mining Being Used by Different Industries?
What is data mining used for? And who uses it? In reality, data mining can be applied to every industry that generates data and wants to leverage it. As long as you have access to data and a curiosity to discover meaning or answer questions, data mining can help you find your way.
Here are some examples of how data mining is being used within specific industries.
Healthcare
Data mining has been embedded in healthcare for years. Physicians take advantage of more effective treatment methods based on data mined from clinical trials and patient studies. Hospitals and clinics can improve patient outcomes and safety while cutting costs and lowering response times. Data mining can even match patients with doctors based on reports of successful diagnosis rates.
Banking and Finance
Among the first uses of data mining was the detection of credit card fraud. Financial companies also mine their billions of transactions to measure how customers save and invest money, allowing them to offer new services and constantly test for risk.
Retail
Retailers have an enormous amount of customer data (purchase trends, preferences, and spending habits among them) that they attempt to leverage to boost future sales. Retail companies that don’t produce insight from data mining risk falling behind the competition.
Insurance
Fraud detection is a critical component of the insurance industry, but insurers also use data to manage risk, understand why they’re losing customers, and price their products more effectively. For instance, a car insurance company could study mileage and accident rates for a certain region to determine whether it should raise or lower rates for customers who live there.
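The car insurance example can be sketched as a small calculation. The Python below compares each region's accidents per million miles with the overall rate. The region names and every figure are made up for illustration, and real pricing would weigh many more factors than this single ratio.

```python
# Hypothetical regional totals; every figure here is made up for illustration.
regions = {
    "north": {"miles_driven": 4_000_000, "accidents": 12},
    "south": {"miles_driven": 2_500_000, "accidents": 15},
    "east": {"miles_driven": 3_500_000, "accidents": 8},
}


def accidents_per_million_miles(accidents, miles_driven):
    """Normalize accident counts so regions with different mileage are comparable."""
    return accidents * 1_000_000 / miles_driven


total_miles = sum(stats["miles_driven"] for stats in regions.values())
total_accidents = sum(stats["accidents"] for stats in regions.values())
overall_rate = accidents_per_million_miles(total_accidents, total_miles)

# Assumed rule of thumb: above the overall rate suggests raising rates, otherwise lowering.
suggestions = {}
for name, stats in regions.items():
    rate = accidents_per_million_miles(stats["accidents"], stats["miles_driven"])
    suggestions[name] = "raise" if rate > overall_rate else "lower"
    print(f"{name}: {rate:.2f} per million miles -> {suggestions[name]}")
```

Normalizing by mileage matters here: the region with the most accidents in absolute terms is not necessarily the riskiest once the distance driven is taken into account.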
Media and Telecommunications
Media and telecommunications companies have loads of data on consumer preferences, including the programming they watch, books they read, and video games they play. With that data, companies can target programming to consumers by taste, region, or other factors. They can even suggest media to consume — an approach companies like Netflix have mastered.
Education
By measuring student achievement data, educators believe they can predict when students might drop out of school before the students even consider it. Further, this data can help educators intervene with at-risk students and potentially keep them in school.
Manufacturing
Manufacturers use data to align their production schedules with demand, ensuring that products are on store (or virtual) shelves when they’re needed. This helps maximize production at critical times and predict when assembly lines might need maintenance.
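As one simple illustration of aligning production with demand, the Python sketch below forecasts next month's demand with a moving average and rounds it up to a production target. The moving average is a stand-in chosen for brevity, not a method the text prescribes, and the demand figures and three-month window are hypothetical.

```python
import math

# Hypothetical monthly unit demand for one product, oldest month first.
monthly_demand = [120, 135, 150, 160, 155, 170]


def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)


forecast = moving_average_forecast(monthly_demand)

# Round up so the plan never falls short of the forecast.
planned_units = math.ceil(forecast)

print("Forecast demand:", round(forecast, 1))
print("Planned production:", planned_units)
```

A shorter window reacts faster to recent changes in demand, while a longer one smooths out month-to-month noise; which is better depends on the product.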
Transportation
Safety is a primary driver of data mining in the transportation industry. Cities and communities can conduct traffic studies to determine the busiest roads and intersections. And public transportation entities can mine data to understand their busiest zones and travel times.