Did you know that in just the last couple of years humans have accumulated more data than in all of previous history? This incredible amount of data is collected from all kinds of sources – sensors in cars and mobile phones, website usage statistics, shopping habits, product safety information and even things like historical weather data. All of this data is what has become known as big data.
Why is big data important?
Big data is slowly changing the way companies and other organizations work. From marketing campaigns to product safety and risk analysis – we increasingly rely on information discovered during data analysis.
Over the last few years, large companies have accumulated so much information about their customers that, without proper tools to make sense of it all, they cannot optimize their business processes any further. From customer support and user experience to customer retention and the success of new product launches – most major business decisions are becoming more and more reliant on data analysis and discovery.
Working with big data differs from regular data mining in several ways:
- Size. Numerous data sets are collected into a single database, making it extremely difficult to store and curate efficiently.
- Variety. Big data includes all kinds of data types – from sensor readings to plain text. All of this data must be stored and organized into a single data set that makes sense.
- Analysis and retrieval. Analyzing extremely large and complex data sets is not a simple task. Fast and efficient analysis, search, sharing and visualization of information are all very difficult at this scale.
If using big data is such a challenging task, why bother? Why not analyze multiple smaller and less complex data sets separately? The answer is very simple: by combining data sets of different types and sizes, we can find patterns and other information which would otherwise have been impossible to see.
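As a rough illustration of the point above, the following sketch joins two small data sets that say little on their own: daily sales and daily weather. All of the records, field names and the `units_by_weather` helper are hypothetical and only serve to show the idea of combining sources.

```python
from collections import defaultdict

# Hypothetical data set 1: units sold per day.
sales = [
    {"date": "2024-07-01", "units": 30},
    {"date": "2024-07-02", "units": 10},
    {"date": "2024-07-03", "units": 34},
    {"date": "2024-07-04", "units": 14},
]

# Hypothetical data set 2: weather condition per day.
weather = {
    "2024-07-01": "rain",
    "2024-07-02": "sun",
    "2024-07-03": "rain",
    "2024-07-04": "sun",
}


def units_by_weather(sales_rows, weather_by_date):
    """Join sales with weather on the date and average the units per condition."""
    grouped = defaultdict(list)
    for row in sales_rows:
        condition = weather_by_date.get(row["date"])
        if condition is None:
            continue  # no weather reading for that day, skip the row
        grouped[condition].append(row["units"])
    return {cond: sum(units) / len(units) for cond, units in grouped.items()}


averages = units_by_weather(sales, weather)
```

Neither list shows a pattern by itself; only the joined view suggests that, in this made-up sample, rainy days sell more units than sunny ones.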
The techniques used to analyze big data sets are very similar to those used in other data mining approaches. The only major difference is the machines performing the analysis: we need extremely fast and scalable systems in order to make big data analysis worthwhile (results should be returned instantly or almost instantly, otherwise, at least in most cases, they are not going to be very useful).
We have all read some article or other telling us how effective data mining is and how it can help your company increase sales, optimize the business and so on. But are there any real-life examples to put some weight behind this?
If you take a moment to look around, you will find a great many examples on the Internet. Last year we implemented data mining for a local customer, med24, who sells a heart defibrillator called Hjertestarter. The cost of producing these Hjertestarter units, shipping them and promoting them in various media more or less canceled out the profit from selling them. We started by installing data mining software on their server and loading all the data they had on the Hjertestarter into an optimized data warehouse. We then spent a few weeks letting the data mining tools go through the data and look for patterns.
After analysing the data we noticed a few things. The first thing that stood out was that a quarter of the places where they promoted the product never bought a single unit. In other words, they were wasting money shipping, stocking and promoting the product in those areas. So we stopped direct sales in these places and let customers there buy the product online only. We then took a closer look at the best-selling places and noticed that we couldn’t always deliver enough Hjertestarter units to meet the high demand there, so many of the units previously sent to the low-selling places were rerouted to the high-selling ones, eliminating the wait time. We could also see which kinds of marketing were working and which weren’t.
Based on all these findings we were able to improve the whole business by about 19%, turning break-even into a healthy profit.
The bottom line: get to know your customers, and know where and how to sell your product. Data mining is one way of getting this information if you don’t already have it.
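The first finding in the case study above (promoted locations that never bought a unit) is the kind of check that can be sketched in a few lines. This is not the tool used in the project; the location names, costs and field names below are invented purely for illustration.

```python
# Hypothetical per-location summary: promotion spend and units sold.
locations = [
    {"name": "North", "promo_cost": 1200.0, "units_sold": 0},
    {"name": "South", "promo_cost": 900.0, "units_sold": 45},
    {"name": "East", "promo_cost": 1500.0, "units_sold": 80},
    {"name": "West", "promo_cost": 700.0, "units_sold": 3},
]


def zero_sale_locations(rows):
    """Names of locations that were promoted but never bought a unit."""
    return [
        r["name"]
        for r in rows
        if r["promo_cost"] > 0 and r["units_sold"] == 0
    ]


def wasted_spend(rows):
    """Total promotion cost spent on locations with no sales at all."""
    return sum(
        r["promo_cost"]
        for r in rows
        if r["promo_cost"] > 0 and r["units_sold"] == 0
    )


def best_sellers(rows, top=2):
    """Location names ordered by units sold, highest first."""
    ranked = sorted(rows, key=lambda r: r["units_sold"], reverse=True)
    return [r["name"] for r in ranked[:top]]


dead = zero_sale_locations(locations)
wasted = wasted_spend(locations)
top_two = best_sellers(locations)
```

In this made-up sample one location out of four has no sales, and the ranking shows where rerouted stock would be most useful.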
As most of us already know, handling the large amounts of data in a data warehouse, commonly known as big data, has always been a problem, even with the powerful computers we use today. Most data mining software cannot handle big data in real time, which makes it unsuitable for applications such as business intelligence and decision support systems.
But more and more companies do need to process big data in real time. How do they solve this with the data mining software available today? The most common solution is to focus on the most relevant data: with large data volumes, often only a fraction of the data is needed for the decision support system, and the rest can be analysed over several days without harm. This requires a fairly advanced data mining solution that knows which parts to prioritise and which can wait. One company that has built a BI solution specifically for this big data problem is Targit; you can read more about their approach on their website. If you need more information than the website provides, send them an email describing what you need. They have great customer support and usually get back to you the same day.
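A minimal sketch of the prioritisation idea, assuming that "most relevant" simply means "most recent". That is only one possible rule, and it is not a description of how Targit or any other product decides. The record layout and the `hot_days` parameter are hypothetical.

```python
from datetime import date, timedelta


def split_by_priority(records, today, hot_days=30):
    """Split records into a 'hot' set for real-time analysis and a 'cold' set
    that can be processed later in batch.

    Assumption: relevance is approximated by recency (the last `hot_days` days).
    """
    cutoff = today - timedelta(days=hot_days)
    hot = [r for r in records if r["day"] >= cutoff]
    cold = [r for r in records if r["day"] < cutoff]
    return hot, cold


# Hypothetical sales records.
records = [
    {"day": date(2024, 6, 25), "amount": 120.0},
    {"day": date(2024, 6, 1), "amount": 80.0},
    {"day": date(2024, 1, 15), "amount": 300.0},
]

hot, cold = split_by_priority(records, today=date(2024, 6, 30))
```

The hot set would feed the decision support system directly, while the cold set could be queued for slower analysis over the following days.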
The other solution to the big data problem is much more expensive: store the data on several high-end servers set up as a parallel cluster, each handling its own separate part of the data warehouse, so that their combined power can process the large data volumes in real time. Even that isn’t always enough to display the data quickly enough for some companies. In those special cases fast access to all the data is essential, and even the best SSD (solid-state drive) isn’t always fast enough. Here you may need a RAM drive, which behaves like an SSD but stores data in high-speed memory such as DDR3. Such a solution is often very expensive, but it lets you analyse large amounts of data extremely quickly.
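The cluster approach described above follows a partition, process, combine pattern. The sketch below imitates it on a single machine, with worker threads standing in for the separate servers; a real cluster would involve networking and a distributed store that are not shown here. The function names and the `revenue` field are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts slices, dealing them out round-robin."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_total(rows):
    """Work done by one node: total revenue over its own slice only."""
    return sum(r["revenue"] for r in rows)


def parallel_total(rows, n_workers=4):
    """Aggregate every slice concurrently, then combine the partial results."""
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_total, parts))
    return sum(partials)


# Hypothetical fact table with 100 rows.
rows = [{"revenue": float(i)} for i in range(1, 101)]
total = parallel_total(rows)
```

Because each worker only touches its own slice, adding more workers (or servers) shortens the per-node workload, which is the property the cluster setup relies on.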
But for most companies a normal business intelligence solution is more than enough. In 99.9% of cases it will pull data from the data warehouse fast enough to display it in real time, and to drill down and compare data tables in a few clicks.
Data mining involves collecting, processing, storing and analyzing data in order to discover (and extract) new information from it. There are numerous benefits of data mining, but to understand them fully, you have to have some basic knowledge of what data mining actually is.
What is data mining?
Data mining techniques range from extremely complex to basic. Each technique serves a slightly different purpose or goal. In essence, data mining helps organizations analyze incredible amounts of data in order to detect common patterns or learn new things. It would be impossible to process all this data without automation. Here are a few example approaches to data mining:
- Cluster detection is a type of pattern recognition that groups similar records within large data sets. It’s a bit like arranging a large amount of information into categories using patterns which emerge during data analysis (and might not be very obvious).
- Anomaly detection aims to find abnormalities in data. This can be used in many areas, such as detecting anomalies in weather patterns or even forensic computing.
- Regression is a technique that aims to predict future outcomes using large sets of existing variables. This is used to predict future user engagement, customer retention and even property prices.
There are many other approaches to data mining. Ultimately, the technique that you choose will depend on your end goal and there is no single technique that covers every topic out there.
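To make the three approaches listed above a little more concrete, here is a small sketch of each using only the Python standard library. These are deliberately simple stand-ins (two-group k-means on single numbers, a z-score rule, and a straight-line least-squares fit), not the methods any particular product uses. All function names and sample values are made up, and each function assumes a non-empty list of numbers.

```python
from statistics import mean, pstdev


def two_clusters(values, rounds=10):
    """Cluster detection: one-dimensional k-means with two groups."""
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(rounds):
        a = [v for v in values if abs(v - low) <= abs(v - high)]
        b = [v for v in values if abs(v - low) > abs(v - high)]
        if not b:  # all values fall into one group
            break
        low, high = mean(a), mean(b)
    return sorted(a), sorted(b)


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: values more than `threshold` standard deviations
    away from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


groups = two_clusters([1, 2, 3, 10, 11, 12])
outliers = find_anomalies([10, 11, 9, 10, 12, 10, 11, 9, 10, 50])
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With the fitted `slope` and `intercept`, a prediction for a new `x` is `slope * x + intercept`, which is the basic mechanism behind forecasts such as future engagement or property prices.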
What are the benefits of data mining?
There are many benefits of data mining. For example:
- In finance and banking, data mining is used to create accurate risk models for loans and mortgages. It is also very helpful for detecting fraudulent transactions.
- In marketing, data mining techniques are used to improve conversions, increase customer satisfaction and create targeted advertising campaigns. They can even be used to analyze market needs and come up with ideas for completely new product lines. This is done by looking at historical sales and customer data and building prediction models.
- Retail stores use customer shopping habits/details to optimize the layout of their stores in order to improve customer experience and increase profits.
- Tax governing bodies use data mining techniques to detect fraudulent transactions and single out suspicious tax returns or other business documents.
- In manufacturing, data discovery is used to improve product safety, usability and comfort.
In essence, data mining benefits everyone: from individuals to large corporations and governments.