By Vaughn Stewart, Chief Evangelist at Pure Storage
It’s been described as Digital Exhaust – the small pieces of transactional information left behind by people, applications and interactions. George Dyson says it began when the cost of keeping information became less than the cost of throwing it away. Either way, the business case for Big Data is strong. By analysing vast tracts of information, it is possible to see correlations and patterns and, potentially, to make predictions.
The problem is, we aren’t there yet. People misunderstand what is needed to take a technological innovation – Big Data – and turn it into a business tool: something an organisation can use to further its cause. Big Data can be used to spot correlations, but, as the FT columnist Tim Harford has pointed out in writing about Digital Exhaust, it doesn’t provide the theory to test against the pattern, and it is easy to mistake correlation for causation. It would be easy to dismiss Big Data as hype, but, done right, it creates actual, tangible business benefits.
One of the most significant benefits is rapid business intelligence. As organisations gather more granular data on what they do, the potential to understand it and plan accordingly grows. Retailers we have helped with our All-Flash Arrays have hit this problem repeatedly: they know that valuable insight lies inside the sales and distribution data for each day’s trading, but because the batches take prohibitively long to process, they can’t extract that insight in time to act on it. The problem is common across industries: the scale of the data organisations handle is increasing rapidly, possibly far faster than their IT budgets can cope with. On top of increased scale comes significant complexity. The variety of datapoints collected by store card programmes, for example, offers the chance to understand shopper habits and overall buying patterns – but only if there is enough data processing power to sift insight from the data.
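As a toy illustration of the kind of per-day aggregation such batch jobs perform – with all store names, SKUs and figures entirely hypothetical – the core of the work is a simple group-and-summarise pass over the day’s transactions; the real difficulty described above is doing this at enormous scale, fast enough to act on:

```python
from collections import defaultdict

# Hypothetical sample of one day's transaction records (illustrative only).
transactions = [
    {"store": "Leeds", "sku": "A100", "qty": 3},
    {"store": "Leeds", "sku": "B200", "qty": 5},
    {"store": "York",  "sku": "A100", "qty": 2},
    {"store": "Leeds", "sku": "A100", "qty": 4},
]

# Aggregate units sold per (store, product).
totals = defaultdict(int)
for t in transactions:
    totals[(t["store"], t["sku"])] += t["qty"]

# Pick the best-selling product in each store.
best = {}
for (store, sku), qty in totals.items():
    if store not in best or qty > best[store][1]:
        best[store] = (sku, qty)

print(best)  # e.g. Leeds: ('A100', 7), York: ('A100', 2)
```

At retail scale the same logic runs over billions of rows, which is where storage throughput, not the query itself, becomes the bottleneck.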
The problem is that, while the software and compute elements of a company’s data warehouse may well be up to the task, the last bit of the data centre that still physically moves – the hard disks – simply can’t keep up without huge injections of cash. It shouldn’t be like this, but, thanks to the dominance of a few players in the storage industry, that’s how it’s been for years.
One of the biggest problems the storage industry faces is that established vendors lock customers into three- or four-year upgrade and maintenance cycles that are only good for those vendors’ bank balances. They’re certainly no good for the customer.
The storage component of Big Data simply isn’t working as hard as the rest of the components – unless serious amounts of cash are spent on it. It shouldn’t be this way and, thanks to the disruption that a combination of good software and Flash storage provides, it needn’t be.
Companies like the retailers I described above are used to running batches and querying large chunks of data. The next logical step, as the data they can query grows to enormous proportions, is to look at a Big Data deployment. Understanding that it needn’t involve prohibitively expensive, slow disk arrays that suck the budget out of such a deployment is a vital first step towards realising the business benefits.