17/07/2014

By Karen Padmore, Operations Director at High Performance Computing (HPC) Wales

‘Big Data’ is a term popularly used to describe the exponential growth and availability of data in recent years. The datasets involved range in size across sectors from a few dozen terabytes to multiple petabytes (a petabyte is a thousand terabytes).

Datasets generally fall into two categories: structured data, which is held in a format that is straightforward to analyse, such as a spreadsheet, application or database; and unstructured data, which is in a raw format, such as an article, Google searches or infographics.

To give a picture of how much data is being generated, the International Data Corporation (IDC) estimated that 2.7 billion terabytes of data were created worldwide in 2012, a figure that is growing rapidly year on year, with particularly strong growth in unstructured data. To put that into perspective, here is an overview of one minute of online activity that can be analysed for business purposes:

• Two million Google searches
• 685,000 Facebook updates
• 200 million emails sent
• 48 hours’ worth of video content uploaded to YouTube

Companies such as Amazon and Tesco have been harnessing Big Data for some time now. These firms gather vast amounts of data on customers: not only what they’ve purchased, but also which websites they visit, where they live, when they’ve contacted customer service, and whether they interact with the brand on social media. To maximise profits, companies are using Big Data analytics to identify the right products for the right customers and to market them through the right channels.

Amazon long ago mastered the recommendation of books, toys and kitchen utensils that its customers might be interested in. Other companies have followed suit, recommending music on Spotify, films and TV programmes on Netflix, and Pins on Pinterest, to name a few.

A recent IDC study forecasts that the Big Data technology and services market will grow at a 27% compound annual growth rate (CAGR) to reach $32.4 billion by 2017, about six times the growth rate of the overall information and communication technology (ICT) market.

To many small and medium-sized enterprises (SMEs), however, the potential of analysing this vast amount of data, from customer purchase records to online content and even social media, may seem out of reach. So how can they harness the potential of Big Data and avoid falling behind their competitors?

To analyse large datasets, growing numbers of SMEs are turning to the power of supercomputing to provide answers quickly and effectively. Applications that can be run on supercomputers range from predictive analytics, social media analytics and text analytics to disease detection, prevention and treatment, financial modelling and smart energy metering.
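
To make ‘predictive analytics’ a little more concrete, here is a minimal sketch in Python using the widely used scikit-learn library. The file and column names are hypothetical, and a real project would involve far more data preparation and validation; the point is simply to show the shape of such a workload.

```python
# A minimal, hypothetical sketch of a predictive-analytics workload:
# predicting whether a customer will buy again from past purchase records.
# The CSV file and column names are illustrative, not a real dataset.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load historical customer records (hypothetical file and columns)
records = pd.read_csv("customer_records.csv")
features = records[["orders_last_year", "avg_basket_value", "days_since_last_order"]]
target = records["purchased_again"]

# Hold back some data to test the model on customers it has not seen
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2)

# Train a model and check how well it predicts repeat purchases
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```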

Also known as high performance computers, supercomputers can be thought of as large collections of individual computers connected together, working in parallel on a single problem. These supercomputers are capable of performing complex and high-volume calculations and simulations at top speeds.
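
The ‘working in parallel on a single problem’ idea can be illustrated in miniature on an ordinary computer. The Python sketch below splits one counting job across four worker processes and combines the partial results; the file names are hypothetical, and a supercomputer applies the same principle across thousands of processors using job schedulers and libraries such as MPI.

```python
# Minimal sketch: split one job (counting keyword mentions in many files)
# across several worker processes, then combine the partial results.
# File names are illustrative; a real cluster spreads this across many machines.
from multiprocessing import Pool

def count_mentions(filename):
    """Count how many lines in one file mention 'product'."""
    with open(filename) as f:
        return sum(1 for line in f if "product" in line.lower())

if __name__ == "__main__":
    files = ["reviews_part1.txt", "reviews_part2.txt",
             "reviews_part3.txt", "reviews_part4.txt"]
    with Pool(processes=4) as pool:          # four workers, one chunk each
        partial_counts = pool.map(count_mentions, files)
    print("Total mentions:", sum(partial_counts))
```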

SMEs can connect to advanced supercomputing technology remotely through a simplified web browser interface, without having to purchase and house the technology themselves. This considerably increases a business’s output, while saving on the cost of investing in new hardware and software. It also helps avoid the energy costs associated with keeping in-house machines running data-intensive analysis for days, or even weeks. Thanks to the power of supercomputing, users can process vast amounts of data at lightning speed, greatly reducing the time taken to generate meaningful results.

HPC Wales’ supercomputing network can run 320 trillion operations per second. To give you an idea of just how fast this is, imagine that all seven billion people in the world have a calculator and are asked to perform one calculation per second, twenty-four hours a day, non-stop. It would take the world’s population around thirteen hours to do what our supercomputing network can do in just one second. If you think of a regular computer as travelling as fast as a snail, a supercomputer is a jet aircraft.
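
For anyone who wants to check the arithmetic behind that comparison, here it is spelled out in a few lines of Python:

```python
# Worked check of the calculator comparison above.
ops_per_second = 320e12      # 320 trillion operations per second
people = 7e9                 # seven billion people, one calculation each per second
seconds_needed = ops_per_second / people
hours_needed = seconds_needed / 3600
print(round(hours_needed, 1))  # roughly 12.7, i.e. around thirteen hours
```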

A firm already using supercomputing to power its Big Data analysis is Butterfly Projects, which has previously provided Big Data and predictive analytics services to the likes of Lloyds Banking Group and Zurich Insurance. Before accessing supercomputing technology, the company was restricted to processing up to 120 million records, and its statistical model-building machines had to run for at least 12 hours overnight to complete this processing.

Working with Big Data on a supercomputer now means that the firm can process terabyte-scale data and achieve a significant time saving over what could be done in-house. It no longer needs to limit itself by the size of its data and can increase the complexity and volume of its workload, competing for larger and more competitive contracts.

There is an endless supply of data and a whole world of processing power at the fingertips of businesses of all sizes, and firms in any sector can realise the potential of Big Data to super-power their growth.