Tuesday, May 8, 2012

What is Big Data?

Big Data is a wonderful term, but what does it mean? Many IT professionals will say that they have been building systems that capture, analyse and present data, often in very "Big" quantities, for years. So what's new? I think the most useful definition of a Big Data system is a very pragmatic one: you need a Big Data system to capture, analyse and present information when traditional systems built around a relational database cannot meet your Users' requirements. Big Data is such a hot topic right now because so many businesses are running up against these limits and being forced to look beyond them. Why might this be? In my opinion there are three key drivers, and we will look at each in much more detail over the course of this blog:
  1. First and foremost is time. Time comes in two flavours: Latency (how long it takes to get your data) and what I'll call "Freshness" (how up to date the data is when you get it). If you cannot meet your Users' time expectations with traditional technologies, then you are in the realm of Big Data. These expectations are becoming more demanding, while data volume, which always costs time, is growing by orders of magnitude. We will talk a lot more about time in this blog.
  2. Second is flexibility. The need to be flexible and adaptable can overload a traditional approach to data systems in three ways: Variety of Sources, Variety of Structures and Future Uncertainty. Big Data systems are commonly required to collect their data from an enormous variety of sources, and that data arrives organised in a similarly wide variety of structures, or "schemas". While it is possible to devise ingenious ways to get data from many sources into relational databases, these databases do not take kindly to data that does not conform to a predefined schema. Unfortunately, Big Data systems often have to accept data in a huge variety of structures, and must also cope with new, previously unanticipated data structures and analysis requirements as time goes on.
  3. Third is complexity. By this I mean the ability to model complex real-world behaviour and perform complex analyses, often in real time. This is not talked about much in the world of Big Data at the moment, but I believe it will become more and more important as the applications of these systems grow more sophisticated. Again, it is not something that relational database technology, with its focus on selecting lists of records and performing relatively simple aggregations (average, max, count, etc.), is ideally suited for.
So, in summary, Big Data systems are by definition different in kind from 'traditional', relational-database-centric systems. They are driven to be so by User requirements that, beyond certain limits of volume, structure and sophistication, cannot be delivered by this 'traditional' technology. So if you are an IT professional and your User or Customer is asking for something you cannot achieve with your existing toolset, it is not because they are wrong; it's because they are asking for Big Data!
