Whenever I speak to people about data, wherever they are in their data journey, whether that is head down in a major program built on Hadoop technologies or just dipping a toe into the ‘big data’ waters (pun intended), the impression I get most of the time is that they think big data only means Hadoop.
I hold my hands up: I’ve probably been guilty of contributing to that message. However, in my opinion, we shouldn’t be thinking like that. Big data (and I mean the volume) is just another data challenge, but then so are fast data, poor quality data, exploitation of data and so on. And those are just the challenges most directly linked to data; the list ignores the ones we have around process, organization, delivery, governance etc.
When I think about ‘big data’, I think of it more in those terms: as a set of challenges, or perhaps opportunities, that we have to find solutions for. Of course, not every organization will be facing the same challenges, so our solutions and thinking need to reflect that. Sometimes I think that gets lost in conversation!
We often talk about the democratisation of data, making it available to all, and of course we see examples of that today both in business and at home. Wearables like Fitbit, smart meters, the plethora of apps and connected cars (albeit not my car!) are all making it more commonplace. Big data solutions, in the general sense, are playing a part in supporting those things, but that doesn’t just mean Hadoop; it means Hadoop plus whatever else is needed to deliver the data to the right place at the right time: in-memory, a data warehouse, appliances, data marts, discovery tools and so on.
At Capgemini, our Business Data Lake (BDL) solution has been crafted to recognize the fact that in a big data world, we need to be agile, not rigid, in providing solutions to data problems. Yes, it needs to cope with big/volume data challenges, but it also has to cope with fast data, poor quality data and all of the ‘traditional’ data demands we’ve been dealing with for years. And that’s why for me, someone who has been doing BI for 20 years, the BDL just works.
So no, big data isn’t just about Hadoop...it’s that plus so much more.