Big data lakes? Too many ponds, that’s the problem

In recent chats I had with end users and vendors I found a common pattern that made me think about Big Data analytics and how data is collected, organized and analysed in many organisations. This is also, I think, an explanation for the slow growth of some Big Data companies and slower than expected ROI in some Big Data …


Enrico, a very informative approach to tackling Big Data. With the explosion of big data, companies face data challenges in three different areas. First, you know the type of results you want from your data, but they are computationally difficult to obtain. Second, you know the questions to ask but struggle with the answers and need data mining to help find them. Third, there is data exploration, where you need to reveal the unknowns and look through the data for patterns and hidden relationships. The open source HPCC Systems big data processing platform can help companies with these challenges by deriving insights from massive data sets quickly and simply. Designed by data scientists, it is a complete integrated solution, from data ingestion and data processing to data delivery. Its built-in Machine Learning Library and Matrix processing algorithms can assist with business intelligence and predictive analytics. More at


Not as easy as it looks

Big data, to start with, is actually a mirage, and that is the real reason why the companies that bet on this horse are having issues.

Most of the companies running closed solutions don't have big data requirements. Even with very big implementations for several customers, the majority don't even have DBs, or anything else, bigger than 1TB, and that isn't big data. Just name them: Oracle, SAP, etc., etc.

Big data systems and the concept itself were actually designed for a really small set of entities that need that level of efficiency: say Google, big scientific projects at pharmaceutical companies, CERN, ESA, NASA, etc.

The data that is really growing in all companies is of other types: email systems, shared drives full of PDFs, Word documents, images... For these purposes the big data concept is useless.

Additionally, one thing that often makes it harder is legal requirements. Multinational companies, which are usually the ones with the most data in the enterprise world, face all types of issues: finance documents have to be stored for 10 years, and in some countries travel expenses and other sensitive employee data can't leave the country... This means you often can't centralize things in one system alone; most likely you'll end up with several smaller implementations, hopefully all standardized. Last but not least... containers. Yes, very good for apps and a bunch of other stuff, but let's face it, finance, HR, CRM and ERPs in general will always be dominated by the big sharks, and containers very often aren't a solution.

In summary, there are no magic recipes. Learn your environment first, check the legal requirements and how much data you actually have to consider (a few TBs these days isn't big data), and be ready to make compromises. There are no perfect solutions; something will always be compromised, and you just need to decide what is really important to you and your company.

@sdunga Re: Not as easy as it looks

Big data is extremely easy if you know what you are doing.

Even if your pointy haired business sponsors do not.

The trouble starts when you don't hire the really good experts who actually know what they are talking about. (Most don't work for the vendors but with the vendors.)

The hardest part about it isn't the technology, but the paradigm shift and what it means for the business.

There's more to it, but to really "get it" you need to be a RAD developer and be able to deal with environments where change is the only constant.

The more you learn about big data, the more you realize what you don't know and that there are more ways to take advantage of it.

The key is getting the right people and the right amount of alcohol and food in a room where the walls are all white boards. ;-)

Posted anon because I know more than I can legally talk about.


Re: @sdunga Not as easy as it looks

You just pointed out some of the reasons I gave for why it isn't as simple as it seems:

1 - Most people (business decision-makers) don't know what they are doing or asking for. Big data looks like a trendy thing and everyone wants it, regardless of whether they need it or not.

2 - Getting the right guys for the job... that is the biggest problem of all. Worldwide there is a huge deficit of experienced engineers/analysts across areas... there are more positions than the market was ready for, and companies end up filling them with anyone off the street, because if they don't, they lose the headcount for the project (welcome to the enterprise world). Try to hire 10 good people in the area who tick all the boxes you want and let me know how long it took you to hire them.

3 - Even if you are OK on points 1 and 2, you often have to run parallel platforms, migrate things, fine-tune, etc... That takes time and resources, both human and material. Resources mean money flying out of the bank account, and depending on how complex the environment is, getting everything done can quite easily be a 1 to 2 year project in the best case. Most businesses rotate their management every year, which is highly disruptive for long-term projects and often generates friction in the business.

And I could point out a number of other reasons. The key point of my comment is simple: big data is awesome, the same way a Ferrari or a Bugatti Veyron is, but do I need it?

Maybe a VW Golf does the job and I will spend way less money.

Business is all about budget efficiency, like it or not, so if you can't deliver results within 12 months... most likely you're screwed. Remember that you live in a world that finds it normal to fire the founder of a company just because he doesn't care about immediate financial results, and IT history is full of examples.

Bring the alcohol and the white boards and good luck :)


@Pomeroy, you varnish-sniffing programming type...

You sir would do well in Big Data.

The simple truth is that the path to a solution, or to getting the most out of big data, is not always direct or linear.

For those of us with ADD or ADHD, besides copious amounts of caffeine, Adderall or (modafinil and dopamine), you need a certain flexible way of looking at problems.

While you said something in jest... there is some truth to the mindset. ;-)


