The avalanche of business data created each day surpasses anything we have dealt with in the past. In fact, most organizations do not have the tools to manage these growing pools of data, and as a result they are letting significant competitive advantages slip away.
A data lake can be defined as an environment where a data warehouse resides within Hadoop. The idea is to manage unstructured information more efficiently. The trade-off is that organizations taking the data lake approach put all of their eggs in one basket, which brings a number of potential risks and heavier administration requirements. As a result, whether to adopt a data lake could be one of the most critical decisions Hadoop users make in 2015.
Relying on a single platform may well prove counterproductive, and there are other issues that will make IT think twice before adopting a data lake environment.
Despite the replication capabilities of the Hadoop Distributed File System (HDFS), enterprise-grade data protection features are not yet fully available in off-the-shelf Hadoop distributions. Beyond replication, enterprise IT teams also need snapshots, backup, and disaster recovery capabilities.
For more information, read the full article, “Why ‘Data Lakes’ May Create Drowning Risks,” by Lionel Gibbons, VP Marketing, Bright Computing.