Report: Extending Hadoop Towards the Data Lake

Extending Hadoop Towards the Data Lake by Paul Miller:
The data lake has become an increasingly important part of Hadoop’s appeal. Referred to in some contexts as an “enterprise data hub,” it now garners interest not only from Hadoop’s existing adopters but also from a far broader set of potential beneficiaries. The vision of a single, comprehensive pool of data, managed by Hadoop and accessed as required by diverse applications such as Spark, Storm, and Hive, offers opportunities to reduce duplication of data, increase efficiency, and create an environment in which data from very different sources can meaningfully be analyzed together.
Fully embracing the opportunity promised by a comprehensive data lake requires a shift in attitude and careful integration with the existing systems and workflows that Hadoop often augments rather than replaces. Existing enterprise concerns about governance and security will certainly not disappear, so suitable workflows must be developed to safeguard data while making it available for newly feasible forms of analysis.
Early adopters in a range of industries are already finding ways to exploit the potential of their data lakes, operationalizing internal analytic processes and integrating rich real-time analyses with more established batch processing tasks. They are integrating Hadoop into existing organizational workflows and addressing challenges around the completeness, cleanliness, validity, and protection of their data.
In this report, we explore a number of the key issues that have frequently proved significant in these successful data lake implementations.

Basho, creator of NoSQL Riak database, raises $25M

Basho, the company behind the Riak key-value database and Riak CS cloud-storage system, has raised a $25 million series G round of venture capital led by Georgetown Partners. The company has now raised nearly $60 million in a combination of equity and debt financing since it was founded in 2008.

Basho is among a handful of companies, including MongoDB, DataStax and Couchbase, that seem to have garnered some real traction in the NoSQL space over the past few years. Riak, its flagship open source database, competes most directly against Cassandra, around which DataStax was built. Basho released its Riak CS storage system in 2012 to help users build distributed object stores a la Amazon Web Services’ S3 or OpenStack Swift.

Although it has raised much less capital than its NoSQL peers (MongoDB, for example, just announced an $80 million round on top of the $150 million it closed in October 2013) and went through a major executive shakeup in 2014, replacing both its CEO and CTO, Basho claims it’s doing just fine. In an interview on Monday, new CEO Adam Wray cited an 89 percent annual increase in bookings, tens of millions in annual revenue and accounts at some of the world’s largest companies.

Big data, the internet of things and hybrid cloud computing environments are driving many of Basho’s deployments, he added.

Assuming the market for non-relational databases keeps growing as many expect (“One day, we’ll be a $50 billion market space,” Wray said), there’s no reason it can’t support a handful of successful companies. Riak might never have the user base of MongoDB or the webscale reputation of Cassandra, but if the company can get its act together operationally and the technology remains solid, there should be plenty of business to go around.

And if a large software vendor goes shopping for NoSQL software, Basho will likely have a much more palatable price tag than the other big-name options.

NoSQL company Basho loses CEO and CTO

Basho, a NoSQL startup whose Riak database competes against the likes of Cassandra in scale-out environments, has lost its CEO Greg Collins, CTO Justin Sheehy and Chief Architect Andy Gross. In an interview with The Register, Sheehy said the departures aren’t as bad as they look and that the company is in good hands. Perhaps, although whoever replaces Collins will be the company’s fourth CEO since it was founded in 2007, and neither of the company’s co-founders remains. Basho has raised more than $31 million in venture capital, with its last funding round of $11.1 million coming in July 2012.

Getting beyond the cult of big data

Asking how something is better than Hadoop is not the right question. For strategic thinking around big data, companies need to figure out what they want to achieve, not what tool to use.

NoSQL startup Basho raises $11.1M and storms Japan

Basho Technologies, the company behind the Riak NoSQL database and the Riak CS cloud storage platform, has raised $11.1 million and has entered into a partnership with data center provider IDC Frontier to distribute its technology throughout Japan.

Basho arms would-be Amazon killers with AWS-compatible storage

Basho, a startup that’s already jumped into the NoSQL database deep end, just released cloud-based storage services built on that NoSQL foundation. Expect the competition to be fierce: Riak CS joins a huge pool of cloud storage offerings from vendors ranging from Amazon to Zettanet.

How NoSQL database Riak makes Bump work

Database startup Basho on Tuesday released details of how its Riak NoSQL database underpins Bump. Bump is the seventh most-downloaded free iPhone app of all time — with more than 80 million downloads — so it has a lot of data to store and transfer.

The NoSQL tapes and documenting a technical movement

Databases aren’t sexy, except perhaps for a brief moment in 2010 and a bit of 2011, when every reader of Hacker News was sharing his or her experience and every coder on GitHub wanted to know more. The NoSQL Tapes captures this moment.