Report: Bringing Hadoop to the mainframe

Our library of 1,700 research reports is available only to our subscribers. We occasionally release reports for our broader audience to benefit from; this is one such report. If you would like access to our entire library, please subscribe here. Subscribers will have access to our 2017 editorial calendar, archived reports, and video coverage from our 2016 and 2017 events.
Bringing Hadoop to the mainframe by Paul Miller:
According to market leader IBM, there is still plenty of work for mainframe computers to do. Indeed, the company frequently cites figures indicating that 60 percent or more of global enterprise transactions are currently undertaken on mainframes built by IBM and its remaining competitors, such as Bull, Fujitsu, Hitachi, and Unisys. The figures suggest that a wealth of data is stored and processed on these machines, but as businesses around the world increasingly turn to clusters of commodity servers running Hadoop to analyze the bulk of their data, the cost and time typically involved in extracting data from mainframe-based applications become a cause for concern.
By finding more effective ways to bring mainframe-hosted data and Hadoop-powered analysis closer together, the mainframe-using enterprise stands to benefit from both its existing investment in mainframe infrastructure and the speed and cost-effectiveness of modern data analytics, without necessarily resorting to relatively slow and resource-expensive extract, transform, and load (ETL) processes that endlessly move data back and forth between discrete systems.
To read the full report, click here.

Pinterest is experimenting with MemSQL for real-time data analytics

In a blog post on Wednesday, Pinterest shed more light on how the social scrapbook and visual-discovery service analyzes data in real time, revealing details about how it’s exploring a combination of MemSQL and Spark Streaming to improve the process.

Currently, Pinterest uses a custom-built log-collecting agent dubbed Singer, which the company attaches to all of its application servers. Singer collects the application log files and, with the help of the real-time messaging framework Apache Kafka, transfers that data to Storm, Spark, and other “custom built log readers” that “process these events in real-time.”
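The post doesn’t show what those log readers look like inside, but as a minimal sketch, a consumer on the Kafka side might resemble the following. The broker address, group id, and app_logs topic name are all assumptions for illustration, not details from Pinterest:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.jdk.CollectionConverters._

object LogReaderSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker address, group id, and topic name are hypothetical placeholders.
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "log-reader")
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("app_logs"))

    // Poll Kafka in a loop and hand each log line to downstream processing.
    while (true) {
      val records = consumer.poll(Duration.ofMillis(500))
      for (record <- records.asScala)
        processEvent(record.value())
    }
  }

  // Stand-in for whatever Storm, Spark, or a custom reader would actually do.
  def processEvent(logLine: String): Unit = println(logLine)
}
```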

Pinterest also uses its own log-persistence service called Secor to read that log data moving through Kafka and then write it to Amazon S3, after which Pinterest’s “self-serve big data platform loads the data from S3 into many different Hadoop clusters for batch processing,” the blog post stated.
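Secor’s internals aren’t covered in the post either, but the general pattern it implements (buffering messages off Kafka and persisting each batch to S3 under a key derived from the topic, partition, and offset) can be sketched roughly like this; the bucket name and key layout below are hypothetical, not Secor’s actual scheme:

```scala
import com.amazonaws.services.s3.AmazonS3ClientBuilder

object S3PersisterSketch {
  // A default client picks up region and credentials from the environment.
  private val s3 = AmazonS3ClientBuilder.defaultClient()

  // Write one batch of log lines to S3. The key layout is a made-up example
  // of offset-based naming, not Secor's real naming scheme.
  def persistBatch(topic: String, partition: Int, startOffset: Long,
                   lines: Seq[String]): Unit = {
    val key = s"raw_logs/$topic/$partition/$startOffset"
    s3.putObject("my-log-bucket", key, lines.mkString("\n"))
  }
}
```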

Although this current system seems to be working decently for Pinterest, the company is also exploring how it can use MemSQL to help when people need to query the data in real time. So far, the Pinterest team has developed a prototype of a real-time data pipeline that uses Spark Streaming to pass data into MemSQL.

Here’s what this prototype looks like:

[Figure: Pinterest real-time analytics prototype]

In this prototype, Pinterest uses Spark Streaming to pass the data related to each pin (along with geolocation information and the category the pin belongs to) to MemSQL, where the data is then available to be queried.
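Pinterest hasn’t published the prototype’s code, but the write path it describes can be sketched with Spark Streaming’s foreachRDD hook and a plain JDBC connection, since MemSQL speaks the MySQL wire protocol. The event schema, table name, and connection details below are assumptions for illustration:

```scala
import java.sql.DriverManager
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical shape of a pin event; Pinterest's actual schema isn't public.
case class PinEvent(pinId: Long, category: String, lat: Double, lon: Double)

object PinsToMemSQL {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("pins-to-memsql")
    val ssc = new StreamingContext(conf, Seconds(1))

    // In the real pipeline this stream would come from Kafka; a socket
    // source behaves the same way for the write path shown here.
    val pins = ssc.socketTextStream("localhost", 9999).map(parsePin)

    pins.foreachRDD { rdd =>
      rdd.foreachPartition { events =>
        // One JDBC connection per partition; MemSQL is MySQL-compatible,
        // so the stock MySQL JDBC driver works.
        val conn = DriverManager.getConnection(
          "jdbc:mysql://memsql-host:3306/analytics", "user", "password")
        val stmt = conn.prepareStatement(
          "INSERT INTO pin_events (pin_id, category, lat, lon) VALUES (?, ?, ?, ?)")
        events.foreach { e =>
          stmt.setLong(1, e.pinId)
          stmt.setString(2, e.category)
          stmt.setDouble(3, e.lat)
          stmt.setDouble(4, e.lon)
          stmt.executeUpdate()
        }
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // Toy parser: expects "pinId,category,lat,lon" lines.
  def parsePin(line: String): PinEvent = {
    val Array(id, cat, lat, lon) = line.split(",")
    PinEvent(id.toLong, cat, lat.toDouble, lon.toDouble)
  }
}
```

Once rows land in MemSQL this way, they are immediately visible to ordinary SQL queries, which is what makes the pipeline useful for real-time analysis.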

For analysts who understand SQL, the prototype could be useful as a way to analyze data in real time using a mainstream query language.

Hitachi Data Systems to buy Pentaho for $500-$600M

Storage vendor Hitachi Data Systems is set to buy analytics company Pentaho at a price rumored to be between $500 million and $600 million (closer to $500 million, from what I’ve heard). It’s an interesting deal because of its size and because Hitachi wants to move beyond enterprise storage and into analytics for the internet of things. Pentaho sells business intelligence software and can transform data, even big data from stream-processing engines, for real-time analysis. According to a Hitachi press release, “The result will be unique, comprehensive solutions to address specific challenges through a shared analytics platform.”

Uber’s first test of crisis surge cap went unnoticed in October

All eyes are on New York, where along with a massive incoming storm, Uber is rolling out its emergency surge pricing cap. On Monday, there was a flurry of coverage by media outlets from Bloomberg to Time, with some saying this marks “a chance for Uber Technologies Inc. to show it has learned from past mistakes.”

But this isn’t the first time Uber has capped surge pricing during a state of emergency — it’s the second.

According to a source familiar with the testing, Uber used its new surge price capping system in October during Hurricane Ana in Hawaii, which appears to have gone unreported by media. The company didn’t make a fuss about the development, choosing to introduce the system quietly and without scrutiny. Although Hawaii declared a state of emergency during that time, Hurricane Ana didn’t cause much damage.

Here’s how Uber calculates surge pricing in states of emergency: it chooses the fourth-highest surge rate from the 60 days prior and makes that the capped rate for the storm. The top three surge rates from the prior two months are ignored, in hopes of keeping the fare reasonable. It’s not clear why Uber won’t just cap surge at a designated amount, like 2x. During this time, the company will donate all of its revenue, which is 20 percent of each ride, to the American Red Cross.
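In other words, the cap is just an order statistic over recent surge multipliers: sort them, discard the top three, and take the next one. A minimal sketch of that calculation, using made-up numbers:

```scala
object SurgeCapSketch {
  // Cap = fourth-highest surge multiplier observed in the prior 60 days,
  // i.e., drop the top three and take the next one (per the policy above).
  def emergencyCap(last60DaysSurges: Seq[Double]): Double =
    last60DaysSurges.sorted(Ordering[Double].reverse).drop(3).head

  def main(args: Array[String]): Unit = {
    val recentSurges = Seq(1.0, 2.9, 1.5, 3.5, 2.1, 4.0, 1.8) // made-up data
    // Prints 2.1: the top three rates (4.0, 3.5, 2.9) are ignored.
    println(emergencyCap(recentSurges))
  }
}
```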

Uber announced the new emergency surge pricing policy in July, in response to Tropical Storm Arthur, which hit the East Coast. But according to an SF Examiner story, the pricing cap never went into effect then because a state of emergency was never declared during the storm. Hawaii’s Hurricane Ana provided its first test in October, but the New York blizzard will be its biggest.

The capped fare for New York’s upcoming blizzard Juno comes after the state’s attorney general penned a New York Times op-ed shaming Uber for what he called “price gouging” in the wake of surge pricing during Hurricane Sandy. The blowback against Uber surge pricing during times of crisis stretches across the globe, with a recent outcry notably occurring after a hostage situation in Sydney. During instances like these, Uber has initially repeated the company line that surge pricing gets more drivers on the road at times they might not otherwise drive.

This is true, but it doesn’t resolve the ethical quandary of leaving those who can’t afford surge pricing in a potentially dangerous situation. The recurring outcry appears to have prompted Uber to have a change of heart.

Hortonworks has big plans to make Storm work for the enterprise

Hortonworks is working to integrate the Storm stream-processing engine with its Hadoop distro, and hopes to have it ready for enterprise apps within a year’s time. It’s the latest non-batch functionality for Hadoop thanks to YARN, which lets Hadoop run all sorts of processing frameworks.