Need to wrangle SQL, NoSQL data? Espresso Logic says it can help

Espresso Logic, which offers a backend service to help businesses connect applications with SQL data sources, is adding NoSQL to the mix with new support for MongoDB — as well as support for Salesforce.com and Microsoft Dynamics business applications coming soon.

The company says its service makes it easier to create RESTful APIs that facilitate data flow from repository to applications. REST, short for representational state transfer, has become something of a lingua franca for connecting disparate applications.
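
To make the idea concrete, here is a minimal sketch, in plain Python, of the kind of mapping a REST layer performs: a GET on a resource path is resolved against a backing store and returned as JSON. The `/customers` resource and the record contents are hypothetical, not part of Espresso Logic’s actual API.

```python
import json

# Toy in-memory "repository" standing in for a SQL table.
CUSTOMERS = {1: {"id": 1, "name": "Acme Corp", "balance": 1200}}

def handle_get(path):
    """Resolve a REST-style GET like /customers/1 against the repository."""
    _, resource, key = path.split("/")
    if resource != "customers":
        return 404, json.dumps({"error": "not found"})
    record = CUSTOMERS.get(int(key))
    if record is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(record)

status, body = handle_get("/customers/1")
```

Because every resource is addressed the same way (a path, a verb, a JSON body), a client that speaks this convention can talk to any REST backend, which is what makes it a lingua franca.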

Espresso Logic CEO R. Paul Singh said the product, with its reliance on reactive programming, lets non-programmers accelerate development, connecting apps by clicking, dragging and dropping, or perhaps by writing a few lines of code.

That promised ease of use appealed to Bill Kuklinski, director of systems development for Boston’s Joslin Diabetes Center. His group doesn’t have the IT and programming resources needed to integrate applications by hand. Like many organizations, Joslin runs many legacy applications that are treasure troves of data needed by other applications.

And the need to let patients funnel readings from their glucose monitors into the center’s system means that data has to traverse organizational walls. Navigating an array of in-house healthcare apps alongside those built more with a consumer in mind is tricky.

“Everyone has a unique set of problems they want to report on and unique analytics and that requires custom development. If you can’t get the data from those data sources, every vendor’s answer is ‘here’s our API,'” Kuklinski, an Espresso Logic customer, added.

Supporting REST makes life easier because the business doesn’t have to support a zillion different APIs.

The so-called API economy has led to the rise of API-management companies such as Apigee, which just added new analytics services. Espresso Logic competes with backend services platforms such as StrongLoop, which recently announced a life-cycle management tool for Node.js-centric REST APIs; Kinvey; and DreamFactory.

Gigaom Research analyst Rich Morrow agreed that it’s important to support the right APIs, but there’s more blocking and tackling to be done. “Exposing your datastore to mobile endpoints via an API is really powerful, but looks way more easy than it is — you’ve got to build accessibility, security, access controls, management and extensibility. It makes way more sense for most organizations to buy the capability rather than build it themselves,” he said.


MongoDB snaps up WiredTiger as new storage engine option

NoSQL fan favorite MongoDB has purchased WiredTiger and its storage engine technology, and as part of the deal snags itself some database stars in Keith Bostic and Dr. Michael Cahill.

Bostic was co-founder of Sleepycat Software and creator and primary developer of Berkeley DB, an open-source embedded database. Oracle bought Sleepycat in 2006. Bostic worked with Cahill to architect Berkeley DB. Terms of this acquisition were not disclosed.

WiredTiger is a storage engine that will be offered as an option in the upcoming MongoDB 2.8 release, expected in January. It will be the first time that MongoDB has offered more than one storage engine option, said Kelly Stirman, director of product marketing for New York–based MongoDB.

Customers with write-intensive applications may want to opt for WiredTiger, while MongoDB’s existing MMAP engine is probably better suited for read applications, he noted.

In the past, some databases offered a range of storage options to fit the job at hand. MySQL, now also part of Oracle, is the best example.

MongoDB will not charge separately for the WiredTiger engine. “It’s now part of our open-source commitment,” Stirman said. “Everyone will have access and there will be an upgrade path without downtime.”

Want to make data scientist money? Learn data science tools

O’Reilly Media released the results of its second annual data science salary survey on Thursday (available as a free download), and the results were not too surprising. Essentially, it shows that people who work with tools designed for big data, machine learning, statistical computing and cloud computing make more money — often between $20,000 and $30,000 more a year, based on median incomes — than people whose jobs only involve tools such as SQL and Excel.

In that regard, the survey doesn’t really tell us anything new. All the talk over the past couple years about competitive recruitment and high salaries for data scientists was true, and it was true precisely because companies want the people who know how to work with new technologies. They want this decade’s data scientists — people who can build AI systems or pipelines for streaming sensor data — not last decade’s data analysts.

The survey has all sorts of interesting findings about how much people earn based on tools or combinations of tools they use, but two charts probably sum it up the best. The first shows the gap in median incomes between people who use Hadoop and people who do not.

The second shows how use of bigger, faster and sometimes more advanced tools such as HBase, Storm and Spark increases median salaries even more.

People can dismiss buzzwords like big data and data science all they want, but they’re tied to some very real and very powerful technologies. And apparently a lot more money, too.

It’s a good thing companies are still hiring.

Citus Data open sources tool for scalable, transactional Postgres

Database startup Citus Data has open sourced a tool, called pg_shard, that lets users scale their PostgreSQL deployments across many machines while maintaining performance for operational workloads. As the name suggests, pg_shard is a Postgres extension that evenly distributes, or shards, the database as new machines are added to the cluster.
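
pg_shard’s actual distribution logic lives inside the Postgres extension, but the core idea of hash-based sharding can be sketched in a few lines of Python. This is an illustrative toy, not pg_shard’s implementation; the function name and shard count are invented for the example.

```python
import hashlib

def shard_for(key, num_shards):
    """Route a row to a shard by hashing its partition key (illustrative only)."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Rows with the same partition key always land on the same shard, so lookups
# stay cheap; across many keys the rows spread evenly over the cluster.
placement = {key: shard_for(key, 4) for key in range(8)}
```

Because placement is a pure function of the key, any node can compute where a row lives without consulting a central directory, which is what keeps per-operation overhead low as machines are added.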

Earlier this year, Citus developed and open sourced an extension called Cstore that lets users add a columnar data store to their Postgres databases, making them more suitable for interactive analytic queries.
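
The benefit of a columnar layout for analytic queries can be illustrated with a toy Python example (not Cstore’s actual storage format): an aggregate over one attribute only needs to touch that attribute’s array, rather than every full record.

```python
# Row store: each record kept together — good for fetching whole rows.
rows = [
    {"id": 1, "region": "east", "sales": 100},
    {"id": 2, "region": "west", "sales": 250},
    {"id": 3, "region": "east", "sales": 175},
]

# Column store: each attribute kept contiguously — an aggregate like
# SUM(sales) reads one array instead of paging through every record.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "sales": [100, 250, 175],
}

total_row_store = sum(r["sales"] for r in rows)   # scans every record
total_col_store = sum(columns["sales"])           # scans one column
```

Both layouts yield the same answer; the columnar one simply does far less I/O for wide tables, which is why it suits interactive analytics.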

It’s all part of a move to transition Citus Data from being just another analytic database company into one that’s helping drive advanced uses of Postgres, co-founder and CEO Umur Cubukcu said. Citus launched in early 2013 promising to let Postgres users run the same SQL against Hadoop, MongoDB and other NoSQL data stores, but has come to realize that its customers aren’t as excited about those capabilities as they are about Postgres itself.

[Chart: Indeed job trends for PostgreSQL vs. MySQL]

As Postgres undergoes something of a renaissance among web startups (it’s also the database foundation of PaaS pioneer Heroku and its managed database service), Cubukcu thinks there’s a big opportunity to provide tooling that lets developers take advantage of everything they love about Postgres and not have to worry about whether they’ll outgrow it or bring on another database to handle their analytic workloads.

The NoSQL connectivity is still there, but Cubukcu acknowledges that running analytics on those workloads might be a job best left for the technologies (e.g., Spark) focused on that world of data.

And whether or not pg_shard or Citus Data are the ultimate answer for scale-out Postgres, Cubukcu is definitely onto something when he talks about how the narrative around SQL and scalability has changed over the past few years. His company’s work, along with that of startups such as MemSQL and Tokutek, and open-source projects such as WebScaleSQL and Postgres-XL, have shown that SQL can scale. The tradeoff for developers is no longer relational capabilities for the scale of NoSQL.

Rather, Cubukcu thinks the new tradeoff is between open-source ecosystems and proprietary software as companies try to scale out their relational databases. At least when it comes to Postgres, he said, “Our take is, ‘You don’t have to do this.'”

Microsoft makes its cloud move

It’s taken Microsoft quite a while to get traction in the cloud, and even longer for it to get its cloud data story right. For the longest time, things weren’t looking good. I say that as someone who has worked with – and at various times championed – Microsoft technology for most of my career. As much as I’ve wanted Microsoft to do well in the cloud data arena, I thought it was doomed to an eternity of near misses.

Fast forward

But things have been steadily improving since the summer, especially in the last few weeks. The glass that was half empty in the spring is now nearly full, with a complete HDInsight Big Data service based on Hadoop 2.0; an able machine learning service called Azure Machine Learning; a document store NoSQL database called DocumentDB; a publish-subscribe service for capturing streaming data called Event Hubs; a service for processing and analyzing that data called Azure Stream Analytics; a data transformation workflow service called Data Factory; and an eponymous Search service based on ElasticSearch at its core.

Beyond all of these “house brand” products, partnerships announced in the past two weeks mean that customers can or will soon be able to spin up Hadoop clusters based on Cloudera’s Distribution of Hadoop (CDH) and Hortonworks Data Platform (HDP), running on either Linux or Windows; IBM’s Cloudant NoSQL database, based on BigCouch and Apache CouchDB, is also available; and so is IBM’s relational database standby, DB2. Oracle and DataStax provide access to Oracle 12c and Cassandra on Azure, and other partners allow customers to run MySQL, PostgreSQL and MongoDB.

Competitive landscape

Azure competes well with Google Cloud Platform’s Cloud Dataflow service and parts of its BigQuery service. In numerous other areas, Mountain View has some work to do to catch up with Redmond. But what about the cloud juggernaut, Amazon Web Services (AWS)?

Amazon’s Relational Database Service (RDS), Elastic MapReduce (EMR), DynamoDB, Kinesis, Data Pipeline and CloudSearch each now have opposite numbers in the Azure camp. Amazon does not yet have a service to compete with Azure Machine Learning. On the other hand, its gangbuster-growth service Redshift has no answer from Redmond (although a competitive offering from BitYota is now available on the Azure platform as well as on AWS).

Some of the pieces that have filled out the Azure data story are in preview while others are in general availability. Many of them appeared in just the last 4 months. The drinking water on the east side of Seattle is normally very good, but it seems like something extra got in the reservoir this summer.

What went right?

How has such a seemingly rapid improvement taken place? To begin with, Corporate Vice President Scott Guthrie has been laying the groundwork for this for years, pushing to make Azure more innovative and easier to use, and putting it on a path of accelerated iterative improvement. As Guthrie now runs Microsoft’s entire Enterprise and Cloud (E & C) division, and succeeded his boss, CEO Satya Nadella, in that role, support for this innovation and continuous improvement is coming all the way from the top.

Opening up Azure to offer an infrastructure as a service tier, and that tier’s accommodation of Linux, has paved the way for numerous partnerships that would not have been possible otherwise. Oracle, Cloudera, and IBM are but three examples. And the partners aren’t just coming because of operating system compatibility; they’re coming because of an open, apolitical attitude and business spirit that the old guard at Microsoft just couldn’t muster.

Will continuous improvement continue?

As good as all this is, Microsoft still has some loose ends to tie down. A data story isn’t complete without data discovery, modeling, and visualization, and right now that’s all tied up in the Power BI offering on Microsoft’s other cloud, Office 365. Power BI is available standalone starting at $480/user/year or as an add-on to the Office 365 E3/E4 subscriptions for an additional $240/user/year, though the latter is a promotional price. Worse yet, the basic reporting story that used to be available in the Azure SQL Reporting service hasn’t yet been added to Power BI at all. Azure SQL Reporting was fully shut down on Friday — for some customers, that was more trick than treat.

Personnel at Microsoft are fond of describing full coverage of something as “all up.” If Microsoft wants a good all-up cloud data story, then Power BI should either move to Azure or be a much more welcome guest there. Customers who are already using and paying for big pieces of the Azure data stack should be welcome guests too, on the Power BI side.

Those customers shouldn’t be forced to pay up to $624/year or have a subscription to services like Exchange, SharePoint and/or Lync that they don’t necessarily need. Microsoft should also have more of its data services available on-premises, to facilitate hybrid scenarios, and generally give enterprise customers’ workloads an on-ramp to the cloud.

Talk back

But Microsoft is doing a lot of things right. Among the customers and vendors we’re talking to, Azure is garnering attention as the most enterprise-friendly of all clouds. Put this all together, and Amazon and Google have some explaining to do.

With Google’s Cloud Platform Live event in San Francisco happening on November 4th and Amazon’s re:Invent event a week later in Las Vegas, we may get those explanations pretty soon. And given that both events are sold out, it seems lots of people will be listening. I have a feeling a few PCs in Redmond will be tuned in to the live streams.

Microsoft makes its cloud data move

Microsoft’s cloud data stack was short and slow-growing. But this summer, something changed, raising its stature considerably.

IBM builds up its cloud with Netezza as a service and NoSQL as software

IBM announced a promising new collection of cloud data services on Monday, adding to an already impressive collection of services on its Bluemix platform. At this point, though, IBM’s biggest challenge isn’t selling enterprise users on the cloud, but convincing them it’s still the best choice.

eBay open sources a big, fast SQL-on-Hadoop database

eBay has open sourced a database technology, called Kylin, that takes advantage of distributed processing and the HBase data store in order to return faster results for SQL queries over Hadoop data.