Moving to SaaS: Start with SQL Functionality

This post is sponsored by NuoDB. All thoughts and opinions are my own.
To leverage existing SQL tools and skills when moving to the cloud without significant rework, a solution should support ANSI-standard SQL, not a partial or incompatible variant.
If you’re a software vendor moving to a SaaS business model, whether by creating new product lines (from scratch or by adding cloud characteristics to existing products) or by converting an existing product portfolio, the transition will affect every aspect of the company, right down to its DNA. New software companies typically start with a SaaS model rather than on-premises software, so this transition is most often a consideration for legacy software companies. Customers see the value; software companies see the agility and the resulting valuation.
Ultimately, major architectural changes will be required to succeed. It is a good time to reevaluate every major architectural component of the solution, including the underlying database, along with hosting plans, customer onboarding procedures, billing and pricing, security and regulatory compliance, monitoring, and the other challenges associated with the move to SaaS.
In this series of posts, I will address the top four considerations for choosing a database for the move. The database selection is critical and acts as a catalyst for all other technology decisions. The database needs to support both immediate requirements and future, as-yet-unknown requirements. Ideally, the DBMS selection should be one of the first technology decisions made for the move.
An inappropriate DBMS selection carries severe consequences, including long development cycles driven by the need for new skill sets or the conversion of existing application code, as well as increased cost and support burden.
SQL is the long-standing common language of the database, supported by thousands of tools and known by millions of users. Backward compatibility with core SQL is essential, particularly for operational applications that rely on the ACID compliance that usually comes hand in hand with SQL databases. SQL is essential as you move to the cloud, and it needs to be standard SQL that works everywhere and scales for all queries, all of the time.
To support this, modern databases (such as NuoDB) should support ANSI-standard SQL for both reads and writes, not the limited, partial, or incompatible variants that many NoSQL and NewSQL databases offer. SQL:2011, the latest revision of the standard, added improved support for temporal data, including time period definitions, temporal primary keys with referential integrity, and system-versioned tables, among other enhancements.
SQL remains the most viable and useful method for managing and querying data; it will remain a primary language for the foreseeable future and should be the foundation of a software move to SaaS today.
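To make the ACID argument above concrete, here is a minimal sketch of transactional atomicity using Python’s built-in sqlite3 module. The accounts table, balances, and simulated failure are invented purely for illustration and are not tied to any particular product:

```python
import sqlite3

# In-memory database with a toy accounts table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

try:
    # The connection as a context manager opens a transaction:
    # commit on success, rollback on any exception.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        raise RuntimeError("simulated failure mid-transfer")
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back, so balances are unchanged.
balances = [row[0] for row in conn.execute("SELECT balance FROM accounts ORDER BY id")]
print(balances)  # [100, 0]
```

This is the guarantee operational applications lean on: either the whole transfer happens or none of it does, with no SQL beyond the ANSI core required.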

DAM – That’s Secure?

The holy grail of database protection may come in the form of database activity monitoring (DAM) married to an agentless platform that employs artificial intelligence (AI).

Review: DB Networks Enhances Database Security with Machine Learning

San Diego-based DB Networks may very well have the answer to many of those security shortcomings in the form of its IDS-6300, a security appliance that detects intrusions into databases and provides administrators with the intelligence to do something about it.

Analytics startup Mode wants to give SQL a shiny home in the cloud

Mode, a startup that’s trying to be something like a GitHub for data scientists, has added new features to its collaboration platform that make it easier to write SQL queries and share the resulting reports. While that news in itself might not be too interesting, the company claims the new stuff represents a major upgrade for a service already popular among some well-known users.

The new features, explained here in a company blog post, really boil down to improving the workflow for data analysts. Among other things, users can now position data tables and the SQL editor as they see fit on their screens, easily edit schema and preview reports as they’re building them. A new activity feed, a la Slack or any other enterprise social platform, enables what Mode Co-founder and Chief Analyst Benn Stancil calls “implicit collaboration” — sharing reports and other work without expressly tagging individual colleagues.

A screenshot of the new report preview.


But the bigger news in all of this might be the type of traction Mode claims it’s getting. It cites TuneIn and Munchery as customers, and Stancil said Twitch does nearly all of its analytics via Mode. “Twitch almost certainly is operating at a scale near petabytes,” he said. “…It’s not like an Excel-sized thing by any means.”

The way Mode works, fundamentally, is as an overlay atop a company’s existing SQL data store. About half of the company’s users connect via their Amazon Redshift data warehouses, but systems range from MySQL to Cloudera Impala. “It works with anything that speaks SQL,” Stancil said.
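Working “with anything that speaks SQL” is plausible because database drivers converge on a common access shape; a sketch in Python’s DB-API (PEP 249) style, where only the connect call changes per backend. The non-SQLite connect calls in the trailing comments are placeholders, not tested configuration:

```python
import sqlite3

def fetch_rows(connect, sql):
    """Run a query through any DB-API-style driver: same cursor/execute
    shape regardless of which database sits underneath."""
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()

rows = fetch_rows(lambda: sqlite3.connect(":memory:"), "SELECT 1, 2")
print(rows)  # [(1, 2)]

# Swapping backends means swapping only the connect callable, e.g.:
#   fetch_rows(lambda: psycopg2.connect(dsn), sql)            # PostgreSQL/Redshift
#   fetch_rows(lambda: mysql.connector.connect(**cfg), sql)   # MySQL
```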

I asked Stancil if Mode is acting as an alternative to Tableau Software among its customers, and he said it is only to the degree that both are trying to simplify the process of analyzing data and creating reports. Aside from the collaboration angle, the biggest difference is the target audience: Stancil sees Tableau targeting savvy business users, while Mode is all about the data analyst.

“There are certain people,” he said, “for whom Tableau is more difficult to use than just writing SQL because you have to go through a UI that constrains what you can do.”

You can learn more about the companies building and the people using next-generation analytics tools at our Structure Data conference next month in New York. Speakers include Tableau Vice President of Analytics Jock Mackinlay, Interana CEO Ann Johnson and BuzzFeed Director of Data Science Ky Harlin.

MemSQL open sources tool that helps move data into your database

Database startup MemSQL said today that it has open sourced a new data-transfer tool called MemSQL Loader, which helps users haul vast quantities of data from sources like Amazon S3 and the Hadoop Distributed File System (HDFS) into either a MemSQL or MySQL database.

While moving data from one source to another may seem relatively straightforward, there are a lot of nuts and bolts in the process, and if one thing goes awry, the whole endeavor can fail. For example, if you’re trying to move thousands of files and one fails to transfer for some reason, you may have to start the process over and hope all goes well, according to the MemSQL announcement.

MemSQL Loader is essentially an automation tool that lets users set up multiple transfers and queues that can restart “at a specific file in case of any import issues,” the release stated.

From the MemSQL blog post explaining the tool:
“MemSQL Loader lets you load files from Amazon S3, the Hadoop Distributed File System (HDFS), and the local filesystem. You can specify all of the files you want to load with one command, and MemSQL Loader will take care of deduplicating files, parallelizing the workload, retrying files if they fail to load, and more.”
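The behaviors described there, deduplicating the file list, loading in parallel, and queueing individual failures for retry rather than restarting the whole job, can be sketched generically in Python. This is an invented illustration of the pattern, not MemSQL Loader’s actual code; `load_file` stands in for the real per-file import:

```python
import concurrent.futures

def load_file(path):
    """Stand-in for a real per-file import; files ending in '.bad'
    simulate a transient failure."""
    if path.endswith(".bad"):
        raise IOError(f"failed to load {path}")
    return path

def load_all(paths, workers=4):
    loaded, failed = [], []
    unique = sorted(set(paths))  # deduplicate before loading
    with concurrent.futures.ThreadPoolExecutor(workers) as pool:
        futures = {pool.submit(load_file, p): p for p in unique}
        for fut in concurrent.futures.as_completed(futures):
            path = futures[fut]
            try:
                loaded.append(fut.result())
            except IOError:
                # Queue the single failed file for retry instead of
                # restarting the entire transfer.
                failed.append(path)
    return loaded, failed
```

A caller would retry only the `failed` list on the next pass, which is the key difference from a naive all-or-nothing bulk load.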

MemSQL in action


The new tool is available in open source through the MIT License and can be downloaded at GitHub.

MemSQL has been on a roll launching new tools and features since its 2012 inception. In September, Gigaom’s Derrick Harris reported that MemSQL now supports cross-data-center replication, which is good for disaster recovery in case a database takes a hit; cross-data-center replication also helps distribute the load across two data centers, which could cut down on latency and boost performance.

Need to wrangle SQL, NoSQL data? Espresso Logic says it can help

Espresso Logic, which offers a backend service to help businesses connect applications with SQL data sources, is adding NoSQL to the mix with new support for MongoDB, as well as support for Microsoft Dynamics business applications coming soon.

The company says its service makes it easier to create RESTful APIs that facilitate the flow of data from repositories to applications. REST, short for representational state transfer, has become something of a lingua franca for connecting disparate applications.
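The core idea behind generating RESTful APIs over a SQL store can be sketched as a mapping from HTTP verbs to SQL statements. This is a hypothetical illustration (the table, columns, and `handle` function are all invented), not Espresso Logic’s implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")

def handle(verb, table, rec_id=None, body=None):
    """Map a REST-style request on /table/id to a SQL statement."""
    # Table names cannot be bound as parameters, so whitelist them.
    assert table in {"patients"}, "unknown resource"
    if verb == "POST":                                   # create
        cur = conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (body["name"],))
        return {"id": cur.lastrowid}
    if verb == "GET":                                    # read
        row = conn.execute(f"SELECT id, name FROM {table} WHERE id = ?", (rec_id,)).fetchone()
        return {"id": row[0], "name": row[1]} if row else None
    if verb == "DELETE":                                 # delete
        conn.execute(f"DELETE FROM {table} WHERE id = ?", (rec_id,))
        return None
    raise ValueError(f"unsupported verb: {verb}")
```

Products in this space wrap exactly this kind of dispatch in authentication, access control, and management tooling, which is the hard part Morrow alludes to below.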

Espresso Logic CEO R. Paul Singh (pictured above) said the product and its reliance on reactive programming let non-programmers accelerate development by connecting apps through clicking, dragging and dropping, or perhaps writing a few lines of code.


That promised ease of use appealed to Bill Kuklinski, director of systems development for Boston’s Joslin Diabetes Center. His group doesn’t have the IT and programming resources needed to integrate applications by hand. Like many organizations, Joslin runs many legacy applications that are treasure troves of data needed by other applications.

And the need to let patients funnel readings from their glucose monitors into the center’s system means that data has to traverse organizational walls. Navigating an array of in-house healthcare apps and those built more with a consumer in mind is tricky.

“Everyone has a unique set of problems they want to report on and unique analytics and that requires custom development. If you can’t get the data from those data sources, every vendor’s answer is ‘here’s our API,'” Kuklinski, an Espresso Logic customer, added.

Supporting REST makes life easier because the business doesn’t have to support a zillion different APIs.

The so-called API economy has led to the rise of API-management companies such as Apigee, which just added new analytics services. Espresso Logic competes with backend services platforms such as StrongLoop, which recently announced a life-cycle management tool for Node.js-centric REST APIs, as well as Kinvey and DreamFactory.

Gigaom Research analyst Rich Morrow agreed that it’s important to support the right APIs, but there’s more blocking and tackling to be done. “Exposing your datastore to mobile endpoints via an API is really powerful, but looks way more easy than it is — you’ve got to build accessibility, security, access controls, management and extensibility. It makes way more sense for most organizations to buy the capability rather than build it themselves,” he said.




ThoughtSpot’s data analytics hardware is now available to the public

Big data startup ThoughtSpot said Tuesday that its core product, the ThoughtSpot Relational Search Appliance, is now available to the general public. ThoughtSpot wants to bring a Google-like search experience to data analytics with its hardware appliance, which contains an in-memory database and a custom-built search engine. When a user types keywords into the search interface, ThoughtSpot predicts what the user wants to find and runs SQL queries based on the search. The startup, whose founding team includes former members of Nutanix, Google and Yahoo, landed $30 million in funding in June.
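The keyword-to-SQL translation described above can be sketched as a lookup against schema metadata. The mapping table and rules here are entirely invented for illustration; ThoughtSpot’s actual engine is proprietary:

```python
# Hypothetical schema metadata: keyword -> (table, SQL expression).
# Expressions containing an aggregate become measures; the rest are
# grouping dimensions.
SCHEMA = {
    "revenue": ("sales", "SUM(revenue)"),
    "region": ("sales", "region"),
}

def keywords_to_sql(query):
    """Translate free-form keywords into a SQL query, or None if no
    recognized terms appear."""
    terms = [t for t in query.lower().split() if t in SCHEMA]
    if not terms:
        return None
    table = SCHEMA[terms[0]][0]
    measures = [SCHEMA[t][1] for t in terms if "(" in SCHEMA[t][1]]
    dims = [SCHEMA[t][1] for t in terms if "(" not in SCHEMA[t][1]]
    sql = f"SELECT {', '.join(dims + measures)} FROM {table}"
    if dims and measures:
        sql += " GROUP BY " + ", ".join(dims)
    return sql

print(keywords_to_sql("revenue by region"))
# SELECT region, SUM(revenue) FROM sales GROUP BY region
```

A production engine would also handle joins, filters, synonyms, and ranking of candidate interpretations, but the essence is the same: search terms resolve to schema objects, and the query is assembled from those.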

Metanautix’s data analyzing engine now available to the public

Metanautix, the Palo Alto-based startup founded by a former Google big data expert and a former Facebook senior engineer, said today that its Quest engine for analyzing data is now available to the general public. Quest uses SQL to link together disparate data silos and can convert the data into tables that even business departments like sales and marketing can understand. Metanautix co-founder and CEO Theo Vassilakis helped develop the querying system Dremel at Google; Google’s BigQuery analytics service was based on Dremel. The startup’s other co-founder, CTO Apostolos Lerios, was a Facebook senior software engineer who worked on Facebook’s image processing architecture. Metanautix counts Hewlett-Packard and Shutterfly among its six customers.

Teradata embraces the big data ecosystem, buys Think Big Analytics

Data warehouse vendor Teradata continues to step up its game in the broader big data market, this time by acquiring consulting firm Think Big Analytics, which specializes in helping clients deploy open source technologies and build analytics applications.