The Druid real-time database moves to an Apache license

Druid, an open source database designed for real-time analysis, is moving to the Apache License 2.0 in the hope of spurring more use of, and innovation around, the project. It was open sourced in late 2012 under the GPL, which is generally considered more restrictive than the Apache license in terms of how software can be reused.

Druid was created by advertising analytics startup Metamarkets (see disclosure) and is used by numerous large web companies, including eBay, Netflix, PayPal, Time Warner Cable and Yahoo. Because of the nature of Metamarkets’ business, Druid requires data to include a timestamp and is probably best described as a time-series database. It’s designed to ingest terabytes of data per hour and is often used for things such as analyzing user or network activity over time.
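
Druid’s requirement that every row carry a timestamp shows up directly in how the database is queried: results come back as aggregates per time bucket. As a rough illustration, here is a minimal Python sketch of a native “timeseries” query posted to a Druid broker over HTTP; the broker address, datasource name and metrics are hypothetical placeholders, not details of Metamarkets’ deployment.

```python
import requests

# Hypothetical Druid broker endpoint and datasource; names are illustrative.
BROKER_URL = "http://localhost:8082/druid/v2/"

# A native "timeseries" query: because every row in Druid has a timestamp,
# the broker returns one aggregated result per hourly bucket.
query = {
    "queryType": "timeseries",
    "dataSource": "network_activity",      # hypothetical datasource
    "granularity": "hour",
    "intervals": ["2015-01-01/2015-01-02"],
    "aggregations": [
        {"type": "count", "name": "events"},
        {"type": "longSum", "name": "bytes", "fieldName": "bytes"},
    ],
}

resp = requests.post(BROKER_URL, json=query, timeout=30)
resp.raise_for_status()
for bucket in resp.json():
    print(bucket["timestamp"], bucket["result"])
```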

Mike Driscoll, Metamarkets’ co-founder and CEO, is confident that now is the time for open source tools to really catch on — even more so than they already have in the form of Hadoop and various NoSQL data stores — because of the ubiquity of software as a service and the emergence of new resource managers such as Apache Mesos. In the former case, open source technologies underpin multiuser applications that require a high degree of scale and flexibility at the infrastructure level; in the latter, databases like Druid are simply delivered as a service internally from a company’s pool of resources.

However it happens, Driscoll said, “I don’t think proprietary databases have long for this world.”

Disclosure: Metamarkets is a portfolio company of True Ventures, which is also an investor in Gigaom.

Google had its biggest quarter ever for data center spending. Again

Google just finished off another record-setting quarter and year for infrastructure spending, according to the company’s earnings report released last week. The web giant spent more than $3.5 billion on “real estate purchases, production equipment, and data center construction” during the fourth quarter of 2014 and nearly $11 billion for the year.

As we have explained many times before, spending on data centers and the gear to fill them is a big part of building a successful web company. When you’re operating on the scale of companies such as [company]Google[/company], [company]Microsoft[/company], [company]Amazon[/company] and even [company]Facebook[/company], better infrastructure (in terms of hardware and software) means a better user experience. When you’re getting into the cloud computing business as Google is — joining Amazon and Microsoft before it — more servers also mean more capacity to handle users’ workloads.

Google Vice President of Infrastructure — and father of the CAP theorem — Eric Brewer will be speaking at our Structure Data conference in March and will share some of the secrets to building the software systems that run across all these servers.

But even among its peers, Google’s capital expenditures are off the chart. Amazon spent just more than $1.1 billion in the fourth quarter and just under $4.9 billion for the year. Microsoft spent nearly $1.5 billion on infrastructure in its second fiscal quarter, which ended Dec. 31, and just under $5.3 billion over its past four quarters. Facebook spent just over $1.8 billion in 2014 (although that was a 34 percent jump from 2013’s total).

How NASA launched its web infrastructure into the cloud

Among U.S. government agencies, the adoption of cloud computing hasn’t exactly been moving full steam ahead. Even though the Obama administration unveiled its cloud-first initiative in 2011, calling for agencies to move their aging legacy IT systems to the cloud, agencies haven’t made great strides in modernizing their infrastructure.

In fact, a September 2014 U.S. Government Accountability Office report on federal agencies and cloud computing found that while several agencies had increased the share of their IT budgets spent on cloud services since 2012 (the GAO studied seven agencies in 2012 and followed up on them in 2014), “the overall increase was just 1 percent.” The report attributed the small increase, relative to overall budgets, to the agencies’ “legacy investments in operations and maintenance,” which they were not going to move to the cloud unless those systems were slated to be either replaced or upgraded.

But there are at least a few diamonds in the rough. The CIA recently found a home for its cloud on Amazon Web Services. And in 2012, NASA contracted with cloud service broker InfoZen for a five-year, $40 million project to migrate NASA’s web infrastructure — including NASA.gov — to the Amazon cloud and maintain it there.

This particular initiative, known as the NASA Web Enterprise Services Technology (WestPrime) contract, was singled out in July 2013 as a successful cloud-migration project in an otherwise scathing NASA Office of Inspector General audit report on NASA’s progress in moving to cloud technology.

Moving to the cloud

In August, InfoZen detailed the specifics of its project and claimed it took 22 weeks to migrate 110 NASA websites and applications to the cloud. As a result of the project’s success, the Office of Inspector General recommended that NASA departments use the WestPrime contract or a similar contract in order to meet policy requirements and move to the cloud.

The WestPrime contract primarily deals with NASA’s web applications and doesn’t take into account high-performance computing endeavors like rocket-ship launches, explained Julie Davila, the InfoZen cloud architect and DevOps lead who helped with the migration. However, don’t let that lead you to believe that migrating NASA’s web services was a simple endeavor.

Just moving NASA’s “flagship portal,” nasa.gov, which contains roughly 150 applications and around 200,000 pages of content, took about 13 weeks, said Roopangi Kadakia, a web services executive at NASA. And NASA.gov and its related applications didn’t just have to be moved; they also had to be upgraded from old technology.

NASA was previously using an out-of-support proprietary content management system and used InfoZen to help move it over to a “cloudy Drupal open-source system,” she said, which helped modernize the website so it could withstand periods of heavy traffic.

“NASA.gov has been one of the top visited places in the world from a visitor perspective,” said Kadakia. When a big event like the landing of the Mars Rover occurs, NASA can experience traffic that “would match or go above CNN or other large, highly trafficked sites,” she said.

NASA’s Rover Curiosity lands on Mars

NASA runs three cable channels continually on its site, so it wasn’t just looking for cloud infrastructure tailored to handle worst-case scenarios; it needed something that could keep up with the media-rich content NASA consistently streams, she said.

The space agency uses [company]Amazon[/company] Web Services to provide the backbone for its new Drupal content management system, and has worked out an interesting way to pay for the cloud, explained Kadakia. NASA uses a contract vehicle called Solutions for Enterprise-Wide Procurement (SEWP) that functions like a drawdown account between NASA and Amazon.

The contract vehicle takes into account that the cost of cloud services can fluctuate based on needs and performance (a site might get a spike in traffic one day and see it drop the next). Kadakia estimates that NASA could end up spending around $700,000 to $1 million on AWS for the year; the agency can put $1.5 million into the account to cover any unforeseen costs, and any money not spent can be saved.

“I think of it like my service card,” she said. “I can put 50 bucks in it. I may not use it all and I won’t lose that money.”
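
For a rough sense of how such a drawdown account behaves, here is a small Python sketch. Only the roughly $1.5 million deposit and the $700,000 to $1 million annual estimate come from Kadakia; the monthly figures are invented for illustration.

```python
# Sketch of a SEWP-style drawdown account: NASA funds the account up front
# and fluctuating monthly AWS charges draw against the balance.
deposit = 1_500_000  # rough figure cited by Kadakia

# Invented monthly bills, including a spike around a big event.
monthly_bills = [60_000, 55_000, 70_000, 140_000, 65_000, 60_000,
                 75_000, 80_000, 62_000, 58_000, 66_000, 72_000]

spent = sum(monthly_bills)      # 863,000 here, within the $700K-$1M estimate
remaining = deposit - spent     # unspent money stays in the account

print(f"Spent ${spent:,}; ${remaining:,} left to cover unforeseen costs")
```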

Updating the old

NASA also had to sift through old applications on its system that were “probably not updated from a tech perspective for seven-to-ten years,” said Kadakia. Some of the older applications’ underlying architecture and security risks weren’t properly documented, so NASA had to do an audit of these applications to “mitigate all critical vulnerabilities,” some of which its users didn’t even know about.

“They didn’t know all of the functionalities of the app,” said Kadakia. “Do we assume it works [well]? That the algorithms are working well? That was a costly part of the migration.”

After moving those apps, NASA had to define a change-management process for its applications so that each time something got altered or updated, there was documentation to help keep track of the changes.

To help with the nitty-gritty details of transferring those applications to AWS and setting up new servers, NASA used the Ansible configuration-management tool, said Davila. When InfoZen came on board, the apps were hosted in a co-located data center where they weren’t being managed well, he explained, and many server operating systems weren’t being updated, leaving them vulnerable to security threats.

Without the configuration-management tool, Davila said, it would “probably take us a few days to patch every server in the environment” using shell scripts. Now, the team can “patch all Linux servers in, like, 15 minutes.”
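
The article doesn’t include InfoZen’s actual playbooks, but the pattern Davila describes, replacing per-server shell scripts with a single Ansible run across an inventory, looks roughly like the sketch below. The inventory path and the choice of the yum package module are assumptions for illustration.

```python
import subprocess

# Illustrative only: one Ansible ad-hoc run that upgrades packages on every
# host in an inventory, rather than shelling into servers one at a time.
# The inventory path and yum module are assumptions, not InfoZen's setup.
cmd = [
    "ansible", "all",
    "-i", "inventory/production",   # hypothetical inventory file
    "-m", "yum",                    # package module for RHEL-family hosts
    "-a", "name=* state=latest",    # upgrade all installed packages
    "--become",                     # escalate privileges for the upgrade
]

subprocess.run(cmd, check=True)
```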

NASA currently has a streamlined devops environment in which spinning up new servers is faster than before, he explained. Whereas it used to take NASA roughly one to two hours to load up an application stack, it now takes around ten minutes.

What about the rest of the government?

Kadakia claimed that moving to the cloud has saved NASA money, especially as the agency cleaned out its system and took a hard look at how old applications were originally set up.

The agency is also looking at optimizing its applications to fit the more modern approach of coupled-together application development, she explained. This could include updating or developing applications that share the same data sets, something that previously would have been a burden, if not impossible, to do.

A historical photo of the quad, showing Hangar One in the back before its shell was removed. Photo courtesy of NASA.

Larry Sweet, NASA’s CIO, has taken notice of the cloud-migration project’s success and sent a memo to the entire NASA organization urging other NASA properties to consider the WestPrime contract first if they want to move to the cloud, Kadakia said.

While it’s clear that NASA’s web services have benefited from being upgraded and moved to the cloud, it remains hazy whether other government agencies will follow suit.

David Linthicum, a senior vice president at Cloud Technology Partners and a Gigaom analyst, said he believes there isn’t a sense of urgency for these agencies to convert to cloud infrastructure.

“The problem is that there has to be a political will,” said Linthicum. “I just don’t think it exists.”

Much as President Obama appointed an Ebola czar during this fall’s Ebola outbreak, there should be a cloud czar responsible for overseeing the rejiggering of agency IT systems, he said.

“A lot of [government] IT leaders don’t really like the cloud right now,” said Linthicum. “They don’t believe it will move them in the right direction.”

Part of the problem stems from the contractors the government is used to working with. Organizations like [company]Lockheed Martin[/company] and [company]Northrop Grumman[/company] “don’t have cloud talent,” he said, and are not particularly suited to guiding agencies looking to move to the cloud.

Still, now that NASA’s web services and big sites are part of the cloud, perhaps other agencies will begin to take notice.

Images courtesy of NASA

Why applications are still Microsoft’s biggest asset in the cloud

Microsoft is talking a lot about the scale of its cloud computing platform lately, but scale alone won’t help it steal revenue from Amazon Web Services or Google. Microsoft’s advantage is in commercial software, which brings in more profit and acts as a gateway to other services.

How the right tools can create data-driven companies, even at Facebook

The co-founders of analytics startup Interana came on the Structure Show podcast this week to talk about how to spread data analysis throughout customer accounts, the types of things you can do with event data, and the experience of starting a company with your spouse.

Baidu is trying to speed up image search using FPGAs

Chinese search engine Baidu is trying to speed up the performance of its deep learning models for image search using field-programmable gate arrays, or FPGAs, made by Altera. Baidu has been experimenting with FPGAs for a while (including with gear from Altera rival Xilinx) as a way of boosting performance on its convolutional neural networks without going whole hog down the GPU route. FPGAs are likely most applicable in production data centers, where they can be paired with existing CPUs to serve queries, while GPUs can still power much of the behind-the-scenes training of deep learning models.

Google wants to show the world how sexy cluster management really is

A partnership between Google and Mesosphere furthers Google’s strategy to sell the world on its way of automating applications and resources. Cluster management is important — even sexy when wrapped in the lore of Google or Facebook — and now Google claims it’s easier than ever.