Dataiku Offers Advice on How to Create Data Team Harmony

Building an effective data team can come at a high cost, yet open source tools may be the key to creating harmony and potentially reducing short-term and long-term costs, according to Florian Douetteau, CEO of Dataiku.

Why BI’s shift to stream intelligence is a top priority for CAOs

Nova is co-founder and CEO at Bottlenose.
A quick search on LinkedIn reveals thousands of professionals in the United States now hold the recently established title of Chief Analytics Officer (CAO). Analytics officers have ascended to the C-suite amongst a constellation of new roles and responsibilities that cut across departmental lines at Fortune 1000 companies. These positions are driven by the influx of data with which companies now need to contend, even in industries that were not previously data-oriented.
The CAO’s role most closely aligns with business intelligence, leveraging data analytics to create real business value and inform strategic decisions. Yet, the CAO’s responsibilities also encompass discovering the various and constantly changing threats and opportunities impacting a business.
The most dramatic shift in data-driven business intelligence that has necessitated this role is the sheer volume, variety, and velocity of data now available to the enterprise. Data is no longer just static or historical, but real-time, streaming, unstructured, and abundant from both public and proprietary sources.
Unstructured data is the fastest growing category of data, and within it stream data – time-stamped series of records – is the fastest growing sub-category. Stream data spans messaging, social media, mobile data, CRM, sales, support, IT data, sensor and device data from the emerging internet of things, and even live video and audio.
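To make the notion concrete, a stream record is simply an event with a timestamp attached. The Python sketch below (hypothetical field names and sources, not any vendor’s schema) shows the shape such records take, along with a trivial per-minute rollup of the kind stream-intelligence tooling automates.

```python
from dataclasses import dataclass
from collections import Counter
from datetime import datetime

@dataclass
class StreamEvent:
    """One record in a stream: a timestamp plus whatever the source emits."""
    timestamp: datetime
    source: str    # e.g. "twitter", "crm", "sensor-42" (hypothetical names)
    payload: dict  # unstructured body: message text, sensor reading, etc.

def events_per_minute(events):
    """Roll a stream of events up into per-minute counts."""
    counts = Counter(e.timestamp.replace(second=0, microsecond=0) for e in events)
    return dict(sorted(counts.items()))

# Usage: three toy events from three different sources.
events = [
    StreamEvent(datetime(2016, 1, 4, 9, 30, 12), "twitter", {"text": "outage?"}),
    StreamEvent(datetime(2016, 1, 4, 9, 30, 48), "sensor-42", {"temp_c": 71.3}),
    StreamEvent(datetime(2016, 1, 4, 9, 31, 5), "crm", {"ticket": "support"}),
]
print(events_per_minute(events))
```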
The CAO’s charge is to enable the enterprise to deal with all of this data and generate timely, actionable intelligence from it – increasingly in real-time. I’ve been calling this process of business intelligence for streaming data “stream intelligence” for a while now. Among the dozens of CAOs I’ve spoken with recently, moving from classical business intelligence on static data to stream intelligence is one of their biggest priorities for 2016. This emerging form of BI creates unique problems for enterprise companies, but it also creates unique opportunities for those companies to discern and discover trends early, while there is still time to act on them.

Understanding BI 3.0

Thomas Davenport is a professor at Babson College, a research fellow at the MIT Center for Digital Business, and a senior advisor to Deloitte Analytics. He has written eloquently about these topics since 2010 and offers a framework for thinking about the past, present, and future of analytics.
For Davenport, BI 1.0 was about traditional analytics, providing descriptive reporting from relatively small internally sourced data. It was about back-room teams and internal decision reports.
BI 2.0 was about complex, much larger unstructured data sources. It was also about new computational capabilities that ran on top of traditional analytics. With big data, we saw data scientists first emerge, alongside several waves of new data-based products and services. This is where we are today.
BI 3.0 is about rapid, agile insight delivery – analytical tools at the point of decision, and decision making at scale. Today, analytics are considered a key asset enabling strategic decision making, not merely a mirror reflecting an organization’s past and present.
The “how” of accomplishing this vision amounts to balancing support for the “three V’s” of data — volume, variety, velocity — in the enterprise analytics stack. Most big data and BI technologies to date were engineered to solve volume and variety, with very little emphasis placed on the velocity of data and analytics. This has to change.
Analysts are already drowning in the volume, variety, and velocity of data. To make matters worse, the rate at which new analysts are being trained lags far behind the growth in demand for analysts and data scientists. In fact, the gap between the supply of analyst hours and the demand for them is growing exponentially. This means there will never be enough data scientists to cope with the rise of unstructured stream data in the enterprise.
To solve the growing “analyst gap” we either need to figure out how to make exponentially more analysts, or we have to figure out how to make the finite supply of analysts exponentially more productive. I prefer the latter solution, but to accomplish it, analysts need automation.
Manual analysis by humans is still possible for structured data, but not for streaming data. Streaming data is just too complex, and changes too fast for human analysts to keep up with on their own. Automation is the only practical way to keep up with changing streams of big data.
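As a minimal sketch of what that automation can look like (a generic rolling-baseline check, not any particular vendor’s method), the Python below watches a stream of metric values and flags readings that drift far from recent history, the kind of test no human team could run continuously across thousands of streams.

```python
from collections import deque
from statistics import mean, stdev

def monitor(stream, window=60, threshold=3.0):
    """Yield values sitting more than `threshold` standard deviations
    from a rolling baseline of the last `window` observations."""
    history = deque(maxlen=window)
    for value in stream:
        if len(history) >= 2:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield value, mu  # anomaly: hand off to a human analyst
        history.append(value)

# Usage: a steady stream of readings with one obvious spike.
readings = [100, 101, 99, 100, 102, 98, 100, 250, 101, 99]
for value, baseline in monitor(readings, window=5):
    print(f"anomaly: {value} vs rolling baseline {baseline:.1f}")
```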
BI 3.0 emphasizes real-time business impact and makes use of automation in the analytics process. This will increasingly be achieved with a seamless blend of traditional analytics and big data. BI 3.0 analytics are now integral to running the business day-to-day and hour-to-hour.

Following the flow of big data investment

I’ll close by talking about where big data investment dollars are starting to go. In short, real-time stream data is now a major priority, and historical data is now riding in the back seat.
According to Harvard Business Review, 47 percent of big data expenditures are directed towards process improvement, 26 percent towards accommodating a greater variety of data, and 16 percent towards addressing a greater volume of data. Velocity of data today represents a very small slice, at three percent of overall investment, but that slice will grow quickly in 2016.
In fact, organizations that have prioritized real-time data are outpacing all others, according to Aberdeen Group. Companies that are competent across volume, variety, and velocity alike have seen 26 percent growth in their pipelines, a 15 percent increase in cash generated, and a 67 percent reduction in operational costs.
It’s hard to argue with those numbers. CAOs understand that BI 3.0 is happening now, which is why it’s become a top priority for 2016.

The Druid real-time database moves to an Apache license

Druid, an open source database designed for real-time analysis, is moving to the Apache 2 software license in the hope of spurring more use of, and innovation around, the project. It was open sourced in late 2012 under the GPL, which is generally considered more restrictive than the Apache license in terms of how software can be reused.

Druid was created by advertising analytics startup Metamarkets (see disclosure) and is used by numerous large web companies, including eBay, Netflix, PayPal, Time Warner Cable and Yahoo. Because of the nature of Metamarkets’ business, Druid requires data to include a timestamp and is probably best described as a time-series database. It’s designed to ingest terabytes of data per hour and is often used for things such as analyzing user or network activity over time.
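For a sense of how Druid is actually used, here is a minimal sketch of a native timeseries query posted to a Druid broker from Python. The datasource name, metric, and interval are hypothetical; the endpoint path and port reflect Druid’s documented defaults.

```python
import json
import requests  # third-party HTTP library

# Hypothetical datasource and metric; adjust to your own cluster.
query = {
    "queryType": "timeseries",
    "dataSource": "user_activity",
    "granularity": "hour",
    "intervals": ["2015-02-01/2015-02-02"],
    "aggregations": [
        {"type": "longSum", "name": "events", "fieldName": "count"},
    ],
}

# Druid brokers accept JSON queries over HTTP, on port 8082 by default.
resp = requests.post(
    "http://localhost:8082/druid/v2/",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
for row in resp.json():
    print(row["timestamp"], row["result"]["events"])
```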

Mike Driscoll, Metamarkets’ co-founder and CEO, is confident now is the time for open source tools to really catch on — even more so than they already have in the form of Hadoop and various NoSQL data stores — because of the ubiquity of software as a service and the emergence of new resource managers such as Apache Mesos. In the former case, open source technologies underpin multiuser applications that require a high degree of scale and flexibility at the infrastructure level; in the latter, databases like Druid are simply delivered as a service internally from a company’s pool of resources.

However it happens, Driscoll said, “I don’t think proprietary databases have long for this world.”

Disclosure: Metamarkets is a portfolio company of True Ventures, which is also an investor in Gigaom.

The rise of self-service analytics, in 3 charts

I’m trying really hard to write less about business intelligence and analytics software. We get it: Data is important to businesses, and the easier you can make it for people to analyze it, the more they’ll use your software to do it. What more is there to say?

But every time I see Tableau Software’s earnings reports, I’m struck by the reality of how big a shift the business intelligence market is undergoing right now. In the fourth quarter, Tableau grew its revenue 75 percent year over year. People and departments are lining up to buy what’s often called self-service analytics software — that is, applications so easy that even lay business users can work with them without much training — and they’re doing it at the expense of incumbent software vendors.

Some analysts and market insiders will say the new breed of BI vendors is more about easy “data discovery” and that its products lack the governance and administrative control of incumbent products. That’s like saying Taylor Swift is very cute and very good at making music people like, but she’s not as serious as Alanis Morissette or as artistic as Björk. Those things can come in time; meanwhile, I’d rather be T-Swift raking in millions and looking to do it for some time to come.

[dataset id=”914729″]

Above is a quick comparison of annual revenue for three companies, the only three “leaders” in Gartner’s 2014 Magic Quadrant for Business Intelligence and Analytics Platforms that are both publicly traded and focused solely on BI. Guess which two fall into the next-generation, self-service camp and are also Gartner’s two highest-ranked. Guess which one is often credited with reimagining the data-analysis experience and making a product people legitimately like using.

[dataset id=”914747″]

Narrowing it just to last year, Tableau’s revenue grew 92 percent between the first and fourth quarters, while Qlik’s grew 65 percent. Microstrategy stayed relatively flat and is trending downward; its fourth quarter was actually down year over year.

[dataset id=”914758″]

And what does Wall Street think about what’s happening? [company]Tableau[/company] has the least revenue for now, but probably not for much longer, and has a market cap greater than [company]Qlik[/company] and [company]Microstrategy[/company] combined.

Here are a few more data points that show how impressive Tableau’s ongoing coup really is. Tibco Software, another Gartner leader and formerly public company, recently sold to private equity firm Vista for $4.2 billion after disappointing shareholders with weak sales. Hitachi Data Systems is buying Pentaho, a BI vendor hanging just outside the border of Gartner’s “leader” category, for just more than $500 million, I’m told.

A screenshot from a sample PowerBI dashboard.

It’s worth noting, though, that Tableau isn’t guaranteed anything. As we speak, startups such as Platfora, ClearStory and SiSense are trying to match or outdo Tableau on simplicity while adding their own new features elsewhere. The multi-billion-dollar players are also stepping up their games in this space. [company]Microsoft[/company] and [company]IBM[/company] recently launched the natural-language-based PowerBI and Watson Analytics services that Microsoft says represent the third wave of BI software (Tableau is in the second wave, by its assessment), and [company]Salesforce.com[/company] invested a lot of resources to make its BI foray.

Whatever you want to call it — data discovery, self-service analytics, business intelligence — we’ll be talking more about it at our Structure Data conference next month. Speakers include Tableau Vice President of Analytics (and R&D leader) Jock Mackinlay, as well as Microsoft Corporate Vice President of Machine Learning Joseph Sirosh, who’ll be discussing self-service machine learning.

New Relic boosts revenue growth in first post-IPO earnings

New Relic’s first earnings report since going public last December seemed to please investors as the application-performance and analytics company took in $29 million in revenue in what it considers its third quarter 2015 earnings. That’s a 14 percent quarter-over-quarter increase from the second quarter of 2015 and a 69 percent year-over-year increase from the third quarter in 2014.

The San Francisco-based company also said it now has 11,270 paid business accounts as of December 31, 2014, which is up from the 10,590 paid business accounts it had as of September 30, 2014, as disclosed in an SEC filing.

New Relic also signed on some new customers during the quarter including [company]Capital One Services[/company], [company]Hootsuite Media[/company] and [company]Walgreens Boots Alliance[/company].

Seventy-five percent of [company]New Relic[/company]’s customer base is made up of small to medium-size businesses with the other 25 percent coming from companies with over 100 employees. However, those bigger clients account for roughly half of the company’s revenue, said New Relic CFO Mark Sachleben in a conference call.

New Relic sees its recently launched Insights real-time analytics product line as its main differentiator from competitors; the product is part of the company’s “land and expand” strategy, which involves selling one product line to a client and then persuading it to purchase more, explained Sachleben.

The company has also seen “quite a bit of success” in migrating clients from monthly billing cycles to up-front annual payments, which is something larger enterprises are more prone to do, said Sachleben.

In an interview with Gigaom after the conference call, New Relic CEO Lew Cirne wouldn’t say which of its many product lines has been the fastest growing in the past quarter, but he did say that the company is looking to boost staff in Dublin and London as it attempts to grow its market share in those regions. Cirne said 34 percent of New Relic’s business comes from outside the U.S., but the company doesn’t currently have a large global salesforce. So far, the plans are to expand outside the U.S. starting with Europe, but Cirne said the company has “nothing yet to share beyond those markets” at this time.

Here are some of the numbers based on the company’s earnings report (a quick arithmetic check follows the list):

  • Revenue for the third quarter of 2015 was $29 million, which is a 14 percent increase from the second quarter of 2015 and a 69 percent increase from the third quarter in 2014.
  • New Relic took $15.6 million in GAAP loss from operations for the third quarter of 2015, which was an increase from the $11.7 million GAAP loss from operations it took in the third quarter of 2014.
  • The company ended up raising $119.9 million in net proceeds during its IPO.
  • For the fourth quarter of fiscal 2015, New Relic is projecting revenue between $30.0 million and $30.5 million and expects a non-GAAP loss from operations ranging between $11.0 million and $12.0 million.
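As a back-of-the-envelope check on those growth figures, the implied prior-period revenues follow from simple division (derived and rounded here, not figures taken from the report):

```python
# Implied prior-period revenue from the reported growth rates.
q3_2015 = 29.0  # $M, reported

q2_2015 = q3_2015 / 1.14  # 14 percent quarter-over-quarter growth
q3_2014 = q3_2015 / 1.69  # 69 percent year-over-year growth

print(f"implied Q2 FY2015 revenue: ${q2_2015:.1f}M")  # ~$25.4M
print(f"implied Q3 FY2014 revenue: ${q3_2014:.1f}M")  # ~$17.2M
```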

Hitachi Data Systems to buy Pentaho for $500-$600M

Storage vendor Hitachi Data Systems is set to buy analytics company Pentaho at a price rumored to be between $500 million and $600 million (closer to $500 million, from what I’ve heard). It’s an interesting deal because of its size and because Hitachi wants to move beyond enterprise storage and into analytics for the internet of things. Pentaho sells business intelligence software and can transform big data, including data from stream-processing engines, for real-time analysis. According to a Hitachi press release, “The result will be unique, comprehensive solutions to address specific challenges through a shared analytics platform.”

Microsoft throws down the gauntlet in business intelligence

[company]Microsoft[/company] is not content to let Excel define the company’s reputation among the world’s data analysts. That’s the message the company sent on Tuesday when it announced that its PowerBI product is now free. According to a company executive, the move could expand Microsoft’s reach in the business intelligence space by 10 times.

If you’re familiar with PowerBI, you might understand why Microsoft is pitching this as such a big deal. It’s a self-service data analysis tool that’s based on natural language queries and advanced visualization options. It already offers live connections to a handful of popular cloud services, such as [company]Salesforce.com[/company], [company]Marketo[/company] and GitHub. It’s delivered as a cloud service, although there’s a downloadable tool that lets users work with data on their laptops and publish the reports to a cloud dashboard.

James Phillips, Microsoft’s general manager for business intelligence, said the company has already had tens of thousands of organizations sign up for PowerBI since it became available in February 2014, and that CEO Satya Nadella opens up a PowerBI dashboard every morning to track certain metrics.

A screenshot from a sample PowerBI dashboard.

And Microsoft is giving it away — well, most of it. The preview version of the cloud service now available is free and those features will remain free when it hits general availability status. At that point, however, there will also be a “pro” tier that costs $9.99 per user per month and features more storage, as well as more support for streaming data and collaboration.

But on the whole, Phillips said, “We are eliminating any piece of friction that we can possibly find [between PowerBI and potential users].”

This isn’t free software for the sake of free software, though. Nadella might be making a lot of celebrated, if not surprising, choices around open source software, but he’s not in the business of altruism. No, the rationale behind making PowerBI free almost certainly has something to do with stealing business away from Microsoft’s neighbor on the other side of Lake Washington, Seattle-based [company]Tableau Software[/company].

Phillips said the business intelligence market is presently in its third wave. The first wave was technical and database-centric. The second wave was about self service, defined first by Excel and, over the past few years, by Tableau’s eponymous software. The third wave, he said, takes self service a step further in terms of ease of use and all but eliminates the need for individual employees to track down IT before they can get something done.

The natural language interface, using funding data from Crunchbase.

IBM’s Watson Analytics service, Phillips said, is about the only other “third wave” product available. I recently spent some time experimenting with the Watson Analytics preview and was fairly impressed. Based on a quick test run of a preview version of PowerBI, I would say each product has its advantages over the other.

But IBM — a relative non-entity in the world of self-service software — is not Microsoft’s target. Nor, presumably, is analytics newcomer Salesforce.com. All of these companies, as well as a handful of other vendors that exist to sell business intelligence software, want a piece of the self-service analytics market that Tableau currently owns. Tableau’s revenues have been skyrocketing for the past couple years, and it’s on pace to hit a billion-dollar run rate in just over a year.

“I have never ever met a Tableau user who was not also a Microsoft Excel user,” Phillips said.

That might be true, but it also means Microsoft has been leaving money on the table by not offering anything akin to Tableau’s graphic interface and focus on visualizations. Presumably, it’s those Tableau users, and lots of other folks for whom Tableau (even its free Tableau Public version) is too complex, that Microsoft hopes it can reach with PowerBI. Tableau is trying to reach them, too.

“We think this really does 10x or more the size of the addressable business intelligence market,” Phillips said.

A former Microsoft executive told me that the company initially viewed Tableau as a partner and was careful not to cannibalize its business. Microsoft stuck to selling SharePoint and enterprise-wide SQL Server deals, while Tableau dealt in individual and departmental visualization deals. However, he noted, the new positioning of PowerBI does seem like a change in that strategy.

Analyzing data with more controls.

Ultimately, Microsoft’s vision is to use PowerBI as a gateway to other products within Microsoft’s data business, which Phillips characterized as the company’s fastest-growing segment. PowerBI can already connect to data sources such as Hadoop and SQL Server (and, in the case of the latter, can analyze data without transporting it), and eventually Microsoft wants to incorporate capabilities from its newly launched Azure Machine Learning service and the R statistical computing expertise it’s about to acquire, he said.

“I came to Microsoft largely because Satya convinced me that the company was all in behind data,” Phillips said. For every byte that customers store in a Microsoft product, he added, “we’ll help you wring … every drop of value out of that data.”

Joseph Sirosh, Microsoft’s corporate vice president for machine learning, will be speaking about this broader vision and the promise of easier-to-use machine learning at our Structure Data conference in March.

Microsoft CEO Satya Nadella.

Given all of its assets, it’s not too difficult to see how the new, Nadella-led Microsoft could become a leader in an emerging data market that spans such a wide range of infrastructure and application software. Reports surfaced earlier this week, in fact, that Microsoft is readying its internal big data system, Cosmos, to be offered as a cloud service. And selling more data products could help Microsoft compete with another Seattle-based rival — [company]Amazon[/company] Web Services — in a cloud computing business where the company has much more at stake than it does selling business intelligence software.

If it were just selling virtual servers and storage on its Azure platform, Microsoft would likely never sniff market leader AWS in terms of users or revenue. But having good data products in place will boost subscription revenues, which count toward the cloud bottom line, and could give users an excuse to rent infrastructure from Microsoft, too.

Update: This post was updated at 10:15 a.m. to include additional information from a former Microsoft employee.