Dataiku Offers Advice on how to Create Data Team Harmony

Building an effective data team can come at a high cost, yet open source tools may be the key to creating harmony and potentially reducing short-term and long-term costs, according to Florian Douetteau, CEO of Dataiku.

Microsoft embraces Python, Linux in new big data tools

Continuing its quest to make Microsoft Azure comfy for the non-Windows world, Microsoft just launched a preview of its Hadoop-based cloud tool (HDInsight) that runs on Linux. It’s also making its Azure ML machine learning service widely available now with new support for Python as well as the already-planned support for the popular R language. Microsoft bought Revolution Analytics, the company behind a commercial version of R, last month.

Azure HDInsight is thus “Microsoft’s first fully Linux-based service for big data,” Joseph Sirosh, Microsoft’s corporate VP of machine learning, said in an interview. Microsoft says 20 percent of all VMs running on Azure run Linux.

Asked if he sees any open-source oriented developers still wary of using Microsoft’s cloud, Sirosh said the perception of Microsoft as a Windows-only company is fading. “There is a new breed of developers [who want] to leverage features … whether they are Linux- or Windows-based is becoming less important,” he said. With cloud services, “you really don’t have to know a lot about deep inner details to use these services.”

Azure ML’s embrace of Python also shows just how popular that language has become and that Microsoft Azure is building on its promise of language agnosticism. “Python has become the number one language of choice for developers. We can now claim to be the most comprehensive analytics service: no other product lets you integrate SQL, R and Python into one project,” Sirosh said.

Microsoft CEO Satya Nadella.

Microsoft is also making Storm, the open-source stream analytics tool, available for HDInsight with support for both .NET and Java. The company already offered Azure Stream Analytics and will continue to sell, support and upgrade that as well. Storm is another option, Sirosh said.

In the massive public cloud infrastructure arena, Microsoft must contend with Amazon Web Services and Google Cloud Platform, both of which are targeting developers with fancy analytics and other services. I agree with Sirosh that Microsoft has done a good job of embracing open-source frameworks and languages in Azure. But the perception, especially among young startups, of Microsoft as a Windows-and-Office-first monolith dies hard.

I’ll be sure to ask Sirosh more about how Microsoft Azure can win over startups as well as big business accounts when we’re on stage next month at Structure Data.

This story was updated at 10:05 a.m. PST to reflect Microsoft’s assertion that 20 percent of all VMs on Azure run Linux.

Cloudera bought DataPad because data scientists need tooling, too

Cloudera has acquired a data-visualization startup called DataPad, the founding team of which specializes in data analysis using the Python programming language. As Hadoop competition heats up, Cloudera might be ramping up its Python tooling in order to attract more data scientists and developers.

Researchers hope deep learning algorithms can run on FPGAs and supercomputers

The NSF has funded projects that will investigate how deep learning algorithms run on FPGAs and across systems using the high-performance RDMA interconnect. Another project, led by Andrew Ng and two supercomputing experts, wants to put the models on supercomputers and give them a Python interface.

DARPA puts $3M into startup pushing big data in Python

As part of its new big-data-focused XDATA initiative, DARPA has invested $3 million in a startup called Continuum Analytics. The company’s aim is to extend Python’s prowess in scientific computing into the world of big data and analytics.

Mortar Data closes $1.8M seed round for Python-wrapped Hadoop

Mortar Data has raised $1.8 million for its cloud-based service that wraps Hadoop in a custom — and supposedly developer-friendly — blend of Pig and Python, meaning even novice Hadoop programmers can be writing jobs in about an hour.

Another reason you should learn to code: Python for Excel

Anyone who has used Microsoft Excel since 1993 has likely dabbled at least once with VBA, or Visual Basic for Applications. Now, a pair of MIT students have created a plug-in alternative to VBA called IronSpread, which uses the cross-platform Python scripting language.