From Storage to Data Virtualization

Do you remember Primary Data? Well, I loved the idea and the team but it didn’t go very well for them. There are likely several reasons why it didn’t. In my opinion, it boiled down to the fact that very few people like storage virtualization. I expressed my fondness for Primary Data’s technology several times in the past, but when it comes to changing the way complex, siloed storage environments are operated, you come across huge resistance at every level!
The good news is that Primary Data’s core team is back, with what looks like a smarter version of the original idea that can easily overcome the skepticism surrounding storage virtualization. In fact, they’ve moved beyond it and presented what looks like a multi-cloud controller with data virtualization features. Ok, they call it “Data as a Service,” but I prefer Data Virtualization…and being back with the product is a bold move.
Data Virtualization (What and Why)
I began this story by mentioning Primary Data because David Flynn (CEO of HammerSpace and former CTO of Primary Data) did not start this new venture from scratch. He bought the code that belonged to Primary Data and used it as the foundation of his new product. That allowed him and his team to bring the first version of HammerSpace to market in a matter of months instead of years.
HammerSpace is brilliant for one reason: it solves or, better, hides the problem of data gravity. Its Data-as-a-Service platform virtualizes data sets by presenting views of them across a multi-cloud environment through standard protocols like NFS or S3.
Yes, at first glance it sounds like hot air and a bunch of buzzwords mixed together, but this is far from being the case here… watch the demo in the following video if you don’t trust me.
The solution is highly scalable and aimed at Big Data analytics and other performance workloads for which you need data close to the compute resource quickly, without thinking too much about how to move, sync, and keep it updated with changing business needs.
HammerSpace solutions have several benefits but the top two on my list are:

  • The minimization of egress costs: This is a common problem for those working in multi-cloud environments today. With HammerSpace, only necessary data is moved where it is really needed.
  • Reduced latency: It’s crazy to have an application running on a cloud that is far from where you have your data. To give an example, the other day I was writing about Oracle cloud, and how good they are at creating high-speed bare-metal instances at a reasonable cost. This benefit can be easily lost if your data is created and stored in another cloud.

The Magic of Data Virtualization
I won’t go through architectural and technical details, since there are videos and documentation on HammerSpace’s website that address them (here and here).  Instead, I want to mention one of the features that I like the most: the ability to query the metadata of your data volumes. These volumes can be anywhere, including your premises, and you can get a result in the form of a new volume that is then kept in sync with the original data. Everything you do on data and metadata is quickly reflected on child volumes. Isn’t it magic?
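To make the idea a bit more concrete, here is a minimal Python sketch of a metadata query that produces a child volume and keeps it in sync with its parents. This is only an illustration of the concept as I understand it, not HammerSpace's API or implementation: the names FileMeta, Volume and ChildVolume, the tag-based query, and the notify-on-change mechanism are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical metadata record for one file (not a real product schema)."""
    path: str
    location: str                  # e.g. "on-prem-nas" or "cloud-a"
    tags: frozenset = frozenset()


class Volume:
    """A source volume, reduced here to a catalog of metadata keyed by path."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, FileMeta] = {}
        self._listeners: List[Callable[[], None]] = []

    def subscribe(self, callback: Callable[[], None]) -> None:
        self._listeners.append(callback)

    def put(self, meta: FileMeta) -> None:
        self.files[meta.path] = meta
        self._notify()

    def remove(self, path: str) -> None:
        self.files.pop(path, None)
        self._notify()

    def _notify(self) -> None:
        # Tell every derived volume that the metadata changed.
        for callback in self._listeners:
            callback()


class ChildVolume:
    """Result of a metadata query, re-evaluated whenever a parent changes."""

    def __init__(self, name: str, parents: List[Volume],
                 predicate: Callable[[FileMeta], bool]) -> None:
        self.name = name
        self.parents = parents
        self.predicate = predicate
        self.files: Dict[str, FileMeta] = {}
        for parent in parents:
            parent.subscribe(self.refresh)
        self.refresh()

    def refresh(self) -> None:
        # Assumption: paths are unique across parents; on a clash the
        # later parent in the list wins.
        self.files = {
            meta.path: meta
            for parent in self.parents
            for meta in parent.files.values()
            if self.predicate(meta)
        }


nas = Volume("on-prem-nas")
cloud = Volume("cloud-a")
genomics = ChildVolume("genomics-view", [nas, cloud],
                       lambda m: "genomics" in m.tags)

nas.put(FileMeta("/runs/001.bam", "on-prem-nas", frozenset({"genomics"})))
cloud.put(FileMeta("/logs/app.log", "cloud-a", frozenset({"logs"})))
print(sorted(genomics.files))  # ['/runs/001.bam']
```

The sketch only tracks metadata and rebuilds the whole view on every change, which is fine for a toy example; a real system would presumably update incrementally and move the underlying data only when it is actually read.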
What I liked the least, even though I understand the technical difficulties in implementing it, is that this process is one-way when a local NAS is involved… meaning that it is only a source of data and can’t be synced back from the cloud. There is a workaround, however, and it might be solved in future releases of the product.
Closing the Circle
HammerSpace exited stealth mode only a few days ago. I’m sure that by digging deeper into the product, flaws and limitations will be found. It is also true that the more advanced features are still only sketched on paper. But I can easily get excited by innovative technologies like this one and I’m confident that these issues will be fixed over time. I’ve been keeping an eye on multi-cloud storage solutions for a while, and now I’ve added Hammerspace to my list.
Multi-cloud data controllers and data virtualization are the focus of an upcoming report I’m writing for GigaOm Research. If you are interested in finding out more about how data storage is evolving in the cloud era, subscribe to GigaOm Research for Future-Forward Advice on Data-Driven Technologies, Operations, and Business Strategies.

Four Questions For: Greg Green

What are the biggest issues facing Big Data and marketing in 2017?
There are three primary issues currently facing “Big Data” and Marketing. First, the issue of irrelevance regarding the vast majority of collected data – and conversely, the power of focused, actionable insights from the right slice of data. The pressure on today’s data and analytics professionals continues to grow, forcing them to sift through the noise to find the hidden diamonds.
Second, consumers are changing their behavior faster than data collection and algorithms teams can handle. This leads to marketers chasing trends and opportunities, as they are overwhelmed with data from older paradigms. For example, marketers with a 360-degree view of the customer built in the 1990s, rebuilt in the 2000s and updated again after 2015, completely miss the power of social media while reacting to real-time events.
Third, and most importantly, while the behavioral data shared from our devices gives us some information, the intuition and insights from data science, operations and customer service teams remain equally, if not more so, important. Each business, whether an auto manufacturer or a local pizza pub, needs both the data and the on-the-ground insight to find the golden nuggets and avoid potential disasters.
What new technologies should marketers be using to better analyze and act on data?
It’s no longer a question of if marketers are adopting technology; it’s a matter of what kind of technology. Technology is critical for connecting data silos and catching shoppers’ attention. Brands should tap solutions that help them better understand consumer behavior across print, digital (mobile, video, etc.), in-store, TV and e-commerce channels to get the full scope of their audiences’ actions.
We are especially interested in technology that helps us visualize trends and detect changes in consumer behavior quickly. Consumers tend to lead brands into new arenas by adopting technologies, apps and devices faster than clients can adapt. Marketers have access to this comprehensive data intelligence and ultimately need to adjust their media strategies to take advantage of new behavior patterns, dynamically changing consumer segments and shopper preferences. Segmentation tools, technology that accounts for both integrated online and offline targeting and utilizing vendors with open partner ecosystems should all be part of a marketer’s arsenal. By not accounting for all media channels, marketers are missing the bigger picture. They are formulating a customer profile based on a partial view – technology must be utilized to gain a more holistic audience overview.
It’s important to note that while technology can clearly improve marketing results, it must be combined with a human touch to ensure it’s a more nuanced experience.
How can marketers tread the line between personalization and targeting while ensuring consumer privacy is respected?
It’s no surprise that privacy is a major issue today – consumers increasingly feel “creeped-out” as marketers aim to provide a relevant experience. We especially see this across mobile platforms with location data. Many consumers actually don’t mind sharing this data – it’s when brands use the information in unintended manners or without providing a relevant ROI for customers, so to speak, that brand/consumer friction appears. To avoid this, marketers need to build trust with their audience. This can be achieved by ensuring that the collected data benefits the consumer.
Another way to build trust is to establish a clear opt-in option. Marketers shouldn’t be luring their audience into opting-in – it should be a transparent process laying out exactly what the consumer can expect to receive in return. What if a consumer says “no?” Leave it be. Don’t annoy them with constant opt-in requests. If and when they’d like to share data, you can bet a new app or value proposition will draw them in.
What are some strategies that brands/marketers can leverage to make data as valuable as possible for consumers?
As shoppers, we all want to feel valued and understood. Knowing this mindset, it’s up to marketers to engage their audience in meaningful ways across the channels they prefer. Brands should use the data at their disposal for a catered experience. If a shopper has a birthday coming up, send a special discount code. Want to take it a step further? Allow them to share the code with up to five friends. This ends up being a win-win for the consumer and brand. The shopper feels valued, and the brand extends its reach to five potential new customers.
Data should also be tapped to include real-time offers in marketing campaigns. Imagine a consumer is departing work for the day and looking for a local spot to pick up some pizza for dinner. A simple mobile alert offering a pizza coupon in that exact neighborhood can be the needed influence to spur a shopper to make a purchase decision. Making data relevant for consumers shouldn’t feel like a hassle for marketers. The little extra effort can go a long way in customer engagement, retention and happiness. Yes – happiness.

Fluent in applying data science to business applications and driving revenue, Greg possesses nearly 25 years of industry experience. In his role as Chief Data and Analytics Officer at Valassis, he brings a unique combination of expertise across disciplines including data management, consumer analytics, marketing tech, sales operations and product development.
 

Dataiku Offers Advice on how to Create Data Team Harmony

Building an effective data team can come at a high cost, yet open source tools may be the key to creating harmony and potentially reducing short-term and long-term costs, according to Florian Douetteau, CEO of Dataiku.

Four Questions For: Sara Spivey

In what ways is technology changing marketing today?
In every way. I can’t think of an area in marketing that technology hasn’t changed in some way. From research to social media to everything in between, technology is omnipresent and essential to today’s marketers. Technology has enabled marketers to better identify our target audiences, track their behavior across the spectrum of touch points, provide customized offerings to our clients, work faster to move our audiences through the sales funnel, build better products based on real-time feedback, engage with customers on their preferred medium and much more. However, for all its benefits, I think technology can also be a crutch that we rely on too much at times. Like any tool, it should be used for a specific purpose, but not over relied upon to do everything for us. The human element still provides the competitive advantage to take all the benefits and insights technology offers and decide how and when they should be used.
 
How do you see big data influencing marketing in 2017? Any trends that you expect to see?
Big data has already played a tremendous role in helping marketers provide personalized offerings for their clients and respond quickly to changes in the industry. Yet I think it has also led to a loss of creativity in marketing. In today’s world, marketers are measuring every single thing with tools and systems trying to find an underlying analytical theme. This is leading to an abundance of data that doesn’t necessarily have any meaning or practical application. With a profusion of automated services collecting information, I think marketers have reached a point of saturation. For instance, we don’t need 25,000 data points about a consumer; we need the 12 that matter in their purchase journey. I think marketers will soon realize the need to distill the most important insights from data being collected and to use it to develop thoughtful, actionable, creative strategies to engage and reach their target audiences more effectively.
 
What role do you believe artificial intelligence will play in marketing?
Let’s take chatbots as an example of artificial intelligence in marketing. While there will still be a place for chatbots, companies will be much more selective about using them. Many companies have implemented these AI programs, but in some areas, they hurt the customer experience. Though automation technologies will still be used in marketing, I don’t think chatbots will be applied unilaterally. Companies are realizing that human interaction is still crucial to engaging consumers and sustaining relationships with the ones they already have. As far as virtual reality goes, it definitely has made a major splash, but I’m not sure how it leads to sales conversion, particularly in retail. For example, when consumers use Oculus devices in-store, they might be interested in purchasing that specific VR device, but it rarely leads to lateral purchases for clothes, electronics, etc. There’s a gap between general interest in the VR experience and actual purchase intent. It’s also an incredibly expensive marketing tactic to employ.
 
How can marketers better incorporate data analytics into their strategies? Are consumers still as willing to share personal facts about themselves in hopes of receiving more personalized experiences?
Marketers must go the extra mile to provide more value to their companies by using their data to reach the customer when it matters most. I believe the industry will only continue to build and sharpen its data focus. The more marketers can learn about their consumers through data, the easier it is to personalize the outreach for a customized experience. That’s how they go the ‘last mile’ with consumers – by reaching them when they are actually in-market with personalized offers.
I am amazed by consumers’ continued willingness to share data, even with growing concerns around privacy. Consumers are sharing personalized facts about themselves at a much higher rate, and while they know companies are using this data to market to them, they are hoping that it will lead to more personalized experiences. It is crucial for companies, brands and retailers to take the data customers are sharing and apply it in a meaningful way that leads to better, more personalized experiences. Whether these organizations offer coupons based on consumers’ past purchases or exclusive access to certain products due to brand loyalty, these little factors can make a significant difference. Our recent study found that more than 70 percent of US internet users said their purchase decisions were influenced by coupons and discounts. What’s more is that 54 percent of consumers claim to have made a purchase as a result of brand outreach regarding abandoned shopper cart items or recommendations based on past purchases. When we have so many tools and data at our disposal, we owe it to our customers to get it right when reaching out to them and providing content that matters to them.

As Chief Marketing Officer, Sara Spivey is responsible for overall leadership of Bazaarvoice’s global marketing programs, including demand generation, solutions marketing, brand strategy, and communications. Sara has more than 30 years of marketing, strategy, and leadership experience with industry-leading organizations.

Report: Rethinking the enterprise data archive for big data analytics and regulatory compliance

Our library of 1700 research reports is available only to our subscribers. We occasionally release ones for our larger audience to benefit from. This is one such report. If you would like access to our entire library, please subscribe here. Subscribers will have access to our 2017 editorial calendar, archived reports and video coverage from our 2016 and 2017 events.
Rethinking the enterprise data archive for big data analytics and regulatory compliance by Ahair Baig:
This research report explores today’s big data archive, in which analytical and compliance solutions are implemented by large organizations for the purpose of:

  • Moving large, historical data from tier 1 or tier 2 to cheaper storage for improved efficiency and future scale
  • Making data available, usable, and queryable to organizational stakeholders for easy lookups, analysis, and revenue-generating endeavors
  • Providing robust data security and data retention capabilities that facilitate regulatory compliance and data governance audits.

To read the full report, click here.

Report: Big Data and Big Agriculture

Normally, our library of 1700 research reports is available only to our subscribers. Occasionally we release ones for our larger audience to benefit from. This is one such report. If you would like access to our entire library, please subscribe here. Subscribers will have access to our 2017 editorial calendar, archived reports and video coverage from our 2016 and 2017 events.
Big Data and Big Agriculture by Adam Lesser:
As the global population increases, weather volatility grows, and fuel prices surge, there will be more incentives to use data and analytics on the farm to increase yields and minimize risks. This report reviews a host of data driven services, which can have a valuable role to play on the farm.
Click here to read the full report.

Who needs traditional storage anymore?

The traditional enterprise storage market is declining and there are several reasons why. Some of them are easier to identify than others, but one of the most interesting aspects is that there’s a radicalization in workloads, hence storage requirements.
Storage as we know it, SAN or NAS, will become less relevant in the future. We’ve already had a glimpse of it from Hyperconvergence, but this kind of infrastructure is trying to balance all the resources – at the expense of overall efficiency sometimes – and they are more compute-driven than data-driven. Data intensive workloads have different requirements and need different storage solutions.

The Rise of Flash

All-flash systems are gaining in popularity, and are more efficient than hybrid and all-disk counterparts. Inline compression and deduplication, for example, are much more viable on a flash-based system than on others, making it easier to achieve better performance even from the smallest of configurations. This means doing more with less.
At the same time, all-flash allows for better performance and lower latency and, even more important, that latency is much more consistent and predictable over time.
With the introduction of NVMe and NVMe-oF, protocols specifically designed for faster access to flash media (attached to the PCIe bus), latency will be even lower (on the order of hundreds of microseconds or less).

The Rise of Objects (and cloud, and scale-out)

At the same time, what I’ve always described as “Flash & Trash” is actually happening. Enterprises are implementing large scale capacity-driven storage infrastructures to store all the secondary data. I’m quite fond of object storage, but there are several ways of tackling it and the common denominators are scale-out, software-defined and commodity hardware to get the best $/GB.
Sometimes, your capacity tier could be the cloud (especially for smaller organizations with small amounts of inactive data to store) but the concept is the same, as are the benefits. At the moment the best $/GB is still obtained by Hard Disks (or tapes) but with the rate of advancement in Flash manufacturing, before you know it we’ll be seeing the large SSDs replacing disks in these systems too.
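As a rough illustration of the "Flash & Trash" split, the placement decision can be sketched in a few lines of Python. The 30-day window, the field names and the two tier labels are assumptions chosen for the example, not figures from any product or survey.

```python
from dataclasses import dataclass

HOT_WINDOW_DAYS = 30  # assumed threshold, tune to your own workloads


@dataclass
class DataSet:
    name: str
    days_since_last_access: int
    latency_sensitive: bool


def choose_tier(ds: DataSet, hot_window_days: int = HOT_WINDOW_DAYS) -> str:
    """Return "flash" for active or latency-sensitive data, else "capacity"."""
    if ds.latency_sensitive or ds.days_since_last_access <= hot_window_days:
        return "flash"
    # Secondary data goes to the cheapest $/GB tier: scale-out object
    # storage on disks, or the cloud for smaller organizations.
    return "capacity"


database = DataSet("erp-db", days_since_last_access=0, latency_sensitive=True)
archive = DataSet("2014-backups", days_since_last_access=900, latency_sensitive=False)
print(choose_tier(database), choose_tier(archive))  # flash capacity
```

Real tiering engines look at much more than access age (I/O patterns, size, compliance rules), but the two-way decision is the core of the model.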

The next step

Traditional workloads are served well by this type of two-tier storage infrastructure but it’s not always enough.
The concept of memory-class storage is surfacing more and more often in conversations with end users, and also other CPU-driven techniques are taking the stage. Once again, the problem is getting results faster, before others if you want to improve your competitiveness.
With new challenges coming from real-time analytics, IoT, deep learning and so on, even traditional organizations are looking at new forms of compute and storage. You can also see it from cloud providers. Many of them are tailoring specific services and hardware options (GPUs or FPGAs for example) to target new requirements.
The number of options is growing pretty quickly in this segment and the most interesting ones are software-based. Take DataCore and its Parallel I/O technology as an example. By parallelizing the data path and taking advantage of multicore CPUs and RAM, it’s possible to achieve incredible storage performance without touching any other component of the server.
This software uses available CPU cores and RAM as a cache to reorganize writes while avoiding any form of queuing to serve data faster. It radically changes the way you can design your storage infrastructure, with a complete decoupling of performance from capacity. And, because it is software, it can be installed also on cloud VMs.
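To show the general idea of absorbing writes in RAM and draining them in parallel, here is a small Python sketch. It is a generic write-coalescing cache with per-worker flush lanes, written under my own assumptions; it is not DataCore's design, and the class name, the dict-like backend and the block-modulo lane assignment are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple


class CoalescingWriteCache:
    """RAM write-back cache: absorbs writes, then flushes them in parallel lanes."""

    def __init__(self, backend: Dict[int, bytes], workers: int = 4) -> None:
        self.backend = backend            # stand-in for the persistent layer
        self.workers = workers
        self.dirty: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        # A later write to the same block replaces the earlier one in RAM,
        # so only the final version ever reaches the backend.
        self.dirty[block] = data

    def read(self, block: int) -> Optional[bytes]:
        # Serve from RAM first, fall back to the persistent layer.
        if block in self.dirty:
            return self.dirty[block]
        return self.backend.get(block)

    def flush(self) -> int:
        """Write dirty blocks to the backend; return how many were written."""
        pending, self.dirty = self.dirty, {}
        lanes: List[List[Tuple[int, bytes]]] = [[] for _ in range(self.workers)]
        # Reorganize writes: ascending block order within each lane, and
        # each block belongs to exactly one lane so lanes never contend.
        for block in sorted(pending):
            lanes[block % self.workers].append((block, pending[block]))

        def drain(lane: List[Tuple[int, bytes]]) -> int:
            for block, data in lane:
                self.backend[block] = data
            return len(lane)

        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return sum(pool.map(drain, lanes))


disk: Dict[int, bytes] = {}
cache = CoalescingWriteCache(disk, workers=2)
cache.write(7, b"old")
cache.write(3, b"b")
cache.write(7, b"new")        # coalesced with the first write to block 7
print(cache.flush())          # 2
```

Three application writes turn into two backend writes, and the lanes can drain independently. In Python the threads mostly illustrate the structure rather than deliver real parallelism; the point is the decoupling of write absorption (RAM and CPU) from persistence (the capacity layer).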
A persistent storage layer is still necessary, but will be inexpensive if based on the scale-out systems I’ve mentioned above. Furthermore, even though software like DataCore’s Parallel I/O can work with all existing software, modern applications are now designed relying on the fact that they could run on some sort of ephemeral storage, and when it comes to analytics we usually work with copies of data anyway.

Servers are storage

Software-defined scale-out storage usually means commodity x86 servers; the same is true for HCI, and very low latency solutions are heading towards a similar approach. Proprietary hardware can’t compete: it’s too expensive and evolves too slowly compared to the rest of the infrastructure. Yes, niches good for proprietary systems will remain for a long time but this is not where the market is going.
Software is what makes the difference… everywhere now. Innovation and high performance at low cost are what end users want. Solutions like DataCore do exactly that, making it possible to do more with less but also do much more, and quicker, with the same resources!

Closing the circle

Storage requirements are continuing to diversify and “one-size-fits-all” no longer works (I’ve been saying that for a long time now). Fortunately, commodity x86 servers, flash memory and software are helping to build tailored solutions for everyone at reasonable costs, making high performance infrastructures accessible to a wider audience.
Most modern solutions are built out of servers. Storage, as we traditionally know it, is becoming less of a discrete component and more blended with the rest of the distributed infrastructures with software acting as the glue and making things happen. Examples can be found everywhere – large object storage systems have started implementing “serverless” or analytics features for massive data sets, while CPU intensive and real-time applications can leverage CPU-data vicinity and internal parallelism through a storage layer which can be ephemeral at times… but screaming fast!

data.world CEO Brett Hurt Interview

Brett Hurt is the CEO and Co-founder of data.world, which is building the most meaningful, collaborative, and abundant data resource in the world. He is also a seed-stage investor at Hurt Family Investments (HFI) in partnership with his wife, Debra. HFI is involved in 45 startups and 11 VC funds. HFI has directly made 34 startup investments, and Brett has also joined the Advisory Board of 11 additional companies. (Full disclosure: HFI has a minority investment in Gigaom). Prior to HFI, Brett founded Bazaarvoice (NASDAQ: BV) and served as CEO and President for seven and a half years, leading the company from bootstrapped concept to almost 2,000 clients worldwide and through its successful IPO. Prior to Bazaarvoice, Brett founded Coremetrics and helped grow the company into a global, leading marketing analytics solution for the eCommerce industry before its acquisition by IBM.


Byron Reese: Tell us about data.world.
data.world is the social network for data people. We’ve seen social networks for questions and answers (Quora), for friends (Facebook), and for developers (GitHub). The world has not yet seen a successful social network for data and the people who work with it, and I find that kind of baffling, because data is often something that is hard to understand without proper communication and context. There are many issues which prevent it from being easily analyzed and put to use. For example, government agencies put out data, change the format year to year, change the taxonomy and the ontologies every so often, so all of that needs to be documented and understood before the data makes sense.
We’ve spent, as a nation, just a massive amount of money funding these “industrial strength” analytics platforms, but we haven’t addressed the core problem: garbage in, garbage out.
Data people are spending 50% – 80% of their time as “data janitors,” finding, cleaning, and otherwise preparing data for use—before they can get to the real exploration, analysis, and ultimately, the solutions to the problems they’re working on. We call this frustrating janitorial stage the “first mile” of data work. Most of this work happens in isolation, and all that preparation is used once and forgotten. But what if it could be shared with other data people working with the same datasets and tackling similar problems so they don’t have to needlessly repeat it? What if preparation itself could be done collaboratively?
And the most important question: What if we reduced prep time from 80% to 30%, or 10%?
Think about that productivity boost to data workers. If they spent less time preparing data, they could solve important problems faster, create new knowledge faster.
So we’re bringing them together, giving them a powerful collaboration workspace, and linking datasets to make it happen.
Where are you in the lifecycle of the company?
Our preview release launched July 11th, and things are going extremely well. We’ve got lots of people signing up, and it’s really neat to see who comes in. From corporate analysts, to people in government, to citizen data scientists, and even cities–recently the city of San Diego and the city of Austin signed up. As we’re in preview release, we can really study exactly what users are doing, exactly what features they’re asking for. We have a daily metrics standup where we look for patterns, create experiments, share results. It’s revving very quickly, we’re coming out with around 4 or 5 new feature releases each week. This is the fastest pace of execution of any startup I’ve been with, and this is my 6th startup personally. Even though we’re in preview, the platform is already very powerful and anybody can visit data.world today and sign up for free, with no waitlist. We’re working on adding more and more social features so it really has the best qualities of a social network.
Stepping back for a moment, we’ve been at this now for about a year. We stayed in stealth mode for several months prior to raising money for the company. Before any business I’ve ever launched, from Bazaarvoice to Coremetrics, to others, I spent a lot of time in analysis to make sure that the market timing is right, that we’re building the right type of team, that we’re really learning from previous people in the industry. We really felt like the market timing was great, we went out there to raise capital in December, and as you know, Byron, we’re Austin-based. We started on November 29th, for our capital raise right after Thanksgiving, and we actually signed on December 15th, for our $14M Series A, then quickly closed that round in January.
So we were off to the races on the fundraising front, and now we’re in that mode of having just launched the preview, all that initial excitement that accumulates around something that’s so ambitious, and we really believe we’ve built something that can do so much good for the world. To those ends, we’ve set up the company as a public benefit corporation, and it’s the first time in my entrepreneurial career to actually launch a public benefit corporation. I’m happy to speak about that as well.
Please do.
I’ve founded C corporations in the past, and to be honest, I didn’t know what a public benefit corporation was. I’m involved in the Henry Crown Fellowship, which is the flagship program of The Aspen Institute, and it’s really amazing. You’re basically put with 21 to 22 fellows each year, and I was fortunate enough to get in the program. The fellows are very diverse. Previous Fellows in tech include Reid Hoffman, Aneel Bhusri (co-founder and CEO of Workday), Reed Hastings, and so many others, but there’s also many prominent people from government, non-profits, musicians like Lupe Fiasco.
The program gets you really thinking about your utility as a human being, like what is your ultimate value to the world? And it has had a profound impact on me, which combined with support from my family, got me to actually step back into the arena to launch data.world. The Henry Crown program exposed me to benefit corporations, because the founders of B Lab, the leading certification service for benefit corporations, are also Henry Crown fellows and recently won an award for their work from The Aspen Institute. Benefit corps (also known as B-Corps) include Etsy, Patagonia, Ben & Jerry’s; there’s quite a few of them out there.
The more I dug into it, the more I realized that it was so aligned with what data.world is doing. Ultimately, the way we’re going to get to a world of clean, interoperable data, the type of data that would make that onboard computer from Star Trek possible: you talk to it, ask it any question you could think of, and boom, it would give you your answer. What we have with Siri today is just the beginning. The better the data, the better the output. But that clean data is going to take a lot of people improving it, sharing, collaborating, so that everybody has access to the fruits of that labor: the best, cleanest dataset on any subject, and thus the best starting point for data people trying to solve problems in that domain. This is what it will take to stop Zika in its tracks, to cure cancer sooner, to quickly find viable solutions to climate change—all of these critical projects are fundamentally data-driven.
The public benefit corporation structure is a lot like a C corporation, it’s really a sister or brother to the C corporation, vs. an LLC or S-corp, but it has the added benefit of protecting the mission of the company, and allowing you to publicly report on progress against that mission in kind of the same ways you would report on your finances.
Here is our public benefit statement, or mission:
“The specific public benefit purposes of the Corporation are to (a) strive to build the most meaningful, collaborative and abundant data resource in the world in order to maximize data’s societal problem-solving utility, (b) advocate publicly for improving the adoption, usability, and proliferation of open data and linked data, and (c) serve as an accessible historical repository of the world’s data.”
And it’s been extremely well-received in our launch. When we go in and we speak with people in universities where there’s lots of data silos ready to be freed and used by the public, or we go in and speak with government decision-makers, they really value our public benefit status and are much more willing to partner. The public and official nature of our mission makes us accountable and creates trust. It shows we care deeply about what we’re doing.
I’m lucky to have three co-founders who are all entrepreneurs, and all had been successful executives at HomeAway leading up to that acquisition by Expedia, and we’re proud that our mission is baked not only into the way we operate, but the actual legal structure of the company itself.
You touched on things like cancer and Zika. What are your hopes for the things that are done with the data? Where do you see the biggest contributions of what you’re doing, to the world?
Well there are so many problems in the world that would be solved faster if people were collaborating more using high-quality data. Vice President Biden launched the Cancer Moonshot after his son passed away from brain cancer, and everyone knows someone who has been affected by cancer. My connection is very personal, like his, and it’s just kind of crazy to me that a lot of the data we need to fight it lives in silos, which is an artifact from the days when the world was less connected and data was stored in large data warehouses on premises, before the open data movement had picked up steam. We live in the cloud age, and networking things together is creating so much value for the world, but it’s just not happening fast enough with data. For example, the U.S. Census has some of the most valuable data available, and it powers businesses like ancestry.com and Zillow, but the U.S. Census partnered with us early on to make their data even more usable, and much more connected. Census CMO Jeff Meisel comes from private industry, and he really cares about serving the public with data in much the same way a CMO in the private sector cares about delivering value to his or her customers. And of course, he wants to know, who is using this data? How are they using it? Part of the goal with Census is to provide them with this visibility, and I think that we’re creating a framework for public entities, private firms, and individuals to continuously improve the way data is shared and used.
So you’re storing all the data, you’re normalizing across disparate data sets, and then you’re providing an API to query multiple data sets at once. Is that kind of the idea overall?
Yeah, that’s very close. As we ingest datasets from users or from supply partners like Census, we put them into a very large graph database. That allows us to break the data up into what are known as atomic triples, which makes it highly interoperable and joinable. For example, if a user uploads a dataset containing ZIP codes, and another user or data supply partner uploads a dataset containing ZIP codes, you can join the datasets quickly, run queries against them, and share the query and what it returns.
But it’s important to note that we’re not trying to replace every part of the data professional’s toolchain. We’re actually building this so it enhances a user’s existing data work toolchain, making it easy to get data into and out of the platform. So we already preview more than 30 data filetypes, we’re creating various connectors and integrations, and we’re not asking you to give up RStudio or any other tool you know and love.
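As a rough illustration of the triple idea described in this answer, the sketch below breaks two small tables into (subject, predicate, object) triples and joins them on a shared ZIP code. It is not data.world's implementation: the function names `rows_to_triples` and `join_on`, the dataset names, and all the sample values are invented for the example, and a real graph database would handle this with an indexed triple store and a query language rather than Python lists.

```python
from collections import defaultdict


def rows_to_triples(dataset_name, rows):
    """Break each row into (subject, predicate, object) triples.

    The subject is a hypothetical row identifier; the predicate is the
    column name; the object is the cell value.
    """
    triples = []
    for i, row in enumerate(rows):
        subject = f"{dataset_name}/row/{i}"
        for column, value in row.items():
            triples.append((subject, column, value))
    return triples


def join_on(triples_a, triples_b, predicate):
    """Pair subjects from two triple sets that share a value for `predicate`."""
    # Index the second dataset: value -> subjects that carry that value.
    index = defaultdict(list)
    for subject, pred, obj in triples_b:
        if pred == predicate:
            index[obj].append(subject)

    matches = []
    for subject, pred, obj in triples_a:
        if pred == predicate:
            for other in index.get(obj, []):
                matches.append((subject, obj, other))
    return matches


# Made-up sample rows standing in for two independently uploaded datasets.
incomes = [
    {"zip": "78701", "median_income": 68000},
    {"zip": "78702", "median_income": 52000},
]
clinics = [
    {"zip": "78701", "clinic_count": 4},
    {"zip": "78745", "clinic_count": 2},
]

income_triples = rows_to_triples("incomes", incomes)
clinic_triples = rows_to_triples("clinics", clinics)

# Rows from both datasets that share a ZIP code.
print(join_on(income_triples, clinic_triples, "zip"))
```

Because every cell becomes its own small fact, neither dataset needs to know about the other's schema ahead of time; any predicate the two happen to share can serve as the join key.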
What are the biggest challenges or problems you are running into?
The biggest challenge in building a business like this, versus my previous businesses, like Bazaarvoice and Coremetrics, is that we have to win in the court of public opinion. We believe that, at scale, there’s only going to be one primary social network for data people, and we intend to be it. But to get there, you have to win the hearts and minds of the community, and that means you need to be obsessively focused on how a wide range of users use data.world: anybody from a business analyst working inside a large company to the data scientist at an NGO who needs advanced capabilities. So the challenge is winning over a very broad spectrum of users, across a very broad spectrum of industries, across a very broad spectrum of countries.
To achieve this, we hold daily metrics standup meetings and interview users every week to understand how they’re using the system and how it has to evolve. It actually takes me back to my days of creating one of the first Internet games in 1990. We were obsessively focused on how players played our game. Of course, players could show up from any country. It was one of the most popular games on the Internet by 1992. This was back in the days of TELNET, pre-HTML.
One of the things I really love about the business is how technical it is. We’re democratizing the Semantic Web, which is something that Tim Berners-Lee has been advocating for a long time. But the truth is, the Semantic Web just hasn’t taken off yet. The reality is that it’s used only by the wealthiest few: it’s used by the NSA, it’s used by Palantir, it’s used by Facebook and Google, it’s used by Goldman Sachs. And it’s kind of sad that the most powerful data technology to ever come along in human history hasn’t been democratized. So we’re very mission-driven, very fired up, and you can read what we’re all about in our public benefit statement, which of course is public. But it really started out as democratizing the most powerful data technology, for both the average user and the more advanced user.
You’re well known for being a strong advocate of Austin. But didn’t you raise most of the money for this out of California? Does Austin not have a vibrant fundraising scene?
Austin has a vibrant fundraising scene, but there is a limitation in terms of the size of funds. And that’s going to change over time. That didn’t used to be the case: we used to have one megafund, Austin Ventures, which could write $15 to $20 million checks for the right type of bold, ambitious business, which data.world is. We don’t have that type of early-stage funding capability anymore. But we’re a great city to get started in on the fundraising front. Let’s say that you’re doing something that’s still exciting, but less ambitious. And I’ve never worked on anything as ambitious as data.world; this is the most ambitious project I’ve ever worked on. So let’s say you’re starting the next Bazaarvoice. We raised $3.8M, and we actually got to cash flow breakeven on the first $2M. The amazing thing about Bazaarvoice is we went from 0 to $100M in sales in 6 years from inception. We only raised $25M in total, and we had $12M of it left in the bank when we went public. So we were incredibly capital efficient as a SaaS business. You know, I was able to raise money in Austin for Bazaarvoice; that wasn’t a problem. And I actually raised it from Austin Ventures. And I’d be able to raise it today. I’d be able to raise it from LiveOak, Silverton, S3, as well as perhaps a few others. That type of check is not a problem in Austin. I went to Silicon Valley because we’re in the post-Austin Ventures era, and I really wanted a partner that understood building platform businesses. For all the amazing things that have happened in Austin, entrepreneurially, there aren’t a lot of big platform businesses in Austin. And I wanted that expertise from the beginning on our board, and I’m glad to say I found it in California, with Jason Pressman, someone I had actually known for 17 years prior to him joining our Board. And he’s with a firm named Shasta Ventures. Jason is particularly interesting because he’s been on the boards at Nextdoor and Smule since the beginning.
Nextdoor, as you know, is the largest online community for people that live in neighborhoods and own houses. Smule is one of the largest online communities now for people who love music and want to share their music, and I’m talking about individuals, not artists, but Smule has a lot of interesting functionality where individuals and artists team up and do things on that social network. It’s a kind of huge social network for music. So Jason has that experience, and they’re used to backing platform companies. So to be honest, I didn’t even start in Austin with our fundraise. I went straight to California, and I only spoke with VCs that are used to backing platform businesses, because I wanted someone who understood exactly what we were going after and had the capital to back it up. Having said that, I very quickly got Austin money into our company. I got our largest VC in Austin into our company, that’s LiveOak Venture Partners. And they’ve been fantastic. I got a lot of individuals in Austin into our company, like John Mackey, the co-founder and co-CEO of Whole Foods, Clayton Christopher, who founded DEEP EDDY vodka in town, and a number of others. But it is fair to say that out of the $14M we’ve raised, and these are rough numbers, around $10M of it came from California and around $1.5M or so came from Austin. Around $1M came from Chicago, and then the rest came from New York. Now part of the reason for that is that I’ve got relationships all over. A lot of the people I spoke with I’ve known for many years. And that’s as a result of building several businesses that have become global businesses, and I’ve traveled a lot in my lifetime.
I know that with Bazaarvoice, and I assume before that, you were very deliberate about the culture of the company. What kind of culture are you building with this company?
You know, it is a very different culture for a platform business, but I’d say we’re being very deliberate about it. And we were recently honored as one of Austin’s best places to work by the Austin Business Journal, so it feels good to get an early win like that for such a young company. We’ve really been able to hire some of the best people that we’ve ever worked with at companies like Trilogy, Bazaarvoice, HomeAway, Indeed, and others. It’s really exceptional, the people that we have here. It’s the best early-stage startup team I’ve ever been a part of. And that’s not just the result of my experience, that’s the result of the fact that I’ve got three amazing co-founders who are also very accomplished and very well-connected in the Austin area. So the reason I tell you all that is that culture ultimately is a reflection of the foundation and the raw material, which is your people. And we’ve hired people that were the cultural standard bearers, and the performance bearers, at their previous companies.
But it is a very different type of culture than Bazaarvoice. I would say that Bazaarvoice was a very sales-driven culture. It was a verticalized, B2B SaaS company. The play was to convince the 10,000 or so VPs of e-commerce in the world to join the Bazaarvoice solution, and ultimately buy a fairly expensive enterprise-grade software solution to host something so critical, their customer reviews, which is so effective at driving online sales. There are 700M consumers each month that use Bazaarvoice, they just don’t know it. Because it’s like the Intel Inside of clients like Best Buy, Walmart, and others. So if Bazaarvoice is a very sales-driven culture, data.world is a very engineering-driven culture, and if you think about what we’re doing and building a platform business, it’s much more technical than the initial solution was at Bazaarvoice. We’re going after a slightly more technical audience, although we’re also going to win over and we are winning over the business analysts of the world. And out of the 26 people we have today, 17 of them are in engineering.
One of the ways that we stay focused on our users is through a tremendous amount of communication on Slack. I would say 80% of our communication is on Slack, vs. email, which keeps a very real-time pace.
You always feel connected to pretty much everything occurring in the company, or occurring with our users, because you even see the support channel for example hooked up to Slack. So I personally see every single support request that comes through. Obviously that won’t scale, but when you’re in a preview release mode and you have thousands of users, it’s very important to see what every single person is doing, and stay focused on that and stay very close to, ultimately, the person you’re trying to win over in that court of public opinion. We have tons of standup meetings, including a daily standup for our executive team, a daily standup for metrics, and every team has their own as well. And then we have lots and lots of user interviews.
It’s different than in the early days of Bazaarvoice. There, when you won a client, whoever won the client would hit a massive gong we had in the office. Everybody would gather around, and it was very tribal, and it was very sales-driven, where that person would talk about how they won that client over and ultimately how excited that client is to launch a very transparent form of communication on their own site. With this business, it’s much subtler when we win. You’re winning on an individual user basis, and you’re winning thousands and thousands of them over. Instead of hitting a gong, we Slack our success and recognize each other during our standups.
You mentioned Slack. What other tools are you finding to be particularly useful?
We’re also big users of Google Hangouts. We wired up all of our conference rooms with Google Hangouts in the first few weeks of having an office. We also use all the standard tools that you would imagine for this type of company. We’re obsessed users of GitHub, which obviously is the social network for coders. We manage social with Hootsuite. For our metrics, we use Segment, and Mixpanel, which allow us to see user behavior at an aggregate level, and really understand exactly how people use the platform, so we can constantly rev the technology. We’re running multiple experiments simultaneously with tools like Optimizely.
We don’t have a phone system. I mean there is no general phone number that people call into. Being a platform type of business, all of our communication with users is digital with the one exception of when we do user tests and we get on Google Hangouts and then we can see each other and see what they’re doing, or we have them into the office in person.
So why did you decide you wanted to be a CEO again?
Well, there were two reasons. One I talked about earlier, which is the impact of the Henry Crown fellowship on me. I had been backing entrepreneurs pretty heavily over a 3-year period after Bazaarvoice, and I really enjoyed that. And I still back entrepreneurs, and I invest in one startup every 2 to 3 months right now. But I was investing at a much, much faster pace for those 3 years. When I joined the Henry Crown fellowship, it got me thinking in a very meditative way about my age, and whether I was doing the best I could to improve the world, in the way that I know. What I know is how to create innovative technologies that benefit a lot of people.
One really cool thing to think about with Bazaarvoice is that, yes, we had 10,000 VPs of e-commerce to reach, but ultimately we were helping hundreds of millions of consumers get transparent information about products. When we started Bazaarvoice, 11 years ago, there were only 3 retailers in the entire United States who had customer reviews on their site, which is kind of crazy to think about. Amazon, back then, was only around 4% of US online retail sales.
A lot has changed in 11 years, but the other thing that had a big impact on me was my family. I thought that the most important thing that I could do over that 3-year period is be Super Dad. So I was at every field trip, I would constantly be on call with my wife, kind of Super Husband as well, and was just extremely present for things. I also learned this very important lesson from my parents, which is involve your kids with your business, because if you’re not doing something you’re passionate about in business, why the hell are you doing it? If you involve your kids in it, they ultimately will see that their parents are doing something that they really care about, and they’ll pick up some lessons along the way, without you explicitly trying to teach them. During those three years, I involved our daughter in quite a few of these meetings with entrepreneurs, as well as startup competitions I was judging. And sometimes she would make the decision of whether or not we actually invested. Especially if it was a product where I thought she’s going to understand this better than we’re going to understand this as adults. She’s going to know if this product will win in the marketplace. And that was very empowering to her, and really I think helped her in a lot of ways with her development. But it’s interesting, because during the act of being Super Dad and Super Husband, Rachel turns to me one day, and she says, “Dad, when are you going to start another business?” And I said, “Well Rachel, look at what I’m doing, look at how many entrepreneurs I’m helping. Look at the different startup competitions we’re going to,” etc. And she said, “Yeah, yeah Dad, all that’s great. When are you going to start another business?”
It really kind of took me aback. I didn’t know what to say. And I had to think about it for several days. And then it hit me: I started Bazaarvoice when she was 6 months old, and when she was growing up I would bring her into Bazaarvoice. I was very passionate about what Bazaarvoice was doing from the beginning, and I would bring her into client summits, I would bring her into all-hands meetings, I would bring her into different things in the company, and she was a part of it. And of course she would tell her friends what her dad did, and they would say, “Oh yeah we use customer reviews.” I realized that it was the best example that I could be as a parent.
Abraham Lincoln has this really beautiful quote, which goes something like: “If you would like your child to walk in a certain direction, you yourself must walk in that direction.” I started to think at that point that the best thing I could do is go back into the arena, and hopefully have an even bigger impact on society. And data.world is set up that way, where once we’re successful, we’ll totally transform the way people work with data. We’ll totally transform that problem of the data janitorial work, and Rachel and our son, Levi, years from now, will ask, “What was the world like before data.world? You actually had to go find the data in some university site and then it wasn’t documented? And then you had all these problems with it? That you needed to clean up before you could even do analysis? That doesn’t make any sense.”
It doesn’t make any sense, but that is the way the world of data is today. It’s still highly, highly siloed. I have a lot of energy around solving that problem, and I have a lot of energy around working with my co-founders. Our COO, Matt Laessig, is just an amazing person. We first met in grad school. We told each other then that we’d start a business together, and here we are many years later doing it. He’s one of the best leaders I’ve ever worked with, and he’s also a great family person and individual. He’s been on American Ninja Warrior for the last 7 years, and he’s one of the only people I went to grad school with who has gotten into progressively better shape ever since we graduated. Jon Loyens is our Chief Product Officer, and he’s incredible. Not only is he the best engineering leader I’ve ever worked with, but he also has been in his own band for 15 years. He’s played with Melissa Etheridge and all types of people. He serves on the Board of Directors of the Andy Roddick Foundation, which works for underserved communities in Austin, especially children. Bryon Jacob, our CTO, was at HomeAway for the last 10 years, and he’s one of the smartest technologists I’ve ever worked with. We’ve become very, very good friends. I get a lot of energy from being part of a team on such an ambitious and impactful mission.
Final question: Why a blue owl for the logo?
There’s a geeky reason and a non-geeky reason. So let me start with the non-geeky reason. An owl, in many cultures, represents knowledge and wisdom. And data.world is clearly going to accelerate that on a global scale. The geeky reason is that it’s a nod to Tim Berners-Lee’s original version of the Semantic Web, the most powerful data technology to ever come along and not yet be fully realized. And the ontology language for the Semantic Web is called Web Ontology Language, but the acronym for it, strangely enough, is OWL. The query language for the Semantic Web is called SPARQL; it’s a W3C standard. So we named the owl Sparkle. Everyone in the company has their own Sparkle-tar, which represents something uniquely individual about them. Of course, Matt Laessig’s is the Ninja Sparkle, and Jon Loyens’s is The Guitarist Sparkle. We have a lot of fun with it. Personality counts in business and in community.