From Storage to Data Virtualization

Do you remember Primary Data? Well, I loved the idea and the team, but it didn't go very well for them. There are likely several reasons why. In my opinion, it boiled down to the fact that very few people like storage virtualization. In fact, I expressed my fondness for Primary Data's technology several times in the past, but when it comes to changing the way complex, siloed storage environments are operated, you come across huge resistance, at every level!
The good news is that Primary Data's core team is back, with what looks like a smarter version of the original idea, one that can easily overcome the skepticism surrounding storage virtualization. In fact, they've moved beyond it and presented what looks like a multi-cloud controller with data virtualization features. Ok, they call it "Data as a Service," but I prefer Data Virtualization… and coming back to market with a product is a bold move.
Data Virtualization (What and Why)
I've begun this story by mentioning Primary Data first because David Flynn (CEO of HammerSpace and former CTO of Primary Data) did not start this new HammerSpace venture from scratch. He bought the code that belonged to Primary Data and used it to build the foundation of his new product. That allowed him and his team to get to market quickly, with the first version of HammerSpace ready in a matter of months instead of years.
HammerSpace is brilliant for one reason: it somehow solves or, better, hides the problem of data gravity. Its Data-as-a-Service platform virtualizes data sets and presents virtualized views of them, available in a multi-cloud environment through standard protocols like NFS or S3.
Yes, at first glance it sounds like hot air and a bunch of buzzwords mixed together, but this is far from being the case here… watch the demo in the following video if you don’t trust me.
The solution is highly scalable and aimed at Big Data analytics and other performance workloads for which you need data close to the compute resource quickly, without thinking too much about how to move, sync, and keep it updated with changing business needs.
HammerSpace solutions have several benefits but the top two on my list are:

  • The minimization of egress costs: This is a common problem for those working in multi-cloud environments today. With HammerSpace, only necessary data is moved where it is really needed.
  • Reduced latency: It's crazy to have an application running on a cloud that is far from where your data is. To give an example, the other day I was writing about Oracle cloud and how good they are at creating high-speed bare-metal instances at a reasonable cost. This benefit can easily be lost if your data is created and stored in another cloud.

The Magic of Data Virtualization
I won't go through architectural and technical details, since there are videos and documentation on HammerSpace's website that address them (here and here). Instead, I want to mention one of the features that I like the most: the ability to query the metadata of your data volumes. These volumes can be anywhere, including on your premises, and you can get a result in the form of a new volume that is then kept in sync with the original data. Everything you do on data and metadata is quickly reflected on child volumes. Isn't it magic?
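To make the idea more concrete, here is a tiny, purely conceptual sketch in Python. This is not HammerSpace's API, just an illustration of the pattern: a metadata catalog is queried with a predicate, and the result is a live view that re-evaluates on every access, so it always reflects the current state of the underlying metadata.

```python
# Conceptual illustration only (not HammerSpace's actual API): a metadata
# catalog that can be queried, returning a live "view" which re-runs the
# query on access so it stays in sync with changes to the metadata.
class MetadataCatalog:
    def __init__(self):
        self.files = {}                          # path -> metadata dict

    def put(self, path, **metadata):
        self.files.setdefault(path, {}).update(metadata)

    def view(self, predicate):
        catalog = self

        class LiveView:
            def __iter__(self):
                return (p for p, md in catalog.files.items() if predicate(md))

        return LiveView()


catalog = MetadataCatalog()
catalog.put("/projects/a/run1.dat", tier="hot", owner="analytics")
catalog.put("/projects/b/old.dat", tier="cold", owner="archive")

hot_data = catalog.view(lambda md: md.get("tier") == "hot")
print(list(hot_data))    # ['/projects/a/run1.dat']

catalog.put("/projects/c/new.dat", tier="hot", owner="analytics")
print(list(hot_data))    # the view reflects the new file too
```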
What I liked the least, even though I understand the technical difficulties in implementing it, is that this process is one-way when a local NAS is involved… meaning that it is only a source of data and can’t be synced back from the cloud. There is a workaround, however, and it might be solved in future releases of the product.
Closing the Circle
HammerSpace exited stealth mode only a few days ago. I'm sure that by digging deeper into the product, flaws and limitations will be found. It is also true that the more advanced features are still only sketched on paper. But I can easily get excited by innovative technologies like this one, and I'm confident that these issues will be fixed over time. I've been keeping an eye on multi-cloud storage solutions for a while, and now I've added Hammerspace to my list.
Multi-cloud data controllers and data virtualization are the focus of an upcoming report I’m writing for GigaOm Research. If you are interested in finding out more about how data storage is evolving in the cloud era, subscribe to GigaOm Research for Future-Forward Advice on Data-Driven Technologies, Operations, and Business Strategies.

Who needs traditional storage anymore?

The traditional enterprise storage market is declining, and there are several reasons why. Some of them are easier to identify than others, but one of the most interesting aspects is that workloads, and hence storage requirements, are becoming increasingly polarized.
Storage as we know it, SAN or NAS, will become less relevant in the future. We've already had a glimpse of this with hyperconvergence, but that kind of infrastructure tries to balance all resources, sometimes at the expense of overall efficiency, and is more compute-driven than data-driven. Data-intensive workloads have different requirements and need different storage solutions.

The Rise of Flash

All-flash systems are gaining in popularity and are more efficient than their hybrid and all-disk counterparts. Inline compression and deduplication, for example, are much more viable on a flash-based system than on others, making it easier to achieve better performance even from the smallest of configurations. This means doing more with less.
At the same time, all-flash allows for better performance and lower latency and, even more importantly, the latter is much more consistent and predictable over time.
With the introduction of NVMe and NVMe-oF, protocols specifically designed for faster access to flash media attached to the PCIe bus, latency will be even lower (in the order of hundreds of microseconds or less).

The Rise of Objects (and cloud, and scale-out)

At the same time, what I’ve always described as “Flash & Trash” is actually happening. Enterprises are implementing large scale capacity-driven storage infrastructures to store all the secondary data. I’m quite fond of object storage, but there are several ways of tackling it and the common denominators are scale-out, software-defined and commodity hardware to get the best $/GB.
Sometimes your capacity tier could be the cloud (especially for smaller organizations with small amounts of inactive data to store), but the concept is the same, as are the benefits. At the moment the best $/GB is still obtained with hard disks (or tapes), but with the rate of advancement in flash manufacturing, before you know it we'll be seeing large SSDs replacing disks in these systems too.

The next step

Traditional workloads are served well by this type of two-tier storage infrastructure, but it's not always enough.
The concept of memory-class storage is surfacing more and more often in conversations with end users, and other CPU-driven techniques are taking the stage as well. Once again, the problem is getting results faster, and before others, if you want to improve your competitiveness.
With new challenges coming from real-time analytics, IoT, deep learning and so on, even traditional organizations are looking at new forms of compute and storage. You can also see it from cloud providers. Many of them are tailoring specific services and hardware options (GPUs or FPGAs for example) to target new requirements.
The number of options is growing pretty quickly in this segment and the most interesting ones are software-based. Take DataCore and its Parallel I/O technology as an example. By parallelizing the data path and taking advantage of multicore CPUs and RAM, it’s possible to achieve incredible storage performance without touching any other component of the server.
This software uses available CPU cores and RAM as a cache to reorganize writes, avoiding any form of queuing so it can serve data faster. It radically changes the way you can design your storage infrastructure, with a complete decoupling of performance from capacity. And, because it is software, it can also be installed on cloud VMs.
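As a rough illustration of the general idea (a conceptual sketch only, not DataCore's actual implementation), the following Python snippet buffers incoming random writes in RAM, coalesces them, and flushes them to the backing store in larger, sorted batches. This is the kind of write reorganization that lets a software layer extract much more performance from the same hardware.

```python
import threading


class WriteCoalescingCache:
    """Toy write-back cache: absorb random writes in RAM, flush them sorted."""

    def __init__(self, backend, flush_threshold=64):
        self.backend = backend              # callable(block_number, data)
        self.flush_threshold = flush_threshold
        self.buffer = {}                    # block_number -> latest data
        self.lock = threading.Lock()

    def write(self, block, data):
        with self.lock:
            self.buffer[block] = data       # overlapping writes coalesce here
            if len(self.buffer) < self.flush_threshold:
                return
            batch, self.buffer = self.buffer, {}
        # Flush outside the lock, sorted so the backing store sees mostly
        # sequential I/O instead of a random write storm.
        for blk in sorted(batch):
            self.backend(blk, batch[blk])


# Example: the "backend" stands in for the persistent, capacity-oriented tier.
store = {}
cache = WriteCoalescingCache(lambda blk, data: store.__setitem__(blk, data),
                             flush_threshold=4)
for blk in (42, 7, 99, 7, 13):
    cache.write(blk, b"x" * 4096)
```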
A persistent storage layer is still necessary, but it will be inexpensive if based on the scale-out systems I've mentioned above. Furthermore, even though software like DataCore's Parallel I/O can work with all existing software, modern applications are now designed with the assumption that they may run on some sort of ephemeral storage, and when it comes to analytics we usually work with copies of data anyway.

Servers are storage

Software-defined scale-out storage usually means commodity x86 servers; the same goes for HCI, and very low latency solutions are heading towards a similar approach. Proprietary hardware can't compete: it's too expensive and evolves too slowly compared to the rest of the infrastructure. Yes, niches where proprietary systems make sense will remain for a long time, but this is not where the market is going.
Software is what makes the difference… everywhere now. Innovation and high performance at low cost is what end users want. Solutions like DataCore's do exactly that, making it possible to do more with less, but also to do much more, and quicker, with the same resources!

Closing the circle

Storage requirements are continuing to diversify and "one-size-fits-all" no longer works (I've been saying that for a long time now). Fortunately, commodity x86 servers, flash memory and software are helping to build tailored solutions for everyone at reasonable costs, making high-performance infrastructures accessible to a much wider audience.
Most modern solutions are built out of servers. Storage, as we traditionally know it, is becoming less of a discrete component and more blended into the rest of the distributed infrastructure, with software acting as the glue and making things happen. Examples can be found everywhere: large object storage systems have started implementing "serverless" or analytics features for massive data sets, while CPU-intensive and real-time applications can leverage CPU-data vicinity and internal parallelism through a storage layer which can be ephemeral at times… but screaming fast!

Serverless-enabled storage? It’s a big deal

The success of services like AWS Lambda, Azure Functions or Google Cloud Functions is indisputable. They're not for all use cases, of course, but the technology is intriguing and easy to implement, and developers (and sysadmins!) can leverage it to offload some tasks to the infrastructure and automate many operations that would otherwise have to be done at the application level, with a lower level of efficiency.
The code (a Function) is triggered by events, and object storage is a perfect source of those events.
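As a concrete example of the pattern, here is a minimal sketch assuming AWS Lambda with S3 event notifications (the processing step is just a placeholder): the Function fires every time a new object lands in a bucket.

```python
# Minimal sketch of the event-driven pattern: an AWS Lambda handler invoked
# by an S3 "object created" notification.
import urllib.parse

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.head_object(Bucket=bucket, Key=key)
        # The object (and its metadata) is now available for any processing:
        # virus scanning, indexing, transcoding, analytics, and so on.
        print(f"new object s3://{bucket}/{key}, {obj['ContentLength']} bytes")
```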

Why object storage

Object storage is usually implemented with a shared-nothing scale-out cluster design. Each node of the cluster has its own capacity, CPU, RAM and network connections. At the same time, modern CPUs are very powerful and usually underutilized when the only job of the storage node is to serve objects. By allowing the storage system to use its spare CPU cycles to run Functions, we obtain a sort of very efficient hyperconverged infrastructure (micro-converged?).
Usually, we tend to bring data close to the CPU but in this case we do the exact opposite (we take advantage of CPU power which is already close to the data), obtaining even better results. CPU-data vicinity coupled with event triggered micro-services is a very powerful concept that can radically change data and storage management.
Scalability is not an issue. CPU power increases alongside the number of nodes, and the code is instantiated asynchronously and in parallel, triggered by events. This also means that response time, hence performance, is not always predictable and consistent but, for the kind of operations and services that come to mind, it's good enough.
Object metadata is another key element. In fact, the Function can easily access the data and metadata of the object that triggered it. Adding and modifying information is child's play… helping to build additional information about the content, for example.
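For example, here is a hedged sketch of how a Function could write its results back as object metadata. With the S3 API, user metadata is updated by copying the object onto itself with a replaced metadata set; the bucket, key and metadata field below are of course illustrative.

```python
# Enrich an object with extra metadata after a Function has analyzed it.
import boto3

s3 = boto3.client("s3")


def tag_with_classification(bucket, key, classification):
    head = s3.head_object(Bucket=bucket, Key=key)
    metadata = head.get("Metadata", {})
    metadata["classification"] = classification   # e.g. result of image analysis
    # S3 has no in-place metadata update: copy the object onto itself and
    # replace the metadata set in the process.
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata=metadata,
        MetadataDirective="REPLACE",
    )
```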
These are only a few examples, but the list of characteristics that make scale-out storage suitable for this kind of advanced data service is quite long. In general, it’s important to note that, thanks to the architecture design of this type of system, this functionality can boost efficiency of the infrastructure at an unprecedented level while improving application agility. It’s no coincidence that most of the triggering events implemented by cloud providers are related to their object storage service.

Possible applications

Ok, Serverless-enabled storage is cool but what can I do with it?
Even though this kind of system is not specifically designed to provide low-latency responses, there are a lot of applications, even real-time applications, that can make use of this feature. Here are some examples:
  • Image recognition: for each new image that lands in the storage system, a process can verify relevant information (identify a person, check a plate number, analyze the quality of the image, classify the image by its characteristics, make comparisons and so on). All this new data can be added as metadata or in the object itself.
  • Security: for each new, or modified, file in the system, a process can verify whether it contains a virus, sensitive information or specific patterns (i.e. credit card numbers) and take proper action.
  • Analytics: each action performed on an object can trigger a simple piece of code to populate a DB with relevant information.
  • Data normalization: every new piece of information added to the system can be easily verified and converted to other formats (see the sketch after this list). This could be useful in complex IoT environments, for example, where different types of data sources contribute to a single large database.
  • Big Data: AWS has already published a reference architecture for Map/Reduce jobs running on S3 and Lambda! (link here)
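To illustrate the data normalization case above, here is a minimal sketch of a Function, written for AWS Lambda and S3 with illustrative column, bucket and key names, that converts incoming CSV sensor readings into a normalized JSON document as soon as they are written to the object store.

```python
# Sketch of a normalization Function: CSV in, normalized JSON out.
import csv
import io
import json

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    # Column names below are illustrative; adapt to the actual data source.
    rows = [
        {"sensor": r["sensor_id"], "ts": r["timestamp"], "value": float(r["value"])}
        for r in csv.DictReader(io.StringIO(body))
    ]
    s3.put_object(
        Bucket=bucket,
        Key=key.rsplit(".", 1)[0] + ".json",
        Body=json.dumps(rows).encode("utf-8"),
        ContentType="application/json",
    )
```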
And, as mentioned earlier, these are only the first examples that come to my mind. The only limit here is one’s imagination.

Back-end is the key

There are only a few serverless-enabled storage products at the moment, with others under development and coming in 2017. But I have found two key factors that make this kind of solution viable in real production environments.
The first is multiple language support: the product should be capable of running different types of code so as not to limit its possibilities. The second is the internal process/Function scheduler. We are talking about a complex system which shares resources between storage and compute (in a hyperconverged fashion), and resource management is essential in order to guarantee the right level of performance and response time for both storage and applications.
One of the most interesting serverless-enabled products I'm aware of is OpenIO. The feature is called Grid For Apps, while another component, called Conscience technology, is in charge of internal load balancing, data placement and overall resource management. The implementation is pretty slick and efficient. The product is open source, and there is a free download on their website. I strongly suggest taking a look at it to understand the potential of this technology. I installed it in a few minutes, and if I can do it… anyone can.

No standards… yet

Contrary to object storage, where the S3 API is the de facto standard, serverless is quite new and there is no winner yet. Consequently, there are neither official nor de facto standards to look at.
I think it will take a while before one of these services prevails over the others but, at that point, API compatibility won't be hard to achieve. Most of these services have the same goal and similar functionality…

Closing the circle

Data storage as we know it is a thing of the past. More and more end users are looking at object storage, even when the capacity requirement is under 100TB. Many begin with one application (usually as a replacement of traditional file services) but after grasping its full potential it gets adopted for more use cases ranging from backup to back-end for IoT applications through APIs.
Serverless-enabled storage is a step forward and introduces a new class of advanced data services which will help to simplify storage and data management. It has a huge potential, and I’m keeping my eye on it… I suggest you do the same.

Originally posted on Juku.it

Storage product development is getting worse

Storage is becoming less conservative than in the past. This has its pros and cons, but if it means poorer quality of the final product and an increased risk of data loss, then it's not the way to go.

Bad behavior

I stumbled on this article, which talks about all the problems, bugs and mistakes made by Maxta with one of its (now former) customers. I won't talk about Maxta and this particular case (also because there are different versions of this story), but it is an example of how some vendors, especially small startups, set the bar too high and then struggle to deliver.
Data loss is the worst case scenario, but it’s quite common now to hear about storage startups in trouble when the game gets tough. Sometimes they miserably fail to scale when they promise “unlimited scalability”, or performance is far lower than expected or some of the features don’t actually work as documented.
This time round it's happened to Maxta, but I'm sure that many others could make the same mistake.

DevOp-izing Data storage is dangerous

In the last couple of years, I've been hearing a lot about the drastic change in the development process of storage systems. Most vendors are adopting new agile development processes, and some of them have been openly talking about a DevOps-like approach.
I've always been keen on this type of development approach: it's modern, fast and produces results quickly. But… I can appreciate it in my smartphone apps, not in my storage system. I can imagine a continuous refinement of the UI or management features, but not of the core performance or data protection aspects of the product.
Whatever happened to the golden rule "if it works, leave it alone!"??? I'm not saying to apply it literally, but couldn't more time be spent on testing and QA instead of releasing a new version every other week? Do we really need a storage software update every fortnight? I don't think so.

Fierce competition

It's all about competition in the end. In the past, a single good feature was enough to make a product, define a new market and have success (take DataDomain for example). It took time for others to follow, and the development cycle was not as fast as today. Now everything is much more complicated, things have accelerated, and ongoing product evolution is needed to keep pace with your competitors. Look at hyperconvergence or all-flash for example: in many cases it is really difficult to find a differentiator now, and end users want all the features that are taken for granted (and the list is very long!). What is now considered table stakes is already hard to achieve, on top of which you have to promise more to be taken seriously.

Closing the circle

I know times have changed and everything runs at a faster pace… but when it comes to data and data storage, data protection, availability and durability are still at the top of the list, aren’t they?
Standing out in a crowd is much harder now than in the past. Even established vendors are much quicker in reacting to market changes. Lately, when a new potential market segment is discovered, they've shown their ability to buy out a startup or come out with their own product pretty quickly and successfully (take VMware VSAN for example). First movers, like Nutanix for example, have an advantage (a vision) and can aim at successful exits, but for a large part of the me-too startups it's tough, because the lack of innovation and differentiation puts them in an awkward position: they are constantly trying to catch up with the leaders.
Software-defined or not, product quality is still fundamental, especially when dealing with Storage. I’d like to see more storage vendors talk about how thoroughly they test their products and how long they maintain them in Beta before going into production… instead of how many new releases they are able to provide per month!
And please, find some time to write better documentation too!

Originally posted on Juku.it

S3, to rule them all! (storage tiers, that is)

Last week I was at NetApp Insight and it was confirmed, not that it was necessary, that the S3 protocol is the key to connecting and integrating different storage tiers. You know, I've been talking about the need for a two-tier architecture for modern storage infrastructures for a long time now (here's a recent paper on the topic), and I also have strong opinions about object storage and its advantages.

The missing link

The main reasons for having two storage tiers are cost and efficiency. $/GB and $/IOPS (or better, $/latency today) are the most important metrics while, on the other hand, efficiency in terms of local and distributed performance (explained here in detail) is another fundamental factor. All the rest is taken for granted.
The challenges come when you have to move data around. In fact, data mobility is a key factor in achieving high levels of efficiency and storage utilization contributing, again, to lower $/GB. But, primary and secondary storage use different protocols for data access and this makes it quite difficult to have a common and consistent mechanism to move data seamlessly in the front-end.
Some solutions, like Cohesity for example, are quite good at managing data consolidation and its re-utilization by leveraging data protection mechanisms… but that means adding additional hardware and software to your infrastructure, which is not always possible, whether because of cost or complexity.

S3 to the rescue

It seems several vendors are finally discovering the power of object storage and the simplicity of RESTful-based APIs. In fact, the list of primary storage systems adopting S3 to move (cold) data to on-premises object stores or to the public cloud is quickly growing.
Tintri and Tegile have recently joined Solidfire and DDN in coming up with some interesting solutions in this space, and NetApp previewed its Fabric Pools at Insight. I’m sure I’ve left someone out, but it should give you an idea of what is happening.
The protocol of choice is always the same (S3) for multiple and obvious reasons, while the object store in the back-end can be on-premises or on the public cloud.
Thanks to this approach the infrastructure remains simple, with your primary storage serving front-end applications while internal schedulers and specific APIs are devoted to supporting automated “to-the-cloud” tiering mechanisms. It’s as easy as it sounds!
Depending on each specific implementation, S3 makes it possible to off-load old snapshots and clones from primary storage, copy data volumes to the cloud for backup or DR, automate tiering, and so on. We are just at the beginning, and the number of possible applications is very high.
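A simplified sketch of what such an off-load looks like at the API level follows. This is generic boto3 code, not any specific vendor's tiering engine, and the paths, bucket and key names are illustrative.

```python
# Push a cold snapshot to an S3 bucket, optionally straight into a cheaper
# storage class; the primary system then only keeps a pointer to the object.
import boto3

s3 = boto3.client("s3")


def offload_snapshot(local_path, bucket, key):
    s3.upload_file(
        local_path,
        bucket,
        key,
        ExtraArgs={"StorageClass": "STANDARD_IA"},   # or GLACIER for archives
    )
    # Once the upload is verified, the primary array can reclaim the space
    # and keep only a lightweight reference to the object in the cloud.


offload_snapshot("/snapshots/vol1-2016-10-01.snap",
                 "cold-tier-bucket", "vol1/2016-10-01.snap")
```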

Closing the circle

It is pretty clear that we are going to see more and more object-based back-ends for all sorts of storage infrastructures, with the object store serving all secondary needs, no matter where the data comes from. And in this case we are not talking about hyper-scale customers!
In the SME segment it will primarily be cloud storage, even though many object storage vendors are refocusing their efforts to offer options to this kind of customer: the list is long, but I can mention Scality with its S3 Server, OpenIO with its incredible ease of use, NooBaa with its clever "shadow storage" approach, Ceph and many others. All of them have the ability to start small, with decent performance, and grow quickly when needed. Freemium license tiers (or open source versions of the products) are available, easing installation on cheap (maybe old and used) x86 servers and minimizing adoption risks.
In large enterprises object storage is now part of most private cloud initiatives, but it is also seen as a stand-alone convenient storage system for many different applications (backup, sync&share, big data archives, remote NAS consolidation, etc).

Originally posted on Juku.it

WeTransfer Moves Toward File Transfer as a Microservice

It shouldn’t be news that enterprise file storage, sync, and sharing software and services (EFSS) have largely become a commodity. Prices continue to fall, in part because providers’ storage costs are still decreasing. More importantly, their cost to actually transfer a file has always been negligible, even with the application of strong encryption.
With costs low and decreasing, it’s fair to ask which of the aspects of file storage, sync, and sharing creates enough value for customers that providers can charge for the service. When you stop and think about it, the sharing or transfer of the file has always been the action that the rest of the bundled offer hangs on, especially for cloud-based services. A file can’t be stored on a provider’s servers until a copy has been transferred there. Similarly, changes to files must be transferred to keep copies in sync. The vast majority of the value proposition clearly lies in the transfer (sharing) of the file.
So it makes sense for the file transfer element to be the focal point for providers’ monetization strategies. If you accept that premise, then the next logical conclusion to be made is that file transfer can be monetized as a stand-alone service. In today’s world, that service would be built and licensed as a microservice, which can be used in any application that can call a RESTful API.
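To make the idea concrete, this is roughly what consuming such a file transfer microservice could look like from a developer's point of view. Everything here (endpoint, fields, response shape, token) is hypothetical and is not WeTransfer's or anyone else's published API.

```python
# Hypothetical client for a stand-alone file transfer microservice.
import requests


def share_file(path, recipient_email, api_token):
    with open(path, "rb") as f:
        response = requests.post(
            "https://api.example-transfer.com/v1/transfers",   # hypothetical endpoint
            headers={"Authorization": f"Bearer {api_token}"},
            files={"file": f},
            data={"recipient": recipient_email},
            timeout=60,
        )
    response.raise_for_status()
    return response.json()["download_url"]   # hypothetical response field
```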
WeTransfer, a company based in Amsterdam (despite claiming San Francisco as its headquarters), has announced today the first step toward the creation of such a commercially available file transfer microservice. A new partnership makes WeTransfer's file transfer service an option (alongside Dropbox) for delivering photos and videos purchased from Getty Images' iStock library. WeTransfer works in the background while the customer remains in iStock.
WeTransfer has exposed its file transfer API to Getty Images only at this point, but will be able to strike up similar partnerships with other providers of graphics services. Of course, WeTransfer could also license API access to any developer looking to incorporate file transfer into an application. While it isn't clear from their statement today if and when that will happen, the possibility is very real and quite compelling.
It’s important to note that both Box and Dropbox have made their file sharing APIs commercially available to developers for several months now, so WeTransfer is playing catch up in this regard. However, WeTransfer has emphasized file sharing almost exclusively since its founding in 2009 as a web-based service that only stores a file being shared for seven days before deleting it from their servers. Dropbox, on the other hand, originally was popular because of its simple-but-effective sync feature, and Box was initially perceived as a cloud-based storage service.
The potential market for file transfer microservices is so young and large that no provider has a clear advantage at this point. The recent nullification of the Safe Harbor agreement (PDF) between the European Union and the United States also presents a significant challenge to file services vendors that provide file storage for a global and multinational customer base. If WeTransfer emphasizes its legacy as an easy-to-use, dependable file transfer-only service with its newly-created microservice, it could gain a larger share of the market and expand well beyond its current niche of creative professional customers.

Tiger (er, Shark) of the Month: Digital Ocean Makes Getting to Cloud Easy

 
Whilst at GitHub Universe last month, on my way to learn more about the conference host's new hardware two-factor security initiative for its developers, I was sidetracked by the sight of a group of developers crowded around a small kiosk, each holding a blue smiling toy shark. Curious, I stopped by to chat with the kiosk's owners, the crew of Digital Ocean, this month's featured cloud computing "tiger."
Why is this particular cloud provider a tiger/shark? Simply put, because they make getting onto the cloud easy for developers at a price point that won’t cause any nightmares as the customer company scales up. But it’s not as simple as calling Digital Ocean an “AWS lite” either because at the moment it’s a very different offering. And particularly for companies that were born pre-cloud, or for the largest unicorns like Uber, DO does not have all the features and support that you would need — at least not yet.
From 0 to 236 countries in Three Years
What Digital Ocean does offer to new companies and other cloud-first entities is a facilitated and positive user experience. And developers love them: 700,000+ of them, representing 8+ million cloud servers. From a $3.2 million seed round in 2012, to a $37.2 million Series A in March 2014, to its most recent Series B round of $83 million in July 2015 (Access Industries, Andreessen Horowitz, others), the company has found exponential success with a customer-first approach, delivering a streamlined UX that is straightforward and transparent, with no B.S. and no hard upsell.
DigitalOcean currently reaches 230 countries and territories. After their Series A the company added datacenters in Singapore, London and Frankfurt. Another facility was added in Toronto since its Series B, and the company expects to onboard India and South America in 2016.
Starter Web Services 
Digital Ocean has established itself in the web services arena primarily by capturing the entry market. If you look at the three aspects of being in the cloud (computing, networking and storage), DO really only serves the first leg of the stool. This means the company can only handle small-company or early-stage requirements: non-dynamic content web pages, and so on. So a chunk of the developers currently using its services eventually outgrow them. But, flush with cash and building momentum, the company launched its first networking service in late October (floating IPs, which solve the problem of reassigning IPs to any droplet in the same datacenter) and may be able to offer a comprehensive cloud solution as early as next year.
The Hidden Barrier to Cloud is Actually HR
What sets Digital Ocean's products apart from the rest is that you don't necessarily need a senior engineer to get your company's web presence set up. So DO is not only cheaper and easier to use, but a platform that gets you to launch faster. This gives the business some breathing room as it develops, expanding the base of potential staff who can handle the website. This has made DO's Droplet a popular service not just for newcos, but also for discrete web projects and microsites like the launch of Beyonce's secret album and Universe.com (owned by TicketMaster).
Customer Service Your Way (It’s All About the Love) 
DO's UX goes far beyond the product UI. As strange as it sounds coming from a web infrastructure company, one of the company's core values is "Love" and it is practiced throughout the company. I would characterize this love as a "passion for helping others" and a "joie de vivre" that infuses the organization and is transferred to its customers. Duly noted that their mascot shark is a smiling, happy one.

How exactly does this Love manifest itself in the business? Zachary Bouzan-Kaloustian, Digital Ocean's Director of Support, describes its IaaS this way: "Our entire platform is self-managed, which means that the customer is responsible for what runs on their Droplets. Our Platform Support Specialists provide free support if something's not working with the infrastructure. One way that we demonstrate our core value of love is to ensure we reply quickly, and our 30-day average response time for 1,000+ tickets / day is under 30 minutes! Of course, our 24/7 team doesn't stop there. We often do extensive troubleshooting with a customer to diagnose the issue, even if it involves parsing server logs. This involves extensive experience, and great relationship skills as we don't have access to our customer's Droplets for security reasons."

But is this love scalable? Maybe not, but the desire for love amongst developers (and all of us) is certainly strong, so there is no shortage of demand for DO's particular brand of customer relations.

Building the Next Generation Infrastructure 
By getting in with developers early, Digital Ocean has set itself up to take advantage of the tipping point of the Internet of Everything — when not only all major services but customer adoption for them reaches critical mass worldwide — likely well within the next 5 years. While newcos are signing up with Digital Ocean today, the company is fortifying and expanding its technical and services staff — growing from 150 to 200 employees in the past quarter alone.
And the big fish are taking notice: Google, Microsoft and Amazon have sliced their prices 3x since Digital Ocean launched and prices continue to drop. So increasingly the companies will begin to compete on volume — of customers and services used.
Fast forward 5 years and DO will have all the pieces of the cloud stool well established as well as worldwide presence. If DO can maintain its vision of making web services simple to consume, and successfully build out its offerings so that it can scale with its customers, the company is well positioned to become the go-to web services company for the post-millennial generation. Considering that there are some 20-30 million potential developer customers out there — it wouldn’t surprise me to see Digital Ocean as the most distributed — if not the biggest — and certainly the most beloved fish in the sea by 2020.
**This post was updated at 1:38pm on November 11, 2015 to reflect factual corrections. Access Industries, not Accel Partners, is a lead investor in Digital Ocean. 

IoT and the Principle of Resource Constraints

Technology may be fast-moving but some concepts have remained stable for decades. Not least the principle of resource constraints. Simply put, we have four finite variables to play with in our technological sandpit:

  • electrical power
  • processor speed
  • network bandwidth
  • storage volume

This principle is key to understanding the current Internet of Things phenomenon. Processing and storage capacities have increased exponentially — today’s processors support billions of instructions per second and the latest solid state storage can fit 32 gigabytes on a single chip.
As we expand our abilities to work with technology, so we are less constrained, creating new possibilities that were previously unthinkable due to either cost or timeliness. Such as creating vast networks of sensors across our manufacturing systems and supply chains, termed the Industrial Internet.
This also means, at the other end of the scale, that we can create tiny, yet still powerful computers. So today, consumers can afford sports watches that give immediate feedback on heart rate and walking pace. Even five years ago, this would not have been practical. While enterprise business may be operating on a different scale, the trade-offs are the same.
Power and end-to-end network bandwidth have not followed such a steep curve, however. When such resources are lacking, processing and storage tend to be used in support. So for example, when network bandwidth is an issue (as it so often is, still), ‘cache’ storage or local processing can be added to the architecture.
In Internet of Things scenarios, sensors (attached to ’things’) are used to generate information, sometimes in considerable volumes, which can then be processed and acted upon. A ‘thing’ could be anything from a package being transported, a motor vehicle, an air conditioning unit or a classic painting.
If all resources were infinite, such data could be transmitted straight to the cloud, or to other ’things’. In reality however, the principle of resource constraints comes into play. In the home environment, this results in having one or more ‘smart hubs’ which can collate, pre-process and distil data coming from the sensors.
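A toy Python sketch of that hub role (purely illustrative; the upstream call is left as a stub) shows the trade-off: raw readings are collated and pre-processed locally, and only a distilled summary travels over the constrained network link.

```python
# Illustrative "smart hub": collate sensor readings locally, forward summaries.
from collections import defaultdict
from statistics import mean


class SmartHub:
    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.readings = defaultdict(list)      # sensor_id -> raw values

    def ingest(self, sensor_id, value):
        self.readings[sensor_id].append(value)
        if sum(len(v) for v in self.readings.values()) >= self.batch_size:
            self.flush()

    def flush(self):
        summary = {
            sensor: {"min": min(v), "max": max(v), "avg": mean(v), "n": len(v)}
            for sensor, v in self.readings.items()
        }
        self.readings.clear()
        send_upstream(summary)                 # one small message instead of
                                               # hundreds of raw readings


def send_upstream(summary):
    print("forwarding summary:", summary)      # stand-in for a cloud API call


hub = SmartHub(batch_size=6)
for i in range(6):
    hub.ingest("temp-sensor-1", 20.0 + i * 0.1)
```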
As well as a number of startups such as Icontrol and (the Samsung-led) Smartthings, the big players recognise the market opportunity this presents. Writes Alex Davies at Rethink IoT, “Microsoft is… certainly laying the groundwork for all Windows 10 devices, which now includes the Xbox, to act as coordinating hubs within the smart home.”
Smart hubs also have a place in business, collating, storing and forwarding information from sensors. Thinking more broadly however, there are no constraints on what the architecture needs to look like, beyond the need to collate data and get the message through as efficiently as possible – in my GigaOm report I identify the three most likely architectural approaches.
Given the principle of resource constraints, the question of form factor becomes more about identifying the right combination of elements for the job. For example, individual 'things' may incorporate some basic processing and solid state storage. Such capabilities can even be incorporated in disposable hubs, such as the SmartTraxx device, which can be put in a shipping container to monitor location and temperature.
We may eventually move towards seemingly infinite resources, for example one day, quantum sensors might negate the need to transport information at all. For now however, we need to deal in the finite — which creates more than enough opportunity for both enterprises and consumers alike.