Publishing-industry analytics startup Parse.ly found itself facing one of those bittersweet startup moments in 2011, the kind where you’re growing faster than your current infrastructure can support. CTO Andrew Montalenti had a decision to make: re-engineer the company’s service or start adding new capacity in colocation facilities. The third option, continuing to scale up its existing Rackspace environment, might have bankrupted the company.
Montalenti chose the physical servers in the colocation facility, a move he said probably saved Parse.ly a couple hundred thousand dollars over the past two years. But now the company has raised more money and has a lot more revenue coming in, and managing those servers doesn’t look like such a good deal as the cost of cloud resources keeps dropping.
“In a year, it’ll probably be at par,” Montalenti said, “and then I’ll be sorry I’m running my own facility.”
But Parse.ly probably won’t be taking its business back to Rackspace. If any cloud provider ends up getting its business, it will be Amazon Web Services, which Montalenti said looks like the best option in terms of cost, convenience and a future-proof platform.
Montalenti’s math: 10x less is much cheaper
Disclaimer: If you’re not into hearing about server counts, cloud pricing and dollars per gigabyte, now would be a good time to tune out.
Parse.ly’s story is actually pretty interesting, if you’re into the economics of running a startup. The company formed in 2009 and first ran on a 1U server Montalenti built while he was still in college.
“I had a friend who just so happened to run a colocation facility,” Montalenti said, “and he just sort of snuck my server in.”
However, the then-fledgling Parse.ly couldn’t ignore the cloud computing revolution going on around it for long. Montalenti weighed the two primary options at the time, AWS and Rackspace, and chose the latter. He liked that it was just servers, essentially, with no other bells and whistles. Oh, and that Rackspace had local storage attached to its instances.
Parse.ly’s service requires reliable disks, but if you wanted stateful storage in AWS at that point, Montalenti explained, you had to use the platform’s Elastic Block Store (EBS) service. That meant extra cost, architecting specifically for the AWS platform and notoriously uneven reliability.
“When I looked at AWS, my CTO Spidey brain said, ‘Oh, it’s a lot of lock-in,’” Montalenti joked.
By 2011, Parse.ly’s Rackspace environment had grown to about 50 nodes, and Montalenti realized that if Parse.ly needed to start scaling ahead of its growth, the bills were going to add up fast. In part, this was because the company needed to keep both its production platform, an analytics service called Dash, and the traffic data its customers query in memory. Rackspace’s cloud instances topped out at 30 gigabytes of RAM at the time, and Montalenti estimates that memory cost about $40 per gigabyte per month.
Parse.ly couldn’t rent memory on its own; to get more RAM it had to step up to larger instances and overpay for the excess CPU and disk space bundled with that higher RAM capacity. At its peak, it was running 20 cloud instances dedicated just to that in-memory database.
Parse.ly had recently closed an $800,000 seed round, and Montalenti said the company was in a position to get those costs under control, but doing so meant either re-engineering the application or moving it to hosted servers that the company had to manage itself. He calculated it could cost only $4 per gigabyte per month to run five physical servers, each packed with about 150 gigabytes of RAM. Still, it was a tough choice, because he knew cloud providers would soon start offering the types of instances that Parse.ly required.
However, he was in total cost-saving mode, so the 10x reduction in cost per gigabyte per month won the day. “I chose the easy option,” Montalenti admits.
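To make the 10x figure concrete, here is a back-of-the-envelope sketch using the two per-gigabyte rates quoted above. It assumes, purely for illustration, that the in-memory tier needs about 750 GB of RAM (five servers at roughly 150 GB each) and that cost scales linearly with gigabytes; the helper `monthly_ram_cost` is hypothetical, not anything Parse.ly uses.

```python
# Back-of-the-envelope comparison of monthly RAM cost using the per-gigabyte
# rates quoted in the article. Assumes cost scales linearly with gigabytes,
# which ignores bundled CPU/disk, bandwidth, power and staff time.

RACKSPACE_USD_PER_GB_MONTH = 40  # Montalenti's 2011 estimate for Rackspace cloud RAM
COLO_USD_PER_GB_MONTH = 4        # his estimate for self-managed colo servers

# Illustrative footprint: five physical servers with ~150 GB of RAM each.
SERVERS = 5
GB_PER_SERVER = 150


def monthly_ram_cost(total_gb: int, usd_per_gb_month: float) -> float:
    """Hypothetical helper: monthly cost of holding total_gb in RAM at a flat rate."""
    return total_gb * usd_per_gb_month


total_gb = SERVERS * GB_PER_SERVER
cloud = monthly_ram_cost(total_gb, RACKSPACE_USD_PER_GB_MONTH)
colo = monthly_ram_cost(total_gb, COLO_USD_PER_GB_MONTH)

print(f"RAM footprint: {total_gb} GB")
print(f"Rackspace cloud: ${cloud:,.0f}/month")
print(f"Colocation:      ${colo:,.0f}/month")
print(f"Ratio: {cloud / colo:.0f}x")
```

Because both sides are multiplied by the same footprint, the ratio stays at 10x no matter how much RAM is assumed; only the absolute dollar gap grows with scale.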
Well, that was a fun experiment
Fast-forward to late 2013, and Dash has been generally available for more than a year and Parse.ly has raised another $5 million in venture capital. Its footprint now consists of those co-located production servers (totaling 60 cores and 720 gigabytes of RAM), about 60 basic nodes in Rackspace (primarily for backend data processing) and Amazon’s S3 storage service for backup. The company also uses Amazon CloudFront as a CDN and runs some batch Hadoop jobs on Amazon Elastic MapReduce.
But that will probably change pretty drastically in the not-too-distant future. Montalenti is intrigued by how much memory AWS now offers per instance, and how little it costs. AWS’s high-memory cluster compute instances, for example, contain 244 gigabytes of RAM and cost just over $10 per gigabyte per month with on-demand pricing. That number would drop even further with the platform’s reserved-instance option. The instances include local storage, too.
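A similar sketch shows what the 720 GB of RAM in Parse.ly’s co-located production servers might cost on those AWS instances. It treats “just over $10” as a flat $10 per gigabyte, so the total is a lower bound, and it assumes on-demand pricing billed per whole instance; `instances_needed` is a hypothetical helper.

```python
import math

# Figures quoted in the article.
AWS_GB_PER_INSTANCE = 244          # RAM on one high-memory cluster compute instance
AWS_USD_PER_GB_MONTH = 10.0        # "just over $10", so totals below are a lower bound
RACKSPACE_USD_PER_GB_MONTH = 40.0  # 2011 Rackspace estimate, for comparison
COLO_USD_PER_GB_MONTH = 4.0        # 2011 colocation estimate, for comparison
COLO_RAM_GB = 720                  # RAM across Parse.ly's co-located production servers


def instances_needed(ram_gb: int, gb_per_instance: int) -> int:
    """Hypothetical helper: fewest whole instances whose combined RAM covers ram_gb."""
    return math.ceil(ram_gb / gb_per_instance)


n = instances_needed(COLO_RAM_GB, AWS_GB_PER_INSTANCE)
billed_gb = n * AWS_GB_PER_INSTANCE  # whole instances are billed, not exact gigabytes
aws_monthly = billed_gb * AWS_USD_PER_GB_MONTH

print(f"{n} instances, {billed_gb} GB billed, at least ${aws_monthly:,.0f}/month on demand")
print(f"vs. Rackspace 2011 rate: {RACKSPACE_USD_PER_GB_MONTH / AWS_USD_PER_GB_MONTH:.1f}x cheaper per GB")
print(f"vs. colo 2011 estimate: {AWS_USD_PER_GB_MONTH / COLO_USD_PER_GB_MONTH:.1f}x pricier per GB")
```

Reserved-instance pricing, which would lower the AWS figure, is left out because the article gives no number for it.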
“My general inclination here is I always want to bet on the market,” Montalenti said. “… Now that the market offers [what we need], it’s very likely I’ll come back.”
And he’s tempted to just move everything over to AWS. Rackspace hasn’t improved its high-memory options or lowered its prices much, while AWS just keeps getting less expensive and keeps rolling out new instance types and services. When you’re standing back and watching those prices fall, Montalenti said, all of a sudden the effort of maintaining your own servers “starts to become something more like an albatross than something that’s helpful.”
That’s despite Montalenti’s view that Rackspace is actually the simpler cloud service to use and that AWS still presents more of a lock-in risk. He also acknowledges that with one of a handful of up-and-coming cloud providers, such as CloudSigma or ProfitBricks, Parse.ly could pay for exactly the RAM, CPU and disk it needs without overbuying any of them. But when you balance everything out (cost, simplicity, scale, ecosystem, overall platform), Montalenti thinks AWS is the best long-term bet.
“Don’t evaluate what you’ve done in the past,” he explained. “Think about the best place to end up.”
As for those smaller, possibly more flexible providers, Parse.ly is now dealing with around 100 enterprise customers who want to make sure their analytics service is built on solid ground. “At this stage in my company’s life,” Montalenti said, “it’s unwise for me to adopt a small provider.”
Pitfalls along the path?
Montalenti is 80 percent sure he’ll move the company to AWS, but he acknowledges that a couple of things give him pause. One big concern is the “noisy neighbor” problem, which happens when another tenant on the same physical host hogs the shared resources and bandwidth, so your performance suffers. It’s particularly annoying when you’re trying to handle interactive data queries for customers rather than just serving web requests.
The bigger you get, the higher the probability that one of your nodes will suffer from this at any given time. Parse.ly had this problem when it was running production on Rackspace, Montalenti said, but “when we switched to the colo, the problem just went away.”
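Why size makes this worse can be shown with a one-line probability. The sketch below assumes, hypothetically, that each node independently has a noisy neighbor 1 percent of the time; neither that rate nor the independence assumption comes from Parse.ly, and `p_any_affected` is an illustrative name.

```python
# Chance that at least one of n nodes is hit by a noisy neighbor at a given
# moment, assuming each node is affected independently with the same
# (hypothetical) probability p: 1 - (1 - p)**n.

P_PER_NODE = 0.01  # hypothetical: a given node has a noisy neighbor 1% of the time


def p_any_affected(n: int, p: float) -> float:
    """Return the probability that one or more of n independent nodes is affected."""
    return 1 - (1 - p) ** n


for nodes in (1, 20, 60, 100):
    print(f"{nodes:>3} nodes: {p_any_affected(nodes, P_PER_NODE):.0%}")
```

Even at a low per-node rate, the odds that some node is degraded climb quickly with fleet size, which matters most for interactive queries that touch many nodes at once.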
And although Parse.ly is following in Netflix’s footsteps as a growing company that bucks the trend by moving to the cloud from physical infrastructure (once they reach a certain scale, many companies find physical gear is less expensive over the long term and also offers better performance), Montalenti says he’s not too inclined to copy Netflix’s trailblazing approach to the cloud. Yes, he loves the Simian Army and the fact that Netflix seems ready for anything AWS can throw at it in terms of instances dying or whole zones going offline, but he also thinks that level of engineering is probably overkill for a company the size of Parse.ly.
Parse.ly doesn’t need to be ephemeral and chaotic and load balance in 100 directions; it needs to focus on building its product. Netflix, Montalenti noted, had already developed its streaming service to a near-finished state and racked up millions of customers when it began optimizing its cloud architecture around that service. Montalenti wants to maximize reliability, but he might not be ready for full chaos mode just yet.
Premature scaling, getting ahead of yourself on the infrastructure side to support a product and customer base that might never come, can cause serious problems. “That’s how you kill a startup,” Montalenti said, “in my opinion.”
All images courtesy of Andrew Montalenti; Montalenti image from Flickr.