CloudFlare goes down, cites router issue in DDoS attack

CloudFlare’s web security service went down for about an hour starting at 2:47 PDT Sunday morning, taking its customers down with it. The service was back up at 3:49 PDT, according to a post-mortem. CloudFlare attributed the outage to a system-wide failure of its Juniper edge routers that started after the company tried to prevent a DDoS attack on one of its customers.

Affected sites included WikiLeaks, 4chan and others, according to a TechCrunch report.

One reason CloudFlare opts for Juniper routers is their support for the Flowspec protocol, which lets customers propagate router rules across a large number of routers quickly, according to the company post. This comes in handy because CloudFlare is always updating rules to combat ever-changing attacks and to re-route traffic as needed to optimize performance.

This morning CloudFlare detected a DDoS attack on one of its customers, and its attack profiler determined that the offending packets were between 99,971 and 99,985 bytes long.

That attack profile was turned into a rule and pushed out via Flowspec to drop the attack traffic. From the post-mortem:

“Flowspec accepted the rule and relayed it to our edge network. What should have happened is that no packet should have matched that rule because no packet was actually that large. What happened instead is that the routers encountered the rule and then proceeded to consume all their RAM until they crashed.”
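Why could no packet have been that large? Assuming the traffic was IPv4, the header's Total Length field is 16 bits wide, so a packet tops out at 65,535 bytes, well short of the 99,971-byte floor in the rule. The short Python sketch below illustrates the kind of sanity check that would flag such a rule before it is pushed out. The `LengthRule` class and its method names are hypothetical, made up for illustration; this is not CloudFlare's profiler, Juniper's software or the Flowspec wire format.

```python
from dataclasses import dataclass

# Largest value the 16-bit IPv4 Total Length field can hold.
MAX_IPV4_PACKET_BYTES = 65_535


@dataclass(frozen=True)
class LengthRule:
    """Hypothetical stand-in for a filter rule that matches on packet length."""

    min_bytes: int
    max_bytes: int

    def matches(self, packet_bytes: int) -> bool:
        # True when a packet of this size falls inside the rule's range.
        return self.min_bytes <= packet_bytes <= self.max_bytes

    def can_ever_match(self) -> bool:
        # The rule is satisfiable only if its range is non-empty and
        # overlaps the sizes an IPv4 packet can actually have.
        return (
            self.min_bytes <= self.max_bytes
            and self.max_bytes >= 0
            and self.min_bytes <= MAX_IPV4_PACKET_BYTES
        )


# The length range reported in the post-mortem.
rule = LengthRule(min_bytes=99_971, max_bytes=99_985)

print(rule.can_ever_match())  # False: the range lies above the IPv4 maximum
print(rule.matches(1_500))    # False: an ordinary 1,500-byte packet is not matched
```

A check like this only catches rules that can never match anything. It says nothing about why the routers exhausted their memory when they met one.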

Service was restored after about an hour, although CloudFlare said it continues to examine the issue and has contacted Juniper to find out whether a known bug is involved or the problem is unique to CloudFlare’s implementation.

Update: On Monday, Juniper said via email that it is looking into the reported network outage. “While we have not completed our investigation, we believe this incident was triggered by a product issue that Juniper identified last October, when a patch was also made available. Our customer support team is actively supporting Cloudflare in its efforts to resolve the issue and we are not aware of any other customers experiencing similar issues.”

Cedexis' Radar view of CloudFlare outage.

Given that the number of DDoS attacks is on the rise, web sites had better gird themselves and hope their security vendors are taking proactive steps to keep ahead of the problem.

This story was updated at 12:25 p.m. PDT with Juniper’s comment.