Verizon explains its string of LTE outages

Verizon’s (s VZ)(s VOD) LTE network has had a hell of a month. After a year of smooth performance, interrupted only by one major glitch in April, the new ultra-fast 4G network has suffered three outages in a single month, cutting off access for smartphone and wireless hotspot customers across the country. In an interview with GigaOM, Verizon Wireless VP of network engineering Mike Haberman tried to shed some light on the LTE network’s recent problems and explain the steps Verizon is taking to ensure that they don’t happen again.
Haberman said that LTE is still a brand new wireless technology and Verizon was the first global operator to launch it on a large scale. That means Verizon will be the first operator to encounter the bugs and glitches hiding within any new technology. “Being the pioneers, we’re going to experience some growing pains,” Haberman said. “These issues we’ve been experiencing are certainly regrettable but they were unforeseeable.”
All three outages were caused by problems in Verizon’s service delivery core — in telecom-speak called the IP Multimedia Subsystem (IMS) — which replaces the old signaling architectures used in 2G and 3G networks, Haberman said. While IMS has been around for some time, Verizon’s is the first implementation in an LTE network and it has continued to be a problem spot ever since April, when a software bug originating deep within the IMS core led to a complete failure, kicking LTE customers off both Verizon’s 3G and 4G networks nationwide.
Verizon fixed that software bug, but new IMS glitches have reared their heads – none as big as the one that caused April’s outage, but all taken seriously by Verizon nonetheless, Haberman said. The first outage on Dec. 7 was caused by the failure of a back-up communications database. The second, last week, was the result of an IMS element not responding properly, while Wednesday’s outage was caused by two IMS elements not communicating properly, Haberman said.
So while the LTE radio network was working just fine, customers weren’t able to connect to it since the IMS network simply wasn’t able to recognize them. Once it identified an IMS failure, Verizon was able to force phones to stop trying to access 4G and fall back on its 3G CDMA network. But before the switch-over took effect some customers were left without 3G, as their phones kept trying to log into the 4G network.
Haberman said once each problem was fixed, it never recurred. Every subsequent outage was the result of a new bug, and it just so happens that December was the month many of these bugs chose to reveal themselves, he said. Verizon’s IMS systems are a complex network of databases, servers, routers, gateways and policy managers supplied by multiple vendors. Alcatel-Lucent (s ALU), Nokia Siemens Networks (s NOK)(S SI), Acme Packet and Tekelec all provide different parts, but Haberman declined to identify which particular elements or which particular vendors were responsible for the problems. In fact, Haberman defended Verizon’s vendors, saying that they were experiencing the same LTE growing pains as Verizon.
While Verizon won’t promise that no more outages will occur, Haberman said it has taken measures to ensure that they’re minimized when they do happen in the future. He said he’s begun geographically segmenting the LTE network, so if a software bug does break out it can be isolated to a particular region or market instead of spreading nationwide. Verizon is also upgrading all of its software and cutting down on the signaling clutter running over its IMS grid.
“Our goal is to ensure that our 4G network meets the same high standard that our 3G network does,” Haberman said. “We’re not there yet, but we’ll get there.”
As I’ve said before, Verizon needs to be cut a little slack. LTE isn’t some upgrade like HSPA. It’s a fundamental rethinking of every aspect of the wireless network: moving from hardware-driven to software-driven base stations, evolving network service delivery systems from old hierarchical voice-centric chains of gateways to new flat IP architectures, and replacing old copper backhaul links with fiber Ethernet to the tower. And as the first to launch LTE, Verizon will be the first operator to encounter its faults. I’m surprised we hadn’t seen a string of outages before December.
But Verizon does have to uphold its claim to having the country’s “most reliable network.” Many customers pay a big premium to use Verizon’s service rather than its competitors’ precisely because of its network performance and coverage. Three outages – even if they were intermittent – during the biggest month of the year for phone sales and activations will hardly help that reputation. Verizon must have had hundreds of thousands of activations in the last week due to Christmas gift giving. Many of those customers probably turned on their phones to discover they had no 4G service.