App Engine, a powerful and easy-to-use cloud-based development environment, definitely has its gaps, among them the lack of any voice support. Google Voice, on the other hand, is a nice solution for managing incoming calls, but a boilerplate one with a fairly rudimentary set of features. Combine the two, however, and you’d have a powerful, extensible communication platform that offers a basic set of features for ordinary users, plus the ability to build custom softswitch applications, using App Engine to control them.
What might this look like from an architectural standpoint? There would be two basic components. One would be a highly scalable, distributed softswitch that accepts calls via SIP and IAX2 (more on IAX in a moment). This switch would be fairly dumb in that it would simply answer and route calls, play prompts and capture utterances or keystrokes from callers, then report these events to App Engine via HTTP requests. The other would be a simple library for App Engine that idiot-proofs basic call handling tasks such as answering, call transfer, prompt playback and speech recognition. Since App Engine is a fully featured programming environment in its own right, with support for Python and Java, these applications can be as sophisticated as developers want them to be.
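To make the second component more concrete, here is a minimal sketch of what such a helper library might look like in Python. Nothing here is an existing Google API: the `CallResponse` class, the action names, the event fields and the prompt paths are all hypothetical, chosen only to illustrate the idea of an application answering each reported event with a list of instructions.

```python
import json


class CallResponse:
    """Collects the instructions an application sends back to the softswitch.

    Hypothetical helper: the action names and fields are illustrative only.
    """

    def __init__(self):
        self.actions = []

    def answer(self):
        self.actions.append({"action": "answer"})
        return self

    def play(self, prompt):
        self.actions.append({"action": "play", "prompt": prompt})
        return self

    def gather(self, max_digits=1):
        # Ask the switch to capture keystrokes and report them as a new event.
        self.actions.append({"action": "gather", "max_digits": max_digits})
        return self

    def transfer(self, destination):
        self.actions.append({"action": "transfer", "to": destination})
        return self

    def hangup(self):
        self.actions.append({"action": "hangup"})
        return self

    def to_json(self):
        # Body of the HTTP response returned to the softswitch.
        return json.dumps({"actions": self.actions})


def handle_event(event):
    """Decide what the softswitch should do next for one reported call event."""
    response = CallResponse()
    if event["type"] == "incoming_call":
        response.answer().play("prompts/menu.wav").gather(max_digits=1)
    elif event["type"] == "digits" and event.get("digits") == "1":
        response.transfer("sip:sales@example.com")
    else:
        response.play("prompts/goodbye.wav").hangup()
    return response
```

In a real deployment, `handle_event` would sit behind an ordinary HTTP request handler that parses the softswitch's request and writes `to_json()` as the response body. All the call logic stays in plain application code.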
Google doesn’t need to get into the business of selling phone numbers if it doesn’t want to. There are plenty of carriers that offer trunking services in dozens of countries — Voxbone, for example — so users can choose a carrier based on the type of inbound and outbound access required. The calls are then routed to the softswitch which, in turn, queries your web service for directions on what to do at each stage of the call.
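Seen from the softswitch side, that query-at-each-stage flow might be sketched roughly as follows. This is an assumption-laden illustration, not a description of any real switch: `run_call`, `demo_app` and the event and action shapes are invented for the example, and a plain Python callable stands in for the HTTP request to the web service.

```python
def run_call(ask_app, caller_input):
    """Drive one call: report each event to the app and carry out its reply.

    ask_app      -- callable standing in for an HTTP POST to the web service
                    (hypothetical; takes an event dict, returns a list of actions)
    caller_input -- digits the caller will press, one consumed per "gather"
    """
    log = []                      # actions the switch carried out, in order
    keys = iter(caller_input)
    event = {"type": "incoming_call"}
    while event is not None:
        actions = ask_app(event)  # ask the application what to do next
        event = None
        for action in actions:
            log.append(action["action"])
            if action["action"] == "gather":
                # Capture a keystroke, then report it as the next event.
                event = {"type": "digits", "digits": next(keys, None)}
                break
            if action["action"] in ("transfer", "hangup"):
                return log        # the call has left the switch's control
    return log


def demo_app(event):
    """Toy application: a one-level menu where pressing 1 reaches sales."""
    if event["type"] == "incoming_call":
        return [{"action": "answer"},
                {"action": "play", "prompt": "menu.wav"},
                {"action": "gather"}]
    if event.get("digits") == "1":
        return [{"action": "transfer", "to": "sip:sales@example.com"}]
    return [{"action": "hangup"}]


print(run_call(demo_app, "1"))  # ['answer', 'play', 'gather', 'transfer']
```

The switch holds no business logic of its own. It only executes the actions it is handed and reports back what the caller did, which is what keeps it "dumb" and easy to scale.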
This is also an opportunity for Google to promote alternatives to SIP for voice/video over IP, such as IAX (the Inter-Asterisk eXchange protocol). Built by Digium, the company behind the Asterisk open-source PBX, IAX is designed to do one thing and do it very well. As such, it’s more efficient, better at firewall traversal and generally easier to deal with than SIP. IAX supports low-bandwidth and high-fidelity voice, as well as video and other media streams, so it’s a versatile protocol, and perfectly adequate for transporting calls. It can also interleave many calls into a single media stream, which both reduces network overhead and makes security easier, since all calls and the signaling messages about them come and go via a single port.
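The overhead argument is easy to see with a toy example. The sketch below bundles audio from several calls into one payload; it is deliberately not the real IAX2 trunk frame layout (that is specified in RFC 5456), just an invented format with a 2-byte call count followed by a 2-byte call ID and 2-byte length in front of each chunk.

```python
import struct


def pack_trunk(chunks):
    """Pack (call_id, audio_bytes) pairs into one datagram payload.

    Illustrative framing only, not the IAX2 wire format.
    """
    frame = struct.pack("!H", len(chunks))          # number of calls bundled
    for call_id, audio in chunks:
        frame += struct.pack("!HH", call_id, len(audio)) + audio
    return frame


def unpack_trunk(frame):
    """Split a bundled payload back into (call_id, audio_bytes) pairs."""
    (count,) = struct.unpack_from("!H", frame, 0)
    offset = 2
    chunks = []
    for _ in range(count):
        call_id, length = struct.unpack_from("!HH", frame, offset)
        offset += 4
        chunks.append((call_id, frame[offset:offset + length]))
        offset += length
    return chunks
```

Sent separately over IPv4, each call's packet would carry its own 28 bytes of IP and UDP headers. Bundled this way, that cost is paid once per datagram, at the price of a few bytes of framing per call.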
The concept of App Engine + Voice is similar to what providers like Voxeo and Twilio offer. Whether Google acquires this capability or builds it internally is a straightforward build vs. buy decision. If it can find a company whose infrastructure is compatible with the way Google operates, buying is probably the least risky way to go. Building something on top of FreeSWITCH, another open-source softswitch, is another path worth investigating. The value to Google is the ability to create a developer community around communication applications. Google Voice is great, but it’s a pretty limited solution. Skype is awesome, but it’s a closed system (although the just-released SkypeKit SDK is a sign that may change). There are also a number of niche providers that offer cloud-based communication services, but none have Google’s resources or scale.
But most importantly, marrying such services would see Google do what it does best: provide an open and scalable platform, and let partners and developers figure out what to build and who to sell it to.