Monday, August 20, 2007

Skype's Wipe-Out

Just because a surfer wipes out every now and then, you don't jump to the conclusion that he's a bad surfer. And if a relatively new technology suffers a massive failure that puts it out of action for a few days, that isn't necessarily a reason to give up on it, condemn it, or conclude that it will never work. All the same, the recent collapse of the peer-to-peer function of what one source calls the world's most popular Internet telephone service has some lessons about reliability, the Internet, and using things for what they were designed for in the first place.

First of all, what is Internet phone service? The form provided by Skype works like this. With some inexpensive hardware such as a headset with a microphone, you can log on to Skype and call any of its millions of other users without incurring a per-use or per-minute fee. Calls between Skype users are free; Skype charges only for calls to ordinary telephone numbers. Your phone call is routed directly over the Internet, independently of the conventional landline and cellphone networks. So as long as the party you wish to call is on Skype too, you can say good-bye to concerns about talking too long on long distance calls, using up your cellphone minutes, and all those other worries.

Well, the other day (Thursday, August 16, to be exact), Skype users woke up to a rude surprise: Skype was down worldwide. Despite initial concerns that it might have been a malware attack, the latest news is that a software glitch caused it. From the description posted on Skype's official website by staffer Villu Arak, a defect in Skype's own software was at the root of the problem. According to Arak, a routine set of patches delivered through Windows Update told a great many users' computers to restart within a short time. As those computers came back up all over the world, they all tried to log on to Skype again. This massive pile of logon requests should have been handled by Skype's system, but due to the software defect, it wasn't. The end result was that the whole thing came unraveled and took a couple of days to put back together.
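For readers who like to see the arithmetic, here is a rough sketch of why a wave of simultaneous restarts is so hard on a logon system. It is not Skype's code and says nothing about the actual defect; the function name, the client count, and the delay values are all made up for illustration. It simply counts how many logon attempts land in the busiest second when every client retries at once, compared with when each client first waits a random amount of time.

```python
import random
from collections import Counter


def peak_logons_per_second(num_clients, max_delay_seconds, seed=0):
    """Return the number of logon attempts in the busiest single second.

    Hypothetical model: after restarting, each client waits a random whole
    number of seconds between 0 and max_delay_seconds (inclusive) and then
    tries to log on once. A max delay of 0 means everyone tries at once.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    attempts = Counter(
        rng.randint(0, max_delay_seconds) for _ in range(num_clients)
    )
    return max(attempts.values())


if __name__ == "__main__":
    clients = 100_000  # assumed figure, chosen only for illustration
    for delay in (0, 60, 600):
        peak = peak_logons_per_second(clients, delay)
        print(f"max delay {delay:>3} s -> busiest second sees {peak} logons")
```

With no delay, the busiest second sees every client at once. Spreading the attempts over even a minute cuts the peak to a small fraction of that, which is one common way designers try to keep a restart wave from overwhelming a server.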

I don't know whether anyone uses Skype as their main form of telecommunications. Probably there are a few people in special situations in remote areas, but only a few. If so, they were left high and dry without a phone for the time that Skype was down. Probably most users take advantage of it as one of several communications options, an inexpensive alternative, possibly within a company where a central authority can enforce the use of Skype rather than conventional telecomm systems that cost more. But the convenience and low cost come at a price.

Technologies are not just hardware, or hardware and software, but a combination of that physical stuff and ideas, aspirations, and habits in the minds of billions of users. As new technologies come into being, to be successful they have to fit into the existing complex of human activity and the material environment, while changing both. In the process, existing technologies are often adapted for uses that their original designers never thought of.

Internet phone service is a case in point. If you were going to set up a worldwide computer network from scratch and design it mainly to provide telephone service, it would look like nothing that exists today except in a few laboratories. Why is that?

The closest thing to it is what is operated by the old-line telephone companies—the Bell System babies, or teenagers, or however you want to describe them. Their fiber-optic based networks are full of compromises because they've had to keep handling their huge amounts of traffic ever since the dawn of the telephone age. This requirement to use existing hardware rather than throwing everything away, starting from scratch, and going broke in the process has left them with a material burden that is matched by the regulatory burden which prevents them from doing a lot of things that they'd like to do. Because of the burdens of history, neither their physical environment nor their legal environment is what they'd like if they were starting over from the beginning.

The Internet was built basically from scratch over the last two or three decades, so in principle it comes closer to the ideal. But it wasn't designed for rapid, reliable, two-way audio signal transmission. You can force Internet protocols to deliver something that resembles an old-fashioned analog phone conversation, but it's difficult, it wastes bandwidth, and you're basically making the system do something it wasn't initially designed to do. Fortunately, with enough bandwidth a lot of hard things become easy, which is why Skype can be as successful as it generally is. Still, Skype has the huge problem that not everybody in the world is on it. On the other hand, everybody with a telephone of some kind can in principle dial anyone else with a phone, and that fact makes the conventional international telecomm system that much more valuable. Every person added to that system makes it incrementally more valuable to everyone else already on the system. This is why communications networks tend to be dominated by a few large players, or only one.
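One rough way to put numbers on that last point is to count how many distinct pairs of subscribers could call each other. This is only a back-of-the-envelope sketch: treating "value" as the number of possible pairs is a simplifying assumption (the idea usually goes by the name Metcalfe's law), and the function names are made up for illustration.

```python
def possible_pairs(subscribers):
    """Number of distinct two-party calls possible among the subscribers."""
    return subscribers * (subscribers - 1) // 2


def pairs_added_by_next_subscriber(subscribers):
    """How many new possible calls appear when one more person joins."""
    return possible_pairs(subscribers + 1) - possible_pairs(subscribers)


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        print(f"{n} subscribers: {possible_pairs(n)} possible pairs, "
              f"next subscriber adds {pairs_added_by_next_subscriber(n)}")
```

The newcomer to a network of a million subscribers gains a million possible connections, while the newcomer to a network of ten gains only ten, so the bigger network is always the more attractive one to join.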

And then there's the reliability problem. Since the public telecomm systems have gone heavily software-intensive, they have had their share of software glitches. But decades of conservative engineering practice have taught them to be hyper-cautious about changing anything. I once spoke with a woman who was a software engineer with one of the major "baby Bells" in an office near Chicago. She said that in order to make a small change in one line of code in the master operating software for their network, she had to put in about six months of work testing, checking, getting authorizations, and so on, before she could make the change. Only large, established organizations have the resources to take such pains, but it pays off in reliability.

Maybe Skype will learn from this experience, and spend a little more time testing new software. As it happened, the problem they had was more of an inconvenience than a disaster, except maybe to their bottom line. But as we rely more on Internet-based communications systems for things like medical records and emergency communications, reliability will move closer to the top of the list of desirable features. Let's just hope that the Internet can stand the strain.

Sources: The San Jose Mercury-News carried an article by Sarah Jane Tribble on Skype's outage at http://www.siliconvalley.com/news/ci_6656717. Mr. Arak's comments can be found on the Skype website under the title "What happened on August 16" at heartbeat.skype.com.
