The Mahou Blog

January 2, 2007

The geeks who saved Usenet

Filed under: Uncategorized — mahou @ 3:21 pm

http://archive.salon.com/tech/feature/2002/01/07/saving_usenet/index.html

Jan. 7, 2002 | On May 11, 1981, one Mark Horton, then a graduate student at the University of California at Berkeley, using the e-mail address “ucbvax^mark,” posted this message to the Usenet newsgroup Net.general:

Rusty is right (or is that “Rusty is Wright”?)
– we have ALL in our .ngfile so I tend to forget
this. ALL.ALL may or may not work, but
ALL certainly does. Mark
Then, the ancient Internet scribe added this ominous postscript:

I plan to make the change on Tuesday
unless something horrible happens.

“The zoology department may sound like a funny place for pioneering networking work,” says Henry Spencer. “But the computer science department wasn’t very interested in this inferior networking. It was very low-tech by their standards. But it worked and theirs didn’t. Their opinion changed fast when we started providing e-mail.”

That’s how, in the spring of 1981, with a 300 baud modem, the zoology department at the University of Toronto became a central distribution point for Usenet, when the network was just 2 years old.

Traffic was almost unimaginably lighter in those days. Only about 200 people had access to Usenet: “In the first few years, it was at least plausible to come in in the morning and read all the Usenet traffic that had come in, and 15 minutes later be off doing something useful,” remembers Spencer. But even that low level of traffic was too much for the storage requirements of the day. “Pretty soon, it was necessary to think about expiring old stuff,” he says.

It wasn’t a sense of historical importance that initially led Spencer to think about creating an archive. His motivation was much more pragmatic than that: Most of the conversations on Usenet at the time were very technical, and he was reluctant to see the information in them disappear, because it might be useful to the university’s geeks: “A lot of the early traffic was about things like Unix systems bugs, and it seemed unwise to just throw it out.”

So the archiving began, with 40 megabytes filling up a new mag tape every few months; each reel carried half-inch-wide tape and measured 10 inches in diameter. In this era, messages from the outside world came in at the tortoise rate of 300 baud. (“When we got a 1,200 baud auto-dialing modem, that was just wonderful. Twelve-hundred baud was just total luxury,” Spencer recalls.) As Usenet grew, this meant that Spencer and his system administrators had to be selective about which newsgroups they received and archived, keeping technical conversations but throwing away some of the more general discussions that generated a lot of traffic.

“We started dumping stuff that we thought was obviously of no future use, groups that specialized in a lot of talk and no substance, so to speak. For example, fairly early on there was a newsgroup about abortion which specialized in violent arguments.”

That’s why not only the very earliest Usenet posts, made before Spencer started archiving in 1981 (Usenet began in 1979), but even some posts from the 1980s are still lost. It’s too bad; today, wouldn’t more of us rather see what was being said about abortion in 1984 than sift through the arcana of bug fixes in systems that have probably been long since retired? “It was perfectly reasonable from the viewpoint of stuff that we might want to use again, but a little sad from today’s viewpoint,” Spencer admits.

For 10 years, the nine-track mag tapes piled up, hanging in a huge rack at the zoology department’s computer facility. Finally, in the early ’90s, with the growth of Usenet outpacing the zoology department’s budget for $15-a-pop tapes, the general archiving project ended.

In the spring of 1991, Bruce Jones, then a grad student in the communications department at the University of California at San Diego, flew to Ontario at his own expense. He was writing his Ph.D. dissertation on the history of Usenet and was eager to get his hands on Spencer’s tapes. The 141 tapes, most of which held 120 megabytes of posts, now lived at the University of Western Ontario, thanks to a road trip in the middle of the Canadian winter that David Wiseman, the university’s network administrator, had taken earlier that year to unburden the University of Toronto’s zoology department of them.

Jones would spend the next two weeks rescuing the data off them. Not only was the tape technology rapidly becoming obsolete — just try to find a working tape-reader today — but the tapes themselves do not have anything like a 10-year shelf life.

By now the historical import of the tapes was already apparent. But spending two weeks running tapes through a tape-cleaning machine and dumping them on disks was the prerequisite to even looking at them. “Spencer had written a program for removing data from tapes when the tapes went bad,” Jones explains. “I was just the first person who was willing to invest my time and money — a lot of people wanted to see what was on them.” In two weeks, Jones got through the first 105 tapes.

“Usenet has always been about arguing about itself,” Jones says of the posts that were unearthed. “And the arguments that you see today are the same arguments that go way back into the early ’80s, and I’m sure that those arguments will continue well into the future.”

Case in point: the fact that the older parts of the archive are now available on Google has given Usenet denizens something new to argue about. “I’ve already gotten three letters from people accusing me of trying to make money off these archives,” Jones observes wryly. All the “archive donors” gave the posts to Google for posterity.

Over the next 10 years, Wiseman got through the remaining three dozen or so tapes by wangling the time and energies of “bored graduate students.” But by 1995, constrained by university budgets, the archiving project was running out of disk space.

So, Brewster Kahle, the creator of the Web’s other major archiving project, the Internet Archive Wayback Machine, chipped in, donating a then-humongous nine-gigabyte hard drive to the cause. In the end, they pulled more than 2,056,000 posts off the 141 tapes. “It took us 10 years. I got so busy and everybody else got less interested,” says Wiseman, almost sheepishly. More than 2 million posts: It doesn’t sound like a lot compared to the 700 million total in Google’s archive, but they’re the oldest remnants.

Apparently someone is still interested. Wiseman used FTP to hand off the files to Google. And just after Google announced the availability of the archive, some rogue used FTP to grab the whole archive off the University of Western Ontario’s FTP server — all three gigs of it transferring in one night. “I have no idea what they plan on using it for, since if it’s spam e-mail the addresses are all wrong,” says Wiseman. Now, anyone who wants a full copy will have to ask politely first — it’s no longer on the server.

Google filled in the more recent posts not covered by the old DejaNews archive thanks to Jürgen Christoffel of the German National Research Center for Information Technology, who’d kept his own archives in the ’90s, and Kent Landfield, a network security developer and the maintainer of FAQs.org. Landfield started archiving with entrepreneurial motives. In 1992 and 1993, while at Sterling Software in Omaha, Neb., Landfield had a side project that sold CDs of the Usenet archive. For $349.95 a year, every month you could get a CD burned with the content of Usenet. It was an attempt to cater to the user with a slower modem who still wanted access to every newsgroup.

“I realized that there was definitely a valuable historical aspect to the CDs themselves,” says Landfield. “The reality is, everybody thought that. We’re all just a bunch of packrats. We all knew there was a value to it, and it was a matter of how and when it would be used.”

Thanks to these packrats, Google now estimates that 95 percent of the posts ever made to Usenet are searchable from the site. But Spencer, for one, can’t help thinking of all that is still lost: not just the other 5 percent of Usenet, but also the rest of the early history of online communication.

Think of the Arpanet mailing lists that were the precursors to Usenet. Spencer points out that while most of the mailing lists kept archives, a significant number of them have been lost over time. “The first flame war, things like that, most certainly dates before Usenet,” he says. “And I would bet that a lot of that material is gone, because at some point, nobody thought it was worth saving.”

Katharine Mieszkowski
