The Alexandrine Dilemma

I: Crash Through or Crash

We live in a time of wonders, and, more often than not, remain oblivious to them until they fail catastrophically. On the 19th of October, 1999, we saw such a failure. After years of preparation, on that day the web-accessible version of Encyclopedia Britannica went online. The online version of Britannica contained the complete, unexpurgated content of the many-volume print edition, and it was freely available, at no cost to its users.

I was not the only person who dropped by on the 19th to sample Britannica’s wares. Several million others joined me – all at once. The Encyclopedia’s few servers suddenly succumbed to the overload of traffic – the servers crashed, the network connections crashed, everything crashed. When the folks at Britannica conducted a forensic analysis of the failure, they learned something shocking: the site had crashed because, within its first hours, it had attracted nearly fifty million visitors.

The Web had never seen anything like that before. Yes, there were search engines such as Yahoo! and AltaVista (and even Google), but destination websites never attracted that kind of traffic. Britannica, it seemed, had tapped into a long-standing desire for high-quality factual information. As the gold-standard reference work in the English language, Britannica needed no advertising to bring traffic to its web servers – all it needed to do was open its doors. Suddenly, everyone doing research, or writing a paper, or just plain interested in learning more about something tried to force themselves through Britannica’s too-narrow doorway.

Encyclopedia Britannica ordered some more servers, and installed a bigger pipe to the Internet, and within a few weeks was back in business. Immediately Britannica became one of the most-trafficked sites on the Web, as people came through in search of factual certainty. Yet for all of that traffic, Britannica somehow managed to lose money.

The specifics of this elude my understanding. The economics of the Web are very simple: eyeballs equal money. The more eyeballs you have, the more money you earn. That’s as true for Google as for Britannica. Yet, somehow, despite having one of the busiest websites in the world, Britannica lost money. For that reason, just a few months after it freely opened its doors to the public, Britannica hid itself behind a “paywall”, charging seven dollars a month for access to its inner riches. Immediately, traffic to Britannica dropped to perhaps a hundredth of its former level. Britannica did not convert many of its visitors into paying customers: there may be a strong desire for factual information, but most people did not consider it worth paying for. Instead, they continued to search for a freely available, high-quality source of factual information.

Into this vacuum Wikipedia was born. The encyclopedia that anyone can edit has always been freely available, and, because of its free-content license, can be freely copied. Wikipedia marked the modern birth of “crowdsourcing”, the idea that vast numbers of anonymous individuals can labor together (at a distance) on a common project. Wikipedia’s openness in every respect – transparent edits, transparent governance, transparent goals – encouraged participation. People were invited to come by and sample the high-quality factual information on offer, and were encouraged to leave their own offerings. The high-quality facts attracted visitors; some visitors would leave their own contributions, high-quality facts which would attract more visitors. In this “virtuous cycle”, Wikipedia grew as large as, then far larger than, Encyclopedia Britannica.

Today, we don’t even give a thought to Britannica. It may be the gold-standard reference work in the English language, but no one cares. Wikipedia is good enough, accurate enough (although Wikipedia was never intended to compete with Britannica, by 2005 Nature was comparing the accuracy of articles in the two) and much more widely available. Wikipedia has eaten up Britannica’s market, a market Britannica had dominated for two hundred years. It wasn’t the server crash that doomed Britannica. When the business minds at Britannica tried to crash through into profitability, they crashed into the paywall they themselves had established. Watch carefully: over the next decade we’ll see the somewhat drawn-out death of Britannica as it becomes ever less relevant in a Wikipedia-dominated landscape.

Just a few weeks ago, the European Union launched a new website, Europeana. Europeana is a repository, a collection of Europe’s cultural heritage, made freely available to everyone in the world via the Web. From Descartes to Darwin to Debussy, Europeana hopes to become the online cultural showcase of European thought.

The creators of Europeana scoured Europe’s cultural institutions for items to be digitized and placed within its own collection. Many of these institutions resisted their requests – they didn’t see any demand for these items coming from online communities. As it turns out, these institutions couldn’t have been more wrong. Europeana launched on the 20th of November, and, like Britannica before it, almost immediately crashed. The servers overloaded as visitors from throughout the EU came in to look at the collection. Europeana has been taken offline for a few months, as the EU buys more servers and fatter pipes to connect it all to the Internet. Sometime late in 2008 it will relaunch, and, if its brief popularity is any indication, we can expect Europeana to become another important online resource, like Wikipedia.

All three of these examples show that there is an almost insatiable interest in factual information made available online, whether the dry articles of Wikipedia or the more bouncy cultural artifacts of Europeana. It’s also clear that arbitrarily restricting access to factual information simply directs the flow around the institution restricting access. Britannica could be earning over a hundred million dollars a year from advertising revenue. That is what some have projected Wikipedia could earn from banner advertisements alone, if it ever accepted advertising. But Britannica chose to lock itself away from its audience. That is the one unpardonable sin in the network era: under no circumstances do you take yourself off the network. We all have to sink or swim, crash through or crash, in this common sea of openness.

I only hope that the European museums that have donated works to Europeana don’t suddenly grow possessive when the true popularity of their works becomes a proven fact. That would be messy, and would only hurt the institutions. Perhaps they’ll heed the lesson of Britannica. But many of our institutions seem mired in older ways of thinking, where selfishness and protecting the collection are seen as cardinal virtues. There’s a new logic operating: the more something is shared, the more valuable it becomes.

II: The Universal Library

Just a few weeks ago, Google took this idea to new heights. In a landmark settlement of a long-running copyright dispute with book publishers in the United States, Google agreed to pay those publishers a license fee for their copyrights, even for books out of print. In return, the publishers are allowing Google to index, search and display all of the books they hold under copyright. Google already provides the full text of many books whose copyright has expired. Its efforts scanning whole libraries at Harvard and Stanford have given Google access to many such texts. Each of these texts is indexed and searchable, just as with the books under copyright. In this case, however, the full text is available through Google’s book reader tool. For works under copyright but out of print, Google is now acting as the sales agent, translating document searches into book sales for the publishers. Those publishers may now see huge “long tail” revenues generated from their catalogues.

Since Google is available from every computer connected to the Internet (given that it is available on most mobile handsets, it’s available to nearly every one of the four billion mobile subscribers on the planet), this new library – at least seven million volumes – has become available everywhere. The library has become coextensive with the Internet.

This was an early dream both of the pioneers of personal computing and, later, of the Web. When CD-ROM was introduced, twenty years ago, it was hailed as the “new papyrus,” capable of storing vast amounts of information in a richly hyperlinked format. As the limits of CD-ROM became apparent, the Web became the repository of the hopes of all the archivists and bibliophiles who dreamed of a new Library of Alexandria: a universal library with every text in every tongue freely available to all.

We have now gotten as close to that ideal as copyright law will allow; everything is becoming available, though perhaps not as freely as a librarian might like. (For libraries, Google has established subscription-based fees for access to books covered by copyright.) Within another few years, every book within arm’s length of Google (and Google has many, many arms) will be scanned, indexed and accessible through books.google.com. This library can be brought to bear everywhere anyone sits down before a networked screen. This library can serve billions, simultaneously, yet never exhaust its supply of texts.

What does this mean for the library as we have known it? Has Google suddenly obsolesced the idea of a library as a building stuffed with books? Is there any point in going into the stacks to find a book, when that same book is equally accessible from your laptop? Obviously, books are a better form factor than our laptops – five hundred years of human interface design have given us a format which is admirably well-adapted to our needs – but in most cases, accessibility trumps ease-of-use. If I can have all of the world’s books online, that easily bests the few I can access within any given library.

In a very real sense, Google is obsolescing the library, or rather, one of the features of the library, the feature we most identify with the library: book storage. Those books are now stored on servers, scattered in multiple, redundant copies throughout the world, and can be called up anywhere, at any time, from any screen. The library has been obsolesced because it has become universal; the stacks have gone virtual, sitting behind every screen. Because the idea of the library has become so successful, so universal, it no longer means anything at all. We are all within the library.

III: The Necessary Army

With the triumph of the universal library, we must now ask: What of the librarians? If librarians were simply the keepers-of-the-books, we would expect them to fade into an obsolescence similar to that of physical libraries. This is the popular perception of the librarian, but keeping the books is in fact perhaps the least interesting of the tasks a librarian performs (although often the most visible).

The central task of the librarian, if I can be so bold as to state something categorically, is to bring order to chaos. The librarian takes a raw pile of information and makes it useful. How that happens differs from situation to situation, but all of it falls under the rubric of library science. At its most visible, the book cataloging systems used in all libraries represent the librarian’s best efforts to keep an overwhelming amount of information well-managed and well-ordered. A good cataloging system makes a library easy to use, whatever its size, however many volumes are available through its stacks.

It’s interesting to note that books.google.com uses Google’s text search-based interface. Based on my own investigations, you can’t type in a Library of Congress classification number and get a list of books under that subject area. Google seems to have abandoned, or ignored, library science in its own book project. I can’t tell you why this is; I can only tell you that it looks very foolish and naïve. It may be that Google’s army of PhDs does not include many library scientists. Otherwise, why would they have made such a beginner’s mistake? It smells of an amateur effort from a firm which is not known for amateurism.

It’s here that we can see the shape of the future, both in the immediate and the longer term. People believe that because we’re done with the library, we’re done with library science. They could not be more wrong. Because the library is universal, library science now needs to be a universal skill set, more broadly taught than ever before. We have become a data-centric culture, and are presently drowning in data. It’s difficult enough for us to keep our collections of music and movies well organized; how can we propose to deal with collections that are a hundred thousand times larger?

This is not just some idle speculation; we are rapidly becoming a data-generating species. Just a few years ago we might have generated a small amount of data on a given day or in a given week. These days we generate data almost continuously. Consider: every text message sent, every email received, every snap of a camera or camera phone, every clip of video shared amongst friends. It all adds up, and it all needs to be managed, stored, indexed and retrieved with some degree of ease. Otherwise, in a few years’ time the recent past will have disappeared into the fog of unsearchability. To keep a connection to our data selves of the past, we are all going to need to become library scientists.

All of which puts you in a key position for the transformation already underway. You get to be the “life coaches” for our digital lifestyle, because, as these digital artifacts start to weigh us down (like Jacob Marley’s lockboxes), you will provide the guidance that will free us from these weights. Now that we’ve got it, it’s up to you to tell us how we find it. Now that we’ve captured it, it’s up to you to tell us how we index it.

We have already taken some steps along this journey: much of the digital media we create can now be “tagged”, that is, assigned keywords which provide context and semantic value for the media. We each create “clouds” of our own tags which evolve into “folksonomies”, or home-made taxonomies of meaning. Folksonomies and tagging are useful, but we lack the common language needed to make our digital treasures universally useful. If I tag a photograph with my own tags, that means the photograph is more useful to me; but it is not necessarily more broadly useful. Without a common, public taxonomy (a cataloging system), tagging systems will not scale into universality. That universality has value, because it allows us to extend our searches, our view, and our capability.

I could go on and on, but the basic point is this: wherever data is being created, there lies the opportunity for library science in the 21st century. Since data is being created almost everywhere, the opportunities for library science are similarly broad. It’s up to you to show us how it’s done, lest we drown in our own creations.

Some of this won’t come to pass until you move out of the libraries and into the streets. Library scientists have to prove their worth; most people don’t understand that they’re slowly drowning in a sea of their own information. This means you have to demonstrate other ways of working that are self-evident in their effectiveness. The proof of your value will be obvious. It’s up to you to throw the rest of us a life-preserver; once we’ve caught it, once we’ve caught on, your future will be assured.

The dilemma that confronts us is that for the next several years, people will be questioning the value of libraries; if books are available everywhere, why pay the upkeep on a building? Yet the value of a library is not the books inside, but the expertise in managing data. That can happen inside of a library; it has to happen somewhere. Libraries could well evolve into the resource the public uses to help manage their digital existence. Librarians will become partners in information management, indispensable and highly valued.

In a time of such radical and rapid change, it’s difficult to know exactly where things are headed. We know that books are headed online, and that libraries will follow. But we still don’t know the fate of librarians. I believe that the transition to a digital civilization will founder without a lot of fundamental input from librarians. We are each becoming archivists of our lives, but few of us have training in how to manage an archive. You are the ones who have that knowledge. Consider: the more something is shared, the more valuable it becomes. The more you share your knowledge, the more invaluable you become. That’s the future that waits for you.

Finally, consider the examples of Britannica and Europeana. The demand for those well-curated collections of information far exceeded even the wildest expectations of their creators. Something similar lies in store for you. When you announce yourselves to the broader public as the individuals empowered to help us manage our digital lives, you’ll doubtless find yourselves overwhelmed by people seeking to benefit from your expertise. What’s more, I expect library science to become one of the hot subjects of university curricula in the 21st century as universities try to meet that demand. We need you, and we need a lot more of you, if we ever hope to make sense of the wonderful wealth of data we’re creating.