The Alexandrine Dilemma

I: Crash Through or Crash

We live in a time of wonders, and, more often than not, remain oblivious to them until they fail catastrophically. On the 19th of October, 1999 we saw such a failure. After years of preparation, on that day the web-accessible version of Encyclopedia Britannica went on-line. The online version of Britannica contained the complete, unexpurgated content of the many-volume print edition, and it was freely available, at no cost to its users.

I was not the only person who dropped by on the 19th to sample Britannica’s wares. Several million others joined me – all at once. The Encyclopedia’s few servers suddenly succumbed to the overload of traffic – the servers crashed, the network connections crashed, everything crashed. When the folks at Britannica conducted a forensic analysis of the failure, they learned something shocking: the site had crashed because, within its first hours, it had attracted nearly fifty million visitors.

The Web had never seen anything like that before. Yes, there were search engines such as Yahoo! and AltaVista (and even Google), but destination websites never attracted that kind of traffic. Britannica, it seemed, had tapped into a long-standing desire for high-quality factual information. As the gold-standard reference work in the English language, Britannica needed no advertising to bring traffic to its web servers – all it need do was open its doors. Suddenly, everyone doing research, or writing a paper, or just plain interested in learning more about something tried to force themselves through Britannica’s too narrow doorway.

Encyclopedia Britannica ordered some more servers, and installed a bigger pipe to the Internet, and within a few weeks was back in business. Immediately Britannica became one of the most-trafficked sites on the Web, as people came through in search of factual certainty. Yet for all of that traffic, Britannica somehow managed to lose money.

The specifics of this elude my understanding. The economics of the Web are very simple: eyeballs equals money. The more eyeballs you have, the more money you earn. That’s as true for Google as for Britannica. Yet, somehow, despite having one of the busiest websites in the world, Britannica lost money. For that reason, just a few months after it freely opened its doors to the public, Britannica hid itself behind a “paywall”, asking seven dollars a month as a fee to access its inner riches. Immediately, traffic to Britannica dropped to perhaps a hundredth of its former numbers. Britannica did not convert many of its visitors to paying customers: there may be a strong desire for factual information, but even so, most people did not consider it worth paying for. Instead, individuals continued to search for a freely available, high quality source of factual information.

Into this vacuum Wikipedia was born. The encyclopedia that anyone can edit has always been freely available, and, because of its use of the Creative Commons license, can be freely copied. Wikipedia was the modern birth of “crowdsourcing”, the idea that vast numbers of anonymous individuals can labor together (at a distance) on a common project. Wikipedia’s openness in every respect – transparent edits, transparent governance, transparent goals – encouraged participation. People were invited to come by and sample the high-quality factual information on offer – and were encouraged to leave their own offerings. The high-quality facts encouraged visitors; some visitors would leave their own contributions, high-quality facts which would encourage more visitors, and so, in a “virtuous cycle”, Wikipedia grew as large as, then far larger than Encyclopedia Britannica.

Today, we don’t even give a thought to Britannica. It may be the gold-standard reference work in the English language, but no one cares. Wikipedia is good enough, accurate enough (although Wikipedia was never intended to be a competitor to Britannica, by 2005 Nature was doing comparative testing of article accuracy), and is much more widely available. Britannica has had its market eaten up by Wikipedia, a market it dominated for two hundred years. It wasn’t the server crash that doomed Britannica; when the business minds at Britannica tried to crash through into profitability, that’s when they crashed into the paywall they themselves established. Watch carefully: over the next decade we’ll see the somewhat drawn out death of Britannica as it becomes ever less relevant in a Wikipedia-dominated landscape.

Just a few weeks ago, the European Union launched a new website, Europeana. Europeana is a repository, a collection of cultural heritage of Europe, made freely available to everyone in the world via the Web. From Descartes to Darwin to Debussy, Europeana hopes to become the online cultural showcase of European thought.

The creators of Europeana scoured Europe’s cultural institutions for items to be digitized and placed within its own collection. Many of these institutions resisted their requests – they didn’t see any demand for these items coming from online communities. As it turns out, these institutions couldn’t have been more wrong. Europeana launched on the 20th of November, and, like Britannica before it, almost immediately crashed. The servers overloaded as visitors from throughout the EU came in to look at the collection. Europeana has been taken offline for a few months, as the EU buys more servers and fatter pipes to connect it all to the Internet. Sometime late in 2008 it will relaunch, and, if its brief popularity is any indication, we can expect Europeana to become another important online resource, like Wikipedia.

All three of these examples prove that there is an almost insatiable interest in factual information made available online, whether the dry articles of Wikipedia or the more bouncy cultural artifacts of Europeana. It’s also clear that arbitrarily restricting access to factual information simply directs the flow around the institution restricting access. Britannica could be earning over a hundred million dollars a year from advertising revenue – that’s what Wikipedia is projected to be able to earn, just from banner advertisements, if it ever accepted advertising. But Britannica chose to lock itself away from its audience. That is the one unpardonable sin in the network era: under no circumstances do you take yourself off the network. We all have to sink or swim, crash through or crash, in this common sea of openness.

I only hope that the European museums who have donated works to Europeana don’t suddenly grow possessive when the true popularity of their works becomes a proven fact. That will be messy, and will only hurt the institutions. Perhaps they’ll heed the lesson of Britannica; but it seems as though many of our institutions are mired in older ways of thinking, where selfishness and protecting the collection are seen as cardinal virtues. There’s a new logic operating: the more something is shared, the more valuable it becomes.

II: The Universal Library

Just a few weeks ago, Google took this idea to new heights. In a landmark settlement of a long-running copyright dispute with book publishers in the United States, Google agreed to pay a license fee to those publishers for their copyrights – even for books out of print. In return, the publishers are allowing Google to index, search and display all of the books they hold under copyright. Google already provides the full text of many books which have an expired copyright – its efforts scanning whole libraries at Harvard and Stanford have given Google access to many such texts. Each of these texts is indexed and searchable – just as with the books under copyright, but, in this case, the full text is available through Google’s book reader tool. For works under copyright but out-of-print, Google is now acting as the sales agent, translating document searches into book sales for the publishers, who may now see huge “long tail” revenues generated from their catalogues.

Since Google is available from every computer connected to the Internet (given that it is available on most mobile handsets, it’s available to nearly every one of the four billion mobile subscribers on the planet), this new library – at least seven million volumes – has become available everywhere. The library has become coextensive with the Internet.

This was an early dream both of the pioneers of personal computing, and, later, of the Web. When CD-ROM was introduced, twenty years ago, it was hailed as the “new papyrus,” capable of storing vast amounts of information in a richly hyperlinked format. As the limits of CD-ROM became apparent, the Web became the repository of the hopes of all the archivists and bibliophiles who dreamed of a new Library of Alexandria, a universal library with every text in every tongue freely available to all.

We have now gotten as close to that ideal as copyright law will allow; everything is becoming available, though perhaps not as freely as a librarian might like. (For libraries, Google has established subscription-based fees for access to books covered by copyright.) Within another few years, every book within arm’s length of Google (and Google has many, many arms) will be scanned, indexed and accessible through books.google.com. This library can be brought to bear everywhere anyone sits down before a networked screen. This library can serve billions, simultaneously, yet never exhaust its supply of texts.

What does this mean for the library as we have known it? Has Google suddenly obsolesced the idea of a library as a building stuffed with books? Is there any point in going into the stacks to find a book, when that same book is equally accessible from your laptop? Obviously, books are a better form factor than our laptops – five hundred years of human interface design have given us a format which is admirably well-adapted to our needs – but in most cases, accessibility trumps ease-of-use. If I can have all of the world’s books online, that easily bests the few I can access within any given library.

In a very real sense, Google is obsolescing the library, or rather, one of the features of the library, the feature we most identify with the library: book storage. Those books are now stored on servers, scattered in multiple, redundant copies throughout the world, and can be called up anywhere, at any time, from any screen. The library has been obsolesced because it has become universal; the stacks have gone virtual, sitting behind every screen. Because the idea of the library has become so successful, so universal, it no longer means anything at all. We are all within the library.

III: The Necessary Army

With the triumph of the universal library, we must now ask: What of the librarians? If librarians were simply the keepers-of-the-books, we would expect them to fade away into an obsolescence similar to the physical libraries. And though this is the popular perception of the librarian, in fact that is perhaps the least interesting of the tasks a librarian performs (although often the most visible).

The central task of the librarian – if I can be so bold as to state something categorically – is to bring order to chaos. The librarian takes a raw pile of information and makes it useful. How that happens differs from situation to situation, but all of it falls under the rubric of library science. At its most visible, the book cataloging systems used in all libraries represent the librarian’s best efforts to keep an overwhelming amount of information well-managed and well-ordered. A good cataloging system makes a library easy to use, whatever its size, however many volumes are available through its stacks.

It’s interesting to note that books.google.com uses Google’s text search-based interface. Based on my own investigations, you can’t type in a Library of Congress catalog number and get a list of books under that subject area. Google seems to have abandoned – or ignored – library science in its own book project. I can’t tell you why this is, I can only tell you that it looks very foolish and naïve. It may be that Google’s army of PhDs does not include many library scientists. Otherwise why would they have made such a beginner’s mistake? It smells of an amateur effort from a firm which is not known for amateurism.

It’s here that we can see the shape of the future, both in the immediate and longer term. People believe that because we’ve done with the library, we’re done with library science. They could not be more wrong. In fact, because the library is universal, library science now needs to be a universal skill set, more broadly taught than at any time previous to this. We have become a data-centric culture, and are presently drowning in data. It’s difficult enough for us to keep our collections of music and movies well organized; how can we propose to deal with collections that are a hundred thousand times larger?

This is not just some idle speculation; we are rapidly becoming a data-generating species. Where just a few years ago we might generate just a small amount of data on a given day or in a given week, these days we generate data almost continuously. Consider: every text message sent, every email received, every snap of a camera or camera phone, every slip of video shared amongst friends. It all adds up, and it all needs to be managed and stored and indexed and retrieved with some degree of ease. Otherwise, in a few years’ time the recent past will have disappeared into the fog of unsearchability. In order to have a connection to our data selves of the past, we are all going to need to become library scientists.

All of which puts you in a key position for the transformation already underway. You get to be the “life coaches” for our digital lifestyle, because, as these digital artifacts start to weigh us down (like Jacob Marley’s lockboxes), you will provide the guidance that will free us from these weights. Now that we’ve got it, it’s up to you to tell us how we find it. Now that we’ve captured it, it’s up to you to tell us how we index it.

We have already taken some steps along this journey: much of the digital media we create can now be “tagged”, that is, assigned keywords which provide context and semantic value for the media. We each create “clouds” of our own tags which evolve into “folksonomies”, or home-made taxonomies of meaning. Folksonomies and tagging are useful, but we lack the common language needed to make our digital treasures universally useful. If I tag a photograph with my own tags, that means the photograph is more useful to me; but it is not necessarily more broadly useful. Without a common, public taxonomy (a cataloging system), tagging systems will not scale into universality. That universality has value, because it allows us to extend our searches, our view, and our capability.

I could go on and on, but the basic point is this: wherever data is being created, that’s the opportunity for library science in the 21st century. Since data is being created almost absolutely everywhere, the opportunities for library science are similarly broad. It’s up to you to show us how it’s done, lest we drown in our own creations.

Some of this won’t come to pass until you move out of the libraries and into the streets. Library scientists have to prove their worth; most people don’t understand that they’re slowly drowning in a sea of their own information. This means you have to demonstrate other ways of working that are self-evident in their effectiveness. The proof of your value will be obvious. It’s up to you to throw the rest of us a life-preserver; once we’ve caught it, once we’ve caught on, your future will be assured.

The dilemma that confronts us is that for the next several years, people will be questioning the value of libraries; if books are available everywhere, why pay the upkeep on a building? Yet the value of a library is not the books inside, but the expertise in managing data. That can happen inside of a library; it has to happen somewhere. Libraries could well evolve into the resource the public uses to help manage their digital existence. Librarians will become partners in information management, indispensable and highly valued.

In a time of such radical and rapid change, it’s difficult to know exactly where things are headed. We know that books are headed online, and that libraries will follow. But we still don’t know the fate of librarians. I believe that the transition to a digital civilization will founder without a lot of fundamental input from librarians. We are each becoming archivists of our lives, but few of us have training in how to manage an archive. You are the ones who have that knowledge. Consider: the more something is shared, the more valuable it becomes. The more you share your knowledge, the more invaluable you become. That’s the future that waits for you.

Finally, consider the examples of Britannica and Europeana. The demand for those well-curated collections of information far exceeded even the wildest expectations of their creators. Something similar lies in store for you. When you announce yourselves to the broader public as the individuals empowered to help us manage our digital lives, you’ll doubtless find yourselves overwhelmed with individuals who are seeking to benefit from your expertise. What’s more, to deal with the demand, I expect Library Science to become one of the hot subjects of university curricula of the 21st century. We need you, and we need a lot more of you, if we ever hope to make sense of the wonderful wealth of data we’re creating.

21 thoughts on “The Alexandrine Dilemma”

  1. Hi Mark,
    it was an inspiring talk you presented at nsl4…I was too self-conscious to ask about Project Gutenberg…obviously not seriously competitive with books.google.com
    Google inc’s vision statement is to “do no evil” but Brin & Page will one day retire if they haven’t already, can they/we entrust those that come after them…to “do no evil”? Librarians…have no hidden agenda (generally) and don’t rely on advertising…(corporate collaboration will probably be the future of public library economic survival), so hopefully there will be a future for the profession of librarians.
    thanks again
    Bron

  2. Gutenberg is a good point, and I do believe they have something like 100,000 texts available for download. Which is really most excellent, particularly when considering that these are serious, important texts. It’s not quite the same thing as seven million texts, but it’s still fantastic.

  3. Pingback: The Alexandrine Dilemma - Mark Pesce’s message for Librarians. « Lucacept - intercepting the Web

  4. Hi Mark. Nice one.

    At the moment, probably the Open Library is more interesting than Project Gutenberg *in theory* as a challenge to Google Books. The demand hasn’t matched the promise of its very interesting structure however.

    I think there were probably library scientists at Google who made exactly the right call when they decided not to add any kind of Call or Shelf number (or even set of subject headings) to Google Books.

    These numbers are historical artifacts. They were subjectively assigned in a time when there was only a single physical access point. You couldn’t assign more than one shelf number as it had to match a spot on the shelf. (My real-life example of the absurdity of this is that in the early 1990s, the State Library of Western Australia shelved “The Cook’s Garden 1” with the cookery books, and “The Cook’s Garden 2” with the gardening books).

    Assigning call numbers is not monkey work and takes a lot of human time to do well – cataloguers need to understand taxonomy in theory, know how to apply hierarchical terms/numbers accurately and intimately know the details of the classification system they are using.

    If items in a system are not classified to the same degree then some are more retrievable than others – based merely on an arbitrary reason like “Jane catalogues faster than Joe, so he has a huge backlog on his desk”. Without absolute consistency, then users will mistakenly believe that everything about subject XYZ can be found at call number ABC.

    A good classification schema made sense in a time of fewer items being created daily in far fewer formats with no other access points available. It was a powerful and exceedingly useful tool when the ratio of available librarian hours to knowledge needing classifying was much more even. Now it is an inadequate tool, a relic of a time when we believed that there was a viewpoint of an “ordinary, reasonable man” who would of course think exactly the same as i) the librarians who created the schema and ii) the librarian who *applied* the schema … an approach that produced convoluted Library of Congress subject headings like “Leather life style (Sexuality)” or “Lesbian vampires in motion pictures”. (Yeah, I always look for movies under “motion pictures” …)

    Google made a commercial decision about what it scanned. It would have cost much, much more to spend the person hours sorting out what was important to scan than it did to just take huge armloads of books and scan the lot indiscriminately. (By the way, Google will only be licensing full text access to Google Books to libraries in the United States. At this stage there is no plan to make this worldwide.)

    The same with subject / call number classification. The cost of employing people to work out 1) which Google Books are worth classifying or 2) spend people hours to try to consistently classify the works, does not make sense given the limited benefit for the cost. Importing existing LC numbers via a datamatch from another system is not a workable option – without all items having an LC number the system is weakened to the point of uselessness.

    I’d rather library scientists focus their skills on other areas. Things like:
    * fully understanding metadata standards and schemas;
    * how to open up datasets so they can be shared, re-mixed and re-combined;
    * working on creating interoperable systems for information created with taxpayer/ratepayer funds;
    * suggesting to our communities useful online tools to access and retrieve information;
    * nutting out a solution to standardising personal digital identity;
    * archiving for our specific communities their information objects that were born digital;
    * integrating print and digital works so that retrieval of the best information is format agnostic; and
    * understanding and interpreting to our communities issues like Open Source, Open Access and Creative Commons.

    I just hope we can change quickly enough to be able to meet these challenges – and find enough champions (like you?) so that organisations realise that we have changed and what we will be able to offer … Personally, I think my colleagues and I will be doing everything in my list of wishes above, but doubt that we’ll be still called “librarians”.

  5. Pingback: John Connell » Blog Archive » The Alexandrine Dream - and Dilemma

  6. Mark – Glad you’ve noted the role for librarians.

    “It’s up to you to throw the rest of us a life-preserver; once we’ve caught it, once we’ve caught on, your future will be assured.”

    A few months ago, I pitched a course on personal information management to the continuing ed division of an Australian Uni – and got nothing but blank looks back…

  7. Hi Mark,

    Great stuff. Vintage Pesce… and as delectable as ever. Thank you. :-)

    I’m not sure though that librarians will necessarily be best placed to throw out life-preserving advice and services to prevent people from slowly drowning in the(ir) own info-oceans. Librarians help people to find the info they are looking for. They can help speed up searches. They might grow to become counselors as to how to keep one’s own info-palace navigable.

    But there is more to the process of people going through slow info-drowning than meets the eye. And it seems to me that librarians are ill-equipped to help people with drowning-prevention for anything that goes beyond keeping personal info-palaces navigable.

    People do not just slowly drown because of their lack of skills in keeping on top of their mushrooming info. That’s a significant part of it, for sure. But what makes them drown even more, what paralyzes them even more than this are the mushrooming amount of “open loops” which they accumulate in their mental RAM.

    I refer explicitly to David Allen’s incisive thinking and speaking about these open loops as the linchpin of his impressive “Getting Things Done” methodology. (See http://en.wikipedia.org/wiki/Getting_Things_Done)

    My point is: people are more paralyzed and are slowly drowning more from not really knowing how to efficiently and effectively “Get Things Done”… than from lacking the info-management skills to keep their personal info-palaces navigable.

    Librarians may rise to help them with the latter… and that will be most useful and welcomed. But librarians are not up to offering people the self-management skills which people arguably need even more, especially in these times.

    Note that both problems compound each other… exacerbating and accelerating this slow drowning process which you so aptly coined in verbal imagery.

    Note also that David Allen for one is fully cognizant of the existence and exponentially worsening character of this personal info-management problem. Accordingly, he offers imnasho and in my experience most practical and usable advice and techniques for dealing with such “drinking from info-hoses” challenges.

    Allen’s techniques and rules of thumb are not perfect, but they do work… for now. We’ll see if they hold up when the infoglut problem fully kicks in.

    I somehow don’t see librarians taking on roles like the role David Allen carved out for himself… proffering tailored info-management services to individuals. That would seem to be a huge departure from their traditions and their professional comfort zones.

    I dearly hope I am wrong and that librarians somehow do metamorphose to such “info-doctor” and “self-management-counselor” roles… because such would be most welcome and much needed.

    Failing that, it would already be great if librarians could point a lot of seeking and questing people to Allen’s “Getting Things Done” book and recommend practice of its precepts. It will bring them critically needed very pragmatic skills in personal info-management too. I did, I am a considerably less stressed info-naut and life-naut for it.

    Thanks again.

    ~ Philippe Van Nedervelde ~

  8. Philippe – I think the role of “info doctor” or “self-management counselor” may be an important one. Librarians have traditionally been trained to deal with collective needs (the whole community for public librarians, universities for academic librarians and wherever they are for public librarians) rather than those of specific individuals. Whether the profession chooses to deal with this – or whether coaches themselves absorb parts of information management to deal with their clients’ needs, remains to be seen.

    Dave “How to save the world” Pollard has written a great deal on this and is also a fan of Allen’s work.

  9. Pingback: Terrell Russell: This Old Network : The Alexandrine Dilemma

  10. Pingback: Will OpenSource Concepts Define Education in 21st Century? — Open Education

  11. Pingback: The World A.T. Ways » In which best practices lead the way

  12. A couple of corrections while I process the full meaning behind your great essay. Wikipedia’s content license is the GNU Free Documentation License (the full explanation of the Wikipedia Copyrights is available). The GFDL is not the same as the Creative Commons licenses. The amended GFDL will allow Wikipedia to adopt the CC-BY-SA license (according to the GFDL FAQ). Wikimedia has a very complicated chart on what licenses can be used when reusing Wikimedia content.

    Google is not scanning the whole Harvard library; it is only scanning the public domain works at the library. After reviewing the terms of the settlement agreement with the book authors and book publishers, Harvard reaffirmed its decision to limit its participation to public domain books only. I think it also important to note that the presentation of in-copyright books as you described should be in the future tense; the settlement agreement has not been finalized and money from subscribers/purchases of books under the Google book reader interface is not yet being collected.

  13. Pingback: For the heart and soul of librarianship — human description versus fulltext analytics | Disruptive Library Technology Jester

  14. Pingback: Will OpenSource Concepts Define Education in 21st Century? « eLearning for India

  15. Wikipedia, with a 97% share of the online encyclopedia market, has forced Microsoft to shut down Encarta. How long will it be before Wikipedia claims the prize scalp of Encyclopaedia Britannica?

    Encyclopaedia Britannica did not think that an open source product like Wikipedia would significantly challenge the credibility of its brand. They were dead wrong and Encyclopaedia Britannica’s staff seriously misread the global market. They are now very concerned about the widespread use of a free Wikipedia vs their paid subscription model. From a corporate and financial perspective, Encyclopaedia Britannica is in significant trouble.

    It will be interesting to see if Encyclopaedia Britannica survives, but recent indications do not look good. It is the combination of a) the success of Wikipedia and b) improved search engines that has put financial pressure on Encyclopedia Britannica over recent years. Many libraries, schools & individuals are questioning the need to pay for sets of expensive books, or to subscribe to Encyclopaedia Britannica Online, when the content is free on the internet, and much more comprehensive.

  16. Hi Mark,

    Just spotted this on MetaFilter and it brought to mind your presentation, and the point you made re: Google missing the point of how useful subject headings and other human-assigned metadata can be in the information retrieval process:

    http://www.metafilter.com/84673/Do-I-contradict-myself-Very-well-then-I-contradict-myself-I-am-large-I-contain-multitudes

    Thanks again for making me feel like I haven’t joined an obsolescent profession!

  17. Pingback: Google Books: “…a mishmash wrapped in a muddle wrapped in a mess…” : John Connell: The Blog

  18. Pingback: Cataloging Futures

  19. Pingback: Necessary Reading - academhack - Thoughts on Technology and Higher Education

  20. Pingback: Another Blog Title
