How the Library of Congress is Trying to Archive Twitter

Main Reading Room of the Library of Congress, Library of Congress, Prints and Photographs Division, 2009, via Wikimedia.

Losing the remnants of the past is one of historians' worst nightmares. So many fragments simply don’t survive the march of time, and as they disintegrate, so does our ability to understand the past.

The modern era and its technologies raise new questions about how archivists, historians, and other interested parties can preserve historical objects that are created every day. How can archivists archive the internet?

Let me pause briefly to talk about historical objects. Keep in mind that what follows is a gross simplification. In the past, historians focused on the writings of great men, military tactics, and, in the case of ancient civilizations, material objects. Over the past 100 years or so, historians have expanded both the questions they ask of the past and the objects they use to explore it. As a result, letters from ordinary people, advertisements, music, film, and clothing have all gained new importance in the field.

That also means that when people study the 2010s, they will probably want to study what was on the internet, including trends in memes, hashtags, blogs, and, yes, tweets. Finding those pieces of the digital past, however, could be difficult.

Traditional library archives are filled with books, papers, and sometimes a few physical objects. Staff often digitize these items, but if a digital file is corrupted, there is still a tangible original for scholars to turn to. The internet lacks that physicality, and as a result, librarians are working to find ways to store the information on it, especially in public forums like Twitter.

Back in April 2010, the Library of Congress announced that it would work with Twitter to archive public tweets from 2006 onward. The Library declared this initial stage a success in 2013, when it had archived around 170 billion tweets. It then turned its attention to figuring out how to make that archive accessible to the people who wanted it.

And then the project stalled.

The Library of Congress still collects tweets and stores them on a server, but those tweets aren’t yet accessible or searchable to anyone. As The Atlantic reports, no engineers are currently assigned to the project, and progress is slow. The Library is understaffed, and Twitter has no incentive to help streamline the data transfer. In fact, Twitter frequently sells public tweet data to marketing and research firms, and the company makes a tidy sum from those sales. Making that information freely available would cut into Twitter’s profit.

Still, the historian in me hopes this project moves forward. A public database of tweets would make the lives of future scholars much easier. Having worked on digital archives myself, however, I know that building such a database isn’t simple. Librarians and staff have to decide how to tag tweets, how to group them, and how to store them, and they have to create a usable interface.

But hopefully they’ll manage to make the Twitter archive a reality someday. We would all benefit from it.

20 thoughts on “How the Library of Congress is Trying to Archive Twitter”

  1. The idea of Donnie’s tweets and all the hideous, vile replies from both sides being saved for posterity brings me out in a rash. Whatever will the future think of us? Thank goodness I’ll be dead… 😉

  2. What would be the use of this in the future? I mean… most tweets are frivolous at best and narcissistic and hateful at worst. Is there some particular reason/material that we would want to preserve from Twitter??

  3. My first thought on seeing the title was: Good luck to them! It seems an impossible task, and not one I would imagine needs archiving… there are surely a lot more worthwhile online resources that can be, and already are being, archived. An interesting post, Kristen!

    1. Mine too. It does sound like Twitter itself has managed to add some type of metadata to the information, though, if they are selling it to others. It’s too bad that they aren’t more willing to share how they structured that information.

  4. I can’t imagine how much time and money has been spent on this. The inanities now residing on those servers somewhere boggle the mind. Surely a year’s worth would have been enough, or maybe two at the most, but seven?!

  5. A bit like ferreting through the old rubbish heaps that archeologists are so fond of – but this is our mental equivalent… I’d love to be a fly on the wall when they start looking at all the kitten and cake pics.

    1. “These people in the year 2017, they must have worshiped cats. And it appears 90% of their diet was composed of fancy pastries. Probably because they were saving all of the nutritious food for the kittens. Such strange creatures.”

  6. I’m all for preserving history, but the amount of equipment and effort needed to retain such a vast amount of mostly useless information makes my brain hurt. They should at least curate it. That would probably reduce the amount of storage space needed by 97%.