How the Library of Congress is Trying to Archive Twitter

June 7, 2017 | June 9, 2017 | Kristen Twardowski

Main Reading Room of the Library of Congress. Library of Congress, Prints and Photographs Division, 2009, via Wikimedia.

Losing the remnants of the past is one of the nightmares of historians. There are so many fragments that simply don’t survive the march of time, and as they disintegrate so too does our ability to understand the past.

The modern era and its technologies raise new questions about how archivists, historians, and other interested parties can preserve historical objects that are created every day. How can archivists archive the internet?

Let me take a brief pause here to talk about historical objects. Keep in mind that this is a gross simplification of the term. In the past, historians focused on the writings of great men, military tactics, and in the case of ancient civilizations, material objects. Over the past 100 years or so, historians have expanded the types of questions they ask of the past and what objects they use to explore it. This means that letters from ordinary people, advertisements, music, film, and clothing have all gained a new importance in the field.

That also means that when people study the 2010s, they will probably want to study what is on the internet, including trends in memes, hashtags, blogs, and, yes, tweets. Finding those pieces of the digital past, however, could be difficult.

Traditional library archives are filled with books, papers, and sometimes a few physical objects. Staff often digitize these items, but if a digital file is corrupted, there is a tangible item for scholars to turn to. The internet lacks that physicality, and as a result, librarians are working to find a way to store the information that exists on it, especially in public forums like Twitter.

Back in April of 2010, the Library of Congress announced that it would work with Twitter to archive public tweets from 2006 onward. The Library declared this initial stage a success in 2013 when it had archived around 170 billion tweets. The Library then turned its attention to figuring out how it could make that archive accessible to the people who wanted it.

And then the project stalled.

The Library of Congress still collects tweets and stores them on a server, but those tweets aren’t yet accessible to or searchable by anyone. As The Atlantic reports, no tech engineers are currently assigned to the project, and development is slow. The Library is understaffed, and Twitter has no incentive to help streamline the data transfer. In fact, Twitter frequently sells public tweet data to marketing and research firms, and the company is making a tidy sum off of those sales. Making that information freely available would cut into Twitter’s profit.

Still, the historian in me holds out hope that this project moves forward. Having a public database of tweets will make the lives of future scholars so much easier. Having worked on digital archives myself, however, I know that creating a database like that isn’t so simple. Librarians and staff have to decide how to tag tweets, how to group them, and how to store them, and they have to create a usable interface.

But hopefully they’ll manage to make the Twitter archive a reality someday. We would all benefit from it.

20 thoughts on “How the Library of Congress is Trying to Archive Twitter”

grace kim says:

June 7, 2017 at 10:50 am

Interesting read, especially as we approach this era of technology. Also serves as a reminder to beware of what you post!

peachesofmyheart.wordpress.com

1. Kristen Twardowski says:
  
  June 11, 2017 at 2:59 pm
  
  They (whoever they are) do say that the internet is forever, and to some extent that may be true!
  
FictionFan says:

June 7, 2017 at 11:24 am

The idea of Donnie’s tweets and all the hideous, vile replies from both sides being saved for posterity brings me out in a rash. Whatever will the future think of us? Thank goodness I’ll be dead… 😉

1. Kristen Twardowski says:
  
  June 11, 2017 at 2:58 pm
  
  I can’t decide what would be worse: looking at the tweets in some distant future or knowing that they were lost to time and that future people assessed our past without them. A conundrum.
  
Melanie Noell Bernard says:

June 7, 2017 at 12:44 pm

What would be the use of this in the future? I mean… most tweets are frivolous at best and narcissistic and hateful at worst. Is there some particular reason/material that we would want to preserve from Twitter??

Annika Perry says:

June 7, 2017 at 3:46 pm

My first thought on seeing the title was: Good luck to them! It seems an impossible task and not really one I would imagine needs archiving…there are surely a lot more worthwhile online resources that can and are already being archived. An interesting post, Kristen!

Ree Kimberley says:

June 7, 2017 at 4:06 pm

My head spins just thinking about the magnitude of such a job.

1. Kristen Twardowski says:
  
  June 11, 2017 at 2:57 pm
  
  Mine too. It does sound like Twitter itself has managed to add some type of metadata to the information, though, if they are selling it to others. It’s too bad that they aren’t more willing to share how they structured that information.
  
Jessica Bakkers says:

June 7, 2017 at 6:21 pm

Fascinating! They had 170 billion from 06 to 13… given the increase in popularity of twitter I’d say they’d have to archive 500 billion from 14 to 17! Good luck!

1. Kristen Twardowski says:
  
  June 11, 2017 at 2:56 pm
  
  The sheer number of tweets is mind-boggling. We’ll see how it works out!
  
Lelia T says:

June 8, 2017 at 4:16 am

I can’t imagine how much time and money has been spent on this. The inanities that are now residing on those servers somewhere boggle the mind. Surely a year’s worth would have been enough or maybe two at the most but seven?!

sjhigbee says:

June 8, 2017 at 5:23 am

A bit like ferreting through the old rubbish heaps that archeologists are so fond of – but this is our mental equivalent… I’d love to be a fly on the wall when they start looking at all the kitten and cake pics.

1. Kristen Twardowski says:
  
  June 9, 2017 at 7:36 am
  
  “These people in the year 2017, they must have worshiped cats. And it appears 90% of their diet was composed of fancy pastries. Probably because they were saving all of the nutritious food for the kittens. Such strange creatures.”
  
  1. sjhigbee says:
    
    June 9, 2017 at 1:39 pm
    
    lol… oh exactly!
    
B.L. Daniels says:

June 9, 2017 at 9:22 pm

I’m all for preserving history, but the amount of equipment and effort needed to retain such a vast amount of mostly useless information makes my brain hurt. They should at least curate it. That would probably reduce the amount of storage space needed by 97%.

Pingback: Sunday Post – 11th June 2017 | Brainfluff
Pingback: Have you met Kristen Twardowski? ~ Jemima Pett
Pingback: How the Library of Congress is Trying to Archive Twitter | Campbells World
Patty says:

June 17, 2017 at 10:16 am

Fascinating! Reblogged at campbellsworld.wordpress.com

1. Kristen Twardowski says:
  
  June 17, 2017 at 12:23 pm
  
  Thanks, Patty!
  