Everyone knows that our digital lives are tracked. When we click ‘I agree’ for a service like Facebook, we expect it to store some of our data. Even so, the amount of data in your Facebook archive might still shock you.
I do not know why Facebook needs a complete history of my pokes, over 20 MB of my messages, every single event I have ever attended, and a year-long log of my Facebook logins (including timestamps, IP addresses, and even the browser used). However, you can do some neat things with this much data, so I’ll put my tinfoil hat away for now.
I wrote some code to analyze the largest of my message threads: a year-old group chat with 20 of my high school friends and over 50,000 messages. After counting word frequencies and removing very common words (e.g., a, and, the), I was able to make a giant word cloud, in which the size of each word is proportional to how often it was used.
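For anyone who wants to try this on their own archive, here is a minimal sketch of the counting step. It is not the code I actually ran: it assumes the thread has already been loaded as a list of dicts with a `content` key (newer JSON exports use roughly this shape, but treat the field name as an assumption), and the tiny `STOP_WORDS` set is a hypothetical stand-in for a real stop-word list. The `font_sizes` helper, also hypothetical, shows one simple way to make each word's size proportional to its count.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "and", "the", "i", "to", "is", "it", "of", "you"}


def word_frequencies(messages, stop_words=STOP_WORDS):
    """Count lowercase words across all message texts, skipping stop words."""
    counts = Counter()
    for msg in messages:
        # Messages without text (e.g. photos) are assumed to lack "content".
        text = msg.get("content", "")
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


def font_sizes(counts, max_size=100):
    """Scale each word's font size linearly with its frequency."""
    if not counts:
        return {}
    top = max(counts.values())
    return {word: max_size * n / top for word, n in counts.items()}


# Made-up sample messages; the "content" field name is an assumption.
sample = [
    {"sender_name": "Alice", "content": "Is the prom on Friday?"},
    {"sender_name": "Bob", "content": "Prom is Friday and the party is Saturday"},
]
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
print(freqs.most_common(3))
```

Linear scaling is the simplest reading of “proportional”; word cloud libraries often use their own scaling and layout rules, so the sizes they draw will not match this exactly.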
Counting messages per sender was just as simple. Here is a leaderboard of who sends the most messages.
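A leaderboard like this only needs one counter keyed by sender. The sketch below assumes each message dict carries a `sender_name` field (again modeled on the JSON export, but an assumption), and `message_leaderboard` is a hypothetical helper name.

```python
from collections import Counter


def message_leaderboard(messages):
    """Return (sender, message_count) pairs, most active sender first."""
    counts = Counter(msg["sender_name"] for msg in messages)
    return counts.most_common()


# Made-up sample; "sender_name" is an assumed field name.
sample = [
    {"sender_name": "Alice", "content": "hi"},
    {"sender_name": "Bob", "content": "hey"},
    {"sender_name": "Alice", "content": "what's up"},
]
board = message_leaderboard(sample)
for rank, (name, count) in enumerate(board, start=1):
    print(f"{rank}. {name}: {count}")
```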
There is much more analysis that can be done. I didn’t even touch the message timestamps, which could help visualize sleep schedules and answer random questions (e.g., how much did message traffic slow down when we all went off to college? How much more frequent was the word ‘prom’ in the months leading up to it?). However, it’s winter break and I am refusing to code any longer. That will be a project for another day.
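If someone else wants to pick this up first, a starting point might look like the sketch below. It assumes each message has a `timestamp_ms` field holding milliseconds since the Unix epoch (the field name and unit are assumptions), and both helpers are hypothetical. Bucketing by hour of day hints at sleep schedules, and bucketing a single word by month gets at the ‘prom’ question. It defaults to UTC so the output is reproducible; pass your own `tz` to see local hours.

```python
import re
from collections import Counter
from datetime import datetime, timezone


def messages_per_hour(messages, tz=timezone.utc):
    """Return a 24-element list: number of messages sent in each hour of the day."""
    counts = Counter()
    for msg in messages:
        dt = datetime.fromtimestamp(msg["timestamp_ms"] / 1000, tz=tz)
        counts[dt.hour] += 1
    return [counts[h] for h in range(24)]


def monthly_word_count(messages, word, tz=timezone.utc):
    """Count occurrences of `word` per (year, month), in chronological order."""
    counts = Counter()
    target = word.lower()
    for msg in messages:
        tokens = re.findall(r"[a-z']+", msg.get("content", "").lower())
        hits = tokens.count(target)
        if hits:
            dt = datetime.fromtimestamp(msg["timestamp_ms"] / 1000, tz=tz)
            counts[(dt.year, dt.month)] += hits
    return dict(sorted(counts.items()))


# Made-up sample; field names and millisecond timestamps are assumptions.
sample = [
    {"sender_name": "Alice", "content": "prom tickets?", "timestamp_ms": 1_500_000_000_000},
    {"sender_name": "Bob", "content": "PROM!!!", "timestamp_ms": 1_500_003_600_000},
    {"sender_name": "Alice", "content": "see you at prom", "timestamp_ms": 1_504_000_000_000},
]
per_hour = messages_per_hour(sample)
prom_by_month = monthly_word_count(sample, "prom")
print(per_hour)
print(prom_by_month)
```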