CODEX

YouTube version of Spotify Year in Review

Published in CodeX · 4 min read · Mar 7, 2021

I mainly use YouTube to listen to music while working and to watch random videos before sleeping 🤔. Comparing Spotify with YouTube is not the goal of this post, but one thing I really like about Spotify is that at the end of each year it shows you which artists and songs you listened to the most. I searched the web for a YouTube equivalent but couldn’t find one so, as a software engineer, I decided to code it.

It’s fairly easy! All you have to do is download your watch history as a JSON file and then work with that data to get what you want. That is basically what we will be doing throughout this post.

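To accomplish this, we need three steps:

1. Download the YouTube historical data
2. Convert the file from JSON to CSV and create other columns
3. Create multiple XLSX files and plots

Instead of going through these steps, you can download the project and run it directly on your machine. For more information, see the end of the article, where the project’s commands are listed.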

Step 1 — Download Youtube historical data

To download the data you can scrape your history page at https://www.youtube.com/feed/history, or you can simply go to http://google.com/takeout and download the files directly.

I prefer the second option because it’s much easier. With it, make sure to select only “YouTube and YouTube Music” and to choose JSON as the export format for the history.

Step 2 — Convert file from JSON to CSV and create other columns

The data comes in JSON format and has information about each video that you have watched.
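To give an idea of the shape, here is a made-up entry and a small sketch for reading it. The field names (title, titleUrl, subtitles, time) and the file name watch-history.json are assumptions about what Takeout exports, so check them against your own file. The helper names load_history and channel_name are just for illustration.

```python
import json

# Made-up sample entry. The field names are an assumption about the Takeout
# export format, so compare them with your own file.
SAMPLE = """
[
  {
    "header": "YouTube",
    "title": "Watched Example Song (Official Video)",
    "titleUrl": "https://www.youtube.com/watch?v=EXAMPLE_ID",
    "subtitles": [
      {"name": "Example Channel", "url": "https://www.youtube.com/channel/EXAMPLE"}
    ],
    "time": "2021-01-15T22:10:03.123Z",
    "products": ["YouTube"]
  }
]
"""


def load_history(path):
    """Read the exported history file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def channel_name(record):
    """Return the channel name, or None when the entry has no channel info."""
    subtitles = record.get("subtitles") or [{}]
    return subtitles[0].get("name")


# With a real export you would call load_history("watch-history.json") instead.
records = json.loads(SAMPLE)
first = records[0]
print(first["title"], "|", channel_name(first), "|", first["time"])
```

Some entries may come without channel information, which is why channel_name falls back to None instead of raising an error.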

I prefer working with dataframes, so in this step we will convert the JSON data to CSV format. The conversion method also calls another method, create_duration_col(df), that is not displayed here. That method creates two columns with information about the duration of the videos, which are used in some of the plots. If you don’t want to use it, you can simply comment out that line. Alternatively, you can get create_duration_col(df) from the GitHub repo (https://github.com/FilipeGood/Youtube-Most-Viewed/blob/main/main.py).
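A minimal sketch of the conversion with pandas could look like this. It is not the code from the repo: the function name history_to_dataframe and the column names (title, channel, url, time, date, day_of_week) are my own choices for illustration, and stripping the "Watched " prefix assumes an English-language export. The duration columns from create_duration_col(df) are left out here.

```python
import pandas as pd


def history_to_dataframe(records):
    """Flatten watch-history entries into one row per view."""
    rows = []
    for rec in records:
        subtitles = rec.get("subtitles") or [{}]
        title = rec.get("title", "")
        # Assumption: titles start with "Watched "; strip it so titles group cleanly.
        if title.startswith("Watched "):
            title = title[len("Watched "):]
        rows.append({
            "title": title,
            "channel": subtitles[0].get("name"),
            "url": rec.get("titleUrl"),
            "time": rec["time"],
        })

    df = pd.DataFrame(rows, columns=["title", "channel", "url", "time"])
    # Parse each timestamp on its own, in case some have fractional seconds
    # and others do not.
    df["time"] = pd.to_datetime([pd.Timestamp(t) for t in df["time"]], utc=True)
    df["date"] = df["time"].dt.date
    df["day_of_week"] = df["time"].dt.day_name()
    return df
```

Note that the timestamps are kept in UTC, so views late at night may land on a different day than in your local time zone.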

Next, just save the dataframe.
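Saving is a one-liner with pandas. In this sketch the file name history.csv and the helper names are placeholders, and the loader assumes the dataframe has a time column.

```python
import pandas as pd


def save_history(df, path="history.csv"):
    # index=False keeps the dataframe's row numbers out of the file
    df.to_csv(path, index=False)


def load_history_csv(path="history.csv"):
    # parse_dates turns the "time" column back into datetimes when reading
    return pd.read_csv(path, parse_dates=["time"])
```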

Step 3 — Create multiple XLSX files and plots

Now that we have the data in our preferred format we can play with it.

First, we will create some Excel files by aggregating and grouping our data by different columns. The function for this step creates Excel files with views grouped by title and channel, views by channel, and views by day. With these, you can get a lot of insight into what and how you consume on YouTube.

It’s always easier to analyze data when it is displayed in plots. You can do all kinds of plots with this data; the limit is your imagination! In this article, I’m only going to show 3 plots, but I have other implementations on GitHub.
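The three groupings can be sketched roughly like this. The column names (title, channel, date) and the output file names are assumptions for illustration and may differ from what the repo uses. Writing .xlsx files with pandas also needs an Excel engine such as openpyxl to be installed.

```python
from pathlib import Path

import pandas as pd


def build_summaries(df):
    """Return one aggregated dataframe per report, keyed by a file-name stem."""
    by_video = (
        df.groupby(["title", "channel"]).size()
        .reset_index(name="views")
        .sort_values("views", ascending=False)
    )
    by_channel = (
        df.groupby("channel").size()
        .reset_index(name="views")
        .sort_values("views", ascending=False)
    )
    # groupby sorts by key, so the days come out in chronological order
    by_day = df.groupby("date").size().reset_index(name="views")
    return {
        "views_by_video": by_video,
        "views_by_channel": by_channel,
        "views_by_day": by_day,
    }


def write_summaries(summaries, out_dir="."):
    """Write each summary to its own .xlsx file."""
    for name, table in summaries.items():
        table.to_excel(Path(out_dir) / f"{name}.xlsx", index=False)
```

Keeping the aggregation separate from the file writing makes it easy to look at the tables in a notebook before exporting anything.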

Top 10 most viewed videos

The following code will count all records with the same title (same videos) and will plot the Top 10.
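As a rough sketch (not necessarily the repo's exact code), counting and plotting can be done like this, assuming the dataframe has a title column. The function names are mine.

```python
import pandas as pd


def top_videos(df, n=10):
    """Count rows per title and keep the n most frequent."""
    return df["title"].value_counts().head(n)


def plot_top_videos(df, n=10):
    # Imported here so the counting above also works without matplotlib installed.
    import matplotlib.pyplot as plt

    # Sort ascending so the most viewed video is drawn at the top of the chart.
    counts = top_videos(df, n).sort_values()
    ax = counts.plot(kind="barh", figsize=(10, 6))
    ax.set_xlabel("Views")
    ax.set_title(f"Top {n} most viewed videos")
    plt.tight_layout()
    plt.show()
```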

Yeeepppp I really like Mac Miller’s last album.

Views by day of the week

To create this plot, we will use the column that we created in the last step and group the data by the day of the week.
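Here is a small sketch of that grouping. It assumes the column is called day_of_week and holds English day names, which is my naming and not necessarily the repo's.

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def views_by_weekday(df):
    """Count views per day of the week, in calendar order."""
    counts = df["day_of_week"].value_counts()
    # reindex puts the days in calendar order and fills missing days with 0
    return counts.reindex(DAYS, fill_value=0)


def plot_views_by_weekday(df):
    # Imported here so the counting above also works without matplotlib installed.
    import matplotlib.pyplot as plt

    ax = views_by_weekday(df).plot(kind="bar", figsize=(8, 5))
    ax.set_ylabel("Views")
    ax.set_title("Views by day of the week")
    plt.tight_layout()
    plt.show()
```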

Top 20 videos with most minutes spent

For this plot, we will use the columns that were created by the function create_duration_col(df). If you don’t have these columns, skip this section. This plot shows the top 20 videos that consumed most of our time.
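A hedged sketch of the aggregation: I am assuming a column named duration_min that holds each video's length in minutes, which is a hypothetical name and may not match what create_duration_col(df) produces. Summing it per title also assumes every view was watched to the end, so the totals are an upper bound.

```python
import pandas as pd


def top_by_minutes(df, n=20, duration_col="duration_min"):
    """Sum the video length over all views of each title and keep the top n."""
    minutes = df.groupby("title")[duration_col].sum()
    return minutes.sort_values(ascending=False).head(n)
```

The result is a Series indexed by title, so calling .plot(kind="barh") on it gives a chart similar to the one for the most viewed videos.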

The winning video happens to be a video of binaural beats. When I was writing my thesis I enjoyed listening to binaural beats :) Of course, Mac Miller’s performance on Tiny Desk is on the podium.

You can create a lot of different plots yourself, or you can use the code in the GitHub repo to create the plots for you.

If you only want to run the project you can download it from here: https://github.com/FilipeGood/Youtube-Most-Viewed/

After downloading the project, you have 3 main commands:

python3 main.py -f convert -d <file_name.json> - converts from JSON to CSV (Step 2)

python3 main.py -f create - creates the Excel files and the plots

python3 main.py -f join -d <new_file_name.csv> - joins previous historical data with new data. If you download a new history file later, you can merge the old one with the new one
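If you are curious what joining two history files could look like, here is a minimal sketch with pandas. It is my own illustration and not necessarily what the join command does in the repo. It assumes both dataframes have title and time columns and treats two rows as the same view when both values match.

```python
import pandas as pd


def join_histories(old, new):
    """Merge two history dataframes and drop views that appear in both."""
    combined = pd.concat([old, new], ignore_index=True)
    # Assumption: same title and same timestamp means the same view.
    combined = combined.drop_duplicates(subset=["title", "time"])
    return combined.sort_values("time").reset_index(drop=True)
```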

Thanks for reading :)

Written by Filipe Good