This is a university project made with other colleagues. The goal of this work is to visualize YouTube trending video data before and during the first period of the Covid-19 pandemic, in particular from December 2019 to May 2020.
Project Details
The project is divided into two main parts.
The first part is the download of the data, done through a pipeline set up for this purpose. The data were taken from the YouTube API (https://developers.google.com/youtube/v3) 4 times a day, every 6 hours, for approximately 5 months. The data were then cleaned and stored in a MongoDB database. We chose this database model because the collected records had many missing fields, so a schema-less model was the most suitable choice. The data also come from 10 different countries, so we applied a sharding structure for better scalability.
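As an illustration of why a schema-less store suits records with missing fields, here is a minimal sketch of the cleaning step. It is not the project's actual pipeline: the function names, the selection of fields, and the database and collection names ("youtube", "trending") are assumptions made for the example. The input is assumed to be one item of a YouTube Data API v3 videos.list response, with its "snippet" and "statistics" parts.

```python
from datetime import datetime, timezone


def to_document(item, country, fetched_at):
    """Build a MongoDB-ready document from one API item (hypothetical helper).

    Fields that are missing in the API response are simply left out,
    which a schema-less collection accepts without any placeholder value.
    """
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})

    def as_int(value):
        # The API returns counters as strings; convert them when present.
        return int(value) if value is not None else None

    doc = {
        "video_id": item.get("id"),
        "country": country,
        "fetched_at": fetched_at,
        "title": snippet.get("title"),
        "tags": snippet.get("tags"),
        "category_id": snippet.get("categoryId"),
        "view_count": as_int(stats.get("viewCount")),
        "like_count": as_int(stats.get("likeCount")),
    }
    # Drop every field the API did not provide.
    return {key: value for key, value in doc.items() if value is not None}


def store(documents, uri="mongodb://localhost:27017"):
    """Insert the cleaned documents (assumed database and collection names)."""
    from pymongo import MongoClient  # imported here so cleaning works without it

    if documents:  # insert_many rejects an empty list
        MongoClient(uri)["youtube"]["trending"].insert_many(documents)


if __name__ == "__main__":
    sample = {"id": "abc", "snippet": {"title": "Example"}, "statistics": {"viewCount": "10"}}
    print(to_document(sample, "IT", datetime.now(timezone.utc)))
```

Two documents in the same collection can therefore carry different sets of keys, for example one with "tags" and "like_count" and one without.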
The second part involves a complex regular expression to identify whether a trending video is about Covid-19 or not. The expression has been applied to the titles and the tags of the videos in all the languages we collected (for details see the GitHub page). The data have then been modelled to be visualized in Tableau. The visualizations are linked below.
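The classification step described above can be sketched as follows. The pattern here is only a short illustrative stand-in with a few English, Italian and Spanish keywords; it is not the expression used in the project, which is longer and covers more languages (see the GitHub page). The names COVID_PATTERN and is_covid_related are hypothetical.

```python
import re

# Illustrative pattern only (an assumption, not the project's real expression).
COVID_PATTERN = re.compile(
    r"\b(?:covid(?:[\s-]?19)?|corona\s?virus|sars[\s-]?cov[\s-]?2"
    r"|quarantine|quarantena|cuarentena|lockdown|pandemi[ac])\b",
    re.IGNORECASE,
)


def is_covid_related(title, tags=None):
    """Return True if the title or any tag matches the Covid-19 pattern."""
    texts = [title or ""] + list(tags or [])  # tags may be missing
    return any(COVID_PATTERN.search(text) for text in texts)


if __name__ == "__main__":
    print(is_covid_related("Coronavirus: latest update"))
    print(is_covid_related("Cooking pasta at home", ["quarantena", "ricette"]))
    print(is_covid_related("Funny cats compilation", ["pets"]))
```

Checking the title and the tags together means a video is flagged even when only one of the two mentions the topic, and treating a missing tag list as empty keeps the check usable on incomplete records.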
Visualizations
How do the categories of the trending videos change from before to during the Covid-19 quarantine?
Is it true that the trending videos on YouTube relating to Covid-19 follow the trend of the epidemic data?
https://public.tableau.com/app/profile/federico6485/viz/YoutubealtempodelCovid-19/Storia