An Interactive Visual Analytics Platform for High Dimensional Film Data Exploration
Osnat (Ossi) Mokryn (omokryn@fas.harvard.edu)

This capstone project aims to develop an interactive visual analytics platform for exploring large high-dimensional data. The project will utilize a dataset collected from a film review platform in 2020, comprising metadata and reviews of over 17,000 movies, and augmented with an emotional signature for each film. The emotional signature, represented as a vector of eight values, including Joy, Fear, Anger, Trust, Surprise, Disgust, Sadness, and Anticipation, serves as a means of understanding the emotional response a film evokes in its viewers.

The goal of this project is to create a visual interface that enables the intelligent exploration of multifaceted data through visual analytics. The project will have two main pillars of exploration: emotions and text analytics.
The first pillar, emotions, will involve creating a hierarchical, dimension-reduction representation of films according to their emotional signatures, allowing for in-depth exploration within the hierarchy. This aspect of the project will involve unsupervised machine learning, dimension reduction, and visualization algorithms. The second pillar, text analytics, will involve exploring the similarity between films based on their reviews, using methods from information theory and natural language processing.
An interesting view would be to create a similarity network where films are connected by an edge according to a similarity metric that combines aspects from the above two pillars and the film metadata. The network can then be analyzed using network science algorithms. Human-computer interaction (HCI) will also be a key focus of the project, with the main design considerations being ease of use, responsiveness, and a clean design pattern. The system will allow visual interactive exploration of the data using additional metadata fields, such as understanding the variety in emotional signatures of films by specific directors or actors.
Finally, the project will also include considerations for software engineering, including implementing a client-server model with a slim and reactive client and preprocessing and smart storage on the server side to enable fast responses.