Degrees of Separation between Marvel Superheroes with Breadth-first Search in PySpark RDD

This project uses PySpark RDDs and the Breadth-first Search (BFS) algorithm to find the shortest path, and hence the degrees of separation, between two given Marvel superheroes. Two superheroes are treated as connected when they appear together in the same comic book. The project lets users explore these connections and discover how closely their favourite superheroes in the Marvel universe are linked.
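
As a rough illustration of the idea, the sketch below runs BFS over a co-appearance graph in plain Python rather than PySpark, so it is not the repository's driver code. The function names (build_graph, shortest_path) and the sample comics are hypothetical and chosen only for this example.

```python
from collections import deque


def build_graph(comics):
    """Link every pair of heroes that share at least one comic."""
    graph = {}
    for heroes in comics:
        for hero in heroes:
            # A hero who appears alone still gets an (empty) entry.
            graph.setdefault(hero, set()).update(h for h in heroes if h != hero)
    return graph


def shortest_path(graph, start, target):
    """Return the shortest list of heroes from start to target, or None."""
    if start not in graph or target not in graph:
        return None
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        hero = queue.popleft()
        if hero == target:
            # Walk the parent links back to the start.
            path = []
            while hero is not None:
                path.append(hero)
                hero = parents[hero]
            return path[::-1]
        for neighbour in graph[hero]:
            if neighbour not in parents:
                parents[neighbour] = hero
                queue.append(neighbour)
    return None  # the two heroes are not connected


# Hypothetical sample data: each inner list is the cast of one comic.
comics = [
    ["SPIDER-MAN", "IRON MAN"],
    ["IRON MAN", "THOR"],
    ["THOR", "LOKI"],
    ["HOWARD THE DUCK"],
]
graph = build_graph(comics)
path = shortest_path(graph, "SPIDER-MAN", "LOKI")
degrees = len(path) - 1  # edges on the path = degrees of separation
print(path, degrees)
```

Because BFS visits heroes in order of increasing distance from the start, the first time the target is dequeued the recorded path is a shortest one. A distributed version would presumably expand one frontier level per pass over the data instead of using an in-memory queue.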

Item-based Collaborative Filtering for Movie Recommendations in PySpark DataFrames and RDD

This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. Given a target movie, the system recommends 10 similar movies, chosen either by cosine similarity score or by the highest number of shared viewers. The repository contains two independent Spark driver scripts, catering to datasets of different sizes. The result is a scalable movie recommendation system that helps users discover movies aligned with their preferences.
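
To make the scoring concrete, here is a minimal plain-Python sketch of item-based similarity, assuming ratings arrive as (user, movie, rating) tuples. It is not the repository's Spark code; the names movie_similarities and recommend, the min_shared parameter, and the sample ratings are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def movie_similarities(ratings):
    """Map each movie pair (a, b), a < b, to (cosine score, shared viewers)."""
    by_user = defaultdict(dict)
    for user, movie, rating in ratings:
        by_user[user][movie] = rating

    # Per pair: sum of x*x, sum of y*y, sum of x*y, number of shared viewers.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for user_ratings in by_user.values():
        for a, b in combinations(sorted(user_ratings), 2):
            x, y = user_ratings[a], user_ratings[b]
            acc = sums[(a, b)]
            acc[0] += x * x
            acc[1] += y * y
            acc[2] += x * y
            acc[3] += 1

    result = {}
    for pair, (xx, yy, xy, shared) in sums.items():
        denominator = sqrt(xx) * sqrt(yy)
        score = xy / denominator if denominator else 0.0
        result[pair] = (score, shared)
    return result


def recommend(similarities, target, top_n=10, min_shared=1):
    """Return up to top_n (movie, score, shared viewers) tuples for target."""
    candidates = []
    for (a, b), (score, shared) in similarities.items():
        if target in (a, b) and shared >= min_shared:
            other = b if a == target else a
            candidates.append((other, score, shared))
    # Highest score first; shared-viewer count breaks ties.
    candidates.sort(key=lambda c: (c[1], c[2]), reverse=True)
    return candidates[:top_n]


# Hypothetical sample ratings: (user, movie, rating).
ratings = [
    ("u1", "Movie A", 5), ("u1", "Movie B", 5), ("u1", "Movie C", 1),
    ("u2", "Movie A", 4), ("u2", "Movie B", 4), ("u2", "Movie C", 2),
    ("u3", "Movie A", 1), ("u3", "Movie C", 5),
]
recs = recommend(movie_similarities(ratings), "Movie A")
for movie, score, shared in recs:
    print(movie, round(score, 3), shared)
```

The cosine score is computed only over users who rated both movies, so a pair with very few shared viewers can score deceptively high; raising min_shared is one way to filter such pairs out. In Spark, the per-user pairing step would typically be expressed as a self-join of the ratings on the user ID.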