Recommender system for the Yelp dataset.

pauldoan/yelp-recommendation

Final Project - Personalization Theory

Authors: Bertrand Thia-Thiong-Fat (bt2513), Jeremy Yao (jy3015), Paul Doan (pqd2001)

Directories and files

  1. Data Preprocessing
  2. Baseline Model
  3. Content-based Model
  4. Deep Learning Model
  5. Conclusion

Finally, the datasets used to test our models during this study are also provided.

The Objectives

Context

We place ourselves in the position of Senior Data Scientists at a company that recommends local businesses. We focus on one business objective: accurately predicting the latest rating of every active user of the Yelp website. Predicting a user's latest rating accurately gives us a better understanding of their current preferences. As a result, we can recommend other businesses that the user may be interested in. This is why we focus on making accurate predictions: to understand consumer preferences and derive valuable insights for Yelp's business.

We will study different models and compare them to suggest the best available tool for Yelp. Our work attempts to predict customer ratings accurately and does not address the cold start problem.

In the end, we will decide whether the resulting algorithm can be used in a real setting at Yelp, or whether more studies and more data are needed to provide an effective and reliable recommender system.
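The objective described above, predicting each active user's latest rating, could be set up as a "hold out the last rating" split. The following is a minimal sketch, not the project's actual preprocessing: it assumes each rating carries a sortable timestamp, and it treats a user as "active" when they have at least `min_ratings` ratings. The function name `split_latest_rating`, the tuple layout, and the threshold are all hypothetical.

```python
from collections import defaultdict


def split_latest_rating(ratings, min_ratings=2):
    """Hold out each active user's most recent rating as the test target.

    ratings: iterable of (user_id, business_id, stars, timestamp) tuples.
    min_ratings: hypothetical threshold for calling a user "active".
    Returns (train, test) as two lists of rating tuples.
    """
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        if len(rows) < min_ratings:
            # Not enough history: keep the ratings for training only.
            train.extend(rows)
            continue
        # Sort chronologically; ISO-8601 date strings sort correctly as text.
        rows = sorted(rows, key=lambda r: r[3])
        train.extend(rows[:-1])
        test.append(rows[-1])
    return train, test


if __name__ == "__main__":
    sample = [
        ("u1", "b1", 4, "2018-01-01"),
        ("u1", "b2", 5, "2018-03-01"),
        ("u1", "b3", 2, "2018-02-01"),
        ("u2", "b1", 3, "2018-01-05"),
    ]
    train, test = split_latest_rating(sample)
    print(len(train), "training ratings;", len(test), "held-out ratings")
```

Because users with too few ratings never appear in the test set, a split like this is consistent with leaving the cold start problem out of scope.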

Content

To make the data more tractable, we will reduce the size of the datasets while striving to obtain unbiased samples. We will reduce the original data to approximately 500k ratings. As a next step, we will develop 3 different recommender system models. We will start with a user-based collaborative filtering model, which will also act as a baseline for comparison with the other models. We will then build a collective factorization algorithm and a deep learning model.
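To illustrate the baseline idea, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not the project's implementation: the choice of cosine similarity on mean-centered ratings, the neighbourhood size `k`, the clipping to the 1 to 5 star range, and the function names are all assumptions made for illustration.

```python
from math import sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cosine_similarity(a, b):
    """Cosine similarity of two users over the businesses both have rated.

    a, b: dicts mapping business_id -> mean-centered rating.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, business, k=10):
    """Predict `user`'s rating of `business` from similar users.

    ratings: dict user_id -> dict business_id -> stars.
    Falls back to the user's mean rating when no similar user rated it.
    """
    user_mean = mean(ratings[user].values())
    # Subtract each user's own mean so harsh and generous raters are comparable.
    centered = {}
    for u, user_ratings in ratings.items():
        u_mean = mean(user_ratings.values())
        centered[u] = {b: s - u_mean for b, s in user_ratings.items()}

    neighbours = []
    for other, other_ratings in centered.items():
        if other == user or business not in other_ratings:
            continue
        sim = cosine_similarity(centered[user], other_ratings)
        if sim > 0:
            neighbours.append((sim, other_ratings[business]))

    top = sorted(neighbours, reverse=True)[:k]
    if not top:
        return user_mean
    # Similarity-weighted average of the neighbours' deviations from their means.
    deviation = sum(sim * dev for sim, dev in top) / sum(sim for sim, _ in top)
    return min(5.0, max(1.0, user_mean + deviation))


if __name__ == "__main__":
    toy = {
        "alice": {"b1": 5, "b2": 3},
        "bob": {"b1": 5, "b2": 3, "b3": 4},
        "carol": {"b1": 1, "b2": 5, "b3": 2},
    }
    print(predict_rating(toy, "alice", "b3"))
```

In the toy data, only `bob` rates businesses the way `alice` does, so the prediction for `alice` is driven by his rating of `b3` relative to his own average; `carol` has a negative similarity and is ignored.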

We will conclude our analysis by comparing the different models and methods for recommending relevant local businesses.