NotFlix¶
About¶
This is Notflix, a free movie database and recommendation website.
This website is a side project that displays a fixed dataset of movies and recommends other movies to watch. I am building it mostly for fun, and to have a playground for implementing various recommendation algorithms using Machine Learning.
NotFlix is based on data from the following sources:
- OMDB: The Open Movie DataBase
- Grouplens’ MovieLens: Datasets behind the MovieLens project.
Installation¶
Pre-requisite:
- Install the following software:
- Download the movielens data:
  - Download the ml-1m dataset by clicking here
  - Unzip it and place it under datasets/movielens/ml-1m
Once all the prerequisites are set up, follow these steps:
Copy the db-credentials.env template and add the credentials you want:
cp -n db-credentials.env.dist db-credentials.env;
Create a virtual environment and install the required packages:
virtualenv venv; source venv/bin/activate; pip install -r requirements.txt;
Build the Docker images:
docker-compose build;
Launch the PostgreSQL database:
docker-compose up -d postgres
Use the following flask-cli commands to insert the data into the DB:
export FLASK_APP="src/web"; export POSTGRES_HOST="127.0.0.1"; flask init-db; flask insert-engines; flask insert-pages; flask download-movies; flask insert-movies; flask train-engines; flask upload-engines;
Launch the application with make start and then visit localhost:5000.
Usage¶
So far, Notflix exposes the following pages:
- A home page, displaying the popular movies, the user browsing history and some personalized recommendations.
- A movie page, displaying basic information about the selected movie and recommendations on similar movies to watch.
- A genres page, that lets you browse movies by genres.
- A search page, that lets you search the movies.
The configuration for engines and pages is handled with the display.json file. You can use it to change the engines displayed, their names and order on the page.
Repository organization¶
The repository is organized the following way:
- .circleci: Configuration file for CircleCI
- datasets: Folder containing the datasets (so far only movielens)
- logs: Log files are saved here
- models: Machine Learning models are saved here
- notebooks: The exploratory Jupyter notebooks
- src: Source code
  - api: Flask API, responsible for computing the recommendations displayed on the web app
  - data_interface: Code for interacting with the cache or the database
  - recommender: Everything related to computing recommendations
  - tracker: Code for tracking the user events
  - utils: Various utility functions
  - web: Code for the Flask web application
- tests: Unit test code
Notes¶
- I am deliberately showing multiple engines on a web page to highlight how the recommendation results differ from one algorithm to another.
- I am not removing duplicate movies across engines, for the same reason as above.
- The Machine Learning algorithms are not very well trained yet; I spent some time working on the application to make it easy to add new engines later.
So far, the movie page looks like this:

notflix¶
config module¶
src package¶
Subpackages¶
src.data_interface package¶
class src.data_interface.cache.Cache
Bases: object
- append(key, value): Append to a redis list
  Parameters:
  - key (str) – cache key
  - value (str) – object to store in cache
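To illustrate what appending to a Redis list involves, here is a minimal sketch. It assumes a client object exposing `rpush` and `lrange`, which are standard Redis list commands; the `ListCache` class, its `get_list` helper and the in-memory `FakeRedis` stand-in are hypothetical names used so the example runs without a Redis server. This is not the project's actual `Cache` implementation.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in exposing the two list commands used below."""

    def __init__(self):
        self._lists = defaultdict(list)

    def rpush(self, key, *values):
        # Append the values at the tail of the list and return its new length.
        self._lists[key].extend(values)
        return len(self._lists[key])

    def lrange(self, key, start, end):
        items = self._lists[key]
        # Redis treats `end` as inclusive and -1 as "last element".
        stop = len(items) if end == -1 else end + 1
        return items[start:stop]


class ListCache:
    """Hypothetical wrapper; the client is injected so it can be swapped."""

    def __init__(self, client):
        self.client = client

    def append(self, key, value):
        """Append `value` to the list stored under `key`."""
        return self.client.rpush(key, value)

    def get_list(self, key):
        """Return the whole list stored under `key`."""
        return self.client.lrange(key, 0, -1)


cache = ListCache(FakeRedis())
cache.append("history:42", "movie:1")
cache.append("history:42", "movie:7")
history = cache.get_list("history:42")
```

Injecting the client keeps the wrapper testable: a real Redis client could presumably be passed in place of `FakeRedis` without changing `ListCache`.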
This module contains wrappers to download various movies datasets. So far we are only using Movielens but we can add more if we want.
Every dataset should have its own wrapper class that inherits from Downloader.
class src.data_interface.model.BaseTable(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Model
- created_at = Column(None, DateTime(), table=None, nullable=False, default=ColumnDefault(<function datetime.utcnow>))
- id = Column(None, Integer(), table=None, primary_key=True, nullable=False)
- updated_at = Column(None, DateTime(), table=None, onupdate=ColumnDefault(<function datetime.utcnow>), default=ColumnDefault(<function datetime.utcnow>))
class src.data_interface.model.Engine(**kwargs)
Bases: src.data_interface.model.BaseTable
- created_at
- display_name
- id
- priority
- type
- updated_at
class src.data_interface.model.Genre(**kwargs)
Bases: src.data_interface.model.BaseTable
- created_at
- id
- name
- updated_at
class src.data_interface.model.Movie(**kwargs)
Bases: src.data_interface.model.BaseTable
- actors
- awards
- country
- created_at
- description
- director
- duration
- genres
- id
- image
- language
- name
- rating
- updated_at
- year
class src.data_interface.model.Page(**kwargs)
Bases: src.data_interface.model.BaseTable
- created_at
- engines
- id
- name
- updated_at
class src.data_interface.model.Recommendation(**kwargs)
Bases: src.data_interface.model.BaseTable
- created_at
- engine_name
- id
- recommended_item_id
- score
- source_item_id
- source_item_id_kind
- updated_at
src.recommender package¶
This is the engines package, where we define the recommendation engines.
This package contains the following modules:
- engine: Module where the base classes are defined. All the created engines should inherit from one of these base classes. They define the skeleton of the engine, like which methods they must override.
- collaborative_filtering: All collaborative filtering engines
- content_based: All content based filtering engines
- generic: All the generic engines, that are not really collaborative or content based (e.g. display the most popular items, or the items in the user browsing history, etc.)
class src.recommender.engines.collaborative_filtering.Item2VecOnline
Bases: src.recommender.engines.engine.OnlineEngine
- load_model(): Load the ML model from disk and return it
  Returns: The ML model to be saved as self.model
- predict(context): Predict using the loaded model and the context.
  Parameters: context (src.recommender.wrappers.Context) – Context wrapper
  Returns:
  - ids (list(int)): list of recommended ids sorted by descending score
  - scores (list(float)): list of scores for each recommended item
class src.recommender.engines.content_based.SameGenres
Bases: src.recommender.engines.engine.QueryBasedEngine
- compute_query(context): Abstract method that computes the SQL query using SQLAlchemy
  Parameters: context (recommender.wrappers.Context) – context wrapper
  Returns: query result
class src.recommender.engines.engine.Engine
Bases: abc.ABC
Abstract class for all engines. You should not use this class directly; instead, use the classes that inherit from it.
- init_recommendations(context): Create an empty src.recommender.wrappers.Recommendations object and fill in the engine type, display name and priority based on the information stored in DB.
  Parameters: context (src.recommender.wrappers.Context) – Context wrapper, containing useful information for the engine.
  Returns: Recommendations object filled with engine type, display name and priority
  Return type: src.recommender.wrappers.Recommendations
- recommend(context): Abstract method for all engines for recommending items.
  The context wrapper stores all the information the engine might need to compute the recommendations, like the current item_id, the current user_id, the user browsing history, etc.
  Every engine must override this method. They have to call self.init_recommendations first to create an empty src.recommender.wrappers.Recommendations object and then enrich it with the recommended items.
  Parameters: context (src.recommender.wrappers.Context) – the context
  Returns: the recommendation object
  Return type: src.recommender.wrappers.Recommendations
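The contract described above (call `init_recommendations` first, then enrich the result inside `recommend`) can be sketched with Python's `abc` module. Everything below is illustrative: a plain dict stands in for `src.recommender.wrappers.Recommendations`, class attributes stand in for the values the project reads from the DB, and `EchoHistory` is a hypothetical engine.

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    # Assumed class-level metadata; the real project reads these from the DB.
    type = "generic"
    display_name = "Untitled engine"
    priority = 0

    def init_recommendations(self, context):
        """Return an empty recommendations container tagged with engine metadata."""
        return {
            "type": self.type,
            "display_name": self.display_name,
            "priority": self.priority,
            "recommendations": [],
        }

    @abstractmethod
    def recommend(self, context):
        """Subclasses must build and return the recommendations."""


class EchoHistory(Engine):
    """Hypothetical engine that recommends the items in the browsing history."""

    type = "EchoHistory"
    display_name = "Recently viewed"
    priority = 1

    def recommend(self, context):
        # Start from the empty container, then enrich it.
        recs = self.init_recommendations(context)
        for item_id in context.get("history", []):
            recs["recommendations"].append({"id": item_id, "score": 1.0})
        return recs


result = EchoHistory().recommend({"history": [3, 5]})
```

Because `recommend` is decorated with `@abstractmethod`, instantiating `Engine` itself raises a `TypeError`, which enforces the "do not use this class directly" rule.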
class src.recommender.engines.engine.OfflineEngine
Bases: src.recommender.engines.engine.QueryBasedEngine
These engines are a special kind of QueryBasedEngine because they require training.
Most of the offline Machine Learning algorithms will inherit from this class.
The recommendations are computed offline with the train method, then saved on disk with save_recommendations_to_csv and finally uploaded to the DB using upload.
- compute_query(context): Get the recommended items from the DB.
  Parameters: context (src.recommender.wrappers.Context) – Context wrapper
  Returns: list of Recommendation
  Return type: list
- save_recommendations_to_csv(recommendations): Save recommendations to a CSV file.
  Parameters: recommendations (list(tuple)) – List of recommendation tuples corresponding to: (movie_id, recommended_movie_id, input_kind, score)
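As a rough sketch of the "save to CSV" step, the documented tuple layout maps directly onto rows written with the standard `csv` module. The extra `fileobj` parameter and the header row are assumptions made so the example can write to an in-memory buffer; the project's method takes only `recommendations` and its file location is not documented here.

```python
import csv
import io


def save_recommendations_to_csv(recommendations, fileobj):
    """Write (movie_id, recommended_movie_id, input_kind, score) tuples as CSV rows."""
    writer = csv.writer(fileobj)
    # Header names are assumptions that mirror the documented tuple layout.
    writer.writerow(["movie_id", "recommended_movie_id", "input_kind", "score"])
    writer.writerows(recommendations)


# Made-up sample rows, written to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
save_recommendations_to_csv([(1, 42, "item", 0.93), (1, 7, "item", 0.81)], buffer)

# Read the rows back to check the round trip (all values come back as strings).
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```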
class src.recommender.engines.engine.OnlineEngine
Bases: src.recommender.engines.engine.Engine
Online Machine Learning Engines that do not get their recommendations from a SQL query but from a loaded model.
The model is trained with the train method, and loaded at runtime with the load_model method.
- load_model(): Load the ML model from disk and return it
  Returns: The ML model to be saved as self.model
- predict(context): Predict using the loaded model and the context.
  Parameters: context (src.recommender.wrappers.Context) – Context wrapper
  Returns:
  - ids (list(int)): list of recommended ids sorted by descending score
  - scores (list(float)): list of scores for each recommended item
- recommend(context): Recommend movies based on context
  Parameters: context (src.recommender.wrappers.Context) – Context wrapper
  Returns: src.recommender.wrappers.Recommendations as dict
  Return type: recommendations (dict)
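The load, predict and recommend flow of an online engine might look like the sketch below. It is only an illustration: `OnlineEngineSketch` is a hypothetical class, a hard-coded dict plays the role of the model loaded from disk, and the context is a plain dict rather than `src.recommender.wrappers.Context`.

```python
class OnlineEngineSketch:
    """Hypothetical stand-in for an online engine backed by a loaded model."""

    def __init__(self):
        # The loaded model is kept on the instance as self.model.
        self.model = self.load_model()

    def load_model(self):
        # Assumption: a dict mapping item_id -> [(similar_id, score), ...]
        # plays the role of a model loaded from disk.
        return {1: [(2, 0.4), (3, 0.9)], 2: [(1, 0.4)]}

    def predict(self, context):
        """Return (ids, scores), sorted by descending score."""
        neighbours = self.model.get(context["item_id"], [])
        ranked = sorted(neighbours, key=lambda pair: pair[1], reverse=True)
        ids = [item_id for item_id, _ in ranked]
        scores = [score for _, score in ranked]
        return ids, scores

    def recommend(self, context):
        """Wrap the predictions in a dictionary."""
        ids, scores = self.predict(context)
        return {
            "recommendations": [
                {"id": i, "score": s} for i, s in zip(ids, scores)
            ]
        }


output = OnlineEngineSketch().recommend({"item_id": 1})
```

Loading the model once in the constructor means each call to `recommend` only pays for the prediction, not for reading the model again.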
class src.recommender.engines.engine.QueryBasedEngine
Bases: src.recommender.engines.engine.Engine
Abstract class for an engine based on a SQL query performed at every call. These engines require no training, for instance an engine that recommends random items from the DB.
- compute_query(context): Abstract method that computes the SQL query using SQLAlchemy
  Parameters: context (recommender.wrappers.Context) – context wrapper
  Returns: query result
- recommend(context): Method for recommending items, by calling self.compute_query.
  Parameters: context (recommender.wrappers.Context) – context wrapper
  Returns: recommendations as list of dict
  Return type: list(dict)
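To make the "query at every call" idea concrete, here is a small sketch of the random-items example. Note the substitution: it uses the standard-library `sqlite3` module with raw SQL rather than SQLAlchemy and PostgreSQL, and the `movies` table, its columns and the sample rows are invented for the illustration.

```python
import sqlite3


def compute_query(connection, limit):
    """Return up to `limit` random movie rows as (id, name) tuples."""
    cursor = connection.execute(
        "SELECT id, name FROM movies ORDER BY RANDOM() LIMIT ?", (limit,)
    )
    return cursor.fetchall()


def recommend(connection, limit=2):
    """Run the query on every call and format the rows as dictionaries."""
    return [
        {"id": row[0], "name": row[1]}
        for row in compute_query(connection, limit)
    ]


# Hypothetical in-memory table with made-up rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO movies (id, name) VALUES (?, ?)",
    [(1, "Movie A"), (2, "Movie B"), (3, "Movie C")],
)
recommendations = recommend(connection)
```

No training step is involved: the result depends only on the current contents of the table, so two calls may return different items.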
class src.recommender.engines.generic.MostRecent
Bases: src.recommender.engines.engine.QueryBasedEngine
- compute_query(context): Abstract method that computes the SQL query using SQLAlchemy
  Parameters: context (recommender.wrappers.Context) – context wrapper
  Returns: query result
class src.recommender.engines.generic.Random
Bases: src.recommender.engines.engine.QueryBasedEngine
- compute_query(context): Abstract method that computes the SQL query using SQLAlchemy
  Parameters: context (recommender.wrappers.Context) – context wrapper
  Returns: query result
class src.recommender.engines.generic.TopRated
Bases: src.recommender.engines.engine.QueryBasedEngine
- compute_query(context): Abstract method that computes the SQL query using SQLAlchemy
  Parameters: context (recommender.wrappers.Context) – context wrapper
  Returns: query result
class src.recommender.engines.generic.UserHistory
Bases: src.recommender.engines.engine.QueryBasedEngine
- compute_query(context): Abstract method that computes the SQL query using SQLAlchemy
  Parameters: context (recommender.wrappers.Context) – context wrapper
  Returns: query result
The metrics functions are copied from this repository: https://gist.github.com/bwhite/3726239
- src.recommender.metrics.dcg_at_k(r, k, method=0): Score is discounted cumulative gain (dcg).
  Relevance is positive real values. Can use binary as the previous methods.
  Example from http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
  >>> r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
  >>> dcg_at_k(r, 1)
  3.0
  >>> dcg_at_k(r, 1, method=1)
  3.0
  >>> dcg_at_k(r, 2)
  5.0
  >>> dcg_at_k(r, 2, method=1)
  4.2618595071429155
  >>> dcg_at_k(r, 10)
  9.6051177391888114
  >>> dcg_at_k(r, 11)
  9.6051177391888114
  Parameters:
  - r – Relevance scores (list or numpy) in rank order (first element is the first item)
  - k – Number of results to consider
  - method – If 0 then weights are [1.0, 1.0, 0.6309, 0.5, 0.4307, …] If 1 then weights are [1.0, 0.6309, 0.5, 0.4307, …]
  Returns: Discounted cumulative gain
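The two weighting schemes correspond to dividing the score at rank i by log2(i) from rank 2 onward (method 0, with rank 1 left unweighted) or by log2(i + 1) from rank 1 onward (method 1). The sketch below is a dependency-free rewrite that follows those weights; it uses `math` instead of numpy, so it is an equivalent illustration rather than the code shipped in `src.recommender.metrics`.

```python
import math


def dcg_at_k(r, k, method=0):
    """Discounted cumulative gain over the first k relevance scores."""
    r = [float(x) for x in r[:k]]
    if not r:
        return 0.0
    if method == 0:
        # Weights 1, 1, 1/log2(3), 1/log2(4), ...: rank 1 is unweighted.
        return r[0] + sum(x / math.log2(i) for i, x in enumerate(r[1:], start=2))
    if method == 1:
        # Weights 1/log2(2), 1/log2(3), 1/log2(4), ...
        return sum(x / math.log2(i) for i, x in enumerate(r, start=2))
    raise ValueError("method must be 0 or 1.")


r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
first_two = dcg_at_k(r, 2)                  # 3 + 2 / log2(2)
first_two_alt = dcg_at_k(r, 2, method=1)    # 3 / log2(2) + 2 / log2(3)
```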
- src.recommender.metrics.evaluate_recommendations(predictions, target, k): Evaluate the quality of recommendations with NDCG. We compare the predictions set with the target set that should reflect what items are relevant.
  Parameters:
  - predictions (list) – List of recommended items. Ordered by descending score.
  - target (list) – List of relevant items.
  - k (int) – Only consider the k first items in the set
  Returns: NDCG at k score
  Return type: float
- src.recommender.metrics.ndcg_at_k(r, k, method=0): Score is normalized discounted cumulative gain (ndcg).
  Relevance is positive real values. Can use binary as the previous methods.
  Example from http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
  >>> r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
  >>> ndcg_at_k(r, 1)
  1.0
  >>> r = [2, 1, 2, 0]
  >>> ndcg_at_k(r, 4)
  0.9203032077642922
  >>> ndcg_at_k(r, 4, method=1)
  0.96519546960144276
  >>> ndcg_at_k([0], 1)
  0.0
  >>> ndcg_at_k([1], 2)
  1.0
  Parameters:
  - r – Relevance scores (list or numpy) in rank order (first element is the first item)
  - k – Number of results to consider
  - method – If 0 then weights are [1.0, 1.0, 0.6309, 0.5, 0.4307, …] If 1 then weights are [1.0, 0.6309, 0.5, 0.4307, …]
  Returns: Normalized discounted cumulative gain
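NDCG divides the DCG of a ranking by the DCG of the same scores sorted in descending order. The sketch below shows that normalization and one plausible way `evaluate_recommendations` could build on it: mark each predicted item as relevant (1) or not (0) depending on whether it appears in `target`. The binary-relevance step is an assumption, since the project's implementation is not shown here, and the helper is a numpy-free rewrite.

```python
import math


def dcg_at_k(r, k, method=0):
    """Discounted cumulative gain over the first k relevance scores."""
    r = [float(x) for x in r[:k]]
    if not r:
        return 0.0
    if method == 0:
        return r[0] + sum(x / math.log2(i) for i, x in enumerate(r[1:], start=2))
    if method == 1:
        return sum(x / math.log2(i) for i, x in enumerate(r, start=2))
    raise ValueError("method must be 0 or 1.")


def ndcg_at_k(r, k, method=0):
    """DCG divided by the DCG of the same scores in ideal (descending) order."""
    dcg_max = dcg_at_k(sorted(r, reverse=True), k, method)
    if not dcg_max:
        return 0.0
    return dcg_at_k(r, k, method) / dcg_max


def evaluate_recommendations(predictions, target, k):
    """Assumed approach: binary relevance (1 if the item is in target), then NDCG."""
    relevant = set(target)
    relevance = [1 if item in relevant else 0 for item in predictions]
    return ndcg_at_k(relevance, k)


# Items 20 and 30 are relevant but ranked second and third.
score = evaluate_recommendations([10, 20, 30], [20, 30], k=3)
```

With this formulation the ideal ranking is built from the predicted list itself, so relevant items that were never recommended do not lower the score; whether that is the desired behaviour depends on the evaluation goal.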
class src.recommender.recommender.Recommender
Bases: object
Recommender System base class.
- recommend(context, restrict_to_engines=[]): Call all the active engines based on a context and return their recommendations.
  It is possible to restrict to a list of engines by using the restrict_to_engines parameter.
  Parameters: context (recommender.wrappers.Context) – Context wrapper, providing information about the current item or user or session.
  Returns: List of recommendations as dictionaries
  Return type: list(dict)
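A dispatcher of this kind can be sketched in a few lines. The version below is hypothetical: engines are plain callables stored in a dict keyed by name, whereas the project loads its active engines from its own configuration. It also uses `None` rather than `[]` as the default for `restrict_to_engines`, a common Python precaution against sharing one mutable default across calls.

```python
class Recommender:
    """Hypothetical dispatcher over a dict of named engines."""

    def __init__(self, engines):
        # Assumption: engines are callables taking a context and returning a dict.
        self.engines = engines

    def recommend(self, context, restrict_to_engines=None):
        # An empty or missing restriction means "call every engine".
        allowed = set(restrict_to_engines or self.engines)
        return [
            engine(context)
            for name, engine in self.engines.items()
            if name in allowed
        ]


engines = {
    "TopRated": lambda context: {"engine": "TopRated", "ids": [1, 2]},
    "Random": lambda context: {"engine": "Random", "ids": [9]},
}
recommender = Recommender(engines)
everything = recommender.recommend({"item_id": 1})
only_random = recommender.recommend({"item_id": 1}, restrict_to_engines=["Random"])
```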
class src.recommender.wrappers.Context(**kwargs)
Bases: object
A wrapper for context that will help engines make recommendations.
src.utils package¶
- src.utils.data.recommendations_from_similarity_matrix(movie_ids, sim_matrix, n_recommendations, input_kind)
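This function has no description, so the sketch below is a guess at its behaviour based on the parameter names alone: for each movie, keep the `n_recommendations` most similar other movies and emit tuples in the (movie_id, recommended_movie_id, input_kind, score) layout used by `save_recommendations_to_csv`. It assumes `sim_matrix` is a square matrix whose rows and columns follow the order of `movie_ids`, and it uses nested lists rather than a numpy or scipy matrix.

```python
def recommendations_from_similarity_matrix(movie_ids, sim_matrix, n_recommendations, input_kind):
    """Return the top-n most similar movies per movie, as a flat list of tuples."""
    recommendations = []
    for row_index, movie_id in enumerate(movie_ids):
        # Pair every other movie with its similarity to the current one.
        candidates = [
            (other_id, sim_matrix[row_index][col_index])
            for col_index, other_id in enumerate(movie_ids)
            if col_index != row_index
        ]
        # Highest similarity first; ties keep their original order.
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        for other_id, score in candidates[:n_recommendations]:
            recommendations.append((movie_id, other_id, input_kind, score))
    return recommendations


# Made-up ids and similarities for illustration.
movie_ids = [10, 20, 30]
sim_matrix = [
    [1.0, 0.2, 0.8],
    [0.2, 1.0, 0.5],
    [0.8, 0.5, 1.0],
]
top = recommendations_from_similarity_matrix(movie_ids, sim_matrix, 1, "item")
```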
- src.utils.data.sparse_matrix_from_df(df, groupby, indicator): Make a scipy sparse matrix from a pandas Dataframe
  Parameters:
  - df (pd.DataFrame) – Dataframe with the matrix desired rows as index
  - groupby (str) – Name of the column to set as matrix column
  - indicator (str) – Name of the column that will serve as data
  Returns:
  - sparse matrix (scipy.sparse.csr_matrix)
  - row values (list)
  - column values (list)