Your Next Read: Designing a Book Recommender System for Avid Readers
Photo by Prasanna Kumar on Unsplash
As someone who loves reading, I sometimes struggle to decide what book to read next. I once had to go through all the books my bookseller had in store to find something to read. After confirming with some friends that it's also a frequent problem for them, I decided to find a way to solve it.
The ideal approach would be a system that recommends books based on some input from the user.
NextRead: The Book Recommendation System
NextRead aims to solve the problem of deciding what book to read next by taking the title of a book from you and suggesting 10 other similar books for you to read.
NextRead uses data about books available on the Goodreads website. After it takes a title from you, it searches the dataset for a title that is at least 60% similar to the one you provided. It then compares book descriptions to rank the books most similar to the matched one. If it finds similar books, it returns the top 10 results. Otherwise, it tells you that no results were found.
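The article does not say which similarity measure NextRead uses for titles, so the following is only a minimal sketch of the title-matching step, assuming Python's standard-library `difflib.SequenceMatcher` as the measure. The sample titles and the helper names `title_similarity` and `find_title` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical titles standing in for the scraped Goodreads dataset.
TITLES = [
    "The Hobbit",
    "Pride and Prejudice",
    "The Great Gatsby",
]


def title_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def find_title(query: str, titles: list, threshold: float = 0.6) -> Optional[str]:
    """Return the best-matching title, or None if nothing reaches the threshold."""
    if not titles:
        return None
    best = max(titles, key=lambda t: title_similarity(query, t))
    return best if title_similarity(query, best) >= threshold else None


if __name__ == "__main__":
    print(find_title("the hobit", TITLES))          # misspelled query
    print(find_title("Quantum Mechanics", TITLES))  # nothing close in the dataset
```

A threshold of 0.6 tolerates typos and partial titles, but it also lets through titles that merely share a few words, which is the trade-off between forgiving input and unrelated matches.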
You can view a live demo of the project to test it.
Building The Project
Finding Data for the Model
After I decided what I wanted to build, I started looking for websites with lots of book information, or APIs that could provide it. It took me about two days to find the Goodreads website.
The next thing to do was to build a scraper. The challenge became obvious when I realized that Goodreads stored the information for each book inside a tooltip, so a regular scraping approach was not going to work. I needed a way to make the tooltips appear while scraping the pages.
Since I needed to interact with the page for the tooltips to appear, I decided to scrape the page with Python Selenium. It took a while for me to get the hang of it, but eventually, I was able to scrape the data I needed.
Deciding What Approach to Use
Now that I had the data available, I needed to decide how to build the recommender system. I did some research and eventually came up with 2 options:
Use Google's generative AI via LangChain.
Build a simple ML model.
I initially went with the first option, and everything was good until I had to implement a UI for it. I realized the response format was inconsistent, which made styling it with Streamlit a challenge. Sometimes it returned a dataframe, other times a list, and sometimes a single number as the final result.
After spending a lot of time trying to fix the issue, I switched to the second option: creating my own model. I had never created a model before, so it felt exciting. I started by reading articles and watching videos to understand what I was getting myself into. But of course, I found a new problem.
Content-Based Filtering vs Collaborative Filtering
I realized that I could build my model with more than one approach (of course! What was I expecting?). This realization made me go back to consider the workflow of my app.
With collaborative filtering, the app would require users to sign up so I could use their activities and interests to recommend books to other users with similar interests. But I wanted something everyone could use on the fly with no signup required.
With content-based filtering, recommendations are based only on some input (text, image, etc.) from the user. This removed the need for signup, so I chose this approach.
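To make content-based filtering concrete, here is a small sketch that ranks books by how similar their descriptions are. It assumes TF-IDF weighting with cosine similarity, a common choice for this kind of task. The article does not state which technique NextRead actually uses, and the sample books and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical records standing in for the scraped dataset.
BOOKS = {
    "Dragon Quest": "A young wizard fights a dragon to save the kingdom",
    "Wizard School": "A young wizard learns magic at a school in the kingdom",
    "Bread Basics": "Simple recipes for baking bread at home",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Map each title to a {term: tf-idf weight} dictionary."""
    tokens = {title: tokenize(text) for title, text in docs.items()}
    n_docs = len(tokens)
    # Number of descriptions each term appears in.
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for title, words in tokens.items():
        counts = Counter(words)
        vectors[title] = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(words))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(title, docs, top_n=10):
    """Return up to top_n (title, score) pairs, most similar description first."""
    vectors = tfidf_vectors(docs)
    if title not in vectors:
        return []
    target = vectors[title]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != title]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    for other, score in recommend("Dragon Quest", BOOKS):
        print(f"{other}: {score:.2f}")
```

Because the ranking depends only on the text of the descriptions, no user accounts or reading histories are needed, which matches the no-signup goal.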
Designing the User Interface
After I finished building the recommender system with Python, I needed a way for users to interact with it on the web. The easiest option was Streamlit so I opted for it.
After the first iteration, my UI looked like this:
The recommender worked but didn't give the user more information, such as the description and cover image of each recommended book. I needed to fix this for a better user experience.
I did some extra research and found a better solution. After implementing this solution, the new UI looked like this:
Although the user will have to scroll a bit more than before, they get value for it because they get more information about each book.
I kept looking for ways to improve the user experience, and I decided to cache the data the first time the recommender loads for a user. Later requests reuse the cached data instead of loading it again, which reduces load time.
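The caching idea can be sketched with the standard library alone. The example below uses `functools.lru_cache` as a stand-in so that it runs without Streamlit; in a Streamlit app, the `st.cache_data` decorator plays a similar role. The CSV content and the `load_books` function are hypothetical, and the article does not show how NextRead implements its cache.

```python
import csv
import io
from functools import lru_cache

# Hypothetical CSV content; a real app would read the scraped dataset file.
RAW_CSV = (
    "title,description\n"
    "The Hobbit,A hobbit goes on an unexpected adventure\n"
    "Dune,A noble family fights for control of a desert planet\n"
)


@lru_cache(maxsize=1)
def load_books():
    """Parse the dataset once; later calls return the cached result."""
    reader = csv.DictReader(io.StringIO(RAW_CSV))
    # Return a tuple so callers cannot accidentally mutate the cached data.
    return tuple(reader)


first = load_books()   # parses the CSV (cache miss)
second = load_books()  # served from the cache (cache hit)
print(first is second)
print(load_books.cache_info())
```

The expensive step (here, parsing; in practice, loading the dataset and any precomputed similarity data) runs once, and every later call returns the same object.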
Conclusion
Building NextRead was definitely fun for me. I enjoyed researching and building something useful for me and the people around me.
It goes without saying that the recommender system still falls short in many areas. Some of the problems I have identified include the following:
Limited dataset: The dataset used in this version of the project is very small, so the recommender might not work for some books.
Dissimilar results: Because I have a very small dataset, I made the model accept results that are only 60% similar. This leaves a wide gap and allows unrelated books to be suggested to users.
I plan to keep working on the project to make it better for users. If you're interested in contributing to the project, feel free to check out the repository on GitHub.