Muscle is building new technology to understand cardholder behavior and create personalized offers. We have built several approaches for this, and Artificial Intelligence is one of them. In this article we want to show you a simple way to get recommendations based on algorithms.
Collaborative filtering is a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person.
We will use this idea to build an algorithm capable of recommending a set of hotels B = {hotel1, hotel2, ..., hoteln} similar to a specific hotel we know (for example, hotel10), based on the Pearson correlation coefficient. The criteria used for comparison are the ratings users gave to the hotels (from 1 to 5).
To make this more interesting, we will use real hotels and ratings published by Datafiniti on the Kaggle machine learning platform (https://www.kaggle.com/datafiniti/hotel-reviews).
Enough! Let's get into the fun part.
Load Hotel Ratings Data
Welcome! Let's build a collaborative-filtering algorithm that recommends hotels based on past user ratings. For now, let's load the data that contains what we need to start.
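The loading step could look like the following minimal sketch. The column names `reviews.username`, `name`, and `reviews.rating` are assumptions about the Kaggle CSV, and a small inline sample stands in for the downloaded file so that the snippet runs on its own.

```python
import io

import pandas as pd

# Inline sample standing in for the downloaded CSV (assumed column names).
sample_csv = io.StringIO(
    "reviews.username,name,reviews.rating\n"
    "Paula,Rancho Valencia Resort Spa,5.0\n"
    "D,Rancho Valencia Resort Spa,5.0\n"
    "Ron,Rancho Valencia Resort Spa,5.0\n"
    "jaeem2016,Aloft Arundel Mills,2.0\n"
    "MamaNiaOne,Aloft Arundel Mills,5.0\n"
)

# With the real file, pass its path to pd.read_csv instead of sample_csv.
raw = pd.read_csv(sample_csv)

# Keep only the three columns we need and give them short names.
ratings = raw[["reviews.username", "name", "reviews.rating"]].rename(
    columns={
        "reviews.username": "user",
        "name": "hotel",
        "reviews.rating": "rating",
    }
)

# Drop rows with a missing user, hotel, or rating.
ratings = ratings.dropna().reset_index(drop=True)
print(ratings.head())
```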
index  user        hotel                       rating
0      Paula       Rancho Valencia Resort Spa  5.0
1      D           Rancho Valencia Resort Spa  5.0
2      Ron         Rancho Valencia Resort Spa  5.0
3      jaeem2016   Aloft Arundel Mills         2.0
4      MamaNiaOne  Aloft Arundel Mills         5.0
As you can see, we have a table with only three columns: the username, the hotel name, and the rating given. We will assume that this data contains only the last rating each user gave to a particular hotel. The data format is similar to what we will find in our database. To build collaborative filtering, we need a matrix where one dimension is the users, the other is the hotel names, and the values are the ratings. This is easily done with pandas.
Let's build our rating matrix with the pandas pivot_table() method.
Convert Data to Ratings Matrix
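As a rough sketch of this step, the snippet below pivots a tiny illustrative sample in the same user / hotel / rating format. The variable names are hypothetical. Note that `pivot_table()` averages duplicate user/hotel pairs by default, which is harmless under the assumption that each pair appears only once.

```python
import pandas as pd

# Tiny illustrative sample in the user / hotel / rating format.
ratings = pd.DataFrame(
    {
        "user": ["Paula", "D", "Ron", "jaeem2016", "MamaNiaOne"],
        "hotel": [
            "Rancho Valencia Resort Spa",
            "Rancho Valencia Resort Spa",
            "Rancho Valencia Resort Spa",
            "Aloft Arundel Mills",
            "Aloft Arundel Mills",
        ],
        "rating": [5.0, 5.0, 5.0, 2.0, 5.0],
    }
)

# Rows are users, columns are hotels, values are ratings.
# User/hotel pairs without a rating are filled with 0.
rating_matrix = ratings.pivot_table(
    index="user", columns="hotel", values="rating", fill_value=0
)
print(rating_matrix.shape)
```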
Rating matrix preview: 5 rows × 1670 columns (rows are users, columns are hotels). Only the first ten and last ten hotel columns are displayed.
First ten hotel columns: 1906 Lodge At Coronado Beach, 250 Main Hotel, AC Hotel Chicago Downtown, AC Hotel Miami Beach, AC Hotel by Marriott Boston Downtown, ARIA Resort Casino, Acadia Suites, Ace Hotel Chicago, Ace Hotel New Orleans, Admiral Hotel
Last ten hotel columns: Wyndham Garden Lafayette, Wyndham Garden Pittsburgh Airport, Wyndham Garden San Jose Silicon Valley, Wyndham Garden-amarillo, Wyndham Houston - Medical Center Hotel and Suites, XV Beacon, Yakutat Lodge, dana hotel and spa, hampton inn Springfield southeast, hotel le bleu
user          displayed ratings
'Sina Bamtefa all 0
007lele       all 0
0501MVKG      5 for ARIA Resort Casino, 0 elsewhere
0ls0njo       all 0
0theHero      all 0
This matrix is what we need to start. Notice that the matrix is sparse: it contains a lot of zeros, because no user has visited every hotel. We expect each user to have rated only a few of them.
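To put a number on "sparse", one option is to measure the fraction of zero entries. The sketch below uses a small made-up matrix rather than the real data.

```python
import numpy as np

# Small illustrative rating matrix (users x hotels); 0 means "no rating".
rating_matrix = np.array(
    [
        [5, 0, 0, 0],
        [0, 0, 2, 0],
        [0, 4, 0, 0],
    ]
)

# Fraction of entries that are zero.
sparsity = np.mean(rating_matrix == 0)
print(f"{sparsity:.0%} of the entries are zero")
```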
Dimensional Reduction using SVD
We want to recommend a set of hotels similar to hotel X (we will choose one). To do so, we will "compress" the user dimension using the Singular Value Decomposition (SVD) transformer. SVD works great with sparse matrices such as ours. We will reduce the users from 6942 to 10.
To do this, we will flip our rating matrix with the transpose operator so that the hotel names are on the Y-axis (rows) and the users are on the X-axis (columns). After doing this, we will apply the SVD transform.
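A minimal sketch of the transpose-and-reduce step, assuming scikit-learn's `TruncatedSVD` is the SVD transformer in use. The matrix here is synthetic (50 users by 30 hotels) and the variable names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Synthetic stand-in for the rating matrix: 50 users x 30 hotels, mostly zeros.
rng = np.random.default_rng(0)
values = rng.integers(1, 6, size=(50, 30))   # ratings from 1 to 5
mask = rng.random((50, 30)) < 0.1            # keep roughly 10% of the entries
rating_matrix = (values * mask).astype(float)

# Transpose so that each row is a hotel and each column is a user.
hotels_by_users = rating_matrix.T

# Compress the user dimension down to 10 components.
svd = TruncatedSVD(n_components=10, random_state=0)
hotel_vectors = svd.fit_transform(hotels_by_users)
print(hotel_vectors.shape)  # (30, 10)
```

The number of components is a tuning choice: fewer components compress more aggressively, while more components keep more of the original rating detail.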
Transposed matrix preview: 5 rows × 6942 columns (rows are hotels, columns are users). Only the first ten and last ten user columns are displayed.
First ten user columns: 'Sina Bamtefa, 007lele, 0501MVKG, 0ls0njo, 0theHero, 103bennier, 106PamelaL, 108peggyt, 112traveler47, 121dawne
Last ten user columns: yves r, yyftam, z, zackandbritt8611, zamguy2013, zenleanne, zfakhavan, zhamant, zip98221, zumbadiva1
hotel                                 displayed ratings
1906 Lodge At Coronado Beach          all 0.0
250 Main Hotel                        all 0.0
AC Hotel Chicago Downtown             all 0.0
AC Hotel Miami Beach                  all 0.0
AC Hotel by Marriott Boston Downtown  all 0.0
Shape of the reduced matrix: (1670, 10)
Success! We have compressed 6942 users to 10, and we kept all the hotels.
Calculate the Pearson Correlation Coefficient
The Pearson correlation coefficient is a measure of linear correlation between two sets of data. It is the covariance of the two variables divided by the product of their standard deviations, and it ranges from -1 to 1. In other words, it measures how "similar" two vectors of the same size are. Now that we have 1670 hotels, each represented by a vector of size 10, we can compare every hotel with every other hotel to obtain this coefficient. We assume that if the Pearson coefficient of two hotels, say P(hotel1, hotel2), is 0.99, they are highly correlated, so a user who liked hotel1 will probably like hotel2.
NumPy has the np.corrcoef() function to calculate all the Pearson coefficients in a single step. Let's use it to get all the coefficients at once.
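As an illustration, the sketch below runs `np.corrcoef()` on a synthetic stand-in for the reduced matrix (6 hotels instead of 1670) and cross-checks one entry against the covariance-over-standard-deviations definition. The variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for the reduced matrix: 6 hotels, each a vector of size 10.
rng = np.random.default_rng(0)
hotel_vectors = rng.normal(size=(6, 10))

# np.corrcoef treats each row as one variable, so the result is hotels x hotels.
corr = np.corrcoef(hotel_vectors)
print(corr.shape)  # (6, 6)

# Cross-check one entry against the definition:
# covariance divided by the product of the standard deviations.
a, b = hotel_vectors[0], hotel_vectors[1]
manual = np.cov(a, b)[0, 1] / (np.std(a, ddof=1) * np.std(b, ddof=1))
print(np.isclose(corr[0, 1], manual))
```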
Shape of the correlation matrix: (1670, 1670)
Recommendations
Good job! We have all the ingredients in place to start getting recommendations. Now, let's create a method to obtain recommendations based on the hotel name.
The recommended(name) method performs the following tasks:
1. Gets the index of the hotel from the rating matrix (that is, its column number).
2. Gets all the Pearson coefficients of that hotel against all the others.
3. Creates a pandas data frame with the hotel names and Pearson coefficients, sorted in descending order. The top 10 are the most similar hotels. The top entry, with a Pearson coefficient of 1, is the query hotel itself.
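The three tasks above can be sketched as follows. This is an illustrative version rather than the exact code behind the examples: the hotel names and vectors are made up, and the `top_n` parameter is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical hotels and reduced vectors, for illustration only.
hotel_names = ["Hotel A", "Hotel B", "Hotel C", "Hotel D"]
hotel_vectors = np.array(
    [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.5],
        [4.0, 3.0, 2.0, 1.0],
        [1.0, 3.0, 2.0, 4.0],
    ]
)
corr = np.corrcoef(hotel_vectors)


def recommended(name, top_n=10):
    """Return the top_n hotels most correlated with `name`."""
    # 1. Position of the hotel in the list of names.
    idx = hotel_names.index(name)
    # 2. Pearson coefficients of that hotel against all hotels.
    coefficients = corr[idx]
    # 3. Pair names with coefficients and sort in descending order.
    result = pd.DataFrame({"pearson": coefficients, "hotel": hotel_names})
    return result.sort_values("pearson", ascending=False).head(top_n)


print(recommended("Hotel A", top_n=3))
```

Because the query hotel always comes first, a caller who only wants other hotels can drop the first row of the result.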
Example 1: Hotels similar to AC Hotel Miami Beach
index  pearson   hotel
3      1.000000  AC Hotel Miami Beach
784    0.977024  Hampton Inn Suites Lavonia
221    0.951110  Best Western Plus Hotel At The Convention Center
1028   0.940408  Inn At The 5th
1286   0.938370  Ramada Plaza Hawthorne/LAX
1328   0.937191  Residence Inn Annapolis
949    0.934082  Home2 Suites by Hilton Buffalo Airport/Galleri...
1347   0.932799  Residence Inn Phoenix North/Happy Valley
1401   0.932389  Shilo Inn Suites - Coeur d'Alene
537    0.932084  Dillon Motel
Example 2: Hotels similar to Silver Sands Oceanfront Motel
index  pearson   hotel
1407   1.000000  Silver Sands Oceanfront Motel
670    0.970656  Grande Colonial La Jolla
578    0.966207  Element Basalt - Aspen
250    0.963198  Best Western Plus Williston Hotel & Suites
757    0.960163  Hampton Inn Orange City
798    0.958458  Hampton Inn Suites Sioux City/South
455    0.949858  Courtyard Las Vegas Henderson/Green Valley
1644   0.945691  Wildflower Inn
340    0.943473  Carter Iva
1099   0.942650  Lyttleton Inn
Example 3: Hotels similar to Hampton Inn Suites National HarborAlexandria Area
index  pearson   hotel
791    1.000000  Hampton Inn Suites National HarborAlexandria Area
848    0.976166  Hilton Garden Inn Chesapeake/Suffolk
1311   0.975341  Red Roof Inn Hampton Coliseum Convention Center
754    0.974526  Hampton Inn Norfolk/Virginia Beach
604    0.974434  Extended Stay America Washington, D.C. - Sprin...
1599   0.974317  TownePlace Suites by Marriott Suffolk Chesapeake
932    0.974111  Holiday Inn Express and Suites Exmore, Eastern...
600    0.974036  Extended Stay America Hampton - Coliseum
543    0.974036  DoubleTree by Hilton Hotel Orlando at SeaWorld
1040   0.969385  Island View Motel
Improvements to this example!
Nice! You have reached the end of our tour of memory-based collaborative filtering for hotels. We want to finish this post with some things you can do to improve the recommendations and the user experience under real-life conditions:
Filter by longitude and latitude. You can calculate how far each hotel is from the others and sort them by distance. There are some popular distance methods, such as the Euclidean distance or the Haversine formula.
Try other similarity measures. We used the Pearson coefficient, but feel free to try cosine similarity, Spearman correlation, mean squared difference, etc.
Update the rating matrix often. New ratings change the correlation coefficients, so recalculate the matrix and the coefficients regularly to surface new recommendations.
Check if recommendations make sense. Choose a hotel, ask for ten recommendations, and check whether the resulting list makes sense.
Try other dim-reduction algorithms. TruncatedSVD is one way to deal with the sparsity of the rating matrix: because there are so many zeros in our matrix, using a dimensional reduction method helps. Using other dimensional reduction techniques instead of SVD might produce different results. Candidate algorithms to test include, but are not limited to, Isomap Embedding, Spectral Embedding, PCA, and t-SNE.
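For the first suggestion in the list above, a Haversine distance helper could look like the sketch below. The function name and the mean Earth radius of 6371 km are assumptions made for illustration; the hotel coordinates would have to come from your own data.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
```

With such a helper, the recommendations could be filtered to hotels within a chosen radius of the query hotel before sorting by Pearson coefficient.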
With Muscle you will have an Artificial Intelligence solution with personalization, engagement, ordering and filtering features that improve your profits and save you money. If you want to build the future, contact us today.
Juan Zamora-Mora, Ph.D