Enhanced Collaborative Filtering with Reinforcement Learning



Research Motivation


Disadvantages of RBM-based recommendation systems

  1. Inputs do not take the temporal order of user ratings into account
    → Does not reflect changes in the user’s taste

  2. Must retrain on the entire dataset to incorporate new data
    → Cannot update the system quickly enough for incoming data

  3. Implicit feedback data is not considered
    → Cannot exploit data that can be more informative than explicit ratings



Suggested Model


Enhanced Collaborative Filtering with Reinforcement Learning



  1. Obtain a score r̂ for each item from the output of the RBM
  2. Draw K rank values from a half-normal distribution
  3. Select the items whose ranks match the drawn rank values and assign appropriate rewards by comparing them to the left-out items
  4. Approximate ranks from r̂, apply them to the half-normal distribution to obtain policy gradients, and use these to train the RBM’s parameters
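The four steps above can be sketched roughly as follows. This is a simplified illustration with made-up values (`n_items`, `K`, `sigma`, the random `r_hat`, and the held-out set are all placeholders); the paper's actual rank approximation and the backpropagation into the RBM parameters are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, K, sigma = 1000, 10, 50.0

# Step 1: scores r_hat for each item (random here; in the model, RBM outputs)
r_hat = rng.random(n_items)

# Step 2: draw K rank values from a half-normal distribution,
# so highly ranked items are sampled more often
rank_samples = np.abs(rng.normal(0.0, sigma, size=K)).astype(int) % n_items

# Step 3: map the sampled ranks to items ordered by score, and
# reward a selection when it hits the held-out (left-out) set
order = np.argsort(-r_hat)                 # item indices, best first
selected = order[rank_samples]
held_out = set(rng.choice(n_items, size=20, replace=False))
rewards = np.array([1.0 if i in held_out else 0.0 for i in selected])

# Step 4: REINFORCE-style gradient of the half-normal log-density
# with respect to each selected item's (approximate) rank;
# this signal would be backpropagated into the RBM's parameters
ranks = rank_samples.astype(float)
grad_log_p = -ranks / sigma**2             # d/d(rank) of the log-density
policy_grad = rewards * grad_log_p
```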



Experimentation Data


Used the MovieLens datasets to evaluate the recommendation system

                 Users   Items   Ratings     Rating density
  MovieLens100K  1,000   1,700   100,000     5.88%
  MovieLens1M    6,000   4,000   1,000,000   4.17%
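The rating densities in the table follow directly from the other columns (ratings divided by the number of user-item pairs):

```python
# Rating density = ratings / (users * items), matching the table above
datasets = {
    "MovieLens100K": (1_000, 1_700, 100_000),
    "MovieLens1M":   (6_000, 4_000, 1_000_000),
}
for name, (users, items, ratings) in datasets.items():
    density = ratings / (users * items)
    print(f"{name}: {density:.2%}")   # prints 5.88% and 4.17%
```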



Comparison to Original RBM


MovieLens100K

[Figure: HR comparison of RLRBM and RBM on MovieLens100K]

          RLRBM     RBM
  HR@10   0.1481    0.1217
  HR@25   0.2169    0.2169
  ARHR    0.05986   0.05152
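The metrics in these tables are standard top-N measures: HR@N is the fraction of users whose left-out item appears in their top-N recommendations, and ARHR additionally weights each hit by the reciprocal of its rank. A minimal sketch (the function name and toy data are illustrative, not from the paper):

```python
# HR@N: fraction of users whose held-out item appears in their top-N list.
# ARHR: average reciprocal rank of the held-out item (0 if not in top-N).
def hit_rate_and_arhr(top_n_lists, held_out_items, n):
    hits, rr_sum = 0, 0.0
    for ranked, target in zip(top_n_lists, held_out_items):
        top_n = ranked[:n]
        if target in top_n:
            hits += 1
            rr_sum += 1.0 / (top_n.index(target) + 1)
    users = len(held_out_items)
    return hits / users, rr_sum / users

# Toy example: 3 users; held-out item ranked 1st, 3rd, and absent
tops = [[7, 2, 5], [1, 4, 9], [3, 8, 6]]
targets = [7, 9, 0]
hr, arhr = hit_rate_and_arhr(tops, targets, 3)
# hr = 2/3, arhr = (1/1 + 1/3) / 3
```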

MovieLens1M

[Figure: HR comparison of RLRBM and RBM on MovieLens1M]

          RLRBM     RBM
  HR@10   0.09272   0.08940
  HR@25   0.1813    0.1763
  ARHR    0.04423   0.04353



Comparison to a Supervised Learning Approach for Additional Training on the Left-out Data


MovieLens100K

[Figure: HR comparison of RLRBM and the supervised approach on MovieLens100K]

          RLRBM     SUPERVISED
  HR@10   0.1481    0.1217
  HR@25   0.2169    0.2275
  ARHR    0.05986   0.05036

MovieLens1M

[Figure: HR comparison of RLRBM and the supervised approach on MovieLens1M]

          RLRBM     SUPERVISED
  HR@10   0.09271   0.08858
  HR@25   0.1813    0.1738
  ARHR    0.04423   0.04557



Comparison of Different Values of K


[Figure: performance for different values of K]



Conclusion



 Future Work