STEREO

State-of-the-art

Collaborative Filtering (CF) is a thriving subfield of machine learning, and several surveys expose the achievements in this fields [1,2]. It became popular in the late 90's with the spread of online services that use recommender systems, such as Amazon.com, Yahoo! and Netflix. CF solutions in the current literature are often divided in two main groups: memory-based and model-based [3].

Memory-based algorithms operate on the entire set of ratings to generate predictions. The idea behind this long-lived family of algorithms is to compute a similarity score, which reflects the distance between two users or two items. Predictions are then computed by looking at the ratings of the k most similar users or items (k-nearest neighbor) [4,5]. Memory-based methods are used in a lot of real-world systems because of their simple design and implementation. However, they impose several scalability limitations that make their use impractical when dealing with large amounts of data. The slope one algorithms [6] were proposed to make faster prediction than memory-based algorithms, but they were unable to overcome the scalability issues of the latter.

Model-based approaches have been investigated to overcome the shortcomings of memory-based algorithms. They use the set of available ratings to estimate or learn a model and then apply this model to make rating predictions. There have been several model-based CF approaches proposed in the literature, which use almost every existing machine learning technique. In this multitude of algorithms, the most successful are by far those based on low-dimensional factor models. The idea behind such techniques is that attitudes or preferences can be described by a small number of unobserved factors, that can be thought of as representing user communities (like-minded users) or item communities (genres). The Netflix Prize definitively consecrated latent factor models, in particular those based on matrix factorization (MF) [7,8,9,10]. These methods aims at obtaining two lower rank matrices P and Q, for users and items respectively, from the global matrix of ratings R, with minimal loss of information. The fact that in real-word datasets most entries in R are missing makes the factorization R = P Q a difficult non-convex optimization problem [11]. Current MF techniques works by minimizing an error function that measures the factorization quality, and the most popular solutions are Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD). Both algorithms need several passes through the set of ratings to achieve this goal. ALS [9] alternates between keeping P and Q fixed. The idea is that, although both these values are unknown, when the item vectors are fixed, the system can recompute the user vectors by solving a least-squares problem (that can be solved optimally), and vice versa. SGD [10] works by taking steps proportional to the negative of the gradient of the error function. The term stochastic means that P and Q are updated, at each iteration, for each given training case by a small step, toward the average gradient descent.

Some recent works aimed at increasing the scalability of SGD [12,13,14], however the asymptotic cost of this technique makes it difficult to fit the timeliness requirements of real-world applications, especially when applied on large data sets. Furthermore, each update leads to non-local changes (for each observation the user vector increment is proportional to the item vector, and vice versa) which increase the difficulty (i.e. the communication costs) of distributed implementations.

Recently, new CF techniques have been proposed that consider the time dimension in ratings data. The main motivation behind these new approaches is that users taste change over time. Moreover, the user base, as well as item base, keep growing in the system. Traditional CF techniques don’t take into account this behavior and need to be periodically trained in order to be up-to-date with the latest information. Both memory-based and model-based approaches have been adapted to take temporal effects into consideration.

The most natural way to adapt neighborhood-based algorithms to temporal effects is to somehow give more relevance to recent observations, and less to past observations. This can be achieved using a series of techniques, most of which are based on either discrete time windows [15] or continuous decay functions [16, 17].

Matrix factorization algorithms have recently been adapted to cope with temporal effects. In [18] the SVD algorithm is extended to tackle temporal effects, using a combination of discrete time windows (for items) and continuous decay functions (for users). The idea proposed in [19] is to introduce time factors into the rating matrix, splitting the time in discrete windows.

One emerging trend in the Web 2.0 is that user’s interactions with the systems are becoming highly dynamic. It is therefore crucial for a CF algorithm operating in this environment to be capable of dynamically tracking users’ interests and providing very fast responses. In order to achieve such a goal, a straightforward strategy would be to adopt any existing CF algorithm and simply repeatedly rebuild the model using both old and new data. However most algorithms typically need a very lengthy training stage, and the computational complexity of this naive approach would not be affordable for applications requiring fast responses. It is therefore necessary to develop incremental algorithms that can efficiently adapt existing model using the new data. The algorithm described in [17] has an incremental and low complexity similarity computation, allowing the system to perform online updates. The possibility to update incrementally the model is an essential feature that a CF algorithm that copes with unbounded flow of data must possess. Several incremental solution have been proposed and evaluated [18, 19, 20, 21].

The main issue with online approaches is their short-term “memory”, i.e., since the updates based only on the most recent data point do not take into account past observations, the model quickly “forgets” them. Recently in [22] have been proposed a CF online learning algorithm that addresses this problem by keeping a representative sample of the data set in a reservoir and using only this sample plus the new observation for retraining. However, this research area is quite new, active and not solved, both for the algorithmic and the architectural aspects. The design of a distributed techniques aimed at providing scalable, low-latency solution for online rating aggregation and data stream processing is the main focus of the stereo project.

References

[1] Su, Xiaoyuan, and Taghi M. Khoshgoftaar. "A survey of collaborative filtering techniques." Advances in artificial intelligence 2009 (2009): 4.
[2] Adomavicius, Gediminas, and Alexander Tuzhilin. "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions." Knowledge and Data Engineering, IEEE Transactions on 17.6 (2005): 734-749.
[3] Breese, John S., David Heckerman, and Carl Kadie. "Empirical analysis of predictive algorithms for collaborative filtering." Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 1998.
[4] Resnick, Paul, et al. "GroupLens: an open architecture for collaborative filtering of netnews." Proceedings of the 1994 ACM conference on Computer supported cooperative work. ACM, 1994.
[5] Linden, Greg, Brent Smith, and Jeremy York. "Amazon. com recommendations: Item-to-item collaborative filtering." Internet Computing, IEEE 7.1 (2003): 76-80.
[6] Lemire, Daniel, and Anna Maclachlan. "Slope One Predictors for Online Rating-Based Collaborative Filtering." SDM. Vol. 5. 2005.
[7] Hu, Yifan, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 2008.
[8] Koren, Yehuda. "Factorization meets the neighborhood: a multifaceted collaborative filtering model." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008.
[9] Zhou, Yunhong, et al. "Large-scale parallel collaborative filtering for the netflix prize." Algorithmic Aspects in Information and Management. Springer Berlin Heidelberg, 2008. 337-348.
[10] Takács, Gábor, et al. "Scalable collaborative filtering approaches for large recommender systems." The Journal of Machine Learning Research 10 (2009): 623-656.
[11] Salakhutdinov, Ruslan, and Andriy Mnih. "Probabilistic Matrix Factorization." NIPS. Vol. 1. No. 1. 2007.
[12] Gemulla, Rainer, et al. "Large-scale matrix factorization with distributed stochastic gradient descent." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.
[13] Zhuang, Yong, et al. "A fast parallel SGD for matrix factorization in shared memory systems." Proceedings of the 7th ACM conference on Recommender systems. ACM, 2013.
[14] Teflioudi, Christina, Faraz Makari, and Rainer Gemulla. "Distributed Matrix Completion." ICDM. 2012.
[15] Lathia, Neal, Stephen Hailes, and Licia Capra. "Temporal collaborative filtering with adaptive neighbourhoods." Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2009.
[16] Ding, Yi, and Xue Li. "Time weight collaborative filtering." Proceedings of the 14th ACM international conference on Information and knowledge management. ACM, 2005.
[17] Liu, Nathan N., et al. "Online evolutionary collaborative filtering." Proceedings of the fourth ACM conference on Recommender systems. ACM, 2010.
[18] Sarwar, Badrul, et al. "Incremental singular value decomposition algorithms for highly scalable recommender systems." Fifth International Conference on Computer and Information Science. 2002.
[19] Wang, Jialei, et al. "Online multi-task collaborative filtering for on-the-fly recommender systems." Proceedings of the 7th ACM conference on Recommender systems. ACM, 2013.
[20] Ling, Guang, et al. "Online learning for collaborative filtering." Neural Networks (IJCNN), The 2012 International Joint Conference on. IEEE, 2012.
[21] Rendle, Steffen, and Lars Schmidt-Thieme. "Online-updating regularized kernel matrix factorization models for large-scale recommender systems." Proceedings of the 2008 ACM conference on Recommender systems. ACM, 2008.
[22] Diaz-Aviles, Ernesto, et al. "Real-time top-n recommendation in social streams." Proceedings of the sixth ACM conference on Recommender systems. ACM, 2012.