During the holidays I read an interesting article on product recommendation systems for online shops. To be more precise, it was on the theory behind the algorithms and approaches used to filter the product set of an online shop down to those products a particular customer might find worth another look. If you ever bought something from Amazon, you know what I’m talking about… ;)
Well, I finally implemented a product recommendation algorithm in ColdFusion, and it’s really amazing how well it already works, given that I chose the simplest of all possible approaches to play with.
The approach is quite simple:
1. Create a vector (a CF array) containing the products the customer has already bought, for example (3,0,5,0,1,0), which means the customer bought product 1 three times, product 3 five times, product 5 once, etc.
2. Build a symmetric customer-by-customer matrix with each matrix entry containing a “metric” value between the two corresponding customer product vectors. The question is what this metric should look like. A simple approach is to calculate the cosine of the angle between the two vectors. As all vector components are positive, you can only get cosine values between 0 and 1, with 1 as the best value in the sense that the two vectors are as similar as possible. The formula is cos(alpha) = (a · b) / (|a| · |b|), where a · b is the dot product of the two vectors and |a| · |b| is just the product of their lengths, i.e. of two scalar values (see the cfscript sketch after this list).
3. Find the n customers nearest to the one you’re building the recommendation list for. That’s easy: just an array sort after a simple SELECT query.
4. Fetch the product vectors of the n customers selected in step 3, sum them up, and subtract the corresponding values of the target customer’s product vector. Take the m entries with the largest values and present those products to your customer (a sketch of steps 3 and 4 follows below).
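To make step 2 a bit more concrete, here is a minimal cfscript sketch of the cosine calculation. The function name and the assumption that both product vectors are plain 1-based CF arrays of equal length are mine for illustration, not part of the actual CFCs:

```cfm
<cfscript>
// Cosine of the angle between two customer product vectors.
// Assumes a and b are CF arrays of equal length holding non-negative purchase counts.
function cosineSimilarity(a, b) {
	var dotProduct = 0;
	var lengthA = 0;
	var lengthB = 0;
	var i = 0;
	for (i = 1; i LTE arrayLen(a); i = i + 1) {
		dotProduct = dotProduct + a[i] * b[i];
		lengthA = lengthA + a[i] * a[i];
		lengthB = lengthB + b[i] * b[i];
	}
	// A customer without any purchases has length zero; treat that as "not similar at all".
	if (lengthA EQ 0 OR lengthB EQ 0) {
		return 0;
	}
	// Sqr() is ColdFusion's square root function.
	return dotProduct / (sqr(lengthA) * sqr(lengthB));
}
</cfscript>
```

Two customers whose product vectors point in the same direction get a value of 1; customers with no products in common get 0.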
It works fine so far; I have nearly finished putting all the stuff into a set of CFCs and hopefully I’ll be able to put them online tomorrow as a demo.
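For steps 3 and 4, here is a rough cfscript sketch of how the pieces might hang together. The struct of product vectors, the list of neighbour ids and the naive top-m selection are just assumptions for the example, not the interface of the CFCs mentioned above:

```cfm
<cfscript>
// Build a recommendation list (product indexes) for one customer.
// productVectors is assumed to be a struct mapping customer id -> product vector (CF array),
// neighborIds the ids of the n nearest customers found in step 3.
function buildRecommendations(targetId, productVectors, neighborIds, maxProducts) {
	var target = productVectors[targetId];
	var neighborVector = 0;
	var scores = arrayNew(1);
	var result = arrayNew(1);
	var best = 0;
	var i = 0;
	var j = 0;
	// Start with a zero score for every product.
	for (j = 1; j LTE arrayLen(target); j = j + 1) {
		scores[j] = 0;
	}
	// Sum up the product vectors of the n nearest customers ...
	for (i = 1; i LTE arrayLen(neighborIds); i = i + 1) {
		neighborVector = productVectors[neighborIds[i]];
		for (j = 1; j LTE arrayLen(target); j = j + 1) {
			scores[j] = scores[j] + neighborVector[j];
		}
	}
	// ... and subtract what the target customer has already bought.
	for (j = 1; j LTE arrayLen(target); j = j + 1) {
		scores[j] = scores[j] - target[j];
	}
	// Take the m products with the largest remaining score (naive selection).
	for (i = 1; i LTE maxProducts; i = i + 1) {
		best = 1;
		for (j = 2; j LTE arrayLen(scores); j = j + 1) {
			if (scores[j] GT scores[best]) {
				best = j;
			}
		}
		arrayAppend(result, best);
		scores[best] = -999999; // take this product out of the running
	}
	return result;
}
</cfscript>
```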
There are several drawbacks to this approach:
1. It’s hard to build the product vector in step 1 as it’s some sort of implicit product recommendation vector (the customer buys something and I interpret this as a recommendation). It might also be an issue that the algorithm doesn’t work well for completely new customers who haven’t bought anything yet.
2. The cosine of the angle between two vectors is not a real “metric” in the mathematical sense. To be a metric, a distance function d has to fulfil three properties:
a) d(a,b) >= 0, and d(a,b) = 0 if and only if a = b
b) d(a,b) = d(b,a) (symmetry)
c) d(a,c) <= d(a,b) + d(b,c) (triangle inequality)
Another issue with the cosine pseudo-metric is that you usually end up with a lot of matrix entries containing a zero (the two corresponding product vectors are not similar at all), and there’s no way to differentiate those cases further. Additionally, the calculation is quite “expensive” in terms of building the dot product of two n-dimensional vectors, etc.
3. This approach creates a lot of data. For example, if you deal with 5000 customers you end up with a metric matrix of roughly 12,500,000 entries (database rows), since every pair of customers needs a value. At least the matrix is sparse, so if you translate it to a database you should end up with less data, as you can leave the zero values out (see the sketch below). Additionally, there are a lot of techniques for indexing that data in a database.
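To give an idea of how the sparseness could be exploited, here is a small tag-based sketch that only writes the non-zero entries of the upper triangle of the matrix to the database, re-using the cosineSimilarity function from the sketch above. The customer_similarity table, its columns and the datasource name are made up for the example:

```cfm
<!--- Persist only the non-zero entries of the upper triangle of the similarity matrix. --->
<cfloop index="i" from="1" to="#arrayLen(customerIds)#">
	<cfloop index="j" from="#i + 1#" to="#arrayLen(customerIds)#">
		<cfset similarity = cosineSimilarity(productVectors[customerIds[i]], productVectors[customerIds[j]])>
		<cfif similarity GT 0>
			<cfquery datasource="shop">
				INSERT INTO customer_similarity (customer_a, customer_b, similarity)
				VALUES (
					<cfqueryparam value="#customerIds[i]#" cfsqltype="cf_sql_integer">,
					<cfqueryparam value="#customerIds[j]#" cfsqltype="cf_sql_integer">,
					<cfqueryparam value="#similarity#" cfsqltype="cf_sql_double">
				)
			</cfquery>
		</cfif>
	</cfloop>
</cfloop>
```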
But besides that I’m quite fine with it 😉 Hopefully I’ll get the rest of the work done tomorrow…
If anyone is interested in this topic, I’d appreciate comments and suggestions. I have to dig further into this and find a better metric. The article provided some suggestions concerning data mining and data clustering which sound very interesting.
I don’t suppose you can explain this in simpler language for us non-math types? Especially step 2, and what does “nearest n customers” mean in this context?
AgentK,
I took an e-commerce course last year during my MS at North Carolina State University in which we had several lectures about recommendation systems, especially Collaborative Filtering (people-to-people correlation). Let me know if you need any additional information and I would be glad to send you my lecture notes and other references.
Thanks