Jim commented on my entry about product recommendation that he’d like to get some more details about the “mystic math” behind the approach.
Ok, I’ll do my best to go a little more in depth. Meanwhile the CFCs are done as well, and I’ll post them in one of the next entries tomorrow.
So, let’s assume a very basic setting. We run an online store with 3 products and we have 3 customers for now. Not THAT big, is it?
Products:
1- A
2- B
3- C
Customers:
1, he bought 3xA, 5xB and 1xC
2, she bought 2xA, 1xB and 0xC
3, she bought 9xA, 0xB and 4xC
The resulting product vectors for the customers would be:
1: (3,5,1)
2: (2,1,0)
3: (9,0,4)
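For those who prefer code to lists, here is a minimal Python sketch of how the product vectors above could be built from purchase counts. The names `purchases`, `products` and `product_vector` are my own picks for illustration; a real store would read the counts from its order tables.

```python
# Purchase counts per customer, copied from the example above.
# Products a customer never bought are simply left out.
purchases = {
    1: {"A": 3, "B": 5, "C": 1},
    2: {"A": 2, "B": 1},
    3: {"A": 9, "C": 4},
}

# A fixed product order gives every vector the same component layout.
products = ["A", "B", "C"]


def product_vector(bought, products):
    """Return the purchase counts in product order, 0 for products never bought."""
    return tuple(bought.get(p, 0) for p in products)


vectors = {cid: product_vector(bought, products) for cid, bought in purchases.items()}

for cid, vec in vectors.items():
    print(cid, vec)
```

The only thing that matters is that every customer uses the same product order, otherwise the components of two vectors don't refer to the same product.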
For everyone without a background in linear algebra, linear equations and matrix calculations: since each product vector here has 3 entries, you can picture them as arrows in 3D space. The “arrow” of customer 1 points to the coordinates x=3, y=5 and z=1 in our 3D world, starting from an origin (0,0,0) that you have defined somewhere.
We further assume that all customers are very satisfied with their choices so far, so that a 9 in the first component of customer 3’s product vector means that she is very satisfied with product A (at least she bought it 9 times).
Now for the matrix. What we have to do is calculate a “similarity” value for each pair of customers. In real life I’d like to pick the best 10 (or 20 or 100 or 1000 or whatever) customers for a given visitor of my store. To be able to do that, I need a way to compare them.
The idea used here is the angle between the product vectors (between the “arrows”). If the angle is 90 degrees, the arrows obviously point in two totally different directions, whereas an angle of just 1 degree suggests that the two point in a very similar direction (that they “ARE” similar). A way to express that is to calculate the cosine of the angle: a cosine of 0 corresponds to an angle of 90 degrees, a cosine of 1 to an angle of 0 degrees. As purchase counts are never negative, the cosine always stays between 0 and 1 here.
The calculation is quite easy. Let’s take customer 1 and 2:
1: (3,5,1)
2: (2,1,0)
3: (9,0,4)
cos(alpha12)
= (3,5,1)*(2,1,0) / ( |(3,5,1)| * |(2,1,0)| )
= (3*2+5*1+1*0) / ( sqrt(3*3+5*5+1*1) * sqrt(2*2+1*1+0*0) )
= 11 / ( sqrt(35) * sqrt(5) )
= 11 / sqrt(175)
= 11 / 13.228
= 0.831
Same for the others:
cos(alpha13) = 31 / ( sqrt(35) * sqrt(97) ) = 0.532
cos(alpha23) = 18 / ( sqrt(5) * sqrt(97) ) = 0.817
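The same calculation as a small Python sketch, using only the standard library. The function name `cosine` is mine, and returning 0.0 for a customer without any purchases is an assumption I make here to avoid a division by zero; the example above doesn't cover that case.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two product vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    if len_a == 0 or len_b == 0:
        # Assumption: a customer who bought nothing is similar to nobody.
        return 0.0
    return dot / (len_a * len_b)


c1, c2, c3 = (3, 5, 1), (2, 1, 0), (9, 0, 4)

print(cosine(c1, c2))  # customers 1 and 2
print(cosine(c1, c3))  # customers 1 and 3
print(cosine(c2, c3))  # customers 2 and 3
```

This gives roughly 0.8315, 0.5320 and 0.8173, which agrees with the hand calculation apart from rounding in the last digit.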
So, the “matrix” is the following:
1 0.831 0.532
0.831 1 0.817
0.532 0.817 1
Obviously the diagonal is always 1, as the angle between a vector and itself is always 0, and obviously the matrix is mirrored along the diagonal because the cosine calculation is commutative (it doesn’t matter whether you calculate cos(a,b) or cos(b,a)).
Given this, we don’t have to deal with n*n sets of data; we can reduce it to (n*n+n)/2 sets of data (= rows in a database), or (n*n-n)/2 if we also skip the diagonal, which is always 1. In real life with a lot of products we have to deal with a high-dimensional space (one dimension per product), and the customers’ product vectors will contain a lot of zero entries (usually most customers buy several products, but not your whole stock of 10,000 different products and variants). So we can leave out all the zero entries to further reduce the amount of data.
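Both savings can be sketched in a few lines of Python. This is only one possible layout and not necessarily how you'd store it in a database: the dicts `sparse` and `rows` and the helpers `sparse_cosine` and `similarity` are names I made up for the illustration. The vectors keep only the products actually bought, and the similarities are kept once per customer pair, without the diagonal.

```python
import math
from itertools import combinations

# Sparse product vectors: zero entries are not stored at all.
sparse = {
    1: {"A": 3, "B": 5, "C": 1},
    2: {"A": 2, "B": 1},
    3: {"A": 9, "C": 4},
}


def sparse_cosine(a, b):
    """Cosine similarity of two sparse vectors given as {product: count} dicts."""
    # Only products present in both dicts contribute to the dot product.
    dot = sum(count * b[p] for p, count in a.items() if p in b)
    len_a = math.sqrt(sum(c * c for c in a.values()))
    len_b = math.sqrt(sum(c * c for c in b.values()))
    # Assumption: an empty vector is similar to nobody.
    return dot / (len_a * len_b) if len_a and len_b else 0.0


# One row per unordered pair (i < j): (n*n-n)/2 rows instead of n*n.
rows = {
    (i, j): sparse_cosine(sparse[i], sparse[j])
    for i, j in combinations(sorted(sparse), 2)
}


def similarity(i, j):
    """Look up a similarity; the diagonal and the mirrored half are implied."""
    if i == j:
        return 1.0
    return rows[(min(i, j), max(i, j))]


print(len(rows))         # 3 rows for 3 customers
print(similarity(2, 1))  # same value as similarity(1, 2)
```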
With the matrix, we’re nearly finished now.
Let’s say customer 1 enters the store again. We want to offer him the best product of the nearest customer (in terms of the cosine rating).
For customer 1, customer 2 is the “nearest one”, as she beats customer 3 by 0.831 to 0.532. So we look up what customer 2 has bought so far: 2 times product A and 1 time product B. In a real setting we would look up, for example, the best 10 customers and add up all their product vectors. I want to avoid recommending a product to customer 1 that he has already bought, so we set every value of the “summed” product vector to 0 where customer 1 has an entry in his own product vector.
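A rough Python sketch of the lookup and the summing, with the similarity values and product vectors from the example typed in by hand. The helpers `nearest` and `summed_vector` are hypothetical names of mine; with only 3 customers I ask for the best 1 or 2 instead of the best 10.

```python
# Similarity per customer pair (i < j), values taken from the matrix above.
similarity = {(1, 2): 0.831, (1, 3): 0.532, (2, 3): 0.817}
vectors = {1: (3, 5, 1), 2: (2, 1, 0), 3: (9, 0, 4)}


def nearest(customer, k):
    """Return the k customers most similar to `customer`, best first."""
    others = [c for c in vectors if c != customer]

    def score(other):
        return similarity[(min(customer, other), max(customer, other))]

    return sorted(others, key=score, reverse=True)[:k]


def summed_vector(customers):
    """Add up the product vectors of the given customers component by component."""
    return tuple(sum(column) for column in zip(*(vectors[c] for c in customers)))


print(nearest(1, 1))                 # [2]
print(summed_vector(nearest(1, 2)))  # (11, 1, 4)
```

Sorting all other customers is fine for a toy example; with many customers you would rather fetch the top rows straight from the stored similarity data.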
For example (as my numbers from above don’t work for it ;):
product vector of customer 1: (5,0,0,1,0,0,5)
summed product vector of the best 10 customers related to customer 1: (54,76,12,65,12,45,12)
The result would be: (0,76,12,0,12,45,0) and we’d present the products B,F,C and E (in this case in the order of the amount bought in total) to customer 1.
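The masking and ordering step as a short Python sketch. I'm assuming here that the 7 positions stand for products A to G, which matches the letters B, F, C and E used above; the variable names are mine.

```python
names = "ABCDEFG"  # assumption: position 1 is product A, position 7 is product G

own = (5, 0, 0, 1, 0, 0, 5)              # product vector of customer 1
summed = (54, 76, 12, 65, 12, 45, 12)    # summed vector of the best 10 customers

# Zero out every product customer 1 has already bought.
masked = tuple(0 if o else s for o, s in zip(own, summed))

# Keep the remaining products and order them by total amount bought.
# sorted() is stable, so equal amounts keep their product order (C before E).
ranked = sorted(
    (i for i, amount in enumerate(masked) if amount > 0),
    key=lambda i: masked[i],
    reverse=True,
)
recommend = [names[i] for i in ranked]

print(masked)     # (0, 76, 12, 0, 12, 45, 0)
print(recommend)  # ['B', 'F', 'C', 'E']
```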
I hope that this example helps to improve the understanding a bit. If there are further remarks or questions, just ask.
You are a crazy guy!
Sven, with a vector by my head *g