Correlation
by Richard Reid
(Reference "The Theory of Blackjack" by Dr. Peter Griffin)
The term Betting Correlation is often mentioned when talking about card counting systems. What does the term correlation mean? This article is intended to answer the question and will give the details necessary to understand most of the basics of correlation.
Correlation is the relationship between two variables. The correlation coefficient is a measure of the degree to which these two variables are linearly related.
For instance, let's say we have measured the overall effect of removal for each denomination of cards in a single deck and wish to know the degree of relationship that exists between a point count system and the effects of removal.
The statistical technique appropriate to this inquiry is correlation. Technically, we would say that we have scores on the variables of "effects of removal" and "point count tag values" and wish to determine the degree of correlation between these two variables.
Suppose we are using the High-Low card counting system in a single-deck, S17 (dealer stands on soft 17), NDAS (no double after split) game.
| Card Value | Effects of Removal | Point Count Tag Value |
|---|---|---|
| Two | 0.38 | +1 |
| Three | 0.44 | +1 |
| Four | 0.55 | +1 |
| Five | 0.69 | +1 |
| Six | 0.46 | +1 |
| Seven | 0.28 | 0 |
| Eight | 0.00 | 0 |
| Nine | -0.18 | 0 |
| Ten | -0.51 | -1 |
| Ace | -0.61 | -1 |
Table 1
Comparing the signs of the values in Table 1 (positive, zero and negative), we can see that the "effects of removal" and the "point count tag values" are somewhat related, but not exactly related. We will show how to determine the degree of relationship later on, but for now let's take a look at a scatter diagram (Figure 1) that illustrates a perfect positive correlation. Notice that every data point in Figure 1 lies exactly on the straight line. It is called "perfect" because the amount of increase in scores on one variable is exactly proportional to the amount of increase in scores on the other variable, with no exceptions. It is called "positive" because an increase in scores on one variable is associated with an increase in scores on the other variable.
Now, let's take a look at a scatter diagram (Figure 2) that illustrates a perfect negative correlation. It is evident from the diagram that "Favorability" is inversely related to the "Perfect Negative Point Count." Statistically, we would say there is a perfect negative correlation between these two variables. Again, it is "perfect" because the dots all fall exactly on a straight line, so the amount of increase on one variable is exactly proportional to the amount of decrease on the other variable. It is "negative" because the two variables are inversely related.
The coefficient of correlation for the perfect positive correlation shown in Figure 1 is r = 1.00. The coefficient for the perfect negative correlation shown in Figure 2 is r = -1.00. These two values are the extreme values for r; every correlation coefficient lies between them. The sign of the correlation coefficient indicates whether the correlation is positive or negative. The coefficient that indicates no degree of linear correlation is r = 0.00. This is the case when scores on one variable are not linearly related in any way to scores on the other variable, as in Figure 3.
A word of caution is needed here on the interpretation of the meaning of correlation. Correlation indicates the amount of relationship between two variables. This should not be taken to mean that there is necessarily any causal relationship between them. Correlation does not imply that, because two variables are related, one is causing the other.
Another caution should also be observed in interpreting correlation coefficients. Because they are coefficients, they cannot be interpreted as percentages of agreement. A coefficient of r = 0.30 does not indicate that there is a 30% agreement between the two sets of scores. Also, it is not proper to say that a correlation of r = 0.40 is twice as strong as a correlation of r = 0.20 just because the coefficient is twice as large. Coefficients of correlation are useful only in judging relative strengths of association and for indicating significant relationships between two variables.
The formula for determining the correlation coefficient in Blackjack is based on Pearson's product-moment correlation coefficient, usually denoted r. It is a measure of linear association and so is unsuitable if the relationship is non-linear. It is calculated from the formula:
$$r = \frac{s_{xy}}{s_x \, s_y}$$
where $s_x$ and $s_y$ are the standard deviations of x and y, and $s_{xy}$ is the covariance of x and y.
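The covariance-over-standard-deviations formula can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the sample data are assumptions made for the example and are not taken from the article or its figures. The data are chosen to lie exactly on a straight line, so the results should come out at the two extreme values of r.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed as covariance(x, y) / (std(x) * std(y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations. The 1/n factors cancel
    # in the ratio, so the sample (n - 1) versions would give the same r.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return s_xy / (s_x * s_y)

# Made-up illustrative data (hypothetical, not the article's figures).
count = [-2, -1, 0, 1, 2]
rising = [0.5 * c for c in count]     # exactly proportional to count
falling = [-0.5 * c for c in count]   # exactly inversely proportional

r_pos = pearson_r(count, rising)      # perfect positive correlation
r_neg = pearson_r(count, falling)     # perfect negative correlation
print(round(r_pos, 6), round(r_neg, 6))
```

Because of floating-point rounding, the results may differ from +1 and -1 in the last few digits, which is why the example rounds before printing.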
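In Blackjack, given that $C_i$ are the "Point Count Tag Values" and $P_i$ are the "Effects of Removal" (see Table 1),
correlation = A / B, where
$A = \sum_i C_i P_i$ and
$B = \sqrt{\sum_i C_i^2 \cdot \sum_i P_i^2}$
The sums run over all 13 card ranks, so the "Ten" row of Table 1, which stands for the 10, Jack, Queen and King, is counted four times.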
Example:
Let's use the data in Table 1 to determine the Betting Correlation of the High-Low counting system.
$A = \sum_i C_i P_i$
$A = (0.38 \times 1) + (0.44 \times 1) + (0.55 \times 1) + (0.69 \times 1) + (0.46 \times 1) + (0.28 \times 0) + (0.00 \times 0) + (-0.18 \times 0) + 4(-0.51 \times -1) + (-0.61 \times -1)$
$A = 5.17$
$\sum_i C_i^2 = 1^2 + 1^2 + 1^2 + 1^2 + 1^2 + 0^2 + 0^2 + 0^2 + 4(-1)^2 + (-1)^2$
$\sum_i C_i^2 = 1 + 1 + 1 + 1 + 1 + 0 + 0 + 0 + 4 + 1$
$\sum_i C_i^2 = 10$
$\sum_i P_i^2 = 0.38^2 + 0.44^2 + 0.55^2 + 0.69^2 + 0.46^2 + 0.28^2 + 0.00^2 + (-0.18)^2 + 4(-0.51)^2 + (-0.61)^2$
$\sum_i P_i^2 = 0.1444 + 0.1936 + 0.3025 + 0.4761 + 0.2116 + 0.0784 + 0 + 0.0324 + 1.0404 + 0.3721$
$\sum_i P_i^2 = 2.8515$
$B = \sqrt{\sum_i C_i^2 \cdot \sum_i P_i^2}$
$B = \sqrt{10 \times 2.8515}$
$B = 5.33994$
Correlation = A / B = 5.17 / 5.33994 = 0.968
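The hand calculation can be checked with a short Python sketch that uses the Table 1 values. The function names and the data layout are assumptions made for this illustration. The second function is not part of the article's procedure: it computes the textbook Pearson coefficient on the same 13 ranks after subtracting the means, to show why the shorter A / B form works here. The High-Low tags sum to zero and the effects of removal in Table 1 sum to nearly zero, so the two results should differ only slightly.

```python
import math

# Table 1 as (rank, effect of removal P_i, High-Low tag C_i, multiplicity).
# "Ten" stands for the four ranks 10, J, Q and K, hence multiplicity 4.
TABLE_1 = [
    ("Two",    0.38,  1, 1),
    ("Three",  0.44,  1, 1),
    ("Four",   0.55,  1, 1),
    ("Five",   0.69,  1, 1),
    ("Six",    0.46,  1, 1),
    ("Seven",  0.28,  0, 1),
    ("Eight",  0.00,  0, 1),
    ("Nine",  -0.18,  0, 1),
    ("Ten",   -0.51, -1, 4),
    ("Ace",   -0.61, -1, 1),
]

def betting_correlation(rows):
    """The article's A / B formula, with sums over all 13 ranks."""
    a = sum(m * c * p for _rank, p, c, m in rows)
    sum_c2 = sum(m * c * c for _rank, p, c, m in rows)
    sum_p2 = sum(m * p * p for _rank, p, c, m in rows)
    return a / math.sqrt(sum_c2 * sum_p2)

def centered_pearson(rows):
    """Textbook Pearson r on the 13 ranks, subtracting the means first."""
    ps = [p for _rank, p, c, m in rows for _ in range(m)]
    cs = [c for _rank, p, c, m in rows for _ in range(m)]
    mean_p = sum(ps) / len(ps)
    mean_c = sum(cs) / len(cs)
    cov = sum((c - mean_c) * (p - mean_p) for c, p in zip(cs, ps))
    var_c = sum((c - mean_c) ** 2 for c in cs)
    var_p = sum((p - mean_p) ** 2 for p in ps)
    return cov / math.sqrt(var_c * var_p)

bc = betting_correlation(TABLE_1)
r = centered_pearson(TABLE_1)
print(f"A/B formula: {bc:.3f}   centered Pearson: {r:.3f}")
```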
From these calculations we now know that the Betting Correlation of the High-Low counting system in a single deck, S17, NDAS game is approximately 0.968. The Betting Correlation for other balanced card counting systems can be determined by using the same procedure with the appropriate point count tag values. The Betting Correlation of the High-Low counting system (and of other systems) for games that use other rules or a greater number of decks can likewise be determined by using the appropriate "effects of removal" for the game in question, although for a given system the value does not differ greatly from game to game.
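As a sketch of how the same procedure carries over to another balanced system, the Python below swaps in a different set of tag values while keeping the Table 1 effects of removal. The Hi-Opt I tags used here (+1 for Three through Six, -1 for tens, 0 otherwise) are not given in this article; they are included only as an assumed illustration, and the names `EOR`, `RANKS_PER_KEY`, `is_balanced` and `betting_correlation` are hypothetical. For a different rule set or number of decks, the `EOR` values would have to be replaced with the effects of removal for that game, which are not reproduced here.

```python
import math

# Effects of removal from Table 1; the key "T" covers 10, J, Q and K.
EOR = {"2": 0.38, "3": 0.44, "4": 0.55, "5": 0.69, "6": 0.46,
       "7": 0.28, "8": 0.00, "9": -0.18, "T": -0.51, "A": -0.61}
RANKS_PER_KEY = {key: (4 if key == "T" else 1) for key in EOR}

# High-Low tags as listed in Table 1.
HIGH_LOW = {"2": 1, "3": 1, "4": 1, "5": 1, "6": 1,
            "7": 0, "8": 0, "9": 0, "T": -1, "A": -1}
# Hi-Opt I tags: an assumption for illustration, not from the article.
HI_OPT_I = {"2": 0, "3": 1, "4": 1, "5": 1, "6": 1,
            "7": 0, "8": 0, "9": 0, "T": -1, "A": 0}

def is_balanced(tags):
    """A balanced count has tags that sum to zero over the 13 ranks."""
    return sum(RANKS_PER_KEY[k] * tags[k] for k in tags) == 0

def betting_correlation(tags, eor=EOR):
    """A / B from the article, weighting the ten-valued ranks four times."""
    a = sum(RANKS_PER_KEY[k] * tags[k] * eor[k] for k in eor)
    sum_c2 = sum(RANKS_PER_KEY[k] * tags[k] ** 2 for k in eor)
    sum_p2 = sum(RANKS_PER_KEY[k] * eor[k] ** 2 for k in eor)
    return a / math.sqrt(sum_c2 * sum_p2)

results = {}
for name, tags in [("High-Low", HIGH_LOW), ("Hi-Opt I", HI_OPT_I)]:
    results[name] = betting_correlation(tags)
    print(f"{name}: balanced={is_balanced(tags)}, "
          f"betting correlation={results[name]:.3f}")
```

Passing the tag values as a dictionary keeps the effects of removal and the counting system separate, so either one can be replaced without touching the other.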