Generally, more than one curve of a given type will appear to fit a set of data. To avoid individual judgement in constructing lines, parabolas, or other approximating curves, it is necessary to agree on a definition of a "best-fitting line," "best-fitting parabola," etc.
To motivate a possible definition, consider Figure 1, in which the data points are (x1,y1), (x2,y2), . . ., (xn,yn). For a given value of x, say x1, there will be a difference between the value y1 and the corresponding value as determined from the curve. We denote this difference by d1; it is sometimes referred to as a deviation, error, or residual, and may be positive, negative, or zero. Similarly, corresponding to the values x2, . . ., xn we obtain the deviations d2, . . ., dn.
![]()
A measure of the "goodness of fit" of the curve in Figure 1 to the set of data is provided by the quantity d1^2 + d2^2 + . . . + dn^2. If this is small, the fit is good; if it is large, the fit is bad. We therefore make the following
Definition:
Of all curves approximating a given set of data points, the curve having the property that
d1^2 + d2^2 + . . . + dn^2 = a minimum
is the best-fitting curve.
A curve having this property is said to fit the data in the least-squares sense and is called a least-squares regression curve, or simply a least-squares curve.
Suppose we want the best-fitting straight line rather than a more general curve. This "regression" or "prediction" line has the form: y = a + bx
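
To make the goodness-of-fit measure concrete, the following minimal sketch scores two candidate lines of the form y = a + bx by their sum of squared deviations. The data points and the function name `sum_squared_deviations` are made up for illustration; they do not come from the text or from Figure 1.

```python
def sum_squared_deviations(xs, ys, a, b):
    """Return d1^2 + ... + dn^2 for the candidate line y = a + b*x.

    Each deviation is the observed y minus the value given by the line.
    """
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

# Candidate 1: y = 0 + 2x. Deviations are 0, 0, -1, 0.
score_1 = sum_squared_deviations(xs, ys, a=0.0, b=2.0)

# Candidate 2: y = 1 + 1x. Deviations are 0, 1, 1, 3.
score_2 = sum_squared_deviations(xs, ys, a=1.0, b=1.0)

print(score_1, score_2)  # prints 1.0 11.0
```

The first candidate has the smaller sum, so by the definition above it is the better fit of the two. The least-squares line is the one for which no other choice of a and b gives a smaller sum.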
Omitting the proof, the values of a and b that minimize the sum of squared deviations are:
a = y_bar - b * x_bar, where y_bar and x_bar are the means of {yi} and {xi} respectively, for i = 1 to n,
and
b = sum {(xi - x_bar)(yi - y_bar)} / sum {(xi - x_bar)^2}, for i = 1 to n
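
The two formulas above translate directly into code. The sketch below is one plain-Python way to compute the intercept a and slope b; the function name `fit_line` and the sample data are assumptions made for illustration, not part of the original text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Means of the x values and the y values.
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b = sxy / sxx            # slope
    a = y_mean - b * x_mean  # intercept
    return a, b


# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

a, b = fit_line(xs, ys)
print(f"y = {a:.3f} + {b:.3f}x")
```

For this sample data the means are 2.5 and 4.75, the slope works out to 9.5 / 5 = 1.9, and the intercept to 4.75 - 1.9 * 2.5 = 0, up to floating-point rounding. The check for equal x values is there because the denominator of the slope formula is zero in that case.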
In "The Theory of Blackjack", Dr. Peter Griffin uses the "method of least squares" to determine the best linear estimate of deck favorability for any subset of cards.