Question
Given a 2D scatter plot with n points, you want to draw a horizontal line such that the sum of the squared perpendicular distances between the line and the points is minimised.
Answer
Intuitively, you would take the mean of the y-coordinates and draw the horizontal line at that height, so that it crosses the y-axis at the mean.
Can we prove this a little more formally?
You want to find the value of $k$ that minimises $\sum_{i=1}^{n} (y_i - k)^2$, the sum of the squared perpendicular distances from the points to the line $y = k$. Differentiate with respect to $k$:
$$ \frac{d}{dk} \sum_{i=1}^{n} (y_i - k)^2 $$
$$ = \frac{d}{dk} \sum_{i=1}^{n} (y_i^2 + k^2 - 2 y_i k) $$
$$ = \sum_{i=1}^{n} (2k - 2 y_i) $$
To find the minimum, set this to 0 and solve:
$$ 0 = \sum_{i=1}^{n} (2k - 2 y_i) $$
$$ \Rightarrow 0 = n k - \sum_{i=1}^{n} y_i $$
$$ \Rightarrow k = \frac{\sum_{i=1}^{n} y_i}{n} $$
The second derivative is $2n$, which is positive, so this stationary point is a minimum.
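As a quick numerical sanity check of the result, here is a minimal Python sketch. The y-values, the helper name `sum_squared_distances`, and the search grid are all made up for illustration; the sketch simply compares the mean against a brute-force search over candidate heights.

```python
from statistics import mean


def sum_squared_distances(ys, k):
    """Sum of squared perpendicular distances from the points to the line y = k."""
    return sum((y - k) ** 2 for y in ys)


# Hypothetical y-coordinates, chosen only for illustration.
ys = [2.0, 3.5, 1.0, 4.5, 4.0]

# The closed-form answer from the derivation: the mean of the y-coordinates.
k_best = mean(ys)

# Brute-force check: try candidate heights 0.00, 0.01, ..., 6.00
# and keep the one with the smallest cost.
candidates = [i / 100 for i in range(0, 601)]
k_grid = min(candidates, key=lambda k: sum_squared_distances(ys, k))

print(k_best, k_grid)
```

On this data both approaches should agree on the same height. The grid search is only a check, and it can only find the mean exactly when the mean happens to lie on the grid.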