Saturday, April 16, 2011

Gaussian Elimination Through Linear Least Squares

Ax = b, where A is an nxn matrix, x is an nx1, and it craps out an nx1 on the right side called b. We all know how to solve this system with Gaussian elimination and back-substitution. We also know that if we're trying to reduce A and we wind up with a row that is all 0's, we're in deep shit: A is singular, and if the corresponding entry of the reduced b is nonzero, then b lies outside the column space of A. You're asking a 2D plane to contain a 3D point (not gonna happen).
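Here's a quick sketch of both cases, using numpy as a stand-in for hand elimination (the matrices are made-up examples, not from anywhere in particular):

```python
import numpy as np

# A well-behaved 2x2 system: elimination works, a unique x exists.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 4.0])
x = np.linalg.solve(A, b)
print(x)  # [1. 1.]

# A singular A: the second row is 2x the first, so elimination
# produces a row of zeros and the rank drops below n.
A_bad = np.array([[1.0, 2.0],
                  [2.0, 4.0]])
print(np.linalg.matrix_rank(A_bad))  # 1, not 2
# For a b outside A_bad's column space, solve() raises LinAlgError.
```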

Now let's talk about another related problem: Ax = b, where A is an nx3, x is a 3x2, and b is an nx2. This is how I'm used to solving for the affine transformation matrix between two sets of points. The nx3 matrix A is all my points from one image in homogeneous coordinates, the nx2 matrix b is all my points from the other image, and I'm trying to solve for x: the affine matrix. If I tried to use Gaussian elimination between A and b (an nx3 and an nx2, where n > 3), I'd realize that A's three columns can't span the n-dimensional space b lives in: the system is overdetermined. It's hard to do Gaussian elimination when the matrices aren't square.

So if we can't solve for b, we can solve for proj(b, A) instead: the projection of b onto the column space of A. Then our solution is going to be the best x such that Ax = proj(b, A). As a reminder, proj(b, a) = (a.b/a.a)*a, and we're trying to minimize the distance between that projection and b. We can solve this with geometry or calculus. Geometry tends to be easier, but calculus will come back up for non-linear least squares.


We're solving for the new 'best' x which makes Ax = proj(b, A). We use the constraint that the columns of A and the error (b - A*x) must be orthogonal, since the error is exactly the part of b living outside A's column space that we couldn't solve for before. That orthogonality, A'(b - Ax) = 0, rearranges into the normal equations A'Ax = A'b, so x = (A'A)^-1 * A'b: about as optimal an x as we can possibly get.
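A numpy sketch of the normal equations on a made-up overdetermined system, checking the orthogonality constraint directly:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))   # tall: 10 equations, 3 unknowns
b = rng.standard_normal(10)

# Normal equations: A'(b - Ax) = 0  =>  x = (A'A)^-1 A'b
x = np.linalg.solve(A.T @ A, A.T @ b)

# The error is orthogonal to every column of A (up to roundoff).
err = b - A @ x
print(A.T @ err)   # ~ [0, 0, 0]

# Sanity check against the library's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))   # True
```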


**Note that when I solve for the affine, I usually think of it as points2 = Affine*points1, but to put it in terms of linear least squares, I have to put the affine in the x position of b = Ax. Matrix multiplication isn't commutative, but (AB)' = B'A'. That's why, in matlab, I usually have a lot of these:  Affine = (points1'\points2')'. 
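Here's that transpose trick in numpy (the point coordinates and affine are invented for illustration; points are stored as columns, the way the matlab line assumes, with lstsq playing the role of backslash):

```python
import numpy as np

# points1 is 3xn (homogeneous columns [x; y; 1]), points2 is 2xn,
# and points2 = Affine @ points1 for some 2x3 Affine.
points1 = np.array([[0.0, 1.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0, 1.0],
                    [1.0, 1.0, 1.0, 1.0]])
true_affine = np.array([[2.0, 0.0, 1.0],    # scale x by 2, shift x by 1
                        [0.0, 1.0, 3.0]])   # shift y by 3
points2 = true_affine @ points1

# matlab's Affine = (points1'\points2')': transpose both sides so the
# affine sits in the x slot of b = Ax, least-square it, transpose back.
x, *_ = np.linalg.lstsq(points1.T, points2.T, rcond=None)  # x is 3x2
affine = x.T                                               # back to 2x3
print(np.allclose(affine, true_affine))  # True
```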

Dot Products: What the hell do they mean?

First off, I think all the formulas and equations in linear algebra can be broken into two camps: stuff that is computable, and stuff that gives you some understanding of what's going on.

I submit to the jury, exhibit A:
a.b = a1*b1 + a2*b2 + a3*b3 + ... + an*bn
or
a.b = sqrt(a1^2 + a2^2 + ... + an^2) * sqrt(b1^2 + b2^2 + ... + bn^2) * cos(theta)
Computationally, this is really easy to do. You can crank this out on a test in a minute, then slam the test down at the front of the desk in a giant display of "Fuck ya."
But what about understanding? The dot product doesn't really relate to any intuitive meaning. For example, when I see the equation for projections that involve dot products, it doesn't make much sense to me (see the far right of the below derivation).


So the projection of a onto b   =   (||y||/||b||)*b, a fraction of b proportional to the shadow y that a casts along b   =   some fancy math...... => a term that's all dot products: (a.b/b.b)*b. 

The dot product equation only makes sense once you step back to that first (||y||/||b||)*b and realize that you can calculate ||y|| with some trig (||y|| = ||a||*cos(theta)), and then wrap that trig up in the definition of a dot product. 
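A tiny sanity check in Python (made-up vectors) showing that both camps' formulas land on the same number, and that the trig version of the projection collapses into the dot-product version:

```python
import math

a = [3.0, 4.0]
b = [2.0, 0.0]

# Camp 1: the computable formula, a1*b1 + a2*b2 + ...
dot = sum(ai * bi for ai, bi in zip(a, b))

# Camp 2: the geometric formula, ||a||*||b||*cos(theta)
norm_a = math.hypot(*a)
norm_b = math.hypot(*b)
theta = math.atan2(a[1], a[0]) - math.atan2(b[1], b[0])
geometric = norm_a * norm_b * math.cos(theta)
print(dot, geometric)   # both 6.0

# Projection of a onto b: (||y||/||b||)*b with ||y|| = ||a||*cos(theta),
# which is (a.b/b.b)*b once you swap in the dot product.
proj = [dot / (norm_b ** 2) * bi for bi in b]
print(proj)   # [3.0, 0.0]
```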

My big message is that dot products sometimes hide the REAL meaning of the operation. It's a tradeoff between ease of computation and true understanding, and any good linear algebra course should take care to draw a distinction between the equations that actually give you intuition and the equations that are easy to put into a machine. 

Friday, April 15, 2011

Jacobian: one letter off from the Jacobites (Non-Chalcedonian Syrians and 18th century British monarchists), and now a matrix! One of the most overloaded words in the English language: Webster should have thrown a COMPILER ERROR a long time ago.

My Old Nemesis: Linear Least Squares
I stumbled across this today and decided to build some more intuition on the topic, but first I had to figure out what the damn J was. After hunting around for an explanation, I realized that the Jacobian is just a gradient. Or, that's backwards: the gradient is a Jacobian! So as a recap, an image is just a function [x,y] |-> f(x,y), and the gradient turns the scalar f(x,y) into a 2-d vector via f(x,y) |-> [df/dx, df/dy]. All we're doing is taking partial derivatives: no brain surgery. The Jacobian matrix is a generalization of this principle.

Let's say we have an image I, such that [x,y] |-> x+y. This would look pretty boring:


and the gradient operation would just take x+y |-> [1, 1].
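You can see this numerically with numpy's gradient (numpy standing in for matlab here, grid size made up):

```python
import numpy as np

# Build the boring image I(x, y) = x + y on a small grid.
y, x = np.mgrid[0:5, 0:5]
I = (x + y).astype(float)

# np.gradient returns one array of partials per axis; since I is a
# plane, every partial derivative is constant: all ones.
dI_dy, dI_dx = np.gradient(I)
print(dI_dx)   # all 1's
print(dI_dy)   # all 1's
```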

Here's where wiki comes in: "[The Jacobian matrix] is the matrix of all first-order partial derivatives of a vector- or scalar-valued function with respect to another vector"


Isn't that what we just did? We took the Jacobian of f(x,y) with respect to the vector [x y]. In matlab, it would look like:

syms x y;
f = [x+y];
jacobian(f,[x y])

The jacobian operation pairs up its two vectors like an outer product, but instead of multiplying, it differentiates one with respect to the other, and the dimensionality grows the same way. In the code above, jacobian takes a 1x1 function and the 1x2 vector [x y] and spits out the 1x2 row vector [1, 1]; an mx1 function would give an mx2 matrix.
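The same idea works numerically without the symbolic toolbox. Here's a finite-difference sketch in Python (the helper `numerical_jacobian` and the second, vector-valued f are my own illustration, not from the post), showing the scalar case from the matlab snippet and then the generalization:

```python
import numpy as np

def numerical_jacobian(f, v, h=1e-6):
    """Central-difference Jacobian of f at v: rows pair with f's
    outputs, columns with v's entries, like an outer product."""
    v = np.asarray(v, dtype=float)
    f0 = np.asarray(f(v))
    J = np.zeros((f0.size, v.size))
    for j in range(v.size):
        step = np.zeros_like(v)
        step[j] = h
        J[:, j] = (np.asarray(f(v + step)) - np.asarray(f(v - step))) / (2 * h)
    return J

# The scalar case from the matlab snippet: f(x, y) = x + y.
print(numerical_jacobian(lambda v: [v[0] + v[1]], [2.0, 3.0]))
# ~ [[1. 1.]]

# A vector-valued f to show the generalization: f(x, y) = [x+y, x*y]
# has the 2x2 jacobian [[1, 1], [y, x]], here evaluated at (2, 3).
print(numerical_jacobian(lambda v: [v[0] + v[1], v[0] * v[1]], [2.0, 3.0]))
# ~ [[1. 1.], [3. 2.]]
```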