Kalman filter primer: derivations and golang implementation

A word for anyone who doesn’t know the filter yet: it’s a simple but powerful algorithm that combines estimates and measurements to produce better predictions for some linear systems.

I learned it while studying self-driving cars, during which I revisited a lot of linear algebra. So I wrote my notes down here, and I hope this will be the easiest-to-understand Kalman filter primer you come across.

First of all, let’s assume a linear system which is modelled by the following two equations:

X_t = F_t X_{t-1} + B_t U_t + W_t        (1)
Z_t = H_t X_t + V_t                      (2)


  • Xt (n*1) is the state vector of the process at time t, e.g., [position, velocity] of a moving car
  • Ut is the control input vector at time t, e.g., the braking or acceleration applied to the car
  • Ft (n*n) and Bt are transition matrices at time t
  • Zt (n*1) is the measurement of Xt at time t, and Ht (n*n) is the measurement matrix at time t
  • Wt (n*1) is the process (estimation) noise vector at time t; it is normally distributed with covariance Qt (n*n)
  • Vt (n*1) is the measurement noise vector at time t; it is also normally distributed, with covariance Rt (n*n)

By linear, we mean its dynamics can be described by linear functions or transitions. For instance, we can describe the position (s) and velocity (v) of a car as follows (a is acceleration):

s_t = s_{t-1} + v_{t-1} Δt + (1/2) a Δt^2
v_t = v_{t-1} + a Δt

Then its state vector can be described as:

[s_t]   [1  Δt] [s_{t-1}]   [Δt^2/2]
[v_t] = [0   1] [v_{t-1}] + [Δt    ] a

The state vector at time t is a linear function of the state vector at time t-1. That is exactly what we mean by linear.
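To make the model concrete, here is a small Go sketch of this constant-acceleration car model. The function name `step` and the sample numbers are mine, chosen purely for illustration:

```go
package main

import "fmt"

// step advances the state [s, v] by one interval dt under constant
// acceleration a, i.e., it computes x_t = F·x_{t-1} + B·a with
//   F = [[1, dt], [0, 1]]  and  B = [dt²/2, dt].
func step(s, v, a, dt float64) (float64, float64) {
	sNext := s + v*dt + 0.5*a*dt*dt // first row of F·x + B·a
	vNext := v + a*dt               // second row
	return sNext, vNext
}

func main() {
	s, v := 0.0, 10.0 // start at position 0 m with velocity 10 m/s
	for t := 0; t < 3; t++ {
		s, v = step(s, v, 2.0, 1.0) // acceleration 2 m/s², dt = 1 s
	}
	fmt.Printf("s=%.1f v=%.1f\n", s, v) // s=39.0 v=16.0
}
```

Each call to `step` is one application of the linear transition, which is all the "prediction" half of the filter does.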

Now we come to the prediction equations:

\hat{X}^-_t = F_t \hat{X}_{t-1} + B_t U_t        (3)
P^-_t = F_t P_{t-1} F_t^T + Q_t                  (4)

X with a hat and superscript ‘-’ in (3) is the prior estimate of the state vector; that is to say, we get it by prediction alone. X with a hat at t-1 in (3) is the fused state vector at time t-1 (you will see how we fuse prediction and measurement in the update equations). P with superscript ‘-’ in (4) is the covariance of our estimation error relative to the unknown true X, and P at t-1 in (4) is the fused error covariance at time t-1.

How do we get (4)? Let’s derive it by definition:

P^-_t = E[(X_t - \hat{X}^-_t)(X_t - \hat{X}^-_t)^T]        (8)

This covariance is just the expected value of the outer product of the estimation error, the difference between the true state vector and our prior estimate, so (8) should be easy to understand.

Substitution of (1) and (3) into (8) gives:

X_t - \hat{X}^-_t = F_t (X_{t-1} - \hat{X}_{t-1}) + W_t

P^-_t = E[(F_t (X_{t-1} - \hat{X}_{t-1}) + W_t)(F_t (X_{t-1} - \hat{X}_{t-1}) + W_t)^T]
      = F_t E[(X_{t-1} - \hat{X}_{t-1})(X_{t-1} - \hat{X}_{t-1})^T] F_t^T + E[W_t W_t^T]
      = F_t P_{t-1} F_t^T + Q_t

Please note that Wt and the state prediction error are independent of each other, so their covariance is the zero matrix. Ft can be treated as a constant matrix, so its expectation is itself. You should also know some basic probability and linear algebra, mainly:

E[C] = C,  E[C X] = C E[X],  E[X + Y] = E[X] + E[Y]
(A B)^T = B^T A^T,  (A + B)^T = A^T + B^T

C is a constant matrix, and the superscript ‘T’ denotes the transpose. These are what you should know before proceeding to the next stage, so review them first if needed.
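One of the transpose identities above, (A B)^T = B^T A^T, is easy to sanity-check numerically. A throwaway Go check (the `mat` type and helper functions are mine, purely for illustration):

```go
package main

import "fmt"

// mat is a tiny 2x2 matrix type; Go arrays of float64 compare with ==.
type mat [2][2]float64

// mul returns the matrix product a·b.
func mul(a, b mat) mat {
	var c mat
	for i := 0; i < 2; i++ {
		for j := 0; j < 2; j++ {
			for k := 0; k < 2; k++ {
				c[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return c
}

// transpose swaps rows and columns.
func transpose(a mat) mat {
	return mat{{a[0][0], a[1][0]}, {a[0][1], a[1][1]}}
}

func main() {
	a := mat{{1, 2}, {3, 4}}
	b := mat{{5, 6}, {7, 8}}
	// (AB)^T should equal B^T A^T.
	fmt.Println(transpose(mul(a, b)) == mul(transpose(b), transpose(a))) // true
}
```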

Finally we come to the update equations:

\hat{X}_t = \hat{X}^-_t + K_t (Z_t - H_t \hat{X}^-_t)        (5)
P_t = (I - K_t H_t) P^-_t                                    (6)
K_t = P^-_t H_t^T (H_t P^-_t H_t^T + R_t)^{-1}               (7)

From prediction we got the prior X with superscript ‘-’, and from measurement we got Zt. How do we fuse them to get a better value, one closer to the true X? Equation (5) provides a weighted approach with a nice property: when the prediction error is larger, the fused X is closer to the measurement; otherwise it is closer to the prediction.

Pt is the error covariance between the fused X and the unknown true X. Our objective is to minimize Pt, specifically the trace of Pt (the trace of a matrix is the sum of the elements along its main diagonal), so that the fused X is as close to the true X as possible. Kt (n*n) is the optimal weight factor, known as the Kalman gain, which minimizes that trace. Let’s derive it step by step.
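The weighting behavior described above is easiest to see in one dimension, where every matrix collapses to a number and H = 1. A scalar sketch (the function name `fuse` and the sample values are mine, for illustration):

```go
package main

import "fmt"

// fuse combines a prior estimate xPrior (with error variance pPrior) and a
// measurement z (with noise variance r) using the scalar form of (5) and (7)
// with H = 1.
func fuse(xPrior, pPrior, z, r float64) float64 {
	k := pPrior / (pPrior + r) // scalar Kalman gain, equation (7)
	return xPrior + k*(z-xPrior)
}

func main() {
	// Prediction says 10, measurement says 20.
	fmt.Printf("%.2f\n", fuse(10, 100, 20, 1)) // large prior variance → 19.90, near the measurement
	fmt.Printf("%.2f\n", fuse(10, 1, 20, 100)) // small prior variance → 10.10, near the prediction
}
```

When the prior variance dominates, the gain approaches 1 and the fused value follows the measurement; when the measurement noise dominates, the gain approaches 0 and the fused value stays with the prediction.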

Still by definition:

P_t = E[(X_t - \hat{X}_t)(X_t - \hat{X}_t)^T]        (9)

Substitution of (1), (2) and (5) into (9) gives:

X_t - \hat{X}_t = (I - K_t H_t)(X_t - \hat{X}^-_t) - K_t V_t

P_t = (I - K_t H_t) P^-_t (I - K_t H_t)^T + K_t R_t K_t^T
    = P^-_t - K_t H_t P^-_t - P^-_t H_t^T K_t^T + K_t (H_t P^-_t H_t^T + R_t) K_t^T        (10)

It’s similar to the derivation of the prior error covariance, so we won’t explain it further.

Now the derivative of tr(Pt) with respect to Kt:

d tr(P_t) / d K_t = -2 (H_t P^-_t)^T + 2 K_t (H_t P^-_t H_t^T + R_t)        (11)

Note that tr(Pt) is a scalar while Kt is a matrix, and the derivative is a matrix too. In fact, tr(Pt) is a function of the main-diagonal elements, and the derivative matrix is just the partial derivative of tr(Pt) with respect to each element of Kt. The equation above also uses the following matrix trace properties:

d tr(A B) / d A = B^T
d tr(A B A^T) / d A = 2 A B    (B symmetric)
tr(A) = tr(A^T)

The second term in (10) is the transpose of the third term in (10) (P with superscript ‘-’ is symmetric), so their traces are the same, and so are their derivatives, as shown below.

tr(K_t H_t P^-_t) = tr((K_t H_t P^-_t)^T) = tr(P^-_t H_t^T K_t^T)
d tr(K_t H_t P^-_t) / d K_t = d tr(P^-_t H_t^T K_t^T) / d K_t = (H_t P^-_t)^T

Setting (11) to 0, we finally get the optimal K in (7). The superscript ‘-1’ denotes the matrix inverse.

Then the minimized Pt is given by:

K_t (H_t P^-_t H_t^T + R_t) K_t^T = P^-_t H_t^T K_t^T

P_t = P^-_t - K_t H_t P^-_t - P^-_t H_t^T K_t^T + P^-_t H_t^T K_t^T
    = (I - K_t H_t) P^-_t

which is just described in (6).

Well done! We have completed the derivation of the Kalman filter. Now let’s put the five equations together:

Prediction:
\hat{X}^-_t = F_t \hat{X}_{t-1} + B_t U_t            (3)
P^-_t = F_t P_{t-1} F_t^T + Q_t                      (4)

Update:
K_t = P^-_t H_t^T (H_t P^-_t H_t^T + R_t)^{-1}       (7)
\hat{X}_t = \hat{X}^-_t + K_t (Z_t - H_t \hat{X}^-_t) (5)
P_t = (I - K_t H_t) P^-_t                            (6)

That’s all there is to it. The Kalman filter is iterative, and it’s easy to implement the algorithm following the equations above. You don’t strictly need the derivations just to apply it, but knowing them certainly helps in building sound applications 😃.

Finally, an implementation in golang for your reference:
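As a minimal starting point, here is a one-dimensional sketch of the five equations in Go. Everything collapses to scalars (F = H = 1, no control input), and the struct name and noise values are illustrative, not prescriptive:

```go
package main

import "fmt"

// Kalman holds the state of a 1-D filter: the fused estimate x with its
// error variance p, plus the process noise q (Q) and measurement noise r (R).
type Kalman struct {
	x, p float64
	q, r float64
}

// Update runs one predict/update cycle for a new measurement z.
func (k *Kalman) Update(z float64) float64 {
	// Prediction, equations (3) and (4) with F = 1 and no control input:
	xPrior := k.x
	pPrior := k.p + k.q
	// Optimal gain, equation (7) with H = 1:
	gain := pPrior / (pPrior + k.r)
	// Fusion and covariance update, equations (5) and (6):
	k.x = xPrior + gain*(z-xPrior)
	k.p = (1 - gain) * pPrior
	return k.x
}

func main() {
	kf := &Kalman{x: 0, p: 1, q: 0.001, r: 0.1}
	// Noisy readings of a true value near 1; the estimate converges toward it.
	for _, z := range []float64{0.9, 1.1, 1.0, 0.95, 1.05} {
		fmt.Printf("%.3f\n", kf.Update(z))
	}
}
```

A full vector/matrix version follows the same shape; only the scalar arithmetic is replaced by matrix multiplication, transpose, and inverse.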

Enjoy learning!
