[CV] 14. Image Alignment (Transformation Estimation)

jun94
7 min read · Dec 23, 2020

1. Recap

Previously, we saw how local features are extracted from an image using scale-invariant local feature detectors and descriptors such as Harris-Laplace and SIFT. After extracting local features from two images independently, the features can be paired by searching for the counterpart with the highest similarity.

Figure 1. pairs of interest points independently extracted from two images of the same object, from [1]

More specifically, the basic matching algorithm is as follows:

(1) Detect interest points in both images using a local feature detector.

(2) For each interest point, extract a region (patch) and compute a local feature representation.

(3) Compare a feature from image 1 to every feature in image 2 and select the one with the highest similarity.

(4) Repeat step 3 for all interest points in image 1.

(5) Return the pairs of interest points.
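To make these steps concrete, here is a minimal sketch of the matching pipeline using OpenCV; the file names and the 0.75 ratio threshold are placeholder assumptions, not values from the lecture material.

```python
import cv2

# (1) + (2) Detect interest points and compute local feature descriptors
# in both images (grayscale input; file names are placeholders).
img1 = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# (3) + (4) For each descriptor in image 1, find the most similar descriptor
# in image 2; Lowe's ratio test filters out ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for match in matcher.knnMatch(des1, des2, k=2):
    if len(match) == 2 and match[0].distance < 0.75 * match[1].distance:
        good.append(match[0])

# (5) Return the matched pairs of interest point coordinates.
pairs = [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in good]
```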

What can we do with the pairs?

One application is to estimate the transformation between images so that images can be stitched (aligned).

Figure 2. Image stitching, from [1, 8]

From now on, this article will address how to find the transformation.

2. Warping vs. Alignment

Before getting into the topic of how to find the image transformation, let’s first clarify what we have and what we want to do. For this, we need to know the difference between two tasks: Warping and Alignment.

Figure 3. Definition of Warping and Alignment, from [1, 2]

Given image 1 and an image transformation, Warping asks what image 1 would look like after the transformation. In contrast, Alignment is given two images and asks: what is the transformation between them?
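As a small illustration of the warping direction (image and transformation given, output unknown), the sketch below applies a fixed rotation to an image with OpenCV; the file name and the 15° angle are arbitrary placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("image1.jpg")  # placeholder file name
h, w = img.shape[:2]

# A known transformation T: rotate by 15 degrees around the image center,
# written as a 3x3 matrix in homogeneous coordinates.
M = np.vstack([cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0), [0, 0, 1]])

# Warping: given the image and T, compute what the image looks like after T.
warped = cv2.warpPerspective(img, M, (w, h))
```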

As one might notice, Alignment is the task we want to solve, and we solve it by finding an appropriate transformation T. So how can we find such a T?

Since the transformation is applied in a pixel-wise manner, we first need to find candidate points. Furthermore, each point from one image should be matched to one point from the other image, and the resulting pairs are used to determine the transformation T.

Figure 4. Matrix representation of the task: Finding an image transformation T (in matrix form, M), from [1, 9]

Fortunately, we already have the interest point pairs from the local feature matching step. Therefore, all that is left is to find the transformation T (or M, in matrix form).

But how?

3. Types of Transformation

There are three models for the transformation: Linear, Affine, and Projective. Which model to use for the transformation between matching feature pairs depends on the task at hand. Although the projective transformation is the most commonly chosen model in general, we will take a look at each of them: how they are defined and how to find the values of the parameters of their transformation matrices.

3.1 Linear Transformation

Linear transformations are combinations of scale, rotation, shear, and mirror operations. One property to note is that only linear transformations can be represented as a 2⨯2 matrix; other types of transformation need homogeneous coordinates to be represented in matrix form.

Summary of Linear Transformations, adapted from [1]
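As a quick sanity check that these operations really are plain 2⨯2 matrices, here is a small NumPy sketch that builds scale, rotation, and shear matrices and applies their composition to a unit square; all parameter values are arbitrary.

```python
import numpy as np

theta = np.deg2rad(30)                          # arbitrary rotation angle
scale = np.array([[2.0, 0.0],                   # scale x by 2, y by 0.5
                  [0.0, 0.5]])
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
shear = np.array([[1.0, 0.3],                   # horizontal shear
                  [0.0, 1.0]])

# Linear transformations compose by matrix multiplication and act on column
# vectors; note there is no room for a translation term in a 2x2 matrix.
T = rotation @ scale @ shear

points = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)  # unit square
transformed = (T @ points.T).T
```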

In order to fit the parameters of a linear transformation to the set of matching feature pairs (also called ‘correspondences’), the well-known least squares estimation approach is applied.

Complete picture of Least Squares Estimation to determine 2D Linear Transformation, adapted from [1]

Given the sets of interest points X and X' from image 1 and image 2, respectively, the goal is to find a linear function that predicts X' from X, such that aX + b = X'. As we have two unknown parameters, we need two equations to determine a unique solution. Therefore, we need at least two matching pairs of interest points.

However, it is common to obtain more than two pairs from the two given images. In this case, we stack all pairs into one system of equations and apply the pseudo-inverse to find the solution (for the parameters a and b) that minimizes the squared error.

Pseudo Inverse to determine the solution for x
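A minimal sketch of this pseudo-inverse solution with NumPy, using the simple model aX + b = X' from above; the correspondence values are made-up placeholders.

```python
import numpy as np

# Correspondences (placeholder values): x[i] in image 1 matches x_prime[i] in image 2.
x = np.array([0.0, 1.0, 2.0, 3.0])
x_prime = np.array([0.9, 3.1, 4.9, 7.2])

# Model a*x + b = x'. Each pair contributes one row [x_i, 1] of the design
# matrix M, so with more than two pairs the system M @ [a, b] = x' is
# over-determined.
M = np.stack([x, np.ones_like(x)], axis=1)

# The pseudo-inverse gives the parameters that minimize the squared error
# (np.linalg.lstsq would return the same solution more stably).
a, b = np.linalg.pinv(M) @ x_prime
```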

3.2 Affine Transformation

In short, an Affine transformation is the combination of a linear transformation and a translation. One property to know is that parallel lines remain parallel after an Affine transformation. Furthermore, since a 2D Affine transformation cannot be represented by a 2⨯2 matrix due to the translation part, homogeneous coordinates are used to represent it as a 3⨯3 matrix, as follows.

Translation operation for a 2D point represented in Matrix form

All linear transformations represented using homogeneous coordinates are shown below.

Now, how can we find the values of the parameters (a, b, c, …, f) of the Affine transformation?

As we now have six degrees of freedom, we need three pairs of matching interest points to determine a unique solution.

In case there are more than three pairs, least squares estimation can again be applied to find the solution.
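A hedged NumPy sketch of this affine fit: each correspondence (x, y) → (x', y') contributes two equations in the six parameters a…f, and least squares handles the over-determined case; the example correspondences are placeholders. (OpenCV users could also call cv2.estimateAffine2D, which additionally rejects outliers.)

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares estimate of the 2x3 affine matrix mapping src to dst
    (requires at least 3 correspondences)."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # x' = a*x + b*y + c   and   y' = d*x + e*y + f
        rows.append([x, y, 1, 0, 0, 0])
        rows.append([0, 0, 0, x, y, 1])
        rhs.extend([xp, yp])
    params, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float),
                                 rcond=None)
    return params.reshape(2, 3)          # [[a, b, c], [d, e, f]]

# Placeholder correspondences; here the true motion is a translation by (2, 3).
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(2, 3), (3, 3), (2, 4), (3, 4)]
A = fit_affine(src, dst)
```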

3.3 Projective Transformation (Homography)

According to [1], a Homography is a mapping between any two perspective projections with the same center of projection, for example two planes in 3D along the same sight ray.

The following properties hold in Homography:

  • A rectangle maps to an arbitrary quadrilateral.
  • Straight lines are preserved.

Illustrated Projective Transformation, adapted from [1]

In order to fit a Homography that estimates the transformation, we need a slightly different approach from the one used for the previous transformations. Let’s start with the problem definition. Again, we have the matching correspondences, denoted (A, B) in the following image, and eight degrees of freedom.

Two given images with matching pairs of interest points (A, B) from [1, 2]

The complete process to estimate the transformation matrix is illustrated in the figure below. As the number of parameters is now eight, we need four pairs to determine a unique solution. When it comes to obtaining the solution that minimizes the squared error over all found pairs (when the number of pairs is larger than four), one might notice that we cannot apply the pseudo-inverse approach here: the right-hand side of the equation is zero (Ah = 0), so the pseudo-inverse would only yield the trivial solution.

Figure 5. Derivation of fitting a Homography, adapted from [1, 2]

Let’s look at the equation Ah = 0 from a different perspective. A is a matrix and h is a vector. Recalling the definition of eigenvectors and eigenvalues, solving Ah = 0 can be interpreted as finding the eigenvector h corresponding to the eigenvalue zero. While this approach seems nice, one problem remains: the matrix A is likely not a square matrix.

Therefore, we apply the singular value decomposition (SVD) instead, and take the singular vector h that corresponds to the smallest singular value (ideally zero).

Figure 6. Solution to fitting a Homography using SVD, adapted from [1, 2]
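A minimal NumPy sketch of this SVD-based solution (the direct linear transform); point normalization and other refinements are omitted, and the correspondences are placeholders.

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 homography H mapping src to dst from >= 4 pairs by
    taking the right singular vector of A with the smallest singular value."""
    A = []
    for (x, y), (xp, yp) in zip(src, dst):
        # Each correspondence contributes two rows of A in A @ h = 0.
        A.append([-x, -y, -1,  0,  0,  0, xp * x, xp * y, xp])
        A.append([ 0,  0,  0, -x, -y, -1, yp * x, yp * y, yp])
    A = np.array(A, dtype=float)

    # The singular vector with the smallest singular value minimizes ||A h||
    # subject to ||h|| = 1, avoiding the trivial solution h = 0.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                   # normalize the bottom-right entry to 1

# Placeholder correspondences (at least 4 pairs are required).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0.1, 0.2), (1.2, 0.1), (1.3, 1.1), (0.0, 1.0)]
H = fit_homography(src, dst)
```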

The following figure shows the result of stitching two images using the estimated homography.

Figure 7. Example of homography in real images, from [1]

Before I wrap up this article, keep in mind that a homography only holds for planar (flat) surfaces, such as walls, and not for non-planar objects such as cars or cats.

More specifically, according to [1], a Homography can model:

  • transformation across images of any scene taken from the exact same camera position (center of projection)
  • transformation across images of a planar object taken from any camera position

These two cases are illustrated in the following.

Figure 8. Examples of two cases when Homography holds, from [12]

In case you need a model for a non-planar object, you can consider using the ‘Fundamental Matrix’ instead.

4. RANSAC

In order to increase the accuracy and quality of the estimated image transformation, one can consider applying RANSAC (Random Sample Consensus), which repeatedly fits the model to small random subsets of correspondences and keeps the model supported by the most inliers, discarding outlier matches.

This section is planned to be updated.

In the meantime, please refer to here for more information.
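As a starting point, here is a minimal sketch of the usual recipe with OpenCV: cv2.findHomography can run RANSAC internally, so the homography is fitted only to the inlier correspondences. It assumes the pairs, img1, and img2 variables from the matching sketch in Section 1, and the 5-pixel reprojection threshold is an arbitrary choice.

```python
import cv2
import numpy as np

# Matched interest point coordinates from the two images (the `pairs` list
# from the matching sketch above), reshaped as OpenCV expects.
pts1 = np.float32([p for p, _ in pairs]).reshape(-1, 1, 2)
pts2 = np.float32([q for _, q in pairs]).reshape(-1, 1, 2)

# RANSAC repeatedly fits a homography to random minimal subsets (4 pairs) and
# keeps the model with the most inliers (reprojection error below 5.0 px here).
H, inlier_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 5.0)

# Warp image 1 into image 2's frame using the robustly estimated homography.
aligned = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))
```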

Reference

[1] RWTH Aachen, Computer Vision Group

[2] CS 376: Computer Vision, Prof. Kristen Grauman

[3] S. Seitz

[4] Forsyth & Ponce

[5] Prof. Svetlana Lazebnik

[6] Prof. M. Irani and R. Basri

[7] Prof. Li Fei-Fei

[8] Denis Simakov

[9] Alexei Efros

[10] Prof. Krystian Mikolajczyk

[11] Homography1

[12] Homography2

[13] Saad J Bedros

[14] OpenCV
