[CV] 14. Image Alignment (Transformation Estimation)

1. Recap

Previously, we have seen how local features are extracted from an image, using scale-invariant local feature descriptors such as and . After extracting local features individually from two images, the local features can be paired by searching the one with the highest similarity.

Figure 1. pairs of interest points independently extracted from two images of the same object, from [1]

More specifically, the basic matching algorithm is as follows:

(1) Detect interest points in two images using local feature descriptors.

(2) For each interest point, extract region (patch) and compute a local feature representation.

(3) Compare one feature from image 1 to every feature in image 2 and select one which shows the highest similarity.

(4) Repeat for all interest points in image 1.

(5) return the pairs of interest points.

What can we do with the pairs?

One application is to estimate the transformation between images so that images can be stitched (aligned).

Figure 2. Image stitching, from [1, 8]

From now on, this article will address how to find the transformation.

2. Warping VS Alignment

Before getting into the topic: how to find the image transformation, let’s first elaborate on what do we have and what do we want to do. For this, we need to know the difference between two tasks, and .

Figure 3. Definition of Warping and Alignment, from [1, 2]

Given an image 1 and the image transformation, is what an image 1 would be like after the transformation. On the contrary, is given two images and seeking for what is the transformation between two images?

As one might notice, is the task we want to solve by finding an appropriate transformation . Then how can we find such ?

Since the transformation is applied in pixel-wise mannel, we first need to find candidate points. Further, a points from one image should be matched to one point from the other image, and the found pairs are used to determine the transformation .

Figure 4. Matrix representation of the task: Finding an image transformation T (in matrix form, M), from [1, 9]

Fortunately, we already have the interest point pairs from local feature descriptors. Therefore, what is left is just to find the transformation T (or M, in matrix form).

But how?

3. Types of Transformation

There are three models for the transformation, which are Linear, Affine, and Projective transformation. Choosing which model to define the transformation between matching feature pairs is depends on the task we are working on. Although the projective transformation is the commonly selected model in general, we will take a look at each of them, how they are defined and how to find the values for parameters of the transformation matrices.

3.1 Linear Transformation

Linear transformations are combinations of Scale, Rotation, Shear, and Mirror operations. One property of Linear transform is that only the linear transformations can be represented as matrix. Other types of transformation need ‘Homogeneous Coordinate’ to represent them in a matrix form.

Summary of Linear Transformations, adapted from [1]

In order to fit the parameters of Linear Transformations according to the set of matching feature pairs (also called ‘correspondences’), the well known approach: is applied.

Complete picture of Least Squares Estimation to determine 2D Linear Transformation, adapted from [1]

Given the set of interest points and from image 1 and image 2, respectively, the goal is to find a proper linear function to predict from such that. As we have two unknown parameters, we need two equations to determine the unique solution. Therefore, we need to have at least two matching pairs of interest points.

However, it is common that we get more pairs than two out of two given images. In this case, we concatenate all pairs and apply pseudo inverse to find a solution (for the parameter and ) such that it minimizes the squared error.

Pseudo Inverse to determine the solution for x

3.2 Affine Transformation

Shortly, Affine transformation is the combination of any linear transformation and ‘’. One property to know is that the parallel lines remain parallel after Affine transformations. Further, as Affine transformation for a 2D point cannot be represented by matrix, due to the translation part, the Homogeneous coordinate is applied to represent the 2D Affine transformation by 3 matrix as follows.

Translation operation for a 2D point represented in Matrix form

All linear transformations represented using Homogeneous coordinate are shown below.

Now then how can he find the values for parameters () of Affine transformation?

As now we have 6 degree of freedom, we need three pairs of matching interest points to solve and find the unique solution.

In case there are more than the three pairs, can again be applied to find the solution.

3.3 Projective Transformation (Homography)

The following properties hold in Homography:

  • Rectangle should map to an arbitrary quadrilateral.
  • Straight lines are preserved.
Illustrated Projective Transformation, adapted from [1]

In order to fit a Homography estimating the transformation, we need slightly different approach from the one for previous transformations. Let’s start with the problem definition. Again, we have the matching correspondences, which is noted (A, B) in the following image, and 8 degree of freedom.

Two given images with matching pairs of interest points (A, B) from [1, 2]

The complete process to estimate the transformation matrix is illustrated in the figure below. As now the number of parameters is eight, we need 4 pairs to determine the unique solution. When getting into obtaining the solution that minimizes the least squared error for all found pairs (when number of pairs is larger than 4), one might notice we cannot apply the approachhere, as the rightside of the equation is zero (), and therefore we can only obtain a trivial solution with pseudo inverse.

Figure 5. Derivation of fitting a Homography, adapted from [1, 2]

Let’s take this equation from different perspective. A is a matrix and h is a vector. Recall the definition of eigenvector and eigenvalue. Then the of can be interpreted as finding the eigenvector corresponding to the eigenvalue of zero. While this approach seems nice, one problem remains that the matrix is likely not to be a square matrix.

Therefore, we can apply the singular vector decomposition (SVD) instead, and find the singular vector that corresponds to the singular value of zero.

FIgure 6. Solution to fitting a Homography using SVD, adapted from [1, 2]

The following figure is the result of stitched two images using the estimated homography.

Figure 7. Example of homography in real images, from [1]

Before I wrap up this article, one should keep in mind that the homography only holds for planar(flat) surface, such as walls not car, cat.

More specifically, according to [1], can model

  • transformation across images of any scene taken from the exact same camera position (center of projection)
  • transformation across images of a planar object taken from any camera position

These two cases are illustrated in the following.

Figure 8. Examples of two cases when Homography holds, from [12]

In case you need one for non-planar object, you can consider to use


In order to increase the accuracy and the quality of Image transformation, one can consider to apply RANSAC.

Please refer to here for more information at the moment.


[1] RWTH Aachen, computer vision group

[2] CS 376: Computer Vision of Prof. Kristen Grauman

[3] S. Seitz
[4] Forsyth & Ponce

[5] Prof. Svetlana Lazebnik

[6] Prof M. Irani and R. Basri

[7] Prof. Li fei-fei

[8] Denis Simakov

[9] Alexej Efros

[10] Prof. Krystian Mikolajczyk

[11] Homography1

[12] Homography2

[13] Saad J Bedros

[14] OpenCV



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store