# [CV] 14. Image Alignment (Transformation Estimation)

# 1. Recap

Previously, we have seen how local features are extracted from an image using scale-invariant local feature descriptors such as *Harris-Laplace* and *SIFT*. After extracting local features individually from two images, the local features can be paired by searching for the one with the highest similarity.

More specifically, the basic matching algorithm is as follows:

(1) Detect interest points in two images using local feature descriptors.

(2) For each interest point, extract region (patch) and compute a local feature representation.

(3) Compare one feature from image 1 to every feature in image 2 and select the one with the highest similarity.

(4) Repeat *step 3* for all interest points in image 1.

(5) Return the pairs of interest points.
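The steps above can be sketched with plain NumPy, assuming the interest points have already been detected and described (e.g. as 128-D SIFT vectors); the arrays `desc1` and `desc2` and the helper name `match_features` are illustrative, not from the original text:

```python
import numpy as np

def match_features(desc1, desc2):
    """Pair each descriptor from image 1 with its most similar one in image 2.

    desc1: (N, D) array of local feature descriptors from image 1.
    desc2: (M, D) array of local feature descriptors from image 2.
    Returns a list of (i, j) index pairs.
    """
    pairs = []
    for i, d in enumerate(desc1):
        # Euclidean distance to every descriptor of image 2; the smallest
        # distance corresponds to the highest similarity.
        dists = np.linalg.norm(desc2 - d, axis=1)
        pairs.append((i, int(np.argmin(dists))))
    return pairs
```

In practice one would also filter these pairs (e.g. with a ratio test), since a nearest neighbor always exists even for features with no true match.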

**What can we do with the pairs?**

One application is to estimate the transformation between images so that images can be stitched (aligned).

From now on, this article will address how to find the **transformation**.

# 2. Warping VS Alignment

Before getting into the topic of how to find the image transformation, let’s first elaborate on what we have and what we want to do. For this, we need to know the difference between two tasks: *Warping* and *Alignment*.

Given an image 1 and an image transformation, *Warping* asks what image 1 would look like after the transformation. On the contrary, *Alignment* is given two images and seeks the transformation between them.

As one might notice, *Alignment* is the task we want to solve by finding an appropriate transformation **T**. Then how can we find such a *T*?

Since the transformation is applied in a pixel-wise manner, we first need to find candidate points. Further, a point from one image should be matched to one point from the other image, and the found pairs are used to determine the transformation **T**.

Fortunately, we already have the interest point pairs from local feature descriptors. **Therefore, what is left is just to find the transformation T (or M, in matrix form).**

**But how?**

# 3. Types of Transformation

There are three models for the transformation: Linear, Affine, and Projective. Which model to use for the transformation between matching feature pairs depends on the task we are working on. Although the projective transformation is the most commonly selected model in general, we will take a look at each of them: how they are defined and how to find the values of the parameters of the transformation matrices.

## 3.1 Linear Transformation

Linear transformations are combinations of Scale, Rotation, Shear, and Mirror operations. One property of linear transforms is that only linear transformations can be represented as a *2⨯2* matrix. Other types of transformation need ‘Homogeneous Coordinates’ to be represented in matrix form.
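As a small NumPy illustration (my own example, not from the original text), a scale followed by a rotation composes into a single *2⨯2* matrix that acts on 2D points directly:

```python
import numpy as np

theta = np.pi / 2                       # 90-degree rotation
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([2.0, 0.5])                 # anisotropic scale
M = R @ S                               # composed transform is still 2x2
p = np.array([1.0, 0.0])
q = M @ p                               # scale p, then rotate it; approximately [0, 2]
```

Note there is no way to express a translation this way: *M @ 0* is always *0*, which is why translation needs homogeneous coordinates (next section).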

In order to fit the parameters of a Linear Transformation to the set of matching feature pairs (also called ‘correspondences’), the well-known approach of **Least Squares Estimation** is applied.

Given the sets of interest points **X** and **X'** from image 1 and image 2, respectively, the goal is to find a proper linear function that predicts **X'** from **X** such that *aX+b = X'*. As we have two unknown parameters, we need two equations to determine the unique solution. Therefore, we need at least two matching pairs of interest points.

However, it is common to get more than two pairs from the two given images. In this case, we concatenate all pairs and apply the pseudo-inverse to find a solution (for the parameters **a** and **b**) that minimizes the squared error.

## 3.2 Affine Transformation

Shortly, an Affine transformation is the combination of any linear transformation and a ‘**translation**’. One property to know is that parallel lines remain parallel after Affine transformations. Further, as the Affine transformation of a 2D point cannot be represented by a *2⨯2* matrix, due to the translation part, Homogeneous coordinates are applied to represent the 2D Affine transformation by a *3⨯3* matrix as follows.

All linear transformations represented using Homogeneous coordinate are shown below.

Now then, how can we find the values of the parameters (**a, b, c, …, f**) of the Affine transformation?

As we now have six degrees of freedom, we need three pairs of matching interest points to solve for the unique solution.

In case there are more than three pairs, **Least Squares Estimation** can again be applied to find the solution.
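A sketch of how the six parameters could be fit from three or more correspondences (the helper name `fit_affine` and the array conventions are my own assumptions):

```python
import numpy as np

def fit_affine(src, dst):
    """Estimate the 2D affine parameters (a..f) from >= 3 point pairs.

    src, dst: (N, 2) arrays of matching points.
    Maps [x, y] to [a*x + b*y + c, d*x + e*y + f].
    Returns the 3x3 affine matrix in homogeneous form.
    """
    n = len(src)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src          # rows for the x'-equations: [x, y, 1, 0, 0, 0]
    A[0::2, 2] = 1
    A[1::2, 3:5] = src          # rows for the y'-equations: [0, 0, 0, x, y, 1]
    A[1::2, 5] = 1
    b = dst.reshape(-1)         # interleaved targets [x'_0, y'_0, x'_1, ...]
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([[p[0], p[1], p[2]],
                     [p[3], p[4], p[5]],
                     [0.0,  0.0,  1.0]])
```

With exactly three non-collinear pairs the system is square and the least-squares solution is exact; with more pairs it minimizes the squared error.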

## 3.3 Projective Transformation (Homography)

A Homography is a mapping between any two perspective projections with the same center of projection, according to [1]; for example, two planes in 3D viewed along the same sight rays.

The following properties hold in Homography:

- A rectangle can map to an arbitrary quadrilateral.
- Straight lines are preserved.

In order to fit a Homography estimating the transformation, we need a slightly different approach from the one used for the previous transformations. Let’s start with the problem definition. Again, we have the matching correspondences, denoted (A, B) in the following image, and eight degrees of freedom.

The complete process to estimate the transformation matrix is illustrated in the figure below. As the number of parameters is now eight, we need 4 pairs to determine the unique solution. When it comes to obtaining the solution that minimizes the least squared error over all found pairs (when the number of pairs is larger than 4), one might notice that we cannot apply the *pseudo-inverse* approach here, as the right side of the equation is zero (**Ah=0**), and therefore the pseudo-inverse would only yield the trivial solution.

Let’s look at the equation *Ah=0* from a different perspective. *A* is a matrix and *h* is a vector. Recall the definition of eigenvector and eigenvalue: then solving *Ah=0* can be interpreted as finding the eigenvector *h* corresponding to the eigenvalue of zero. While this approach seems nice, one problem remains: the matrix *A* is likely not a square matrix.

**Therefore, we can apply the singular value decomposition (SVD) instead, and find the singular vector h that corresponds to the singular value of zero.**
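A minimal sketch of this SVD-based estimation, assuming the standard construction of *A* with two rows per correspondence (the helper name `fit_homography` is my own; this is the usual Direct Linear Transform formulation):

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate a 3x3 homography H from >= 4 point pairs.

    Builds the 2N x 9 matrix A of the homogeneous system A h = 0 and takes
    h as the right singular vector of A with the smallest singular value.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Derived from u = (h1.p)/(h3.p) and v = (h2.p)/(h3.p), cleared of
        # denominators so each correspondence contributes two linear rows.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)    # last row of Vt: smallest singular value
    return H / H[2, 2]          # normalize so H[2, 2] = 1
```

With noisy correspondences no singular value is exactly zero, so taking the vector with the smallest singular value gives the least-squares solution under the constraint ‖h‖ = 1.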

The following figure shows the result of stitching two images using the estimated homography.

**Before I wrap up this article, one should keep in mind that a homography only holds for planar (flat) surfaces, such as walls, and not for non-planar objects such as cars or cats.**

More specifically, according to [1], *Homography* can model

- transformation across images of any scene taken from the exact same camera position (center of projection)
- transformation across images of a planar object taken from any camera position

These two cases are illustrated in the following.

In case you need one for a non-planar object, you can consider using the *‘Fundamental Matrix’*.

# 4. RANSAC

In order to increase the accuracy and quality of the estimated image transformation, one can consider applying RANSAC (RANdom SAmple Consensus), which makes the estimation robust to mismatched pairs (outliers).
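As a sketch of the general idea (everything below is my own illustration: `fit` and `residual` are placeholders for a model-specific fitter, e.g. the homography estimator, and the parameter choices are arbitrary): repeatedly fit the model to a minimal random sample of the correspondences and keep the model that agrees with the most pairs.

```python
import numpy as np

def ransac(pairs, fit, residual, n_min, threshold, iters=1000, seed=0):
    """Generic RANSAC loop.

    pairs:     (N, ...) array of correspondences.
    fit:       fits a model to a minimal sample of n_min correspondences.
    residual:  per-correspondence error of a model over all pairs.
    threshold: residuals below this count as inliers.
    """
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, 0
    for _ in range(iters):
        sample = rng.choice(len(pairs), size=n_min, replace=False)
        model = fit(pairs[sample])
        inliers = int(np.sum(residual(model, pairs) < threshold))
        if inliers > best_inliers:          # keep the best consensus so far
            best_model, best_inliers = model, inliers
    return best_model
```

For example, fitting a line y = a·x + b to points containing two gross outliers recovers the correct parameters, where ordinary least squares would be dragged off by the outliers.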

This section is planned to be updated. Please refer to the references below for more information at the moment.

# Reference

[1] RWTH Aachen, Computer Vision group

[2] CS 376: Computer Vision, Prof. Kristen Grauman

[3] S. Seitz

[4] Forsyth & Ponce

[5] Prof. Svetlana Lazebnik

[6] Prof. M. Irani and R. Basri

[7] Prof. Li Fei-Fei

[8] Denis Simakov

[9] Alexei Efros

[10] Prof. Krystian Mikolajczyk

[11] Homography1

[12] Homography2

[13] Saad J. Bedros

[14] OpenCV