Exploring Geometric Transformations in Python for Image Processing
Chapter 1: Understanding Geometric Transformations
Geometric transformations play a crucial role in computer vision. They can enlarge datasets when data is scarce and help models generalize better. Many everyday image edits, such as cropping, resizing, or rotating, involve at least one geometric transformation. In this article, we look at how they work and how to implement them in Python.
This article will cover:
- Introduction to Geometric Transformations
- What Are Geometric Transformations?
- Practical Applications
- Backward Mapping
- Non-linear Transformations
Introduction to Geometric Transformations
Geometric transformations are encountered more often than one might think, for example when editing images or preparing presentations. Rotation and scaling are the most common, but they are far from the only options. This article looks at what it means to modify an image's geometry and how such transformations are applied in practice.
Geometric transformations can align various images, correct lens distortions, and adjust sizes for different display formats, such as mobile versus desktop views.
So how do these transformations fit into data science and AI? The answer lies in data augmentation: applying geometric transformations to existing images creates new, plausible training samples. This is particularly valuable for training convolutional neural networks (CNNs), which require large datasets.
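A minimal augmentation sketch using only NumPy, under the simplifying assumption that rotations are restricted to multiples of 90° (arbitrary angles would need interpolation, for example with skimage.transform.rotate). The helper augment and its random choices are illustrative, not a specific library's pipeline.

```python
import numpy as np

def augment(image, rng):
    """Hypothetical helper: random multiple-of-90-degree rotation plus an optional mirror."""
    k = int(rng.integers(4))      # number of counterclockwise quarter turns (0 to 3)
    out = np.rot90(image, k)      # rotation by k * 90 degrees
    if rng.random() < 0.5:
        out = np.fliplr(out)      # horizontal flip (a reflection)
    return out

rng = np.random.default_rng(seed=0)
image = np.arange(12, dtype=float).reshape(3, 4)    # toy 3x4 grayscale "image"
variants = [augment(image, rng) for _ in range(8)]  # eight augmented copies
print([v.shape for v in variants])
```

Each variant is a new training sample built from the same pixels; only their arrangement differs.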
Geometric Transformations Explained
Unlike pixel transformations, where pixel values are altered, geometric transformations preserve pixel values while modifying the image's geometry. To elaborate, a geometric transformation maps the original pixel's position (x,y) to a new position (x',y') without changing its intensity.
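To make the definition concrete, here is a small NumPy sketch, assuming an integer translation so that no interpolation is needed. The helper translate is hypothetical: it sends each pixel at (x, y) to (x + tx, y + ty) and copies its intensity unchanged, whereas a pixel transformation such as a brightness change alters the values themselves.

```python
import numpy as np

def translate(image, tx, ty):
    """Hypothetical helper: move every pixel from (x, y) to (x + tx, y + ty).

    x is the column index and y is the row index.
    """
    h, w = image.shape
    out = np.zeros_like(image)             # positions that receive no pixel stay 0
    for y in range(h):
        for x in range(w):
            xn, yn = x + tx, y + ty        # new position (x', y')
            if 0 <= xn < w and 0 <= yn < h:
                out[yn, xn] = image[y, x]  # intensity is copied, not changed
    return out

image = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])
shifted = translate(image, tx=1, ty=0)     # geometric: positions change
brighter = image + 5                       # pixel transformation: values change
print(shifted)   # [[ 0 10 20] [ 0 40 50] [ 0 70 80]]
print(brighter)
```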
An affine transformation preserves collinearity (points on a line stay on a line), as well as parallelism and ratios of distances along a line. The primary types of affine transformations include:
- Translation
- Rotation
- Scaling
- Shearing
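Collinearity preservation can be checked numerically. This sketch assumes an arbitrary 2×2 matrix A and offset t, chosen only for illustration; it maps three collinear points through x' = A x + t and checks that the results are still collinear (the 2D cross product of the two difference vectors stays zero).

```python
import numpy as np

def is_collinear(p, q, r, tol=1e-9):
    """True if points p, q, r lie on one line (zero 2D cross product)."""
    u, v = q - p, r - p
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

# Illustrative affine map x' = A @ x + t (values chosen arbitrarily)
A = np.array([[1.5, 0.4],
              [-0.3, 0.8]])
t = np.array([10.0, -5.0])

points = [np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([3.0, 6.0])]  # on the line y = 2x
mapped = [A @ p + t for p in points]

print(is_collinear(*points), is_collinear(*mapped))  # True True
```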
To illustrate, consider the following transformations applied to an image:
- Translation: Moves a pixel's position horizontally and/or vertically. For instance, shifting a pixel 10 pixels to the right and 20 pixels up.
- Scaling: Resizes an image, for example, converting a 250x250 pixel image to 1000x500 pixels, where the scaling factors for x and y are 4 and 2, respectively.
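- Rotation: Each pixel is rotated about a center point (for example, the image origin or the image center) by a specified angle, either clockwise or counterclockwise.
- Shearing: Similar to translation, but the shift varies with position: each pixel is displaced along one axis by an amount proportional to its coordinate along the other axis.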
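In homogeneous coordinates, each of the four transformations above can be written as a 3×3 matrix M acting on (x, y, 1). The forms below use t_x, t_y for the translation offsets, s_x, s_y for the scale factors, θ for the rotation angle, and k for a horizontal shear factor; these symbols are introduced here for illustration. θ is measured counterclockwise with the y axis pointing up; in image coordinates, where y points down, the same matrix appears as a clockwise rotation on screen. With s_x = 4 and s_y = 2, S reproduces the 250x250 to 1000x500 scaling example.

```latex
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = M \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
T = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix}, \quad
S = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
H = \begin{pmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
```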
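Multiple transformations can be combined into a single operation: when each one is written as a matrix in homogeneous coordinates, multiplying the matrices yields one transformation matrix that applies them all at once, which is more efficient than applying them one after another.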
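A NumPy sketch of that idea, assuming the homogeneous 3×3 convention with points as column vectors (x, y, 1): a 90° rotation followed by a translation of (100, -20) is precomputed as one matrix M = T @ R and applied to several points at once. As far as we can tell, this is the same rotate-then-shift order used by the rot transform in the snippet that follows. The helper names are illustrative.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Rotate first, then translate: the rightmost matrix acts first
T, R = translation(100, -20), rotation(np.pi / 2)
M = T @ R                                   # one combined matrix

pts = np.array([[0.0, 0.0, 1.0],            # homogeneous points as rows (x, y, 1)
                [1.0, 0.0, 1.0],
                [0.0, 2.0, 1.0]]).T         # transpose to columns for M @ pts

combined = M @ pts                          # single matrix product
step_by_step = T @ (R @ pts)                # two separate steps
print(np.allclose(combined, step_by_step))  # the two approaches agree
print(combined[:2].T)                       # transformed (x', y') coordinates
```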
If you're interested in seeing how these transformations are implemented in Python, here's a snippet to get you started:
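```python
import numpy as np
import matplotlib.pyplot as plt
from skimage import data, transform, img_as_float

a = data.camera()  # stand-in image: the article does not say which image it used
img = img_as_float(a)

transl = transform.EuclideanTransform(translation=(100, -20))  # 100 px right, 20 px up
rot = transform.EuclideanTransform(translation=(100, -20), rotation=np.pi/2.)  # 90° turn about the top-left origin, then the same shift
scal = transform.SimilarityTransform(scale=0.5)  # half size
shear = transform.AffineTransform(shear=np.pi/6)  # shear angle of pi/6 (30°)

# warp expects the inverse map (output -> input coordinates), hence .inverse
transl_img = transform.warp(img, transl.inverse)
rot_img = transform.warp(img, rot.inverse)
scal_img = transform.warp(img, scal.inverse)
shear_img = transform.warp(img, shear.inverse)

# Show the original next to the four results
titles = ["Original", "Translation", "Rotation", "Scaling", "Shearing"]
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for ax, im, title in zip(axes, [img, transl_img, rot_img, scal_img, shear_img], titles):
    ax.imshow(im, cmap="gray")
    ax.set_title(title)
    ax.axis("off")
plt.show()
```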
Backward Mapping
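So far, transformations have been described as forward mappings: each input pixel position (x,y) is sent to a new output position (x',y'). Applied directly, this can leave gaps (holes) in the output image, because the computed positions are rarely integers and some output pixels, especially when enlarging, receive no input pixel at all. Backward mapping avoids this by starting from the output: for every output pixel position, the inverse transformation traces back to the location in the input image from which its value should be taken. This is why the snippet above passes each transform's inverse to transform.warp.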
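The difference is easy to see with a 2× enlargement. In this NumPy sketch (helper names hypothetical), forward mapping sends each input pixel to (2x, 2y), leaving every other row and column empty, while backward mapping visits every output pixel, applies the inverse map (x'/2, y'/2), and truncates the result to an integer index (a simple zero-order scheme), so the whole output is filled.

```python
import numpy as np

def forward_scale(image, s):
    """Forward mapping: push each input pixel to its scaled output position."""
    h, w = image.shape
    out = np.zeros((h * s, w * s), dtype=image.dtype)
    for y in range(h):
        for x in range(w):
            out[y * s, x * s] = image[y, x]   # other output pixels are never written
    return out

def backward_scale(image, s):
    """Backward mapping: pull each output pixel from its inverse-mapped input position."""
    h, w = image.shape
    out = np.zeros((h * s, w * s), dtype=image.dtype)
    for yo in range(h * s):
        for xo in range(w * s):
            x, y = xo // s, yo // s           # inverse of scaling by s, truncated to a grid index
            out[yo, xo] = image[y, x]
    return out

image = np.array([[1, 2],
                  [3, 4]])
print(forward_scale(image, 2))   # holes (zeros) between the copied pixels
print(backward_scale(image, 2))  # every output pixel receives a value
```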
The inverse transformation is obtained by inverting the transformation matrix. One challenge of backward mapping is that the traced-back positions are generally non-integer, so interpolation is needed to assign a value. Zero-order (nearest-neighbor) interpolation takes the value of the closest pixel, while first-order (bilinear) interpolation computes a weighted average of the four closest pixels.
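Both ideas fit in a short NumPy sketch, assuming the 3×3 homogeneous matrix convention: np.linalg.inv gives the backward map of a 4× scaling, and a hypothetical bilinear helper computes the first-order weighted average of the four neighbors at the resulting non-integer position, next to the zero-order (nearest) value for comparison.

```python
import numpy as np

def bilinear(image, x, y):
    """First-order (bilinear) interpolation at a non-integer position (x = column, y = row)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, image.shape[1] - 1)
    y1 = min(y0 + 1, image.shape[0] - 1)
    dx, dy = x - x0, y - y0
    # Weighted average of the four closest pixels
    top = (1 - dx) * image[y0, x0] + dx * image[y0, x1]
    bottom = (1 - dx) * image[y1, x0] + dx * image[y1, x1]
    return (1 - dy) * top + dy * bottom

# Forward map: scale by 4; the backward map is its matrix inverse
M = np.array([[4.0, 0.0, 0.0],
              [0.0, 4.0, 0.0],
              [0.0, 0.0, 1.0]])
M_inv = np.linalg.inv(M)

image = np.array([[0.0, 10.0],
                  [20.0, 30.0]])
xo, yo = 3.0, 1.0                            # an output pixel
x, y, _ = M_inv @ np.array([xo, yo, 1.0])    # source position (0.75, 0.25): off the grid
print(x, y)
print(bilinear(image, x, y))                 # first order: 12.5
print(image[int(round(y)), int(round(x))])   # zero order (nearest pixel): 10.0
```

In scikit-image, the same choice is made through the order argument of transform.warp (0 for nearest-neighbor, 1 for bilinear).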
Non-linear Transformations
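Affine transformations, such as those above, map coordinates with a matrix (plus a translation), so straight lines remain straight. Non-linear transformations instead apply an arbitrary function to the coordinates, so straight lines may bend. A noteworthy example is the fish-eye lens, which magnifies the center of the image and compresses its periphery. This effect can be simulated in Python by converting each coordinate to polar form (radius and angle) around the image center, remapping the radius with a non-linear function, and converting back to Cartesian coordinates.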
Here's a code snippet to create a fish-eye effect:
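```python
import numpy as np
import matplotlib.pyplot as plt
from skimage import data, transform

def fisheye(xy):
    # xy: (N, 2) array of output (x, y) coordinates passed in by warp
    center = np.mean(xy, axis=0)
    xc, yc = (xy - center).T
    # Polar coordinates around the image center
    r = np.sqrt(xc**2 + yc**2)
    theta = np.arctan2(yc, xc)
    # Non-linear remapping of the radius
    r = 0.8 * np.exp(r**(1/2.1) / 1.8)
    # Back to Cartesian input coordinates
    return np.column_stack((r * np.cos(theta), r * np.sin(theta))) + center

a = data.camera()  # stand-in image: the article does not say which image it used
out = transform.warp(a, fisheye)
plt.imshow(out, cmap="gray")
plt.axis("off")
plt.show()
```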
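To see why the fish-eye map counts as non-linear, we can push a straight row of points through the same radius formula and test whether they stay on a line. This sketch is standalone: radial_map is a hypothetical re-implementation of the remapping inside fisheye, with the center fixed at the origin instead of the mean of the coordinates.

```python
import numpy as np

def radial_map(xy):
    """Hypothetical standalone version of the fish-eye radius remapping around the origin."""
    x, y = xy.T
    r = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)
    r = 0.8 * np.exp(r**(1/2.1) / 1.8)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

# Three collinear points on the horizontal line y = 50
line = np.array([[-100.0, 50.0], [0.0, 50.0], [100.0, 50.0]])
a, b, c = radial_map(line)

u, v = b - a, c - a
cross = u[0] * v[1] - u[1] * v[0]   # zero only if the mapped points are still collinear
print(cross)                        # clearly non-zero: the line has bent
```

An affine map would leave this cross product at zero, as in the collinearity check for affine transformations.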
Conclusions
Geometric transformations are powerful tools that enable the creation of diverse image variants. While they have broad applications, their role in image augmentation is particularly significant in computer vision, especially for training convolutional neural networks. These transformations are straightforward to implement in Python, as illustrated in this article.
All images used in this article were created by the author. For further exploration, feel free to check my GitHub repository, where I compile resources related to machine learning and artificial intelligence. If you enjoyed this content or have thoughts to share, please leave a comment below.
For more insightful articles or to connect, visit my LinkedIn or explore my other platforms. Thank you for engaging with the In Plain English community!