
A Comprehensive Introduction to Super Resolution for Satellite Images

Published on
December 13, 2024

In this post we aim to provide a general overview of super-resolution methods through the specific lens of their application to satellite imaging. We illustrate the different types of super-resolution methods with examples, in some cases referencing published papers authored in collaboration with members of our team ([1], [2]). The information here is introductory, and we plan to expand on it with more in-depth details in subsequent posts.

What is super-resolution?

Super resolution is an advanced image processing technique used to enhance the resolution of an image, making it clearer and more detailed. It works by reconstructing or predicting finer details that may not be explicitly present in the original lower-resolution image.

Illustration of original lower-resolution satellite images vs super-resolved outputs
Illustration of original satellite image details from Sentinel-2 (top) vs super-resolved outputs by a factor of 2x (bottom) [1].

Why does super resolution matter?

Super resolution is particularly important for satellite images because it can enhance the quality and usefulness of the data captured from space. Satellites play a vital role in areas like environmental monitoring, urban planning, defense, and disaster response, but they can face inherent limitations in terms of resolution due to physical, technical, and financial constraints. Super resolution addresses these limitations, making satellite imagery more valuable for a wide range of applications.

A reminder on detail, resolution, sampling and aliasing

Optical diffraction, sampling theory and aliasing form the basis for understanding satellite image super resolution. Each could take a post of its own to cover. For readers who are already familiar with these concepts, this section may serve as a brief reminder. For those who aren’t, we provide a basic introduction and links to good resources for going into more detail.

To form a satellite image, light rays incoming from the scene pass through an optical system. This system can include a telescope, lens, interferometer, filters and/or other elements. Rays that pass through the optical system reach a sensor with a certain number of pixels of a given size. At this point the light intensity is sampled at each pixel, discretizing the signal representing the scene and forming a digital image.

The optical system of the satellite determines the finest detail that reaches the sensor. A usual way of specifying this is the Modulation Transfer Function, or MTF. The MTF is a function of contrast with respect to spatial frequency. In layman’s terms, it shows whether, after passing through the optical system, fine details can be told apart or will be blurred together (finer details being represented by higher frequencies, and telling them apart requiring sufficient contrast between them). Any detail finer than the cutoff of the MTF is blurred together and can’t be recovered, only extrapolated. Edmund Optics has a good and accessible post with more details [3].

Modulation transfer function (MTF) for an imaging system

The figure above illustrates the modulation transfer function (MTF) of an imaging system [4] (not a satellite in this illustration, but the concepts translate directly to satellite systems). The MTF of the annular aperture of the objective mirror (MTF_ap), the MTF of the CCD sampling (MTF_CCD), and the total MTF of the imaging system (MTF_total = MTF_ap × MTF_CCD) are plotted. MTF values above zero beyond the Nyquist frequency translate into aliased spectral content. Once the MTF reaches zero, there is no more spectral content that can be recovered by any method other than extrapolation or inductive bias.

The MTF determines the maximum frequency of the signal that reaches the sensor, and this is where sampling theory comes into play. The continuous signal can be fully recovered from the discrete samples if it is sampled at a rate of at least 2x the maximum signal frequency [5]. This, however, is not commonly done, since it would produce very blurry images and require smaller pixels. The common practice is to sample at a rate that cuts off at ~15% MTF contrast or above. This results in sharper images, with the added bonus of less noise (due to using larger pixels), but it also introduces aliasing. Aliasing appears in images as spurious, undesirable effects that arise from insufficient sampling [6]. In the frequency domain it can be interpreted as frequencies above the sampling cutoff being folded back into the signal and added to frequencies below the cutoff.

illustration of aliasing in the spatial domain
An illustration of aliasing in the spatial domain; the green dots represent the sampled data; the dashed line is the aliased signal [7].
illustration of aliasing in the frequency domain
An illustration of aliasing in the frequency domain. A true spectrum (top panel) appears periodically as aliases in the frequency domain with the frequency shift nf0 (middle panel). This results in distortion of the original spectrum (bottom panel) even within the Nyquist frequency fN (shaded area) [8].
Illustration of aliasing effects on satellite images
Illustration of aliasing effects on satellite images: staircase effects in edges and change in direction of periodic stripes are caused by aliasing [1].
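
To make the frequency folding concrete, here is a minimal NumPy sketch (a toy 1-D example, not satellite data): a sine above the Nyquist limit, once sampled, becomes indistinguishable from a lower-frequency alias.

```python
import numpy as np

fs = 10.0     # sampling rate (Hz)
f_sig = 8.0   # signal frequency, above the Nyquist limit fs / 2 = 5 Hz
t = np.arange(0, 2, 1 / fs)              # sample instants
samples = np.sin(2 * np.pi * f_sig * t)  # undersampled 8 Hz sine

# At these sample points the 8 Hz sine is indistinguishable (up to sign)
# from a sine at the folded frequency |fs - f_sig| = 2 Hz.
alias = np.sin(2 * np.pi * (fs - f_sig) * t)
print(np.allclose(samples, -alias))      # True
```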

Aliasing effects are thus undesirable in images, but for the purpose of super resolution they can be fundamental. Aliased content is information from the incoming signal that gets mixed up during the sampling process, and there are super resolution techniques that recover actual details from it.

In short, super resolution techniques can either recover actual detail from the incoming signal without extrapolating, or they can make educated guesses about what high-resolution detail was lost during image formation without guarantees of that detail actually being correct. When genuine, verifiable detail is recovered, it’s due to the use of aliased information in the signal.

Single-image super resolution

These methods augment the resolution of an image without using multiple images as input. They use algorithms that analyze the patterns, textures, and structures in the image to predict and generate higher-resolution versions.

Deep-learning-based methods are the most compelling single-image super resolution techniques nowadays, but we’ll devote a section to those in particular further down in the post.

Interpolation

Bilinear, bicubic or spline interpolation (among others) estimate new pixel values as weighted combinations of the surrounding pixels. These classic methods are simple and fast, but result in blurry images that don’t recover detail and don’t improve the image’s interpretability much. Interpolation can be followed by a sharpening operation to mitigate the blur in the result. You can find an excellent reference on image interpolation in [9].
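
As a quick illustration, here is a minimal sketch of bicubic upsampling followed by an unsharp-mask sharpening step, assuming OpenCV is installed; the file names are placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("input.tif", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Bicubic interpolation: each new pixel is a weighted combination of a 4x4 neighborhood.
up = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

# Optional unsharp mask to mitigate the blur introduced by the interpolation.
blurred = cv2.GaussianBlur(up, (0, 0), sigmaX=1.0)
sharpened = np.clip(up + 0.5 * (up - blurred), 0, 255)

cv2.imwrite("upsampled_x2.png", sharpened.astype(np.uint8))
```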

Classical model-based and regression methods

Some super-resolution techniques do not rely on interpolating the available samples but instead estimate an underlying function that adheres to the observed samples. These methods include, among others, optimization-based techniques, kernel regression [10], and the Projection Onto Convex Sets (POCS) approach [11]. By modeling the underlying signal rather than just filling in pixels, these methods aim to reconstruct higher-frequency details more accurately. These techniques are also applied for multi-image super resolution. These methods are fast and interpretable, but their results have been surpassed by those of data-driven approaches.
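
As a toy illustration of this regression view (a 1-D simplification, not the full image-domain method of [10]), the sketch below fits a Nadaraya-Watson kernel regression to irregular samples and evaluates it on a finer grid.

```python
import numpy as np

def kernel_regression(x_samples, y_samples, x_query, h=0.5):
    """Nadaraya-Watson estimate: each query point is a Gaussian-weighted
    average of the observed samples, with bandwidth h."""
    d = x_query[:, None] - x_samples[None, :]   # pairwise distances
    w = np.exp(-0.5 * (d / h) ** 2)
    return (w @ y_samples) / w.sum(axis=1)

# Irregular, noisy samples of an underlying signal ...
x = np.sort(np.random.uniform(0, 10, 40))
y = np.sin(x) + 0.1 * np.random.randn(x.size)

# ... estimated on a grid much finer than the average sample spacing.
x_fine = np.linspace(0, 10, 160)
y_fine = kernel_regression(x, y, x_fine)
```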

Pansharpening

Satellites capture images in different spatial and spectral resolutions and bands. Often, multi-spectral satellites capture a panchromatic band (PAN) which has broad spectral support (e.g. covers from red to blue) and is of higher spatial resolution than the other spectral bands (e.g. red, green, blue, near infrared, etc.). It’s this broader spectral support that enables the higher resolution: since more photons are being integrated by each pixel, pixel size can be reduced (and thus resolution increased) without increasing the sampling noise beyond that of the other narrower bands at lower resolution.

Pansharpening is a technique that increases the resolution of multi-spectral images by combining these low resolution images with higher resolution panchromatic images. This process results in a single high-resolution image that retains the spectral information of the multi-spectral image but with the spatial detail of the panchromatic one.

Common pansharpening methods include:

  • Brovey Transformation: Enhances color by normalizing each interpolated spectral band by the sum of all bands, then multiplying by the high-resolution panchromatic image (a minimal sketch follows this list).
  • Principal Component Analysis (PCA): Transforms the multispectral data into uncorrelated components, which are then combined with the panchromatic image.
  • Intensity-Hue-Saturation (IHS) Transformation: Converts the RGB image to IHS, replaces the intensity with the panchromatic image, then transforms it back to RGB.
  • Wavelet-based Methods: Uses wavelet transforms to decompose the images, combine high-frequency details, and reconstruct the pansharpened image.
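
To give a feel for the simplest of these, here is a minimal Brovey-style sketch assuming NumPy arrays, where `pan` is the high-resolution panchromatic band and `r`, `g`, `b` are spectral bands already interpolated to the PAN grid (all names are placeholders).

```python
import numpy as np

def brovey(pan, r, g, b, eps=1e-6):
    """Brovey-style pansharpening: normalize each (pre-interpolated) spectral
    band by the sum of the bands, then modulate it with the PAN intensity."""
    total = r + g + b + eps  # eps avoids division by zero
    return r / total * pan, g / total * pan, b / total * pan

# Toy usage with random data standing in for real co-registered bands.
pan = np.random.rand(512, 512)
r, g, b = (np.random.rand(512, 512) for _ in range(3))
r_hr, g_hr, b_hr = brovey(pan, r, g, b)
```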

Multi-image super resolution

In these methods, multiple images of the same scene are combined to enhance the resolution. This can be useful when there are slight variations between the images (e.g., due to motion or angles), which can provide more data samples to reconstruct a better, single higher-resolution image. An important advantage of these methods is that they can recover actual high-resolution details from the scene without hallucinating content.

Multi-image methods exploit the fact that the multiple images from a single scene are taken with variations in position of the sensor, thus the sampling grid from the combination of the multiple images is finer than the original one, and a higher resolution image can be recovered from this grid.

Multi-image methods super resolution

In the example above from [2], the authors decompose the input images into coarse and fine detail components, perform kernel regression over the aligned inputs to reconstruct a super resolved detail component and finally combine it with an interpolated coarse image to produce the output.

Sub-pixel-accurate image co-registration is usually a necessary component of these methods, used to generate the underlying higher-resolution grid. In some cases the co-registration is implicit rather than explicit; this is especially the case for deep learning methods.
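
As a toy illustration of the idea (not the method of [2]), here is a minimal shift-and-add sketch that assumes the sub-pixel shifts between frames are already known from a prior co-registration step. Real methods replace the naive splatting and averaging with kernel regression or learned fusion, and fill in high-resolution pixels that receive no samples.

```python
import numpy as np

def shift_and_add(images, shifts, factor=2):
    """Naive multi-image fusion: splat each low-resolution pixel onto a grid
    `factor` times finer using its known sub-pixel shift, then average.

    images: list of HxW low-resolution frames.
    shifts: list of (dy, dx) sub-pixel shifts (in LR pixels), one per frame.
    """
    h, w = images[0].shape
    acc = np.zeros((h * factor, w * factor))
    count = np.zeros_like(acc)
    ys, xs = np.mgrid[0:h, 0:w]
    for img, (dy, dx) in zip(images, shifts):
        # Map LR pixel positions (plus shift) to the nearest HR grid positions.
        yy = np.clip(np.round((ys + dy) * factor).astype(int), 0, h * factor - 1)
        xx = np.clip(np.round((xs + dx) * factor).astype(int), 0, w * factor - 1)
        np.add.at(acc, (yy, xx), img)
        np.add.at(count, (yy, xx), 1)
    return acc / np.maximum(count, 1)  # unvisited HR pixels stay at 0
```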

A common pitfall of multi-image methods is that they can produce artifacts in cases where objects are occluded or move in a manner that is not consistent with the global motion between scenes (or local scene regions).

Deep Learning Methods

As in so many applications, deep learning provides a powerful toolset that can yield excellent results when applied to the problem of super resolution, both for single- and multi-image variants.

These methods learn how to reconstruct a high-resolution version of a low-resolution sample, and can be divided mainly into two categories: those based on regression (they learn to produce a weighted average of all possible outputs) and those based on conditional generative models like Generative Adversarial Networks (GANs), Normalizing Flows (e.g. SRFlow), or (Latent) Diffusion Models.

SRCNN [12] is the first example of a CNN regressor applied to the problem of super resolution. The network has a simple and effective architecture consisting of 3 main layers: the first extracts features from the input low-resolution image, the second performs a non-linear mapping of the features to a high-resolution feature space, and the third reconstructs the high-resolution output from the high-resolution features (see illustration below). Due to its simplicity and effectiveness, SRCNN became the foundation for more advanced deep learning-based super-resolution methods.

Illustration of the SRCNN architecture
Illustration of the SRCNN architecture [12].
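
To make the three-stage structure concrete, here is a minimal PyTorch sketch in the spirit of SRCNN (the 9-1-5 kernel sizes and 64/32 feature maps follow the paper; the rest is simplified). It assumes the low-resolution input has already been upsampled to the target size, e.g. with bicubic interpolation.

```python
import torch
import torch.nn as nn

class SRCNNLike(nn.Module):
    """Three-layer CNN in the spirit of SRCNN: feature extraction,
    non-linear mapping, and reconstruction."""
    def __init__(self, channels=1):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        self.mapping = nn.Conv2d(64, 32, kernel_size=1)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: low-resolution image already interpolated to the target resolution.
        x = self.relu(self.extract(x))
        x = self.relu(self.mapping(x))
        return self.reconstruct(x)

net = SRCNNLike(channels=1)
out = net(torch.randn(1, 1, 128, 128))  # output has the same spatial size as the input
```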

Generative models were spearheaded by generative adversarial networks, first applied to super resolution with SRGAN [13]. One of the main motivations to employ GANs is that they can provide sharper and more natural-looking images than MSE-minimizing CNNs. The latter produce reconstructions that represent a pixel-wise average of the many possible high-resolution solutions for the low-resolution input, and thus blur these possible solutions together. SRGAN, on the other hand, can reconstruct a single sample from among the possible solutions by minimizing a perceptual loss.

Architecture of SRGAN with kernel size
Architecture of SRGAN with kernel size (k), number of feature maps (n) and stride (s) indicated for each convolutional layer [13].
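
To illustrate the perceptual-loss idea behind SRGAN-style training (a simplified sketch, not the paper’s exact configuration), the content loss below compares images in the feature space of a pre-trained VGG network, assuming torchvision is available.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class VGGPerceptualLoss(nn.Module):
    """Compare SR and ground-truth images in a pre-trained VGG feature space
    instead of pixel space (simplified SRGAN-style content loss)."""
    def __init__(self, layers=16):  # keep features up to an intermediate conv layer
        super().__init__()
        vgg = vgg19(weights=VGG19_Weights.DEFAULT).features[:layers].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.mse = nn.MSELoss()

    def forward(self, sr, hr):
        # sr, hr: (N, 3, H, W) tensors normalized the way VGG expects.
        return self.mse(self.vgg(sr), self.vgg(hr))
```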

Both SRCNN and SRGAN (2016 and 2017 respectively) are seminal works that have led to many improvements and advances since, but they remain representative of the classes of methods they pioneered.

GANs are notoriously hard to train and suffer from mode collapse (the network converges to a very limited set of results). More recently, normalizing flows (e.g. SRFlow [14]) and diffusion models (e.g. SR3 [15]) have been proposed that address these issues and provide higher-quality outputs.

We’ll cover deep learning super resolution methods more thoroughly in an upcoming post, presenting more recent works and results.

Most deep learning super resolution methods have been conceived for general-purpose image enhancement. However, satellite imagery presents unique challenges and characteristics that differ from those of typical photographic images. The general-purpose methods can be applied out of the box to satellite images, but the best results are obtained by training on the target data and adapting techniques to the specific characteristics of the target satellite.

One example of a deep learning super resolution method that exploits particular characteristics of the target satellite is L1BSR [1]. This is a self-supervised method trained directly on real Sentinel-2 L1B 10m data, without requiring a high-resolution ground truth. To produce super-resolved results, L1BSR leverages the specific design of the satellite’s sensor: data from the different bands are captured at different instants, with sub-pixel displacements between each other, which allows a high-resolution image to be reconstructed. The training (illustrated below) enforces the high-resolution reconstruction to be consistent with the co-registered low-resolution bands captured by an adjacent, overlapping CMOS detector, which serve as the target for the self-supervised training.

Illustration of L1BSR
Illustration of L1BSR [1], described above. At inference time, only one input and the reconstruction net are required.

In the case of supervised methods, a dataset with low resolution / high resolution pairs is necessary, which in the case of satellite images can pose unique challenges.

Representative and well balanced datasets are a must for deep learning methods. For example, if you want to have good super resolution in urban areas to improve the result of a remote sensing application in cities, a dataset with urban samples may suffice, but the super-resolution model would underperform in forests, mountains, fields or other environments not represented in the dataset.

Look out for our upcoming post diving deeper into these methods. 

Metrics 

Evaluation metrics for super resolution methods commonly involve comparing a ground truth high-resolution image with a super-resolved result obtained from the downsampled ground truth. These metrics, often used in combination, give a more comprehensive evaluation of an SR method's performance by balancing objective and subjective aspects of image quality.

Pixel-based metrics

These are directly based on the pixel-wise differences in values between the ground truth and the reconstruction. The most common ones are:

Peak Signal-to-Noise Ratio (PSNR): measures the ratio of the maximum possible power of a signal (image) to the power of the noise affecting it. Higher PSNR values indicate that the SR image is closer to the ground truth image. It can be calculated directly from the mean square error (MSE) using the formula below, where L is the maximum pixel value possible (255 for an 8-bit image).
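
$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{L^2}{\mathrm{MSE}}\right)$$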

Structural Similarity Index (SSIM): assesses the similarity between the SR image and the ground truth based on structural information like luminance, contrast, and structure. It’s designed to better reflect human perception compared to PSNR, but it may not fully align with subjective quality perception.
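
Both metrics are available in common libraries; here is a minimal sketch assuming scikit-image is installed, with random arrays standing in for real images.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# gt: ground-truth high-resolution image, sr: super-resolved reconstruction,
# both 8-bit arrays of identical shape (placeholders for real data).
gt = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
sr = np.random.randint(0, 256, (256, 256), dtype=np.uint8)

psnr = peak_signal_noise_ratio(gt, sr, data_range=255)
ssim = structural_similarity(gt, sr, data_range=255)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.3f}")
```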

Perceptual Metrics

There can be cases where images with high PSNR or SSIM values still present artifacts or unnatural-looking elements. Perceptual metrics capture how natural or visually realistic the super-resolved images appear, often aligning more closely with human perception.

Natural Image Quality Evaluator (NIQE): this is a no-reference metric, meaning it evaluates image quality without needing a ground truth. It measures natural image statistics to determine how “natural” or visually pleasing the SR image looks. Lower NIQE scores generally indicate better quality, as the image appears more realistic and closer to natural image statistics.

Learned Perceptual Image Patch Similarity (LPIPS): often used in training deep learning-based SR models, this metric leverages pre-trained deep neural networks (like VGG) to calculate the perceptual difference between the SR and ground truth images. It compares feature representations of both images, encouraging the SR model to reproduce perceptual qualities rather than just pixel accuracy.
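
A minimal usage sketch, assuming the `lpips` PyPI package (published by the LPIPS authors) is installed and the inputs are 3-channel tensors scaled to [-1, 1].

```python
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='vgg')  # use VGG features; 'alex' is the default backbone

# sr and gt: (N, 3, H, W) tensors scaled to [-1, 1], as the library expects.
sr = torch.rand(1, 3, 256, 256) * 2 - 1
gt = torch.rand(1, 3, 256, 256) * 2 - 1
distance = loss_fn(sr, gt)  # lower means perceptually closer
```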

Task-Specific Metrics

These metrics indicate how well the SR images serve practical applications.

Detection or Classification Accuracy: In cases where SR images are used for tasks like object detection or image classification, the effectiveness of the SR method can be evaluated by measuring how well it improves performance on these tasks. For instance, running an object detector on the SR output and checking if the detection accuracy improves gives insight into how useful the SR method is for downstream applications.

Subjective Quality Assessment

Mean Opinion Score (MOS): MOS involves human evaluators who rate the perceived quality of the SR images on a scale (e.g., 1 to 5). This method captures subjective quality aspects that may not be well reflected in objective metrics. It’s often considered the gold standard for quality evaluation, but it is time-consuming and subjective.

Pros, cons and achievable resolution gains

  • Single-image. Pros: only a LR image is required. Cons: typically lower resolution gains than multi-image methods.
  • Multi-image. Pros: potentially higher resolution enhancement due to aggregated information. Cons: can require more costly or complex satellite maneuvers to produce a series of input images with enough overlap; requires excellent explicit or implicit input image alignment; can produce artifacts in misaligned areas.
  • Interpolation. Pros: simple and fast, with low compute requirements. Cons: doesn’t recover much detail, barely impacting image interpretability.
  • Classical model-based & regression methods. Pros: fast and interpretable. Cons: results have been surpassed by those of data-driven approaches.
  • Pansharpening. Pros: simple and fast, with good resolution enhancement of the LR bands. Cons: a higher-resolution PAN channel is required, and resolution is enhanced only up to that reference PAN band.
  • Deep learning methods (regression or generative). Pros: very compelling resolution enhancement. Cons: large, well-curated image databases are required; high computational cost for training; results can suffer from distortions or hallucinations, particularly for enhancement factors above 2x and when using networks not trained on datasets representative of the target sensor and operational conditions.
  • Regressor CNNs. Pros: very good improvement to image interpretability and resolution; easy to train (if you have the dataset). Cons: learn to average among the possible solutions, bounding the level of detail of the reconstruction.
  • Generative networks. Pros: currently the highest improvement to image interpretability; permit quantifying the uncertainty of the SR image. Cons: unstable training in the case of GANs; as of today, diffusion models have high computational costs.

Due to the physical constraints of image formation described in the sampling theory section, super resolution gains above 2x risk introducing hallucinations.

Deep learning models can push the reconstruction resolution well beyond the limits imposed by sampling theory, with some vendors offering 10x resolution increases. In these extreme cases, it must be understood that what the models do is mostly invent content or inpaint the reconstruction with archived content. For example, you may find in the reconstruction a partially-built road that has now been completed, but it’s shown in the same state as in some older aerial survey images used to guide the 10x enhancement.

Conclusions

In this post, we’ve laid a foundational understanding of super resolution and its importance in satellite imaging. We’ve explored the principles of resolution, sampling, and aliasing, which underpin the challenges and opportunities in enhancing satellite images. We’ve also provided an overview of various super-resolution methods, ranging from traditional interpolation techniques to state-of-the-art deep learning models, discussing their strengths, limitations, and potential applications.

Super resolution is not just a technical advancement but a transformative capability for satellite imaging. By enhancing image quality, it unlocks new levels of detail and accuracy for critical applications in environmental monitoring, disaster response, urban planning, and beyond. As satellite technology continues to evolve, so too will the methods we use to extract the most value from its data. However, achieving high-resolution results requires a careful balance between reconstructing authentic detail and avoiding hallucinated content, especially when pushing beyond the physical limits of image formation.

Looking forward, the future of super resolution in satellite imagery lies in the synergy between advanced algorithms, tailored datasets, and practical considerations of real-world applications. With rapid advancements in deep learning, the potential for even greater breakthroughs is immense, but they must be matched by rigorous evaluation and domain-specific adaptation to ensure reliability and usefulness.

We’re excited to continue this exploration in subsequent posts, where we’ll delve deeper into the latest innovations, real-world case studies, and emerging trends in satellite image super resolution. Stay tuned!

References

[1] Nguyen, N.L., Anger, J., Davy, A., Arias, P. and Facciolo, G., 2023. L1BSR: Exploiting detector overlap for self-supervised single-image super-resolution of Sentinel-2 L1b imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2012-2022).

[2] Lafenetre, J., Nguyen, N.L., Facciolo, G. and Eboli, T., 2023. Handheld burst super-resolution meets multi-exposure satellite imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2056-2064).

[3] Edmund Optics. Introduction to Modulation Transfer Function. Edmund Optics. [online] Available at: https://www.edmundoptics.com/knowledge-center/application-notes/optics/introduction-to-modulation-transfer-function/ 

[4] Ohsuka, Shinji, et al. "Laboratory-size three-dimensional x-ray microscope with Wolter type I mirror optics and an electron-impact water window x-ray source." Review of Scientific Instruments 85.9 (2014).

[5] Wikipedia Contributors (2019). “Sampling (signal processing)”. Wikipedia. [online]  Available at: https://en.wikipedia.org/wiki/Sampling_(signal_processing).

[6] Wikipedia Contributors (2019). Aliasing. [online] Wikipedia. Available at: https://en.wikipedia.org/wiki/Aliasing

[7] Perpetual Industries. (n.d.). Signal Processing: Aliasing. [online] Available at: https://xyobalancer.com/signal-rocessing-aliasing/  [Accessed 3 Dec. 2024].

[8] Narita, Y., and K-H. Glassmeier. "Spatial aliasing and distortion of energy distribution in the wave vector domain under multi-spacecraft measurements." Annales Geophysicae. Vol. 27. No. 8. Göttingen, Germany: Copernicus Publications, 2009.

[9] Pascal Getreuer, Linear Methods for Image Interpolation, Image Processing On Line, 1 (2011), pp. 238–259. https://doi.org/10.5201/ipol.2011.g_lmii 

[10] H. Takeda, S. Farsiu, and P. Milanfar, Kernel Regression for Image Processing and Reconstruction, IEEE Transactions on Image Processing, 16 (2007), pp. 349–366. https://doi.org/10.1109/TIP.2006.888330

[11] B. Wronski, I. Garcia-Dorado, M. Ernst, D. Kelly, M. Krainin, C. Liang, M. Levoy, and P. Milanfar, Handheld Multi-Frame Super-Resolution, ACM Transactions on Graphics, 38 (2019), pp. 28:1–28:18. https://dl.acm.org/doi/10.1145/3306346.3323024

[12] Dong, Chao, et al. "Image super-resolution using deep convolutional networks." IEEE transactions on pattern analysis and machine intelligence 38.2 (2015): 295-307. 

[13] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z. and Shi, W., 2017. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4681-4690).

[14] Lugmayr, A., Danelljan, M., Van Gool, L. and Timofte, R., 2020. Srflow: Learning the super-resolution space with normalizing flow. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16 (pp. 715-732). Springer International Publishing.

[15] Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D.J. and Norouzi, M., 2022. Image super-resolution via iterative refinement. IEEE transactions on pattern analysis and machine intelligence, 45(4), pp.4713-4726.