In this post we aim to provide a general overview of super-resolution methods through the specific lens of their application to satellite imaging. We illustrate the different types of super-resolution methods with examples, in some cases referencing published papers authored in collaboration with members of our team ([1], [2]). Here we include introductory information, which we plan to expand with more in-depth details in subsequent posts.
What is super-resolution?
Super resolution is an advanced image processing technique used to enhance the resolution of an image, making it clearer and more detailed. It works by reconstructing or predicting finer details that may not be explicitly present in the original lower-resolution image.
Why does super resolution matter?
Super resolution is particularly important for satellite images because it can enhance the quality and usefulness of the data captured from space. Satellites play a vital role in areas like environmental monitoring, urban planning, defense, and disaster response, but they can face inherent limitations in terms of resolution due to physical, technical, and financial constraints. Super resolution addresses these limitations, making satellite imagery more valuable for a wide range of applications.
A reminder on detail, resolution, sampling and aliasing
Optical diffraction, sampling theory and aliasing form the basis for understanding satellite image super resolution. Each could fill a post of its own. For readers who are already familiar with these concepts, this section may serve as a brief reminder. For those who aren’t, we provide a basic introduction and links to good resources for going into more detail.
To form a satellite image, light rays incoming from the scene pass through an optical system. This system can include a telescope, lenses, interferometers, filters and/or other elements. Rays that pass through the optical system reach a sensor that has a certain number of pixels of a given size. At this point the light intensity is sampled at each pixel, discretizing the signal representing the scene and forming a digital image.
The optical system of the satellite determines the finest detail that can reach the sensor. One usual way of specifying this is the Modulation Transfer Function, or MTF. The MTF expresses contrast as a function of spatial frequency. In layman’s terms, it shows whether, after passing through the optical system, fine details can still be told apart or whether they are blurred together (finer details correspond to higher frequencies, and telling them apart requires sufficient contrast between them). Any detail finer than the cutoff of the MTF is blurred together and can’t be recovered, only extrapolated. Edmund Optics has a good and accessible post with more details [3].
The figure above illustrates the modulation transfer function (MTF) of an imaging system [4] (not a satellite in this illustration, but the concepts translate directly to satellite systems). Plotted are the MTF for the annular aperture of the objective mirror (MTF_ap), the MTF for CCD sampling (MTF_CCD), and the total MTF of the imaging system (MTF_total = MTF_ap × MTF_CCD). MTF values above zero beyond the Nyquist frequency translate into aliased spectral content. After the MTF reaches zero, there is no spectral content left that can be recovered by any method other than extrapolation or inductive bias.
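As a toy illustration of how an MTF can be specified, here is a minimal Python sketch of the standard diffraction-limited MTF of an ideal circular aperture. The `diffraction_mtf` helper and the wavelength/f-number values are ours, chosen for illustration, and are not tied to any particular satellite.

```python
import numpy as np

def diffraction_mtf(f, f_cutoff):
    """Diffraction-limited MTF of an ideal circular aperture under
    incoherent illumination. f and f_cutoff in cycles/mm."""
    x = np.clip(f / f_cutoff, 0.0, 1.0)  # MTF is zero beyond the cutoff
    return (2.0 / np.pi) * (np.arccos(x) - x * np.sqrt(1.0 - x**2))

# Illustrative values: 550 nm light and f/8 optics -> fc = 1 / (lambda * N)
wavelength_mm = 550e-6   # 550 nm expressed in mm
f_number = 8.0
f_cutoff = 1.0 / (wavelength_mm * f_number)   # ~227 cycles/mm

freqs = np.linspace(0, f_cutoff, 5)
for f, m in zip(freqs, diffraction_mtf(freqs, f_cutoff)):
    print(f"{f:7.1f} cycles/mm -> MTF = {m:.3f}")
```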
The MTF determines the maximum frequency of the signal that reaches the sensor, and here’s where sampling theory comes into play. The continuous signal can be fully recovered from the discrete samples only if it is sampled at at least twice its maximum frequency [5]. This, however, is not commonly done, since it would produce very blurry images and require smaller pixels. The common practice is to sample at a rate that cuts off at ~15% MTF contrast or above. This results in sharper images, with the added bonus of less noise (due to larger pixels), but it also introduces aliasing. Aliasing appears in images as spurious, undesirable effects arising from insufficient sampling [6]. In the frequency domain it can be interpreted as frequencies above the sampling cutoff being folded back into the signal and added to frequencies below the cutoff.
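The folding effect can be demonstrated numerically. The following minimal sketch (all values are illustrative) samples a sinusoid above the Nyquist limit and shows its energy reappearing at a lower, aliased frequency:

```python
import numpy as np

fs = 100.0          # sampling rate (samples per unit length)
f_true = 70.0       # signal frequency, above the Nyquist limit fs/2 = 50

n = np.arange(200)
samples = np.sin(2 * np.pi * f_true * n / fs)

# The spectrum of the sampled signal peaks at the folded (aliased) frequency
spectrum = np.abs(np.fft.rfft(samples))
f_axis = np.fft.rfftfreq(len(samples), d=1 / fs)
f_alias = f_axis[np.argmax(spectrum)]

print(f"True frequency:    {f_true}")
print(f"Aliased frequency: {f_alias}")   # ~30: folded back as fs - f_true
```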
Aliasing effects are thus undesirable in images, but for the purpose of super resolution they can be fundamental. Aliasing is information from the incoming signal that gets mixed up during the sampling process, and there are super-resolution techniques that recover actual details from it.
In short, super resolution techniques can either recover actual detail from the incoming signal without extrapolating, or they can make educated guesses about what high-resolution detail was lost during image formation without guarantees of that detail actually being correct. When genuine, verifiable detail is recovered, it’s due to the use of aliased information in the signal.
Single-image super resolution
These methods augment the resolution of an image without using multiple images as input. They use algorithms that analyze the patterns, textures, and structures in the image to predict and generate higher-resolution versions.
Deep-learning-based methods are the most compelling single-image super resolution techniques nowadays, but we’ll devote a section to those in particular further down in the post.
Interpolation
Bilinear, bicubic or spline interpolation (among others) estimate new pixel values by averaging the surrounding pixels. These classic methods are simple and fast, but they result in blurry images that recover no detail and do little to improve the image’s interpretability. Interpolation can be followed by a sharpening operation to mitigate the blur in the result, as in the sketch below. You can find an excellent reference on image interpolation in [9].
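As a rough illustration, here is a minimal sketch of this pipeline using SciPy’s cubic-spline zoom as a stand-in for bicubic interpolation, followed by a simple unsharp mask. The `upscale_and_sharpen` helper and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np
from scipy import ndimage

def upscale_and_sharpen(img, factor=2, amount=0.6, sigma=1.0):
    """Cubic-spline upscaling followed by an unsharp mask to
    mitigate the interpolation blur."""
    up = ndimage.zoom(img.astype(np.float64), factor, order=3)
    blurred = ndimage.gaussian_filter(up, sigma=sigma)
    sharpened = up + amount * (up - blurred)   # unsharp masking
    return np.clip(sharpened, 0, 255)

# Toy example: a random 32x32 "image" upscaled 2x
low_res = np.random.randint(0, 256, (32, 32)).astype(np.float64)
high_res = upscale_and_sharpen(low_res, factor=2)
print(low_res.shape, "->", high_res.shape)   # (32, 32) -> (64, 64)
```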
Classical model-based and regression methods
Some super-resolution techniques do not rely on interpolating the available samples, but instead estimate an underlying function that adheres to the observed samples. These methods include, among others, optimization-based techniques, kernel regression [10], and the Projection Onto Convex Sets (POCS) approach [11]. By modeling the underlying signal rather than just filling in pixels, these methods aim to reconstruct higher-frequency details more accurately. These techniques are also applied to multi-image super resolution. They are fast and interpretable, but their results have been surpassed by those of data-driven approaches.
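To give a flavor of this family, here is a minimal sketch of the simplest, zeroth-order (Nadaraya-Watson) variant of kernel regression, reconstructing a regular grid from irregularly placed samples. The data-adaptive, higher-order steering kernels of [10] are considerably more sophisticated; the helper below is ours, for illustration only.

```python
import numpy as np

def nadaraya_watson_2d(xs, ys, values, grid_x, grid_y, h=0.75):
    """Zeroth-order (Nadaraya-Watson) kernel regression: estimate an
    image on a regular grid from irregularly placed samples, using a
    Gaussian kernel of bandwidth h."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    out = np.zeros_like(gx, dtype=np.float64)
    wsum = np.zeros_like(gx, dtype=np.float64)
    for x, y, v in zip(xs, ys, values):
        w = np.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * h ** 2))
        out += w * v
        wsum += w
    return out / np.maximum(wsum, 1e-12)

# Toy example: 500 scattered samples of a smooth function -> 64x64 grid
rng = np.random.default_rng(0)
xs, ys = rng.uniform(0, 10, 500), rng.uniform(0, 10, 500)
values = np.sin(xs) * np.cos(ys)
grid = np.linspace(0, 10, 64)
estimate = nadaraya_watson_2d(xs, ys, values, grid, grid)
print(estimate.shape)   # (64, 64)
```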
Pansharpening
Satellites capture images in different spatial and spectral resolutions and bands. Often, multi-spectral satellites capture a panchromatic band (PAN) which has broad spectral support (e.g. covers from red to blue) and is of higher spatial resolution than the other spectral bands (e.g. red, green, blue, near infrared, etc.). It’s this broader spectral support that enables the higher resolution: since more photons are being integrated by each pixel, pixel size can be reduced (and thus resolution increased) without increasing the sampling noise beyond that of the other narrower bands at lower resolution.
Pansharpening is a technique that increases the resolution of multi-spectral images by combining these low resolution images with higher resolution panchromatic images. This process results in a single high-resolution image that retains the spectral information of the multi-spectral image but with the spatial detail of the panchromatic one.
Common pansharpening methods include:
- Brovey Transformation: Enhances color by normalizing each interpolated spectral band by the sum of all bands, then multiplying by the high-resolution panchromatic image (see the sketch after this list).
- Principal Component Analysis (PCA): Transforms the multispectral data into uncorrelated components, which are then combined with the panchromatic image.
- Intensity-Hue-Saturation (IHS) Transformation: Converts the RGB image to IHS, replaces the intensity with the panchromatic image, then transforms it back to RGB.
- Wavelet-based Methods: Uses wavelet transforms to decompose the images, combine high-frequency details, and reconstruct the pansharpened image.
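As a concrete example, here is a minimal sketch of the Brovey transform, the simplest of the methods above. Real implementations typically add histogram matching or band weighting between the PAN and the multi-spectral bands, which we omit here; the `brovey_pansharpen` helper is ours, for illustration.

```python
import numpy as np
from scipy import ndimage

def brovey_pansharpen(ms, pan, eps=1e-12):
    """Brovey transform: ms has shape (bands, H, W) at low resolution,
    pan has shape (H*r, W*r). Returns a pansharpened (bands, H*r, W*r)."""
    r = pan.shape[0] // ms.shape[1]
    # Upsample each spectral band to the panchromatic grid
    up = np.stack([ndimage.zoom(b, r, order=3) for b in ms])
    # Normalize each band by the sum over bands, then modulate by PAN
    total = up.sum(axis=0) + eps
    return up / total * pan

# Toy example: 3-band 64x64 multispectral image, 128x128 panchromatic band
ms = np.random.rand(3, 64, 64)
pan = np.random.rand(128, 128)
sharp = brovey_pansharpen(ms, pan)
print(sharp.shape)   # (3, 128, 128)
```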
Multi-Image super resolution
In these methods, multiple images of the same scene are combined to enhance the resolution. This can be useful when there are slight variations between the images (e.g., due to motion or angles), which can provide more data samples to reconstruct a better, single higher-resolution image. An important advantage of these methods is that they can recover actual high-resolution details from the scene without hallucinating content.
Multi-image methods exploit the fact that the multiple images of a single scene are taken with variations in the position of the sensor; thus the sampling grid resulting from the combination of the multiple images is finer than the original one, and a higher-resolution image can be recovered from this grid.
In the example above from [2], the authors decompose the input images into coarse and fine detail components, perform kernel regression over the aligned inputs to reconstruct a super resolved detail component and finally combine it with an interpolated coarse image to produce the output.
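To make the finer-sampling-grid idea concrete, here is a naive "shift-and-add" sketch (the `shift_and_add` helper is hypothetical). It assumes the sub-pixel shifts are already known exactly and fall on the fine grid; real methods must estimate the shifts, fill holes in the irregular fine grid, and deblur.

```python
import numpy as np

def shift_and_add(lr_images, shifts, factor):
    """Naive shift-and-add fusion: each low-res image is taken with a
    known sub-pixel shift (in low-res pixels); its samples are placed on
    a grid that is `factor`x finer and accumulated."""
    h, w = lr_images[0].shape
    acc = np.zeros((h * factor, w * factor))
    cnt = np.zeros_like(acc)
    for img, (dy, dx) in zip(lr_images, shifts):
        # Position of each low-res sample on the high-res grid
        iy = (np.arange(h) * factor + round(dy * factor)) % (h * factor)
        ix = (np.arange(w) * factor + round(dx * factor)) % (w * factor)
        acc[np.ix_(iy, ix)] += img
        cnt[np.ix_(iy, ix)] += 1
    cnt[cnt == 0] = 1   # holes would need interpolation in practice
    return acc / cnt

# Toy example: four 32x32 frames with half-pixel shifts -> one 64x64 image
frames = [np.random.rand(32, 32) for _ in range(4)]
shifts = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5)]
hr = shift_and_add(frames, shifts, factor=2)
print(hr.shape)   # (64, 64)
```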
Sub-pixel-accurate image co-registration is usually a necessary component of these methods, used to generate the underlying higher-resolution grid. There are cases, however, where the co-registration is implicit rather than explicit; this is especially the case for deep learning methods.
A common pitfall of multi-image methods is that they can produce artifacts in cases where objects are occluded or move in a manner that is not consistent with the global motion between scenes (or local scene regions).
Deep Learning Methods
As in so many applications, deep learning provides a powerful toolset that yields excellent results when applied to the problem of super resolution, in both its single- and multi-image variants.
These methods learn how to reconstruct a high-resolution version of a low-resolution sample, and can be divided mainly into two categories: those based on regression (they learn to produce a weighted average of all possible outputs) and those based on conditional generative models like Generative Adversarial Networks (GANs), Normalizing Flows (e.g. SRFlow), or (Latent) Diffusion Models.
SRCNN [12] is the first example of a CNN regressor applied to the problem of super resolution. The network has a simple and effective architecture consisting of 3 main layers: the first extracts features from the input low-resolution image, the second performs a non-linear mapping of the features to a high-resolution feature space, and the last reconstructs the high-resolution output from the high-resolution features (see illustration below). Due to its simplicity and effectiveness, SRCNN became the foundation for more advanced deep-learning-based super-resolution methods.
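Below is a minimal PyTorch sketch of this three-layer architecture, following the common 9-1-5 filter configuration with 64 and 32 channels described in [12]:

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Sketch of the three-layer SRCNN architecture [12]. The input is a
    low-resolution image already upscaled (e.g. bicubically) to the
    target size."""
    def __init__(self, channels=1):
        super().__init__()
        self.features = nn.Conv2d(channels, 64, kernel_size=9, padding=4)   # patch extraction
        self.mapping = nn.Conv2d(64, 32, kernel_size=1)                     # non-linear mapping
        self.reconstruction = nn.Conv2d(32, channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.features(x))
        x = self.relu(self.mapping(x))
        return self.reconstruction(x)

# Toy forward pass on a 1-channel 64x64 "image"
model = SRCNN()
out = model(torch.randn(1, 1, 64, 64))
print(out.shape)   # torch.Size([1, 1, 64, 64])
```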
Generative models were spearheaded by generative adversarial networks, first applied to super resolution with SRGAN [13]. One of the main motivations for employing GANs is that they can provide sharper and more natural-looking images than MSE-minimizing CNNs. The latter produce reconstructions that represent a pixel-wise average of the many possible high-resolution solutions for the low-resolution input, and thus blur these solutions together. SRGAN, on the other hand, can reconstruct a single sample from among the possible solutions by minimizing a perceptual loss.
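To give a flavor of the perceptual objective, here is a sketch of a VGG-based content loss in the spirit of SRGAN. The layer choice and the commented loss weighting are illustrative assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    """VGG-based content loss: MSE between deep feature maps rather
    than between raw pixels. The truncation point (layer_index) is an
    illustrative choice."""
    def __init__(self, layer_index=20):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features
        self.extractor = nn.Sequential(*list(vgg)[:layer_index]).eval()
        for p in self.extractor.parameters():
            p.requires_grad = False   # the feature extractor stays frozen
        self.mse = nn.MSELoss()

    def forward(self, sr, hr):
        # sr, hr: (N, 3, H, W), assumed normalized to ImageNet statistics
        return self.mse(self.extractor(sr), self.extractor(hr))

# In SRGAN-style training the generator minimizes the content loss plus a
# small adversarial term rewarding outputs the discriminator deems real:
#   total_loss = content_loss + 1e-3 * adversarial_loss
```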
Both SRCNN and SRGAN (2016 and 2017, respectively) are seminal works that have led to many improvements and advances since, but they remain representative of the classes of methods they pioneered.
GANs are notoriously hard to train and suffer from mode collapse (the network converges to a very limited set of outputs). More recently, normalizing flows (e.g. SRFlow [14]) and diffusion models (e.g. SR3 [15]) have been proposed to address these issues; they also provide higher-quality outputs.
We’ll cover deep learning super-resolution methods more thoroughly in an upcoming post, presenting more recent works and results.
Most deep learning super resolution methods have been conceived for general-purpose image enhancement. However, satellite imagery presents unique challenges and characteristics that differ from typical photographic images. The general-purpose methods can be applied out of the box to satellite images, but the best results will be obtained by training on the target data and adapting techniques to the specific characteristics of a satellite.
One example of a deep learning super-resolution method that exploits particular characteristics of the target satellite is L1BSR [1]. This is a self-supervised method trained directly on real Sentinel-2 L1B 10 m data, requiring no high-resolution ground truth. To produce super-resolved results, L1BSR leverages the specific design of the satellite’s sensor: data from the different bands are captured at different instants, with subpixel displacements between them, which the method exploits to reconstruct a high-resolution image. The training (illustrated below) enforces the high-resolution reconstruction to be consistent with the co-registered low-resolution bands captured by an adjacent overlapping CMOS detector, which serve as targets for the self-supervised training.
In the case of supervised methods, a dataset with low resolution / high resolution pairs is necessary, which in the case of satellite images can pose unique challenges.
Representative and well balanced datasets are a must for deep learning methods. For example, if you want to have good super resolution in urban areas to improve the result of a remote sensing application in cities, a dataset with urban samples may suffice, but the super-resolution model would underperform in forests, mountains, fields or other environments not represented in the dataset.
Look out for our upcoming post diving deeper into these methods.
Metrics
Evaluation metrics for super resolution methods commonly involve comparing a ground truth high-resolution image with a super-resolved result obtained from the downsampled ground truth. These metrics, often used in combination, give a more comprehensive evaluation of an SR method's performance by balancing objective and subjective aspects of image quality.
Pixel-based metrics
These are directly based on the pixel-wise differences in values between the ground truth and the reconstruction. The most common ones are:
Peak Signal-to-Noise Ratio (PSNR): measures the ratio of the maximum possible power of a signal (image) to the power of the noise affecting it. Higher PSNR values indicate that the SR image is closer to the ground truth image. It can be calculated directly from the mean square error (MSE) using the formula below, where L is the maximum pixel value possible (255 for an 8-bit image).
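PSNR = 10 · log10(L² / MSE)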
Structural Similarity Index (SSIM): assesses the similarity between the SR image and the ground truth based on structural information like luminance, contrast, and structure. It’s designed to better reflect human perception compared to PSNR, but it may not fully align with subjective quality perception.
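Both metrics are available off the shelf; here is a minimal sketch using scikit-image, with synthetic data standing in for real images:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Toy example: compare a ground-truth image with a noisy "reconstruction"
gt = np.random.rand(128, 128)
sr = np.clip(gt + 0.05 * np.random.randn(128, 128), 0, 1)

psnr = peak_signal_noise_ratio(gt, sr, data_range=1.0)
ssim = structural_similarity(gt, sr, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```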
Perceptual Metrics
There can be cases where images score well on PSNR or SSIM yet still present artifacts or unnatural-looking elements. Perceptual metrics capture how natural or visually realistic the super-resolved images appear, often aligning more closely with human perception.
Natural Image Quality Evaluator (NIQE): this is a no-reference metric, meaning it evaluates image quality without needing a ground truth. It measures natural image statistics to determine how “natural” or visually pleasing the SR image looks. Lower NIQE scores generally indicate better quality, as the image appears more realistic and closer to natural image statistics.
Learned Perceptual Image Patch Similarity (LPIPS): often used in training deep-learning-based SR models, LPIPS leverages pre-trained deep neural networks (like VGG) to calculate the perceptual difference between the SR and ground-truth images. The metric compares feature representations of both images, encouraging the SR model to reproduce perceptual qualities rather than just pixel accuracy.
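A minimal usage sketch, assuming the reference `lpips` PyPI package (its API may differ across versions):

```python
import torch
import lpips   # pip install lpips

# LPIPS expects tensors of shape (N, 3, H, W) scaled to [-1, 1]
loss_fn = lpips.LPIPS(net='vgg')
sr = torch.rand(1, 3, 128, 128) * 2 - 1
hr = torch.rand(1, 3, 128, 128) * 2 - 1
distance = loss_fn(sr, hr)
print(distance.item())   # lower means perceptually closer
```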
Task-Specific Metrics
These metrics indicate how well the SR images serve practical applications.
Detection or Classification Accuracy: In cases where SR images are used for tasks like object detection or image classification, the effectiveness of the SR method can be evaluated by measuring how well it improves performance on these tasks. For instance, running an object detector on the SR output and checking if the detection accuracy improves gives insight into how useful the SR method is for downstream applications.
Subjective Quality Assessment
Mean Opinion Score (MOS): MOS involves human evaluators who rate the perceived quality of the SR images on a scale (e.g., 1 to 5). This method captures subjective quality aspects that may not be well reflected in objective metrics. It’s often considered the gold standard for quality evaluation, but it is time-consuming and subjective.
Pros, cons and achievable resolution gains
Due to the physical constraints of image formation described in the sampling theory section, super resolution gains above 2x risk introducing hallucinations.
Deep learning models can push the reconstruction resolution well beyond the limits imposed by sampling theory, with some vendors offering 10x resolution increases. In these extreme cases, it must be understood that what the models do is mostly invent content or inpaint the reconstruction with archived content. For example, you may find in the reconstruction a partially-built road that has now been completed, but it’s shown in the same state as in some older aerial survey images used to guide the 10x enhancement.
Conclusions
In this post, we’ve laid a foundational understanding of super resolution and its importance in satellite imaging. We’ve explored the principles of resolution, sampling, and aliasing, which underpin the challenges and opportunities in enhancing satellite images. We’ve also provided an overview of various super-resolution methods, ranging from traditional interpolation techniques to state-of-the-art deep learning models, discussing their strengths, limitations, and potential applications.
Super resolution is not just a technical advancement but a transformative capability for satellite imaging. By enhancing image quality, it unlocks new levels of detail and accuracy for critical applications in environmental monitoring, disaster response, urban planning, and beyond. As satellite technology continues to evolve, so too will the methods we use to extract the most value from its data. However, achieving high-resolution results requires a careful balance between reconstructing authentic detail and avoiding hallucinated content, especially when pushing beyond the physical limits of image formation.
Looking forward, the future of super resolution in satellite imagery lies in the synergy between advanced algorithms, tailored datasets, and practical considerations of real-world applications. With rapid advancements in deep learning, the potential for even greater breakthroughs is immense, but they must be matched by rigorous evaluation and domain-specific adaptation to ensure reliability and usefulness.
We’re excited to continue this exploration in subsequent posts, where we’ll delve deeper into the latest innovations, real-world case studies, and emerging trends in satellite image super resolution. Stay tuned!
References
[1] Nguyen, N.L., Anger, J., Davy, A., Arias, P. and Facciolo, G., 2023. L1BSR: Exploiting detector overlap for self-supervised single-image super-resolution of Sentinel-2 L1b imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2012-2022).
[2] Lafenetre, J., Nguyen, N.L., Facciolo, G. and Eboli, T., 2023. Handheld burst super-resolution meets multi-exposure satellite imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2056-2064).
[3] Edmund Optics. Introduction to Modulation Transfer Function. Edmund Optics. [online] Available at: https://www.edmundoptics.com/knowledge-center/application-notes/optics/introduction-to-modulation-transfer-function/
[4] Ohsuka, Shinji, et al. "Laboratory-size three-dimensional x-ray microscope with Wolter type I mirror optics and an electron-impact water window x-ray source." Review of Scientific Instruments 85.9 (2014).
[5] Wikipedia Contributors (2019). Sampling (signal processing). [online] Wikipedia. Available at: https://en.wikipedia.org/wiki/Sampling_(signal_processing).
[6] Wikipedia Contributors (2019). Aliasing. [online] Wikipedia. Available at: https://en.wikipedia.org/wiki/Aliasing.
[7] Perpetual Industries. (n.d.). Signal Processing: Aliasing. [online] Available at: https://xyobalancer.com/signal-rocessing-aliasing/ [Accessed 3 Dec. 2024].
[8] Narita, Y., and K-H. Glassmeier. "Spatial aliasing and distortion of energy distribution in the wave vector domain under multi-spacecraft measurements." Annales Geophysicae. Vol. 27. No. 8. Göttingen, Germany: Copernicus Publications, 2009.
[9] Pascal Getreuer, Linear Methods for Image Interpolation, Image Processing On Line, 1 (2011), pp. 238–259. https://doi.org/10.5201/ipol.2011.g_lmii
[10] H. Takeda, S. Farsiu, and P. Milanfar, Kernel Regression for Image Processing and Reconstruction, IEEE Transactions on Image Processing, 16 (2007), pp. 349–366. https://doi.org/10.1109/TIP.2006.888330.
[11] B. Wronski, I. Garcia-Dorado, M. Ernst, D. Kelly, M. Krainin, C. Liang, M. Levoy, and P. Milanfar, Handheld Multi-Frame Super-Resolution, ACM Transactions on Graphics, 38 (2019), pp. 28:1–28:18. https://dl.acm.org/doi/10.1145/3306346.3323024.
[12] Dong, Chao, et al. "Image super-resolution using deep convolutional networks." IEEE transactions on pattern analysis and machine intelligence 38.2 (2015): 295-307.
[13] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z. and Shi, W., 2017. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4681-4690).
[14] Lugmayr, A., Danelljan, M., Van Gool, L. and Timofte, R., 2020. Srflow: Learning the super-resolution space with normalizing flow. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16 (pp. 715-732). Springer International Publishing.
[15] Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D.J. and Norouzi, M., 2022. Image super-resolution via iterative refinement. IEEE transactions on pattern analysis and machine intelligence, 45(4), pp.4713-4726.