08 - Super resolution
Advanced Image Processing
Poznan University of Technology, Institute of Robotics and Machine Intelligence
Laboratory 8: Super Resolution
Introduction
Super resolution is a computer vision task that aims to reconstruct a high-resolution (HR) image from a low-resolution (LR) input. This laboratory will guide you through both classical and deep learning approaches to super resolution, helping you understand the fundamental differences and advantages of modern neural network-based methods.
Goals
The objectives of this laboratory are to:
- Understand the mathematical foundations and challenges of super resolution
- Implement traditional interpolation-based upscaling methods (bicubic, bilinear)
- Load and use pre-trained deep learning models for super resolution (CARN, RCAN, NinaSR)
- Evaluate results using perceptual quality metrics (PSNR, SSIM)
- Compare classical vs. deep learning approaches quantitatively and qualitatively
Resources
- Deep Learning for Image Super-resolution: A Survey
- torchSR - PyTorch Super Resolution package with pre-trained models
- Image Super-Resolution Using Very Deep Residual Channel Attention Networks
- Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network
Image source: Super-Resolution: An Overview of the Current State of Research
Prerequisites
Install dependencies
First, let’s set up our Python environment with all necessary dependencies.
PyTorch
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
Computer Vision libraries
pip install opencv-python pillow
Metrics and visualization
pip install matplotlib scikit-image
Super-Resolution - torchSR
pip install torchsr
Image resize
This section provides a link to download the imresize.py script, which is a Python implementation of MATLAB’s imresize function.
The deep learning models used in this laboratory were trained on images that were downscaled using this specific MATLAB resize implementation. This function produces slightly different results compared to the standard OpenCV (cv2.resize) or Pillow (PIL.Image.resize) methods due to differences in interpolation kernels and boundary handling. Therefore, in this laboratory we will use a Python implementation of MATLAB’s imresize function, which is available in the imresize.py script.
💥 Task 1 💥
Download imresize.py script and place it in the working directory of this lab.
Using imresize
To downscale an image, call imresize.imresize() with the image array, scale factor, and interpolation method. The scale factor can be a float (e.g., 0.25 for 4x downscaling) or a tuple specifying output dimensions. Common interpolation methods include "nearest", "bilinear", and "bicubic". In the following tasks we will use scalar_scale=0.25 and method="bicubic", as in the example below.
import imresize
# Downscale image to 25% of original size using bicubic interpolation
img_lr = imresize.imresize(img_original, scalar_scale=0.25, method="bicubic")
Import libraries
import cv2
import matplotlib.pyplot as plt
import numpy as np
import requests
import torch
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim
from torchsr.models import carn, rcan
from torchsr.models.ninasr import ninasr_b2
import imresize
Load image
img_url = "https://raw.githubusercontent.com/Coloquinte/torchSR/v1.0.2/doc/example_hr.png"
response = requests.get(img_url, stream=True)
img_original = np.asarray(Image.open(response.raw).convert("RGB"))
print(f"Original image shape: {img_original.shape}")
print(f"Original image size: {img_original.shape[0] * img_original.shape[1]} pixels")
# Create low-resolution version using bicubic downsampling from imresize module
img_lr = imresize.imresize(img_original, 0.25, method="bicubic")
print(f"LR image shape: {img_lr.shape}")
print(f"LR image size: {img_lr.shape[0] * img_lr.shape[1]} pixels")
print(f"Information loss: {100 * (1 - img_lr.size / img_original.size):.2f}%")Super-Resolution
Mathematical background
The super resolution problem can be formulated as:
\[ I_{LR} = (I_{HR} \otimes k) \downarrow_s + n \]
Where:
- \(I_{HR}\) is the high-resolution image
- \(I_{LR}\) is the low-resolution image
- \(\otimes\) denotes convolution with a blur kernel \(k\)
- \(\downarrow_s\) represents downsampling by factor \(s\)
- \(n\) is additive noise
Our goal is to learn a function \(f\) such that: \[ \hat{I}_{HR} = f(I_{LR}) \approx I_{HR} \]
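To make the degradation model concrete, the snippet below simulates it with OpenCV and NumPy. This is an illustrative sketch only: the Gaussian kernel, the simple decimation step, and the noise level are assumptions for demonstration, not the degradation actually used to prepare the lab data (that is the MATLAB-style imresize described earlier).
import cv2
import numpy as np

def degrade(img_hr: np.ndarray, scale: int = 4, sigma: float = 1.5, noise_std: float = 2.0) -> np.ndarray:
    """Toy degradation: blur with kernel k, downsample by s, add noise n."""
    blurred = cv2.GaussianBlur(img_hr, (7, 7), sigma)  # I_HR convolved with k
    low_res = blurred[::scale, ::scale]  # downsampling by factor s
    noisy = low_res.astype(np.float64) + np.random.normal(0.0, noise_std, low_res.shape)  # + n
    return np.clip(noisy, 0, 255).astype(np.uint8)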
Metrics
Peak Signal-to-Noise Ratio (PSNR)
PSNR is a metric used to measure the quality of a reconstructed or compressed image by comparing it to an original, high-quality image. It is based on the Mean Squared Error (MSE) between the two images and measures the ratio between the maximum possible pixel value (\(MAX_I\)) and the “noise” (the errors introduced by compression or processing).
\[ PSNR = 10 \cdot \log_{10} \left( \frac{MAX_I^2}{MSE} \right) \]
This metric is expressed in decibels (dB) and a higher value means better quality. Nevertheless, PSNR does not always correlate well with human visual perception. An image with a higher PSNR might not necessarily look better to a person than an image with a slightly lower PSNR.
A PSNR implementation is available in the scikit-image package; example usage is shown below:
from skimage.metrics import peak_signal_noise_ratio as psnr
psnr_val = psnr(img_original, img_upscaled, data_range=255) # Image format HWC and data format uint8
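For intuition, the same value can be computed directly from the formula above. This short NumPy check is an illustrative sketch and should agree with the scikit-image result for uint8 images:
import numpy as np

def psnr_manual(img_a: np.ndarray, img_b: np.ndarray, max_i: float = 255.0) -> float:
    # Mean squared error over all pixels and channels
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_i**2 / mse)  # PSNR in dB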
Structural Similarity Index (SSIM)
SSIM is a more modern metric designed to measure image quality in a way that aligns better with human perception. It assumes that the human visual system is highly adapted for extracting structural information from a scene. Therefore, it measures the similarity between two images using three components:
- luminance (brightness)
- contrast
- structure (e.g., edges, textures)
The SSIM metric is expressed as a decimal value between -1 and 1 (or 0 and 1 in most common use cases). A value of 1 indicates perfect similarity (identical images); the closer the value is to 1, the more structurally similar the images are and the higher their perceived quality.
Overall, SSIM is generally considered a better metric for perceptual quality than PSNR because it accounts for how humans see, not just pixel-by-pixel error.
An SSIM implementation is available in the scikit-image package; example usage is shown below:
from skimage.metrics import structural_similarity as ssim
ssim_val = ssim(img_original, img_upscaled, channel_axis=2, data_range=255) # Image format HWC and data format uint8
Classical approaches to Super-Resolution
Traditional methods use mathematical interpolation to fill in missing pixels. These approaches rely on weighted combinations of neighboring pixel values to estimate the intensity of new pixels at intermediate positions.
- Nearest neighbor interpolation - the simplest method - just copies the nearest pixel value.
- Bilinear interpolation - considers the 4 nearest pixels and performs linear interpolation.
- Bicubic interpolation - considers 16 nearest pixels and uses cubic polynomials.
The quality of results depends primarily on the interpolation kernel used. More sophisticated kernels (such as bicubic) consider more neighboring pixels and employ higher-order polynomials, producing smoother gradients compared to simpler methods (such as nearest neighbor or bilinear).
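For reference, OpenCV exposes these kernels through interpolation flags. A minimal sketch, assuming img_lr from the earlier snippet and a 4x upscaling factor (note that cv2.resize takes the target size as (width, height)):
import cv2

h, w = img_lr.shape[:2]
img_nearest_neighbor = cv2.resize(img_lr, (w * 4, h * 4), interpolation=cv2.INTER_NEAREST)
img_bilinear = cv2.resize(img_lr, (w * 4, h * 4), interpolation=cv2.INTER_LINEAR)
img_bicubic = cv2.resize(img_lr, (w * 4, h * 4), interpolation=cv2.INTER_CUBIC)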
💥 Task 2 💥
Implement classical approaches to super-resolution using the OpenCV package and the cv2.resize method with the following interpolations:
- nearest neighbor,
- bilinear,
- bicubic.
Compare all three interpolation methods visually (plotting the results in a common figure, including the original image) and quantitatively (using the SSIM and PSNR metrics). Then, answer the following questions:
- Which interpolation method produces the best PSNR/SSIM scores?
- Do the metrics align with your visual perception?
- What are the common artifacts you notice (e.g., blurriness, jagged edges)?
- In your opinion, is the degradation process reversible with simple methods? Why or why not?
💥 Task 3 💥
Some visual artifacts and quality differences may not be obvious when viewing the entire image. Therefore, analyze a region of interest (ROI) using the code snippet below. Select a region of interest from the image by adjusting the coordinates (y1, y2, x1, and x2), then zoom in to see the differences between the interpolation methods more clearly.
ROI analysis
############# TODO: Student code #####################
methods = ["Nearest Neighbor Interpolation", ...]
images = [img_nearest_neighbor, ...]
# Define ROI (adjust coordinates based on your image)
y1, y2 = 100, 200
x1, x2 = 200, 300
######################################################
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
axes[0].imshow(img_original[y1:y2, x1:x2])
axes[0].set_title('Original (HR)')
axes[0].axis('off')
for ax, method, img in zip(axes[1:], methods, images):
    ax.imshow(img[y1:y2, x1:x2])
    ax.set_title(method)
    ax.axis('off')
plt.suptitle('Region of Interest comparison', fontsize=14)
plt.tight_layout()
plt.show()
Deep Learning approach to Super-Resolution
Classical approaches to Super-Resolution, while computationally efficient and mathematically well-defined, are fundamentally limited by their local nature. Interpolation methods cannot recover information that was lost during downsampling; they can only produce smooth approximations based on existing data.
In contrast, deep learning approaches overcome these limitations by learning complex, non-linear mappings from extensive training datasets containing pairs of low-resolution and high-resolution images. Neural networks can recognize patterns, textures, and structures that were present in the training data, enabling them to “hallucinate” plausible high-frequency details rather than simply smoothing between existing pixels. This data-driven approach allows deep learning models to achieve significantly better reconstruction quality, particularly for challenging scenarios involving fine details, textures, and semantic content.
💥 Task 4 💥
Neural networks for image upscaling learn complex mappings from low-resolution to high-resolution images; however, to perform well on new images they require specific input formats and value ranges. Write the preprocessing/postprocessing pipelines by implementing the steps described in the following code snippets.
preprocess
def preprocess_image(img: np.ndarray, device: torch.device) -> torch.Tensor:
    """
    Convert NumPy image to PyTorch tensor format for model input.

    Parameters
    ----------
    img : np.ndarray
        Input image (H, W, C) in range [0, 255]
    device : torch.device
        Target device (CPU or CUDA)

    Returns
    -------
    torch.Tensor
        Preprocessed tensor (1, C, H, W) in range [0, 1]
    """
    # 1. Normalize to [0, 1]
    # 2. Convert HWC to CHW (PyTorch format)
    # 3. Add batch dimension: (C, H, W) -> (1, C, H, W)
    # 4. Convert to PyTorch tensor and move to device
    img_tensor = None
    return img_tensor
postprocess
def postprocess_image(output: torch.Tensor) -> np.ndarray:
    """
    Convert model output tensor back to NumPy image format.

    Parameters
    ----------
    output : torch.Tensor
        Model output tensor (1, C, H, W) in range [0, 1]

    Returns
    -------
    np.ndarray
        Image array (H, W, C) in range [0, 255]
    """
    # 1. Move to CPU and remove batch dimension
    # 2. Clamp values to [0, 1] range
    # 3. Convert to NumPy
    # 4. Convert CHW to HWC
    # 5. Denormalize to [0, 255]
    output_np = None
    return output_np
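One possible implementation of the numbered steps above is sketched below. Treat it as a reference under the stated layout assumptions (HWC uint8 in, 1xCxHxW float tensor out), not the only valid solution:
import numpy as np
import torch

def preprocess_image(img: np.ndarray, device: torch.device) -> torch.Tensor:
    img = img.astype(np.float32) / 255.0  # 1. normalize to [0, 1]
    img = np.transpose(img, (2, 0, 1))  # 2. HWC -> CHW
    img = np.expand_dims(img, axis=0)  # 3. add batch dimension
    return torch.from_numpy(np.ascontiguousarray(img)).to(device)  # 4. tensor on device

def postprocess_image(output: torch.Tensor) -> np.ndarray:
    out = output.detach().cpu().squeeze(0)  # 1. CPU, drop batch dimension
    out = out.clamp(0.0, 1.0)  # 2. clamp to [0, 1]
    out = out.numpy()  # 3. to NumPy
    out = np.transpose(out, (1, 2, 0))  # 4. CHW -> HWC
    return (out * 255.0).round().astype(np.uint8)  # 5. denormalize to [0, 255]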
torchSR
torchSR is a comprehensive PyTorch library for super-resolution that provides easy access to multiple state-of-the-art pre-trained models including NinaSR, CARN, RCAN, and others. The library offers models trained on various benchmark datasets (DIV2K, RealSR, Set5, Urban100) with different upscaling factors (x2, x3, x4). It also includes convenient dataset handlers for loading and preprocessing standard super-resolution benchmark datasets, making it straightforward to evaluate and compare different models.
To install the torchSR Python package, simply call:
pip install torchsr
To use torchSR:
- Load a pre-trained model with the desired upscaling factor
- Preprocess your low-resolution image to a tensor format
- Run inference on the model
- Postprocess the output tensor back to an image
The above steps are presented in the following code snippet:
import torch
from torchsr.models import carn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def load_CARN_x4():
    model = carn(scale=4, pretrained=True)
    model.eval()
    return model.to(device)

# Load model
model = load_CARN_x4()
# Preprocess
img_tensor = preprocess_image(img_lr, device)
# Run inference
with torch.no_grad():
    output_tensor = model(img_tensor)
# Postprocess
output_img = postprocess_image(output_tensor)
Note: The variable img_lr (low-resolution image) must be defined before running this code.
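To verify the result numerically, apply the metrics from earlier (psnr and ssim, as imported above) to the model output. A sketch, assuming the original image dimensions are divisible by 4 so that output_img and img_original have matching shapes:
psnr_val = psnr(img_original, output_img, data_range=255)
ssim_val = ssim(img_original, output_img, channel_axis=2, data_range=255)
print(f"CARN x4 - PSNR: {psnr_val:.2f} dB, SSIM: {ssim_val:.4f}")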
💥 Task 5 💥
Run the super-resolution model on the low-resolution image and verify the results. Repeat this task for the RCAN and NinaSR models available in the torchSR package.
💥 Tip: for NinaSR use version B2, which can be imported from torchSR with the following statement:
from torchsr.models.ninasr import ninasr_b2
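The remaining models can be loaded analogously to CARN. A sketch, assuming the torchSR constructors share the (scale, pretrained) signature used above and that device is defined as in the earlier snippet:
from torchsr.models import rcan
from torchsr.models.ninasr import ninasr_b2

rcan_x4 = rcan(scale=4, pretrained=True).eval().to(device)
ninasr_x4 = ninasr_b2(scale=4, pretrained=True).eval().to(device)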
Compare all three models visually (plotting the results in a common figure, including the original image) and quantitatively (using the SSIM and PSNR metrics). Then, answer the following questions:
- Which model achieves the highest SSIM? Does it look the best visually?
- How do the metrics correlate with visual quality?
Advanced analysis - perceptual quality analysis
Sharp, well-defined edges are essential for visual clarity and significantly impact perceived image quality. When edges become blurred or distorted by artifacts such as ringing or aliasing, the image appears less natural, even if the overall pixel-level metrics remain acceptable. Hence, edge density can serve as an indicator of detail preservation - higher values typically indicate better preservation of fine details and structural information.
def analyze_edges(img: np.ndarray) -> float:
    """Analyze edge sharpness using Canny edge detection."""
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # Apply Canny edge detection
    edges = cv2.Canny(gray, 50, 150)
    # Count edge pixels
    edge_density = np.sum(edges > 0) / edges.size
    return edge_density
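A usage sketch, assuming the upscaled images from the previous tasks are in scope (the variable names below are illustrative):
results = {
    "High Resolution": img_original,
    "Bicubic": img_bicubic,
    "CARN": output_img,
}
for name, img in results.items():
    print(f"{name}: edge density = {analyze_edges(img):.4f}")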
💥 Task 6 💥
Apply the analyze_edges function to evaluate edge preservation across all super-resolution methods implemented in previous tasks. Calculate edge density for:
- original high-resolution image
- classical interpolation methods (nearest neighbor, bilinear, bicubic)
- deep learning models (CARN, RCAN, NinaSR)
Compare the edge density values and answer the following questions:
- Which method best preserves edge information compared to the original image?
- How do classical methods compare to deep learning approaches in terms of edge density?
- Does higher edge density correlate with better visual quality in your perception?
- Are there any methods that produce artificially high edge density (potential over-sharpening)?
💥 Task 7 💥
Apply super resolution to your own images. Verify, in terms of metrics and visual evaluation, whether pre-trained deep learning models perform better than classical upscaling methods.
Then, consider the following aspects:
- What are the limitations of current approaches?
- In your opinion, how well would these models work with domain-specific images, such as satellite imagery or microscopy? Would they be effective without adaptation to the domain?
- Is “hallucination” a problem in super-resolution tasks?