EFFICIENCY-DRIVEN SINGLE IMAGE SUPER-RESOLUTION USING ATTENTION-ENHANCED RESIDUAL FEATURE DISTILLATION NETWORKS

MANGAL, ISHAN; Verma, Bindu (SUPERVISOR)

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22915

Title:	EFFICIENCY-DRIVEN SINGLE IMAGE SUPER-RESOLUTION USING ATTENTION-ENHANCED RESIDUAL FEATURE DISTILLATION NETWORKS
Authors:	MANGAL, ISHAN; Verma, Bindu (SUPERVISOR)
Keywords:	EFFICIENCY-DRIVEN; SINGLE IMAGE SUPER-RESOLUTION; RESIDUAL FEATURE DISTILLATION NETWORKS; ESPCN
Issue Date:	May-2026
Series/Report no.:	TD-8820;
Abstract:	Reconstructing an image from a low-resolution input appears to be a straightforward task, but it is fundamentally ill-posed: many solutions may exist, namely all the high-resolution images that could correspond to the observed downsampled version. Existing deep-learning approaches produce outstanding results; however, almost all of them require large amounts of computational power, which is incompatible with real-time inference on mobile devices, edge computing units, and camera hardware. ARFD-ESPCN is an image super-resolution architecture that builds on the ESPCN sub-pixel upsampling structure but adds Feature Distillation Blocks (FDBs), Squeeze-and-Excitation (SE) channel-wise attention, and a Global Feature Fusion layer. The key feature of the proposed architecture is that it preserves the speed of ESPCN by using only low-resolution convolutions and applying PixelShuffle once, at the final layer. This design replaces ESPCN's shallow encoder-decoder with a deeper and more competitive architecture that uses six FDBs to split and merge convolution features, with stronger attention on aspects not captured previously. The final model has 597,904 parameters, needs 4.87 GFLOPs for a 640×360 input, and runs in 4.78 ms on a mid-range GPU. Training used the DF2K dataset (DIV2K plus Flickr2K, roughly 3,450 images) with L1 loss for 720 epochs and Charbonnier loss for the remaining 80, combined with flip and rotation augmentation. The optimiser is Adam with a cosine-annealing schedule that warms up over the first 50 epochs and decays toward 10^-6 by the end. On standard benchmarks the model scores 31.57 dB PSNR / 0.8861 SSIM on Set5, 27.21 dB / 0.7447 on Set14, 26.22 dB / 0.7029 on BSD100, and 25.37 dB / 0.7606 on Urban100. That beats the original ESPCN by over 2 dB on Set5 and matches VDSR while using roughly 125× fewer floating-point operations.
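The training recipe described above (warm-up, cosine decay, and a late switch from L1 to Charbonnier loss) can be sketched in plain Python. This is a minimal illustration, not the thesis code: the epoch counts and the 10^-6 floor come from the abstract, while the peak learning rate, the linear shape of the warm-up, and the Charbonnier epsilon are assumptions chosen only to make the example runnable.

```python
import math

# Values taken from the abstract.
TOTAL_EPOCHS = 800      # 720 epochs of L1 loss + 80 epochs of Charbonnier loss
L1_EPOCHS = 720
WARMUP_EPOCHS = 50
MIN_LR = 1e-6

# ASSUMPTIONS: neither value is stated in the abstract.
PEAK_LR = 1e-3
CHARBONNIER_EPS = 1e-3


def learning_rate(epoch):
    """Learning rate for a 0-based epoch: linear warm-up, then cosine decay.

    The linear warm-up shape is an assumption; the abstract only says the
    schedule warms up over the first 50 epochs.
    """
    if epoch < WARMUP_EPOCHS:
        return PEAK_LR * (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


def l1_loss(pred, target):
    """Mean absolute error over two equal-length sequences of pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def charbonnier_loss(pred, target, eps=CHARBONNIER_EPS):
    """Smooth L1 variant: mean of sqrt(diff^2 + eps^2)."""
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)


def loss_for_epoch(epoch):
    """L1 for the first 720 epochs, Charbonnier for the remaining 80."""
    return l1_loss if epoch < L1_EPOCHS else charbonnier_loss


if __name__ == "__main__":
    for epoch in (0, 49, 50, 400, 719, 720, 799):
        print(epoch, f"{learning_rate(epoch):.3e}", loss_for_epoch(epoch).__name__)
```

In a real training loop these functions would be replaced by the framework's own scheduler and loss modules; the sketch only makes the timing of each phase explicit.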
Compared with heavier efficient models like IMDN and RFDN, the quality gap is about 0.6–0.7 dB, but ARFD-ESPCN runs 2–3× faster. A step-by-step ablation decomposes the total 2.11 dB gain into individual contributions: the training schedule accounts for 1.04 dB (the largest single factor), DF2K data adds 0.45 dB, SE attention with residual connections contributes 0.38 dB, augmentation plus L1 loss gives 0.15 dB, and the distillation block structure adds 0.07 dB. The practical lesson is that for models under one million parameters, getting the training pipeline right matters at least as much as architectural design. The model is small enough to fit on mobile accelerators and fast enough for 60 fps video pipelines, making it relevant for medical imaging, satellite photo enhancement, and streaming scenarios where latency budgets are tight. Possible extensions include real-world degradation handling via BSRGAN-style pipelines, perceptual and adversarial losses for sharper visual textures, window-based spatial attention for periodic patterns, and INT8 quantisation for hardware without floating-point units.
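The 60 fps claim can be checked with simple arithmetic on the reported 4.78 ms inference time. The sketch below is only a back-of-the-envelope calculation: the function names are hypothetical, and it assumes the reported time covers the whole per-frame cost, ignoring video decoding, memory transfers, and display, which the abstract does not quantify.

```python
INFERENCE_MS = 4.78   # per-frame inference time reported in the abstract
TARGET_FPS = 60       # target frame rate mentioned in the abstract


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def max_fps(inference_ms):
    """Upper bound on frame rate if inference were the only per-frame cost."""
    return 1000.0 / inference_ms


def fits_budget(inference_ms, fps):
    """True if one inference pass fits inside a single frame interval."""
    return inference_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    budget = frame_budget_ms(TARGET_FPS)
    print(f"frame budget at {TARGET_FPS} fps: {budget:.2f} ms")
    print(f"share of budget used by inference: {INFERENCE_MS / budget:.1%}")
    print(f"headroom per frame: {budget - INFERENCE_MS:.2f} ms")
    print(f"inference-only frame rate bound: {max_fps(INFERENCE_MS):.1f} fps")
    print(f"fits 60 fps budget: {fits_budget(INFERENCE_MS, TARGET_FPS)}")
```

Under these assumptions, inference uses under a third of each 60 fps frame interval, leaving the remainder for the rest of the pipeline.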
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/22915
Appears in Collections:	M.E./M.Tech. Information Technology

Files in This Item:

File	Description	Size	Format
Ishan_Mtech_.pdf		8.89 MB	Adobe PDF
Ishan Mangal plag.pdf		17.65 MB	Adobe PDF
