<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>DSpace Collection:</title>
  <link rel="alternate" href="http://dspace.dtu.ac.in:8080/jspui/handle/123456789/99" />
  <subtitle />
  <id>http://dspace.dtu.ac.in:8080/jspui/handle/123456789/99</id>
  <updated>2026-07-02T15:24:39Z</updated>
  <dc:date>2026-07-02T15:24:39Z</dc:date>
  <entry>
    <title>STRUCTURED LATENT SPACE EXPLORATION WITH TRANSFORMER ENCODERS FOR DIVERSIFIED AND PERSONALIZED MULTIRATER MEDICAL IMAGE SEGMENTATION</title>
    <link rel="alternate" href="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22924" />
    <author>
      <name>SHUKLA, KESHU</name>
    </author>
    <author>
      <name>Verma, Bindu (SUPERVISOR)</name>
    </author>
    <id>http://dspace.dtu.ac.in:8080/jspui/handle/repository/22924</id>
    <updated>2026-06-25T04:57:11Z</updated>
    <published>2026-05-01T00:00:00Z</published>
    <summary type="text">Title: STRUCTURED LATENT SPACE EXPLORATION WITH TRANSFORMER ENCODERS FOR DIVERSIFIED AND PERSONALIZED MULTIRATER MEDICAL IMAGE SEGMENTATION
Authors: SHUKLA, KESHU; Verma, Bindu (SUPERVISOR)
Abstract: Multi-rater medical image segmentation requires models that capture inter-annotator&#xD;
disagreement, not average it away. Standard probabilistic models process all rater&#xD;
annotations through one shared encoder: when four radiologists label the same nodule&#xD;
differently, their reconstruction gradients partially cancel inside that encoder and the&#xD;
latent code ends up as a gradient-weighted compromise across all four boundary&#xD;
decisions. This is why prior samples from such a model cluster near the mean annotation&#xD;
rather than spanning the actual range of what qualified radiologists drew.&#xD;
We address this by replacing the shared posterior with N independent per-rater&#xD;
posterior encoders q_i(z_i | x, y_i), one per annotator. Each receives a 2-channel input:&#xD;
the image and a single rater mask. Gradient isolation follows from the per-rater ELBO&#xD;
decomposition, not from any regularisation: by the chain rule, ∂L_i/∂z_j = 0 for i ≠ j,&#xD;
so rater i’s reconstruction gradient cannot reach rater j’s encoder. On LIDC-IDRI&#xD;
(1,018 CT scans, 4 radiologists, 1,609 nodule patches, 4-fold cross-validation), the&#xD;
per-rater model (Stage 1 only) achieves GED 0.1444±0.0141 (−4.2%) and Dice_match&#xD;
0.9112±0.0061 (+2.28% relative) over the full D-Persona two-stage pipeline. A&#xD;
systematic ablation tests transformer-based encoders (MiT-B2), orthogonality&#xD;
regularisation, a discretised prior bank (k = 100), a dual diversity loss, and Stage 2 style&#xD;
vectors against the D-Persona baseline. Per-rater posteriors are the only modification&#xD;
that consistently improves both metrics at once. Transformer encoder capacity, tested as&#xD;
a direct competing architectural hypothesis, does not resolve the training-level gradient&#xD;
conflict. Dice_soft is unchanged at 0.9015: the gain comes from improved diversity&#xD;
and per-rater accuracy, not from higher average prediction quality.&#xD;
We test the model’s behaviour when not all annotators label every training&#xD;
image (the common clinical situation in multi-rater datasets). Under full sparsity (one&#xD;
annotator), the shared baseline undergoes gradient collapse: mean pairwise cosine&#xD;
similarity of reconstruction gradients rises from 0.167 (full annotation) to 0.976;&#xD;
within-fold standard deviation shrinks approximately 19-fold (0.439 → 0.023). Per-rater&#xD;
posteriors maintain zero alignment by construction at all sparsity levels. The GED&#xD;
advantage grows with sparsity: +11.5% with three annotators, +17.8% with two,&#xD;
+21.4% with one. All 12 per-fold comparisons favour the per-rater model (sign-test&#xD;
p &lt; 0.001). At full annotation, both models are statistically equivalent (0.5% gap,&#xD;
within noise); the advantage is tied to sparsity, not general accuracy. On NPC-170 (170&#xD;
nasopharyngeal carcinoma MRI cases, 4 annotators), the GED difference is 0.0011,&#xD;
within seed variance ±0.0085. The method works on a different anatomy and dataset.&#xD;
A third contribution analyses inter-rater annotation disagreement on 1,603&#xD;
LIDC-IDRI cases using the nine per-rater clinical attribute ratings. Nodule margin&#xD;
clarity is the strongest predictor of inter-rater mask variance (Pearson r = 0.318, p &lt;&#xD;
0.001, confirmed across all four folds independently), followed by lobulation (r = 0.243)&#xD;
and texture (r = 0.210). Malignancy is negatively correlated with mask variance&#xD;
(r = −0.202, p &lt; 0.001); a nodule rated highly suspicious need not have an ambiguous&#xD;
boundary, and one with an unclear margin need not look malignant. These findings&#xD;
point to where uncertainty-aware segmentation matters most: ill-defined, lobulated,&#xD;
part-solid nodules.</summary>
    <dc:date>2026-05-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>EFFICIENCY-DRIVEN SINGLE IMAGE SUPER-RESOLUTION USING ATTENTION-ENHANCED RESIDUAL FEATURE DISTILLATION NETWORKS</title>
    <link rel="alternate" href="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22915" />
    <author>
      <name>MANGAL, ISHAN</name>
    </author>
    <author>
      <name>Verma, Bindu (SUPERVISOR)</name>
    </author>
    <id>http://dspace.dtu.ac.in:8080/jspui/handle/repository/22915</id>
    <updated>2026-06-25T04:55:57Z</updated>
    <published>2026-05-01T00:00:00Z</published>
    <summary type="text">Title: EFFICIENCY-DRIVEN SINGLE IMAGE SUPER-RESOLUTION USING ATTENTION-ENHANCED RESIDUAL FEATURE DISTILLATION NETWORKS
Authors: MANGAL, ISHAN; Verma, Bindu (SUPERVISOR)
Abstract: Image reconstruction from a low-resolution input appears to be a straightforward&#xD;
task, but it is fundamentally ill-posed because many solutions may exist: all the&#xD;
high-resolution images that could correspond to the observed downsampled version.&#xD;
Existing deep-learning approaches produce excellent results;&#xD;
however, almost all of them require large amounts of computational power, which is&#xD;
incompatible with real-time inference on mobile devices, edge computing units, and&#xD;
camera hardware.&#xD;
ARFD-ESPCN is an image super-resolution architecture that builds upon the ESPCN&#xD;
sub-pixel upsampling structure but adds Feature Distillation Blocks (FDBs),&#xD;
Squeeze-and-Excitation channel-wise attention, and a Global Feature Fusion layer. The key&#xD;
feature of the proposed architecture is that it preserves the speed of ESPCN by using only&#xD;
low-resolution convolutions and applying PixelShuffle once, at the final layer. This&#xD;
design replaces ESPCN’s shallow encoder-decoder with a deeper, more competitive&#xD;
architecture that uses six FDBs to split and merge convolutional features, with&#xD;
greater attention on the aspects not captured previously.&#xD;
The final model has 597,904 parameters, needs 4.87 GFLOPs for a 640×360 input,&#xD;
and runs in 4.78 ms on a mid-range GPU. Training used the DF2K dataset (DIV2K plus&#xD;
Flickr2K, roughly 3,450 images) with L1 loss for 720 epochs and Charbonnier loss for&#xD;
the remaining 80, combined with flip and rotation augmentation. The optimiser is&#xD;
Adam with a cosine-annealing schedule that warms up over the first 50 epochs and&#xD;
decays toward 10⁻⁶ by the end.&#xD;
On standard benchmarks the model scores 31.57 dB / 0.8861 SSIM on Set5,&#xD;
27.21 dB / 0.7447 on Set14, 26.22 dB / 0.7029 on BSD100, and 25.37 dB / 0.7606&#xD;
on Urban100. That beats the original ESPCN by over 2 dB on Set5 and matches&#xD;
VDSR while using roughly 125× fewer floating-point operations. Compared with&#xD;
heavier efficient models like IMDN and RFDN the quality gap is about 0.6–0.7 dB,&#xD;
but ARFD-ESPCN runs 2–3× faster.&#xD;
A step-by-step ablation decomposes the total 2.11 dB gain into individual&#xD;
contributions: the training schedule accounts for 1.04 dB (the largest single factor), DF2K data&#xD;
adds 0.45 dB, SE attention with residual connections contributes 0.38 dB,&#xD;
augmentation plus L1 loss gives 0.15 dB, and the distillation block structure adds 0.07 dB. The&#xD;
practical lesson is that for models under one million parameters, getting the training&#xD;
pipeline right matters at least as much as architectural design.&#xD;
The model is small enough to fit on mobile accelerators and fast enough for 60 fps&#xD;
video pipelines, making it relevant for medical imaging, satellite photo enhancement,&#xD;
and streaming scenarios where latency budgets are tight. Possible extensions include&#xD;
real-world degradation handling via BSRGAN-style pipelines, perceptual and&#xD;
adversarial losses for sharper visual textures, window-based spatial attention for periodic&#xD;
patterns, and INT8 quantisation for hardware without floating-point units.</summary>
    <dc:date>2026-05-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>ENHANCEMENT OF REVERSIBLE IMAGE STEGANOGRAPHY AND OPTIMIZATION OF QUANTUM IMAGE REPRESENTATION USING THE NEQR MODEL</title>
    <link rel="alternate" href="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22163" />
    <author>
      <name>SINGH, SUMITRA</name>
    </author>
    <id>http://dspace.dtu.ac.in:8080/jspui/handle/repository/22163</id>
    <updated>2025-09-02T06:36:46Z</updated>
    <published>2025-05-01T00:00:00Z</published>
    <summary type="text">Title: ENHANCEMENT OF REVERSIBLE IMAGE STEGANOGRAPHY AND OPTIMIZATION OF QUANTUM IMAGE REPRESENTATION USING THE NEQR MODEL
Authors: SINGH, SUMITRA
Abstract: Reversible steganography allows for exact reconstruction of the cover media after&#xD;
hidden data extraction, making it vital for applications such as content authentication,&#xD;
medical imaging, and military communications. Various reversible steganography&#xD;
techniques include histogram shifting, image interpolation, and difference expansion.&#xD;
Histogram shifting methods apply shifting to pixel-domain histograms or prediction&#xD;
error histograms. Prediction error histogram methods offer higher embedding capacity,&#xD;
but they are more complex, lack a guaranteed lower bound on PSNR, and are more&#xD;
susceptible to histogram-based steganalysis. Pixel-domain histogram shifting&#xD;
techniques, though simpler and more efficient with a theoretical PSNR bound,&#xD;
generally have lower embedding capacity.&#xD;
Under this project, experiments are conducted on pixel-domain histogram-shifting-based&#xD;
techniques. The capacity and histogram for varying numbers of non-overlapping&#xD;
image blocks and histogram blocks are analyzed. Experimental results show that&#xD;
embedding in image blocks does not significantly enhance the capacity compared to&#xD;
embedding in histogram blocks. Analysis of histogram blocks shows that embedding&#xD;
in two blocks yields the optimal results. A method is developed for making histogram&#xD;
shifting adaptive to payload size, and a two-layer embedding is developed for improved&#xD;
hiding capacity. Compared to previous methods, the two-layer embedding achieves&#xD;
higher capacity and better resistance to steganalysis while keeping the PSNR acceptable&#xD;
for real-world applications.&#xD;
Quantum computing is an advancing field that offers significant speed advantages for&#xD;
certain computational tasks over classical computing. Notable examples include&#xD;
Shor’s algorithm, which efficiently solves integer factorization and discrete logarithm&#xD;
problems, and Grover’s algorithm, which accelerates the search process in&#xD;
unstructured databases.&#xD;
Quantum computing is based on quantum arithmetic operations where addition forms&#xD;
the core of all operations, as subtraction, multiplication, exponentiation, and division&#xD;
can all be reduced to repeated or modified forms of addition. Experiments are&#xD;
conducted for performance analysis of quantum addition on quantum hardware.&#xD;
Development of quantum circuits for addition and comparison, including half adders,&#xD;
full adders, Toffoli-based adders, QFT-based adders (utilizing the Quantum Fourier&#xD;
Transform), and quantum comparators is carried out using IBM Qiskit. The circuits&#xD;
are first validated on ideal simulators to confirm correctness, followed by testing on&#xD;
noisy simulators to emulate real quantum hardware conditions. Final execution is&#xD;
carried out on IBM's Eagle 127-qubit Quantum Processing Unit (QPU). Results show&#xD;
that computation accuracy on actual hardware is limited by physical constraints such&#xD;
as short qubit coherence times and instability. A performance comparison shows that&#xD;
Toffoli-based adders outperform QFT-based adders in terms of accuracy, making them&#xD;
more reliable for precise arithmetic computations.&#xD;
Quantum image representation provides exponential efficiency in image storage and&#xD;
processing. It relies on the fundamental principles of superposition and entanglement.&#xD;
NEQR (Novel Enhanced Quantum Representation) is a lossless encoding method used&#xD;
to represent digital images on a quantum computer. It is widely applicable in domains&#xD;
such as quantum machine learning, image steganography, and quantum image&#xD;
analysis.&#xD;
This work introduces two enhancements to the NEQR framework: (1) optimizing the&#xD;
decomposition of Multi-Controlled NOT (MCX) gates into Toffoli gates, and (2)&#xD;
parallelizing NEQR through bit-plane encoding, in which the NEQR circuit is&#xD;
constructed simultaneously for each of the eight bit-planes of an&#xD;
image, thereby reducing overall circuit depth. Experimental results demonstrate that&#xD;
these enhancements lead to reduced circuit depth and faster execution, thereby&#xD;
mitigating decoherence-related errors. Additionally, quantum image processing&#xD;
operations that demonstrate exponential speedup over classical approaches — such as&#xD;
image negation, rotation, and intensity superposition — are also implemented and&#xD;
evaluated as part of this work.</summary>
    <dc:date>2025-05-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>A STUDY ON DEEP LEARNING AND TRANSFORMER BASED MODELS FOR HAND GESTURE AND ACTION RECOGNITION</title>
    <link rel="alternate" href="http://dspace.dtu.ac.in:8080/jspui/handle/repository/21847" />
    <author>
      <name>SUTTY, SAHIL</name>
    </author>
    <id>http://dspace.dtu.ac.in:8080/jspui/handle/repository/21847</id>
    <updated>2025-07-08T08:48:56Z</updated>
    <published>2025-06-01T00:00:00Z</published>
    <summary type="text">Title: A STUDY ON DEEP LEARNING AND TRANSFORMER BASED MODELS FOR HAND GESTURE AND ACTION RECOGNITION
Authors: SUTTY, SAHIL
Abstract: Hand gesture and human action recognition are fundamental technologies in the&#xD;
evolution of human-computer interaction (HCI), enabling more natural, intuitive, and accessible&#xD;
interfaces across sectors including assistive technologies, robotics, virtual reality, and&#xD;
surveillance. Using the MSRA Hand Gesture Dataset and the UCF101 Dataset, this&#xD;
paper presents a thorough comparative analysis of state-of-the-art deep learning and&#xD;
transformer-based models for hand gesture recognition and for human action recognition.&#xD;
Comprising 76,500 depth images distributed over 17 gesture classes, the MSRA Hand&#xD;
Gesture Dataset offers a strong basis for spatial feature extraction. ResNet101 obtained&#xD;
the highest F1-score (0.9978) among all architectures, closely followed by DenseNet 169&#xD;
(0.9919) and DenseNet 201 (0.9901). MobileNetV2 demonstrated a good balance between&#xD;
computational efficiency and accuracy with an F1-score of 0.9847; VGG variants lagged&#xD;
since they lacked sophisticated architectural elements.&#xD;
Human action recognition on the UCF101 dataset, which consists of over 13,000&#xD;
video clips in 101 action categories, was restricted to the 50 most frequent&#xD;
classes to ensure computational feasibility and class balance. With an F1-score of 0.9997,&#xD;
transformer-based models, especially ViT Tiny Patch, surpassed even the deepest CNNs.&#xD;
While MobileNetV2 once again showed efficiency in settings with limited resources, VGG16bn’s&#xD;
performance revealed the limits of older CNN architectures for demanding tasks.&#xD;
The results underline how architectural innovations including residual connections,&#xD;
dense connectivity, and attention mechanisms help to raise recognition accuracy and&#xD;
computational efficiency. The paper argues that transformer-based models are redefining&#xD;
benchmarks even though deep CNNs remain strong candidates. In particular,&#xD;
hybrid CNN-transformer designs, explicit temporal modeling, and advanced&#xD;
augmentation techniques can further improve recognition capability in practical settings.</summary>
    <dc:date>2025-06-01T00:00:00Z</dc:date>
  </entry>
</feed>

