<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://dspace.dtu.ac.in:8080/jspui/handle/123456789/96">
    <title>DSpace Community:</title>
    <link>http://dspace.dtu.ac.in:8080/jspui/handle/123456789/96</link>
    <description />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22924" />
        <rdf:li rdf:resource="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22915" />
        <rdf:li rdf:resource="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22744" />
        <rdf:li rdf:resource="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22696" />
      </rdf:Seq>
    </items>
    <dc:date>2026-07-02T16:43:13Z</dc:date>
  </channel>
  <item rdf:about="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22924">
    <title>STRUCTURED LATENT SPACE EXPLORATION WITH TRANSFORMER ENCODERS FOR DIVERSIFIED AND PERSONALIZED MULTIRATER MEDICAL IMAGE SEGMENTATION</title>
    <link>http://dspace.dtu.ac.in:8080/jspui/handle/repository/22924</link>
    <description>Title: STRUCTURED LATENT SPACE EXPLORATION WITH TRANSFORMER ENCODERS FOR DIVERSIFIED AND PERSONALIZED MULTIRATER MEDICAL IMAGE SEGMENTATION
Authors: SHUKLA, KESHU; Verma, Bindu (SUPERVISOR)
Abstract: Multi-rater medical image segmentation requires models that capture inter-annotator&#xD;
disagreement, not average it away. Standard probabilistic models process all rater&#xD;
annotation through one shared encoder: when four radiologists label the same nodule&#xD;
differently, their reconstruction gradients partially cancel inside that encoder and the&#xD;
latent code ends up as a gradient-weighted compromise across all four boundary&#xD;
decisions. This is why prior samples from such a model cluster near the mean annotation&#xD;
rather than spanning the actual range of what qualified radiologists drew.&#xD;
We address this by replacing the shared posterior with N independent per-rater&#xD;
posterior encoders q_i(z_i | x, y_i), one per annotator. Each receives a 2-channel input:&#xD;
the image and a single rater mask. Gradient isolation follows from the per-rater ELBO&#xD;
decomposition, not from any regularisation: by the chain rule, ∂L_i/∂z_j = 0 for i ≠ j,&#xD;
so rater i’s reconstruction gradient cannot reach rater j’s encoder. On LIDC-IDRI&#xD;
(1,018 CT scans, 4 radiologists, 1,609 nodule patches, 4-fold cross-validation), the&#xD;
per-rater model (Stage 1 only) achieves GED 0.1444±0.0141 (−4.2%) and Dice_match&#xD;
0.9112±0.0061 (+2.28% relative) over the full D-Persona two-stage pipeline. A&#xD;
systematic ablation tests transformer-based encoders (MiT-B2), orthogonality&#xD;
regularisation, a discretised prior bank (k = 100), a dual diversity loss, and Stage 2 style&#xD;
vectors against the D-Persona baseline. Per-rater posteriors are the only modification&#xD;
that consistently improves both metrics at once. Transformer encoder capacity, tested as&#xD;
a direct competing architectural hypothesis, does not resolve the training-level gradient&#xD;
conflict. Dice_soft is unchanged at 0.9015: the gain comes from improved diversity&#xD;
and per-rater accuracy, not from higher average prediction quality.&#xD;
We test the model’s behaviour when not all annotators label every training&#xD;
image (the common clinical situation in multi-rater datasets). Under full sparsity (one&#xD;
annotator), the shared baseline undergoes gradient collapse: mean pairwise cosine&#xD;
similarity of reconstruction gradients rises from 0.167 (full annotation) to 0.976;&#xD;
within-fold standard deviation shrinks approximately 19-fold (0.439 → 0.023). Per-rater&#xD;
posteriors maintain zero alignment by construction at all sparsity levels. The GED&#xD;
advantage grows with sparsity: +11.5% with three annotators, +17.8% with two,&#xD;
+21.4% with one. All 12 per-fold comparisons favour the per-rater model (sign-test&#xD;
p &lt; 0.001). At full annotation, both models are statistically equivalent (0.5% gap,&#xD;
within noise); the advantage is tied to sparsity, not general accuracy. On NPC-170 (170&#xD;
nasopharyngeal carcinoma MRI cases, 4 annotators), the GED difference is 0.0011,&#xD;
within seed variance ±0.0085. The method works on a different anatomy and dataset.&#xD;
A third contribution analyses inter-rater annotation disagreement on 1,603&#xD;
LIDC-IDRI cases using the nine per-rater clinical attribute ratings. Nodule margin&#xD;
clarity is the strongest predictor of inter-rater mask variance (Pearson r = 0.318, p &lt;&#xD;
0.001, confirmed across all four folds independently), followed by lobulation (r = 0.243)&#xD;
and texture (r = 0.210). Malignancy is negatively correlated with mask variance&#xD;
(r = −0.202, p &lt; 0.001); a nodule rated highly suspicious need not have an ambiguous&#xD;
boundary, and one with an unclear margin need not look malignant. These findings&#xD;
point to where uncertainty-aware segmentation matters most: ill-defined, lobulated,&#xD;
part-solid nodules.</description>
    <dc:date>2026-05-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22915">
    <title>EFFICIENCY-DRIVEN SINGLE IMAGE SUPER-RESOLUTION USING ATTENTION-ENHANCED RESIDUAL FEATURE DISTILLATION NETWORKS</title>
    <link>http://dspace.dtu.ac.in:8080/jspui/handle/repository/22915</link>
    <description>Title: EFFICIENCY-DRIVEN SINGLE IMAGE SUPER-RESOLUTION USING ATTENTION-ENHANCED RESIDUAL FEATURE DISTILLATION NETWORKS
Authors: MANGAL, ISHAN; Verma, Bindu (SUPERVISOR)
Abstract: Reconstructing an image from a low-resolution input appears to be a straightforward&#xD;
task, but it is fundamentally ill-posed: many high-resolution images could correspond to&#xD;
the observed downsampled version. Existing deep learning approaches produce excellent results;&#xD;
however, almost all of them require large amounts of computational power, which is&#xD;
incompatible with real-time inference on mobile devices, edge computing units, and&#xD;
camera hardware.&#xD;
ARFD-ESPCN is an image super-resolution architecture that builds upon the ESPCN&#xD;
sub-pixel upsampling structure but adds Feature Distillation Blocks (FDBs),&#xD;
Squeeze-and-Excitation (SE) channel-wise attention, and a Global Feature Fusion layer. The key&#xD;
feature of the proposed architecture is that it preserves the speed of ESPCN by using only&#xD;
low-resolution convolutions and applying PixelShuffle once, at the final layer. This&#xD;
design replaces ESPCN’s shallow encoder-decoder with a deeper, more competitive&#xD;
architecture that uses six FDBs to split and merge convolution features, with&#xD;
stronger attention on aspects not captured previously.&#xD;
The final model has 597,904 parameters, needs 4.87 GFLOPs for a 640×360 input,&#xD;
and runs in 4.78 ms on a mid-range GPU. Training used the DF2K dataset (DIV2K plus&#xD;
Flickr2K, roughly 3,450 images) with L1 loss for 720 epochs and Charbonnier loss for&#xD;
the remaining 80, combined with flip and rotation augmentation. The optimiser is&#xD;
Adam with a cosine-annealing schedule that warms up over the first 50 epochs and&#xD;
decays toward 10⁻⁶ by the end.&#xD;
On standard benchmarks the model scores 31.57 dB / 0.8861 SSIM on Set5,&#xD;
27.21 dB / 0.7447 on Set14, 26.22 dB / 0.7029 on BSD100, and 25.37 dB / 0.7606&#xD;
on Urban100. That beats the original ESPCN by over 2 dB on Set5 and matches&#xD;
VDSR while using roughly 125× fewer floating-point operations. Compared with&#xD;
heavier efficient models like IMDN and RFDN the quality gap is about 0.6–0.7 dB,&#xD;
but ARFD-ESPCN runs 2–3× faster.&#xD;
A step-by-step ablation decomposes the total 2.11 dB gain into individual&#xD;
contributions: the training schedule accounts for 1.04 dB (the largest single factor), DF2K data&#xD;
adds 0.45 dB, SE attention with residual connections contributes 0.38 dB,&#xD;
augmentation plus L1 loss gives 0.15 dB, and the distillation block structure adds 0.07 dB. The&#xD;
practical lesson is that for models under one million parameters, getting the training&#xD;
pipeline right matters at least as much as architectural design.&#xD;
The model is small enough to fit on mobile accelerators and fast enough for 60 fps&#xD;
video pipelines, making it relevant for medical imaging, satellite photo enhancement,&#xD;
and streaming scenarios where latency budgets are tight. Possible extensions include&#xD;
real-world degradation handling via BSRGAN-style pipelines, perceptual and&#xD;
adversarial losses for sharper visual textures, window-based spatial attention for periodic&#xD;
patterns, and INT8 quantisation for hardware without floating-point units.</description>
    <dc:date>2026-05-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22744">
    <title>DESIGN AND DEVELOPMENT OF SECURE FEDERATED LEARNING FRAMEWORK FOR DATA-SENSITIVE APPLICATIONS</title>
    <link>http://dspace.dtu.ac.in:8080/jspui/handle/repository/22744</link>
    <description>Title: DESIGN AND DEVELOPMENT OF SECURE FEDERATED LEARNING FRAMEWORK FOR DATA-SENSITIVE APPLICATIONS
Authors: NARULA, MANU; Vishwakarma, Dinesh Kumar (SUPERVISOR); Meena, Jasraj (CO-SUPERVISOR)
Abstract: In an era where security breaches and cyber fraud have become increasingly common,&#xD;
concerns about data privacy have also risen sharply. This presents a substantial&#xD;
hurdle for training Artificial Intelligence (AI) applications that handle sensitive data, such as&#xD;
healthcare and finance. Federated Learning (FL) provides a solution for Data-Sensitive&#xD;
Applications (DSA), enabling collaboration between competing parties while guaranteeing&#xD;
complete control over data privacy. Over time, the attacks on FL have evolved, and the&#xD;
native technology alone is insufficient to maintain the expected privacy levels when&#xD;
handling such data. Hence, efforts are made to integrate other technologies with FL to enhance&#xD;
certain aspects, such as resource management, security, privacy, and efficiency, with varying&#xD;
results and pros and cons.&#xD;
This thesis examines the integration of security mechanisms with FL, focusing on networks&#xD;
with resource scarcity, which include AdHoc networks, Internet of Things, and Internet&#xD;
of Medical Things (IoMT), commonly found in the field of healthcare and other DSA,&#xD;
addressing critical issues such as data privacy, security, access control, and scalability. It&#xD;
proposes solutions to enhance data security and privacy in healthcare systems by examining&#xD;
various existing studies and frameworks.&#xD;
FL is adopted for its ability to facilitate the integration of AI in domains where&#xD;
conventional centralized training approaches are infeasible due to data distribution constraints,&#xD;
privacy concerns, or competitive restrictions. Furthermore, FL enables remote or&#xD;
decentralized systems to achieve performance comparable to state-of-the-art centralized methods&#xD;
while incurring minimal resource overhead. By eliminating the need for centralized data&#xD;
aggregation, FL significantly mitigates the risks associated with large-scale data breaches&#xD;
and unauthorized access.&#xD;
This thesis makes a substantial contribution in summarizing the existing state of FL, its&#xD;
various types, supporting technologies, and challenges, designing hybrid security frameworks&#xD;
for FL aimed at DSA networks, and performing extensive evaluation of the state-of-the-art&#xD;
FL techniques. Our contributions are as follows:&#xD;
• An extensive systematic literature review has been conducted to investigate the&#xD;
current state of FL implementations in DSA fields, such as healthcare and finance. This&#xD;
analysis highlights the complex challenges of implementing FL in mainstream&#xD;
healthcare applications, including concerns about transmission costs, data security, privacy,&#xD;
and data/system heterogeneity. Additionally, we provide a detailed taxonomy for the&#xD;
existing literature, focusing on DSA. Alongside identifying the gaps, the review also&#xD;
highlights the fundamental challenges that FL may pose in real-world scenarios.&#xD;
• As organizations and developers explore FL solutions for various applications, the&#xD;
multitude of FL tools and frameworks can feel overwhelming for a beginner. We&#xD;
studied various tools and frameworks available in both open and proprietary forms&#xD;
to provide a concise view of their advantages, limitations, and utility, thereby filling&#xD;
a gap in the existing literature.&#xD;
• To improve security in resource-constrained DSA networks such as IoMT, we employ&#xD;
quantization. Unlike prior techniques, we incorporate a workload-aware&#xD;
client selection scheme to overcome the quantization loss, optimize bandwidth,&#xD;
and stabilize the training network by minimizing straggler nodes&#xD;
with negligible performance overhead, thus providing a reliable security solution&#xD;
for resource-constrained DSA. The experimental results show a 77% to 95% decrease&#xD;
in straggler nodes and an 8% to 0% decrease in accuracy compared to standard FL,&#xD;
depending on the dataset’s complexity.&#xD;
• To ensure security in small edge networks, IoMT networks, and AdHoc networks, a&#xD;
dynamic, lightweight cryptographic FL framework is proposed. Conventional&#xD;
cryptographic techniques are computationally expensive for small networks. Thus, we&#xD;
propose a block cipher with an ever-changing key that takes multiple client and&#xD;
network attributes into account and is unique to each participant. The overhead&#xD;
induced by the cipher and key generation is significantly lower than that of available&#xD;
state-of-the-art solutions. The experimental evaluation indicates that the proposed&#xD;
scheme’s relative performance is comparable to that of native FL. The results reveal&#xD;
an increase of approximately 4.5% to 8% in per-round computation time compared&#xD;
to standard FL. However, this marginal computational overhead is justified by the&#xD;
significantly enhanced security guarantees provided by the proposed approach,&#xD;
particularly in resource-constrained network environments, where conventional security&#xD;
mechanisms with similar computational budgets offer comparatively lower protection.&#xD;
• To ensure privacy in networks with sufficient resources, we propose a hybrid FL&#xD;
framework that utilizes differential privacy in conjunction with homomorphic&#xD;
encryption. To counter the loss incurred by the differential privacy scheme, we employ the&#xD;
workload-aware client selection. The simulation results demonstrate minimal&#xD;
computational overhead and comparable performance to other state-of-the-art techniques.&#xD;
The proposed Fed-HDVE imposes a minimal time overhead of 2.8 to 3.5 seconds,&#xD;
depending on the model size.&#xD;
The performance evaluation, analysis, and experimental results indicate that the proposed&#xD;
solutions offer a viable and effective security solution. Moreover, the comparative study&#xD;
demonstrates that the suggested approaches perform on par with existing solutions, with&#xD;
lower computational overhead and resource consumption.</description>
    <dc:date>2026-02-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://dspace.dtu.ac.in:8080/jspui/handle/repository/22696">
    <title>FACE DETECTION AND TRACKING</title>
    <link>http://dspace.dtu.ac.in:8080/jspui/handle/repository/22696</link>
    <description>Title: FACE DETECTION AND TRACKING
Authors: MOOL, AKSHAY; Panda, Jeebananda (SUPERVISOR); Sharma, Kapil (CO-SUPERVISOR)
Abstract: With the spread of technological devices in everyday life, the detection and&#xD;
tracking of human faces and facial features have become essential areas of focus,&#xD;
and more efficient techniques for them are needed. The field of Computer&#xD;
Vision uses face detection and tracking as intermediary processes to track&#xD;
and analyze visual information about humans, their faces and/or&#xD;
body movements, and then proceed to the desired application.&#xD;
The present thesis work has been taken up for the development of (i) an optimizable&#xD;
face detection and tracking model based on facial landmark localisation and&#xD;
feature tracking, for better and more efficient processing of faces in high quality video&#xD;
streams, and (ii) a Non-Neighbourhood Background Elimination component using&#xD;
the built model with mathematical and statistical modelling, for reducing the&#xD;
processing time and computations required for finding the target face in a frame&#xD;
when it has already been detected.&#xD;
Many algorithms have been developed to facilitate Face Detection and Tracking&#xD;
applications. In 2004, Viola and Jones developed an algorithm that&#xD;
achieves real-time performance with decent accuracy in detecting faces. It was&#xD;
one of the first algorithms to achieve such efficient performance, which is why it is&#xD;
still used as a standard baseline against which newer algorithms are compared,&#xD;
and it is therefore specifically discussed in this thesis.&#xD;
A large amount of visual information is generated in various fields, ranging&#xD;
from daily routine to specialized applications. Processing all these types of&#xD;
information efficiently is a valid concern and needs focused research. Most face&#xD;
detection algorithms have to deal with low quality data in videos, since they are&#xD;
mainly focused on surveillance applications, whose capturing devices&#xD;
capture less information per frame. Consequently, this thesis reviews some&#xD;
state-of-the-art face detection algorithms and compares their processing efficiency on&#xD;
low and high quality videos. The comparative analysis reveals that these recent&#xD;
algorithms do not work as effectively on high quality videos as they&#xD;
do on lower quality videos. Therefore, there is an increasing need to focus&#xD;
research on analysing high quality video information efficiently, so&#xD;
that analysis keeps pace with the rate at which the information is generated.&#xD;
High quality videos (data generated by current applications like social media,&#xD;
multimedia content, etc.) mostly exist in offline mode and can be used for&#xD;
post-processing by Computer Vision applications. To address this need, an effort&#xD;
has been made to develop an algorithm that gives faster results&#xD;
on high quality videos, on par with the algorithms working on live low quality&#xD;
video feeds. The proposed algorithm uses Convolutional-MTCNN as its base&#xD;
algorithm and speeds it up for high definition videos. The proposed model raises&#xD;
the face detection speed to 19+ FPS while still maintaining&#xD;
above 90% accuracy. This thesis also presents a novel solution to the problem&#xD;
of occlusion and detecting partially or fully hidden faces in the videos. This is&#xD;
achieved by using statistical and probabilistic approaches, given that the face has&#xD;
been identified in the first few frames, to give the algorithm an estimate of where the&#xD;
face should be in the occluded region.&#xD;
Since the focus of our research is to efficiently process high quality data, some&#xD;
commercially used face detection algorithms in open literature have also been&#xD;
considered in our research. Models like FaceNet, HOG, YuNet, along with the&#xD;
Viola-Jones algorithm and MTCNN, have been discussed and analysed in our thesis.&#xD;
The research done is compared against these models, in an effort to improve their&#xD;
performance in commercial settings.&#xD;
Further analysis led to the conclusion that modern face detection algorithms&#xD;
fail to provide optimal results when they have to deal with larger amounts of&#xD;
data per frame while processing higher quality videos. This thesis discusses&#xD;
another proposed work that tackles this problem and offers a solution that deploys&#xD;
commercially used state-of-the-art face detection algorithms to process only the&#xD;
regions of interest in a frame, and discards the rest to decrease the data to be&#xD;
processed. The model maintains the accuracy of the base algorithm while&#xD;
decreasing the processing time per frame, thereby increasing the overall efficiency.&#xD;
The selection of the region of interest depends on the detection of the facial window&#xD;
in the previous frame. Therefore, the choice of base algorithm plays an important&#xD;
role in determining the speed of the model. The model achieves processing&#xD;
speeds about 69–76% higher than the standalone usage of the detection&#xD;
algorithms for analyzed frame rates.</description>
    <dc:date>2024-10-01T00:00:00Z</dc:date>
  </item>
</rdf:RDF>

