Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22924
Full metadata record
DC Field | Value | Language
dc.contributor.author | SHUKLA, KESHU | -
dc.contributor.author | Verma, Bindu (SUPERVISOR) | -
dc.date.accessioned | 2026-06-25T04:57:11Z | -
dc.date.available | 2026-06-25T04:57:11Z | -
dc.date.issued | 2026-05 | -
dc.identifier.uri | http://dspace.dtu.ac.in:8080/jspui/handle/repository/22924 | -
dc.description.abstract | Multi-rater medical image segmentation requires models that capture inter-annotator disagreement rather than average it away. Standard probabilistic models process all rater annotations through one shared encoder: when four radiologists label the same nodule differently, their reconstruction gradients partially cancel inside that encoder, and the latent code ends up as a gradient-weighted compromise across all four boundary decisions. This is why prior samples from such a model cluster near the mean annotation rather than spanning the actual range of what qualified radiologists drew. We address this by replacing the shared posterior with N independent per-rater posterior encoders q_i(z_i | x, y_i), one per annotator. Each receives a 2-channel input: the image and a single rater's mask. Gradient isolation follows from the per-rater ELBO decomposition, not from any regularisation: by the chain rule, ∂L_i/∂z_j = 0 for i ≠ j, so rater i’s reconstruction gradient cannot reach rater j’s encoder. On LIDC-IDRI (1,018 CT scans, 4 radiologists, 1,609 nodule patches, 4-fold cross-validation), the per-rater model (Stage 1 only) achieves GED 0.1444±0.0141 (−4.2%) and Dice_match 0.9112±0.0061 (+2.28% relative) compared with the full D-Persona two-stage pipeline. A systematic ablation tests transformer-based encoders (MiT-B2), orthogonality regularisation, a discretised prior bank (k = 100), a dual diversity loss, and Stage 2 style vectors against the D-Persona baseline. Per-rater posteriors are the only modification that consistently improves both metrics at once. Transformer encoder capacity, tested as a direct competing architectural hypothesis, does not resolve the training-level gradient conflict. Dice_soft is unchanged at 0.9015: the gain comes from improved diversity and per-rater accuracy, not from higher average prediction quality. We also test the model’s behaviour when not all annotators label every training image (the common clinical situation in multi-rater datasets). Under full sparsity (one annotator), the shared baseline undergoes gradient collapse: mean pairwise cosine similarity of reconstruction gradients rises from 0.167 (full annotation) to 0.976, and within-fold standard deviation shrinks approximately 19-fold (0.439 → 0.023). Per-rater posteriors maintain zero alignment by construction at all sparsity levels. The GED advantage grows with sparsity: +11.5% with three annotators, +17.8% with two, and +21.4% with one. All 12 per-fold comparisons favour the per-rater model (sign-test p < 0.001). At full annotation, the two models are statistically equivalent (0.5% gap, within noise); the advantage is tied to sparsity, not general accuracy. On NPC-170 (170 nasopharyngeal carcinoma MRI cases, 4 annotators), the GED difference is 0.0011, within seed variance ±0.0085, so the method carries over to a different anatomy and dataset. A third contribution analyses inter-rater annotation disagreement on 1,603 LIDC-IDRI cases using the nine per-rater clinical attribute ratings. Nodule margin clarity is the strongest predictor of inter-rater mask variance (Pearson r = 0.318, p < 0.001, confirmed across all four folds independently), followed by lobulation (r = 0.243) and texture (r = 0.210). Malignancy is negatively correlated with mask variance (r = −0.202, p < 0.001); a nodule rated highly suspicious need not have an ambiguous boundary, and one with an unclear margin need not look malignant. These findings point to where uncertainty-aware segmentation matters most: ill-defined, lobulated, part-solid nodules. | en_US
dc.language.iso | en | en_US
dc.relation.ispartofseries | TD-8832; | -
dc.subject | LATENT SPACE EXPLORATION | en_US
dc.subject | TRANSFORMER ENCODERS | en_US
dc.subject | MEDICAL IMAGE SEGMENTATION | en_US
dc.subject | GED | en_US
dc.title | STRUCTURED LATENT SPACE EXPLORATION WITH TRANSFORMER ENCODERS FOR DIVERSIFIED AND PERSONALIZED MULTIRATER MEDICAL IMAGE SEGMENTATION | en_US
dc.type | Thesis | en_US
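
The abstract reports GED and the mean pairwise cosine similarity of reconstruction gradients without defining either. The sketch below is one possible reading, not the thesis implementation: it assumes the commonly used squared generalised energy distance with 1 − IoU as the distance between binary masks, and it assumes each rater's gradient has already been flattened to a non-zero vector. All function names are illustrative and do not come from the thesis.

```python
import numpy as np


def iou_distance(a, b):
    """Return 1 - IoU for two binary masks; two empty masks count as identical."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union


def mean_distance(xs, ys):
    """Average iou_distance over every pair (x, y) with x in xs and y in ys."""
    return float(np.mean([iou_distance(x, y) for x in xs for y in ys]))


def ged_squared(samples, annotations):
    """Squared GED (assumed form): 2*E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')]."""
    cross = mean_distance(samples, annotations)
    within_samples = mean_distance(samples, samples)
    within_annotations = mean_distance(annotations, annotations)
    return 2.0 * cross - within_samples - within_annotations


def mean_pairwise_cosine(grads):
    """Mean cosine similarity over all distinct pairs of gradient vectors.

    Assumes every row of `grads` is a non-zero, flattened gradient.
    """
    g = np.asarray(grads, dtype=float)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    sim = g @ g.T
    upper = np.triu_indices(len(g), k=1)  # each unordered pair once
    return float(sim[upper].mean())


# Toy example: two raters disagree on one pixel of a 2x2 patch.
rater_a = np.array([[1, 1], [0, 0]])
rater_b = np.array([[1, 0], [0, 0]])

diverse = ged_squared([rater_a, rater_b], [rater_a, rater_b])
collapsed = ged_squared([rater_a, rater_a], [rater_a, rater_b])
```

In this toy case, samples that reproduce both raters give a squared GED of zero, while samples that all copy one rater give a positive value. That is the behaviour the abstract describes for a shared encoder whose prior samples cluster near a single annotation.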
Appears in Collections:M.E./M.Tech. Information Technology

Files in This Item:
File | Description | Size | Format
Keshu Shukla M.Tech.pdf |  | 2.38 MB | Adobe PDF
Keshu Shukla plag.pdf |  | 4.15 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.