STRUCTURED LATENT SPACE EXPLORATION WITH TRANSFORMER ENCODERS FOR DIVERSIFIED AND PERSONALIZED MULTIRATER MEDICAL IMAGE SEGMENTATION

SHUKLA, KESHU; Verma, Bindu (SUPERVISOR)

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22924

Full metadata record

DC Field	Value	Language
dc.contributor.author	SHUKLA, KESHU	-
dc.contributor.author	Verma, Bindu (SUPERVISOR)	-
dc.date.accessioned	2026-06-25T04:57:11Z	-
dc.date.available	2026-06-25T04:57:11Z	-
dc.date.issued	2026-05	-
dc.identifier.uri	http://dspace.dtu.ac.in:8080/jspui/handle/repository/22924	-
dc.description.abstract	Multi-rater medical image segmentation requires models that capture inter-annotator disagreement, not average it away. Standard probabilistic models process all rater annotations through one shared encoder: when four radiologists label the same nodule differently, their reconstruction gradients partially cancel inside that encoder and the latent code ends up as a gradient-weighted compromise across all four boundary decisions. This is why prior samples from such a model cluster near the mean annotation rather than spanning the actual range of what qualified radiologists drew. We address this by replacing the shared posterior with N independent per-rater posterior encoders q_i(z_i | x, y_i), one per annotator. Each receives a 2-channel input: the image and a single rater mask. Gradient isolation follows from the per-rater ELBO decomposition, not from any regularisation: by the chain rule, ∂L_i/∂z_j = 0 for i ≠ j, so rater i's reconstruction gradient cannot reach rater j's encoder. On LIDC-IDRI (1,018 CT scans, 4 radiologists, 1,609 nodule patches, 4-fold cross-validation), the per-rater model (Stage 1 only) achieves GED 0.1444±0.0141 (−4.2%) and Dice_match 0.9112±0.0061 (+2.28% relative) over the full D-Persona two-stage pipeline. A systematic ablation tests transformer-based encoders (MiT-B2), orthogonality regularisation, a discretised prior bank (k = 100), a dual diversity loss, and Stage 2 style vectors against the D-Persona baseline. Per-rater posteriors are the only modification that consistently improves both metrics at once. Transformer encoder capacity, tested as a direct competing architectural hypothesis, does not resolve the training-level gradient conflict. Dice_soft is unchanged at 0.9015: the gain comes from improved diversity and per-rater accuracy, not from higher average prediction quality. We test the model's behaviour when not all annotators label every training image (the common clinical situation in multi-rater datasets). Under full sparsity (one annotator), the shared baseline undergoes gradient collapse: mean pairwise cosine similarity of reconstruction gradients rises from 0.167 (full annotation) to 0.976; within-fold standard deviation shrinks approximately 19-fold (0.439 → 0.023). Per-rater posteriors maintain zero alignment by construction at all sparsity levels. The GED advantage grows with sparsity: +11.5% with three annotators, +17.8% with two, and +21.4% with one. All 12 per-fold comparisons favour the per-rater model (sign-test p < 0.001). At full annotation, both models are statistically equivalent (0.5% gap, within noise); the advantage is tied to sparsity, not general accuracy. On NPC-170 (170 nasopharyngeal carcinoma MRI cases, 4 annotators), the GED difference is 0.0011, within seed variance ±0.0085. The method works on a different anatomy and dataset. A third contribution analyses inter-rater annotation disagreement on 1,603 LIDC-IDRI cases using the nine per-rater clinical attribute ratings. Nodule margin clarity is the strongest predictor of inter-rater mask variance (Pearson r = 0.318, p < 0.001, confirmed across all four folds independently), followed by lobulation (r = 0.243) and texture (r = 0.210). Malignancy is negatively correlated with mask variance (r = −0.202, p < 0.001); a nodule rated highly suspicious need not have an ambiguous boundary, and one with an unclear margin need not look malignant. These findings point to where uncertainty-aware segmentation matters most: ill-defined, lobulated, part-solid nodules.	en_US
dc.language.iso	en	en_US
dc.relation.ispartofseries	TD-8832;	-
dc.subject	LATENT SPACE EXPLORATION	en_US
dc.subject	TRANSFORMER ENCODERS	en_US
dc.subject	MEDICAL IMAGE SEGMENTATION	en_US
dc.subject	GED	en_US
dc.title	STRUCTURED LATENT SPACE EXPLORATION WITH TRANSFORMER ENCODERS FOR DIVERSIFIED AND PERSONALIZED MULTIRATER MEDICAL IMAGE SEGMENTATION	en_US
dc.type	Thesis	en_US
Appears in Collections:	M.E./M.Tech. Information Technology
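
The abstract reports GED without defining it. As a rough illustration only, the sketch below uses the definition commonly seen in the multi-rater segmentation literature: squared generalised energy distance with d = 1 − IoU. Whether the thesis uses exactly this form (squared or not, this distance, this empty-mask convention) is an assumption, and the function names are hypothetical.

```python
import numpy as np


def iou_distance(a, b):
    """Return 1 - IoU for two binary masks (assumed convention: two empty masks give 0)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union


def mean_distance(xs, ys):
    """Average distance over all ordered pairs (x, y), including identical pairs."""
    return float(np.mean([iou_distance(x, y) for x in xs for y in ys]))


def ged_squared(samples, annotations):
    """Assumed form: 2 E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')]."""
    cross = mean_distance(samples, annotations)
    within_samples = mean_distance(samples, samples)
    within_annotations = mean_distance(annotations, annotations)
    return 2.0 * cross - within_samples - within_annotations
```

Under this definition, a model whose samples all collapse onto a single mask is penalised through the missing within-sample diversity term even when that mask matches one rater exactly, which is the behaviour the abstract attributes to shared-encoder models.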

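The sign-test figure in the abstract (all 12 per-fold comparisons favouring the per-rater model, p < 0.001) can be checked with a few lines of arithmetic. This is a minimal sketch of an exact binomial sign test, assuming no ties and a null win probability of 0.5; the function name is hypothetical and the thesis may have used a different implementation.

```python
from math import comb


def sign_test_p(wins, n, two_sided=True):
    """Exact sign-test p-value for `wins` out of `n` untied pairs under H0: P(win) = 0.5."""
    upper = sum(comb(n, i) for i in range(wins, n + 1)) / 2 ** n
    if not two_sided:
        return upper
    lower = sum(comb(n, i) for i in range(0, wins + 1)) / 2 ** n
    return min(1.0, 2 * min(upper, lower))


p_two_sided = sign_test_p(12, 12)                    # 2 / 4096
p_one_sided = sign_test_p(12, 12, two_sided=False)   # 1 / 4096
```

Both values fall below 0.001, consistent with the reported threshold.
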
Files in This Item:

File	Description	Size	Format
Keshu Shukla M.Tech.pdf		2.38 MB	Adobe PDF
Keshu Shukla plag.pdf		4.15 MB	Adobe PDF
