SOCIAL BIAS IDENTIFICATION AND MITIGATION IN NATURAL LANGUAGE TEXT USING MACHINE LEARNING

KAMBOJ, PRADEEP; KUMAR, SHAILENDER (SUPERVISOR); GOYAL, VIKRAM (CO-SUPERVISOR)

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22944

Full metadata record

DC Field	Value	Language
dc.contributor.author	KAMBOJ, PRADEEP	-
dc.contributor.author	KUMAR, SHAILENDER (SUPERVISOR)	-
dc.contributor.author	GOYAL, VIKRAM (CO-SUPERVISOR)	-
dc.date.accessioned	2026-06-25T05:08:43Z	-
dc.date.available	2026-06-25T05:08:43Z	-
dc.date.issued	2026-04	-
dc.identifier.uri	http://dspace.dtu.ac.in:8080/jspui/handle/repository/22944	-
dc.description.abstract	Advanced Artificial Intelligence (AI) methods have enabled the creation of sophisticated large language models (LLMs) capable of generating human-like text and handling a broad spectrum of complex language comprehension tasks. The last decade has seen the advent of LLMs that fill crucial roles across a variety of applications, including automated content generation and summarization, healthcare analytics, legal decision support, conversational agents, and educational technologies. Despite their remarkable abilities, these models often reflect and even amplify the social biases embedded in the large datasets on which they are trained. These biases can manifest as stereotypes or unjust associations related to gender, race, religion, profession, or other social features. When these AI systems are deployed in high-stakes domains where fairness and reliability are paramount, the presence of such biases raises major ethical, social, and technical concerns. As a result, understanding, measuring, and mitigating bias in LLMs has emerged as a prominent research challenge at the forefront of responsible and trustworthy AI. This thesis constitutes a thorough exploration of social bias in natural language text generated by language models (LMs) and LLMs, with a focus on systematic approaches to measuring, evaluating, and mitigating it. The research draws on theoretical, empirical, experimental, and methodological approaches to investigate bias from several angles across the AI pipeline, including word embeddings, contextualized language models, prompt-based inference functions, and fine-tuning strategies. The work focuses on understanding biases across these components and seeks practical solutions to build fairer and more trustworthy generative AI systems. The initial phase of the research investigates gender bias in contextualized word embeddings generated by transformer-based LMs. 
Word embeddings are the building blocks of language in many NLP systems, and biases encoded in these representations can carry over to downstream applications. The gender direction in the embedding space is extracted, and the gender polarity of profession-related terms (occupation names) with respect to gendered pronouns is calculated, yielding a quantitative framework for measuring one type of bias: the stereotypical association of certain professions with one gender. Experimental analysis shows that dynamic embeddings from transformer-based models exhibit substantial gender associations even when the input text contains no explicit gender information. To alleviate this problem, we propose a post-processing debiasing method that modifies the embedding representations to reduce stereotypical associations while preserving the semantic relationships among words. The experimental results show that the proposed method can significantly reduce gender bias in profession embeddings, yielding more balanced model representations. Building on this foundation, the thesis broadens the analysis to large language models and to a wider range of societal biases stemming from multiple demographic attributes. We introduce a systematic framework for evaluating bias in LLM-generated outputs, which includes a curated inference dataset built from previously established bias benchmarks. The dataset includes contexts that encourage language models to generate stereotypical, anti-stereotypical, and neutral responses, enabling systematic assessment of model behaviour. This study provides a comprehensive mechanism for analyzing how different models respond to socially sensitive contexts and how bias manifests in generated text. This research also makes an important contribution by exploring prompt engineering to both detect and mitigate bias in LLMs.
Several prompt variants are developed to investigate how prompt design affects model behaviour: standard, chain-of-thought, cognitive-style, and human-persona prompts. These prompts are systematically assessed to study the effect of each prompting technique on output bias. Debiased versions of these prompts, which explicitly elicit neutral reasoning and unbiased decision-making, are also proposed. A key aspect of the extended work is the introduction of prompt-only bias evaluation, which explores whether biased responses can be induced by prompts alone, without context. Experimental results indicate that certain prompts lead language models to make stereotypical predictions, suggesting that bias arises from the interaction between prompts and the models' reasoning mechanisms, rather than solely from the training data. This underlines the importance of careful prompt design and evaluation when deploying language models in real-world settings. Alongside this bias analysis, the research also examines hallucination in LLMs, whereby a model confidently provides answers that are factually incorrect or unsupported. Hallucinations undermine a model’s reliability and may introduce serious risks in critical domains such as healthcare, legal advice, and policy analysis. To tackle this phenomenon, the thesis presents a contrastive decoding method driven by disturbance prompts, which compares the probability distributions of model outputs under the original prompt and under a perturbed version of it. Contrasting the two distributions helps detect hallucinated content and improves the factual consistency of the outputs. The results show that contrastive prompting methods can mitigate hallucination and improve the robustness of language model outputs. Another important aspect of the research is assessing how well fine-tuning approaches mitigate biases.
Open-source large language models are fine-tuned on balanced datasets containing equal numbers of biased and unbiased statements across a wide range of social categories. Through this fine-tuning, the models are trained to produce more neutral and fair responses while retaining their language comprehension. Experimental results show that fine-tuning with fairness-aware special prompts significantly reduces the models' biased outputs and improves fairness performance. In conclusion, the work in this thesis demonstrates that bias in LMs is a complex, multifaceted phenomenon with multiple underlying sources, including training data, representation learning, and prompting. Tackling this challenge requires the integrated use of bias measurement, dataset design, prompt engineering, model fine-tuning, and evaluation metrics. The methodologies are cross-disciplinary, offering actionable tools to identify and prevent bias in generative AI systems without sacrificing performance or usability. Beyond its technical contributions, this work underscores the need for a broader understanding of fair and responsible AI development. Overall, this thesis provides a comprehensive overview of bias in LMs and LLMs. By integrating representation-level analysis, prompt-based evaluation, hallucination detection, and fairness-aware fine-tuning, the research provides novel insights into the mechanisms that produce biases in AI systems and suggests appropriate strategies to mitigate them. The results demonstrate this work's potential to help establish more ethical, fair, transparent, and socially responsible generative AI technologies that can serve a wider range of communities without perpetuating harmful stereotypes or social inequalities.	en_US
dc.language.iso	en	en_US
dc.relation.ispartofseries	TD-8855;	-
dc.subject	SOCIAL BIAS IDENTIFICATION	en_US
dc.subject	MITIGATION	en_US
dc.subject	NATURAL LANGUAGE TEXT	en_US
dc.subject	LARGE LANGUAGE MODELS (LLMS)	en_US
dc.title	SOCIAL BIAS IDENTIFICATION AND MITIGATION IN NATURAL LANGUAGE TEXT USING MACHINE LEARNING	en_US
dc.type	Thesis	en_US
Appears in Collections:	Ph.D. Computer Engineering
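
The gender-polarity measurement and post-processing debiasing described in the abstract can be illustrated with a minimal sketch. It assumes that the gender direction is the normalized difference between the embeddings of a pronoun pair such as "he" and "she", that polarity is the cosine similarity of a profession embedding with that direction, and that debiasing is the widely used projection-removal ("hard debiasing") step; the thesis's own method may differ. The three-dimensional vectors and the helper names (gender_direction, gender_polarity, remove_gender_component) are hypothetical; in practice the vectors would be contextualized embeddings taken from a transformer's hidden states.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def gender_direction(male_vec, female_vec):
    """Unit vector along male_vec - female_vec (e.g. "he" - "she")."""
    diff = [m - f for m, f in zip(male_vec, female_vec)]
    n = norm(diff)
    return [d / n for d in diff]

def gender_polarity(word_vec, direction):
    """Cosine similarity with the unit gender direction:
    > 0 leans toward the male pronoun, < 0 toward the female pronoun."""
    return dot(word_vec, direction) / norm(word_vec)

def remove_gender_component(word_vec, direction):
    """Projection removal: subtract the component along the unit gender
    direction, leaving every orthogonal component unchanged."""
    proj = dot(word_vec, direction)
    return [w - proj * d for w, d in zip(word_vec, direction)]

# Hypothetical toy embeddings (not taken from any real model).
he, she = [1.0, 0.2, 0.5], [-1.0, 0.2, 0.5]
nurse, engineer = [-0.6, 0.8, 0.1], [0.7, 0.1, 0.7]

g = gender_direction(he, she)
print(round(gender_polarity(nurse, g), 3), round(gender_polarity(engineer, g), 3))
print(gender_polarity(remove_gender_component(nurse, g), g))
```

In this toy setup "nurse" has negative polarity and "engineer" positive polarity; after projection removal the polarity of "nurse" drops to zero while its components orthogonal to the gender direction are left untouched, which is how this kind of post-processing tries to preserve the remaining semantic information.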
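
One way to summarize model behaviour on an inference set whose contexts admit stereotypical, anti-stereotypical, and neutral responses is sketched below. The label strings, the function name bias_summary, and the signed score (stereotypical minus anti-stereotypical answers, divided by their sum) are illustrative assumptions, not the metric defined in the thesis.

```python
from collections import Counter

def bias_summary(choices):
    """Summarize per-item labels, each one of "stereo", "anti", or "neutral"."""
    if not choices:
        raise ValueError("no responses to summarize")
    counts = Counter(choices)
    unknown = set(counts) - {"stereo", "anti", "neutral"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(choices)
    s, a, n = counts["stereo"], counts["anti"], counts["neutral"]
    polar = s + a  # responses that took a side
    return {
        "stereo_rate": s / total,
        "anti_rate": a / total,
        "neutral_rate": n / total,
        # +1: always stereotypical, -1: always anti-stereotypical, 0: balanced
        "bias": (s - a) / polar if polar else 0.0,
    }

# Hypothetical model responses over four inference items.
print(bias_summary(["stereo", "stereo", "anti", "neutral"]))
```

Reporting the neutral rate alongside the signed score matters: a model that always answers neutrally and one that is equally often stereotypical and anti-stereotypical both get a signed score of 0, yet they behave very differently.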
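
The contrastive decoding idea, comparing next-token distributions under the original prompt and a perturbed ("disturb") prompt, can be sketched as follows. The scoring rule (1 + alpha) * log p_orig - alpha * log p_pert with a plausibility cutoff beta follows the general form used in published contrastive-decoding methods and is an assumption here; the thesis's exact formulation, the parameter values, the function name contrastive_next_token, and the toy probabilities are hypothetical. Tokens whose probability rises under the perturbed prompt are treated as hallucination-prone and penalized.

```python
import math

def contrastive_next_token(p_orig, p_pert, alpha=1.0, beta=0.1):
    """Pick the next token by contrasting two next-token distributions.

    p_orig: token -> probability under the original prompt
    p_pert: token -> probability under the perturbed ("disturb") prompt
    alpha:  strength of the contrast (0 reduces to greedy decoding on p_orig)
    beta:   plausibility cutoff relative to the most likely original token
    """
    cutoff = beta * max(p_orig.values())
    scores = {}
    for tok, p in p_orig.items():
        if p <= 0 or p < cutoff:
            continue  # ignore tokens the original prompt itself finds implausible
        q = max(p_pert.get(tok, 0.0), 1e-12)  # avoid log(0)
        scores[tok] = (1 + alpha) * math.log(p) - alpha * math.log(q)
    return max(scores, key=scores.get), scores

# Hypothetical toy distributions over three candidate tokens.
p_orig = {"A": 0.45, "B": 0.40, "C": 0.15}
p_pert = {"A": 0.70, "B": 0.10, "C": 0.20}

greedy = max(p_orig, key=p_orig.get)
chosen, scores = contrastive_next_token(p_orig, p_pert)
print(greedy, chosen)
```

With these toy numbers, greedy decoding picks "A", whereas the contrastive score picks "B", because "A" becomes even more likely under the perturbed prompt. Setting alpha=0 recovers plain greedy decoding, and raising beta prunes low-probability tokens before the contrast is applied.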

Files in This Item:

File	Description	Size	Format
PRADEEP KAMBOJ ph.D..pdf		2.54 MB	Adobe PDF
PRADEEP KAMBOJ plag.pdf		2.57 MB	Adobe PDF
