DSpace Collection:

PHISHSCORE: A WEIGHTED MULTI-FEATURE SCORING FRAMEWORK FOR TRI-CLASS PHISHING EMAIL DETECTION

2026-07-06T09:18:11Z

Title: PHISHSCORE: A WEIGHTED MULTI-FEATURE SCORING FRAMEWORK FOR TRI-CLASS PHISHING EMAIL DETECTION
Authors: JHA, SUJYOTI; Kumar, Manoj (SUPERVISOR)
Abstract: Phishing attacks have become increasingly sophisticated, personalized, and challenging to detect in recent years. These attacks exploit fundamental aspects of human psychology (urgency, authority, fear, and trust), making them effective even against technically aware users. Recent advancements in large language models (LLMs) such as GPT-4 and Claude have further exacerbated this threat by providing attackers with the means to produce grammatically polished, contextually coherent, and highly personalized phishing emails on a massive scale, thereby bypassing legacy keyword-based and rule-based detection tools. This thesis makes two primary contributions. First, it presents a systematic literature review of eighteen peer-reviewed studies tracing the evolution of behavioral cyber threat detection from classical machine learning approaches. The review identifies a critical research gap: the absence of a lightweight, interpretable, and training-free framework capable of distinguishing AI-generated phishing from both human-authored phishing and legitimate email. Second, to address this gap, this thesis proposes PhishScore, an unsupervised weighted scoring system that categorizes emails into one of three classes: genuine, phishing, and AI-phishing. PhishScore computes a continuous risk score between 0 and 100 based on twelve handcrafted features organized under social engineering, structural, and stylometric characteristics, and maps this score to actionable risk tiers (Low, Medium, and High) using fixed thresholds. When tested on a balanced dataset of 2,139 emails drawn from the Enron, Nazario, and Greco (2023) corpora, PhishScore delivers an ROC-AUC score of 0.8135 with statistically significant class separability (F = 313.62, p < 0.001).
Interestingly, stylometric features turn out to be stronger predictors than social engineering terms, confirming that AI-generated phishing is linguistically distinguishable not by what is written, but by how it is written. PhishScore is fully interpretable, requires no supervised training, and is suitable for deployment as a transparent pre-filter in real-world enterprise email security pipelines.
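
The weighted-score-then-threshold idea described in this abstract can be illustrated with a minimal Python sketch. The feature names, weights, and tier cut-offs below are hypothetical placeholders chosen for illustration; the abstract does not list the twelve features, their weights, or the actual thresholds.

```python
# Hypothetical feature weights (assumption): the thesis uses twelve
# handcrafted features whose names and weights are not given in the abstract.
WEIGHTS = {
    "urgency_terms": 2.0,           # social engineering cue
    "authority_terms": 1.0,         # social engineering cue
    "suspicious_links": 3.0,        # structural cue
    "stylometric_uniformity": 4.0,  # stylometric cue
}

# Hypothetical fixed tier cut-offs on the 0-100 scale (assumption).
MEDIUM_THRESHOLD = 40.0
HIGH_THRESHOLD = 70.0


def risk_score(features):
    """Return a 0-100 score: weighted mean of feature values clipped to [0, 1]."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(
        WEIGHTS[name] * min(max(features.get(name, 0.0), 0.0), 1.0)
        for name in WEIGHTS
    )
    return 100.0 * weighted / total_weight


def risk_tier(score):
    """Map a continuous score to a Low / Medium / High tier using fixed thresholds."""
    if score >= HIGH_THRESHOLD:
        return "High"
    if score >= MEDIUM_THRESHOLD:
        return "Medium"
    return "Low"
```

Because every weight and threshold is a visible constant, each score can be traced back to the features that produced it, which is the kind of interpretability the abstract emphasizes.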

VLLM FEATURE READINESS ON TEXT GENERTAED LLAMA 3.1-8B MODEL

2026-07-06T09:18:04Z

Title: VLLM FEATURE READINESS ON TEXT GENERTAED LLAMA 3.1-8B MODEL
Authors: VERMA, POOJA; Kaur, Gull (SUPERVISOR)
Abstract: Recent developments in Artificial Intelligence, Deep Learning, and Natural Language Processing have revolutionized speech synthesis and language generation. Contemporary neural Text-to-Speech (TTS) models can now synthesize highly human-sounding speech with improved intelligibility, pronunciation, and prosody. At the same time, Large Language Models (LLMs) have been highly successful at tasks such as conversation, text generation, summarization, and reasoning. This research presents a detailed comparative analysis of advanced neural TTS systems alongside inference optimization methods for LLMs. The analysis concentrates on three contemporary TTS models: Tacotron, FastSpeech 2, and MatchaTTS. These models are compared on synthesis quality, training difficulty, inference speed, computational cost, and real-world implementation feasibility. In addition to speech synthesis, this study considers advanced inference acceleration and optimization methods for autoregressive LLMs. The considered methods include speculative decoding, LoRA adapters, and Multi-LoRA adapters, which were implemented on an Intel AI accelerator with Intel Gaudi B70 accelerator and the Llama 3.1 8B model. Three speculative decoding strategies were considered, each of which speeds up inference by generating several candidate tokens through draft or retrieval procedures and then validating them with the target LLM: EAGLE-3, N-gram prompt lookup decoding, and suffix array retrieval decoding. Moreover, the LoRA adapter and Multi-LoRA adapter methods were considered for efficient parameter tuning and multitask fine-tuning, respectively.
The effectiveness of these inference strategies was assessed using key inference metrics: time to first token (TTFT), time per output token (TPOT), and throughput.
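
As a rough illustration of the three metrics named above, the following Python sketch computes them from per-token arrival timestamps using commonly used definitions. The function name and input format are assumptions made for this example; the abstract does not say how the thesis or its benchmarking tools define or collect these measurements.

```python
def latency_metrics(request_time, token_times):
    """Compute TTFT, TPOT, and throughput for a single request.

    request_time: time (seconds) at which the request was sent.
    token_times:  arrival time (seconds) of each output token, in order.
    Both inputs are hypothetical; real measurements come from the serving stack.
    """
    if not token_times:
        raise ValueError("at least one output token is required")
    total = token_times[-1] - request_time
    if total <= 0:
        raise ValueError("token timestamps must come after the request time")

    n = len(token_times)
    # Time to first token: delay before any output appears.
    ttft = token_times[0] - request_time
    # Time per output token: mean gap between tokens after the first one.
    tpot = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    # Throughput: output tokens per second over the whole request.
    throughput = n / total
    return {"ttft": ttft, "tpot": tpot, "throughput": throughput}
```

Speculative decoding mainly targets TPOT and throughput, since several candidate tokens can be accepted per target-model step, while TTFT is dominated by prompt processing.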

SEMANTIC REGION-AWARE FACIAL ATTRIBUTE EDITING VIA SEGFORMER PARSING AND DIFFUSION INPAINTING

2026-07-06T09:17:56Z

Title: SEMANTIC REGION-AWARE FACIAL ATTRIBUTE EDITING VIA SEGFORMER PARSING AND DIFFUSION INPAINTING
Authors: NIKHIL; Sethi, Manoj (SUPERVISOR)
Abstract: Facial attribute editing, which modifies a specific region of the face without changing other regions, remains difficult because it sits at the intersection of semantic segmentation and generative image synthesis. Existing GAN-based methods such as StarGAN and AttGAN struggle to control where an edit is applied, while diffusion-based techniques that do not employ masks fail to restrict the edit to the intended region. In this work, I introduce SemFaceDiff, which combines both capabilities through transformer-based face parsing followed by latent diffusion-based inpainting in a three-step pipeline. After parsing, the segmentation mask is refined using morphological binary dilation and Gaussian boundary feathering (σ=7). The inpainting itself is carried out with Stable Diffusion XL Inpainting, which uses a pair of CLIP-based prompt encoders. The solution supports iterative editing of 11 facial attributes and produces photorealistic results, as measured by a high SSIM score, a low LPIPS score, and the CLIP Faithfulness score.
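
The mask refinement step mentioned in this abstract (binary dilation followed by Gaussian feathering) can be sketched in dependency-free Python as below. This is only an illustration under assumptions: the 3x3 structuring element, the number of dilation iterations, the kernel radius of about 3σ, and the border handling are not stated in the abstract, and a real pipeline would normally use an image library instead of nested lists.

```python
import math


def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square structuring element (assumed shape)."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1.
                out[y][x] = int(any(
                    mask[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))
                ))
        mask = out
    return mask


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma (assumption)."""
    radius = int(3 * sigma + 0.5)
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wgt / total for wgt in weights]


def blur_1d(values, kernel):
    """Convolve one row or column, replicating edge values at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    return [
        sum(kernel[k] * values[min(max(i + k - radius, 0), n - 1)]
            for k in range(len(kernel)))
        for i in range(n)
    ]


def feather(mask, sigma=7.0):
    """Soften mask edges with a separable Gaussian blur; returns floats in [0, 1]."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d([float(v) for v in row], kernel) for row in mask]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Dilation grows the region slightly so that the edit covers the attribute's boundary, and feathering turns the hard 0/1 edge into a gradual transition so the inpainted area blends into the untouched pixels.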

CONDITIONAL ADVERSARIAL IMAGE-TO-IMAGE TRANSLATION WITH U-NET GENERATOR, PATCHGAN DISCRIMINATOR, AND VGG19 PERCEPTUAL LOSS

2026-07-06T09:17:21Z

Title: CONDITIONAL ADVERSARIAL IMAGE-TO-IMAGE TRANSLATION WITH U-NET GENERATOR, PATCHGAN DISCRIMINATOR, AND VGG19 PERCEPTUAL LOSS
Authors: SINGH, TARUN; Bhat, Aruna (SUPERVISOR)
Abstract: The demand for converting hand drawings into photorealistic images can be attributed to the difficulty of generating rich visuals from sparse, abstract, and incomplete inputs. The rapid growth of creative and design applications, driven by the demand for automation, requires better strategies for image synthesis. However, while generative modeling has been approached through combinations of adversarial training, cycle consistency, and diffusion-based architectures, the use of deep generative systems increases architectural time and complexity. The input here is a user's coarse hand-drawn sketch. The global movement toward making generative models available to the public is producing many initiatives. While generative adversarial networks have demonstrated some promising results, there are still challenges, particularly in models trained for conditional generation. Advanced generative techniques in computer vision address the critical challenges of preserving semantic layout and making judicious use of perceptual losses in deep learning models. In this work, we implement key techniques involving Generative Adversarial Networks (GANs), particularly pix2pix, together with perceptual loss functions, a pre-trained VGG-19 network, and the U-Net architecture. These techniques provide robust solutions for photorealistic output and reliable scene composition. Perceptual metrics are crucial for assessing image quality and human-perceived similarity. We use a conditional GAN-based architecture (Pix2Pix) with a U-Net generator and perceptual loss, trained on hand-drawn sketches for photorealistic image synthesis.
Our results demonstrate the effectiveness of the proposed method, which offers a scalable solution for generating realistic images.
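
For orientation, a training objective of the kind this abstract describes can be written as follows. This is a sketch based on the standard pix2pix formulation (conditional adversarial loss plus an L1 reconstruction term) extended with a feature-space perceptual term; the abstract does not state which terms, VGG-19 layers, or weights the thesis actually uses, so $\phi_l$ (VGG-19 activations at layer $l$), $\lambda_{L1}$, and $\lambda_{\mathrm{perc}}$ are assumptions. Here $x$ is the input sketch, $y$ the target photo, $G$ the U-Net generator, and $D$ the PatchGAN discriminator.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{cGAN}}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big], \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y}\big[\lVert y - G(x) \rVert_1\big], \\
\mathcal{L}_{\mathrm{perc}}(G) &= \sum_{l} \mathbb{E}_{x,y}\big[\lVert \phi_l(y) - \phi_l(G(x)) \rVert_2^2\big], \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{\mathrm{cGAN}}(G, D)
  + \lambda_{L1}\,\mathcal{L}_{L1}(G)
  + \lambda_{\mathrm{perc}}\,\mathcal{L}_{\mathrm{perc}}(G).
\end{aligned}
```

The adversarial term pushes outputs toward realistic texture, while the L1 and perceptual terms tie the output to the target image at the pixel level and the feature level, respectively.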