CTIGuardian: a few-shot framework for mitigating privacy leakage in fine-tuned LLMs

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Large Language Models (LLMs) are often fine-tuned to adapt their general-purpose knowledge to specific tasks and domains such as cyber threat intelligence (CTI). Fine-tuning typically relies on proprietary datasets that may contain sensitive information. Owners expect their fine-tuned models not to inadvertently leak this information to potentially adversarial end users. Using CTI as a use case, we demonstrate that data-extraction attacks can recover sensitive information from models fine-tuned on CTI reports, underscoring the need for mitigation. Retraining the full model to eliminate this leakage is computationally expensive and impractical. We propose an alternative approach, which we call privacy alignment, inspired by safety alignment in LLMs. Just as safety alignment teaches the model to abide by safety constraints through a few examples, we enforce privacy alignment through few-shot supervision, integrating a privacy classifier and a privacy redactor, both handled by the same underlying LLM. We evaluate our system, called CTIGuardian, using GPT-4o mini and Mistral-7B Instruct models, benchmarking against Presidio, a named entity recognition (NER) baseline. Results show that CTIGuardian provides a better privacy-utility trade-off than NER-based models. While we demonstrate its effectiveness on a CTI use case, the framework is generic enough to be applicable to other sensitive domains.
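
The classifier-then-redactor flow described in the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the prompt wording, the few-shot examples, the names `build_prompt` and `guard`, and the `fake_llm` stand-in, which replaces a real model call with a simple IP-address pattern so the example runs offline.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical instructions and few-shot examples (not taken from the paper).
CLASSIFY_INSTRUCTION = "Label the text as SENSITIVE or SAFE."
REDACT_INSTRUCTION = "Rewrite the text with sensitive details replaced by [REDACTED]."

CLASSIFIER_SHOTS: List[Tuple[str, str]] = [
    ("The malware beacons to 203.0.113.7 every hour.", "SENSITIVE"),
    ("Phishing remains a common initial access vector.", "SAFE"),
]
REDACTOR_SHOTS: List[Tuple[str, str]] = [
    ("The malware beacons to 203.0.113.7 every hour.",
     "The malware beacons to [REDACTED] every hour."),
]


def build_prompt(instruction: str, shots: List[Tuple[str, str]], query: str) -> str:
    """Assemble an instruction, the few-shot pairs, and the query into one prompt."""
    lines = [instruction, ""]
    for text, output in shots:
        lines += [f"Input: {text}", f"Output: {output}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


def guard(response: str, llm: Callable[[str], str]) -> str:
    """Run the classifier, then the redactor, using the same underlying model."""
    verdict = llm(build_prompt(CLASSIFY_INSTRUCTION, CLASSIFIER_SHOTS, response))
    # Fail closed: anything other than an explicit SAFE verdict gets redacted.
    if verdict.strip().upper() == "SAFE":
        return response
    return llm(build_prompt(REDACT_INSTRUCTION, REDACTOR_SHOTS, response)).strip()


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM: treats IPv4-looking strings as sensitive."""
    query = prompt.rsplit("Input: ", 1)[1].split("\nOutput:", 1)[0]
    ip_pattern = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
    if prompt.startswith(CLASSIFY_INSTRUCTION):
        return "SENSITIVE" if ip_pattern.search(query) else "SAFE"
    return ip_pattern.sub("[REDACTED]", query)


if __name__ == "__main__":
    print(guard("The C2 server at 198.51.100.23 was active.", fake_llm))
    print(guard("Ransomware activity increased this quarter.", fake_llm))
```

In a real deployment `fake_llm` would be swapped for a call to the fine-tuned model or a hosted one. The fail-closed check is a design choice of this sketch, not something the abstract specifies.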

Original language: English
Title of host publication: 2025 Annual Computer Security Applications Conference Workshops ACSACW 2025
Subtitle of host publication: proceedings
Place of Publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 510-522
Number of pages: 13
ISBN (Electronic): 9798331545369
ISBN (Print): 9798331545376
DOIs
Publication status: Published - 2025
Event: 2025 Annual Computer Security Applications Conference Workshops, ACSACW 2025 - Honolulu, United States
Duration: 8 Dec 2025 to 12 Dec 2025

Conference

Conference: 2025 Annual Computer Security Applications Conference Workshops, ACSACW 2025
Country/Territory: United States
City: Honolulu
Period: 8/12/25 to 12/12/25
