radio-llava: Advancing Vision-Language Models for Radio Astronomical Source Analysis

S. Riggi*, T. Cecconello, A. Pilzer, S. Palazzo, N. Gupta, A. M. Hopkins, C. Trigilio, G. Umana

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The advent of next-generation radio telescopes is set to transform radio astronomy by producing massive data volumes that challenge traditional processing methods. Deep learning techniques have shown strong potential in automating radio analysis tasks, yet are often constrained by the limited availability of large annotated datasets. Recent progress in self-supervised learning has led to foundational radio vision models, but adapting them for new tasks typically requires coding expertise, limiting their accessibility to the broader astronomical community. Text-based AI interfaces offer a promising alternative by enabling task-specific queries and example-driven learning. In this context, large language models (LLMs), with their remarkable zero-shot capabilities, are increasingly used in scientific domains. However, deploying large-scale models remains resource-intensive, and there is a growing demand for AI systems that can reason over both visual and textual data in astronomical analysis. This study explores small-scale vision-language models (VLMs) as AI assistants for radio astronomy, combining LLM capabilities with vision transformers. We fine-tuned the LLaVA VLM on a dataset of 59k radio images from multiple surveys, enriched with 38k image-caption pairs from the literature. The fine-tuned models show clear improvements over base models in radio-specific tasks, achieving ∼30% F1-score gains in extended source detection, but they underperform vision-only classifiers and exhibit a ∼20% drop on general multimodal tasks. Including caption data and using LoRA fine-tuning enhances instruction following and helps recover ∼10% accuracy on multimodal benchmarks (e.g., ChartQA/DocVQA). This work lays the foundation for future advancements in radio VLMs, highlighting their potential and limitations, such as the need for better multimodal alignment, higher-quality datasets, and mitigation of catastrophic forgetting.
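As background to the LoRA fine-tuning mentioned in the abstract: LoRA freezes the base weight matrix W and trains only a low-rank update, so the effective weight becomes W + (alpha/r) · B·A. The sketch below is a generic, dependency-free illustration of that update, not the paper's implementation; the class name and all parameters are hypothetical.

```python
import random


class LoRALinear:
    """Minimal LoRA adapter on a frozen linear layer (illustrative only).

    Effective weight: W + (alpha / r) * B @ A, where W (out x in) is
    frozen and only the low-rank factors A (r x in) and B (out x r)
    would be trained.
    """

    def __init__(self, weight, r=4, alpha=8, seed=0):
        rng = random.Random(seed)
        self.weight = weight                      # frozen base weight, out x in
        self.in_dim = len(weight[0])
        self.out_dim = len(weight)
        self.scale = alpha / r
        # Standard LoRA initialisation: A small random, B all zeros,
        # so the adapter starts as an exact no-op on the base model.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(self.in_dim)]
                  for _ in range(r)]
        self.B = [[0.0] * r for _ in range(self.out_dim)]

    def forward(self, x):
        # Base path: W @ x (frozen during fine-tuning).
        base = [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]
        # Low-rank path: scale * B @ (A @ x).
        ax = [sum(a * xi for a, xi in zip(row, x)) for row in self.A]
        delta = [sum(b * v for b, v in zip(row, ax)) for row in self.B]
        return [b + self.scale * d for b, d in zip(base, delta)]
```

Because B starts at zero, the adapted layer initially reproduces the frozen model exactly; training then moves only the (out + in) · r adapter parameters instead of the full out · in weight matrix, which is why LoRA is attractive for adapting large VLMs on modest hardware.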

Original language: English
Article number: e121
Pages (from-to): 1-19
Number of pages: 19
Journal: Publications of the Astronomical Society of Australia
Volume: 42
DOIs
Publication status: Published - 26 Aug 2025

Bibliographical note

© The Author(s), 2025. Published by Cambridge University Press on behalf of Astronomical Society of Australia. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Radio continuum: general
  • methods: data analysis
  • techniques: image processing
  • radio continuum: galaxies
  • astronomical instrumentation
  • methods and techniques
