A Lightweight Vision-Language Model for Disaster Image Summarization

Keywords

Semantic Communication, Disaster Response, Vision-Language Model, Edge Inference, DR-IoT, Image Captioning

Hibiki Yoshizaki, Akira Uchiyama, Akihito Hiromori, Mineo Takai, Hirozumi Yamaguchi

2026 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), PerconAI 2026, pp. 1203–1208

Abstract

During disasters, response agencies must rapidly obtain accurate situational awareness. Images of on-site conditions are useful for this purpose, but their large data size makes real-time aggregation from many locations difficult when communication infrastructure is degraded. We address this challenge by combining a disaster-ready wide-area wireless system (DR-IoT) with small edge devices deployed across sites. Each device locally summarizes captured images into concise text and transmits the text as a compressed proxy, enabling objective reporting and efficient multi-site data collection under strict bandwidth limits. We develop a lightweight model that runs on small devices and generates textual summaries of disaster scenes. We evaluate our model against existing lightweight captioning baselines in terms of output quality and model size. Results show that it achieves practical latency and competitive accuracy for disaster-focused summarization, indicating its suitability for deployment on IoT devices in real disaster settings.

Immediately after a large-scale disaster, response agencies must obtain accurate situational awareness quickly. Images of on-site conditions are a valuable source of information, but their size makes real-time aggregation from many locations impractical when communication infrastructure is degraded. Disaster-ready wide-area wireless systems such as LPWA and DR-IoT provide only tens of kbps, which is enough for text but not for raw images.
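To make the bandwidth gap concrete, the following back-of-the-envelope sketch compares ideal transfer times for an image and for a text summary. The link rate (20 kbps, one reading of "tens of kbps"), the 2 MB image size, and the 200-byte summary size are illustrative assumptions, not figures from the paper; protocol overhead and retransmissions are ignored.

```python
def transmit_seconds(payload_bytes: int, link_kbps: float) -> float:
    """Ideal transfer time in seconds, ignoring overhead and retransmissions."""
    return payload_bytes * 8 / (link_kbps * 1000)


# All three values are assumptions chosen for illustration only.
LINK_KBPS = 20.0         # one possible value for "tens of kbps"
IMAGE_BYTES = 2_000_000  # a hypothetical 2 MB photo
SUMMARY_BYTES = 200      # a hypothetical short text summary

image_s = transmit_seconds(IMAGE_BYTES, LINK_KBPS)
summary_s = transmit_seconds(SUMMARY_BYTES, LINK_KBPS)

print(f"image:   {image_s:.2f} s")
print(f"summary: {summary_s:.2f} s")
```

Under these assumptions a single image occupies the link for over ten minutes, while a summary takes a fraction of a second, which is why text can serve as a proxy when many sites share the channel.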

We propose a two-stage reporting scheme: each edge device captures local scenes, summarizes them on-device into concise text, and transmits only the text as a compact proxy. The emergency operations center can then selectively request high-resolution images for scenes that require deeper analysis, allocating bandwidth and personnel where they matter most.
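The two-stage flow described above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Report` and `EdgeDevice` classes, the keyword-based `select_for_follow_up` rule, and the `fake_summarize` stand-in for the on-device model are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Report:
    """Stage 1 message: text only, small enough for a low-rate link."""
    device_id: str
    scene_id: int
    summary: str


@dataclass
class EdgeDevice:
    device_id: str
    summarize: Callable[[bytes], str]  # stand-in for the on-device model
    images: Dict[int, bytes] = field(default_factory=dict)

    def capture(self, image: bytes) -> Report:
        """Keep the image locally and emit only its text summary."""
        scene_id = len(self.images)
        self.images[scene_id] = image
        return Report(self.device_id, scene_id, self.summarize(image))

    def fetch_image(self, scene_id: int) -> bytes:
        """Stage 2: return the full image only when explicitly requested."""
        return self.images[scene_id]


def select_for_follow_up(reports: List[Report], keywords: List[str]) -> List[Report]:
    """Hypothetical operations-center rule: flag summaries containing a keyword."""
    return [r for r in reports if any(k in r.summary.lower() for k in keywords)]


def fake_summarize(image: bytes) -> str:
    # Placeholder logic; a real device would run the captioning model here.
    if image.startswith(b"A"):
        return "Collapsed building blocking road"
    return "Clear street, no visible damage"


device = EdgeDevice("site-1", fake_summarize)
reports = [device.capture(b"A-image"), device.capture(b"B-image")]
urgent = select_for_follow_up(reports, ["collapsed", "fire"])
requested = [device.fetch_image(r.scene_id) for r in urgent]
```

Here only the first scene is flagged, so only one full image would cross the link; how the operations center actually chooses scenes is not specified in the text and is assumed for this example.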

At the core of this system is a lightweight vision–language model designed to run on small edge devices. Targeted at disaster scene summarization, it is evaluated against existing lightweight captioning baselines in terms of output quality, model size, and latency. Results show that our model achieves practical latency on IoT-class devices while matching or exceeding baselines on disaster-domain summarization, demonstrating its suitability for real-world deployment.

Environment-Aware Distributed Scheduling for Emergency LoRa Networks

Yuto Inaba, Tatsuya Amano, Akihito Hiromori, Hirozumi Yamaguchi

2026 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), SPT-IoT 2026, pp. 1366–1371

Disaster Communication, LoRa

Physics-Integrated Deep Learning for Urban Landslide Prediction

Ren Ozeki, Hamada Rizk, Hirozumi Yamaguchi

2026 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), URBSENSE 2026, pp. 1094–1099

Landslide Prediction, Physics-Integrated Learning

Ray-Tracing-Driven Pattern-Based Vehicle Recognition in ISAC Radar

Heetae Jin, Akira Uchiyama

2026 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), PerRad 2026, pp. 328–333

ISAC, Beyond 5G

A Simulation Framework for Precision Formation Flying of Massive Satellite Swarms

Tatsuya Amano, Akihito Hiromori, Hirozumi Yamaguchi, Sumio Morioka

2026 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), PerVehicle, pp. 230–235

Satellite Formation Flying, Distributed Simulation

A Digital Twin Approach for Crowd Flow Modeling on Railway Station Platforms

Yu Yasuda, Tatsuya Amano and Hirozumi Yamaguchi

IEEE International Conference on Smart Computing (SMARTCOMP), pp. 82–89

DOI 10.1109/SMARTCOMP65954.2025.00069

Digital Twin, Crowd Simulation

Efficient Machine Unlearning for Mobility Logs with Spatio-Temporal and Natural-Language Data

Haruki Yonekura, Ren Ozeki, Tatsuya Amano, Hamada Rizk, Hirozumi Yamaguchi

In Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL '25), pp. 1186–1189

DOI 10.1145/3748636.3763226

Machine Unlearning, Privacy