[2605.04409] UAV as Urban Construction Change Monitor: A New ...
disentangles task-specific representations via multi-head gating, and injects detection-derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine...
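
The excerpt names two mechanisms without giving details: multi-head gating to separate task-specific representations, and detection-derived spatial priors injected into caption generation. The following is a minimal NumPy sketch of one plausible reading, not the paper's implementation. Every name here (`multi_head_gate`, `box_prior`, `prior_biased_attention`, the `strength` parameter), the sigmoid gate, the binary box mask, and the additive attention bias are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def multi_head_gate(features, weights, bias, num_heads):
    """Gate shared features per head (hypothetical formulation).

    features: (N, D) shared token features
    weights:  (D, H) projection giving one gate logit per head
    bias:     (H,)
    Returns (N, D) features where each head's slice is scaled by its gate.
    """
    n, d = features.shape
    assert d % num_heads == 0, "D must be divisible by the number of heads"
    gates = sigmoid(features @ weights + bias)            # (N, H), values in (0, 1)
    heads = features.reshape(n, num_heads, d // num_heads)
    return (heads * gates[:, :, None]).reshape(n, d)


def box_prior(boxes, grid_h, grid_w):
    """Binary spatial prior: 1 for grid cells whose centre lies in any box.

    boxes: iterable of (x0, y0, x1, y1) in normalized [0, 1] coordinates.
    Returns a flat (grid_h * grid_w,) array.
    """
    prior = np.zeros((grid_h, grid_w))
    ys = (np.arange(grid_h) + 0.5) / grid_h               # cell-centre rows
    xs = (np.arange(grid_w) + 0.5) / grid_w               # cell-centre columns
    for x0, y0, x1, y1 in boxes:
        inside = ((ys[:, None] >= y0) & (ys[:, None] <= y1)
                  & (xs[None, :] >= x0) & (xs[None, :] <= x1))
        prior[inside] = 1.0
    return prior.reshape(-1)


def prior_biased_attention(scores, prior, strength=1.0):
    """Add the prior to attention logits, then apply a stable softmax."""
    logits = scores + strength * prior
    logits = logits - logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()


rng = np.random.default_rng(0)
shared = rng.normal(size=(16, 8))                         # 4x4 grid of tokens, D = 8

# Separate gate parameters per task yield task-specific views of shared features.
det_feats = multi_head_gate(shared, rng.normal(size=(8, 2)), np.zeros(2), num_heads=2)
cap_feats = multi_head_gate(shared, rng.normal(size=(8, 2)), np.zeros(2), num_heads=2)

# A detected box covering the top-left quadrant steers caption attention there.
prior = box_prior([(0.0, 0.0, 0.5, 0.5)], grid_h=4, grid_w=4)
attn = prior_biased_attention(np.zeros(16), prior, strength=2.0)
```

In this sketch the gates only rescale the shared features, so each task sees a differently weighted view of the same tokens, and the prior acts as a soft bias rather than a hard mask, so cells outside detected boxes keep nonzero attention.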