DynamicEarth: How Far are We from Open-Vocabulary Change Detection?


Kaiyu Li¹

Xiangyong Cao¹ ✉

Yupeng Deng²

Chao Pang³

Zepeng Xin¹

Hui Qiao⁴

Tieliang Gong¹

Deyu Meng¹

Zhi Wang¹

Xi'an Jiaotong University¹

Chinese Academy of Sciences²

Wuhan University³

China Telecom⁴

Code [GitHub]

Paper [arXiv]

Demo [Colab]

The two OVCD frameworks proposed in this paper. (a) M-C-I: discover all class-agnostic masks, determine whether each mask region has changed, and identify the change class. (b) I-M-C: identify all targets of interest, convert them to mask format, and compare whether each target has changed.
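The three M-C-I stages named in the caption can be illustrated with a small, dependency-free sketch. It is not the DynamicEarth API: `m_c_i`, `toy_propose`, `toy_identify`, `mean_in_mask`, and the `threshold` parameter are hypothetical names, the images are toy grids of intensities, and the comparison is a plain mean-intensity difference standing in for whatever feature comparison a foundation model would provide.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Pixel = Tuple[int, int]        # (row, col)
Mask = Set[Pixel]              # a mask is a set of pixels
Image = List[List[float]]      # toy single-channel image


def mean_in_mask(image: Image, mask: Mask) -> float:
    """Toy region descriptor (assumption): mean intensity inside the mask."""
    return sum(image[r][c] for r, c in mask) / len(mask)


def m_c_i(
    img1: Image,
    img2: Image,
    propose_masks: Callable[[Image], List[Mask]],
    identify: Callable[[Image, Image, Mask, Sequence[str]], Optional[str]],
    classes: Sequence[str],
    threshold: float = 0.5,
) -> List[Tuple[Mask, str]]:
    """Mask -> Compare -> Identify, with every model passed in as a callable."""
    changes = []
    # M: discover class-agnostic masks in both images.
    for mask in propose_masks(img1) + propose_masks(img2):
        # C: decide whether this region changed between the two dates.
        diff = abs(mean_in_mask(img1, mask) - mean_in_mask(img2, mask))
        if diff <= threshold:
            continue
        # I: name the change using the user's vocabulary (None = not of interest).
        label = identify(img1, img2, mask, classes)
        if label is not None:
            changes.append((mask, label))
    return changes


def toy_propose(image: Image) -> List[Mask]:
    """Hypothetical stand-in for a mask generator: one mask of bright pixels."""
    bright = {(r, c) for r, row in enumerate(image)
              for c, v in enumerate(row) if v > 0.5}
    return [bright] if bright else []


def toy_identify(img1: Image, img2: Image, mask: Mask,
                 classes: Sequence[str]) -> Optional[str]:
    """Hypothetical stand-in for an open-vocabulary classifier."""
    bright = max(mean_in_mask(img1, mask), mean_in_mask(img2, mask)) > 0.5
    return "building" if bright and "building" in classes else None


img1 = [[0, 0, 0], [0, 0, 0]]   # earlier date: empty land
img2 = [[0, 1, 1], [0, 1, 1]]   # later date: a bright block appears
changes = m_c_i(img1, img2, toy_propose, toy_identify, ["building"])
print(len(changes), changes[0][1])
```

Passing the models as callables keeps the pipeline independent of any particular model, which mirrors the idea of instantiating the framework with different off-the-shelf components.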

Abstract

Monitoring Earth's evolving land covers requires methods capable of detecting changes across a wide range of categories and contexts. Existing change detection methods are hindered by their dependence on predefined classes, which reduces their effectiveness in open-world applications. To address this issue, we introduce open-vocabulary change detection (OVCD), a novel task that bridges vision and language to detect changes across any category. Considering the lack of high-quality data and annotation, we propose two training-free frameworks, M-C-I and I-M-C, which leverage and integrate off-the-shelf foundation models for the OVCD task. The insight behind the M-C-I framework is to discover all potential changes and then classify them, while the insight behind the I-M-C framework is to identify all targets of interest and then determine whether their states have changed. Based on these two frameworks, we instantiate several methods, e.g., SAM-DINOv2-SegEarth-OV and Grounding-DINO-SAM2-DINO. Extensive evaluations on 5 benchmark datasets demonstrate the superior generalization and robustness of our OVCD methods over existing supervised and unsupervised methods. To support continued exploration, we release DynamicEarth, a dedicated codebase designed to advance research and application of OVCD.
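The I-M-C insight described above (identify targets of interest first, then check whether their state changed) can be sketched in the same toy style. This is an illustration under stated assumptions, not the released code: `i_m_c`, `toy_detect`, `toy_box_to_mask`, `mean_in_mask`, and `threshold` are hypothetical names, and the mean-intensity comparison is a placeholder for a learned feature comparison.

```python
from typing import Callable, List, Sequence, Set, Tuple

Pixel = Tuple[int, int]              # (row, col)
Box = Tuple[int, int, int, int]      # (row0, col0, row1, col1), inclusive
Mask = Set[Pixel]
Image = List[List[float]]            # toy single-channel image


def mean_in_mask(image: Image, mask: Mask) -> float:
    """Toy region descriptor (assumption): mean intensity inside the mask."""
    return sum(image[r][c] for r, c in mask) / len(mask)


def i_m_c(
    img1: Image,
    img2: Image,
    detect: Callable[[Image, Sequence[str]], List[Tuple[Box, str]]],
    box_to_mask: Callable[[Image, Box], Mask],
    classes: Sequence[str],
    threshold: float = 0.5,
) -> List[Tuple[Mask, str]]:
    """Identify -> Mask -> Compare, with every model passed in as a callable."""
    changes = []
    for source in (img1, img2):
        # I: find targets named in the user's vocabulary.
        for box, label in detect(source, classes):
            # M: convert each detection to a pixel mask.
            mask = box_to_mask(source, box)
            if not mask:
                continue
            # C: keep the target only if its region differs across dates.
            diff = abs(mean_in_mask(img1, mask) - mean_in_mask(img2, mask))
            if diff > threshold:
                changes.append((mask, label))
    return changes


def toy_detect(image: Image, classes: Sequence[str]) -> List[Tuple[Box, str]]:
    """Hypothetical stand-in for a text-prompted detector."""
    bright = [(r, c) for r, row in enumerate(image)
              for c, v in enumerate(row) if v > 0.5]
    if not bright or "building" not in classes:
        return []
    rows = [r for r, _ in bright]
    cols = [c for _, c in bright]
    return [((min(rows), min(cols), max(rows), max(cols)), "building")]


def toy_box_to_mask(image: Image, box: Box) -> Mask:
    """Hypothetical stand-in for a box-prompted segmenter."""
    r0, c0, r1, c1 = box
    return {(r, c) for r in range(r0, r1 + 1)
            for c in range(c0, c1 + 1) if image[r][c] > 0.5}


img1 = [[0, 0, 0], [0, 0, 0]]   # earlier date: empty land
img2 = [[0, 1, 1], [0, 1, 0]]   # later date: an L-shaped bright object
changes = i_m_c(img1, img2, toy_detect, toy_box_to_mask, ["building"])
print(len(changes), changes[0][1])
```

Because detection is driven by the vocabulary from the start, regions outside the requested classes are never segmented or compared in this sketch, whereas a mask-first pipeline would examine every region before filtering by class.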

Different change detection tasks: (a) Binary change detection aims to discover all changes (of interest) and generate a binary mask; (b) Semantic change detection further identifies the category of each change. However, both can only be trained and evaluated on data with predefined categories; (c) Our proposed OVCD can detect changes in any category according to the user's requirements.

Quantitative Results

Visualizations

Open-vocabulary change detection examples. In each group: x1, x2, ground truth, the result of an M-C-I method and the result of an I-M-C method. Color rendering: Building, Water, Playground.

Acknowledgements

Based on a template by Phillip Isola and Richard Zhang.