Research · CVPR 2025

AnySat Multi-Sensor Flood Inference

Implemented inference with the AnySat multimodal foundation model (CVPR 2025 Highlight) for flood segmentation, using SAR-optical fusion of Sentinel-1 and Sentinel-2 imagery.

125M Parameters (Multi-Modal)
AnySat · JEPA · SAR Fusion · Sentinel-1/2 · Multi-Modal · PyTorch

Overview

Implemented inference using the AnySat multimodal foundation model (CVPR 2025 Highlight) for flood segmentation, demonstrating expertise with state-of-the-art multi-sensor fusion architectures.

Technical Implementation

Multi-Modal Fusion

Processed heterogeneous satellite data combining:

  • Sentinel-1 SAR: 3 channels (VV, VH, ratio)
  • Sentinel-2 Optical: 10 channels
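Assembling the Sentinel-1 input can be sketched as follows. This is a minimal, model-free illustration: the function name and the choice of a simple linear-power ratio (rather than a dB difference) are assumptions, not the project's exact preprocessing.

```python
import numpy as np

def build_s1_input(vv: np.ndarray, vh: np.ndarray) -> np.ndarray:
    """Stack Sentinel-1 VV, VH and their ratio into a 3-channel array.

    vv, vh: backscatter intensities of shape [H, W] (linear power units).
    Returns an array of shape [3, H, W]: (VV, VH, VV/VH).
    """
    eps = 1e-6  # guard against division by zero over no-data pixels
    ratio = vv / (vh + eps)
    return np.stack([vv, vh, ratio], axis=0)

# Example: a 4x4 tile where VV is twice VH everywhere
vv = np.full((4, 4), 0.2)
vh = np.full((4, 4), 0.1)
s1 = build_s1_input(vv, vh)
print(s1.shape)  # (3, 4, 4)
```

The ratio channel is a common hand-crafted feature for flood mapping, since open water suppresses VH backscatter more strongly than VV.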

Scale-Adaptive Architecture

Implemented AnySat's JEPA-based architecture (125M parameters) handling varying spatial resolutions from 10m to 60m.
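The scale-adaptive idea can be shown with a small arithmetic sketch (illustrative only, not AnySat's internal API): because a patch is defined by its ground extent in meters, the same patch maps to different pixel grids depending on each sensor's resolution.

```python
def subpatch_grid(patch_size_m: int, gsd_m: int) -> int:
    """Pixels per side that one patch covers for a sensor with the given
    ground sample distance (GSD). Defining patches in meters lets one
    spatial extent map to different pixel grids per sensor."""
    assert patch_size_m % gsd_m == 0, "patch must align to the pixel grid"
    return patch_size_m // gsd_m

# One 60 m patch seen at the resolutions mentioned above (10 m to 60 m)
for gsd in (10, 20, 60):
    side = subpatch_grid(60, gsd)
    print(gsd, side, side * side)  # meters/pixel, pixels/side, pixels/patch
```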

Three Output Modes

  • Tile mode: [B, 768] embeddings for scene classification
  • Patch mode: [B, P, P, 768] grid for patch-level tasks
  • Dense mode: [B, H, W, 1536] per-pixel features for segmentation
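The three shapes can be demonstrated with a toy, model-free NumPy sketch. The pooling and the patch-plus-local concatenation below are assumptions chosen to reproduce the documented shapes (768 for tile/patch, 2 × 768 = 1536 for dense), not AnySat's actual internals.

```python
import numpy as np

B, P, D = 2, 6, 768  # batch, patches per side, embedding dim
sub = 10             # pixels per side inside one patch (assumed)

patch_tokens = np.random.rand(B, P, P, D)      # "patch" mode: [B, P, P, 768]
tile = patch_tokens.mean(axis=(1, 2))          # "tile" mode: pooled scene embedding
# "dense" mode: broadcast each patch token over its pixels, then concatenate
# a per-pixel local feature of the same width -> 2 * 768 = 1536 channels
local = np.random.rand(B, P * sub, P * sub, D)
broadcast = patch_tokens.repeat(sub, axis=1).repeat(sub, axis=2)
dense = np.concatenate([broadcast, local], axis=-1)

print(tile.shape)   # (2, 768)
print(dense.shape)  # (2, 60, 60, 1536)
```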

Technical Complexity

  • Handled 11 different sensor types (Sentinel-1/2, NAIP, ALOS-2, aerial imagery)
  • Implemented temporal dimension processing for multi-date analysis
  • Managed varying channel counts across sensors (2-13 channels)
  • Built preprocessing for SAR-optical fusion with proper normalization
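A typical SAR normalization step looks like the sketch below: convert linear backscatter to decibels, then standardize per channel. The dB conversion and per-channel statistics are standard practice for SAR preprocessing; the exact constants used in this project are assumptions.

```python
import numpy as np

def normalize_sar(intensity: np.ndarray) -> np.ndarray:
    """Convert linear SAR backscatter to dB, then standardize per channel.

    intensity: [C, H, W] linear power values (strictly positive).
    Returns zero-mean, unit-variance features per channel, as commonly
    expected by models trained on normalized inputs.
    """
    db = 10.0 * np.log10(np.clip(intensity, 1e-6, None))
    mean = db.mean(axis=(1, 2), keepdims=True)
    std = db.std(axis=(1, 2), keepdims=True) + 1e-6
    return (db - mean) / std

x = np.random.uniform(0.01, 1.0, size=(2, 8, 8))
z = normalize_sar(x)
print(np.allclose(z.mean(axis=(1, 2)), 0.0, atol=1e-6))  # True
```

Optical bands would be normalized with their own per-band statistics before fusion, so that neither modality dominates the shared embedding space.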

Model Architecture

  • Vision Transformer: 768-dim embeddings, 6+1 blocks, 12 attention heads
  • Modality projectors for 11+ sensor types
  • Scale-adaptive JEPA (Joint-Embedding Predictive Architecture)
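A rough weight-count check for a 768-dim block, assuming a standard pre-norm ViT block with a 4x MLP expansion (biases and norm parameters omitted). This is back-of-envelope arithmetic, not the model's exact budget: seven such blocks come to roughly 50M weights, with the modality projectors and JEPA predictor plausibly accounting for much of the remaining 125M.

```python
d = 768
heads = 12
head_dim = d // heads       # 64 dims per attention head

# Standard ViT block (assumption; biases and layer norms omitted):
attn = 4 * d * d            # Q, K, V and output projections
mlp = 2 * d * (4 * d)       # two linear layers with 4x hidden expansion
per_block = attn + mlp      # weights per transformer block
print(head_dim, per_block)  # 64 7077888 (~7.1M per block)
```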