Skip to content

water module

Water body segmentation using OmniWaterMask.

This module provides a high-level interface for detecting water bodies in satellite and aerial imagery using the OmniWaterMask library. It supports a wide range of sensors (Sentinel-2, NAIP, Landsat, etc.) and resolutions (0.2m to 50m) by combining a sensor-agnostic deep learning model with NDWI calculations and OpenStreetMap reference data.

Reference

BAND_ORDER_PRESETS = {'naip': [1, 2, 3, 4], 'sentinel2': [3, 2, 1, 4], 'landsat': [4, 3, 2, 5]} module-attribute

Predefined band order mappings for common sensors.

Each preset maps to a list of 1-based band indices in the order [Red, Green, Blue, NIR] as used by rasterio.

  • "naip": R, G, B, NIR (bands 1-4)
  • "sentinel2": For 6-band composites (B2, B3, B4, B8, B11, B12), maps Red=B4 (band 3), Green=B3 (band 2), Blue=B2 (band 1), NIR=B8 (band 4)
  • "landsat": For Landsat 8/9 (B1-B7), maps Red=B4 (band 4), Green=B3 (band 3), Blue=B2 (band 2), NIR=B5 (band 5)

segment_water(input_path, band_order='naip', output_raster=None, output_vector=None, batch_size=4, device=None, dtype='float32', no_data_value=0, patch_size=1000, overlap_size=300, use_osm_water=True, use_osm_building=True, use_osm_roads=True, cache_dir=None, model_dir=None, overwrite=True, min_area=10, smooth=True, smooth_iterations=3, verbose=True, **kwargs)

Segment water bodies from satellite or aerial imagery using OmniWaterMask.

Uses a sensor-agnostic deep learning model combined with NDWI and OpenStreetMap data to detect water bodies in imagery ranging from 0.2m to 50m resolution. Supports Sentinel-2, NAIP, Landsat, and other multispectral sensors with Red, Green, Blue, and NIR bands.

Parameters:

Name Type Description Default
input_path Union[str, Path, List[Union[str, Path]]]

Path to input GeoTIFF file(s). Can be a single path (string or Path) or a list of paths for batch processing. When a list is provided, each scene is processed and output files are named based on each input filename.

required
band_order Union[List[int], str]

Band indices for Red, Green, Blue, NIR channels (1-based, as used by rasterio). Can be a list of 4 integers or a string preset: "naip" ([1,2,3,4]), "sentinel2" ([3,2,1,4]), "landsat" ([4,3,2,5]). Defaults to "naip".

'naip'
output_raster Optional[str]

Path to save the output water mask GeoTIFF. If None, derives from input filename (e.g., input_water_mask.tif). For multiple inputs, this is ignored and output names are derived from each input filename.

None
output_vector Optional[str]

Path to save vectorized water body polygons (e.g., GeoJSON, GPKG, Shapefile). If provided, the raster mask is converted to vector polygons. For multiple inputs, this is ignored and output names are derived from each input filename using the same extension. Defaults to None.

None
batch_size int

Number of scenes to process in parallel. Defaults to 4.

4
device Optional[str]

Device for inference (e.g., "cuda", "cpu", "mps"). If None, auto-selects the best available device. Defaults to None.

None
dtype str

Data type for model inference precision. One of "float32", "float16", or "bfloat16". Using "float16" is recommended as it is faster and uses less memory while the output is always a binary mask. Defaults to "float32".

'float32'
no_data_value int

Value representing no-data pixels in the input imagery. Defaults to 0.

0
patch_size int

Size of patches for sliding-window inference in pixels. Defaults to 1000.

1000
overlap_size int

Overlap between patches to reduce edge artifacts in pixels. Defaults to 300.

300
use_osm_water bool

Include OpenStreetMap water features to improve accuracy. Defaults to True.

True
use_osm_building bool

Include OSM building data to reduce false positives in built-up areas. Defaults to True.

True
use_osm_roads bool

Include OSM road data to reduce false positives along roads. Defaults to True.

True
cache_dir Optional[str]

Directory for caching intermediate results and OSM data. If None, uses OWM_cache in the current working directory. Defaults to None.

None
model_dir Optional[str]

Custom directory to store/load OmniWaterMask model files. If None, uses the default location. Defaults to None.

None
overwrite bool

Whether to overwrite existing output files. Defaults to True.

True
min_area float

Minimum polygon area in square map units to keep during vectorization. Defaults to 10.

10
smooth bool

Whether to smooth vectorized polygons using the smoothify library. Only applies when output_vector is provided. Defaults to True.

True
smooth_iterations int

Number of smoothing iterations. Higher values produce smoother boundaries. Only applies when smooth=True. Defaults to 3.

3
verbose bool

Whether to print progress messages. Defaults to True.

True
**kwargs Any

Additional keyword arguments passed to omniwatermask.make_water_mask().

{}

Returns:

Type Description
Union[str, GeoDataFrame, List[str], List[GeoDataFrame]]

For a single input file: If output_vector is provided, returns a GeoDataFrame with vectorized (and optionally smoothed) water body polygons. Otherwise, returns the file path (str) to the output water mask GeoTIFF.

Union[str, GeoDataFrame, List[str], List[GeoDataFrame]]

For multiple input files: Returns a list of results (list of str paths or list of GeoDataFrames), one per input file.

Raises:

Type Description
ImportError

If omniwatermask is not installed.

ValueError

If band_order is an unrecognized string preset or does not contain exactly 4 band indices.

FileNotFoundError

If the input file(s) do not exist.

Example

import geoai

NAIP imagery (R, G, B, NIR)

mask_path = geoai.segment_water( ... "naip_scene.tif", ... band_order="naip", ... output_raster="water_mask.tif", ... )

Sentinel-2 with vectorization and smoothing

gdf = geoai.segment_water( ... "sentinel2_scene.tif", ... band_order="sentinel2", ... output_raster="water_mask.tif", ... output_vector="water_bodies.geojson", ... smooth=True, ... smooth_iterations=3, ... min_area=100, ... )

Batch processing multiple files

results = geoai.segment_water( ... ["scene1.tif", "scene2.tif"], ... band_order="sentinel2", ... )

Source code in geoai/water.py
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
def segment_water(
    input_path: Union[str, "Path", List[Union[str, "Path"]]],
    band_order: Union[List[int], str] = "naip",
    output_raster: Optional[str] = None,
    output_vector: Optional[str] = None,
    batch_size: int = 4,
    device: Optional[str] = None,
    dtype: str = "float32",
    no_data_value: int = 0,
    patch_size: int = 1000,
    overlap_size: int = 300,
    use_osm_water: bool = True,
    use_osm_building: bool = True,
    use_osm_roads: bool = True,
    cache_dir: Optional[str] = None,
    model_dir: Optional[str] = None,
    overwrite: bool = True,
    min_area: float = 10,
    smooth: bool = True,
    smooth_iterations: int = 3,
    verbose: bool = True,
    **kwargs: Any,
) -> Union[str, "gpd.GeoDataFrame", List[str], List["gpd.GeoDataFrame"]]:
    """Segment water bodies from satellite or aerial imagery using OmniWaterMask.

    Uses a sensor-agnostic deep learning model combined with NDWI and
    OpenStreetMap data to detect water bodies in imagery ranging from
    0.2m to 50m resolution. Supports Sentinel-2, NAIP, Landsat, and
    other multispectral sensors with Red, Green, Blue, and NIR bands.

    Args:
        input_path: Path to input GeoTIFF file(s). Can be a single path
            (string or Path) or a list of paths for batch processing.
            When a list is provided, each scene is processed and output
            files are named based on each input filename.
        band_order: Band indices for Red, Green, Blue, NIR channels
            (1-based, as used by rasterio). Can be a list of 4 integers
            or a string preset: ``"naip"`` ([1,2,3,4]),
            ``"sentinel2"`` ([3,2,1,4]), ``"landsat"`` ([4,3,2,5]).
            Defaults to ``"naip"``.
        output_raster: Path to save the output water mask GeoTIFF. If None,
            derives from input filename (e.g., ``input_water_mask.tif``).
            For multiple inputs, this is ignored and output names are
            derived from each input filename.
        output_vector: Path to save vectorized water body polygons (e.g.,
            GeoJSON, GPKG, Shapefile). If provided, the raster mask is
            converted to vector polygons. For multiple inputs, this is
            ignored and output names are derived from each input filename
            using the same extension. Defaults to None.
        batch_size: Number of scenes to process in parallel. Defaults to 4.
        device: Device for inference (e.g., ``"cuda"``, ``"cpu"``, ``"mps"``).
            If None, auto-selects the best available device. Defaults to None.
        dtype: Data type for model inference precision. One of
            ``"float32"``, ``"float16"``, or ``"bfloat16"``. Using
            ``"float16"`` is recommended as it is faster and uses less
            memory while the output is always a binary mask.
            Defaults to ``"float32"``.
        no_data_value: Value representing no-data pixels in the input imagery.
            Defaults to 0.
        patch_size: Size of patches for sliding-window inference in pixels.
            Defaults to 1000.
        overlap_size: Overlap between patches to reduce edge artifacts in
            pixels. Defaults to 300.
        use_osm_water: Include OpenStreetMap water features to improve
            accuracy. Defaults to True.
        use_osm_building: Include OSM building data to reduce false positives
            in built-up areas. Defaults to True.
        use_osm_roads: Include OSM road data to reduce false positives along
            roads. Defaults to True.
        cache_dir: Directory for caching intermediate results and OSM data.
            If None, uses ``OWM_cache`` in the current working directory.
            Defaults to None.
        model_dir: Custom directory to store/load OmniWaterMask model files.
            If None, uses the default location. Defaults to None.
        overwrite: Whether to overwrite existing output files.
            Defaults to True.
        min_area: Minimum polygon area in square map units to keep during
            vectorization. Defaults to 10.
        smooth: Whether to smooth vectorized polygons using the smoothify
            library. Only applies when ``output_vector`` is provided.
            Defaults to True.
        smooth_iterations: Number of smoothing iterations. Higher values
            produce smoother boundaries. Only applies when ``smooth=True``.
            Defaults to 3.
        verbose: Whether to print progress messages. Defaults to True.
        **kwargs: Additional keyword arguments passed to
            ``omniwatermask.make_water_mask()``.

    Returns:
        For a single input file:
            If ``output_vector`` is provided, returns a ``GeoDataFrame``
            with vectorized (and optionally smoothed) water body polygons.
            Otherwise, returns the file path (str) to the output water
            mask GeoTIFF.
        For multiple input files:
            Returns a list of results (list of str paths or list of
            GeoDataFrames), one per input file.

    Raises:
        ImportError: If omniwatermask is not installed.
        ValueError: If ``band_order`` is an unrecognized string preset or
            does not contain exactly 4 band indices.
        FileNotFoundError: If the input file(s) do not exist.

    Example:
        >>> import geoai
        >>> # NAIP imagery (R, G, B, NIR)
        >>> mask_path = geoai.segment_water(
        ...     "naip_scene.tif",
        ...     band_order="naip",
        ...     output_raster="water_mask.tif",
        ... )
        >>> # Sentinel-2 with vectorization and smoothing
        >>> gdf = geoai.segment_water(
        ...     "sentinel2_scene.tif",
        ...     band_order="sentinel2",
        ...     output_raster="water_mask.tif",
        ...     output_vector="water_bodies.geojson",
        ...     smooth=True,
        ...     smooth_iterations=3,
        ...     min_area=100,
        ... )
        >>> # Batch processing multiple files
        >>> results = geoai.segment_water(
        ...     ["scene1.tif", "scene2.tif"],
        ...     band_order="sentinel2",
        ... )
    """
    try:
        from omniwatermask import make_water_mask
    except ImportError:
        raise ImportError(
            "omniwatermask is required for water segmentation. "
            "Install it with: pip install omniwatermask "
            "or pip install geoai-py[extra]"
        )

    import torch

    # Resolve band order
    if isinstance(band_order, str):
        preset = band_order.lower()
        if preset not in BAND_ORDER_PRESETS:
            raise ValueError(
                f"Unknown band_order preset: {band_order!r}. "
                f"Available presets: {list(BAND_ORDER_PRESETS.keys())}. "
                f"Or pass a list of 4 integers for custom band order."
            )
        band_order = BAND_ORDER_PRESETS[preset]
    elif isinstance(band_order, (list, tuple)):
        if len(band_order) != 4:
            raise ValueError(
                f"band_order must contain exactly 4 band indices "
                f"(Red, Green, Blue, NIR), got {len(band_order)}."
            )
        if any(b < 1 for b in band_order):
            raise ValueError(
                f"band_order indices must be 1-based (>= 1), got {band_order}."
            )
    else:
        raise ValueError(
            f"band_order must be a string preset or list of integers, "
            f"got {type(band_order).__name__}."
        )

    # Normalize input path(s)
    if isinstance(input_path, (str, Path)):
        scene_paths = [Path(input_path)]
        multi_input = False
    else:
        scene_paths = [Path(p) for p in input_path]
        multi_input = len(scene_paths) > 1

    for p in scene_paths:
        if not p.exists():
            raise FileNotFoundError(f"Input file not found: {p}")

    # Determine output raster path(s)
    if multi_input:
        # For multiple inputs, derive output names from each input
        output_dir = Path.cwd() if output_raster is None else Path(output_raster)
        if output_raster is not None and not output_dir.suffix:
            # output_raster is treated as a directory for multi-input
            os.makedirs(output_dir, exist_ok=True)
        else:
            output_dir = Path.cwd()
        raster_paths = [
            str(output_dir / f"{p.stem}_water_mask.tif") for p in scene_paths
        ]
    else:
        if output_raster is None:
            stem = scene_paths[0].stem
            output_raster = str(scene_paths[0].parent / f"{stem}_water_mask.tif")
        raster_paths = [str(output_raster)]

    # Check overwrite for output files (single input only; multi checked per-file)
    if not multi_input and not overwrite:
        if Path(raster_paths[0]).exists():
            raise FileExistsError(
                f"Output raster '{raster_paths[0]}' already exists and "
                "overwrite is set to False."
            )
        if output_vector is not None and Path(output_vector).exists():
            raise FileExistsError(
                f"Output vector '{output_vector}' already exists and "
                "overwrite is set to False."
            )

    # Map dtype string to torch.dtype
    dtype_map = {
        "float32": torch.float32,
        "float16": torch.float16,
        "bfloat16": torch.bfloat16,
    }
    if isinstance(dtype, str):
        if dtype not in dtype_map:
            raise ValueError(
                f"Unknown dtype: {dtype!r}. Choose from: {list(dtype_map.keys())}"
            )
        inference_dtype = dtype_map[dtype]
    else:
        inference_dtype = dtype

    # Determine the output directory for OmniWaterMask
    owm_output_dir = Path(raster_paths[0]).parent

    # Build kwargs for make_water_mask
    owm_kwargs = {
        "scene_paths": scene_paths,
        "band_order": band_order,
        "batch_size": batch_size,
        "inference_dtype": inference_dtype,
        "no_data_value": no_data_value,
        "inference_patch_size": patch_size,
        "inference_overlap_size": overlap_size,
        "use_osm_water": use_osm_water,
        "use_osm_building": use_osm_building,
        "use_osm_roads": use_osm_roads,
        "overwrite": overwrite,
        "output_dir": owm_output_dir,
    }

    if cache_dir is not None:
        owm_kwargs["cache_dir"] = Path(cache_dir)

    if model_dir is not None:
        owm_kwargs["destination_model_dir"] = Path(model_dir)

    if device is not None:
        owm_kwargs["inference_device"] = device
        owm_kwargs["mosaic_device"] = device

    # Merge any extra kwargs
    owm_kwargs.update(kwargs)

    # Run OmniWaterMask inference
    if verbose:
        print("Running OmniWaterMask water segmentation...")

    result_paths = make_water_mask(**owm_kwargs)

    if not result_paths:
        raise RuntimeError("OmniWaterMask did not produce any output files.")

    # Process each result: extract water band and optionally vectorize
    output_rasters = []
    output_gdfs = []

    for i, result_path in enumerate(result_paths):
        dst_raster = (
            raster_paths[i]
            if i < len(raster_paths)
            else str(owm_output_dir / f"{scene_paths[i].stem}_water_mask.tif")
        )

        os.makedirs(Path(dst_raster).parent, exist_ok=True)
        _extract_water_band(str(result_path), dst_raster)

        # Clean up original multi-band file if it differs from output
        if Path(result_path).resolve() != Path(dst_raster).resolve():
            try:
                os.remove(str(result_path))
            except OSError:
                pass

        if verbose:
            print(f"Water mask saved to: {dst_raster}")

        output_rasters.append(dst_raster)

        # Vectorize if requested
        if output_vector is not None:
            if multi_input:
                # Derive per-file vector name from input stem
                vec_ext = Path(output_vector).suffix or ".geojson"
                vec_path = str(
                    Path(dst_raster).parent
                    / f"{scene_paths[i].stem}_water_bodies{vec_ext}"
                )
            else:
                vec_path = str(output_vector)

            gdf = _vectorize_mask(
                raster_path=dst_raster,
                output_vector=vec_path,
                min_area=min_area,
                smooth=smooth,
                smooth_iterations=smooth_iterations,
                overwrite=overwrite,
                verbose=verbose,
            )
            output_gdfs.append(gdf)

    # Return results
    if output_vector is not None:
        return output_gdfs if multi_input else output_gdfs[0]
    return output_rasters if multi_input else output_rasters[0]