multiclean module

MultiClean integration utilities for cleaning segmentation results.

This module provides functions to use MultiClean (https://github.com/DPIRD-DMA/MultiClean) for post-processing segmentation masks and classification rasters. MultiClean performs morphological operations to smooth edges, remove noise islands, and fill gaps.

check_multiclean_available()

Check if multiclean is installed.

Raises:

Type | Description
ImportError | If multiclean is not installed.

Source code in geoai/tools/multiclean.py
def check_multiclean_available():
    """
    Check if multiclean is installed.

    Raises:
        ImportError: If multiclean is not installed.
    """
    if not MULTICLEAN_AVAILABLE:
        raise ImportError(
            "multiclean is not installed. "
            "Please install it with: pip install multiclean "
            "or: pip install geoai-py[extra]"
        )
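
The check above relies on a module-level flag, MULTICLEAN_AVAILABLE, whose definition is not shown on this page. The following is a minimal sketch of the general optional-dependency guard pattern, not geoai's actual code. The helper names `optional_import_available` and `require`, and the made-up package name, are hypothetical.

```python
import importlib.util


def optional_import_available(module_name: str) -> bool:
    """Return True if `module_name` can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name: str, install_hint: str) -> None:
    """Raise ImportError with an install hint when the module is missing."""
    if not optional_import_available(module_name):
        raise ImportError(f"{module_name} is not installed. {install_hint}")


# "json" ships with Python, so this call passes silently.
require("json", "It is part of the standard library.")

# A made-up package name, used here only to trigger the error path.
try:
    require("surely_not_a_real_package_xyz", "Install it before continuing.")
except ImportError as err:
    message = str(err)
    print(message)
```

Raising at call time rather than at import time keeps the rest of the package usable when the optional dependency is absent.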

clean_raster(input_path, output_path, class_values=None, smooth_edge_size=2, min_island_size=100, connectivity=8, max_workers=None, fill_nan=False, band=1, nodata=None)

Clean a classification raster (GeoTIFF) and save the result.

Reads a GeoTIFF file, applies MultiClean morphological operations, and saves the cleaned result while preserving geospatial metadata (CRS, transform, nodata value).

Parameters:

Name | Type | Description | Default
input_path | str | Path to input GeoTIFF file. | required
output_path | str | Path to save cleaned GeoTIFF file. | required
class_values | int, list of int, or None | Target class values to process. If None, auto-detects unique values. | None
smooth_edge_size | int | Kernel width in pixels for edge smoothing. | 2
min_island_size | int | Minimum area (in pixels) for components. | 100
connectivity | int | Connectivity for component detection (4 or 8). | 8
max_workers | int | Thread pool size. | None
fill_nan | bool | Whether to fill NaN/nodata pixels. | False
band | int | Band index to read (1-indexed). | 1
nodata | float | Nodata value to use. If None, uses value from input file. | None

Returns:

Type | Description
None | Writes cleaned raster to output_path.

Raises:

Type | Description
ImportError | If multiclean or rasterio is not installed.
FileNotFoundError | If input_path does not exist.

Example

>>> from geoai.tools.multiclean import clean_raster
>>> clean_raster(
...     "segmentation_raw.tif",
...     "segmentation_cleaned.tif",
...     class_values=[0, 1, 2],
...     smooth_edge_size=3,
...     min_island_size=50
... )

Source code in geoai/tools/multiclean.py
def clean_raster(
    input_path: str,
    output_path: str,
    class_values: Optional[Union[int, List[int]]] = None,
    smooth_edge_size: int = 2,
    min_island_size: int = 100,
    connectivity: int = 8,
    max_workers: Optional[int] = None,
    fill_nan: bool = False,
    band: int = 1,
    nodata: Optional[float] = None,
) -> None:
    """
    Clean a classification raster (GeoTIFF) and save the result.

    Reads a GeoTIFF file, applies MultiClean morphological operations,
    and saves the cleaned result while preserving geospatial metadata
    (CRS, transform, nodata value).

    Args:
        input_path (str): Path to input GeoTIFF file.
        output_path (str): Path to save cleaned GeoTIFF file.
        class_values (int, list of int, or None): Target class values to process.
            If None, auto-detects unique values. Defaults to None.
        smooth_edge_size (int): Kernel width in pixels for edge smoothing.
            Defaults to 2.
        min_island_size (int): Minimum area (in pixels) for components.
            Defaults to 100.
        connectivity (int): Connectivity for component detection (4 or 8).
            Defaults to 8.
        max_workers (int, optional): Thread pool size. Defaults to None.
        fill_nan (bool): Whether to fill NaN/nodata pixels. Defaults to False.
        band (int): Band index to read (1-indexed). Defaults to 1.
        nodata (float, optional): Nodata value to use. If None, uses value
            from input file. Defaults to None.

    Returns:
        None: Writes cleaned raster to output_path.

    Raises:
        ImportError: If multiclean or rasterio is not installed.
        FileNotFoundError: If input_path does not exist.

    Example:
        >>> from geoai.tools.multiclean import clean_raster
        >>> clean_raster(
        ...     "segmentation_raw.tif",
        ...     "segmentation_cleaned.tif",
        ...     class_values=[0, 1, 2],
        ...     smooth_edge_size=3,
        ...     min_island_size=50
        ... )
    """
    check_multiclean_available()

    if not RASTERIO_AVAILABLE:
        raise ImportError(
            "rasterio is required for raster operations. "
            "Please install it with: pip install rasterio"
        )

    if not os.path.exists(input_path):
        raise FileNotFoundError(f"Input file not found: {input_path}")

    # Read input raster
    with rasterio.open(input_path) as src:
        # Read the specified band
        mask = src.read(band)

        # Get metadata
        profile = src.profile.copy()

        # Handle nodata
        if nodata is None:
            nodata = src.nodata

        # Convert nodata to NaN if specified
        if nodata is not None:
            mask = mask.astype(np.float32)
            mask[mask == nodata] = np.nan

    # Clean the mask
    cleaned = clean_segmentation_mask(
        mask,
        class_values=class_values,
        smooth_edge_size=smooth_edge_size,
        min_island_size=min_island_size,
        connectivity=connectivity,
        max_workers=max_workers,
        fill_nan=fill_nan,
    )

    # Convert NaN back to nodata if needed
    if nodata is not None:
        # Convert any remaining NaN values back to nodata value
        if np.isnan(cleaned).any():
            cleaned = np.nan_to_num(cleaned, nan=nodata)

    # Update profile for output
    profile.update(
        dtype=cleaned.dtype,
        count=1,
        compress="lzw",
        nodata=nodata,
    )

    # Write cleaned raster
    output_dir = os.path.dirname(os.path.abspath(output_path))
    if output_dir and output_dir != os.path.abspath(os.sep):
        os.makedirs(output_dir, exist_ok=True)
    with rasterio.open(output_path, "w", **profile) as dst:
        dst.write(cleaned, 1)
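
The nodata handling in the listing above is a round trip: nodata pixels become NaN before cleaning, and any NaN left afterwards is written back as the nodata value. The sketch below reproduces only that round trip with NumPy. The nodata value 255 and the sample array are made up for illustration, and the cleaning step is replaced by a pass-through.

```python
import numpy as np

nodata = 255  # hypothetical nodata value for this illustration
band = np.array([[1, 2, 255], [255, 1, 1]], dtype=np.uint8)

# Step 1: promote to float32 so NaN can mark nodata pixels.
mask = band.astype(np.float32)
mask[mask == nodata] = np.nan

# The cleaning step would run here; this sketch leaves the mask unchanged.
cleaned = mask

# Step 2: write the nodata value back wherever NaN survived.
if np.isnan(cleaned).any():
    cleaned = np.nan_to_num(cleaned, nan=nodata)

print(cleaned.dtype)  # float32, not the original uint8
```

One consequence worth noting: because the output profile takes its dtype from the cleaned array, a raster with a nodata value may be written as float32 even when the input was an integer type, assuming the cleaning step returns the float array it was given.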

clean_raster_batch(input_paths, output_dir, class_values=None, smooth_edge_size=2, min_island_size=100, connectivity=8, max_workers=None, fill_nan=False, band=1, suffix='_cleaned', verbose=True)

Clean multiple classification rasters in batch.

Processes multiple GeoTIFF files with the same cleaning parameters and saves results to an output directory.

Parameters:

Name | Type | Description | Default
input_paths | list of str | List of paths to input GeoTIFF files. | required
output_dir | str | Directory to save cleaned files. | required
class_values | int, list of int, or None | Target class values. None means auto-detect. | None
smooth_edge_size | int | Kernel width for edge smoothing. | 2
min_island_size | int | Minimum component area. | 100
connectivity | int | Connectivity (4 or 8). | 8
max_workers | int | Thread pool size. | None
fill_nan | bool | Whether to fill NaN pixels. | False
band | int | Band index to read (1-indexed). | 1
suffix | str | Suffix to add to output filenames. | '_cleaned'
verbose | bool | Whether to print progress. | True

Returns:

Type | Description
List[str] | Paths to cleaned output files. Inputs that fail to process are skipped and do not appear in the list.

Raises:

Type | Description
ImportError | If multiclean or rasterio is not installed.

Example

>>> from geoai.tools.multiclean import clean_raster_batch
>>> input_files = ["mask1.tif", "mask2.tif", "mask3.tif"]
>>> outputs = clean_raster_batch(
...     input_files,
...     output_dir="cleaned_masks",
...     min_island_size=50
... )

Source code in geoai/tools/multiclean.py
def clean_raster_batch(
    input_paths: List[str],
    output_dir: str,
    class_values: Optional[Union[int, List[int]]] = None,
    smooth_edge_size: int = 2,
    min_island_size: int = 100,
    connectivity: int = 8,
    max_workers: Optional[int] = None,
    fill_nan: bool = False,
    band: int = 1,
    suffix: str = "_cleaned",
    verbose: bool = True,
) -> List[str]:
    """
    Clean multiple classification rasters in batch.

    Processes multiple GeoTIFF files with the same cleaning parameters
    and saves results to an output directory.

    Args:
        input_paths (list of str): List of paths to input GeoTIFF files.
        output_dir (str): Directory to save cleaned files.
        class_values (int, list of int, or None): Target class values.
            Defaults to None (auto-detect).
        smooth_edge_size (int): Kernel width for edge smoothing. Defaults to 2.
        min_island_size (int): Minimum component area. Defaults to 100.
        connectivity (int): Connectivity (4 or 8). Defaults to 8.
        max_workers (int, optional): Thread pool size. Defaults to None.
        fill_nan (bool): Whether to fill NaN pixels. Defaults to False.
        band (int): Band index to read (1-indexed). Defaults to 1.
        suffix (str): Suffix to add to output filenames. Defaults to "_cleaned".
        verbose (bool): Whether to print progress. Defaults to True.

    Returns:
        list of str: Paths to cleaned output files.

    Raises:
        ImportError: If multiclean or rasterio is not installed.

    Example:
        >>> from geoai.tools.multiclean import clean_raster_batch
        >>> input_files = ["mask1.tif", "mask2.tif", "mask3.tif"]
        >>> outputs = clean_raster_batch(
        ...     input_files,
        ...     output_dir="cleaned_masks",
        ...     min_island_size=50
        ... )
    """
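    check_multiclean_available()

    # Fail fast: the per-file try/except below would otherwise swallow
    # the ImportError that clean_raster raises when rasterio is missing.
    if not RASTERIO_AVAILABLE:
        raise ImportError(
            "rasterio is required for raster operations. "
            "Please install it with: pip install rasterio"
        )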

    # Create output directory
    os.makedirs(output_dir, exist_ok=True)

    output_paths = []

    for i, input_path in enumerate(input_paths):
        if verbose:
            print(f"Processing {i+1}/{len(input_paths)}: {input_path}")

        # Generate output filename
        basename = os.path.basename(input_path)
        name, ext = os.path.splitext(basename)
        output_filename = f"{name}{suffix}{ext}"
        output_path = os.path.join(output_dir, output_filename)

        try:
            # Clean the raster
            clean_raster(
                input_path,
                output_path,
                class_values=class_values,
                smooth_edge_size=smooth_edge_size,
                min_island_size=min_island_size,
                connectivity=connectivity,
                max_workers=max_workers,
                fill_nan=fill_nan,
                band=band,
            )

            output_paths.append(output_path)

            if verbose:
                print(f"  ✓ Saved to: {output_path}")

        except Exception as e:
            if verbose:
                print(f"  ✗ Failed: {e}")
            continue

    return output_paths
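
Because the loop above catches per-file exceptions and moves on, the returned list can be shorter than the input list. The sketch below mirrors the naming rule from the listing (basename, then suffix, then original extension) and uses it to work out which inputs produced no output. Both helper functions are hypothetical and not part of geoai.

```python
import os


def expected_output_path(input_path: str, output_dir: str, suffix: str = "_cleaned") -> str:
    """Mirror the naming rule used in clean_raster_batch."""
    name, ext = os.path.splitext(os.path.basename(input_path))
    return os.path.join(output_dir, f"{name}{suffix}{ext}")


def find_failed_inputs(input_paths, output_paths, output_dir, suffix="_cleaned"):
    """Return inputs whose expected output is missing from the returned list."""
    written = set(output_paths)
    return [
        path
        for path in input_paths
        if expected_output_path(path, output_dir, suffix) not in written
    ]


inputs = ["tiles/mask1.tif", "tiles/mask2.tif"]
# Pretend the batch call returned only the first output.
outputs = [expected_output_path("tiles/mask1.tif", "cleaned_masks")]
failed = find_failed_inputs(inputs, outputs, "cleaned_masks")
print(failed)  # ['tiles/mask2.tif']
```

The naming rule uses only the basename, so two inputs from different directories that share a filename map to the same output path, and the later one overwrites the earlier.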

clean_segmentation_mask(mask, class_values=None, smooth_edge_size=2, min_island_size=100, connectivity=8, max_workers=None, fill_nan=False)

Clean a segmentation mask using MultiClean morphological operations.

This function applies three cleaning operations:

1. Edge smoothing - Uses morphological opening to reduce jagged boundaries
2. Island removal - Eliminates small connected components (noise)
3. Gap filling - Replaces invalid pixels with nearest valid class
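
To make the edge-smoothing step concrete, here is a minimal NumPy sketch of morphological opening (erosion followed by dilation) on a single binary class mask. It is an illustration of the general technique only, not MultiClean's implementation. The square k x k window, the odd-k restriction, and the function names are assumptions made for this example.

```python
import numpy as np


def erode(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Keep a pixel only if its whole k x k neighbourhood is set (k odd)."""
    pad = k // 2
    h, w = binary.shape
    padded = np.pad(binary, pad, constant_values=False)
    out = np.ones((h, w), dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def dilate(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Set a pixel if any pixel in its k x k neighbourhood is set (k odd)."""
    pad = k // 2
    h, w = binary.shape
    padded = np.pad(binary, pad, constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def opening(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Erosion followed by dilation: removes features narrower than k."""
    return dilate(erode(binary, k), k)


region = np.zeros((6, 6), dtype=bool)
region[1:4, 1:4] = True  # a solid 3x3 block
region[1, 4] = True      # a one-pixel spur on its edge
smoothed = opening(region)
```

In this example the 3x3 block survives intact while the one-pixel spur is removed, which is the "reduce jagged boundaries" effect described in step 1.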

Parameters:

Name | Type | Description | Default
mask | ndarray | 2D numpy array containing segmentation classes. Can be int or float. NaN values are treated as nodata. | required
class_values | int, list of int, or None | Target class values to process. If None, auto-detects unique values from the mask. | None
smooth_edge_size | int | Kernel width in pixels for edge smoothing. Set to 0 to disable smoothing. | 2
min_island_size | int | Minimum area (in pixels) for connected components. Components with area strictly less than this are removed. | 100
connectivity | int | Connectivity for component detection. Use 4 or 8. 8-connectivity considers diagonal neighbors. | 8
max_workers | int | Thread pool size for parallel processing. If None, uses default threading. | None
fill_nan | bool | Whether to fill NaN pixels with nearest valid class. | False

Returns:

Type | Description
ndarray | Cleaned 2D segmentation mask with same shape as input.

Raises:

Type | Description
ImportError | If multiclean is not installed.
ValueError | If mask is not 2D or if connectivity is not 4 or 8.

Example

>>> import numpy as np
>>> from geoai.tools.multiclean import clean_segmentation_mask
>>> mask = np.random.randint(0, 3, (512, 512))
>>> cleaned = clean_segmentation_mask(
...     mask,
...     class_values=[0, 1, 2],
...     smooth_edge_size=2,
...     min_island_size=50
... )
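
Since the function raises ValueError for anything that is not 2D, arrays with a singleton channel axis need reshaping first. A small sketch, assuming a hypothetical model output of shape (1, height, width):

```python
import numpy as np

# Hypothetical model output with a leading channel axis: (1, height, width).
prediction = np.zeros((1, 4, 5), dtype=np.uint8)

# Drop the singleton axis before cleaning. Passing axis=0 makes squeeze
# raise an error if that axis does not have length 1, instead of silently
# returning an array that is still 3D.
mask_2d = np.squeeze(prediction, axis=0)

print(mask_2d.shape)  # (4, 5)
```

The resulting `mask_2d` has the shape that clean_segmentation_mask expects.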

Source code in geoai/tools/multiclean.py
def clean_segmentation_mask(
    mask: np.ndarray,
    class_values: Optional[Union[int, List[int]]] = None,
    smooth_edge_size: int = 2,
    min_island_size: int = 100,
    connectivity: int = 8,
    max_workers: Optional[int] = None,
    fill_nan: bool = False,
) -> np.ndarray:
    """
    Clean a segmentation mask using MultiClean morphological operations.

    This function applies three cleaning operations:
    1. Edge smoothing - Uses morphological opening to reduce jagged boundaries
    2. Island removal - Eliminates small connected components (noise)
    3. Gap filling - Replaces invalid pixels with nearest valid class

    Args:
        mask (np.ndarray): 2D numpy array containing segmentation classes.
            Can be int or float. NaN values are treated as nodata.
        class_values (int, list of int, or None): Target class values to process.
            If None, auto-detects unique values from the mask. Defaults to None.
        smooth_edge_size (int): Kernel width in pixels for edge smoothing.
            Set to 0 to disable smoothing. Defaults to 2.
        min_island_size (int): Minimum area (in pixels) for connected components.
            Components with area strictly less than this are removed. Defaults to 100.
        connectivity (int): Connectivity for component detection. Use 4 or 8.
            8-connectivity considers diagonal neighbors. Defaults to 8.
        max_workers (int, optional): Thread pool size for parallel processing.
            If None, uses default threading. Defaults to None.
        fill_nan (bool): Whether to fill NaN pixels with nearest valid class.
            Defaults to False.

    Returns:
        np.ndarray: Cleaned 2D segmentation mask with same shape as input.

    Raises:
        ImportError: If multiclean is not installed.
        ValueError: If mask is not 2D or if connectivity is not 4 or 8.

    Example:
        >>> import numpy as np
        >>> from geoai.tools.multiclean import clean_segmentation_mask
        >>> mask = np.random.randint(0, 3, (512, 512))
        >>> cleaned = clean_segmentation_mask(
        ...     mask,
        ...     class_values=[0, 1, 2],
        ...     smooth_edge_size=2,
        ...     min_island_size=50
        ... )
    """
    check_multiclean_available()

    if mask.ndim != 2:
        raise ValueError(f"Mask must be 2D, got shape {mask.shape}")

    if connectivity not in [4, 8]:
        raise ValueError(f"Connectivity must be 4 or 8, got {connectivity}")

    # Apply MultiClean
    cleaned = clean_array(
        mask,
        class_values=class_values,
        smooth_edge_size=smooth_edge_size,
        min_island_size=min_island_size,
        connectivity=connectivity,
        max_workers=max_workers,
        fill_nan=fill_nan,
    )

    return cleaned
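
The connectivity and min_island_size parameters interact: the same pixels can form one large component or several small ones depending on whether diagonal neighbors count. The pure-Python sketch below illustrates that interaction with a breadth-first search. It is not how MultiClean works internally, and the fixed `fill` value is a simplification chosen for this example; the function name is hypothetical.

```python
from collections import deque


def remove_small_islands(grid, target, min_size, connectivity=8, fill=0):
    """Replace components of `target` smaller than `min_size` with `fill`."""
    if connectivity not in (4, 8):
        raise ValueError(f"Connectivity must be 4 or 8, got {connectivity}")
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] != target:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and grid[ny][nx] == target:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:  # strictly less than, as documented
                for y, x in component:
                    out[y][x] = fill
    return out


grid = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
# The diagonal forms one 3-pixel component under 8-connectivity and is kept.
kept = remove_small_islands(grid, target=1, min_size=3, connectivity=8)
# Under 4-connectivity it is three 1-pixel islands, all below min_size.
removed = remove_small_islands(grid, target=1, min_size=3, connectivity=4)
```

For masks with thin diagonal features such as roads or streams, 8-connectivity is therefore less likely to delete them as noise.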

compare_masks(original, cleaned)

Compare original and cleaned masks to quantify changes.

Parameters:

Name | Type | Description | Default
original | ndarray | Original segmentation mask. | required
cleaned | ndarray | Cleaned segmentation mask. | required

Returns:

Type | Description
Tuple[int, int, float] | (pixels_changed, total_pixels, change_percentage)

- pixels_changed: Number of valid pixels that changed value
- total_pixels: Total number of valid pixels (pixels that are NaN in either mask are excluded)
- change_percentage: Percentage of valid pixels that changed

Example

>>> import numpy as np
>>> from geoai.tools.multiclean import compare_masks
>>> original = np.random.randint(0, 3, (512, 512))
>>> cleaned = original.copy()
>>> changed, total, pct = compare_masks(original, cleaned)
>>> print(f"Changed: {pct:.2f}%")

Source code in geoai/tools/multiclean.py
def compare_masks(
    original: np.ndarray,
    cleaned: np.ndarray,
) -> Tuple[int, int, float]:
    """
    Compare original and cleaned masks to quantify changes.

    Args:
        original (np.ndarray): Original segmentation mask.
        cleaned (np.ndarray): Cleaned segmentation mask.

    Returns:
        tuple: (pixels_changed, total_pixels, change_percentage)
            - pixels_changed: Number of pixels that changed value
            - total_pixels: Total number of valid pixels
            - change_percentage: Percentage of pixels changed

    Example:
        >>> import numpy as np
        >>> from geoai.tools.multiclean import compare_masks
        >>> original = np.random.randint(0, 3, (512, 512))
        >>> cleaned = original.copy()
        >>> changed, total, pct = compare_masks(original, cleaned)
        >>> print(f"Changed: {pct:.2f}%")
    """
    # Handle NaN values
    valid_mask = ~(np.isnan(original) | np.isnan(cleaned))

    # Count changed pixels
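    # Cast to built-in int so the result matches the annotated return type
    pixels_changed = int(np.sum((original != cleaned) & valid_mask))
    total_pixels = int(np.sum(valid_mask))

    # Calculate percentage (0.0 when there are no valid pixels)
    change_percentage = (
        (pixels_changed / total_pixels * 100) if total_pixels > 0 else 0.0
    )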

    return pixels_changed, total_pixels, change_percentage
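
A small worked example shows how NaN pixels affect the counts. The snippet repeats the same masking logic as the listing above on two made-up 2x2 arrays, so it runs without geoai installed.

```python
import numpy as np

original = np.array([[1.0, 2.0], [np.nan, 0.0]])
cleaned = np.array([[1.0, 1.0], [0.0, 0.0]])

# Pixels that are NaN in either mask are excluded from both counts.
valid = ~(np.isnan(original) | np.isnan(cleaned))
changed = int(np.sum((original != cleaned) & valid))
total = int(np.sum(valid))
pct = changed / total * 100 if total > 0 else 0.0

print(changed, total, round(pct, 2))  # 1 3 33.33
```

The pixel that went from NaN to 0 is not counted as a change, so gaps filled with fill_nan=True do not contribute to the reported percentage.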