Cleaning Segmentation Results with MultiClean
This notebook demonstrates how to use the MultiClean integration in GeoAI to post-process and clean segmentation results. MultiClean applies morphological operations to:
- Smooth edges - Reduce jagged boundaries using morphological opening
- Remove noise - Eliminate small isolated components (islands)
- Fill gaps - Replace invalid pixels with the nearest valid class
MultiClean is particularly useful for cleaning up noisy predictions from deep learning segmentation models.
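To make the "remove noise" and "fill gaps" steps concrete, here is a minimal sketch built on `scipy.ndimage`. It is not MultiClean's implementation: the helper names `remove_small_islands` and `fill_with_nearest`, the `-1` invalid marker, and the choice to fill removed islands from the nearest surviving pixel are all assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage


def remove_small_islands(mask, class_values, min_size, invalid=-1):
    """Mark connected components smaller than min_size as invalid (hypothetical helper)."""
    out = mask.copy()
    # A 3x3 block of ones as structuring element gives 8-connectivity
    structure = np.ones((3, 3), dtype=bool)
    for value in class_values:
        labels, n = ndimage.label(mask == value, structure=structure)
        if n == 0:
            continue
        sizes = np.bincount(labels.ravel())  # sizes[0] counts pixels outside this class
        small = sizes < min_size
        small[0] = False  # never touch pixels that belong to other classes
        out[small[labels]] = invalid
    return out


def fill_with_nearest(mask, invalid=-1):
    """Replace invalid pixels with the value of the nearest valid pixel (hypothetical helper)."""
    bad = mask == invalid
    if not bad.any():
        return mask.copy()
    if bad.all():
        raise ValueError("mask contains no valid pixels to fill from")
    # For every pixel, the indices of the nearest pixel where `bad` is False
    idx = ndimage.distance_transform_edt(
        bad, return_distances=False, return_indices=True
    )
    return mask[tuple(idx)]


# Class 0 on the left, class 1 on the right, plus a one-pixel island of class 1
demo = np.zeros((6, 6), dtype=np.int32)
demo[:, 3:] = 1
demo[1, 1] = 1

cleaned_demo = fill_with_nearest(remove_small_islands(demo, [0, 1], min_size=4))
print(cleaned_demo[1, 1])  # 0: the island was absorbed by the surrounding class
```

Marking small components as invalid first and filling them afterwards keeps the two steps independent, so the same fill routine can also handle nodata pixels that were invalid from the start.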
Installation
Uncomment the following line to install the required packages if needed.
# %pip install -U "geoai-py[extra]"
Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import rasterio
from rasterio.transform import from_bounds
import tempfile
import os
# Import GeoAI multiclean utilities
# You can import from geoai directly (convenience imports)
from geoai import (
    clean_segmentation_mask,
    clean_raster,
    clean_raster_batch,
    compare_masks,
)
# Or import from the tools subpackage directly
# from geoai.tools.multiclean import (
#     clean_segmentation_mask,
#     clean_raster,
#     clean_raster_batch,
#     compare_masks,
# )
1. Create a Synthetic Noisy Segmentation Mask
First, let's create a synthetic segmentation mask with realistic noise patterns that might occur in deep learning predictions.
def create_noisy_segmentation(size=(512, 512), num_classes=3, noise_level=0.1):
    """
    Create a synthetic segmentation mask with noise.

    Args:
        size: Tuple of (height, width)
        num_classes: Number of segmentation classes
        noise_level: Fraction of pixels to overwrite with random class labels (0-1)

    Returns:
        Noisy segmentation mask
    """
    from scipy.ndimage import binary_erosion

    np.random.seed(42)
    # Create base segmentation with smooth regions
    mask = np.zeros(size, dtype=np.int32)
    # Create class regions
    mask[: size[0] // 2, :] = 0  # Background
    mask[size[0] // 2 :, : size[1] // 2] = 1  # Class 1
    mask[size[0] // 2 :, size[1] // 2 :] = 2  # Class 2
    # Add noise - small random islands
    num_noise_pixels = int(size[0] * size[1] * noise_level)
    noise_y = np.random.randint(0, size[0], num_noise_pixels)
    noise_x = np.random.randint(0, size[1], num_noise_pixels)
    noise_classes = np.random.randint(0, num_classes, num_noise_pixels)
    mask[noise_y, noise_x] = noise_classes
    # Add some edge roughness by randomly changing boundary pixels
    for class_id in range(num_classes):
        class_mask = mask == class_id
        # Find edges: pixels of this class that erosion removes
        eroded = binary_erosion(class_mask)
        edges = class_mask & ~eroded
        # Randomly toggle some edge pixels
        edge_coords = np.where(edges)
        if len(edge_coords[0]) > 0:
            num_toggle = int(len(edge_coords[0]) * 0.3)
            toggle_idx = np.random.choice(
                len(edge_coords[0]), num_toggle, replace=False
            )
            toggle_y = edge_coords[0][toggle_idx]
            toggle_x = edge_coords[1][toggle_idx]
            # Shift the toggled pixels to the next class (wrapping around)
            mask[toggle_y, toggle_x] = (mask[toggle_y, toggle_x] + 1) % num_classes
    return mask
# Create noisy mask
noisy_mask = create_noisy_segmentation(size=(512, 512), num_classes=3, noise_level=0.05)
print(f"Created noisy mask with shape: {noisy_mask.shape}")
print(f"Classes: {np.unique(noisy_mask)}")
2. Visualize the Noisy Mask
Let's visualize the noisy segmentation mask.
# Create color map for visualization
colors = ["#1f77b4", "#ff7f0e", "#2ca02c"] # Blue, Orange, Green
cmap = ListedColormap(colors)
plt.figure(figsize=(10, 10))
plt.imshow(noisy_mask, cmap=cmap, interpolation="nearest")
plt.title("Noisy Segmentation Mask", fontsize=16)
plt.colorbar(label="Class", ticks=[0, 1, 2])
plt.xlabel("X")
plt.ylabel("Y")
plt.tight_layout()
plt.show()
3. Clean the Segmentation Mask
Now let's apply MultiClean to remove noise and smooth edges.
# Apply MultiClean
cleaned_mask = clean_segmentation_mask(
    noisy_mask,
    class_values=[0, 1, 2],  # Classes to process
    smooth_edge_size=3,  # Kernel size for edge smoothing
    min_island_size=100,  # Remove islands smaller than 100 pixels
    connectivity=8,  # Use 8-connectivity (includes diagonals)
    fill_nan=False,  # Don't fill NaN values (we don't have any)
)
print(f"Cleaned mask shape: {cleaned_mask.shape}")
print(f"Classes: {np.unique(cleaned_mask)}")
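As a rough illustration of what edge smoothing by morphological opening does to a single class, the sketch below opens a binary shape with a square kernel. The helper `smooth_class_edges` and the square kernel are assumptions for illustration only; MultiClean's actual kernel and multi-class handling may differ.

```python
import numpy as np
from scipy import ndimage


def smooth_class_edges(class_mask, kernel_size=3):
    """Binary opening (erosion, then dilation) with a square kernel (hypothetical helper)."""
    structure = np.ones((kernel_size, kernel_size), dtype=bool)
    return ndimage.binary_opening(class_mask, structure=structure)


# A 5x5 square with a one-pixel spur sticking out of its right edge
shape = np.zeros((9, 9), dtype=bool)
shape[2:7, 2:7] = True
shape[4, 7] = True

opened = smooth_class_edges(shape, kernel_size=3)
print(shape.sum(), opened.sum())  # 26 25: the spur is removed, the square is intact
```

Opening removes any protrusion narrower than the kernel, which is why larger kernel sizes give smoother boundaries at the cost of fine detail. In a multi-class mask the removed pixels still need a class, so a full pipeline has to reassign them, for example from neighboring classes.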
4. Compare Before and After
Let's visualize the noisy and cleaned masks side by side.
fig, axes = plt.subplots(1, 2, figsize=(20, 10))
# Noisy mask
im1 = axes[0].imshow(noisy_mask, cmap=cmap, interpolation="nearest")
axes[0].set_title("Before Cleaning (Noisy)", fontsize=16)
axes[0].set_xlabel("X")
axes[0].set_ylabel("Y")
plt.colorbar(im1, ax=axes[0], label="Class", ticks=[0, 1, 2])
# Cleaned mask
im2 = axes[1].imshow(cleaned_mask, cmap=cmap, interpolation="nearest")
axes[1].set_title("After Cleaning (Smooth)", fontsize=16)
axes[1].set_xlabel("X")
axes[1].set_ylabel("Y")
plt.colorbar(im2, ax=axes[1], label="Class", ticks=[0, 1, 2])
plt.tight_layout()
plt.show()
5. Quantify the Changes
Use the compare_masks function to quantify how much the mask changed.
pixels_changed, total_pixels, change_percentage = compare_masks(
    noisy_mask, cleaned_mask
)
print("Cleaning Statistics:")
print(f" Total pixels: {total_pixels:,}")
print(f" Pixels changed: {pixels_changed:,}")
print(f" Change percentage: {change_percentage:.2f}%")
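For intuition, the three statistics printed above can be reproduced with a few lines of NumPy. This is a sketch of an equivalent calculation, assuming the percentage is changed pixels over total pixels; `count_changed_pixels` is a hypothetical name, not the library's implementation of `compare_masks`.

```python
import numpy as np


def count_changed_pixels(before, after):
    """Return (changed, total, percentage) for two masks of equal shape (hypothetical helper)."""
    if before.shape != after.shape:
        raise ValueError("masks must have the same shape")
    changed = int(np.count_nonzero(before != after))
    total = int(before.size)
    return changed, total, 100.0 * changed / total


a = np.zeros((4, 5), dtype=np.int32)
b = a.copy()
b[0, :2] = 1  # change two of the twenty pixels

print(count_changed_pixels(a, b))  # (2, 20, 10.0)
```

A change percentage measures how much the mask was altered, not whether the result is more accurate; judging accuracy requires ground-truth labels.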
6. Zoom In on a Region
Let's zoom in to see the edge smoothing and noise removal in detail.
# Select a region to zoom in
y_start, y_end = 200, 350
x_start, x_end = 200, 350
fig, axes = plt.subplots(1, 2, figsize=(16, 8))
# Zoomed noisy region
im1 = axes[0].imshow(
    noisy_mask[y_start:y_end, x_start:x_end], cmap=cmap, interpolation="nearest"
)
axes[0].set_title("Before Cleaning (Zoomed)", fontsize=14)
axes[0].set_xlabel("X")
axes[0].set_ylabel("Y")
plt.colorbar(im1, ax=axes[0], label="Class", ticks=[0, 1, 2])
# Zoomed cleaned region
im2 = axes[1].imshow(
    cleaned_mask[y_start:y_end, x_start:x_end], cmap=cmap, interpolation="nearest"
)
axes[1].set_title("After Cleaning (Zoomed)", fontsize=14)
axes[1].set_xlabel("X")
axes[1].set_ylabel("Y")
plt.colorbar(im2, ax=axes[1], label="Class", ticks=[0, 1, 2])
plt.tight_layout()
plt.show()
7. Experiment with Different Parameters
Let's see how different cleaning parameters affect the results.
# Create masks with different parameters
params = [
    {"smooth_edge_size": 0, "min_island_size": 0, "title": "No Cleaning"},
    {"smooth_edge_size": 0, "min_island_size": 100, "title": "Island Removal Only"},
    {"smooth_edge_size": 3, "min_island_size": 0, "title": "Edge Smoothing Only"},
    {"smooth_edge_size": 3, "min_island_size": 100, "title": "Full Cleaning"},
]
fig, axes = plt.subplots(2, 2, figsize=(16, 16))
axes = axes.flatten()
for i, param in enumerate(params):
    if i == 0:
        # No cleaning - just show original
        mask = noisy_mask
    else:
        # Apply cleaning with specified parameters
        mask = clean_segmentation_mask(
            noisy_mask,
            class_values=[0, 1, 2],
            smooth_edge_size=param["smooth_edge_size"],
            min_island_size=param["min_island_size"],
            connectivity=8,
        )
    im = axes[i].imshow(mask, cmap=cmap, interpolation="nearest")
    axes[i].set_title(param["title"], fontsize=14)
    axes[i].set_xlabel("X")
    axes[i].set_ylabel("Y")
    plt.colorbar(im, ax=axes[i], label="Class", ticks=[0, 1, 2])
plt.tight_layout()
plt.show()
8. Working with GeoTIFF Files
MultiClean can also process GeoTIFF files directly while preserving geospatial metadata.
# Create a temporary directory for our test files
tmpdir = tempfile.mkdtemp()
print(f"Working directory: {tmpdir}")
# Save noisy mask as GeoTIFF
input_tif = os.path.join(tmpdir, "noisy_segmentation.tif")
output_tif = os.path.join(tmpdir, "cleaned_segmentation.tif")
# Create a simple transform (geographic coordinates)
transform = from_bounds(
    west=-120.0,
    south=35.0,
    east=-119.0,
    north=36.0,
    width=noisy_mask.shape[1],
    height=noisy_mask.shape[0],
)
# Write noisy mask to GeoTIFF
with rasterio.open(
    input_tif,
    "w",
    driver="GTiff",
    height=noisy_mask.shape[0],
    width=noisy_mask.shape[1],
    count=1,
    dtype=noisy_mask.dtype,
    crs="EPSG:4326",
    transform=transform,
    compress="lzw",
) as dst:
    dst.write(noisy_mask, 1)
print(f"Saved noisy mask to: {input_tif}")
# Clean the GeoTIFF
clean_raster(
    input_path=input_tif,
    output_path=output_tif,
    class_values=[0, 1, 2],
    smooth_edge_size=3,
    min_island_size=100,
    connectivity=8,
)
print(f"Cleaned raster saved to: {output_tif}")
# Verify the output preserves geospatial metadata
with rasterio.open(input_tif) as src_in:
    print("Input metadata:")
    print(f" CRS: {src_in.crs}")
    print(f" Transform: {src_in.transform}")
    print(f" Bounds: {src_in.bounds}")
    print()
with rasterio.open(output_tif) as src_out:
    print("Output metadata:")
    print(f" CRS: {src_out.crs}")
    print(f" Transform: {src_out.transform}")
    print(f" Bounds: {src_out.bounds}")
    # Read cleaned data
    cleaned_from_file = src_out.read(1)
print("\n✓ Geospatial metadata preserved!")
9. Batch Processing Multiple Files
You can process multiple segmentation files at once using clean_raster_batch.
# Create multiple test files
input_files = []
for i in range(3):
    # Create different noisy masks
    test_mask = create_noisy_segmentation(
        size=(256, 256), num_classes=3, noise_level=0.05 + i * 0.02
    )
    # Save to file
    filepath = os.path.join(tmpdir, f"test_mask_{i}.tif")
    with rasterio.open(
        filepath,
        "w",
        driver="GTiff",
        height=test_mask.shape[0],
        width=test_mask.shape[1],
        count=1,
        dtype=test_mask.dtype,
        crs="EPSG:4326",
        transform=from_bounds(-120, 35, -119, 36, 256, 256),
    ) as dst:
        dst.write(test_mask, 1)
    input_files.append(filepath)
print(f"Created {len(input_files)} test files")
# Batch clean all files
output_dir = os.path.join(tmpdir, "batch_cleaned")
output_files = clean_raster_batch(
    input_paths=input_files,
    output_dir=output_dir,
    class_values=[0, 1, 2],
    smooth_edge_size=2,
    min_island_size=50,
    connectivity=8,
    suffix="_cleaned",
    verbose=True,
)
print(f"\nProcessed {len(output_files)} files")
print("Output files:")
for f in output_files:
    print(f" - {os.path.basename(f)}")
10. Integration with Segmentation Workflows
MultiClean is designed to be used as a post-processing step after semantic segmentation. Here's an example workflow:
def segment_and_clean_workflow(image_path, output_path):
    """
    Example workflow: Segmentation + Cleaning

    In a real application, this would:
    1. Load an image
    2. Run semantic segmentation model (e.g., UNet, DeepLab)
    3. Get raw predictions (often noisy)
    4. Apply MultiClean to smooth and denoise
    5. Save final result
    """
    # For this example, we'll use our synthetic data
    # In practice, you would:
    # - Load the image with rasterio or PIL
    # - Run your trained segmentation model
    # - Get the prediction mask
    # Simulate noisy model predictions
    raw_predictions = create_noisy_segmentation(
        size=(512, 512), num_classes=3, noise_level=0.08
    )
    # Apply MultiClean post-processing
    cleaned_predictions = clean_segmentation_mask(
        raw_predictions,
        class_values=[0, 1, 2],
        smooth_edge_size=3,
        min_island_size=100,
        connectivity=8,
    )
    return raw_predictions, cleaned_predictions
# Run the workflow
raw, cleaned = segment_and_clean_workflow(None, None)
# Compare
fig, axes = plt.subplots(1, 2, figsize=(16, 8))
axes[0].imshow(raw, cmap=cmap, interpolation="nearest")
axes[0].set_title("Raw Model Predictions", fontsize=14)
axes[0].axis("off")
axes[1].imshow(cleaned, cmap=cmap, interpolation="nearest")
axes[1].set_title("After MultiClean Post-Processing", fontsize=14)
axes[1].axis("off")
plt.tight_layout()
plt.show()
# Quantify how much post-processing changed the mask
changed, total, pct = compare_masks(raw, cleaned)
print(f"\nPost-processing changed {pct:.2f}% of pixels")
11. Best Practices and Tips
Choosing Parameters
- smooth_edge_size: Start with 2-3 pixels. Larger values create smoother boundaries but may over-smooth fine details.
- min_island_size: Depends on your minimum object size. Set to the smallest valid object area in pixels.
- connectivity: Use 8 for natural objects (smoother results), 4 for grid-aligned objects.
- fill_nan: Set to True if your predictions have nodata/NaN values that should be filled.
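The practical difference between the two connectivity settings can be seen by labeling a diagonal line with `scipy.ndimage.label`. This is only an illustration of the general concept using SciPy's structuring elements; it does not show how MultiClean itself counts components.

```python
import numpy as np
from scipy import ndimage

# Three pixels that touch only at their corners
diag = np.array(
    [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ],
    dtype=bool,
)

four = ndimage.generate_binary_structure(2, 1)  # edge neighbors only
eight = ndimage.generate_binary_structure(2, 2)  # edge and corner neighbors

_, n4 = ndimage.label(diag, structure=four)
_, n8 = ndimage.label(diag, structure=eight)
print(n4, n8)  # 3 1
```

Under 4-connectivity the line splits into three one-pixel components, each of which would fall below a typical `min_island_size`. Under 8-connectivity it is a single component, so thin diagonal features are more likely to survive island removal.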
Performance Tips
- Use the max_workers parameter for parallel processing on multi-core systems
- Process large rasters in tiles if memory is limited
- For batch processing, use clean_raster_batch instead of loops
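Tiling can be sketched as a generator of array slices. The helper `iter_tiles` below is hypothetical and uses only NumPy; the overlap parameter is an assumption about how one might reduce seams between tiles, not a documented MultiClean feature.

```python
import numpy as np


def iter_tiles(height, width, tile_size, overlap=0):
    """Yield (row_slice, col_slice) pairs that cover a height x width grid (hypothetical helper)."""
    step = tile_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than tile_size")
    for r in range(0, height, step):
        for c in range(0, width, step):
            yield (
                slice(r, min(r + tile_size, height)),
                slice(c, min(c + tile_size, width)),
            )


# Count how many tiles touch each pixel of a 10x7 grid
coverage = np.zeros((10, 7), dtype=int)
for rows, cols in iter_tiles(10, 7, tile_size=4, overlap=1):
    coverage[rows, cols] += 1

print(coverage.min() >= 1)  # True: every pixel is covered by at least one tile
```

Cleaning tile by tile can differ from cleaning the whole raster at once: a component that crosses a tile boundary is split, and each piece may fall below `min_island_size`. An overlap at least as large as the biggest object you expect to remove reduces this effect but does not eliminate it.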
When to Use MultiClean
✓ After semantic segmentation to remove noise
✓ When edge boundaries are jagged or noisy
✓ To remove small false positive detections
✓ For cleaning up classification rasters
✗ Don't use if you need to preserve exact boundaries
✗ Not suitable for instance segmentation (use on semantic masks only)
Summary
In this notebook, we demonstrated:
- ✅ Basic usage of clean_segmentation_mask() for NumPy arrays
- ✅ Visualizing before/after comparisons
- ✅ Quantifying changes with compare_masks()
- ✅ Experimenting with different cleaning parameters
- ✅ Processing GeoTIFF files with clean_raster()
- ✅ Batch processing with clean_raster_batch()
- ✅ Integration with segmentation workflows
MultiClean post-processes segmentation results by smoothing class boundaries, removing small islands, and filling invalid pixels, which gives cleaner outputs from deep learning models.
# Cleanup temporary files
import shutil
shutil.rmtree(tmpdir)
print("Cleaned up temporary files")