Train an Instance Segmentation Model using Mask R-CNN

This notebook demonstrates how to train an instance segmentation model with Mask R-CNN to detect and delineate individual objects, using building footprints as the example. Unlike semantic segmentation, instance segmentation can distinguish between individual objects of the same class, producing a separate mask for each instance.

Install packages

To use the instance segmentation functions in this notebook, make sure geoai-py is installed (uncomment the line below if needed).

# %pip install geoai-py

Import libraries

import geoai

Download sample data

We'll use the same NAIP imagery and building footprints as the semantic segmentation example, so the two approaches can be compared directly.

train_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_rgb_train.tif"
)
train_vector_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_train_buildings.geojson"
test_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_test.tif"
)

train_raster_path = geoai.download_file(train_raster_url)
train_vector_path = geoai.download_file(train_vector_url)
test_raster_path = geoai.download_file(test_raster_url)

Visualize sample data

geoai.get_raster_info(train_raster_path)

style_dict = {
    "color": "#ff0000",
    "weight": 2,
    "opacity": 1,
    # "fill": True,
    # "fillColor": "#ffffff",
    "fillOpacity": 0,
    # "dashArray": "9"
    # "clickable": True,
}
style_function = lambda x: style_dict

geoai.view_vector_interactive(
    train_vector_path, tiles=train_raster_path, style_function=style_function
)

geoai.view_raster(test_raster_path)

Create training data

We'll cut the training image and its building labels into 512 × 512 pixel tiles with a stride of 256 pixels, so neighboring tiles overlap by half.

out_folder = "buildings_instance"
tiles = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=out_folder,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
)
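As a rough check on what these settings produce, the sketch below counts tile origins along each axis of a raster. It assumes that only full 512-pixel tiles are written and that no padding is added at the image border; how export_geotiff_tiles actually handles partial edge tiles is not covered here, so treat the count as an estimate. The helper names and the raster size are hypothetical.

```python
def tile_offsets(length: int, tile_size: int = 512, stride: int = 256) -> list[int]:
    """Start offsets of full tiles along one axis.

    Assumption: only tiles that fit entirely inside the image are kept
    (no edge padding). geoai may handle borders differently.
    """
    if length < tile_size:
        return []
    return list(range(0, length - tile_size + 1, stride))


def count_tiles(width: int, height: int, tile_size: int = 512, stride: int = 256) -> int:
    """Number of full tiles in a width x height raster."""
    return len(tile_offsets(width, tile_size, stride)) * len(
        tile_offsets(height, tile_size, stride)
    )


# A hypothetical 2048 x 1536 pixel raster
print(tile_offsets(2048))       # [0, 256, 512, 768, 1024, 1280, 1536]
print(count_tiles(2048, 1536))  # 35 (7 columns x 5 rows)
```

With a stride of half the tile size, each interior pixel falls in four tiles (two along each axis), which multiplies the number of training samples at the cost of heavy overlap between neighboring tiles.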

Train instance segmentation model

Now we'll train an instance segmentation model using the train_instance_segmentation_model function. This function uses Mask R-CNN, which is specifically designed for instance segmentation tasks.

Key Differences from Semantic Segmentation:

Instance Segmentation: Identifies and segments each individual object separately (e.g., distinguishes Building A from Building B)
Semantic Segmentation: Only classifies pixels into categories (all buildings are treated as one class)
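The difference is visible directly in the output rasters. In this toy NumPy sketch (array size and values are made up for illustration), two touching buildings keep separate IDs in an instance mask but merge into a single class value in a semantic mask, so only the instance mask lets you count and measure them individually.

```python
import numpy as np

# Toy 6 x 8 scene with two buildings that share a wall.
# Instance mask: 0 = background, 1..N = one ID per building.
instance_mask = np.zeros((6, 8), dtype=np.uint16)
instance_mask[1:5, 1:4] = 1  # building A
instance_mask[1:5, 4:7] = 2  # building B, touching A

# Semantic mask: every building pixel gets the same class value (1).
semantic_mask = (instance_mask > 0).astype(np.uint8)

ids = np.unique(instance_mask)
ids = ids[ids != 0]  # drop background
print("Buildings (instance):", len(ids))                # 2
print("Classes (semantic):", np.unique(semantic_mask))  # [0 1]

# Per-building pixel areas can only be read from the instance mask.
areas = {int(i): int((instance_mask == i).sum()) for i in ids}
print(areas)  # {1: 12, 2: 12}
```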

Model Architecture:

Mask R-CNN combines:

Faster R-CNN for object detection (bounding boxes)
FCN for pixel-level segmentation (masks)
ResNet-50 + FPN backbone for feature extraction

Training Parameters:

num_classes: Number of classes including background (default: 2 for background + buildings)
num_channels: Number of input channels (3 for RGB, 4 for RGBN)
batch_size: Number of tiles per training step; typically smaller than for semantic segmentation (4-8) because Mask R-CNN needs more memory per image
num_epochs: Number of passes over the training tiles
learning_rate: Initial learning rate (default: 0.005)
val_split: Fraction of tiles held out for validation (default: 0.2)
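To see how val_split and batch_size interact, the sketch below splits a list of tile names into training and validation sets and computes how many batches one epoch takes. The shuffle, the seed, and the helper split_tiles are hypothetical; they are not necessarily what train_instance_segmentation_model does internally.

```python
import math
import random


def split_tiles(tile_names, val_split=0.2, seed=42):
    """Shuffle tile names and split them into (train, val) lists.

    Sketch only: geoai's internal splitting logic may differ.
    """
    names = sorted(tile_names)          # deterministic starting order
    random.Random(seed).shuffle(names)  # reproducible shuffle
    n_val = int(len(names) * val_split)
    return names[n_val:], names[:n_val]


tile_names = [f"tile_{i:04d}.tif" for i in range(35)]  # hypothetical tile names
train_tiles, val_tiles = split_tiles(tile_names, val_split=0.2)

batch_size = 4
steps_per_epoch = math.ceil(len(train_tiles) / batch_size)
print(len(train_tiles), len(val_tiles), steps_per_epoch)  # 28 7 7
```

Because the tiles overlap, a random split like this puts shared image content on both sides of the split, which can make validation scores look optimistic.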

# Train Mask R-CNN model
geoai.train_instance_segmentation_model(
    images_dir=f"{out_folder}/images",
    labels_dir=f"{out_folder}/labels",
    output_dir=f"{out_folder}/instance_models",
    num_classes=2,  # background + building
    num_channels=3,
    batch_size=4,
    num_epochs=10,
    learning_rate=0.005,
    val_split=0.2,
    visualize=True,
    verbose=True,
)

Run inference

Now we'll use the trained model to make predictions on the test image. The instance_segmentation function performs sliding window inference to handle large images.

# Define paths
masks_path = "naip_test_instance_prediction.tif"
model_path = f"{out_folder}/instance_models/best_model.pth"

# Run instance segmentation inference
geoai.instance_segmentation(
    input_path=test_raster_path,
    output_path=masks_path,
    model_path=model_path,
    num_classes=2,
    num_channels=3,
    window_size=512,
    overlap=256,
    confidence_threshold=0.5,
    batch_size=4,
)

Adjust confidence threshold (optional)

You can control which predictions to keep by adjusting the confidence threshold. Higher values (e.g., 0.7) are more conservative and keep only high-confidence detections (fewer false positives, but more missed buildings), while lower values (e.g., 0.3) are more permissive.

# Run inference with higher confidence threshold
masks_path_high_conf = "naip_test_instance_prediction_high_conf.tif"

geoai.instance_segmentation(
    input_path=test_raster_path,
    output_path=masks_path_high_conf,
    model_path=model_path,
    num_classes=2,
    num_channels=3,
    window_size=512,
    overlap=256,
    confidence_threshold=0.7,  # Higher threshold for more confident predictions
    batch_size=4,
)
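Conceptually, the threshold acts as a filter on each detection's score. A minimal sketch with made-up scores; the helper keep_confident is hypothetical, and whether geoai compares with >= or > is an assumption here that only matters for a score exactly equal to the threshold.

```python
# Hypothetical detections: each carries a confidence score from the model.
detections = [
    {"id": 1, "score": 0.92},
    {"id": 2, "score": 0.74},
    {"id": 3, "score": 0.55},
    {"id": 4, "score": 0.31},
]


def keep_confident(dets, threshold):
    """Keep detections whose score is at or above the threshold."""
    return [d for d in dets if d["score"] >= threshold]


for t in (0.3, 0.5, 0.7):
    kept = [d["id"] for d in keep_confident(detections, t)]
    print(t, kept)
# 0.3 [1, 2, 3, 4]
# 0.5 [1, 2, 3]
# 0.7 [1, 2]
```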

Vectorize masks

Convert the predicted mask to vector format for better visualization and analysis.

output_vector_path = "naip_test_instance_prediction.geojson"
gdf = geoai.orthogonalize(masks_path, output_vector_path, epsilon=2)

Add geometric properties

Calculate area, perimeter, and other geometric properties for each detected building.

gdf_props = geoai.add_geometric_properties(gdf, area_unit="m2", length_unit="m")
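Area and perimeter come out in square meters and meters only when the polygon coordinates are in a projected CRS with meter units. The sketch below shows the underlying arithmetic with the shoelace formula; it is an illustration only and not how add_geometric_properties is implemented (its internals, including any reprojection, are not shown here). The footprint coordinates are hypothetical.

```python
import math


def polygon_area_perimeter(coords):
    """Area and perimeter of a simple polygon given as [(x, y), ...].

    Coordinates must be in a projected CRS with meter units for the
    results to be in m^2 and m. The ring may be open or closed.
    """
    if coords[0] != coords[-1]:
        coords = list(coords) + [coords[0]]  # close the ring
    area2 = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(coords[:-1], coords[1:]):
        area2 += x1 * y2 - x2 * y1  # shoelace term (twice the signed area)
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter


# A 10 m x 8 m rectangular footprint in hypothetical UTM-like coordinates
footprint = [(500000, 4000000), (500010, 4000000), (500010, 4000008), (500000, 4000008)]
print(polygon_area_perimeter(footprint))  # (80.0, 36.0)
```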

Visualize results

geoai.view_raster(
    masks_path, nodata=0, cmap="tab20", basemap=test_raster_path, backend="ipyleaflet"
)

geoai.view_vector_interactive(gdf_props, column="area_m2", tiles=test_raster_path)

Filter by area

Remove detections smaller than 50 m², which are more likely to be noise or artifacts than buildings.

# Keep only polygons larger than 50 square meters
gdf_filtered = gdf_props[gdf_props["area_m2"] > 50]

geoai.view_vector_interactive(gdf_filtered, column="area_m2", tiles=test_raster_path)

Compare predictions with imagery

geoai.create_split_map(
    left_layer=gdf_filtered,
    right_layer=test_raster_path,
    left_args={"style": {"color": "red", "fillOpacity": 0.2}},
    basemap=test_raster_path,
)

Model Performance Analysis

Let's examine the training curves and model performance:

geoai.plot_performance_metrics(
    history_path=f"{out_folder}/instance_models/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)

Instance vs Semantic Segmentation Comparison

When to use Instance Segmentation:

Individual object analysis: When you need to count, measure, or analyze individual objects
Overlapping objects: When objects of the same class may overlap or touch
Object tracking: When tracking individual objects across frames or images
Spatial relationships: When analyzing relationships between individual objects

When to use Semantic Segmentation:

Area coverage: When you only need to know what percentage of an image contains a certain class
Land cover mapping: For continuous features like vegetation, water, roads
Simpler models: When you want faster training and inference
Pixel-level classification: When object boundaries are less important

Model Outputs:

Instance Segmentation (Mask R-CNN):

Bounding boxes for each object
Confidence scores for each detection
Binary mask for each individual object
Class label for each object
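A common way to store these per-object masks in a single file is one raster in which each pixel holds the ID of the object it belongs to. The sketch below shows one way to do that merge. It assumes outputs shaped like those of torchvision's Mask R-CNN (soft masks in the 0-1 range, here of shape N × H × W after dropping torchvision's singleton channel axis, plus one score per detection) and a 0.5 mask threshold; the helper masks_to_label_raster is hypothetical, and how geoai itself resolves overlapping instances is not documented here.

```python
import numpy as np


def masks_to_label_raster(masks, scores, mask_threshold=0.5):
    """Merge per-instance soft masks into one instance-ID raster.

    masks:  float array of shape (N, H, W) with values in [0, 1]
    scores: array of shape (N,) with detection confidences
    Returns an (H, W) uint16 array: 0 = background, i + 1 = detection i.
    Where instances overlap, the higher-scoring one wins.
    """
    label = np.zeros(masks.shape[1:], dtype=np.uint16)
    order = np.argsort(scores)  # ascending, so higher scores are written last
    for idx in order:
        label[masks[idx] >= mask_threshold] = idx + 1
    return label


masks = np.zeros((2, 4, 4), dtype=np.float32)
masks[0, 0:2, 0:3] = 0.9  # detection 1
masks[1, 1:3, 2:4] = 0.8  # detection 2, overlaps detection 1 at pixel (1, 2)
scores = np.array([0.6, 0.95])

label = masks_to_label_raster(masks, scores)
print(label)
```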

Semantic Segmentation:

Single multi-class mask covering the entire image
Probability map (optional)
No distinction between individual objects

Performance Considerations:

Aspect	Instance Segmentation	Semantic Segmentation
Training Time	Slower (more complex model)	Faster
Inference Time	Slower	Faster
Memory Usage	Higher	Lower
Accuracy	Better for distinct objects	Better for continuous classes
Typical Batch Size	2-8	8-32

Metrics:

Instance Segmentation Metrics:

AP (Average Precision): Area under the precision-recall curve at a given IoU threshold; COCO-style AP averages this over IoU thresholds from 0.50 to 0.95 in steps of 0.05
AP@0.5: Average Precision at IoU threshold of 0.5
AP@0.75: Average Precision at IoU threshold of 0.75
AR (Average Recall): Recall averaged across IoU thresholds
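To make the AP definition concrete, here is a small sketch that computes box IoU, greedily matches detections (highest score first) to the best still-unmatched ground-truth box, and integrates the precision-recall curve with all-point interpolation. COCO's official evaluation differs in details (for example 101-point interpolation and averaging over several IoU thresholds), and the boxes and scores below are made up.

```python
def box_iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class. detections: [(score, box)], ground_truth: [box]."""
    dets = sorted(detections, key=lambda d: d[0], reverse=True)
    matched, hits = set(), []
    for _, box in dets:
        # Find the best still-unmatched ground-truth box for this detection
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            iou = box_iou(box, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive
    # Precision and recall after each detection in score order
    precisions, recalls, cum_tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        cum_tp += hit
        precisions.append(cum_tp / k)
        recalls.append(cum_tp / len(ground_truth))
    # All-point interpolation: make precision non-increasing, then sum rectangles
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
detections = [
    (0.9, (0, 0, 10, 10)),    # exact match
    (0.8, (21, 21, 31, 31)),  # IoU = 81 / 119 ≈ 0.68 with the second box
    (0.7, (60, 60, 70, 70)),  # no overlap: false positive
    (0.6, (40, 40, 50, 50)),  # exact match
]
ap50 = average_precision(detections, ground_truth, 0.5)
ap75 = average_precision(detections, ground_truth, 0.75)
print(f"AP@0.5  = {ap50:.4f}")  # AP@0.5  = 0.9167
print(f"AP@0.75 = {ap75:.4f}")  # AP@0.75 = 0.5000
```

The second detection counts as a hit at IoU 0.5 but a miss at 0.75, which is why AP@0.75 is the stricter measure of localization quality.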

Semantic Segmentation Metrics:

IoU (Intersection over Union): Overlap between prediction and ground truth
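Dice Score: 2|A ∩ B| / (|A| + |B|); a monotonic function of IoU (Dice = 2·IoU / (1 + IoU)), so it ranks predictions the same way but reports higher values for partial overlap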
Pixel Accuracy: Percentage of correctly classified pixels
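For binary building masks these metrics are short NumPy expressions. The sketch below uses a made-up pair of 10 × 10 masks; the helper semantic_metrics is hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not a standard. It also confirms that Dice = 2·IoU / (1 + IoU) for the same prediction.

```python
import numpy as np


def semantic_metrics(pred, truth):
    """IoU, Dice, and pixel accuracy for binary masks (1 = building)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    iou = inter / union if union else 1.0      # convention: both empty -> 1.0
    total = pred.sum() + truth.sum()
    dice = 2 * inter / total if total else 1.0
    accuracy = (pred == truth).mean()          # fraction of matching pixels
    return float(iou), float(dice), float(accuracy)


truth = np.zeros((10, 10), dtype=np.uint8)
truth[2:6, 2:6] = 1  # 16 building pixels
pred = np.zeros_like(truth)
pred[3:7, 3:7] = 1   # same size, shifted by one pixel

iou, dice, acc = semantic_metrics(pred, truth)
print(round(iou, 4), round(dice, 4), acc)  # 0.3913 0.5625 0.86
```

Note how pixel accuracy stays high (0.86) even though IoU is below 0.4: background pixels dominate the image, which is why IoU or Dice is usually preferred for sparse classes such as buildings.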

Batch Processing (Optional)

If you have multiple images to process, you can use the batch inference function:

# Uncomment to process multiple images
# geoai.instance_segmentation_batch(
#     input_dir="path/to/input/images",
#     output_dir="path/to/output/masks",
#     model_path=model_path,
#     num_classes=2,
#     num_channels=3,
#     window_size=512,
#     overlap=256,
#     confidence_threshold=0.5,
#     batch_size=4,
# )

Advanced: Multi-channel Input (RGBN)

If your imagery includes a near-infrared (NIR) band, you can train with 4 channels. The tiles must first be exported from a 4-band raster: the tiles created above come from the 3-band naip_rgb_train.tif and will not work with num_channels=4.

# Example for 4-channel (RGBN) imagery
# geoai.train_instance_segmentation_model(
#     images_dir=f"{out_folder}/images",
#     labels_dir=f"{out_folder}/labels",
#     output_dir=f"{out_folder}/instance_models_rgbn",
#     num_classes=2,
#     num_channels=4,  # RGB + NIR
#     batch_size=4,
#     num_epochs=10,
#     learning_rate=0.005,
#     val_split=0.2,
#     verbose=True,
# )