Train an Instance Segmentation Model using Mask R-CNN¶
This notebook demonstrates how to train an instance segmentation model with Mask R-CNN to detect objects such as buildings. Unlike semantic segmentation, instance segmentation distinguishes between individual objects of the same class and produces a separate mask for each instance.
Install packages¶
Make sure the geoai-py package is installed. Uncomment the following line to install it.
# %pip install geoai-py
Import libraries¶
import geoai
Download sample data¶
We'll use the same dataset as the semantic segmentation example for consistency.
train_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_rgb_train.tif"
)
train_vector_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_train_buildings.geojson"
test_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_test.tif"
)
train_raster_path = geoai.download_file(train_raster_url)
train_vector_path = geoai.download_file(train_vector_url)
test_raster_path = geoai.download_file(test_raster_url)
Visualize sample data¶
geoai.get_raster_info(train_raster_path)
geoai.view_vector_interactive(train_vector_path, tiles=train_raster_path)
geoai.view_raster(test_raster_path)
Create training data¶
We'll create training tiles from the imagery and vector labels.
out_folder = "buildings_instance"
tiles = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=out_folder,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
)
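To see what tile_size=512 and stride=256 imply, the sketch below lists the tile origins for a 50% overlapping grid. It is only an illustration: the helper names and the 2048 x 1536 raster size are hypothetical, and it assumes tiles that do not fit entirely inside the raster are dropped, which may not match how export_geotiff_tiles treats the edges.

```python
def tile_origins(length, tile_size, stride):
    """Return start offsets of tiles along one axis.

    Assumption: only tiles that fit entirely inside the raster are kept.
    """
    if length < tile_size:
        return []
    return list(range(0, length - tile_size + 1, stride))


def tile_grid(width, height, tile_size=512, stride=256):
    """Return (col_off, row_off) pairs for every tile in row-major order."""
    return [
        (col, row)
        for row in tile_origins(height, tile_size, stride)
        for col in tile_origins(width, tile_size, stride)
    ]


# Hypothetical 2048 x 1536 pixel raster (not the size of the sample image).
grid = tile_grid(width=2048, height=1536)
print(len(grid), grid[:3])
```

With a stride of half the tile size, most buildings appear in several tiles, which increases the number of training samples at the cost of some redundancy.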
Train instance segmentation model¶
Now we'll train an instance segmentation model using the train_instance_segmentation_model function. This function uses Mask R-CNN, which is specifically designed for instance segmentation tasks.
Key Differences from Semantic Segmentation:¶
- Instance Segmentation: Identifies and segments each individual object separately (e.g., distinguishes Building A from Building B)
- Semantic Segmentation: Only classifies pixels into categories (all buildings are treated as one class)
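The difference can be made concrete with two toy label rasters. This is a minimal sketch using plain nested lists (real rasters would normally be NumPy arrays); the masks and the pixel_counts helper are made up for illustration.

```python
# Toy 4 x 6 label rasters: two buildings separated by one background column.
semantic_mask = [
    [1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
instance_mask = [
    [1, 1, 0, 2, 2, 2],
    [1, 1, 0, 2, 2, 2],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 0, 0, 0, 0],
]


def pixel_counts(mask):
    """Count pixels per non-zero label value (0 is background)."""
    counts = {}
    for row in mask:
        for value in row:
            if value != 0:
                counts[value] = counts.get(value, 0) + 1
    return counts


semantic_counts = pixel_counts(semantic_mask)  # one entry: the class "building"
instance_counts = pixel_counts(instance_mask)  # one entry per building
print(semantic_counts, instance_counts)
```

The semantic mask can only report the total building area, while the instance mask also gives the number of buildings and the size of each one.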
Model Architecture:¶
Mask R-CNN combines:
- Faster R-CNN for object detection (bounding boxes)
- FCN for pixel-level segmentation (masks)
- ResNet-50 + FPN backbone for feature extraction
Training Parameters:¶
- num_classes: Number of classes including background (default: 2 for background + buildings)
- num_channels: Number of input channels (3 for RGB, 4 for RGBN)
- batch_size: Typically smaller than semantic segmentation (4-8) due to model complexity
- num_epochs: Number of training epochs
- learning_rate: Initial learning rate (default: 0.005)
- val_split: Fraction of data for validation (default: 0.2)
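As an illustration of what val_split means, the sketch below holds out a fraction of the tiles for validation. The split_tiles helper, the seed, and the tile names are hypothetical; the function's actual splitting logic is not shown in this notebook and may differ.

```python
import random


def split_tiles(tile_names, val_split=0.2, seed=42):
    """Shuffle tile names reproducibly and hold out a validation fraction."""
    names = sorted(tile_names)
    random.Random(seed).shuffle(names)
    n_val = round(len(names) * val_split)
    return names[n_val:], names[:n_val]


# Hypothetical tile names; real names depend on the export step.
demo_tiles = [f"tile_{i:03d}.tif" for i in range(50)]
train_tiles, val_tiles = split_tiles(demo_tiles, val_split=0.2)
print(len(train_tiles), len(val_tiles))
```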
# Train Mask R-CNN model
geoai.train_instance_segmentation_model(
    images_dir=f"{out_folder}/images",
    labels_dir=f"{out_folder}/labels",
    output_dir=f"{out_folder}/instance_models",
    num_classes=2,  # background + building
    num_channels=3,
    batch_size=4,
    num_epochs=10,
    learning_rate=0.005,
    val_split=0.2,
    visualize=True,
    verbose=True,
)
Run inference¶
Now we'll use the trained model to make predictions on the test image. The instance_segmentation function performs sliding window inference to handle large images.
# Define paths
masks_path = "naip_test_instance_prediction.tif"
model_path = f"{out_folder}/instance_models/best_model.pth"
# Run instance segmentation inference
geoai.instance_segmentation(
    input_path=test_raster_path,
    output_path=masks_path,
    model_path=model_path,
    num_classes=2,
    num_channels=3,
    window_size=512,
    overlap=256,
    confidence_threshold=0.5,
    batch_size=4,
)
Adjust confidence threshold (optional)¶
You can control which predictions to keep by adjusting the confidence threshold. Higher values (e.g., 0.7) will be more conservative and only keep high-confidence detections, while lower values (e.g., 0.3) will be more permissive.
# Run inference with higher confidence threshold
masks_path_high_conf = "naip_test_instance_prediction_high_conf.tif"
geoai.instance_segmentation(
    input_path=test_raster_path,
    output_path=masks_path_high_conf,
    model_path=model_path,
    num_classes=2,
    num_channels=3,
    window_size=512,
    overlap=256,
    confidence_threshold=0.7,  # Higher threshold for more confident predictions
    batch_size=4,
)
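The effect of the threshold can be sketched with a handful of made-up detections. The dictionaries, scores, and the filter_detections helper below are hypothetical, and the sketch assumes a detection is kept when its score is greater than or equal to the threshold; the library may use a strict comparison instead.

```python
def filter_detections(detections, confidence_threshold=0.5):
    """Keep detections whose score reaches the threshold (assumed >=)."""
    return [d for d in detections if d["score"] >= confidence_threshold]


# Made-up detections for illustration only.
detections = [
    {"id": 1, "score": 0.95},
    {"id": 2, "score": 0.72},
    {"id": 3, "score": 0.55},
    {"id": 4, "score": 0.31},
]

kept_by_threshold = {
    threshold: [d["id"] for d in filter_detections(detections, threshold)]
    for threshold in (0.3, 0.5, 0.7)
}
print(kept_by_threshold)
```

Raising the threshold trades recall for precision: fewer false positives survive, but faint or partially occluded buildings are more likely to be dropped.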
Vectorize masks¶
Convert the predicted mask to vector format for better visualization and analysis.
output_vector_path = "naip_test_instance_prediction.geojson"
gdf = geoai.orthogonalize(masks_path, output_vector_path, epsilon=2)
Add geometric properties¶
Calculate area, perimeter, and other geometric properties for each detected building.
gdf_props = geoai.add_geometric_properties(gdf, area_unit="m2", length_unit="m")
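For intuition about what these properties measure, the sketch below computes area with the shoelace formula and perimeter as the sum of edge lengths for a single rectangular footprint. It is not how add_geometric_properties is implemented; the helper and the coordinates are hypothetical, and the coordinates are assumed to be in a projected CRS with metre units.

```python
import math


def polygon_area_perimeter(coords):
    """Return (area, perimeter) of a simple polygon.

    coords: list of (x, y) vertices in order, without repeating the first
    vertex at the end. Units follow the coordinate units (assumed metres).
    """
    n = len(coords)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


# Hypothetical 12 m x 8 m building footprint.
footprint = [(0.0, 0.0), (12.0, 0.0), (12.0, 8.0), (0.0, 8.0)]
demo_area_m2, demo_perimeter_m = polygon_area_perimeter(footprint)
print(demo_area_m2, demo_perimeter_m)
```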
Visualize results¶
geoai.view_raster(
    masks_path, nodata=0, cmap="tab20", basemap=test_raster_path, backend="ipyleaflet"
)
geoai.view_vector_interactive(gdf_props, column="area_m2", tiles=test_raster_path)
Filter by area¶
Filter out small detections that might be noise or artifacts.
gdf_filtered = gdf_props[gdf_props["area_m2"] > 50]
geoai.view_vector_interactive(gdf_filtered, column="area_m2", tiles=test_raster_path)
Compare predictions with imagery¶
geoai.create_split_map(
    left_layer=gdf_filtered,
    right_layer=test_raster_path,
    left_args={"style": {"color": "red", "fillOpacity": 0.2}},
    basemap=test_raster_path,
)
Model Performance Analysis¶
Let's examine the training curves and model performance:
geoai.plot_performance_metrics(
    history_path=f"{out_folder}/instance_models/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
Instance vs Semantic Segmentation Comparison¶
When to use Instance Segmentation:¶
- Individual object analysis: When you need to count, measure, or analyze individual objects
- Overlapping objects: When objects of the same class may overlap or touch
- Object tracking: When tracking individual objects across frames or images
- Spatial relationships: When analyzing relationships between individual objects
When to use Semantic Segmentation:¶
- Area coverage: When you only need to know what percentage of an image contains a certain class
- Land cover mapping: For continuous features like vegetation, water, roads
- Simpler models: When you want faster training and inference
- Pixel-level classification: When object boundaries are less important
Model Outputs:¶
Instance Segmentation (Mask R-CNN):
- Bounding boxes for each object
- Confidence scores for each detection
- Binary mask for each individual object
- Class label for each object
Semantic Segmentation:
- Single multi-class mask covering the entire image
- Probability map (optional)
- No distinction between individual objects
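One way to relate the two output types is to flatten per-instance masks into a single labeled raster, where each pixel stores an instance ID. The sketch below does this with plain nested lists. The merge_instance_masks helper and the toy masks are hypothetical, and resolving overlaps in favor of the higher-scoring instance is an assumption, not necessarily what instance_segmentation writes to disk.

```python
def merge_instance_masks(masks, scores, confidence_threshold=0.5):
    """Flatten per-instance binary masks into one raster of instance IDs.

    IDs are assigned by descending score (1 = most confident); 0 is
    background. Where masks overlap, the higher-scoring instance wins.
    """
    if not masks:
        return []
    height, width = len(masks[0]), len(masks[0][0])
    labels = [[0] * width for _ in range(height)]
    kept = [
        i
        for i in sorted(range(len(masks)), key=lambda i: -scores[i])
        if scores[i] >= confidence_threshold
    ]
    # Paint the least confident instance first so better ones overwrite it.
    for instance_id in range(len(kept), 0, -1):
        mask = masks[kept[instance_id - 1]]
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    labels[r][c] = instance_id
    return labels


# Three toy detections on a 2 x 4 image; the first two overlap in column 1.
toy_masks = [
    [[1, 1, 0, 0], [1, 1, 0, 0]],
    [[0, 1, 1, 0], [0, 1, 1, 0]],
    [[0, 0, 0, 1], [0, 0, 0, 1]],
]
toy_scores = [0.9, 0.6, 0.2]
label_raster = merge_instance_masks(toy_masks, toy_scores)
print(label_raster)
```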
Performance Considerations:¶
| Aspect | Instance Segmentation | Semantic Segmentation |
|---|---|---|
| Training Time | Slower (more complex model) | Faster |
| Inference Time | Slower | Faster |
| Memory Usage | Higher | Lower |
| Accuracy | Better for distinct objects | Better for continuous classes |
| Typical Batch Size | 2-8 | 8-32 |
Metrics:¶
Instance Segmentation Metrics:
- AP (Average Precision): Area under the precision-recall curve; in COCO-style evaluation it is averaged over IoU thresholds from 0.5 to 0.95
- AP@0.5: Average Precision at IoU threshold of 0.5
- AP@0.75: Average Precision at IoU threshold of 0.75
- AR (Average Recall): Recall averaged across IoU thresholds
Semantic Segmentation Metrics:
- IoU (Intersection over Union): Overlap between prediction and ground truth
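- Dice Score: Twice the overlap divided by the sum of the two mask areas; related to IoU by Dice = 2 * IoU / (1 + IoU), so it is always at least as high as IoU for the same prediction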
- Pixel Accuracy: Percentage of correctly classified pixels
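IoU and Dice can be computed for a pair of binary masks as in this minimal sketch. The masks and the iou_and_dice helper are made up for illustration, and returning 1.0 for two empty masks is a convention chosen here, not something defined by this notebook.

```python
def iou_and_dice(pred, truth):
    """Compute IoU and Dice for two binary masks given as nested lists."""
    pred_pixels = {
        (r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v
    }
    truth_pixels = {
        (r, c) for r, row in enumerate(truth) for c, v in enumerate(row) if v
    }
    intersection = len(pred_pixels & truth_pixels)
    union = len(pred_pixels | truth_pixels)
    total = len(pred_pixels) + len(truth_pixels)
    # Convention (assumed): two empty masks count as a perfect match.
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return iou, dice


# Prediction shifted one column to the left of the ground truth.
pred_mask = [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
truth_mask = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
iou, dice = iou_and_dice(pred_mask, truth_mask)
print(iou, dice)
```

Here 4 of the 8 pixels covered by either mask are shared, so IoU is 0.5 and Dice is about 0.667, consistent with Dice = 2 * IoU / (1 + IoU).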
Batch Processing (Optional)¶
If you have multiple images to process, you can use the batch inference function:
# Uncomment to process multiple images
# geoai.instance_segmentation_batch(
#     input_dir="path/to/input/images",
#     output_dir="path/to/output/masks",
#     model_path=model_path,
#     num_classes=2,
#     num_channels=3,
#     window_size=512,
#     overlap=256,
#     confidence_threshold=0.5,
#     batch_size=4,
# )
Advanced: Multi-channel Input (RGBN)¶
If your imagery includes a near-infrared (NIR) band, you can train with 4 channels:
# Example for 4-channel (RGBN) imagery
# geoai.train_instance_segmentation_model(
#     images_dir=f"{out_folder}/images",
#     labels_dir=f"{out_folder}/labels",
#     output_dir=f"{out_folder}/instance_models_rgbn",
#     num_classes=2,
#     num_channels=4,  # RGB + NIR
#     batch_size=4,
#     num_epochs=10,
#     learning_rate=0.005,
#     val_split=0.2,
#     verbose=True,
# )