Train a Semantic Segmentation Model using Segmentation-Models-PyTorch¶
This notebook demonstrates how to train semantic segmentation models for geospatial object extraction (e.g., building footprints) using the segmentation-models-pytorch library. Unlike instance segmentation with Mask R-CNN, this approach treats the task as pixel-level binary classification: every pixel is labeled as either background or building.
Install packages¶
Ensure the required packages are installed. Uncomment and run the following cell to install geoai-py if needed.
# %pip install geoai-py
Import libraries¶
import geoai
Download sample data¶
We'll use the same dataset as the Mask R-CNN example for consistency.
train_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_rgb_train.tif"
)
train_vector_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_train_buildings.geojson"
test_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_test.tif"
)
train_raster_path = geoai.download_file(train_raster_url)
train_vector_path = geoai.download_file(train_vector_url)
test_raster_path = geoai.download_file(test_raster_url)
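Since download_file returns the local path of each downloaded file (as the assignments above show), we can verify that the files arrived with a plain-Python check:

import os

# Confirm each downloaded file exists and report its size.
for path in [train_raster_path, train_vector_path, test_raster_path]:
    print(path, os.path.getsize(path) // 1024, "KB")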
Visualize sample data¶
geoai.get_raster_info(train_raster_path)
geoai.view_vector_interactive(train_vector_path, tiles=train_raster_url)
geoai.view_raster(test_raster_url)
Create training data¶
We'll create the same training tiles as before.
out_folder = "buildings"
tiles = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=out_folder,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
)
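As a quick sanity check, we can count the exported image/label pairs; the images and labels subfolders are the same paths the training cells below read from:

import os

# Each image tile should have a matching label tile.
image_tiles = sorted(os.listdir(f"{out_folder}/images"))
label_tiles = sorted(os.listdir(f"{out_folder}/labels"))
print(f"{len(image_tiles)} image tiles, {len(label_tiles)} label tiles")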
Train semantic segmentation model¶
Now we'll train a semantic segmentation model using the new train_segmentation_model function. This function supports various architectures from segmentation-models-pytorch:
- Architectures: unet, unetplusplus, deeplabv3, deeplabv3plus, fpn, pspnet, linknet, manet, segformer
- Encoders: resnet34, resnet50, efficientnet-b0, mobilenet_v2, etc.
For more details, please refer to the segmentation-models-pytorch documentation.
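To see what these names map to, here is a minimal sketch of how segmentation-models-pytorch itself builds the architecture/encoder combination used in Example 1 below. The internals of train_segmentation_model are not shown here (an assumption), but smp.Unet and its arguments are the library's public API:

import segmentation_models_pytorch as smp

# U-Net with a ResNet34 encoder pretrained on ImageNet, matching Example 1.
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=3,  # RGB NAIP imagery
    classes=2,      # background and building
)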
Example 1: U-Net with ResNet34 encoder¶
# Train U-Net model
geoai.train_segmentation_model(
    images_dir=f"{out_folder}/images",
    labels_dir=f"{out_folder}/labels",
    output_dir=f"{out_folder}/unet_models",
    architecture="unet",
    encoder_name="resnet34",
    encoder_weights="imagenet",
    num_channels=3,
    num_classes=2,  # background and building
    batch_size=8,
    num_epochs=100,
    learning_rate=0.001,
    val_split=0.2,
    verbose=True,
)
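Training writes checkpoints to output_dir; the best_model.pth file is what the inference step loads later. Whether it stores a bare state_dict or a richer checkpoint dict is an assumption here, so this snippet only reports its top-level structure:

import torch

# Inspect the saved checkpoint on CPU without instantiating a model.
checkpoint = torch.load(f"{out_folder}/unet_models/best_model.pth", map_location="cpu")
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys())[:10])  # e.g., layer names or checkpoint fields
else:
    print(type(checkpoint))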
Example 2: SegFormer with ResNet152 encoder¶
geoai.train_segmentation_model(
    images_dir=f"{out_folder}/images",
    labels_dir=f"{out_folder}/labels",
    output_dir=f"{out_folder}/segformer_models",
    architecture="segformer",
    encoder_name="resnet152",
    encoder_weights="imagenet",
    num_channels=3,
    num_classes=2,
    batch_size=6,  # smaller batch size for the more complex model
    num_epochs=50,
    learning_rate=0.0005,
    val_split=0.2,
)
Run inference¶
Now we'll use the trained model to make predictions on the test image.
# Define paths
masks_path = "naip_test_semantic_prediction.tif"
model_path = f"{out_folder}/unet_models/best_model.pth"
# Run semantic segmentation inference
geoai.semantic_segmentation(
    input_path=test_raster_path,
    output_path=masks_path,
    model_path=model_path,
    architecture="unet",
    encoder_name="resnet34",
    num_channels=3,
    num_classes=2,
    window_size=512,
    overlap=256,
    batch_size=4,
)
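We can reuse get_raster_info (introduced earlier) to confirm that the prediction raster was written with the expected shape and georeferencing:

geoai.get_raster_info(masks_path)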
Vectorize masks¶
Convert the predicted mask to vector format for better visualization and analysis.
output_vector_path = "naip_test_semantic_prediction.geojson"
gdf = geoai.orthogonalize(masks_path, output_vector_path, epsilon=2)
Add geometric properties¶
gdf_props = geoai.add_geometric_properties(gdf, area_unit="m2", length_unit="m")
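Since gdf_props is a GeoDataFrame, standard pandas inspection applies; area_m2 is the column used for styling and filtering below (other column names depend on the function's implementation):

# Summarize the computed building areas.
print(gdf_props["area_m2"].describe())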
Visualize results¶
geoai.view_raster(masks_path, nodata=0, basemap=test_raster_url, backend="ipyleaflet")
geoai.view_vector_interactive(gdf_props, column="area_m2", tiles=test_raster_url)
gdf_filtered = gdf_props[gdf_props["area_m2"] > 50]
geoai.view_vector_interactive(gdf_filtered, column="area_m2", tiles=test_raster_url)
geoai.create_split_map(
    left_layer=gdf_filtered,
    right_layer=test_raster_url,
    left_args={"style": {"color": "red", "fillOpacity": 0.2}},
    basemap=test_raster_url,
)
Model Performance Analysis¶
Let's examine the training curves and model performance:
geoai.plot_performance_metrics(
    history_path=f"{out_folder}/unet_models/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
Performance Metrics¶
IoU (Intersection over Union) and Dice score are both popular metrics for evaluating the similarity between two binary masks, most often in image segmentation tasks. While they are closely related, they are not the same.
🔸 Definitions¶
IoU (Jaccard Index)¶
$$ \text{IoU} = \frac{|A \cap B|}{|A \cup B|} $$
- Measures the overlap between predicted region $A$ and ground truth region $B$ relative to their union.
- Ranges from 0 (no overlap) to 1 (perfect overlap).
Dice Score (F1 Score for Sets)¶
$$ \text{Dice} = \frac{2|A \cap B|}{|A| + |B|} $$
- Measures the overlap between $A$ and $B$, but gives more weight to the intersection.
- Also ranges from 0 to 1.
🔸 Key Differences¶
| Metric | Formula | Penalizes | Sensitivity |
| --- | --- | --- | --- |
| IoU | $\frac{TP}{TP + FP + FN}$ | FP and FN equally | Less sensitive to small objects |
| Dice | $\frac{2TP}{2TP + FP + FN}$ | Less harsh on small mismatches | More sensitive to small overlaps |
TP: True Positive, FP: False Positive, FN: False Negative
🔸 Relationship¶
Dice and IoU are mathematically related:
$$ \text{Dice} = \frac{2 \cdot \text{IoU}}{1 + \text{IoU}} \quad \text{or} \quad \text{IoU} = \frac{\text{Dice}}{2 - \text{Dice}} $$
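As a quick illustrative check (plain NumPy, unrelated to the geoai API), we can compute both metrics on two toy masks and verify the conversion formula:

import numpy as np

# Toy prediction A and ground truth B as binary masks.
A = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=bool)
B = np.array([[1, 1, 0], [0, 0, 0], [1, 0, 0]], dtype=bool)

intersection = np.logical_and(A, B).sum()  # |A ∩ B| = 2
union = np.logical_or(A, B).sum()          # |A ∪ B| = 4

iou = intersection / union                     # 2/4 = 0.5
dice = 2 * intersection / (A.sum() + B.sum())  # 4/6 ≈ 0.667
print(iou, dice)

# Dice = 2·IoU / (1 + IoU)
assert np.isclose(dice, 2 * iou / (1 + iou))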
🔸 When to Use What¶
- IoU: Common in object detection and semantic segmentation benchmarks (e.g., COCO, Pascal VOC).
- Dice: Preferred in medical imaging and when class imbalance is an issue, due to its sensitivity to small regions.