Building Detection from Aerial Imagery and LiDAR Data¶
This notebook demonstrates how to train semantic segmentation models for building detection from NAIP aerial imagery and height above ground (HAG) data derived from LiDAR, with just a few lines of code. You can adapt this notebook to segment other objects of interest (such as trees or cars) from aerial imagery and LiDAR data.
Install packages¶
To run this notebook, ensure the required packages are installed. Uncomment the cell below to install them if needed.
# %pip install geoai-py
Import libraries¶
import os
import geoai
Download sample data¶
We'll use the same dataset as the Mask R-CNN example for consistency.
train_aerial_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_train_naip.tif"
train_LiDAR_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_train_hag.tif"
train_building_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_buildings_train.geojson"
test_aerial_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_test_naip.tif"
test_LiDAR_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_test_hag.tif"
train_aerial_path = geoai.download_file(train_aerial_url)
train_LiDAR_path = geoai.download_file(train_LiDAR_url)
train_building_path = geoai.download_file(train_building_url)
test_aerial_path = geoai.download_file(test_aerial_url)
test_LiDAR_path = geoai.download_file(test_LiDAR_url)
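To confirm the downloads completed, you can print the local paths and file sizes; this check uses only the standard library (nothing below is geoai-specific).
# Optional sanity check: confirm each file was downloaded and is non-empty.
for path in [
    train_aerial_path,
    train_LiDAR_path,
    train_building_path,
    test_aerial_path,
    test_LiDAR_path,
]:
    print(path, os.path.getsize(path) // 1024, "KB")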
Visualize sample data¶
Visualize the building footprints with the aerial imagery.
os.environ["TITILER_ENDPOINT"] = "https://titiler.xyz"
geoai.view_vector_interactive(train_building_path, tiles=train_aerial_url)
Visualize the building footprints with the height above ground (HAG) data derived from LiDAR.
geoai.view_vector_interactive(train_building_path, tiles=train_LiDAR_url)
Stack bands¶
Stack the four NAIP bands and the single HAG band into one five-band image; this combined raster is used for both training and inference.
train_raster_path = "las_vegas_train_naip_hag.tif"
geoai.stack_bands(
input_files=[train_aerial_path, train_LiDAR_path],
output_file=train_raster_path,
resolution=None, # Automatically inferred from first image
overwrite=True,
dtype="Byte", # or "UInt16", "Float32"
)
test_raster_path = "las_vegas_test_naip_hag.tif"
geoai.stack_bands(
input_files=[test_aerial_path, test_LiDAR_path],
output_file=test_raster_path,
resolution=None, # Automatically inferred from first image
overwrite=True,
dtype="Byte", # or "UInt16", "Float32"
)
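As a quick check of the stacking step, you can open the outputs with rasterio (a common geospatial dependency; this is not a geoai call) and confirm each raster now reports five bands: the four NAIP bands (red, green, blue, near-infrared) plus the HAG band. That five-band count is what num_channels=5 refers to during training below.
import rasterio

# Optional check: each stacked raster should report 5 bands
# (4 NAIP bands + 1 HAG band) with the dtype requested above.
for path in [train_raster_path, test_raster_path]:
    with rasterio.open(path) as src:
        print(path, src.count, "bands,", src.dtypes[0], src.res)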
Create training data¶
We'll create the same training tiles as in the Mask R-CNN example.
out_folder = "buildings"
tiles = geoai.export_geotiff_tiles(
in_raster=train_raster_path,
out_folder=out_folder,
in_class_data=train_building_path,
tile_size=512,
stride=256,
buffer_radius=0,
)
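If you want to verify the export before training, a simple file count is enough. The images and labels subfolder names below match the images_dir and labels_dir paths used in the training step; adjust them if your output layout differs.
from pathlib import Path

# Optional check: count the exported image/label tile pairs.
n_images = len(list(Path(out_folder, "images").glob("*.tif")))
n_labels = len(list(Path(out_folder, "labels").glob("*.tif")))
print(f"{n_images} image tiles, {n_labels} label tiles")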
Train semantic segmentation model¶
Now we'll train a semantic segmentation model using the new train_segmentation_model function. This function supports various architectures from segmentation-models-pytorch:
- Architectures: unet, unetplusplus, deeplabv3, deeplabv3plus, fpn, pspnet, linknet, manet
- Encoders: resnet34, resnet50, efficientnet-b0, mobilenet_v2, etc.
For more details, please refer to the segmentation-models-pytorch documentation.
Let's train a U-Net model with a ResNet34 encoder.
# Train U-Net model
geoai.train_segmentation_model(
images_dir=f"{out_folder}/images",
labels_dir=f"{out_folder}/labels",
output_dir=f"{out_folder}/unet_models",
architecture="unet",
encoder_name="resnet34",
encoder_weights="imagenet",
num_channels=5,
num_classes=2, # background and building
batch_size=8,
num_epochs=50,
learning_rate=0.001,
val_split=0.2,
verbose=True,
)
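The same call works for the other architectures and encoders listed above. As a sketch (not run here), training a DeepLabV3+ model with an EfficientNet-B0 encoder only changes the architecture, encoder_name, and output_dir arguments:
# Sketch: same training setup with a different architecture/encoder.
# All other arguments are unchanged from the U-Net run above.
geoai.train_segmentation_model(
    images_dir=f"{out_folder}/images",
    labels_dir=f"{out_folder}/labels",
    output_dir=f"{out_folder}/deeplabv3plus_models",
    architecture="deeplabv3plus",
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    num_channels=5,
    num_classes=2,
    batch_size=8,
    num_epochs=50,
    learning_rate=0.001,
    val_split=0.2,
    verbose=True,
)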
Evaluate the model¶
geoai.plot_performance_metrics(
history_path=f"{out_folder}/unet_models/training_history.pth",
figsize=(15, 5),
verbose=True,
)
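If you want the raw numbers rather than the plot, the history file is a torch-serialized object. Its exact structure is a geoai implementation detail, so print what is there before relying on specific keys.
import torch

# Optional: inspect the saved training history directly.
# The keys/structure are an implementation detail of geoai.
# (Depending on your torch version, you may need weights_only=False.)
history = torch.load(f"{out_folder}/unet_models/training_history.pth")
print(history.keys() if isinstance(history, dict) else type(history))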
Run inference¶
Now we'll use the trained model to make predictions on the test image.
# Define paths
masks_path = "building_masks.tif"
model_path = f"{out_folder}/unet_models/best_model.pth"
# Run semantic segmentation inference
geoai.semantic_segmentation(
input_path=test_raster_path,
output_path=masks_path,
model_path=model_path,
architecture="unet",
encoder_name="resnet34",
num_channels=5,
num_classes=2,
window_size=512,
overlap=256,
batch_size=8,
)
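A quick way to sanity-check the output before vectorizing is to read the mask and count predicted building pixels. Class value 1 is assumed to be the building class (0 is background), matching num_classes=2 above.
import numpy as np
import rasterio

# Optional: summarize the predicted mask (assumes class 1 = building).
with rasterio.open(masks_path) as src:
    mask = src.read(1)
building_pixels = int((mask == 1).sum())
print(f"{building_pixels} building pixels out of {mask.size} total")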
Vectorize masks¶
Convert the predicted mask to vector format, orthogonalizing (squaring off) the building outlines for cleaner visualization and analysis.
output_vector_path = "building_masks.geojson"
gdf = geoai.orthogonalize(masks_path, output_vector_path, epsilon=2)
Add geometric properties¶
gdf_props = geoai.add_geometric_properties(gdf, area_unit="m2", length_unit="m")
print(f"Number of buildings: {len(gdf_props)}")
Visualize results¶
geoai.view_raster(masks_path, nodata=0, basemap=test_aerial_url, backend="ipyleaflet")
geoai.view_vector_interactive(gdf_props, column="area_m2", tiles=test_aerial_url)
gdf_filtered = gdf_props[(gdf_props["area_m2"] > 50)]
print(f"Number of buildings: {len(gdf_filtered)}")
geoai.view_vector_interactive(gdf_filtered, column="area_m2", tiles=test_aerial_url)
geoai.create_split_map(
left_layer=gdf_filtered,
right_layer=test_aerial_url,
left_args={"style": {"color": "red", "fillOpacity": 0.2}},
basemap=test_aerial_url,
)
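To keep the filtered footprints for use outside the notebook, you can write them back to disk; gdf_filtered is a regular GeoDataFrame, so standard GeoPandas I/O applies (the output filename below is just an example).
# Optional: save the filtered building footprints to a new GeoJSON file.
gdf_filtered.to_file("building_masks_filtered.geojson", driver="GeoJSON")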
Performance Metrics¶
IoU (Intersection over Union) and Dice score are both popular metrics used to evaluate the similarity between two binary masks—often in image segmentation tasks. While they are related, they are not the same.
🔸 Definitions¶
IoU (Jaccard Index)¶
$$ \text{IoU} = \frac{|A \cap B|}{|A \cup B|} $$
- Measures the overlap between predicted region $A$ and ground truth region $B$ relative to their union.
- Ranges from 0 (no overlap) to 1 (perfect overlap).
Dice Score (F1 Score for Sets)¶
$$ \text{Dice} = \frac{2|A \cap B|}{|A| + |B|} $$
- Measures the overlap between $A$ and $B$, but gives more weight to the intersection.
- Also ranges from 0 to 1.
🔸 Key Differences¶
Metric | Formula | Penalizes | Sensitivity |
---|---|---|---|
IoU | $\frac{TP}{TP + FP + FN}$ | FP and FN equally | Less sensitive to small objects |
Dice | $\frac{2TP}{2TP + FP + FN}$ | Small mismatches less harshly | More sensitive to small overlaps |
TP: True Positive, FP: False Positive, FN: False Negative
🔸 Relationship¶
Dice and IoU are mathematically related:
$$ \text{Dice} = \frac{2 \cdot \text{IoU}}{1 + \text{IoU}} \quad \text{or} \quad \text{IoU} = \frac{\text{Dice}}{2 - \text{Dice}} $$
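A tiny worked example makes the relationship concrete. For two toy 3x3 masks with 2 true positives, 1 false positive, and 1 false negative, IoU is 2 / (2 + 1 + 1) = 0.5 and Dice is 4 / (4 + 1 + 1) ≈ 0.667, which matches 2 * IoU / (1 + IoU).
import numpy as np

# Toy masks: 1 = building, 0 = background.
pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])

tp = int(np.logical_and(pred == 1, truth == 1).sum())  # |A ∩ B| = 2
fp = int(np.logical_and(pred == 1, truth == 0).sum())  # = 1
fn = int(np.logical_and(pred == 0, truth == 1).sum())  # = 1

iou = tp / (tp + fp + fn)           # 0.5
dice = 2 * tp / (2 * tp + fp + fn)  # ~0.667
print(iou, dice, 2 * iou / (1 + iou))  # the last value equals dice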
🔸 When to Use What¶
- IoU: Common in object detection and semantic segmentation benchmarks (e.g., COCO, Pascal VOC).
- Dice: Preferred in medical imaging and when class imbalance is an issue, due to its sensitivity to small regions.