Building Detection from Aerial Imagery and LiDAR Data¶
This notebook demonstrates how to train semantic segmentation models for building detection from NAIP aerial imagery and height above ground (HAG) data derived from LiDAR, with just a few lines of code. You can adapt this notebook to segment other objects of interest (such as trees or cars) from aerial imagery and LiDAR data.
Install packages¶
To run this notebook, ensure the required packages are installed. Uncomment the cell below to install them if needed.
# %pip install geoai-py
Import libraries¶
import os
import geoai
Download sample data¶
We'll use the same dataset as the Mask R-CNN example for consistency.
train_aerial_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_train_naip.tif"
train_LiDAR_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_train_hag.tif"
train_building_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_buildings_train.geojson"
test_aerial_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_test_naip.tif"
test_LiDAR_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/las_vegas_test_hag.tif"
train_aerial_path = geoai.download_file(train_aerial_url)
train_LiDAR_path = geoai.download_file(train_LiDAR_url)
train_building_path = geoai.download_file(train_building_url)
test_aerial_path = geoai.download_file(test_aerial_url)
test_LiDAR_path = geoai.download_file(test_LiDAR_url)
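To confirm the downloads completed, you can print the local paths and file sizes; this check uses only the standard library (nothing below is geoai-specific).
# Optional sanity check: confirm each file was downloaded and is non-empty.
for path in [
    train_aerial_path,
    train_LiDAR_path,
    train_building_path,
    test_aerial_path,
    test_LiDAR_path,
]:
    print(path, os.path.getsize(path) // 1024, "KB")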
Visualize sample data¶
Visualize the building footprints with the aerial imagery.
os.environ["TITILER_ENDPOINT"] = "https://titiler.xyz"
geoai.view_vector_interactive(train_building_path, tiles=train_aerial_url)
Visualize the building footprints with the height above ground (HAG) data derived from LiDAR.
geoai.view_vector_interactive(train_building_path, tiles=train_LiDAR_url)
Stack bands¶
Stack the four NAIP bands and the single HAG band into one five-band image; this combined raster is used for both training and inference.
train_raster_path = "las_vegas_train_naip_hag.tif"
geoai.stack_bands(
input_files=[train_aerial_path, train_LiDAR_path],
output_file=train_raster_path,
resolution=None, # Automatically inferred from first image
overwrite=True,
dtype="Byte", # or "UInt16", "Float32"
)
test_raster_path = "las_vegas_test_naip_hag.tif"
geoai.stack_bands(
input_files=[test_aerial_path, test_LiDAR_path],
output_file=test_raster_path,
resolution=None, # Automatically inferred from first image
overwrite=True,
dtype="Byte", # or "UInt16", "Float32"
)
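As a quick check of the stacking step, you can open the outputs with rasterio (a common geospatial dependency; this is not a geoai call) and confirm each raster now reports five bands: the four NAIP bands (red, green, blue, near-infrared) plus the HAG band. That five-band count is what num_channels=5 refers to during training below.
import rasterio

# Optional check: each stacked raster should report 5 bands
# (4 NAIP bands + 1 HAG band) with the dtype requested above.
for path in [train_raster_path, test_raster_path]:
    with rasterio.open(path) as src:
        print(path, src.count, "bands,", src.dtypes[0], src.res)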
Create training data¶
We'll create the same training tiles as in the Mask R-CNN example.
out_folder = "buildings"
tiles = geoai.export_geotiff_tiles(
in_raster=train_raster_path,
out_folder=out_folder,
in_class_data=train_building_path,
tile_size=512,
stride=256,
buffer_radius=0,
)
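If you want to verify the export before training, a simple file count is enough. The images and labels subfolder names below match the images_dir and labels_dir paths used in the training step; adjust them if your output layout differs.
from pathlib import Path

# Optional check: count the exported image/label tile pairs.
n_images = len(list(Path(out_folder, "images").glob("*.tif")))
n_labels = len(list(Path(out_folder, "labels").glob("*.tif")))
print(f"{n_images} image tiles, {n_labels} label tiles")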
Train semantic segmentation model¶
Now we'll train a semantic segmentation model using the new train_segmentation_model function. This function supports various architectures from segmentation-models-pytorch:
- Architectures: unet, unetplusplus, deeplabv3, deeplabv3plus, fpn, pspnet, linknet, manet
- Encoders: resnet34, resnet50, efficientnet-b0, mobilenet_v2, etc.
For more details, please refer to the segmentation-models-pytorch documentation.
Let's train a U-Net model with a ResNet34 encoder.
# Train U-Net model
geoai.train_segmentation_model(
images_dir=f"{out_folder}/images",
labels_dir=f"{out_folder}/labels",
output_dir=f"{out_folder}/unet_models",
architecture="unet",
encoder_name="resnet34",
encoder_weights="imagenet",
num_channels=5,
num_classes=2, # background and building
batch_size=8,
num_epochs=50,
learning_rate=0.001,
val_split=0.2,
verbose=True,
)
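The same call works for the other architectures and encoders listed above. As a sketch (not run here), training a DeepLabV3+ model with an EfficientNet-B0 encoder only changes the architecture, encoder_name, and output_dir arguments:
# Sketch: same training setup with a different architecture/encoder.
# All other arguments are unchanged from the U-Net run above.
geoai.train_segmentation_model(
    images_dir=f"{out_folder}/images",
    labels_dir=f"{out_folder}/labels",
    output_dir=f"{out_folder}/deeplabv3plus_models",
    architecture="deeplabv3plus",
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    num_channels=5,
    num_classes=2,
    batch_size=8,
    num_epochs=50,
    learning_rate=0.001,
    val_split=0.2,
    verbose=True,
)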
Evaluate the model¶
geoai.plot_performance_metrics(
history_path=f"{out_folder}/unet_models/training_history.pth",
figsize=(15, 5),
verbose=True,
)
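If you want the raw numbers rather than the plot, the history file is a torch-serialized object. Its exact structure is a geoai implementation detail, so print what is there before relying on specific keys.
import torch

# Optional: inspect the saved training history directly.
# The keys/structure are an implementation detail of geoai.
# (Depending on your torch version, you may need weights_only=False.)
history = torch.load(f"{out_folder}/unet_models/training_history.pth")
print(history.keys() if isinstance(history, dict) else type(history))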
Run inference¶
Now we'll use the trained model to make predictions on the test image.
# Define paths
masks_path = "building_masks.tif"
model_path = f"{out_folder}/unet_models/best_model.pth"
# Run semantic segmentation inference
geoai.semantic_segmentation(
input_path=test_raster_path,
output_path=masks_path,
model_path=model_path,
architecture="unet",
encoder_name="resnet34",
num_channels=5,
num_classes=2,
window_size=512,
overlap=256,
batch_size=8,
)
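A quick way to sanity-check the output before vectorizing is to read the mask and count predicted building pixels. Class value 1 is assumed to be the building class (0 is background), matching num_classes=2 above.
import numpy as np
import rasterio

# Optional: summarize the predicted mask (assumes class 1 = building).
with rasterio.open(masks_path) as src:
    mask = src.read(1)
building_pixels = int((mask == 1).sum())
print(f"{building_pixels} building pixels out of {mask.size} total")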
Vectorize masks¶
Convert the predicted mask to vector format, orthogonalizing (squaring off) the building outlines for cleaner visualization and analysis.
output_vector_path = "building_masks.geojson"
gdf = geoai.orthogonalize(masks_path, output_vector_path, epsilon=2)
Add geometric properties¶
gdf_props = geoai.add_geometric_properties(gdf, area_unit="m2", length_unit="m")
print(f"Number of buildings: {len(gdf_props)}")
Visualize results¶
geoai.view_raster(masks_path, nodata=0, basemap=test_aerial_url, backend="ipyleaflet")
geoai.view_vector_interactive(gdf_props, column="area_m2", tiles=test_aerial_url)
gdf_filtered = gdf_props[(gdf_props["area_m2"] > 50)]
print(f"Number of buildings: {len(gdf_filtered)}")
geoai.view_vector_interactive(gdf_filtered, column="area_m2", tiles=test_aerial_url)
geoai.create_split_map(
left_layer=gdf_filtered,
right_layer=test_aerial_url,
left_args={"style": {"color": "red", "fillOpacity": 0.2}},
basemap=test_aerial_url,
)
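To keep the filtered footprints for use outside the notebook, you can write them back to disk; gdf_filtered is a regular GeoDataFrame, so standard GeoPandas I/O applies (the output filename below is just an example).
# Optional: save the filtered building footprints to a new GeoJSON file.
gdf_filtered.to_file("building_masks_filtered.geojson", driver="GeoJSON")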
Performance Metrics¶
IoU (Intersection over Union) and Dice score are both popular metrics used to evaluate the similarity between two binary masks—often in image segmentation tasks. While they are related, they are not the same.
🔸 Definitions¶
IoU (Jaccard Index)¶
$$ \text{IoU} = \frac{|A \cap B|}{|A \cup B|} $$
- Measures the overlap between predicted region $A$ and ground truth region $B$ relative to their union.
- Ranges from 0 (no overlap) to 1 (perfect overlap).
Dice Score (F1 Score for Sets)¶
$$ \text{Dice} = \frac{2|A \cap B|}{|A| + |B|} $$
- Measures the overlap between $A$ and $B$, but gives more weight to the intersection.
- Also ranges from 0 to 1.
🔸 Key Differences¶
Metric | Formula | Penalizes | Sensitivity |
---|---|---|---|
IoU | $\frac{TP}{TP + FP + FN}$ | FP and FN equally | Less sensitive to small objects |
Dice | $\frac{2TP}{2TP + FP + FN}$ | Small mismatches less harshly | More sensitive to small overlaps |
TP: True Positive, FP: False Positive, FN: False Negative
🔸 Relationship¶
Dice and IoU are mathematically related:
$$ \text{Dice} = \frac{2 \cdot \text{IoU}}{1 + \text{IoU}} \quad \text{or} \quad \text{IoU} = \frac{\text{Dice}}{2 - \text{Dice}} $$
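A tiny worked example makes the relationship concrete. For two toy 3x3 masks with 2 true positives, 1 false positive, and 1 false negative, IoU is 2 / (2 + 1 + 1) = 0.5 and Dice is 4 / (4 + 1 + 1) ≈ 0.667, which matches 2 * IoU / (1 + IoU).
import numpy as np

# Toy masks: 1 = building, 0 = background.
pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])

tp = int(np.logical_and(pred == 1, truth == 1).sum())  # |A ∩ B| = 2
fp = int(np.logical_and(pred == 1, truth == 0).sum())  # = 1
fn = int(np.logical_and(pred == 0, truth == 1).sum())  # = 1

iou = tp / (tp + fp + fn)           # 0.5
dice = 2 * tp / (2 * tp + fp + fn)  # ~0.667
print(iou, dice, 2 * iou / (1 + iou))  # the last value equals dice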
🔸 When to Use What¶
- IoU: Common in object detection and semantic segmentation benchmarks (e.g., COCO, Pascal VOC).
- Dice: Preferred in medical imaging and when class imbalance is an issue, due to its sensitivity to small regions.