Semantic Segmentation¶
Semantic segmentation is the task of assigning a class label to every pixel in an image. Unlike image classification, which produces a single label for an entire scene, or object detection, which draws bounding boxes around regions of interest, semantic segmentation produces a dense, pixel-level map that delineates the precise boundaries of every feature in the image. This makes it one of the most powerful techniques in geospatial AI, where the goal is often to map the exact extent of land cover types, buildings, water bodies, or other features across a landscape.
In remote sensing, semantic segmentation transforms raw satellite or aerial imagery into thematic maps that quantify what covers the Earth's surface and where. A single model can turn a multispectral satellite scene into a water mask, a building footprint layer, or a full land cover classification, all at the native resolution of the input imagery. The applications span environmental monitoring, urban planning, disaster response, agricultural assessment, and climate science.
This chapter introduces the foundations of semantic segmentation and demonstrates the complete workflow, from data preparation through model training and inference, across three progressively complex applications:
- Building detection from high-resolution aerial imagery, a binary segmentation task that illustrates the core concepts.
- Surface water mapping across three different sensor types: standard RGB imagery, multispectral Sentinel-2 data, and NAIP aerial photography, showing how the same architecture adapts to different input modalities.
- Land cover classification, a multi-class segmentation problem that assigns every pixel to one of thirteen land cover categories.
Each application follows a consistent pattern: acquire data, create training tiles, train a U-Net model, evaluate performance, run inference on held-out imagery, and visualize the results.
Foundations of Semantic Segmentation¶
Before diving into the hands-on applications, it is worth understanding two key concepts that underpin every model in this chapter: architectures and encoders.
A deep learning architecture defines the overall structure of a neural network: how data flows through its layers, how features are extracted at different spatial scales, and how the final output is produced. Think of it as the blueprint of a factory. The blueprint specifies where raw materials enter, how they are processed through a series of stations, and where finished products emerge. In a segmentation network, the "raw material" is an input image and the "finished product" is a pixel-wise class map.
Within this blueprint, the encoder is a specialized component that progressively compresses the input image into a compact set of feature representations. It acts like a preprocessing line that distills messy raw data into standardized, information-rich parts. The decoder then takes these compressed representations and reconstructs them back to the original spatial resolution, producing the final segmentation map. This encoder-decoder pattern is the foundation of nearly all modern segmentation architectures.
In short:
- Architecture = the factory blueprint (overall design and data flow)
- Encoder = the preprocessing line (compresses inputs into learned features)
- Decoder = the finishing line (reconstructs features into a pixel-level output)
Types of Architectures¶
Different architectures are suited for different tasks:
- Feedforward Neural Networks: simple, one-directional flow of data.
- Convolutional Neural Networks (CNNs): specialized for images, capturing spatial patterns like edges and textures.
- Recurrent Neural Networks (RNNs): designed for sequences, such as speech or time series.
- Transformers: powerful models for language and beyond, using attention mechanisms (e.g., ChatGPT).
For semantic segmentation of imagery, CNN-based encoder-decoder architectures remain the dominant approach. The encoder extracts hierarchical features at progressively coarser spatial scales, while the decoder upsamples these features back to the input resolution. Skip connections between corresponding encoder and decoder layers preserve fine-grained spatial detail that would otherwise be lost during downsampling.
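The hierarchy of scales can be made concrete with simple arithmetic. The sketch below assumes a typical five-stage encoder that halves the spatial resolution at each stage; it is an illustration of the idea, not GeoAI or PyTorch code:

```python
def stage_sizes(input_size, num_stages=5):
    """Feature-map sizes after each stride-2 downsampling step
    (illustrative sketch of a typical encoder, not GeoAI code)."""
    sizes = [input_size]
    for _ in range(num_stages):
        sizes.append(sizes[-1] // 2)
    return sizes

enc = stage_sizes(512)
print(enc)  # [512, 256, 128, 64, 32, 16]

# During decoding, each upsampled feature map is concatenated with the
# encoder feature map of the same spatial size: the skip connection.
for size in enc[-2::-1]:
    print(f"decoder upsamples to {size}x{size}; skip from encoder {size}x{size}")
```

This is why fine detail survives: the decoder never has to reconstruct a 512x512 boundary from a 16x16 bottleneck alone.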
Encoders and Transfer Learning¶
An encoder takes an input, such as an image or a sentence, and compresses it into a smaller, meaningful form called a feature representation or embedding. This process retains the essential information while filtering out noise.
The power of modern encoders comes largely from transfer learning. Rather than training an encoder from scratch on a small geospatial dataset, we start with an encoder that has already been pre-trained on millions of natural images (typically ImageNet). These pre-trained encoders have learned to recognize edges, textures, shapes, and patterns that transfer well to remote sensing imagery. Fine-tuning a pre-trained encoder on a domain-specific dataset is faster, requires less data, and typically produces better results than training from scratch.
Practical Implementation¶
The segmentation_models.pytorch library provides a modular framework that cleanly separates architectures from encoders. This means any encoder can be paired with any architecture:
- Architectures: unet, unetplusplus, deeplabv3, deeplabv3plus, fpn, pspnet, linknet, manet
- Encoders: resnet34, resnet50, efficientnet-b0, mobilenet_v2, and many more
The GeoAI package builds on this library, providing high-level functions that handle the full training and inference pipeline so that practitioners can focus on their geospatial problem rather than deep learning boilerplate.
Environment Setup¶
The code in this chapter requires the GeoAI package and a GPU for model training. The chapter can be run locally with a CUDA-capable GPU or in Google Colab with a free T4 GPU (under Runtime > Change runtime type > T4 GPU).
To install GeoAI locally, create a virtual environment and install the package. Refer to the GeoAI installation guide for full details.
conda create -n geo python=3.12
conda activate geo
conda install -c conda-forge mamba
mamba install -c conda-forge geoai
For GPU support:
mamba install -c conda-forge geoai "pytorch=*=cuda*"
Alternatively, install via pip:
pip install geoai-py
Install Packages¶
# %pip install geoai-py
Import Libraries¶
import geoai
Building Detection from Aerial Imagery¶
Building detection is one of the most widely studied applications of semantic segmentation in remote sensing. Accurate building footprints are essential for urban planning, population estimation, disaster damage assessment, and infrastructure mapping. As a binary segmentation problem (building vs. background), it provides a clear and intuitive introduction to the segmentation workflow before tackling more complex multi-class problems later in this chapter.
In this section, we train a U-Net model on high-resolution NAIP (National Agriculture Imagery Program) aerial imagery with 3-band RGB input at 1-meter spatial resolution. The training data consists of image tiles paired with building footprint polygons that have been rasterized into binary masks.
Download Sample Data¶
The training dataset consists of a NAIP image tile and corresponding building footprints from the Overture Maps project. A separate NAIP image serves as the held-out test set for inference. For instructions on downloading NAIP imagery and Overture Maps data for a custom area of interest, see the data download tutorial.
train_raster_url = "https://data.source.coop/opengeos/geoai/naip_rgb_train.tif"
train_vector_url = (
"https://data.source.coop/opengeos/geoai/naip_train_buildings.geojson"
)
test_raster_url = "https://data.source.coop/opengeos/geoai/naip_test.tif"
train_raster_path = geoai.download_file(train_raster_url)
train_vector_path = geoai.download_file(train_vector_url)
test_raster_path = geoai.download_file(test_raster_url)
Visualize Sample Data¶
Before training, it is good practice to inspect the data. The following cells display raster metadata, an interactive overlay of the building footprints on the training imagery, and the test image that the model will later predict on.
geoai.get_raster_info(train_raster_path)
geoai.view_vector_interactive(train_vector_path, tiles=train_raster_url)
geoai.view_raster(test_raster_url)
Create Training Tiles¶
Deep learning models process fixed-size image patches rather than entire scenes. The export_geotiff_tiles function slices the training raster and its corresponding labels into 512x512 pixel tiles with a stride of 256 pixels, creating overlapping patches that increase the effective training set size and help the model learn features near tile boundaries.
out_folder = "buildings"
tiles = geoai.export_geotiff_tiles(
in_raster=train_raster_path,
out_folder=out_folder,
in_class_data=train_vector_path,
tile_size=512,
stride=256,
buffer_radius=0,
)
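The tile layout produced by a 512-pixel tile size and 256-pixel stride can be sketched in plain Python. This illustrates the general scheme, including one common way of handling rasters whose size is not a multiple of the stride (snapping the last tile to the edge); it is not export_geotiff_tiles's actual implementation:

```python
def tile_origins(width, height, tile_size=512, stride=256):
    """Top-left corners of overlapping tiles covering a raster.
    Illustrative sketch, not GeoAI's implementation."""
    xs = list(range(0, max(width - tile_size, 0) + 1, stride))
    ys = list(range(0, max(height - tile_size, 0) + 1, stride))
    # Cover the right/bottom edges when the size is not a stride multiple.
    if xs[-1] + tile_size < width:
        xs.append(width - tile_size)
    if ys[-1] + tile_size < height:
        ys.append(height - tile_size)
    return [(x, y) for y in ys for x in xs]

origins = tile_origins(1280, 1280)
print(len(origins))  # 16 overlapping tiles for a 1280x1280 raster
```

With a stride of half the tile size, most pixels appear in up to four tiles, which is what increases the effective training set size.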
Train the Model¶
The model uses a U-Net architecture with a ResNet34 encoder pre-trained on ImageNet. U-Net's skip connections between the encoder and decoder preserve fine spatial details critical for delineating building edges. With only 2 classes (background and building), 3 input channels (RGB), and a 20% validation split, this configuration provides a solid baseline for binary segmentation.
# Train U-Net model
geoai.train_segmentation_model(
images_dir=f"{out_folder}/images",
labels_dir=f"{out_folder}/labels",
output_dir=f"{out_folder}/unet_models",
architecture="unet",
encoder_name="resnet34",
encoder_weights="imagenet",
num_channels=3,
num_classes=2, # background and building
batch_size=8,
num_epochs=20,
learning_rate=0.001,
val_split=0.2,
verbose=True,
)
Evaluate the Model¶
Training curves reveal whether the model is learning effectively. A healthy training run shows both training and validation loss decreasing over epochs, with validation loss closely tracking training loss. A growing gap between the two indicates overfitting, where the model memorizes training examples rather than learning generalizable features.
geoai.plot_performance_metrics(
history_path=f"{out_folder}/unet_models/training_history.pth",
figsize=(15, 5),
verbose=True,
)
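To make "a growing gap" concrete, here is a toy heuristic (not part of GeoAI) that scans the two loss histories for the point where validation loss turns upward while training loss keeps falling:

```python
def diverging_epoch(train_loss, val_loss, patience=3):
    """Return the first epoch index after which validation loss rises for
    `patience` consecutive epochs while training loss keeps falling.
    A toy overfitting heuristic, not part of GeoAI."""
    for i in range(len(val_loss) - patience):
        val_rising = all(val_loss[j + 1] > val_loss[j] for j in range(i, i + patience))
        train_falling = all(train_loss[j + 1] < train_loss[j] for j in range(i, i + patience))
        if val_rising and train_falling:
            return i
    return None

# Hypothetical loss curves for illustration:
train = [0.90, 0.60, 0.40, 0.30, 0.25, 0.20, 0.17]
val = [0.95, 0.70, 0.50, 0.45, 0.50, 0.55, 0.60]
print(diverging_epoch(train, val))  # 3: validation loss turns upward at epoch 3
```

In practice, this is exactly why best_model.pth (the best-validation checkpoint) is preferred over the final epoch's weights.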
Run Inference¶
With the trained model in hand, we apply it to the held-out test image. The semantic_segmentation function slides a 512x512 window across the input raster with 256-pixel overlap, runs each patch through the model, and stitches the predictions into a seamless output raster. The overlap ensures smooth transitions between adjacent patches.
masks_path = "naip_test_semantic_prediction.tif"
model_path = f"{out_folder}/unet_models/best_model.pth"
geoai.semantic_segmentation(
input_path=test_raster_path,
output_path=masks_path,
model_path=model_path,
architecture="unet",
encoder_name="resnet34",
num_channels=3,
num_classes=2,
window_size=512,
overlap=256,
batch_size=4,
)
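The effect of the 256-pixel overlap is easiest to see in one dimension: with a 512-pixel window and 50% overlap, every interior position falls inside two windows, so predictions there can be blended. This sketch shows only the coverage pattern; the actual stitching logic lives inside semantic_segmentation:

```python
def coverage_1d(length, window=512, overlap=256):
    """Count how many sliding windows cover each position along one axis.
    Sketch of the overlapped-inference pattern, not GeoAI's internals."""
    stride = window - overlap
    counts = [0] * length
    start = 0
    while True:
        for i in range(start, min(start + window, length)):
            counts[i] += 1
        if start + window >= length:
            break
        # Snap the final window to the edge so the whole axis is covered.
        start = min(start + stride, length - window)
    return counts

counts = coverage_1d(1024)
print(counts[0], counts[512], counts[1023])  # 1 2 1: interior doubly covered
```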
Visualize Raster Masks¶
The raw prediction is a raster where each pixel holds a class index. Overlaying it on the original imagery provides a quick visual assessment of model quality.
geoai.view_raster(
masks_path,
nodata=0,
colormap="binary",
basemap=test_raster_url,
)
Vectorize Predictions¶
For many downstream applications, vector polygons are more useful than raster masks. The orthogonalize function converts the raster predictions to polygons and regularizes them into clean, right-angled shapes that better represent building geometry.
output_vector_path = "naip_test_semantic_prediction.geojson"
gdf = geoai.orthogonalize(masks_path, output_vector_path, epsilon=2)
Add Geometric Properties¶
Computing geometric properties such as area, perimeter, and shape indices enables filtering out noise and analyzing the spatial characteristics of detected buildings.
gdf_props = geoai.add_geometric_properties(gdf, area_unit="m2", length_unit="m")
Visualize Results¶
The interactive maps below show detected buildings colored by area. Filtering out very small polygons (< 10 m^2) removes noise from the predictions, and a split map provides a side-by-side comparison of predictions against the original imagery.
geoai.view_vector_interactive(gdf_props, column="area_m2", tiles=test_raster_url)
gdf_filtered = gdf_props[(gdf_props["area_m2"] > 10)]
geoai.view_vector_interactive(gdf_filtered, column="area_m2", tiles=test_raster_url)
geoai.create_split_map(
left_layer=gdf_filtered,
right_layer=test_raster_url,
left_args={"style": {"color": "red", "fillOpacity": 0.2}},
basemap=test_raster_url,
)
geoai.empty_cache()
Surface Water Mapping¶
Surface water mapping is among the most important applications of geospatial AI. Accurate, up-to-date water body maps are essential for ecosystem monitoring, flood risk assessment, agricultural water management, urban planning, and climate change research. Water bodies have irregular, often complex boundaries that make pixel-level segmentation far more appropriate than bounding-box detection.
This section demonstrates surface water mapping across three progressively complex data sources, each introducing new concepts while reinforcing the core segmentation workflow:
- Non-georeferenced satellite imagery (RGB, standard image formats) introduces the basic workflow.
- Sentinel-2 multispectral imagery (6 spectral bands, GeoTIFF) demonstrates how additional spectral information improves discrimination.
- NAIP aerial imagery (4 bands, 1-meter resolution) applies the workflow to the highest-resolution data commonly available.
Water Mapping with Non-Georeferenced Satellite Imagery¶
Working with standard image formats (JPG/PNG) without embedded geographic coordinates is a natural starting point. Many publicly available satellite image datasets are distributed this way, and this format isolates the core computer vision problem from geospatial complexity. The techniques learned here transfer directly to georeferenced imagery.
Download Sample Data¶
The waterbody dataset from Kaggle (credit: Francisco Escobar) contains 2,841 satellite images with corresponding binary water masks. The dataset offers diverse geographic coverage across continents and climate zones, includes lakes, rivers, ponds, and coastal areas, and spans multiple seasons and lighting conditions.
Dataset characteristics:
- Total image pairs: 2,841 training examples
- Image format: RGB satellite imagery (3 channels)
- Mask format: Binary masks (255 = water, 0 = background)
- Variable image sizes: 256x256 to 1024x1024+ pixels
- Global coverage: Samples from diverse geographic regions and water body types
url = "https://data.source.coop/opengeos/geoai/waterbody-dataset.zip"
out_folder = geoai.download_file(url)
print(f"Downloaded dataset to {out_folder}")
The unzipped dataset contains two folders: images and masks. Each folder contains 2,841 images in JPG format. The images folder holds the original satellite imagery, and the masks folder holds the corresponding binary water masks.
Train the Model¶
As with building detection, we use a U-Net architecture with a ResNet34 encoder initialized from ImageNet weights. U-Net's encoder-decoder structure with skip connections is well suited to capturing the irregular boundaries of water bodies at multiple spatial scales.
ResNet34 strikes a good balance between model capacity and computational efficiency. Its 34-layer depth is sufficient for learning the spectral and textural features that distinguish water from land, while remaining fast enough for rapid iteration. Transfer learning from ImageNet gives the encoder a strong initialization: it already recognizes edges, textures, and color gradients, and needs only to specialize these features for water detection.
Key training parameters:
- num_channels=3: RGB input
- num_classes=2: binary classification (background vs. water)
- batch_size=16: balances GPU memory usage and gradient stability
- learning_rate=0.001: standard starting point for the Adam optimizer
- val_split=0.2: reserves 20% of the data for validation
- target_size=(512, 512): standardizes variable-sized images to a uniform input size
For a complete list of available architectures and encoders, refer to the segmentation_models.pytorch documentation.
# Test train_segmentation_model with automatic size detection
geoai.train_segmentation_model(
images_dir=f"{out_folder}/images",
labels_dir=f"{out_folder}/masks",
output_dir=f"{out_folder}/unet_models",
architecture="unet", # The architecture to use for the model
encoder_name="resnet34", # The encoder to use for the model
encoder_weights="imagenet", # The weights to use for the encoder
num_channels=3, # number of channels in the input image
num_classes=2, # background and water
batch_size=16, # The number of images to process in each batch
num_epochs=20, # training for 20 epochs, in practice you may need more epochs for best results
learning_rate=0.001, # learning rate for the optimizer
val_split=0.2, # 20% of the data for validation
target_size=(512, 512), # target size of the input image
verbose=True, # print progress
)
Training produces several output files in the unet_models directory:
- best_model.pth: the checkpoint with the highest validation IoU
- final_model.pth: the checkpoint from the final epoch
- training_history.pth: complete training metrics for analysis and plotting
- training_summary.txt: a human-readable summary of configuration and results
Evaluate the Model¶
Model evaluation determines whether the network has learned to generalize beyond the training data. Three metrics are particularly important for semantic segmentation:
Loss measures the discrepancy between predictions and ground truth. During training, both training loss and validation loss should decrease. If validation loss begins to rise while training loss continues falling, the model is overfitting, memorizing training examples rather than learning generalizable patterns.
IoU (Intersection over Union), also called the Jaccard index, is the standard metric for segmentation quality. It is defined as the area of overlap between the prediction and ground truth divided by the area of their union. IoU ranges from 0.0 (no overlap) to 1.0 (perfect agreement). An IoU above 0.7 is generally considered good performance.
F1 Score is the harmonic mean of precision and recall. Precision measures the fraction of predicted positive pixels that are truly positive (how many of the pixels the model labeled as water actually are water). Recall measures the fraction of actual positive pixels that the model correctly identified (how many of the real water pixels the model found). The F1 score balances these two concerns: a model with high precision but low recall detects water accurately where it predicts it, but misses many water pixels; a model with high recall but low precision finds most water pixels but also produces many false positives. F1 ranges from 0.0 to 1.0, with higher values indicating a better balance between precision and recall.
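Both metrics reduce to counts of true positives, false positives, and false negatives. A minimal illustration on toy binary masks (division-by-zero guards omitted for brevity):

```python
def iou_and_f1(pred, truth):
    """IoU and F1 for binary masks given as flat 0/1 sequences.
    Minimal illustration; guards for empty masks are omitted."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return iou, f1

pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
iou, f1 = iou_and_f1(pred, truth)
print(iou, f1)  # 0.6 0.75
```

Note that F1 is always at least as large as IoU for the same prediction, so the two should be read together rather than compared across models using different metrics.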
geoai.plot_performance_metrics(
history_path=f"{out_folder}/unet_models/training_history.pth",
figsize=(15, 5),
verbose=True,
)
Run Inference on a Single Image¶
With the model trained and evaluated, we can apply it to individual images. Note that for a rigorous evaluation, inference should always be performed on images the model has never seen during training. Here we use one of the training images to demonstrate the workflow; the batch inference section that follows uses a fully independent test set.
index = 3 # change it to other image index, e.g., 100
test_image_path = f"{out_folder}/images/water_body_{index}.jpg"
ground_truth_path = f"{out_folder}/masks/water_body_{index}.jpg"
prediction_path = f"{out_folder}/prediction/water_body_{index}.png" # save as png to preserve exact values and avoid compression artifacts
model_path = f"{out_folder}/unet_models/best_model.pth"
geoai.semantic_segmentation(
input_path=test_image_path,
output_path=prediction_path,
model_path=model_path,
architecture="unet",
encoder_name="resnet34",
num_channels=3,
num_classes=2,
window_size=512,
overlap=256,
batch_size=32,
)
fig = geoai.plot_prediction_comparison(
original_image=test_image_path,
prediction_image=prediction_path,
ground_truth_image=ground_truth_path,
titles=["Original", "Prediction", "Ground Truth"],
figsize=(15, 5),
save_path=f"{out_folder}/prediction/water_body_{index}_comparison.png",
show_plot=True,
)
Run Inference on Multiple Images¶
Operational applications require processing large volumes of imagery efficiently. The semantic_segmentation_batch function processes an entire directory of images with consistent parameters, producing corresponding prediction masks for each input.
url = "https://data.source.coop/opengeos/geoai/waterbody-dataset-sample.zip"
data_dir = geoai.download_file(url)
print(f"Downloaded dataset to {data_dir}")
images_dir = f"{data_dir}/images"
masks_dir = f"{data_dir}/masks"
predictions_dir = f"{data_dir}/predictions"
geoai.semantic_segmentation_batch(
input_dir=images_dir,
output_dir=predictions_dir,
model_path=model_path,
architecture="unet",
encoder_name="resnet34",
num_channels=3,
num_classes=2,
window_size=512,
overlap=256,
batch_size=4,
quiet=True,
)
geoai.empty_cache()
Water Mapping with Sentinel-2 Multispectral Imagery¶
The previous section demonstrated water detection using standard RGB imagery. In practice, satellite sensors capture far more spectral information than the three visible bands. Sentinel-2, a European Space Agency (ESA) mission, provides 13 spectral bands at 10-60 meter resolution with global coverage every 5 days. The additional spectral bands, particularly in the near-infrared and shortwave infrared regions, dramatically improve the ability to discriminate water from other land cover types.
Water has a distinctive spectral signature: it absorbs strongly in the near-infrared and shortwave infrared, producing very low reflectance values in these bands compared to vegetation and soil. By including these bands as additional input channels, the segmentation model can exploit physical properties of water that are invisible in RGB imagery alone.
The six spectral bands used in this analysis are:
| Band | Spectral Region | Role in Water Detection |
|---|---|---|
| Blue (490 nm) | Visible | Water absorption, atmospheric correction |
| Green (560 nm) | Visible | Water clarity, vegetation health |
| Red (665 nm) | Visible | Land-water contrast |
| NIR (842 nm) | Near-infrared | Strong water absorption; critical discriminator |
| SWIR1 (1610 nm) | Shortwave infrared | Very low water reflectance; excellent separator |
| SWIR2 (2190 nm) | Shortwave infrared | Separates water from wet soil and vegetation |
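The spectral behavior summarized in the table is what classical indices such as the Normalized Difference Water Index (NDWI = (Green - NIR) / (Green + NIR)) exploit, and it is also what the learned model can pick up from the extra bands. A minimal sketch with illustrative reflectance values (not taken from the dataset):

```python
def ndwi(green, nir):
    """Normalized Difference Water Index: (green - nir) / (green + nir).
    Water absorbs NIR strongly, so NDWI is typically positive over water
    and negative over vegetation and soil."""
    return (green - nir) / (green + nir)

# Illustrative surface reflectance values (not from the dataset):
print(round(ndwi(green=0.08, nir=0.02), 2))  # 0.6: water
print(round(ndwi(green=0.10, nir=0.40), 2))  # -0.6: vegetation
```

Unlike a fixed NDWI threshold, the trained network can also use spatial context and the SWIR bands, which is why it handles edge cases like wet soil and shadows better.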
Download Sample Data¶
The Earth Surface Water Dataset (credit: Xin Luo) contains Sentinel-2 Level 2A (atmospherically corrected) imagery with 6 spectral bands and expert-annotated water masks. The dataset includes separate training and validation splits for rigorous evaluation.
url = "https://data.source.coop/opengeos/geoai/dset-s2.zip"
data_dir = geoai.download_file(url, output_path="dset-s2.zip")
The dataset is organized into four directories:
- tra_scene / tra_truth: training images and corresponding water masks
- val_scene / val_truth: independent validation images and masks
images_dir = f"{data_dir}/tra_scene"
masks_dir = f"{data_dir}/tra_truth"
tiles_dir = f"{data_dir}/tiles"
Create Training Tiles¶
As before, the large Sentinel-2 scenes are sliced into 512x512 tiles. Because the training set contains multiple scenes, the export_geotiff_tiles_batch function processes all image-mask pairs in a directory at once.
result = geoai.export_geotiff_tiles_batch(
images_folder=images_dir,
masks_folder=masks_dir,
output_folder=tiles_dir,
tile_size=512,
stride=384,
quiet=True,
)
Train the Model¶
The model configuration is identical to the RGB water mapping model except for num_channels=6, which tells the encoder to accept 6-band input. When the number of input channels differs from the 3 channels that ImageNet weights expect, the first convolutional layer is automatically adapted: the pre-trained weights for the original 3 channels are preserved, and additional channels are initialized with reasonable defaults.
geoai.train_segmentation_model(
images_dir=f"{tiles_dir}/images",
labels_dir=f"{tiles_dir}/masks",
output_dir=f"{tiles_dir}/unet_models",
architecture="unet",
encoder_name="resnet34",
encoder_weights="imagenet",
num_channels=6,
num_classes=2, # background and water
batch_size=32,
num_epochs=20, # training for 20 epochs, in practice you may need more epochs for best results
learning_rate=0.001,
val_split=0.2,
verbose=True,
)
Evaluate the Model¶
geoai.plot_performance_metrics(
history_path=f"{tiles_dir}/unet_models/training_history.pth",
figsize=(15, 5),
verbose=True,
)
Run Inference on Validation Data¶
The validation set was held out entirely during training, providing an unbiased estimate of model performance on unseen imagery.
images_dir = f"{data_dir}/val_scene"
masks_dir = f"{data_dir}/val_truth"
predictions_dir = f"{data_dir}/predictions"
model_path = f"{tiles_dir}/unet_models/best_model.pth"
geoai.semantic_segmentation_batch(
input_dir=images_dir,
output_dir=predictions_dir,
model_path=model_path,
architecture="unet",
encoder_name="resnet34",
num_channels=6,
num_classes=2,
window_size=512,
overlap=256,
batch_size=32,
quiet=True,
)
Visualize Results¶
A side-by-side comparison of the original image (displayed as a false-color composite using bands 5-4-3), the model prediction, and the ground truth mask reveals how well the model captures water body boundaries.
image_id = "S2A_L2A_20190318_N0211_R061" # Change to other image id, e.g., S2B_L2A_20190620_N0212_R047
test_image_path = f"{data_dir}/val_scene/{image_id}_6Bands_S2.tif"
ground_truth_path = f"{data_dir}/val_truth/{image_id}_S2_Truth.tif"
prediction_path = f"{data_dir}/predictions/{image_id}_6Bands_S2_mask.tif"
save_path = f"{data_dir}/{image_id}_6Bands_S2_comparison.png"
fig = geoai.plot_prediction_comparison(
original_image=test_image_path,
prediction_image=prediction_path,
ground_truth_image=ground_truth_path,
titles=["Original", "Prediction", "Ground Truth"],
figsize=(15, 5),
save_path=save_path,
show_plot=True,
indexes=[5, 4, 3],
divider=5000,
)
Apply the Model to New Sentinel-2 Imagery¶
To demonstrate real-world applicability, we download a Sentinel-2 scene over Minnesota and run the trained model on it. This tests the model's ability to generalize to imagery from a different geographic region and acquisition date than the training data.
s2_path = "s2.tif"
url = "https://data.source.coop/opengeos/geoai/s2-minnesota-2025-08-31-subset.tif"
geoai.download_file(url, output_path=s2_path)
geoai.view_raster(
s2_path, indexes=[4, 3, 2], vmin=0, vmax=5000, layer_name="Sentinel-2"
)
s2_mask = "s2_mask.tif"
model_path = f"{tiles_dir}/unet_models/best_model.pth"
geoai.semantic_segmentation(
input_path=s2_path,
output_path=s2_mask,
model_path=model_path,
architecture="unet",
encoder_name="resnet34",
num_channels=6,
num_classes=2,
window_size=512,
overlap=256,
batch_size=32,
)
geoai.view_raster(
s2_mask,
nodata=0,
colormap="binary",
layer_name="Water",
basemap=s2_path,
basemap_args={"indexes": [4, 3, 2], "vmin": 0, "vmax": 5000},
)
geoai.empty_cache()
Water Mapping with NAIP Aerial Imagery¶
The final water mapping example uses NAIP (National Agriculture Imagery Program) aerial photography, which provides the highest spatial resolution commonly available for large-scale analysis. NAIP imagery covers the continental United States at 1-meter resolution with four spectral bands: Red, Green, Blue, and Near-Infrared. Updated every 2-3 years, NAIP data is freely available through USGS and other data portals.
At 1-meter resolution, NAIP captures fine-scale features that are invisible at Sentinel-2's 10-20 meter resolution: narrow streams, small ponds, swimming pools, and the detailed shoreline geometry of larger water bodies. However, higher resolution also means more spectral variability within a single land cover class, which can make segmentation more challenging.
Download Sample Data¶
The training and test imagery are pre-processed NAIP scenes with corresponding rasterized water masks. For instructions on downloading NAIP imagery for a custom area of interest, see the NAIP download tutorial.
train_raster_url = "https://data.source.coop/opengeos/geoai/naip/naip_water_train.tif"
train_masks_url = "https://data.source.coop/opengeos/geoai/naip/naip_water_masks.tif"
test_raster_url = "https://data.source.coop/opengeos/geoai/naip/naip_water_test.tif"
train_raster_path = geoai.download_file(train_raster_url)
train_masks_path = geoai.download_file(train_masks_url)
test_raster_path = geoai.download_file(test_raster_url)
geoai.print_raster_info(train_raster_path, show_preview=False)
Visualize Sample Data¶
geoai.view_raster(train_masks_url, nodata=0, opacity=0.5, basemap=train_raster_url)
geoai.view_raster(test_raster_url)
Create Training Tiles¶
out_folder = "naip"
tiles = geoai.export_geotiff_tiles(
in_raster=train_raster_path,
out_folder=out_folder,
in_class_data=train_masks_path,
tile_size=512,
stride=256,
buffer_radius=0,
)
Train the Model¶
The NAIP model uses num_channels=4 to accommodate the four-band input (RGB + NIR). A higher learning rate of 0.005 is used here, which can accelerate convergence but requires careful monitoring to avoid overshooting.
geoai.train_segmentation_model(
images_dir=f"{out_folder}/images",
labels_dir=f"{out_folder}/labels",
output_dir=f"{out_folder}/models",
architecture="unet",
encoder_name="resnet34",
encoder_weights="imagenet",
num_channels=4,
batch_size=8,
num_epochs=20,
learning_rate=0.005,
val_split=0.2,
)
Evaluate the Model¶
geoai.plot_performance_metrics(
history_path=f"{out_folder}/models/training_history.pth",
figsize=(15, 5),
verbose=True,
)
Run Inference¶
masks_path = "naip_water_prediction.tif"
model_path = f"{out_folder}/models/best_model.pth"
geoai.semantic_segmentation(
test_raster_path,
masks_path,
model_path,
architecture="unet",
encoder_name="resnet34",
encoder_weights="imagenet",
window_size=512,
overlap=128,
batch_size=32,
num_channels=4,
)
geoai.view_raster(
masks_path,
nodata=0,
layer_name="Water",
basemap=test_raster_url,
)
Vectorize and Analyze Predictions¶
Converting raster masks to vector polygons enables spatial analysis of individual water bodies. Filtering by geometric properties, such as minimum area or elongation ratio, removes noise and isolates features with physically plausible shapes.
output_path = "naip_water_prediction.geojson"
gdf = geoai.raster_to_vector(
masks_path, output_path, min_area=1000, simplify_tolerance=1
)
gdf = geoai.add_geometric_properties(gdf)
len(gdf)
geoai.view_vector_interactive(gdf, tiles=test_raster_url)
The elongation ratio helps distinguish genuine water bodies from long, narrow artifacts such as road edges or shadows. Filtering out highly elongated features (elongation > 10) retains compact water bodies while removing false positives.
gdf["elongation"].hist()
gdf_filtered = gdf[gdf["elongation"] < 10]
len(gdf_filtered)
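The exact formulation of the elongation attribute computed by add_geometric_properties is not shown here, but a common proxy, the aspect ratio of a feature's bounding box, conveys the idea. The footprints below are hypothetical:

```python
def bbox_elongation(coords):
    """Aspect ratio of a polygon's axis-aligned bounding box, a simple
    proxy for elongation. Illustrative only: GeoAI's
    add_geometric_properties may define elongation differently."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return max(width, height) / min(width, height)

# Hypothetical footprints (coordinates in meters):
pond = [(0, 0), (30, 0), (30, 25), (0, 25)]           # compact
road_artifact = [(0, 0), (400, 0), (400, 3), (0, 3)]  # long and narrow

print(bbox_elongation(pond))           # 1.2: kept by the < 10 filter
print(bbox_elongation(road_artifact))  # ~133: removed
```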
Visualize Results¶
geoai.view_vector_interactive(gdf_filtered, tiles=test_raster_url)
geoai.create_split_map(
left_layer=gdf_filtered,
right_layer=test_raster_url,
left_args={"style": {"color": "red", "fillOpacity": 0.2}},
basemap=test_raster_url,
)
geoai.empty_cache()
Land Cover Classification¶
The preceding sections addressed binary segmentation: each pixel belongs to either the target class or the background. Land cover classification extends this to multi-class segmentation, where every pixel is assigned to one of many possible categories such as water, forest, impervious surface, cropland, or wetland.
Multi-class segmentation is fundamentally the same task as binary segmentation, with two key differences. First, the num_classes parameter increases from 2 to the number of land cover categories (13 in this example). Second, the loss function and evaluation metrics operate across all classes simultaneously, and the model must learn to distinguish between classes that may be spectrally similar, such as deciduous forest and cropland in summer.
The classification scheme used here is adapted from the Chesapeake Land Cover project, which provides 13 land cover classes for the Chesapeake Bay watershed region. The training data consists of NAIP 4-band aerial imagery paired with rasterized land cover labels in which each pixel contains an integer class value (0-12).
Download Sample Data¶
train_raster_url = (
"https://data.source.coop/opengeos/geoai/m_3807511_ne_18_060_20181104.tif"
)
train_landcover_url = (
"https://data.source.coop/opengeos/geoai/m_3807511_ne_18_060_20181104_landcover.tif"
)
test_raster_url = (
"https://data.source.coop/opengeos/geoai/m_3807511_se_18_060_20181104.tif"
)
train_raster_path = geoai.download_file(train_raster_url)
train_landcover_path = geoai.download_file(train_landcover_url)
test_raster_path = geoai.download_file(test_raster_url)
Visualize Sample Data¶
geoai.view_raster(train_landcover_url, basemap=train_raster_url)
geoai.view_raster(test_raster_url)
Create Training Tiles¶
Label images for multi-class segmentation must contain integer class values (0, 1, 2, ..., 12) rather than binary masks. Class 0 typically represents background, and classes 1-12 represent the land cover types.
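Before training, it can be worth sanity-checking that the label tiles contain only the expected integer class values; a tile with stray values (for example, from a nodata fill) would silently corrupt training. A small illustrative helper (hypothetical; reading the tile from disk would normally use rasterio):

```python
def validate_label_values(label_array, num_classes=13):
    """Check that a label tile contains only integers in [0, num_classes).

    label_array: nested list (H, W) of per-pixel class values.
    Returns the set of values found; raises ValueError otherwise.
    """
    valid = set(range(num_classes))
    found = {value for row in label_array for value in row}
    unexpected = found - valid
    if unexpected:
        raise ValueError(f"Unexpected label values: {sorted(unexpected)}")
    return found

tile = [[0, 1, 4], [12, 4, 0]]
print(sorted(validate_label_values(tile)))  # [0, 1, 4, 12]
```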
out_folder = "landcover"
tiles = geoai.export_geotiff_tiles(
in_raster=train_raster_path,
out_folder=out_folder,
in_class_data=train_landcover_path,
tile_size=512,
stride=256,
buffer_radius=0,
)
Train the Model¶
The model configuration mirrors the previous examples but with num_classes=13 to accommodate the full land cover classification scheme. With more classes to distinguish, a longer training run (20 epochs) gives the model sufficient time to learn the subtle spectral differences between categories.
# Train U-Net model
geoai.train_segmentation_model(
images_dir=f"{out_folder}/images",
labels_dir=f"{out_folder}/labels",
output_dir=f"{out_folder}/unet_models",
architecture="unet",
encoder_name="resnet34",
encoder_weights="imagenet",
num_channels=4,
num_classes=13,
batch_size=8,
num_epochs=20,
learning_rate=0.001,
val_split=0.2,
verbose=True,
plot_curves=True,
)
Run Inference¶
# Define paths
masks_path = "naip_test_semantic_prediction.tif"
model_path = f"{out_folder}/unet_models/best_model.pth"
# Run semantic segmentation inference
geoai.semantic_segmentation(
input_path=test_raster_path,
output_path=masks_path,
model_path=model_path,
architecture="unet",
encoder_name="resnet34",
num_channels=4,
num_classes=13,
window_size=512,
overlap=256,
batch_size=4,
)
Visualize Results¶
The predicted class map is colorized using the same color scheme as the training labels, enabling direct visual comparison.
geoai.write_colormap(masks_path, train_landcover_path, output=masks_path)
geoai.view_raster(masks_path, basemap=test_raster_url)
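Under the hood, a colormap is simply a lookup table from integer class values to RGB colors; geoai.write_colormap copies that table from the training labels into the prediction raster. As a conceptual sketch (with a hypothetical three-class palette, not the Chesapeake scheme's actual colors):

```python
def apply_colormap(class_map, palette):
    """Map each integer class value to an RGB triple.

    class_map: nested list (H, W) of class indices.
    palette: dict mapping class index -> (r, g, b).
    """
    return [[palette[value] for value in row] for row in class_map]

# Hypothetical palette for three of the thirteen classes
palette = {0: (0, 0, 0), 1: (0, 0, 255), 2: (0, 128, 0)}
class_map = [[0, 1], [2, 1]]
print(apply_colormap(class_map, palette))
# [[(0, 0, 0), (0, 0, 255)], [(0, 128, 0), (0, 0, 255)]]
```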
Summary¶
This chapter presented the complete semantic segmentation workflow for geospatial applications, progressing from binary building detection through binary water mapping to multi-class land cover classification. Several key themes emerged across these applications:
Architecture and encoder choice: The U-Net architecture with a ResNet34 encoder proved effective across all tasks, sensor types, and class configurations. Its encoder-decoder structure with skip connections preserves the fine spatial detail needed for accurate boundary delineation, while transfer learning from ImageNet provides a strong initialization regardless of the number of input channels.
Adapting to different sensors: The same fundamental workflow applies whether the input is 3-band RGB imagery, 4-band NAIP, or 6-band Sentinel-2 data. The primary change is the num_channels parameter; the rest of the pipeline, from tiling through training and inference, remains identical. Additional spectral bands, particularly in the near-infrared and shortwave infrared, provide physically meaningful features that improve discrimination for targets like water.
From raster to vector: Segmentation predictions are raster masks, but many applications require vector polygons. Post-processing steps such as vectorization, shape regularization, geometric property computation, and filtering by area or elongation transform raw predictions into analysis-ready geospatial features.
Evaluation matters: IoU, F1 score, precision, and recall quantify segmentation quality from complementary angles, while training curves reveal whether a model is underfitting, overfitting, or converging well. Always evaluate on held-out data that the model has never seen during training.
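The metrics named above all derive from the same confusion counts between a prediction and the ground truth. A minimal sketch for the binary case (pure Python, for illustration):

```python
def segmentation_metrics(pred, truth):
    """Compute IoU, precision, recall, and F1 for binary masks.

    pred, truth: flat lists of 0/1 pixel labels of equal length.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}

pred = [1, 1, 0, 1, 0, 0]
truth = [1, 0, 0, 1, 1, 0]
print(segmentation_metrics(pred, truth))
# tp=2, fp=1, fn=1 -> iou=0.5, precision=2/3, recall=2/3, f1=2/3
```

Note that F1 and IoU are monotonically related (F1 = 2*IoU / (1 + IoU)), so they rank models identically; precision and recall are the pair that exposes the trade-off between false positives and missed pixels. For multi-class maps, the same counts are computed per class and averaged.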