Train a Semantic Segmentation Model with JPEG-2000 Images¶
This notebook demonstrates how to train a semantic segmentation model with JPEG 2000 images. JPEG 2000 is a wavelet-based image compression standard that supports both lossless and lossy compression, and it is widely used in remote sensing and GIS applications. It is generally more efficient and flexible than the older JPEG format, which is almost always used in its lossy mode.
To read JPEG 2000 images, install the libgdal-jp2openjpeg package, which provides the JP2OpenJPEG driver for GDAL:
conda install conda-forge::libgdal-jp2openjpeg
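After installing, it can help to confirm that GDAL actually sees the JPEG 2000 driver before running the rest of the notebook. The sketch below assumes the GDAL Python bindings (`osgeo`) are importable in the active environment; the helper name `has_jp2_driver` is hypothetical and not part of geoai.

```python
def has_jp2_driver(driver_name="JP2OpenJPEG"):
    """Return True if GDAL's Python bindings report the named driver."""
    try:
        from osgeo import gdal  # GDAL Python bindings (assumed to be installed)
    except ImportError:
        return False  # bindings missing, so the driver cannot be used from Python
    # GetDriverByName returns None when the driver is not registered
    return gdal.GetDriverByName(driver_name) is not None


print("JP2OpenJPEG available:", has_jp2_driver())
```

If this prints `False`, check that the package was installed into the same environment that runs the notebook kernel.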
To convert an existing GeoTIFF to JPEG 2000, you can use gdal_translate:
gdal_translate -of JP2OpenJPEG input.tif output.jp2
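Note that the JP2OpenJPEG driver compresses lossily with its default settings. According to the GDAL driver documentation, lossless output requires the creation options `QUALITY=100` and `REVERSIBLE=YES`. The following is a minimal sketch that builds such a command from Python; `build_lossless_jp2_command` is a hypothetical helper and the file names are placeholders.

```python
import shlex


def build_lossless_jp2_command(src, dst):
    """Build a gdal_translate argument list for lossless JPEG 2000 output."""
    return [
        "gdal_translate",
        "-of", "JP2OpenJPEG",
        "-co", "QUALITY=100",     # keep full quality (no rate reduction)
        "-co", "REVERSIBLE=YES",  # use the reversible (integer) wavelet filter
        src,
        dst,
    ]


cmd = build_lossless_jp2_command("input.tif", "output.jp2")
print(shlex.join(cmd))

# To actually run the conversion (requires the GDAL command-line tools on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```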
Install packages¶
Ensure the required packages are installed. Uncomment the following line to install geoai-py.
# %pip install geoai-py
Import libraries¶
import geoai
Download sample data¶
We'll use the same NAIP imagery and building footprints as the Mask R-CNN example for consistency, with the rasters stored as JPEG 2000 (.jp2) files.
train_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_rgb_train.jp2"
)
train_vector_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_train_buildings.geojson"
test_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_test.jp2"
)
train_raster_path = geoai.download_file(train_raster_url)
train_vector_path = geoai.download_file(train_vector_url)
test_raster_path = geoai.download_file(test_raster_url)
Create training data¶
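We'll split the training raster into 512 × 512 pixel image tiles with matching label masks rasterized from the building footprints. A stride of 256 pixels means adjacent tiles overlap by 50%.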
out_folder = "buildings"
tiles = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=out_folder,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
)
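To get a feel for how `tile_size` and `stride` determine the size of the training set, the sketch below counts the tiles a raster would yield. It assumes that only full tiles fitting inside the raster are kept; how `export_geotiff_tiles` treats partial edge tiles may differ, and the raster dimensions used here are made up for illustration.

```python
def count_tiles(width, height, tile_size=512, stride=256):
    """Count full tiles on a regular grid (assumes partial edge tiles are dropped)."""

    def along(extent):
        if extent < tile_size:
            return 0  # raster is smaller than a single tile along this axis
        return (extent - tile_size) // stride + 1

    return along(width) * along(height)


# Fraction of each tile shared with its neighbor along one axis
overlap_fraction = 1 - 256 / 512
print(overlap_fraction)

# Hypothetical raster of 2048 x 1536 pixels: 7 columns x 5 rows
print(count_tiles(2048, 1536))
```

A smaller stride produces more (and more strongly overlapping) tiles, which increases training time but gives the model more views of each building.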
Train semantic segmentation model¶
Now we'll train a semantic segmentation model using the train_segmentation_model function. This function supports various architectures and encoders from segmentation-models-pytorch:
- Architectures: unet, unetplusplus, deeplabv3, deeplabv3plus, fpn, pspnet, linknet, manet
- Encoders: resnet34, resnet50, efficientnet-b0, mobilenet_v2, etc.
For more details, please refer to the segmentation-models-pytorch documentation.
# Train U-Net model
geoai.train_segmentation_model(
    images_dir=f"{out_folder}/images",
    labels_dir=f"{out_folder}/labels",
    output_dir=f"{out_folder}/unet_models",
    architecture="unet",
    encoder_name="resnet34",
    encoder_weights="imagenet",
    num_channels=3,
    num_classes=2,  # background and building
    batch_size=8,
    num_epochs=5,
    learning_rate=0.001,
    val_split=0.2,
    verbose=True,
)
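The `val_split=0.2` argument holds out 20% of the tiles for validation. The sketch below shows one way such a split could be made, assuming a seeded random shuffle; the splitting strategy geoai uses internally may differ, and `split_tiles` and the tile names are hypothetical.

```python
import random


def split_tiles(tile_names, val_split=0.2, seed=42):
    """Shuffle tile names reproducibly and split them into (train, val) lists."""
    names = sorted(tile_names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    n_val = int(round(len(names) * val_split))
    return names[n_val:], names[:n_val]


example_tiles = [f"tile_{i:06d}.tif" for i in range(50)]
train_names, val_names = split_tiles(example_tiles)
print(len(train_names), len(val_names))
```

Because the tiles overlap by 50%, a purely random split can place near-duplicate pixels in both sets, so validation scores may look somewhat optimistic.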
Run inference¶
Now we'll use the trained model to make predictions on the test image.
# Define paths
masks_path = "naip_test_semantic_prediction.tif"
model_path = f"{out_folder}/unet_models/best_model.pth"
# Run semantic segmentation inference
geoai.semantic_segmentation(
    input_path=test_raster_path,
    output_path=masks_path,
    model_path=model_path,
    architecture="unet",
    encoder_name="resnet34",
    num_channels=3,
    num_classes=2,
    window_size=512,
    overlap=256,
    batch_size=4,
)
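During inference the test raster is processed in windows of `window_size` pixels that overlap by `overlap` pixels, so consecutive windows advance by `window_size - overlap`. The sketch below computes window offsets along one axis, assuming the final window is shifted back so that it stays inside the raster; geoai may handle raster edges differently, and `window_starts` is a hypothetical helper.

```python
def window_starts(extent, window_size=512, overlap=256):
    """Return start offsets of sliding windows that cover [0, extent) along one axis."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be non-negative and smaller than window_size")
    if extent <= window_size:
        return [0]  # a single window already covers the whole axis
    step = window_size - overlap
    starts = list(range(0, extent - window_size + 1, step))
    # Add a final window flush with the far edge if the regular grid leaves a gap
    if starts[-1] + window_size < extent:
        starts.append(extent - window_size)
    return starts


print(window_starts(1200))  # [0, 256, 512, 688]
```

A larger overlap reduces seams at window borders at the cost of more forward passes.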
Vectorize masks¶
Convert the predicted mask to vector format for better visualization and analysis.
output_vector_path = "naip_test_semantic_prediction.geojson"
gdf = geoai.orthogonalize(masks_path, output_vector_path, epsilon=2)
Add geometric properties¶
gdf_props = geoai.add_geometric_properties(gdf, area_unit="m2", length_unit="m")
Visualize results¶
geoai.view_vector_interactive(gdf_props, column="area_m2", tiles="Esri.WorldImagery")
# Keep only polygons larger than 50 m²
gdf_filtered = gdf_props[gdf_props["area_m2"] > 50]
geoai.view_vector_interactive(gdf_filtered, column="area_m2", tiles="Esri.WorldImagery")
Model Performance Analysis¶
Let's examine the training curves and model performance:
geoai.plot_performance_metrics(
    history_path=f"{out_folder}/unet_models/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)