Field Boundary Detection with Instance Segmentation¶

This notebook demonstrates an end-to-end pipeline for detecting agricultural field boundaries using instance segmentation with Mask R-CNN. Accurate field boundary delineation is essential for precision agriculture, crop monitoring, subsidy verification, and land use planning.

We use the Fields of The World (FTW) benchmark dataset, which provides Sentinel-2 imagery (4 bands at 10 m resolution) paired with instance segmentation masks across 25 countries. Each chip provides two temporal windows (window_a and window_b) captured on different dates, so that seasonal vegetation differences can help delineate boundaries. We work with the Luxembourg subset, which is small enough for a tutorial while containing high-quality annotations.
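
To show why a second date helps, here is a small illustrative sketch that is not part of the geoai pipeline. It computes NDVI, (NIR - Red) / (NIR + Red), for each window and takes the per-pixel difference. It assumes the R, G, B, NIR band order described below and reflectance scaled 0 to 10,000. The arrays are tiny made-up values in plain Python, and the helpers `ndvi` and `ndvi_change` are hypothetical names for illustration.

```python
# Illustrative sketch: NDVI on two dates and the per-pixel change between them.
# Assumes reflectance values on a 0-10,000 scale; the data below is made up.

def ndvi(red, nir):
    """Return NDVI = (NIR - Red) / (NIR + Red) for 2-D lists of equal shape."""
    out = []
    for red_row, nir_row in zip(red, nir):
        row = []
        for r, n in zip(red_row, nir_row):
            denom = n + r
            # Guard against division by zero on empty (all-zero) pixels.
            row.append((n - r) / denom if denom else 0.0)
        out.append(row)
    return out


def ndvi_change(red_a, nir_a, red_b, nir_b):
    """Per-pixel NDVI difference (window_b minus window_a)."""
    a = ndvi(red_a, nir_a)
    b = ndvi(red_b, nir_b)
    return [[vb - va for va, vb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Two adjacent strips: the left two pixels belong to a field harvested between
# the two dates, the right two to a field that stayed green.
red_a = [[500, 500, 500, 500]]
nir_a = [[4500, 4500, 4500, 4500]]
red_b = [[1500, 1500, 500, 500]]
nir_b = [[2000, 2000, 4500, 4500]]

change = ndvi_change(red_a, nir_a, red_b, nir_b)
print([round(v, 3) for v in change[0]])  # [-0.657, -0.657, 0.0, 0.0]
```

The jump in NDVI change between the second and third pixels marks the boundary, even though both strips looked equally green on the first date. The training pipeline in this notebook feeds the model raw bands from a single window (num_channels=4); the tips at the end describe how to use both.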

Install packages¶

Uncomment the following line to install the required packages.

# %pip install geoai-py

Import libraries¶

import os
from pathlib import Path
import geopandas as gpd
import geoai

Download the FTW dataset¶

The Fields of The World (FTW) dataset contains 70,462 samples across 25 countries with Sentinel-2 imagery (4 bands: Red, Green, Blue, NIR at 10 m resolution) and instance segmentation masks. Each chip is 256×256 pixels.

We download the Luxembourg subset, one of the smallest, making it ideal for a tutorial.

geoai.download_ftw(countries=["luxembourg"], output_dir="ftw_data")

Explore the dataset¶

The FTW dataset includes a GeoParquet file with metadata and geometry for each chip, including the train/val/test split.

country_dir = os.path.join("ftw_data", "luxembourg")
chips_gdf = gpd.read_parquet(os.path.join(country_dir, "chips_luxembourg.parquet"))

print(f"Total chips: {len(chips_gdf)}")
print(f"\nSplit distribution:")
print(chips_gdf["split"].value_counts())

Visualize the spatial distribution of training, validation, and test chips.

geoai.view_vector_interactive(chips_gdf, column="split")

Display sample image–mask pairs. Each mask uses unique integer IDs to distinguish individual field instances.

geoai.display_ftw_samples("ftw_data", country="luxembourg", num_samples=4)
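
As a plain-Python illustration of how such a mask encodes fields, the sketch below treats 0 as background and every other integer as one field. It counts pixels per instance and extracts a 0/1 mask for a single field, which is the one-mask-per-instance form Mask R-CNN uses as a training target. The toy mask and the helper names are hypothetical, not geoai functions.

```python
# Sketch: reading an instance-ID mask (0 = background, other IDs = fields).
from collections import Counter


def instance_summary(mask):
    """Return {instance_id: pixel_count} for all non-zero IDs in a 2-D mask."""
    counts = Counter(v for row in mask for v in row if v != 0)
    return dict(sorted(counts.items()))


def binary_mask(mask, instance_id):
    """Return a 0/1 mask selecting a single instance."""
    return [[1 if v == instance_id else 0 for v in row] for row in mask]


# Toy mask with three fields; fields 1 and 2 share an edge.
mask = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 3, 3, 3],
]
print(instance_summary(mask))  # {1: 4, 2: 4, 3: 3}
print(binary_mask(mask, 2))    # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
```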

Prepare training data¶

GeoAI's Mask R-CNN pipeline expects images/ and labels/ directories with uint8 GeoTIFFs. The prepare_ftw function rescales Sentinel-2 reflectance (0–10,000) to uint8 (0–255), organizes files, and prepares test chips.

data = geoai.prepare_ftw("ftw_data", country="luxembourg")
data
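
The rescaling step can be sketched as a linear map with clipping. This is an assumption: prepare_ftw may use a different stretch internally. The helper name `reflectance_to_uint8` is hypothetical. Each uint8 step corresponds to about 39 reflectance units (10,000 / 255), so some radiometric detail is lost in exchange for the uint8 input the pipeline expects.

```python
# Sketch of a linear rescale from Sentinel-2 reflectance (0-10,000) to uint8.
# Assumption: simple linear scaling with clipping, not necessarily what
# prepare_ftw does internally.


def reflectance_to_uint8(value, max_reflectance=10_000):
    """Map a reflectance value to 0-255, clipping out-of-range inputs."""
    clipped = min(max(value, 0), max_reflectance)
    return round(clipped * 255 / max_reflectance)


# Values above 10,000 (e.g. bright or saturated pixels) are clipped to 255.
print([reflectance_to_uint8(v) for v in (0, 2_500, 10_000, 12_000)])
# [0, 64, 255, 255]
```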

Verify that the prepared tiles look correct.

geoai.display_training_tiles(
    output_dir="field_boundaries",
    num_tiles=4,
    figsize=(12, 6),
    cmap="tab20",
)

Train instance segmentation model¶

We train a Mask R-CNN model with a ResNet-50 + FPN backbone.

Key parameters:

num_classes=2 — Background (0) and field (1).
num_channels=4 — Sentinel-2 bands (R, G, B, NIR). NIR helps distinguish vegetation boundaries.
instance_labels=True — The FTW masks already encode unique instance IDs, so geoai should use them directly instead of running connected-component labeling.
num_epochs=20 — Sufficient for demonstration; increase to 50–100 for production.
val_split=0.2 — Reserves 20% of chips for validation.
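
The reason for instance_labels=True can be seen with a small connected-component example. If the instance mask is collapsed to a binary field/background mask and relabeled, two fields that share an edge become one instance. The sketch uses 4-connectivity flood fill in plain Python. Which connectivity geoai uses when instance_labels=False is not assumed here, and the helper `connected_components` is hypothetical.

```python
# Sketch: connected-component labeling merges touching fields into one instance.
from collections import deque


def connected_components(binary):
    """Label 4-connected foreground regions of a 0/1 mask with IDs 1, 2, ..."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not labels[r][c]:
                # Start a new component and flood-fill it breadth-first.
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels


# Two fields (IDs 1 and 2) sharing an edge, as in an FTW instance mask.
instance_mask = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
binary = [[1 if v else 0 for v in row] for row in instance_mask]
relabeled = connected_components(binary)
print(relabeled)  # [[1, 1, 1, 1], [1, 1, 1, 1]]: the two fields collapse into one
```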

geoai.train_instance_segmentation_model(
    images_dir=data["images_dir"],
    labels_dir=data["labels_dir"],
    output_dir="field_boundaries/models",
    num_classes=2,
    num_channels=4,
    batch_size=4,
    num_epochs=20,
    learning_rate=0.005,
    val_split=0.2,
    instance_labels=True,
    visualize=True,
    verbose=True,
)

Training performance¶

Examine the training and validation loss curves to assess model convergence.

geoai.plot_performance_metrics(
    history_path="field_boundaries/models/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
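
When reading the curves, two questions matter most: which epoch had the lowest validation loss, and whether the loss was still falling when training stopped. If it was still falling, more epochs are likely to help. The sketch below answers both from a plain list of per-epoch validation losses. The list is made-up example data, the structure of training_history.pth is not assumed, and the helpers `best_epoch` and `still_improving` are hypothetical.

```python
# Sketch: simple convergence checks on a validation-loss curve (made-up data).


def best_epoch(val_losses):
    """Return (1-based epoch, loss) with the lowest validation loss."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]


def still_improving(val_losses, window=3, tol=1e-3):
    """True if the best loss in the last `window` epochs beats the earlier best by more than `tol`."""
    if len(val_losses) <= window:
        return True
    recent = min(val_losses[-window:])
    earlier = min(val_losses[:-window])
    return earlier - recent > tol


val_losses = [1.20, 0.95, 0.81, 0.74, 0.70, 0.68, 0.66]
print(best_epoch(val_losses))       # (7, 0.66)
print(still_improving(val_losses))  # True -> consider more epochs
```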

Run inference¶

Apply the trained model to a test image using sliding window inference with window size 256 and overlap 128.

test_images = sorted(Path(data["test_dir"]).glob("*.tif"))
test_image_path = str(test_images[0])
masks_path = "field_boundary_prediction.tif"
model_path = "field_boundaries/models/best_model.pth"

result = geoai.instance_segmentation(
    input_path=test_image_path,
    output_path=masks_path,
    model_path=model_path,
    num_classes=2,
    num_channels=4,
    window_size=256,
    overlap=128,
    confidence_threshold=0.5,
    batch_size=4,
    vectorize=True,
    class_names=["background", "field"],
)
result
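
With window_size=256 and overlap=128, consecutive windows start every 256 - 128 = 128 pixels. The sketch below computes window start offsets along each axis and shifts the last window back so that it ends at the image edge. That edge handling is one common convention and not necessarily what geoai does; the helpers `window_offsets` and `windows_2d` are hypothetical. If the prepared test chips keep the 256×256 size described above, a single window covers each chip, and the overlap only matters for larger rasters.

```python
# Sketch: sliding-window start offsets with stride = window - overlap.


def window_offsets(length, window=256, overlap=128):
    """Return start offsets along one axis so windows cover [0, length)."""
    if length <= window:
        return [0]
    stride = window - overlap
    offsets = list(range(0, length - window + 1, stride))
    # Add a final window flush with the edge if the regular grid falls short.
    if offsets[-1] + window < length:
        offsets.append(length - window)
    return offsets


def windows_2d(height, width, window=256, overlap=128):
    """Return (row, col) top-left corners of all windows."""
    return [(r, c)
            for r in window_offsets(height, window, overlap)
            for c in window_offsets(width, window, overlap)]


print(window_offsets(256))         # [0]
print(window_offsets(600))         # [0, 128, 256, 344]
print(len(windows_2d(600, 600)))   # 16
```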

Visualize raw predictions¶

Each color represents a distinct field instance detected by the model.

geoai.view_raster(
    result["instance"],
    nodata=0,
    cmap="tab20",
    basemap=test_image_path,
    backend="ipyleaflet",
)

geoai.view_raster(
    result["class_label"],
    nodata=0,
    cmap="binary",
    basemap=test_image_path,
    backend="ipyleaflet",
)

geoai.view_raster(
    result["score"], nodata=0, basemap=test_image_path, backend="ipyleaflet"
)

geoai.view_vector_interactive(result["vector"], tiles=test_image_path, column="score")
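
Because each polygon carries a score, you can test a stricter threshold without re-running inference. The reverse does not work: detections below the confidence_threshold used at inference time are presumably never produced, so lowering the threshold requires a new run. The sketch uses plain dicts with made-up scores in place of rows of result["vector"], and the helper `filter_by_score` is hypothetical.

```python
# Sketch: re-thresholding detections by score (made-up example rows).

detections = [
    {"id": 1, "score": 0.93},
    {"id": 2, "score": 0.71},
    {"id": 3, "score": 0.55},
    {"id": 4, "score": 0.52},
]


def filter_by_score(rows, threshold):
    """Keep rows whose score is at or above `threshold`."""
    return [row for row in rows if row["score"] >= threshold]


for t in (0.5, 0.6, 0.8):
    print(t, [row["id"] for row in filter_by_score(detections, t)])
# 0.5 [1, 2, 3, 4]
# 0.6 [1, 2]
# 0.8 [1]
```

In the notebook itself, assuming result["vector"] is a GeoDataFrame with the score column shown above, the equivalent would be `result["vector"][result["vector"]["score"] >= 0.7]`.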

Clean instance mask¶

Remove small spurious detections and fill holes between adjacent instances using clean_instance_mask. This is designed specifically for instance segmentation outputs (unlike clean_raster, which is for semantic/classification masks).

cleaned_masks_path = "field_boundary_prediction_cleaned.tif"
geoai.clean_instance_mask(
    result["instance"], cleaned_masks_path, min_area=100, max_hole_area=100
)
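
The small-instance part of this cleaning step can be sketched in a few lines: count the pixels of each instance ID and reset instances below the threshold to background. This assumes min_area is measured in pixels and that an instance exactly at min_area is kept. The hole-filling part (max_hole_area) is not reproduced. The helper `remove_small_instances` is hypothetical and not the geoai implementation.

```python
# Sketch of the min_area step only: drop instances with fewer than min_area pixels.
from collections import Counter


def remove_small_instances(mask, min_area):
    """Set instances smaller than min_area pixels to background (0)."""
    counts = Counter(v for row in mask for v in row if v != 0)
    small = {i for i, n in counts.items() if n < min_area}
    return [[0 if v in small else v for v in row] for row in mask]


mask = [
    [1, 1, 1, 0],
    [1, 1, 1, 2],  # instance 2 is a single spurious pixel
    [0, 0, 0, 0],
]
print(remove_small_instances(mask, min_area=3))
# [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
```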

geoai.view_raster(
    cleaned_masks_path,
    nodata=0,
    cmap="tab20",
    basemap=test_image_path,
    backend="ipyleaflet",
)

Vectorize predictions¶

Convert the cleaned raster mask to vector polygons for spatial analysis.

output_vector_path = "field_boundary_prediction.geojson"
gdf = geoai.raster_to_vector(cleaned_masks_path, output_vector_path)

Compare predictions with imagery¶

Use a split map to visually compare the detected field boundaries against the original Sentinel-2 imagery.

geoai.create_split_map(
    left_layer=gdf,
    right_layer=test_image_path,
    left_args={"style": {"color": "red", "fillOpacity": 0.2}},
    basemap=test_image_path,
)

Geometric properties¶

Calculate geometric properties for each detected field:

Property	Description
Area	Field size in hectares — critical for yield estimation and subsidy programs
Perimeter	Boundary length — useful for fencing cost estimation
Elongation	Major/minor axis ratio — distinguishes strip fields from compact parcels
Solidity	Area/convex hull area ratio — measures boundary irregularity
Extent	Area/bounding box area ratio — indicates how rectangular a field is
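
To make these definitions concrete, the sketch below computes area (shoelace formula), perimeter, extent (area over the axis-aligned bounding box), and solidity (area over the convex hull, via Andrew's monotone chain). It works on a single polygon given as (x, y) vertices in a projected CRS with meter units. It illustrates the table only: add_geometric_properties may compute these differently (for example with a rotated bounding box), and elongation is omitted. All helper names are hypothetical.

```python
# Sketch: area, perimeter, extent, and solidity for one polygon in meters.
import math


def shoelace_area(pts):
    """Polygon area via the shoelace formula (vertices in order, not closed)."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2


def perimeter(pts):
    """Sum of edge lengths, including the closing edge."""
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def properties(pts):
    area = shoelace_area(pts)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    bbox_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return {
        "area_ha": area / 10_000,  # 1 ha = 10,000 m^2
        "perimeter_m": perimeter(pts),
        "extent": area / bbox_area,
        "solidity": area / shoelace_area(convex_hull(pts)),
    }


# An L-shaped field: a 200 m x 200 m square with a 100 m x 100 m corner removed.
l_shape = [(0, 0), (200, 0), (200, 100), (100, 100), (100, 200), (0, 200)]
print(properties(l_shape))
```

For this L-shape, the area is 3 ha, the perimeter is 800 m, the extent is 0.75, and the solidity is 30,000 / 35,000 ≈ 0.857. A perfect rectangle would score 1.0 on both ratios.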

gdf_props = geoai.add_geometric_properties(gdf, area_unit="ha", length_unit="m")
gdf_props.head()

gdf_props.describe()

Visualize fields by property¶

geoai.view_vector_interactive(gdf_props, column="area_ha", tiles=test_image_path)

geoai.view_vector_interactive(gdf_props, column="elongation", tiles=test_image_path)

Batch processing¶

Process all test images at once.

geoai.instance_segmentation_batch(
    input_dir=data["test_dir"],
    output_dir="field_boundaries/predictions",
    model_path=model_path,
    num_classes=2,
    num_channels=4,
    window_size=256,
    overlap=128,
    confidence_threshold=0.5,
    batch_size=4,
)

Summary¶

This notebook demonstrated a complete field boundary detection pipeline:

Data acquisition — Downloaded the FTW Luxembourg dataset with geoai.download_ftw().
Data preparation — Rescaled Sentinel-2 reflectance to uint8 with geoai.prepare_ftw().
Training — Trained Mask R-CNN with instance_labels=True to preserve field identity.
Inference — Applied sliding window inference to test imagery.
Post-processing — Cleaned instance masks with clean_instance_mask(), then vectorized.
Analysis — Computed geometric properties (area, perimeter, elongation, solidity).

Tips for improving results¶

More training data: Download additional FTW countries with geoai.download_ftw(countries=["france", "austria"]).
Both temporal windows: Use window="window_b" in prepare_ftw() for a different season, or stack both for 8-band input (which then requires num_channels=8 for training and inference).
Longer training: Increase num_epochs to 50–100.
Confidence tuning: Lower confidence_threshold (e.g., 0.3) to detect more fields at the cost of more false positives.
Post-processing: Adjust min_area and max_hole_area in clean_instance_mask() to match your target field sizes.
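
To choose min_area from a target field size, convert between pixels and hectares. The sketch assumes min_area is counted in pixels and that the masks keep the 10 m Sentinel-2 resolution, so one pixel is 10 m × 10 m = 100 m² = 0.01 ha. Under those assumptions, the min_area=100 used above removes detections smaller than 1 ha. The helpers `hectares_to_pixels` and `pixels_to_hectares` are hypothetical.

```python
# Sketch: converting between a pixel-count threshold and field area in hectares.
import math


def hectares_to_pixels(hectares, pixel_size_m=10.0):
    """Smallest whole number of pixels covering at least `hectares`."""
    pixel_area_m2 = pixel_size_m ** 2
    return math.ceil(hectares * 10_000 / pixel_area_m2)


def pixels_to_hectares(pixels, pixel_size_m=10.0):
    """Area in hectares covered by `pixels` square pixels."""
    return pixels * pixel_size_m ** 2 / 10_000


print(pixels_to_hectares(100))  # 1.0
print(hectares_to_pixels(0.5))  # 50
```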