Export Training Data in Multiple Formats (PASCAL VOC, COCO, YOLO)¶
This notebook demonstrates how to export geospatial training data in three popular object detection formats:
- PASCAL VOC: XML-based format, widely used in computer vision
- COCO: JSON-based format, standard for object detection benchmarks
- YOLO: Text-based format with normalized coordinates, optimized for YOLO models
Install packages¶
Ensure the required packages are installed.
# %pip install geoai-py
Import libraries¶
import geoai
import json
from pathlib import Path
Download sample data¶
We'll use the same building detection dataset from the segmentation example.
train_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_rgb_train.tif"
)
train_vector_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_train_buildings.geojson"
train_raster_path = geoai.download_file(train_raster_url)
train_vector_path = geoai.download_file(train_vector_url)
Visualize sample data¶
geoai.get_raster_info(train_raster_path)
geoai.view_vector_interactive(train_vector_path, tiles=train_raster_path)
Format 1: PASCAL VOC (XML)¶
PASCAL VOC format stores annotations in XML files with bounding boxes and class labels. This is the default format and is widely used in traditional object detection frameworks.
Output structure:
pascal_voc_output/
├── images/ # GeoTIFF tiles
├── labels/ # Label masks (GeoTIFF)
└── annotations/ # XML annotation files
pascal_output = "buildings_pascal_voc"
stats = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=pascal_output,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
    metadata_format="PASCAL_VOC",
    # max_tiles=10,  # Limit for demo purposes
)
Examine PASCAL VOC output¶
# List annotation files
xml_files = list(Path(f"{pascal_output}/annotations").glob("*.xml"))
print(f"Found {len(xml_files)} XML annotation files")

# Display first annotation file
if xml_files:
    with open(xml_files[0], "r") as f:
        print(f"\nSample annotation ({xml_files[0].name}):\n")
        print(f.read())
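To consume the annotations programmatically rather than just printing them, you can parse the XML with Python's built-in xml.etree.ElementTree. The sketch below assumes the standard PASCAL VOC element names (object, name, and bndbox with xmin/ymin/xmax/ymax); verify them against the sample file printed above.
import xml.etree.ElementTree as ET

if xml_files:
    root = ET.parse(xml_files[0]).getroot()
    for obj in root.findall("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        # PASCAL VOC stores absolute pixel corners
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        print(f"{name}: ({xmin}, {ymin}) -> ({xmax}, {ymax})")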
Format 2: COCO (JSON)¶
COCO format uses a single JSON file containing all annotations, images, and categories. This is the standard format for modern object detection benchmarks.
Output structure:
coco_output/
├── images/ # GeoTIFF tiles
├── labels/ # Label masks (GeoTIFF)
└── annotations/
└── instances.json # COCO annotations
COCO JSON structure:
{
  "images": [{"id": 0, "file_name": "tile_000000.tif", "width": 512, "height": 512}],
  "annotations": [{"id": 1, "image_id": 0, "category_id": 1, "bbox": [x, y, w, h]}],
  "categories": [{"id": 1, "name": "building", "supercategory": "object"}]
}
coco_output = "buildings_coco"
stats = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=coco_output,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
    metadata_format="COCO",
    # max_tiles=10,
)
Examine COCO output¶
# Load COCO annotations
coco_file = f"{coco_output}/annotations/instances.json"
with open(coco_file, "r") as f:
    coco_data = json.load(f)

print("COCO Dataset Summary:")
print(f"  Images: {len(coco_data['images'])}")
print(f"  Annotations: {len(coco_data['annotations'])}")
print(f"  Categories: {len(coco_data['categories'])}")

# Display categories
print("\nCategories:")
for cat in coco_data["categories"]:
    print(f"  {cat}")

# Display first image
if coco_data["images"]:
    print("\nFirst image:")
    print(f"  {coco_data['images'][0]}")

# Display first annotation
if coco_data["annotations"]:
    print("\nFirst annotation:")
    print(f"  {coco_data['annotations'][0]}")
Format 3: YOLO (Text)¶
YOLO format uses text files with normalized bounding box coordinates. Each image has a corresponding .txt file with one line per object.
Output structure:
yolo_output/
├── images/ # GeoTIFF tiles
├── labels/ # Label masks (GeoTIFF) + YOLO .txt files
└── classes.txt # Class names (one per line)
YOLO annotation format (normalized coordinates 0-1):
<class_id> <x_center> <y_center> <width> <height>
0 0.5 0.5 0.3 0.2
yolo_output = "buildings_yolo"
stats = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=yolo_output,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
    metadata_format="YOLO",
    # max_tiles=10,
)
Examine YOLO output¶
# Load classes
classes_file = f"{yolo_output}/classes.txt"
with open(classes_file, "r") as f:
    classes = f.read().strip().split("\n")

print(f"Classes ({len(classes)}):")
for i, cls in enumerate(classes):
    print(f"  {i}: {cls}")

# List annotation files
txt_files = list(Path(f"{yolo_output}/labels").glob("*.txt"))
print(f"\nFound {len(txt_files)} YOLO annotation files")

# Display first annotation file
if txt_files:
    with open(txt_files[0], "r") as f:
        lines = f.readlines()
    print(f"\nSample annotation ({txt_files[0].name}):")
    print("  Format: <class_id> <x_center> <y_center> <width> <height>")
    for line in lines[:5]:  # Show first 5 objects
        print(f"  {line.strip()}")
    if len(lines) > 5:
        print(f"  ... and {len(lines) - 5} more objects")
Format Comparison¶
When to Use Each Format¶
| Format | Best For | Pros | Cons |
|---|---|---|---|
| PASCAL VOC | Traditional CV frameworks, quick inspection | Human-readable XML, one file per image | Verbose, not ideal for large datasets |
| COCO | Modern object detection, benchmarking, complex datasets | Efficient JSON, supports multiple annotation types | Single file can be large, requires parsing |
| YOLO | YOLO models (v3-v8), real-time detection | Compact, fast to parse, normalized coordinates | Less human-readable, limited metadata |
Coordinate Systems¶
- PASCAL VOC: Absolute pixel coordinates [xmin, ymin, xmax, ymax]
- COCO: Absolute pixel coordinates [x, y, width, height] (top-left corner)
- YOLO: Normalized coordinates [x_center, y_center, width, height] (0-1 range)
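To make the three conventions concrete, here is a single hypothetical box on a 512 x 512 tile expressed in all three (the numbers are illustrative, not taken from the dataset):
# Hypothetical box on a 512 x 512 tile, given as PASCAL VOC corners
W, H = 512, 512
xmin, ymin, xmax, ymax = 100, 150, 300, 250

coco_bbox = [xmin, ymin, xmax - xmin, ymax - ymin]  # [x, y, w, h], top-left corner
yolo_bbox = [
    (xmin + xmax) / 2 / W,  # x_center, normalized
    (ymin + ymax) / 2 / H,  # y_center, normalized
    (xmax - xmin) / W,  # width, normalized
    (ymax - ymin) / H,  # height, normalized
]
print(coco_bbox)  # [100, 150, 200, 100]
print(yolo_bbox)  # [0.390625, 0.390625, 0.390625, 0.1953125]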
GeoAI Extensions¶
All formats preserve geospatial information:
- PASCAL VOC: CRS, transform, and bounds in a <georeference> element
- COCO: CRS and transform as custom fields in the image metadata
- YOLO: Georeferenced GeoTIFF tiles maintain spatial context
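You can confirm this by opening any exported tile and reading its georeferencing, as in the sketch below. It assumes rasterio is available in the environment and that tiles are written with a .tif extension, as the output structures above indicate.
import rasterio

# Inspect the georeferencing carried by an exported tile
tile = next(Path(f"{pascal_output}/images").glob("*.tif"))
with rasterio.open(tile) as src:
    print(src.crs)  # coordinate reference system
    print(src.transform)  # affine transform (pixel -> map coordinates)
    print(src.bounds)  # spatial extent of the tile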
Multi-Class Example¶
The formats also support multi-class datasets. Here's how class information is stored:
PASCAL VOC:
<object>
    <name>building</name>
    <bndbox>...</bndbox>
</object>
COCO:
{
  "categories": [
    {"id": 1, "name": "building", "supercategory": "object"},
    {"id": 2, "name": "road", "supercategory": "object"}
  ]
}
YOLO:
classes.txt:
building
road
annotations:
0 0.5 0.5 0.3 0.2 # class_id 0 = building
1 0.7 0.3 0.2 0.1 # class_id 1 = road
Summary¶
The export_geotiff_tiles function now supports three popular annotation formats:
- ✅ PASCAL VOC (XML) - Traditional, human-readable
- ✅ COCO (JSON) - Modern benchmark standard
- ✅ YOLO (TXT) - Lightweight, optimized for YOLO
All formats maintain geospatial context through georeferenced GeoTIFF tiles, making them ideal for training object detection models on remote sensing imagery.
Choose the format that best fits your model training framework:
- Use COCO for Detectron2, MMDetection, or benchmark comparisons
- Use YOLO for YOLOv5, YOLOv8, or the ultralytics package
- Use PASCAL VOC for the TensorFlow Object Detection API or legacy frameworks
Using Exported Data for Training¶
The training functions in GeoAI now support all three annotation formats directly! Here's how to use them for training models.
Training with COCO Format¶
Use input_format="coco"
and point labels_dir
to the instances.json
file:
# Train semantic segmentation model with COCO format
geoai.train_segmentation_model(
    images_dir=f"{coco_output}/images",
    labels_dir=f"{coco_output}/annotations/instances.json",  # Path to COCO JSON
    output_dir="models_coco",
    input_format="coco",  # Specify COCO format
    architecture="unet",
    encoder_name="resnet34",
    num_epochs=20,  # Reduced for demo
    batch_size=8,
    verbose=True,
)
geoai.plot_performance_metrics(
    history_path="models_coco/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
# Train instance segmentation model with COCO format
geoai.train_instance_segmentation_model(
    images_dir=f"{coco_output}/images",
    labels_dir=f"{coco_output}/annotations/instances.json",
    output_dir="models_maskrcnn_coco",
    input_format="coco",
    num_epochs=20,
    batch_size=8,
)
geoai.plot_performance_metrics(
    history_path="models_maskrcnn_coco/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
Training with YOLO Format¶
Use input_format="yolo"
and point images_dir
to the root directory containing images/
and labels/
subdirectories:
# Train semantic segmentation model with YOLO format
geoai.train_segmentation_model(
    images_dir=yolo_output,  # Root directory containing images/ and labels/
    labels_dir="",  # Not used for YOLO format
    output_dir="models_yolo",
    input_format="yolo",  # Specify YOLO format
    architecture="unet",
    encoder_name="resnet34",
    num_epochs=20,
    batch_size=8,
    verbose=True,
)
geoai.plot_performance_metrics(
    history_path="models_yolo/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
# Train instance segmentation model with YOLO format
geoai.train_instance_segmentation_model(
    images_dir=yolo_output,
    labels_dir="",
    output_dir="models_maskrcnn_yolo",
    input_format="yolo",
    num_epochs=20,
    batch_size=8,
)
geoai.plot_performance_metrics(
    history_path="models_maskrcnn_yolo/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
Training with Directory Format (Default)¶
The default behavior uses separate images_dir and labels_dir directories:
# Standard directory format (default behavior)
geoai.train_segmentation_model(
    images_dir=f"{pascal_output}/images",
    labels_dir=f"{pascal_output}/labels",
    output_dir="models_directory",
    # input_format="directory" is the default and can be omitted
    architecture="unet",
    encoder_name="resnet34",
    num_epochs=20,
    batch_size=8,
    verbose=True,
)
geoai.plot_performance_metrics(
    history_path="models_directory/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)
Training Summary¶
Both train_segmentation_model() and train_instance_segmentation_model() now accept the input_format parameter to load data in any of these formats:
| Input Format | input_format Value | images_dir | labels_dir |
|---|---|---|---|
| COCO | "coco" | Path to images directory | Path to instances.json |
| YOLO | "yolo" | Root directory with images/ and labels/ | Empty string "" (not used) |
| Directory | "directory" (default) | Path to images directory | Path to labels directory |
Benefits¶
- Maximum Flexibility: Use any annotation format without conversion
- Geospatial Preservation: All formats maintain georeferencing through GeoTIFF tiles
- Framework Compatibility: Export in one format, train in another
- Consistent API: Same training functions work with all formats
Example Workflow¶
1. Export training data in COCO format for sharing with collaborators
2. Export the same data in YOLO format for YOLOv8 experiments
3. Train both semantic and instance segmentation models on the same data
4. Maintain full geospatial context throughout, ready for deployment on satellite imagery
This provides a complete end-to-end workflow for geospatial deep learning!
Using TIMM Models with Multiple Formats¶
The train_timm_segmentation_model() function also supports all three annotation formats, providing access to a wider range of encoder backbones from the TIMM library (e.g., EfficientNet, ConvNeXt, Swin Transformer):
# Train with TIMM encoder using COCO format
geoai.train_timm_segmentation_model(
    images_dir=f"{coco_output}/images",
    labels_dir=f"{coco_output}/annotations/instances.json",
    output_dir="models_timm_coco",
    input_format="coco",  # Specify COCO format
    encoder_name="efficientnet-b3",  # TIMM encoder
    architecture="unet",
    encoder_weights="imagenet",
    num_epochs=20,
    batch_size=8,
    verbose=True,
)
# Or with YOLO format
geoai.train_timm_segmentation_model(
    images_dir=yolo_output,
    labels_dir="",
    output_dir="models_timm_yolo",
    input_format="yolo",
    encoder_name="efficientnet-b3",
    num_epochs=20,
)