Zero-Shot CLIP Classification for Vector Features¶
This notebook demonstrates how to classify vector polygon features using zero-shot CLIP inference with the geoai.clip_classify module. Given a set of polygons and a raster image, CLIP extracts an image chip for each polygon and matches it against user-provided category labels — no training data required.
We use a Chesapeake Bay Watershed NAIP scene (eastern Maryland, 2018) together with its paired land-cover ground truth raster to validate zero-shot predictions.
Tip: For best results with aerial/satellite imagery, use a remote-sensing CLIP model such as flax-community/clip-rsicd-v2, which was fine-tuned on remote sensing image captions.
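Before encoding, each candidate label is wrapped in a text prompt for CLIP's text encoder. A minimal sketch of that step, using the default `label_prefix` listed under Key Parameters below (this illustrates the idea, not geoai's exact internals):

```python
# Sketch of CLIP prompt construction (not geoai's exact internals):
# each candidate label is prefixed before being passed to the text encoder.
label_prefix = "a satellite image of "  # default label_prefix
labels = ["forest", "agricultural field"]
prompts = [label_prefix + label for label in labels]
print(prompts)
# ['a satellite image of forest', 'a satellite image of agricultural field']
```

A remote-sensing-flavoured prefix like this helps steer general CLIP models toward overhead imagery.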
Install packages¶
# %pip install geoai-py
Import libraries¶
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import rasterio
import rasterio.enums
from rasterio.plot import show
from shapely.geometry import box
from geoai import clip_classify_vector, CLIPVectorClassifier, download_file
Download sample data¶
We use a USGS NAIP aerial image and its paired Chesapeake Bay Watershed land-cover raster, both hosted on Hugging Face.
| File | Description |
|---|---|
| `m_3807511_ne_18_060_20181104.tif` | 4-band (RGBN) NAIP image, 0.6 m GSD, eastern Maryland, 2018 |
| `m_3807511_ne_18_060_20181104_landcover.tif` | 13-class Chesapeake land-cover ground truth (same extent / CRS) |
raster_url = (
"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/"
"m_3807511_ne_18_060_20181104.tif"
)
landcover_url = (
"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/"
"m_3807511_ne_18_060_20181104_landcover.tif"
)
raster_path = download_file(raster_url)
landcover_path = download_file(landcover_url)
Preview the data¶
with rasterio.open(raster_path) as src:
    print(f"Raster CRS : {src.crs}")
    print(f"Size       : {src.width} x {src.height} px")
    print(f"Resolution : {src.res[0]:.2f} m")
    bounds = src.bounds
    width_km = (bounds.right - bounds.left) / 1000
    height_km = (bounds.top - bounds.bottom) / 1000
    print(f"Extent     : {width_km:.2f} km x {height_km:.2f} km")
    print(f"Bands      : {src.count}")
from rasterio.transform import from_bounds

fig, ax = plt.subplots(1, 1, figsize=(16, 8))
with rasterio.open(raster_path) as src:
    # Downsample 10x for a quick display backdrop
    scale = 10
    data = src.read(
        out_shape=(src.count, src.height // scale, src.width // scale),
        resampling=rasterio.enums.Resampling.average,
    )
    disp_transform = from_bounds(*src.bounds, data.shape[2], data.shape[1])
show(data, transform=disp_transform, ax=ax)
ax.set_title("Chesapeake Bay – NAIP 2018 (eastern Maryland)", fontsize=14)
ax.axis("off")
plt.tight_layout()
plt.show()
Land-cover classification with a grid¶
We tile the raster with a regular 300 m × 300 m grid. This rural Maryland tile is dominated by two land-cover classes, forest and agricultural fields, so those are the candidate labels we give CLIP to choose between.
We use flax-community/clip-rsicd-v2, a CLIP model fine-tuned on remote sensing image captions.
with rasterio.open(raster_path) as src:
    left, bottom, right, top = src.bounds
    raster_crs = src.crs

cell_size = 300  # metres

# Generate only full cells that fit inside the raster extent
grid_polys = []
for x in np.arange(left, right - cell_size, cell_size):
    for y in np.arange(bottom, top - cell_size, cell_size):
        grid_polys.append(box(x, y, x + cell_size, y + cell_size))

grid_gdf = gpd.GeoDataFrame(geometry=grid_polys, crs=raster_crs)
print(f"Grid polygons: {len(grid_gdf)}")
labels = [
"forest",
"agricultural field",
]
result_grid = clip_classify_vector(
vector_data=grid_gdf,
raster_path=raster_path,
labels=labels,
model_name="flax-community/clip-rsicd-v2",
)
Explore results¶
The returned GeoDataFrame has two new columns:
- `clip_label` — the top-1 predicted category
- `clip_confidence` — the softmax confidence score (0 to 1)
result_grid[["geometry", "clip_label", "clip_confidence"]].head(10)
print("Label distribution:")
print(result_grid["clip_label"].value_counts())
print(f"\nMean confidence: {result_grid['clip_confidence'].mean():.3f}")
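The `clip_confidence` values behave like probabilities because CLIP applies a softmax over scaled image–text similarities. A toy numpy sketch of that scoring step (the similarity values and the ~100 logit scale are illustrative assumptions, not outputs of this notebook):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: scores are positive and sum to 1
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy cosine similarities between one image chip and the two label prompts
similarities = np.array([0.31, 0.24])

# CLIP multiplies similarities by a learned logit scale before the softmax,
# which sharpens the distribution considerably
scores = softmax(similarities * 100.0)
print(scores)  # the top-1 score is what ends up in clip_confidence
```

This is why confidences near 1.0 are common even for modest similarity gaps: the logit scale amplifies small differences.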
Visualize classified grid¶
classified_grid = result_grid.dropna(subset=["clip_label"])
color_map = {"forest": "#228B22", "agricultural field": "#DAA520"}
from rasterio.transform import from_bounds

fig, ax = plt.subplots(1, 1, figsize=(16, 8))
with rasterio.open(raster_path) as src:
    scale = 10
    data = src.read(
        out_shape=(src.count, src.height // scale, src.width // scale),
        resampling=rasterio.enums.Resampling.average,
    )
    disp_transform = from_bounds(*src.bounds, data.shape[2], data.shape[1])
show(data, transform=disp_transform, ax=ax)
classified_grid.plot(
    ax=ax,
    alpha=0.5,
    edgecolor="black",
    linewidth=0.2,
    color=[color_map[l] for l in classified_grid["clip_label"]],
)
patches = [mpatches.Patch(color=c, label=l) for l, c in color_map.items()]
ax.legend(handles=patches, loc="upper left", fontsize=10)
ax.set_title("Land-Cover Classification – RS-CLIP zero-shot (300 m grid)", fontsize=14)
ax.axis("off")
plt.tight_layout()
plt.show()
Confidence scores¶
from rasterio.transform import from_bounds

fig, ax = plt.subplots(1, 1, figsize=(16, 8))
with rasterio.open(raster_path) as src:
    scale = 10
    data = src.read(
        out_shape=(src.count, src.height // scale, src.width // scale),
        resampling=rasterio.enums.Resampling.average,
    )
    disp_transform = from_bounds(*src.bounds, data.shape[2], data.shape[1])
show(data, transform=disp_transform, ax=ax)
classified_grid.plot(
    column="clip_confidence",
    ax=ax,
    alpha=0.6,
    edgecolor="black",
    linewidth=0.2,
    cmap="viridis",
    legend=True,
    legend_kwds={"shrink": 0.5, "label": "Confidence"},
)
ax.set_title("CLIP Confidence Scores", fontsize=14)
ax.axis("off")
plt.tight_layout()
plt.show()
Compare with Chesapeake land-cover ground truth¶
The paired land-cover raster encodes 13 Chesapeake Bay Watershed classes. We extract the dominant class within each 300 m grid cell and map it to our two CLIP labels, then measure agreement.
# Chesapeake 13-class → our 2 CLIP labels (with only two candidate labels,
# every class is mapped to whichever of the two it resembles more)
CHESAPEAKE_TO_LABEL = {
    1: "agricultural field",  # Water
    2: "agricultural field",  # Emergent Wetlands
    3: "forest",  # Tree Canopy
    4: "forest",  # Shrub/Scrub
    5: "agricultural field",  # Low Vegetation (grass / crops)
    6: "agricultural field",  # Barren
    7: "agricultural field",  # Structures
    8: "agricultural field",  # Impervious (other)
    9: "agricultural field",  # Impervious Roads
    10: "forest",  # Tree Canopy over Structures
    11: "forest",  # Tree Canopy over Impervious
    12: "forest",  # Tree Canopy over Roads
}
from rasterio.windows import from_bounds as win_from_bounds

with rasterio.open(landcover_path) as lc_src:
    lc_data = lc_src.read(1)
    lc_transform = lc_src.transform
    lc_crs = lc_src.crs

unique_vals = sorted(int(v) for v in set(lc_data.flatten()) if v > 0)
print("Land-cover class values found:", unique_vals)

# Reproject the grid to the land-cover CRS, then take the dominant (modal)
# ground-truth class inside each cell
grid_lc = classified_grid.to_crs(lc_crs)
dominant_gt = []
for geom in grid_lc.geometry:
    win = win_from_bounds(*geom.bounds, lc_transform).round_offsets().round_lengths()
    r0 = max(0, int(win.row_off))
    c0 = max(0, int(win.col_off))
    r1 = min(lc_data.shape[0], r0 + int(win.height))
    c1 = min(lc_data.shape[1], c0 + int(win.width))
    chip = lc_data[r0:r1, c0:c1]
    valid_px = chip[chip > 0]
    if len(valid_px) > 0:
        dom_class = int(np.bincount(valid_px.astype(int)).argmax())
        dominant_gt.append(CHESAPEAKE_TO_LABEL.get(dom_class, "unknown"))
    else:
        dominant_gt.append("unknown")

classified_grid = classified_grid.copy()
classified_grid["gt_label"] = dominant_gt
print(classified_grid[["clip_label", "gt_label"]].head(10))
valid = classified_grid[classified_grid["gt_label"] != "unknown"].copy()
overall_acc = (valid["clip_label"] == valid["gt_label"]).mean()
print(f"Overall agreement (CLIP vs ground truth): {overall_acc:.1%}\n")

print("Per-class breakdown:")
for lbl in labels:
    subset = valid[valid["gt_label"] == lbl]
    if len(subset) == 0:
        continue
    correct = (subset["clip_label"] == lbl).sum()
    print(f"  {lbl:25s}: {correct}/{len(subset)} = {correct / len(subset):.0%}")
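For a fuller picture than per-class accuracy, a confusion matrix shows exactly where CLIP and the ground truth disagree. A sketch with `pandas.crosstab`, using toy series in place of the real `gt_label` and `clip_label` columns:

```python
import pandas as pd

# Toy stand-ins for valid["gt_label"] and valid["clip_label"]
gt = pd.Series(["forest", "forest", "agricultural field", "forest"])
pred = pd.Series(["forest", "agricultural field", "agricultural field", "forest"])

# Rows: ground truth, columns: predictions; each cell counts grid polygons
cm = pd.crosstab(gt, pred, rownames=["ground truth"], colnames=["predicted"])
print(cm)
```

On the real data, replace the toy series with `valid["gt_label"]` and `valid["clip_label"]`.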
from rasterio.transform import from_bounds

fig, axes = plt.subplots(1, 2, figsize=(20, 8))

# Read the downsampled backdrop once and reuse it on both panels
with rasterio.open(raster_path) as src:
    scale = 10
    data = src.read(
        out_shape=(src.count, src.height // scale, src.width // scale),
        resampling=rasterio.enums.Resampling.average,
    )
    disp_transform = from_bounds(*src.bounds, data.shape[2], data.shape[1])

for ax, col, title in [
    (axes[0], "clip_label", "RS-CLIP zero-shot prediction"),
    (axes[1], "gt_label", "Chesapeake ground truth"),
]:
    show(data, transform=disp_transform, ax=ax)
    valid.plot(
        ax=ax,
        alpha=0.5,
        edgecolor="black",
        linewidth=0.2,
        color=[color_map[l] for l in valid[col]],
    )
    patches = [mpatches.Patch(color=c, label=l) for l, c in color_map.items()]
    ax.legend(handles=patches, loc="upper left", fontsize=10)
    ax.set_title(title, fontsize=13)
    ax.axis("off")

plt.suptitle(f"Overall Agreement: {overall_acc:.1%}", fontsize=15, y=1.02)
plt.tight_layout()
plt.show()
Top-k predictions with the class API¶
For more control, use CLIPVectorClassifier directly. Setting top_k=2 returns the top 2 predictions per polygon.
classifier = CLIPVectorClassifier(model_name="flax-community/clip-rsicd-v2")
result_topk = classifier.classify(
vector_data=grid_gdf,
raster_path=raster_path,
labels=labels,
top_k=2,
batch_size=32,
)
classified = result_topk.dropna(subset=["clip_label"])
for i in range(min(5, len(classified))):
    row = classified.iloc[i]
    print(f"Polygon {classified.index[i]}:")
    for label, score in zip(row["clip_top_k_labels"], row["clip_top_k_scores"]):
        print(f"  {label:30s} {score:.3f}")
    print()
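The `clip_top_k_labels` / `clip_top_k_scores` pairing is a ranking of the per-label scores, highest first. A sketch of that selection with hypothetical scores (the three labels and values are purely illustrative):

```python
import numpy as np

# Hypothetical per-label softmax scores for one polygon
candidate_labels = ["forest", "agricultural field", "water"]
scores = np.array([0.72, 0.21, 0.07])

# Indices of the top 2 scores, highest first
order = np.argsort(scores)[::-1][:2]
top_k_labels = [candidate_labels[i] for i in order]
top_k_scores = [float(scores[i]) for i in order]
print(top_k_labels, top_k_scores)
# ['forest', 'agricultural field'] [0.72, 0.21]
```

Inspecting the gap between the top two scores is a quick way to flag ambiguous polygons for manual review.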
Export results¶
Save the classified GeoDataFrame to GeoJSON, GeoParquet, or GeoPackage.
result_grid.to_file("classified_landcover.geojson", driver="GeoJSON")
print("Results saved to classified_landcover.geojson")
Summary¶
This notebook demonstrated:
- **Zero-shot classification**: classifying land-cover polygons with no training data, just candidate labels
- **Remote-sensing CLIP model**: using `flax-community/clip-rsicd-v2` for aerial imagery
- **Grid-based land-cover mapping**: 300 m patches classified as forest vs. agricultural field
- **Ground-truth comparison**: ~89% agreement with the Chesapeake Bay Watershed 13-class labels
- **Top-k predictions**: ranked predictions with confidence scores
- **Export**: saving annotated results to standard geospatial formats
Key Parameters¶
| Parameter | Description | Default |
|---|---|---|
| `labels` | Candidate category names | (required) |
| `label_prefix` | Text prefix for CLIP encoding | `"a satellite image of "` |
| `model_name` | Hugging Face CLIP model ID | `"openai/clip-vit-base-patch32"` |
| `top_k` | Number of top predictions per polygon | `1` |
| `batch_size` | Images per inference batch | `16` |
| `min_chip_size` | Minimum chip dimension in pixels | `10` |
| `output_path` | Path to save results (`.geojson`, `.parquet`, `.gpkg`) | `None` |
Model Recommendations¶
| Model | Best for | Notes |
|---|---|---|
| `flax-community/clip-rsicd-v2` | Aerial / satellite imagery | Fine-tuned on RS image captions |
| `openai/clip-vit-base-patch32` | General-purpose images | Default CLIP; best with ground-level photos |
| `openai/clip-vit-large-patch14` | Higher capacity | Larger model; slower but more expressive |