Geospatial Image Analysis with Moondream Vision Language Model¶

This notebook demonstrates how to use the Moondream vision language model for geospatial image analysis. Moondream is a compact vision language model that supports four tasks:

Caption: Generate descriptions of satellite/aerial imagery
Query: Answer questions about image content
Detect: Locate objects with bounding boxes
Point: Find specific objects and return their coordinates

The GeoAI integration adds support for GeoTIFF files with automatic georeferencing of outputs.

Install packages¶

Uncomment the following line to install the required packages.

In [ ]:

# %pip install -U geoai-py

Import libraries¶

In [ ]:

import leafmap
from geoai import MoondreamGeo
import geoai

Download sample data¶

We'll use a sample GeoTIFF image of a parking lot with buildings and vegetation.

In [ ]:

url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/parking_lot.tif"
image_path = geoai.download_file(url)
image_path

Visualize the image¶

Let's first visualize the sample image on an interactive map.

In [ ]:

m = leafmap.Map()
m.add_raster(image_path, layer_name="Satellite Image")
m

Initialize the Moondream processor¶

Load the Moondream2 model. The first time you run this cell, the model weights (about 3.7 GB) are downloaded from Hugging Face.

Note: For reproducibility, the model is pinned to a specific revision, identified by its release date.
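
The initialization cell hard-codes device="cuda", which presumes an NVIDIA GPU is available. The following is a minimal sketch of choosing the device at runtime instead. `pick_device` is a hypothetical helper, not part of geoai, and passing "cpu" as the device value is an assumption that this notebook does not confirm.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional here; only needed for the GPU check
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Hypothetical usage: pass this value as the device argument.
device = pick_device()
print(device)
```

If the fallback to "cpu" is used, expect inference to be considerably slower than on a GPU.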

In [ ]:

processor = MoondreamGeo(
    model_name="vikhyatk/moondream2", revision="2025-06-21", device="cuda"
)

Image Captioning¶

Generate a description of the satellite image. The length parameter controls the detail level: "short", "normal", or "long".
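
To compare the three detail levels side by side, a small loop over the length values can help. The sketch below uses only the `caption` call and the "caption" result key shown in this notebook; `caption_all_lengths` itself is a hypothetical helper, not a geoai function.

```python
def caption_all_lengths(processor, image_path, lengths=("short", "normal", "long")):
    """Call processor.caption once per length and collect the captions."""
    captions = {}
    for length in lengths:
        result = processor.caption(image_path, length=length)
        captions[length] = result["caption"]
    return captions
```

Once the processor is initialized, `caption_all_lengths(processor, image_path)` would return a dictionary keyed by length. Each length triggers a separate model call, so the helper takes roughly three times as long as a single caption.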

In [ ]:

result = processor.caption(image_path, length="normal")
print(result["caption"])

Visual Question Answering¶

Ask questions about the image content and get natural language answers.
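
Because answers come back as free-form text, a counting question such as "How many buildings are in the image?" may need post-processing before the number can be used in code. The sketch below pulls the first integer written with digits out of an answer string. `extract_count` is a hypothetical helper, and the example answer strings are made up for illustration, not actual model output.

```python
import re
from typing import Optional


def extract_count(answer: str) -> Optional[int]:
    """Return the first integer written with digits in the answer, or None."""
    match = re.search(r"\d+", answer)
    return int(match.group()) if match else None


# Made-up answer strings for illustration only.
print(extract_count("There are 12 buildings in the image."))  # 12
print(extract_count("Several buildings are visible."))  # None
```

Numbers spelled out as words ("three") are not handled by this sketch, so a None result should be treated as "could not parse" rather than zero.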

In [ ]:

result = processor.query("How many buildings are in the image?", image_path)
print(result["answer"])

In [ ]:

result = processor.query("What are the building roof colors?", image_path)
print(result["answer"])

Object Detection¶

Detect objects and get bounding boxes. Results are automatically georeferenced when using GeoTIFF input.
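
To illustrate what georeferencing a bounding box involves, the sketch below converts a normalized box to map coordinates. It is not the geoai implementation. It assumes boxes use normalized `x_min`, `y_min`, `x_max`, `y_max` values in the range 0 to 1, and that the raster is north-up with square pixels. The function name and all numeric values are hypothetical.

```python
def bbox_to_map_coords(box, width, height, origin_x, origin_y, pixel_size):
    """Convert a normalized bounding box to (min_x, min_y, max_x, max_y) in map units.

    Assumes a north-up raster with square pixels: origin_x/origin_y is the
    map coordinate of the top-left corner, and y decreases as rows increase.
    """
    # Normalized [0, 1] coordinates -> pixel columns and rows.
    col_min, col_max = box["x_min"] * width, box["x_max"] * width
    row_min, row_max = box["y_min"] * height, box["y_max"] * height
    # Pixel coordinates -> map coordinates.
    min_x = origin_x + col_min * pixel_size
    max_x = origin_x + col_max * pixel_size
    max_y = origin_y - row_min * pixel_size  # top edge of the box
    min_y = origin_y - row_max * pixel_size  # bottom edge of the box
    return (min_x, min_y, max_x, max_y)


# Hypothetical values for illustration only.
box = {"x_min": 0.25, "y_min": 0.5, "x_max": 0.5, "y_max": 0.75}
print(
    bbox_to_map_coords(
        box, width=1000, height=800,
        origin_x=500000.0, origin_y=4000000.0, pixel_size=0.5,
    )
)
# (500125.0, 3999700.0, 500250.0, 3999800.0)
```

Note the flipped y axis: the top of the box in image space has the smaller row number but the larger map y value.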

Detect buildings¶

In [ ]:

result = processor.detect(image_path, "building", output_path="buildings.geojson")
print(f"Detected {len(result['objects'])} buildings")

View the GeoDataFrame with georeferenced bounding boxes:

In [ ]:

result["gdf"]

Add the detected buildings to the map:

In [ ]:

m.add_gdf(result["gdf"], layer_name="buildings")
m

Point Localization¶

Find object center points instead of bounding boxes. This is useful for counting objects and marking their locations.
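
When working with the raw point output rather than the georeferenced GeoDataFrame, the points may need to be mapped to pixel positions first. The sketch below assumes each point is a dictionary with normalized "x" and "y" values between 0 and 1. That structure is an assumption, not something this notebook shows, and `points_to_pixels` is a hypothetical helper.

```python
def points_to_pixels(points, width, height):
    """Convert normalized {"x", "y"} points to integer (col, row) pixel positions."""
    pixels = []
    for point in points:
        # Clamp so that a coordinate of exactly 1.0 maps to the last pixel.
        col = min(int(point["x"] * width), width - 1)
        row = min(int(point["y"] * height), height - 1)
        pixels.append((col, row))
    return pixels


# Made-up points for illustration only.
points = [{"x": 0.5, "y": 0.25}, {"x": 1.0, "y": 1.0}]
print(points_to_pixels(points, width=640, height=480))
# [(320, 120), (639, 479)]
```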

Find building centroids¶

In [ ]:

result = processor.point(
    image_path, "building", output_path="building_centroids.geojson"
)
print(f"Found {len(result['points'])} building centroids")

In [ ]:

m.add_gdf(result["gdf"], layer_name="building_centroids")

Detect trees¶

In [ ]:

result = processor.detect(image_path, "tree", output_path="trees.geojson")
print(f"Detected {len(result['objects'])} trees")

In [ ]:

m.add_gdf(result["gdf"], layer_name="trees")

Find tree centroids¶

In [ ]:

result = processor.point(image_path, "trees", output_path="tree_centroids.geojson")
print(f"Found {len(result['points'])} tree centroids")

In [ ]:

m.add_gdf(result["gdf"], layer_name="tree_centroids")

Display final map¶

View all detected objects and centroids on the map.
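
As a quick check alongside the map, the GeoJSON files written through the `output_path` arguments can be summarized with the standard library. The sketch below assumes each file is a GeoJSON FeatureCollection, which this notebook does not state explicitly. `count_features` is a hypothetical helper.

```python
import json
from pathlib import Path


def count_features(path):
    """Return the number of features in a GeoJSON FeatureCollection file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return len(data.get("features", []))


# File names taken from the output_path arguments used in this notebook.
for name in [
    "buildings.geojson",
    "building_centroids.geojson",
    "trees.geojson",
    "tree_centroids.geojson",
]:
    if Path(name).exists():
        print(f"{name}: {count_features(name)} features")
    else:
        print(f"{name}: not found")
```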

In [ ]:

m