Geospatial Image Analysis with Moondream Vision Language Model
This notebook demonstrates how to use the Moondream vision language model for geospatial image analysis. Moondream is a compact vision language model that can:
- Caption: Generate descriptions of satellite/aerial imagery
- Query: Answer questions about image content
- Detect: Locate objects with bounding boxes
- Point: Find specific objects and return their coordinates
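The GeoAI integration adds GeoTIFF support: detected boxes and points are automatically georeferenced, that is, converted from pixel positions to map coordinates, and returned as GeoDataFrames that can be saved as GeoJSON.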
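For intuition, here is a minimal sketch of the pixel-to-map step, assuming the raster carries a GDAL-style affine geotransform (the usual case for a north-up GeoTIFF). The `pixel_to_map` helper and the transform values are hypothetical illustrations, not the geoai implementation.

```python
from typing import Tuple

# GDAL-style geotransform: (x_origin, pixel_width, row_rotation,
#                           y_origin, col_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def pixel_to_map(col: float, row: float, gt: GeoTransform) -> Tuple[float, float]:
    """Map a pixel (col, row) position to map coordinates with an affine transform."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


# Hypothetical north-up raster: top-left corner at (500000, 4100000) with
# 0.5 m pixels; pixel_height is negative because rows increase southward.
gt = (500000.0, 0.5, 0.0, 4100000.0, 0.0, -0.5)
print(pixel_to_map(0, 0, gt))      # (500000.0, 4100000.0), the top-left corner
print(pixel_to_map(100, 200, gt))  # (500050.0, 4099900.0)
```

The rotation terms (`gt[2]`, `gt[4]`) are zero for north-up imagery, so each axis reduces to an origin plus a scaled pixel offset.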
Install packages
Uncomment the following line to install the required packages.
# %pip install -U geoai-py
Import libraries
import leafmap
from geoai import MoondreamGeo
import geoai
Download sample data
We'll use a sample GeoTIFF image of a parking lot with buildings and vegetation.
url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/parking_lot.tif"
image_path = geoai.download_file(url)
image_path
Visualize the image
Let's first visualize the sample image on an interactive map.
m = leafmap.Map()
m.add_raster(image_path, layer_name="Satellite Image")
m
Initialize the Moondream processor
Load the Moondream2 model. The first time you run this, the model weights (~3.7 GB) are downloaded from Hugging Face.
Note: For reproducibility, we pin the model to a specific revision (a date-tagged release).
processor = MoondreamGeo(
    model_name="vikhyatk/moondream2",
    revision="2025-06-21",  # pinned model revision for reproducible results
    device="cuda",  # requires a CUDA-capable NVIDIA GPU
)
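The cell above hard-codes `device="cuda"`, which fails on machines without an NVIDIA GPU. A minimal sketch of choosing the device at runtime follows, assuming `MoondreamGeo` accepts any PyTorch device string (`"cuda"`, `"mps"`, or `"cpu"`); the `pick_device` helper is hypothetical, and whether Moondream runs well on MPS or CPU is not confirmed here.

```python
def pick_device() -> str:
    """Return the best available PyTorch device string, falling back to "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no GPU backend to probe.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Apple-silicon GPUs are exposed through the MPS backend in recent PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
# Hypothetical usage, assuming MoondreamGeo accepts this device string:
# processor = MoondreamGeo(model_name="vikhyatk/moondream2", revision="2025-06-21", device=device)
```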
Image Captioning
Generate a description of the satellite image. The length parameter controls the detail level: "short", "normal", or "long".
result = processor.caption(image_path, length="normal")
print(result["caption"])
Visual Question Answering
Ask questions about the image content and get natural language answers.
result = processor.query("How many buildings are in the image?", image_path)
print(result["answer"])
result = processor.query("What are the building roof colors?", image_path)
print(result["answer"])
Detect buildings
result = processor.detect(image_path, "building", output_path="buildings.geojson")
print(f"Detected {len(result['objects'])} buildings")
View the GeoDataFrame with georeferenced bounding boxes:
result["gdf"]
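To illustrate what the georeferenced boxes contain, here is a minimal sketch that turns one bounding box into a GeoJSON polygon. It assumes each detection is a dict with normalized `x_min`, `y_min`, `x_max`, `y_max` values in [0, 1] (the shape of Moondream's raw detect output) and a north-up GDAL-style geotransform. `box_to_feature` is a hypothetical helper, and geoai's internal conversion may differ.

```python
import json
from typing import Dict, Tuple

# GDAL-style geotransform: (x_origin, pixel_width, row_rotation,
#                           y_origin, col_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def pixel_to_map(col: float, row: float, gt: GeoTransform) -> Tuple[float, float]:
    """Apply the affine geotransform to a pixel (col, row) position."""
    return (gt[0] + col * gt[1] + row * gt[2], gt[3] + col * gt[4] + row * gt[5])


def box_to_feature(box: Dict[str, float], width: int, height: int, gt: GeoTransform) -> dict:
    """Convert a normalized bounding box into a GeoJSON Polygon feature."""
    # Scale normalized [0, 1] coordinates to pixel positions.
    x0, x1 = box["x_min"] * width, box["x_max"] * width
    y0, y1 = box["y_min"] * height, box["y_max"] * height
    # Corners ordered so the ring is counterclockwise on a north-up raster
    # (as RFC 7946 recommends); the first corner is repeated to close it.
    corners = [(x0, y0), (x0, y1), (x1, y1), (x1, y0), (x0, y0)]
    ring = [list(pixel_to_map(c, r, gt)) for c, r in corners]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {},
    }


# Hypothetical 400 x 400 image with 0.5 m pixels and a made-up origin.
gt = (500000.0, 0.5, 0.0, 4100000.0, 0.0, -0.5)
box = {"x_min": 0.25, "y_min": 0.5, "x_max": 0.5, "y_max": 0.75}
feature = box_to_feature(box, 400, 400, gt)
print(json.dumps(feature["geometry"]))
```

Collecting such features into a `FeatureCollection` gives the kind of file that `output_path` produces, although the exact properties geoai attaches are not shown here.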
Add the detected buildings to the map:
m.add_gdf(result["gdf"], layer_name="buildings")
m
Find building centroids
result = processor.point(
    image_path, "building", output_path="building_centroids.geojson"
)
print(f"Found {len(result['points'])} building centroids")
m.add_gdf(result["gdf"], layer_name="building_centroids")
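The point results can be georeferenced in a similar way. This minimal sketch assumes each point is a dict with normalized `x` and `y` values in [0, 1] and a north-up GDAL-style affine transform; `points_to_feature_collection` is a hypothetical helper and may not match geoai's internal code.

```python
import json
from typing import Dict, List, Tuple

# GDAL-style geotransform: (x_origin, pixel_width, row_rotation,
#                           y_origin, col_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def points_to_feature_collection(
    points: List[Dict[str, float]], width: int, height: int, gt: GeoTransform
) -> dict:
    """Convert normalized (x, y) points into a GeoJSON FeatureCollection of Points."""
    features = []
    for p in points:
        # Normalized coordinates -> pixel (col, row) -> map (x, y).
        col, row = p["x"] * width, p["y"] * height
        x = gt[0] + col * gt[1] + row * gt[2]
        y = gt[3] + col * gt[4] + row * gt[5]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": {},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical 400 x 200 image with 0.5 m pixels and a made-up origin.
gt = (500000.0, 0.5, 0.0, 4100000.0, 0.0, -0.5)
fc = points_to_feature_collection([{"x": 0.5, "y": 0.5}, {"x": 0.25, "y": 0.75}], 400, 200, gt)
print(json.dumps(fc, indent=2))
```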
Detect trees
result = processor.detect(image_path, "tree", output_path="trees.geojson")
print(f"Detected {len(result['objects'])} trees")
m.add_gdf(result["gdf"], layer_name="trees")
Find tree centroids
result = processor.point(image_path, "tree", output_path="tree_centroids.geojson")
print(f"Found {len(result['points'])} tree centroids")
m.add_gdf(result["gdf"], layer_name="tree_centroids")
Display final map
View all detected objects and centroids on the map.
m