Zarr Sprint Recap

On February 7th and 8th, in collaboration with Earthmover, we held a Zarr sprint at the LEAP NSF Science and Technology Center at Columbia University in New York City. A wide array of contributors from government, academia, and industry came to the sprint, including people from NASA, CarbonPlan, Development Seed, Earthmover, Upstream Tech, Columbia University, Hydronos Labs, and Fused.

In this post, I give a very brief overview of each of the topic areas we discussed. More importantly, I link out to the open issues, pull requests, discussions, and meeting opportunities identified at the sprint for continued development.

Zarr Specification

The purpose of the sprint was to continue development of the Zarr specification. Zarr is a chunked, compressed, N-dimensional array format primarily designed for storing large numerical arrays efficiently. It is commonly used in scientific computing, geospatial, bioimaging, and data analysis contexts.
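To make "chunked" concrete, here is a minimal sketch of how an element's position maps to the chunk that stores it. The helper names are made up for illustration, and the key layouts assume the default chunk key encodings ("c/3/1" style for Zarr V3, "3.1" style for Zarr V2); a store configured differently would produce other keys.

```python
from math import ceil

def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunks))

def chunk_key(index, chunks, zarr_format=3):
    """Key (relative to the array) of the chunk holding the element at `index`.

    Assumes the default key encodings: "c/i/j" for V3 and "i.j" for V2.
    """
    coords = [str(i // c) for i, c in zip(index, chunks)]
    if zarr_format == 2:
        return ".".join(coords)
    return "c/" + "/".join(coords)

# A 10,000 x 5,000 array stored in 1,000 x 1,000 chunks.
shape, chunks = (10_000, 5_000), (1_000, 1_000)
grid = chunk_grid_shape(shape, chunks)
key_v3 = chunk_key((3_500, 1_200), chunks)
key_v2 = chunk_key((3_500, 1_200), chunks, zarr_format=2)
print(grid, key_v3, key_v2)
```

Because each chunk is a separate object in the store, a reader only needs to fetch and decompress the chunks that overlap the region it asks for.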

Enhancements to the Zarr specification that we discussed at the sprint are described below.

Chunk Manifest / Virtual Concatenation

In this breakout session, the group engaged in a long technical discussion about a way to define arrays in a Zarr store as concatenations of other arrays in the store. You can read a draft Zarr Enhancement Proposal (ZEP) of the discussion here. Shoutout to Tom Nicholas for documenting this so well!
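As a rough illustration of the idea (not the design in the draft ZEP), the sketch below treats several 1-D arrays as one virtual array and resolves a global index to the source array that actually holds it. The class name, array paths, and lengths are all hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

class VirtualConcat:
    """Hypothetical view of several 1-D arrays concatenated end to end."""

    def __init__(self, sources):
        # sources: list of (array_path, length) pairs in concatenation order
        self.paths = [path for path, _ in sources]
        # offsets[k] is the global index where source k starts
        self.offsets = [0] + list(accumulate(n for _, n in sources))

    def __len__(self):
        return self.offsets[-1]

    def locate(self, i):
        """Map a global index to (source array path, index within that array)."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        k = bisect_right(self.offsets, i) - 1
        return self.paths[k], i - self.offsets[k]

# Three yearly arrays presented as one three-year array.
combined = VirtualConcat([("temp/2021", 365), ("temp/2022", 365), ("temp/2023", 365)])
print(len(combined), combined.locate(365))
```

No data is copied here: the virtual array only stores where each piece starts, which is the property that makes concatenation cheap to define inside a store.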


Zarr-Python V3 Support

Joe Hamman led a group focusing on enabling support for V3 in Zarr-Python. This was part of an ongoing effort working toward Zarr-Python version 3.0 (roadmap).

The group focused on closing outstanding issues on the roadmap and testing the development branch in common geospatial applications. Zarr-Python has traditionally been the canonical implementation of Zarr, so this effort is a current priority: it delivers immediate impact to the largest swath of users, including those who use Zarr through downstream libraries (e.g. Xarray, Dask, AnnData).

Geospatial Multi-Scales/Pyramids

In the Zarr pyramids breakout group, Thomas Maschler and Max Jones discussed the motivations for following the OGC TileMatrixSet (TMS) 2.0 specification within the GeoZarr specification. These motivations will be shared as a new issue to supersede GeoZarr Issue #30. They also discussed reading those tile matrix sets into rio-tiler using Xarray and started a refactor of ndpyramid to support the TMS specification.
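For readers unfamiliar with tile matrix sets, the sketch below computes the levels of the well-known WebMercatorQuad layout, where each zoom level doubles the number of tiles per side and halves the cell size. The function and dictionary keys are illustrative only; they are not the exact schema of the OGC specification, nor how ndpyramid or GeoZarr represent pyramids.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, as used by Web Mercator
TILE_SIZE = 256             # pixels per tile edge

def web_mercator_quad(max_zoom):
    """Return one dict per zoom level, from 0 to max_zoom inclusive."""
    world_width_m = 2 * math.pi * EARTH_RADIUS_M
    levels = []
    for zoom in range(max_zoom + 1):
        tiles_per_side = 2 ** zoom
        levels.append({
            "id": str(zoom),
            "matrix_width": tiles_per_side,
            "matrix_height": tiles_per_side,
            # ground size of one pixel, in meters at the equator
            "cell_size": world_width_m / (tiles_per_side * TILE_SIZE),
        })
    return levels

levels = web_mercator_quad(4)
for level in levels:
    print(level["id"], level["matrix_width"], round(level["cell_size"], 2))
```

A multiscale Zarr dataset aligned to such a grid would hold one array (or group) per level, so that a tile server can pick the level whose cell size best matches the requested zoom.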

Alternate backend for reading remote Zarr stores

Kyle Barron worked on an alternate store for Zarr-Python that uses new async Python bindings to Rust’s object-store project. You can see the prototype of this object-store-based store implementation at zarr-python#1661.

GeoZarr Specification

Throughout the sprint, the GeoZarr focus group, led by Brianna Pagán, worked on examining the interoperability of GeoZarr and different existing tooling and store support. You can see the table here.

One of the biggest realizations was that ArcGIS has a lot of existing support for Zarr, which is really exciting news! For other tools, there is still work to be done, especially for GeoTIFF-like data being stored in Zarr, which translates to updates needed within the GeoZarr specification. For example, there are functionality issues tied to support or lack thereof for specific compression algorithms. The GeoZarr Steering Working Group is working on providing a list of supported compressions for commonly used tools. There is also work to be done on specifying the organizational structure of GeoZarr and understanding where requirements from CF diverge from the Zarr data model. For this, we are focusing efforts on involving folks with CF expertise to guide these conversations.

If you are interested in helping out, please join the bi-weekly GeoZarr meeting, held every other Wednesday at 11 EST. The next meeting is on March 20th; you can find the invite on the Zarr calendar or join directly from this link. Notes from past meetings are on HackMD.

HTTP Extension

A final priority of the Zarr Sprint was to get efforts rolling on how to better visualize Zarr on the web.

Kevin Booth leads this effort. So far, he has added sidecar files containing links that describe the parent, child, and root relationships within a Zarr store, which would allow a client to traverse the store without needing an object storage interface with list capabilities. To demonstrate how this could work, Xavier Nogueira created traverzarr, which lets you navigate a Zarr store as if it were a file system. A more detailed blog post with updates on this work will follow in the next week.
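The traversal idea can be sketched in a few lines. Everything specific below is an assumption made for illustration: the sidecar file name (links.json), its fields, and the example hierarchy are hypothetical and are not the format used in this work or by traverzarr. A plain dictionary stands in for an HTTP server that can answer GET requests for known keys but cannot list its contents.

```python
import json

# Stand-in for a GET-only server: keys can be fetched, never listed.
# The "links.json" name and its fields are hypothetical.
FAKE_STORE = {
    "links.json": json.dumps({"root": "", "parent": None, "children": ["air", "model"]}),
    "air/links.json": json.dumps({"root": "", "parent": "", "children": []}),
    "model/links.json": json.dumps({"root": "", "parent": "", "children": ["precip"]}),
    "model/precip/links.json": json.dumps({"root": "", "parent": "model", "children": []}),
}

def get(key):
    """Fetch one object by exact key, like an HTTP GET."""
    return FAKE_STORE[key]

def walk(get, path=""):
    """Yield every node path reachable from `path` by following child links."""
    yield path
    sidecar = f"{path}/links.json" if path else "links.json"
    links = json.loads(get(sidecar))
    for child in links["children"]:
        child_path = f"{path}/{child}" if path else child
        yield from walk(get, child_path)

nodes = list(walk(get))
print(nodes)
```

The client discovers the whole hierarchy using only fetches of keys it already knows, which is why this approach works over plain HTTP where listing is unavailable.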

This work has continued since the sprint. In collaboration with the Zarr community, the Cloud-Native Geospatial Foundation has started holding bi-weekly meetings to hack on it. The next will be held at 12 EST on March 14th. If you would like to be involved, email to be added to the meeting invite, or find the meeting link on the Zarr calendar here.

It was great to get a group of people together to spend some dedicated time on Zarr, and plenty of work remains. Please help keep the momentum of these efforts going by responding to any GitHub Pull Requests, Issues, or Discussions that you have opinions on and joining any of the established Zarr meetings that are of interest to you.
