A deep dive into GeoParquet Downloader QGIS Plug-in
Last month I released my first QGIS plug-in, and promised I’d write an in-depth post about it. I’ll give an overview and dig into some of the motivations, and then I’ll put the details of my experience of coding with AI in its own follow-up post.
Background
I’ve been a long-time QGIS user, though I am very far from an expert: I mostly open different files and visualize them. I’ve never been able to afford an Esri license, so it’s QGIS all the way for me. And I’ve always loved the plugin ecosystem: the fact that many people worldwide are adding all kinds of functionality so that anyone can customize it to their needs is just awesome, and a testament to the power of open source. There are still things Esri does better, but we’re now at the point where there are a lot of things QGIS does better.
I also recently have ‘become a coder’ again, thanks to the power of AI tools. I’ll dive into more of the experience in my next post, but it meant that I could tackle something like a new QGIS plugin as a (long) weekend project. I started it just to see if I could, and things kept working, so I kept pushing on.
Motivations
One of my latest missions is to advance GeoParquet as a format to fulfill the promise of cloud-native vector data, enabling organizations to get almost all the functionality of a Web Feature Service (WFS) server like GeoServer by simply putting up their data as GeoParquet on a cloud bucket. I was so excited when Overture Maps embraced the format, but they also got a good bit of pushback for not having a ‘download’ button and for not offering traditional data formats.
I was confident that if things evolved right it shouldn’t be hard to give traditional GIS users an even better experience of getting the data, since you can easily stream just what you need and transform it on the fly. A big shout out to Jake Wasserman and Overture for really stepping in to help push forward the evolution, proposing the key bbox covering and upgrading Overture to fully implement it.
A few months ago it became possible to use my favorite new geospatial tool, DuckDB (or a number of other tools), with any Overture data layer to select a spatial subset of the whole world and download just the area you cared about in tens of seconds and often faster.
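To make that concrete, here is a minimal sketch of the kind of query involved. It assumes the source exposes a `bbox` struct column with `xmin`, `ymin`, `xmax` and `ymax` fields (as Overture does), and the function names and the URL are placeholders of mine, not anything taken from Overture's docs or from the plugin.

```python
def build_bbox_query(source_url, xmin, ymin, xmax, ymax):
    """Build DuckDB SQL selecting rows whose bbox overlaps the given extent.

    Assumes the GeoParquet source has a `bbox` struct column with
    xmin/ymin/xmax/ymax fields. `source_url` is whatever path or URL
    you would hand to DuckDB's read_parquet().
    """
    if xmin > xmax or ymin > ymax:
        raise ValueError("extent must satisfy xmin <= xmax and ymin <= ymax")
    url = source_url.replace("'", "''")  # escape single quotes for the SQL literal
    # Two boxes overlap when each one starts before the other one ends.
    return (
        f"SELECT * FROM read_parquet('{url}')\n"
        f"WHERE bbox.xmin <= {float(xmax)!r} AND bbox.xmax >= {float(xmin)!r}\n"
        f"  AND bbox.ymin <= {float(ymax)!r} AND bbox.ymax >= {float(ymin)!r}"
    )


def run_query(sql):
    """Run the query with DuckDB (third-party package, imported lazily)."""
    import duckdb

    con = duckdb.connect()
    for ext in ("httpfs", "spatial"):  # remote reads and geometry support
        con.install_extension(ext)
        con.load_extension(ext)
    return con.sql(sql).fetchall()


if __name__ == "__main__":
    # Placeholder URL: substitute a real GeoParquet location.
    print(build_bbox_query("s3://example-bucket/data/*.parquet",
                           -122.5, 37.7, -122.3, 37.9))
```

Because the filter only touches the plain numeric `bbox` fields, DuckDB can use Parquet row group statistics to skip most of the file, which is what makes pulling a small area out of a global dataset fast.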
Getting Overture data today
Overture has great docs for using DuckDB, and they also built a nice command-line tool, but you still have to be tech-oriented and inclined to use a terminal. They did also build a nice Explorer app, which lets you download small amounts of data. But if you wanted more than a few megabytes worth of data to load up in QGIS there still weren’t great options for those who don’t want to learn to use a terminal and CLI tools.
So I decided to see how far the LLM coding tools had come and figure out if I’d be able to write a QGIS plugin. QGIS development had always intimidated me: I think I had one class in college that did desktop UIs and I found it hard to grok. But my first attempt got something on my screen and within twenty minutes I had a reasonable kernel of functionality. I ended up able to get the vast majority of it working as I wanted to in a few days during the week of Thanksgiving, coding on the plane and sneaking in mini-sprints between family time.
So my goal was to make it as easy as possible for any QGIS user to download Overture, and indeed to not force GeoParquet on them: with the plugin you can easily request data as a GeoPackage. And I also wanted to make it easy to download any GeoParquet data, so that the tool isn’t just for Overture data, but enables anyone distributing their data as GeoParquet to easily enable QGIS users to get their data.
Plugin Overview
This animated gif probably gives the quickest overview to understand what the plugin enables:
[Animated GIF: the plugin in use]
The idea is to make it simple to just download GeoParquet data into a local copy in QGIS. It currently just uses the bounds of the viewport, but I hope a future version can give more options to draw a geometry or use other QGIS layers (contributors welcome!).
Currently there are a few pre-set layers. All of Overture is obviously available, and it’s got a dedicated button to open its panel. And Source Cooperative is easily the other largest single collection of GeoParquet files (and if you have open data you’d like to make available on Source then you can likely host it there for free — just reach out!). I still need to add more Source Cooperative files, indeed I hope to make a complete fiboa & Fields of The World section, as we’ve got a lot of data up there.
And after the initial release I added a Hugging Face section, which for now is just the Foursquare OS Places dataset, but it seems like more will be added (I contemplated adding the various embeddings datasets but wasn’t sure of the practical use case of making it easier to download). And you can also just enter any custom URL to a GeoParquet online.
Right now you can download data as GeoParquet, DuckDB and GeoPackage. GeoPackage will always work, as all QGIS installations support it. GeoParquet should work on most recent installations, though macOS is less straightforward (but I am working with opengis.ch to try to make this better!). DuckDB right now won’t load in QGIS, but I’m starting to collaborate with the QDuckDB plugin team and I think I should be able to render the results of a DuckDB download if their plugin is installed.
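For a sense of how the three output formats can map onto DuckDB, here is a rough sketch. It is not the plugin's actual code: the function name, the format keys and the default table name are made up for illustration, and it assumes the spatial extension is loaded so that the GDAL writer is available.

```python
def build_export_sql(select_sql, output_path, output_format, table_name="download"):
    """Wrap a SELECT in a DuckDB statement that writes the chosen format.

    Hypothetical helper for illustration only. `output_format` is one of
    "geopackage", "geoparquet" or "duckdb" (keys invented for this sketch).
    """
    path = output_path.replace("'", "''")  # escape single quotes for the SQL literal
    if output_format == "geopackage":
        # Uses the GDAL writer from DuckDB's spatial extension.
        return f"COPY ({select_sql}) TO '{path}' WITH (FORMAT GDAL, DRIVER 'GPKG')"
    if output_format == "geoparquet":
        return f"COPY ({select_sql}) TO '{path}' (FORMAT PARQUET)"
    if output_format == "duckdb":
        # Meant to be run on a connection opened against the output database
        # file, so `output_path` is not part of the statement itself.
        return f"CREATE OR REPLACE TABLE {table_name} AS {select_sql}"
    raise ValueError(f"unsupported output format: {output_format!r}")


if __name__ == "__main__":
    print(build_export_sql("SELECT * FROM read_parquet('in.parquet')",
                           "out.gpkg", "geopackage"))
```

As far as I know, recent DuckDB versions write the GeoParquet metadata themselves when the spatial extension is loaded and the table has a geometry column, but treat that as something to verify against the DuckDB version you have.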
[Image: The awesome QDuckDB plugin]
And that team also deserves a shout-out. Their plugin was the one I looked at the most for how to structure things, and they are working to solve a core issue that I need for the plugin to work well: installing DuckDB. DuckDB is the core engine that powers the entire thing, as everything I did was just wrappers around all of its amazing functionality.
Installing the plugin
If this seems like something that’s useful to you it should be pretty easy to install the plugin. Just open the plugin manager and search for ‘GeoParquet’.
I think the installation process is now pretty good. Matt Travis, the first outside contributor to the plugin, worked to get it to automatically install. I think it works most of the time, but I’m not 100% sure — it attempts to automatically use ‘pip’ to install DuckDB, but I’d guess that’s sometimes blocked. My hope is GDAL 3.11 with ADBC support will enable a more ‘native’ DuckDB experience in QGIS, and that we’ll be able to include it as a core dependency.
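The general pattern for this kind of automatic install looks roughly like the sketch below. This is my own simplified illustration, not the code Matt wrote, and the helper names are hypothetical. One known wrinkle: inside QGIS, `sys.executable` may not point at a Python interpreter on every platform, which is one of the ways a pip-based install can fail.

```python
import importlib
import importlib.util
import subprocess
import sys


def pip_install(package):
    """Hypothetical helper: run pip via the current interpreter.

    Returns True if pip exited successfully, False otherwise.
    """
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        return True
    except (subprocess.CalledProcessError, OSError):
        return False


def ensure_module(name, installer=pip_install):
    """Return True if `name` is importable, installing it first if needed."""
    if importlib.util.find_spec(name) is not None:
        return True  # already available, nothing to do
    if not installer(name):
        return False  # install was blocked or failed
    importlib.invalidate_caches()  # let the import system see new files
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "json" ships with Python, so this never reaches pip.
    print(ensure_module("json"))
```

Passing the installer in as a parameter keeps the check easy to test, and makes it simple to fall back to showing the user manual instructions when the install is blocked.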
[Image: ADBC GDAL/OGR docs, coming in 3.11!]
Future Features
It is on the list for the plugin to add support for more formats (which should be a great first issue for any potential contributors): FlatGeobuf is at the top of my list, and File Geodatabase also sounds interesting. If there are other formats you’d like, just add them to the issue. I’m pretty opposed to adding Shapefile since it comes with so many limitations that I think will get in the way of using Overture and other data, but if someone wants to make a PR and really needs it I’m sure I’d accept it.
I’ve got a number of ideas in the issue tracker, but I’d love to hear from others what they’d like to see. I don’t see this being a huge project, and indeed I could see one route of ‘success’ being that this type of functionality is more incorporated into the QGIS core. It’s a bit of a different workflow, one that I actually think would also be interesting with traditional geospatial servers (WFS, ArcGIS Feature Service, etc). Instead of having QGIS try to stream data on each screen change, just have the user manually ‘check out’ the data that they want: download it and then display / use that local version.
The top future ideas that I’m thinking about are:
- User configurable data sources, instead of me maintaining the list and manually updating when new ones come. I could see making it so an organization can make their own ‘tab’ and even button that has a bunch of layers that are relevant to their users. I’d also like to make it easy for a user to add their own layers, instead of having to enter each manually.
- More download options beyond just the viewport, as mentioned above.
- Better integration with STAC, though that will need data providers to implement it. But ideally you could point at a STAC catalog and get the list of GeoParquet layers to download.
- Ability to point at a GeoParquet file and see how well it implements (in-progress) GeoParquet best practices. I started a library to help do this, so hope to finish that and wrap it in this QGIS plugin (or maybe it will be a standalone plugin).
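As a taste of what that last idea could look like, here is a small sketch that inspects a file's `geo` metadata. The fields it reads (`version`, `primary_column`, `columns` and the optional `covering.bbox`) come from the GeoParquet spec, but which checks to run and how to word the findings are my own assumptions for illustration; this is not the library mentioned above. The `geo` JSON string lives in the Parquet key-value metadata, which you can read with a Parquet library such as pyarrow.

```python
import json


def check_geo_metadata(geo_json):
    """Return a list of findings for a GeoParquet `geo` metadata JSON string.

    Illustrative checks only; an empty list means none of them fired.
    """
    geo = json.loads(geo_json)
    findings = []
    if "version" not in geo:
        findings.append("missing 'version'")
    columns = geo.get("columns", {})
    primary = geo.get("primary_column")
    if primary is None or primary not in columns:
        findings.append("'primary_column' missing or not listed in 'columns'")
    else:
        covering = columns[primary].get("covering", {})
        if "bbox" not in covering:
            # Without a declared bbox covering, readers cannot rely on
            # a bbox column to skip data outside the requested area.
            findings.append("no bbox covering declared for the primary column")
    return findings


if __name__ == "__main__":
    sample = json.dumps({
        "version": "1.1.0",
        "primary_column": "geometry",
        "columns": {"geometry": {"encoding": "WKB", "geometry_types": []}},
    })
    print(check_geo_metadata(sample))
```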
What’s Next
I’d love more help on this project, and my hope is to make it an experiment of AI-enabled open source. Since I wrote 99% of it with AI coding tools I’m very happy to have all the contributions be similarly made, so if you’ve been wondering about how it all works and want a practical introduction that creates code for others to use then please take an issue!
I had thought I would also share more about my experience of using AI coding tools to create it, but since this post is already quite long I’ll break that out into its own post. I’ve also got a number of insights into the state of public GeoParquet files and how we can improve the ecosystem of public data, but I’ll also save that for its own post. So stay tuned! I hope to publish both of those posts soon.