Why Does Cloud-Native Geospatial Matter to GIS Professionals?

By Linda Stevens, Founder, Spatial Spirits & Bill Dollins, Founder and President, Cercana Systems LLC
06 Feb 2025

Cloud-Native Geospatial represents a significant shift in how geospatial data is processed, stored, and analyzed. This approach offers GIS professionals greater scalability, allowing them to handle massive datasets without relying on traditional and often limited on-premise infrastructure. Additionally, the cloud-native approach enhances collaboration by enabling multiple users to access and work on shared datasets in real time, regardless of their physical location, helping to eliminate data silos. This level of accessibility and flexibility empowers GIS professionals to deliver faster results, streamline workflows, and adapt to the growing demands of modern geospatial applications.

What is Cloud-Native Geospatial?

Cloud-native geospatial refers to the practice of using cloud-based technologies and architectures to handle geospatial data where it already lives in the cloud, ideally without having to migrate it between heavyweight, purpose-built storage systems and file formats. This approach focuses on scalability, flexibility, and integration with modern cloud ecosystems to meet the growing demands for processing and analyzing spatial data. By adopting cloud-native principles, geospatial applications can take advantage of distributed computing, serverless architectures, high-capacity storage, and managed services offered by cloud providers, reducing operational overhead while improving performance.

Cloud-native geospatial enables the efficient use of large datasets by providing direct access to the section of the data you need without expensive clip operations. Additionally, complex geospatial processes can take advantage of distributed computing architectures, so steps that traditional GIS workflows run one after another can instead run in parallel.

Examples of Cloud-Native Geospatial Data Formats

One prominent example of a cloud-native geospatial data format is Cloud Optimized GeoTIFF (COG). COGs are specifically designed for efficient access and use in a cloud environment, allowing users to retrieve only the portions of data they need rather than downloading entire files. This makes them ideal for handling large raster datasets, such as satellite imagery or digital elevation models.
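To make the idea of partial reads concrete, here is a minimal sketch of the arithmetic behind it. A COG stores its pixels in internal tiles, so a reader only needs the tiles that overlap the requested pixel window and can ask for each one with an HTTP range request. The function names, the 512-pixel tile size, and the sample offsets are assumptions chosen for illustration; in a real file the tile offsets and byte counts come from the TIFF header, and libraries such as GDAL handle all of this for you.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of internal tiles overlapping a pixel window.

    tile_size is an assumed value; real files declare their own tile dimensions.
    """
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def range_header(offset, byte_count):
    """Build the HTTP Range header value that fetches one tile's bytes."""
    return f"bytes={offset}-{offset + byte_count - 1}"


# A 300 x 300 pixel window starting at column 1000, row 600 touches two tiles.
needed = tiles_for_window(1000, 600, 300, 300)
print(needed)
# Hypothetical tile offset and length, as they might appear in the TIFF header.
print(range_header(4096, 1024))
```

Only the listed tiles would be transferred, which is why a small area of interest can be read from a very large image without downloading the whole file.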

Another widely adopted format is Zarr, which is commonly used for multidimensional array data. Zarr enables parallel and random-access data reading, making it particularly suitable for large-scale climate and weather datasets. Combined with cloud storage, Zarr allows researchers and professionals to collaborate on complex analyses without managing extensive, localized data copies.

Parquet and GeoParquet are examples of cloud-native formats that simplify working with tabular geospatial data. Column-based storage, as opposed to the row-based approach of traditional GIS file formats, provides efficient compression, and these formats are well-suited for fast access and analysis of large vector datasets, including geometries and attribute tables.

These formats, among others, help to optimize geospatial workflows for cloud architectures, promoting efficiency, interoperability, and innovation.
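The difference between row-based and column-based storage can be sketched in a few lines of plain Python. This is only an illustration of the layout idea, not how Parquet is implemented, and the parcel records are made-up sample data.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "parcel-a", "area_m2": 120.0},
    {"id": 2, "name": "parcel-b", "area_m2": 300.0},
    {"id": 3, "name": "parcel-c", "area_m2": 180.0},
]


def to_columns(records):
    """Pivot a list of records into a dict holding one list per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one attribute only has to touch that one column;
# the "id" and "name" values are never read.
mean_area = sum(columns["area_m2"]) / len(columns["area_m2"])
print(mean_area)
```

Because values of the same type sit next to each other in a column, they also tend to compress well, which is part of the appeal for large attribute tables.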

How to Use Cloud-Native Geospatial Formats in Your GIS Solution

Integrating cloud-native geospatial formats into your GIS solution starts with selecting the appropriate tools and frameworks that align with your needs. Fortunately, many widely used GIS platforms like ArcGIS and QGIS already support cloud-native formats because core geospatial libraries such as GDAL, geopandas, and R’s raster package support COG, Zarr, and GeoParquet. As support for cloud-native formats continues to grow, more GIS platforms will likely adopt them. This allows you to integrate cloud-based data directly into your workflows and take full advantage of the scalability and accessibility provided by cloud storage.

For example, when working with COGs, GDAL can access only the required portions of the images for your workflow rather than downloading full images, significantly reducing bandwidth and computation costs. Similarly, tools like Zarr-Python or xarray are excellent options for handling Zarr-formatted multidimensional datasets, offering powerful data analysis and visualization capabilities in cloud-centric environments. Zarr is also supported as a multi-dimensional raster format in ArcGIS.
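Zarr readers can fetch only part of an array for a similar reason: the array is split into chunks, and each chunk is stored as its own object. The sketch below works out which chunks a slice touches. It assumes Zarr v2-style dot-separated chunk keys (for example `0.1.2`), and the array dimensions, chunk shape, and function name are hypothetical; in practice Zarr-Python or xarray does this lookup internally.

```python
from itertools import product


def chunk_keys(selection, chunk_shape):
    """List the dot-separated chunk keys touched by a selection.

    selection: one (start, stop) pair per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    per_dimension = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(selection, chunk_shape)
    ]
    return [".".join(str(i) for i in index) for index in product(*per_dimension)]


# Hypothetical (time, lat, lon) array chunked as 24 x 90 x 90.
# Reading time 0-23, lat 100-149, lon 170-259 touches two chunks.
keys = chunk_keys([(0, 24), (100, 150), (170, 260)], chunk_shape=(24, 90, 90))
print(keys)
```

Each key maps to one object in cloud storage, so independent workers can read different chunks at the same time.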

Another critical step is configuring your GIS infrastructure to use cloud storage services effectively, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. These platforms offer APIs and SDKs for streamlined integration, making it easier to incorporate them into geospatial workflows to manage and process large datasets in real time. Using cloud-based object storage enables the use of compute resources in the cloud, which can decentralize your geospatial workflows and offload complex processing that can tax traditional desktop and server resources.
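As one small example of what this configuration looks like in practice, GDAL reads remote files through virtual file system prefixes such as `/vsicurl/`, `/vsis3/`, `/vsigs/`, and `/vsiaz/`. The helper below is a hypothetical convenience function, not part of GDAL, that maps common URL schemes to those prefixes; the bucket and file names are placeholders, and credentials would still need to be configured separately.

```python
from urllib.parse import urlparse

# GDAL virtual file system prefix for each object-storage URL scheme.
_VSI_PREFIXES = {"s3": "/vsis3/", "gs": "/vsigs/", "az": "/vsiaz/"}


def to_gdal_path(url):
    """Translate a cloud or HTTP URL into a path GDAL-based tools can open."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        # Plain HTTP(S) files are read with range requests via /vsicurl/.
        return "/vsicurl/" + url
    prefix = _VSI_PREFIXES.get(parsed.scheme)
    if prefix is None:
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")
    # netloc is the bucket or container, path is the object key.
    return f"{prefix}{parsed.netloc}{parsed.path}"


print(to_gdal_path("s3://example-bucket/dem/tile.tif"))
print(to_gdal_path("https://example.com/imagery/scene.tif"))
```

The resulting string can be passed to GDAL-based tools wherever a local file path would normally go.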

Adopting these practices enables you to fully leverage the capabilities of cloud-native geospatial formats, ensuring that your GIS solution remains well-positioned to meet the dynamic requirements of modern geospatial applications.

What database technology supports Cloud-Native Geospatial formats?

Several database technologies are well-suited for supporting cloud-native geospatial formats, providing efficient querying, storage, and spatial data analysis. PostgreSQL with the PostGIS extension is a widely recognized solution, offering robust support for geospatial data types, functions, and compatibility with formats such as GeoJSON and WKT. GeoPackage data can be imported and exported using tools integrated with PostGIS, although full compatibility may require additional processing. Crunchy Bridge for Analytics also enables direct access to cloud-native formats using PostgreSQL.
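GeoJSON and WKT describe the same geometries in different notations, which is part of why they move easily between applications and databases. The following sketch converts a few GeoJSON geometry types to WKT by hand. It is a simplified illustration that covers only Point, LineString, and Polygon; PostGIS and most geospatial libraries provide their own, more complete conversions.

```python
def geojson_to_wkt(geometry):
    """Convert a GeoJSON Point, LineString, or Polygon geometry dict to WKT."""

    def fmt(position):
        # A position is a list of coordinates, e.g. [x, y].
        return " ".join(str(value) for value in position)

    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return f"POINT ({fmt(coords)})"
    if kind == "LineString":
        return "LINESTRING (" + ", ".join(fmt(p) for p in coords) + ")"
    if kind == "Polygon":
        rings = ", ".join(
            "(" + ", ".join(fmt(p) for p in ring) + ")" for ring in coords
        )
        return f"POLYGON ({rings})"
    raise ValueError(f"Unsupported geometry type: {kind}")


print(geojson_to_wkt({"type": "Point", "coordinates": [-77.0, 38.9]}))
print(
    geojson_to_wkt(
        {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}
    )
)
```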

Cloud-native query platforms such as Amazon Aurora (PostgreSQL-compatible), Google BigQuery, Snowflake, and Azure Cosmos DB deliver scalable solutions for analyzing geospatial data, often integrating seamlessly with cloud storage systems. These platforms offer native support for specific geospatial data types and functions, making them particularly suitable for modern geospatial applications.

For large-scale geospatial data processing, joins, and operations in distributed environments, technologies like Apache Spark, with geospatial add-ons such as Apache Sedona (formerly GeoSpark), are highly effective. Apache Sedona enables parallel processing and advanced spatial analysis and can deliver up to 10X higher performance for complex queries, depending on the workload. Similarly, cloud-native data warehouses or lakehouses such as Databricks (which uses Apache Spark) provide geospatial SQL capabilities, facilitating the analysis of formats like WKT and WKB alongside traditional datasets. The team behind Apache Sedona also offers Wherobots, a serverless data warehouse/lakehouse compute service built on a distributed computing architecture (Spark+Sedona) for highly optimized geospatial data workloads. DuckDB, while optimized for single-node analytics, also supports geospatial extensions and is ideal for high-performance local analysis.
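One reason distributed engines handle spatial joins well is that space can be partitioned, so each worker only compares features that fall in the same region. The sketch below illustrates that idea with a simple uniform grid on a single machine. It is not how Apache Sedona or any other engine is implemented; the function names, cell size, and sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def cell_of(x, y, cell_size):
    """Return the grid cell (column, row) that contains a coordinate."""
    return (int(x // cell_size), int(y // cell_size))


def grid_join(points, boxes, cell_size=10.0):
    """Pair each point id with the ids of the boxes that contain it.

    points: {id: (x, y)}
    boxes:  {id: (minx, miny, maxx, maxy)}
    """
    # Partition step: register each box in every grid cell it overlaps.
    index = defaultdict(list)
    for box_id, (minx, miny, maxx, maxy) in boxes.items():
        col0, row0 = cell_of(minx, miny, cell_size)
        col1, row1 = cell_of(maxx, maxy, cell_size)
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                index[(col, row)].append(box_id)

    # Join step: a point is only tested against boxes registered in its cell.
    matches = []
    for point_id, (x, y) in points.items():
        for box_id in index.get(cell_of(x, y, cell_size), ()):
            minx, miny, maxx, maxy = boxes[box_id]
            if minx <= x <= maxx and miny <= y <= maxy:
                matches.append((point_id, box_id))
    return sorted(matches)


points = {"p1": (5, 5), "p2": (25, 5), "p3": (12, 12), "p4": (50, 50)}
boxes = {"a": (0, 0, 15, 15), "b": (20, 0, 30, 10)}
print(grid_join(points, boxes))
```

In a distributed setting, each grid cell (or group of cells) could be handed to a different worker, since the cells can be processed independently.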

Selecting the appropriate database technology depends on the application’s specific requirements, including data volume, query complexity, and whether real-time or batch processing is needed. Organizations can build a robust and future-ready foundation for modern geospatial applications by integrating scalable database technologies with cloud-native geospatial formats.

Can Cloud-Native Geospatial Support GeoAI?

Machine learning (ML) and artificial intelligence (AI) are integral components of the cloud-native geospatial ecosystem. Frameworks such as PyTorch, currently the most popular, and TensorFlow are widely used for geospatial applications, enabling tasks such as predictive modeling, object detection, and land cover classification. These capabilities can provide accurate automated insights, supporting applications in fields like precision agriculture, disaster response, and climate change mitigation.

The diversity and volume of geospatial data products and GeoAI models derived from them can make it challenging to find relevant data products and reproduce model predictions. To address this problem, the cloud-native geospatial community is not only defining cloud-native data formats but also creating cloud-native standards for data catalogs and collections. These standards make it easier to search and associate geospatial data and models. The SpatioTemporal Asset Catalog (STAC) specification is an industry standard for describing satellite imagery, aerial imagery, lidar, and other types of geospatial data. This standard includes extensions for describing different kinds of assets, including the Machine Learning Model (MLM) specification. The MLM provides searchable metadata that links model artifact files, model input requirements, hardware and runtime requirements, and associations to published STAC datasets to make it easy for machine learning frameworks to reproduce model inference and to make models searchable in model catalogs. You can learn more about the STAC ecosystem of standards and tooling at https://stacspec.org/en.
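A STAC Item is a GeoJSON Feature with a few additional required fields, which is what makes catalogs straightforward to search. The sketch below builds two minimal items and filters them by bounding box and date range. The item IDs, URLs, and helper names are hypothetical, and the in-memory search only stands in for what a real STAC API or client library would do.

```python
from datetime import datetime


def make_item(item_id, bbox, when, href):
    """Build a minimal STAC Item (a GeoJSON Feature) for one COG asset."""
    minx, miny, maxx, maxy = bbox
    ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"datetime": when},
        "links": [],
        "assets": {
            "data": {
                "href": href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            }
        },
    }


def parse_time(text):
    """Parse an RFC 3339 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def search(items, bbox, start, end):
    """Return ids of items whose bbox intersects bbox and whose datetime is in range."""
    minx, miny, maxx, maxy = bbox
    found = []
    for item in items:
        ix0, iy0, ix1, iy1 = item["bbox"]
        overlaps = not (ix1 < minx or ix0 > maxx or iy1 < miny or iy0 > maxy)
        when = parse_time(item["properties"]["datetime"])
        if overlaps and parse_time(start) <= when <= parse_time(end):
            found.append(item["id"])
    return found


items = [
    make_item("scene-1", (-78, 38, -76, 40), "2024-06-01T00:00:00Z",
              "https://example.com/scene-1.tif"),
    make_item("scene-2", (10, 10, 12, 12), "2024-06-02T00:00:00Z",
              "https://example.com/scene-2.tif"),
]
print(search(items, (-77.5, 38.5, -77, 39),
             "2024-01-01T00:00:00Z", "2024-12-31T00:00:00Z"))
```

Extensions such as MLM add further fields to the same structure, so model metadata can be searched alongside the datasets it relates to.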

Cloud-native infrastructure is crucial in advancing GeoAI by enabling scalable and efficient data processing. Platforms such as AWS SageMaker, Google Cloud AI, and Azure Machine Learning facilitate seamless integration of ML workflows with geospatial data. Additionally, cloud-native geospatial formats, such as Cloud Optimized GeoTIFF (COG) and Zarr, ensure fast and efficient access to large geospatial datasets, reducing the latency associated with traditional data handling, especially when stored in cloud-native object storage.

Beyond predictive modeling and classification, GeoAI is relevant to a wide range of analytical capabilities, including spatial clustering, route optimization, spatial interpolation, viewshed analysis, hot spot analysis, and spatial regression. These capabilities are further enhanced by the parallel processing power and distributed architecture offered by cloud-native technologies, making it possible to handle complex and large-scale geospatial data and workloads efficiently.

Organizations can build robust GeoAI solutions that address diverse and evolving geospatial challenges by leveraging cloud-native geospatial technologies and ML frameworks.

Get started with Cloud-Native Geospatial.

Cloud-Native Geospatial represents a transformative approach to managing, analyzing, and sharing geospatial data. Geospatial practitioners can achieve greater scalability, efficiency, and collaboration with cloud-based storage, computing, and modern data formats such as Cloud Optimized GeoTIFF (COG), Zarr, and GeoParquet. Integrating cloud-native geospatial formats into GIS workflows helps organizations remain flexible and capable of handling the growing demands of modern geospatial applications. With the right tools, infrastructure, and database technologies, cloud-native geospatial enables organizations to streamline operations and drive innovation in the geospatial field.

You can learn about cloud-native geospatial by exploring the Guide to Cloud-Optimized Geospatial Formats. Discover the essentials of cloud-native geo data formats and standards by watching the CNG 101 webinar, an excellent resource for an overview of CNG.

Want to get involved? Join the Cloud-Native Geospatial Forum (CNG) community and become part of the team pushing the envelope to bring cloud-native standards and technology to geospatial!
