U.S. Large-Scale Solar Photovoltaics Database - Release of a new geospatial dataset

November 8, 2023

Collaboration between Berkeley Lab and USGS produces the most detailed and comprehensive publicly available large-scale solar facility database to date. 

Berkeley Lab, in collaboration with the U.S. Geological Survey (USGS), released the United States Large-Scale Solar Photovoltaic Database (USPVDB) today. The USPVDB is a detailed and comprehensive dataset of ground-mounted large-scale solar (LSS) photovoltaic energy facility locations and their attributes in the United States. The data can be downloaded in multiple formats, is accessible via an online viewer, and will be updated annually.

Over the past decade, LSS development has increased substantially in terms of both electricity generation capacity and the number of new facilities coming online each year. The USPVDB provides the means to analyze historical trends in LSS and to more accurately assess the potential costs and benefits of future development, while also giving anyone curious about the substantial recent build-out of LSS unprecedentedly easy access to the data.

An article published in Nature Scientific Data describes the USPVDB and its development process. The authors will host a webinar covering the USPVDB on November 16 at 1 PM Eastern / 10 AM Pacific. Register for the webinar here: https://lbnl.zoom.us/webinar/register/WN_DABtH_86SNe4TSZPKn_QPg.

The USPVDB provides data of unprecedented quality on the locations and attributes of nearly 3,700 LSS facilities across the United States.

The dataset comprises 3,699 ground-mounted LSS facilities with capacities greater than 1 MWdc in operation across 47 states and Washington, D.C. through the end of 2021 (Figure 1). The database contains geospatial polygons encircling the installed equipment, such as panels and inverters. The polygons were hand-drawn using high-resolution aerial imagery. This is a significant improvement over other datasets, including that of the Energy Information Administration (EIA), which contain only the coordinates of the central point of a large-scale solar facility, and those coordinates are not independently verified for accuracy.
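As a rough illustration of what a footprint polygon adds over a single central point, the sketch below derives both the area and the centroid of a facility outline using the shoelace formula. The vertex coordinates are invented for illustration and are assumed to be in a projected coordinate system measured in meters; they are not taken from the USPVDB, and the function name is hypothetical.

```python
def polygon_area_and_centroid(vertices):
    """Return (area, (cx, cy)) for a simple polygon using the shoelace formula.

    `vertices` is a list of (x, y) tuples in a projected coordinate system
    (meters), listed in order around the boundary without repeating the
    first point at the end.
    """
    twice_area = 0.0
    sum_x = 0.0
    sum_y = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # The signed area cancels out of the centroid, so vertex order
    # (clockwise or counterclockwise) does not matter.
    centroid = (sum_x / (3.0 * twice_area), sum_y / (3.0 * twice_area))
    return abs(twice_area) / 2.0, centroid


# Hypothetical 400 m x 250 m rectangular array footprint.
footprint = [(0.0, 0.0), (400.0, 0.0), (400.0, 250.0), (0.0, 250.0)]
area_m2, center = polygon_area_and_centroid(footprint)
area_ha = area_m2 / 10_000  # 1 hectare = 10,000 square meters
```

A point-only dataset would retain just the centroid; the polygon additionally yields the array area, which is what makes land-use analysis possible.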

The USPVDB includes detailed facility attributes, including the size of the array area, panel technology type, axis type, year of installation, and rated electricity generation capacity. Further, the USPVDB brings together a variety of additional attributes, such as whether the site is or was environmentally contaminated and whether the facility has agrivoltaic features. The team merged attributes from the Environmental Protection Agency’s RE-Powering data and the National Renewable Energy Laboratory’s InSPIRE agrivoltaics data. These sources allow the USPVDB, respectively, to classify facility site types (such as greenfield, or sites with previous, current, or suspected contamination) and to identify agrivoltaic facilities where agricultural or environmental services are provided between panel groups or around arrays.

Figure 1. Locations of USPVDB facilities

The USPVDB was constructed through a careful process of visual verification and quality assurance.

The USPVDB development process was carried out in four primary stages, detailed below and visualized in Figure 2:

  • Stage 1 - Georectifying PV facility coordinates: Starting from facility latitude and longitude included in EIA data, the locations of LSS facilities were visually verified using high-resolution aerial imagery.
  • Stage 2 - Digitizing LSPV array polygons: Using the georectified PV facility coordinates, polygons were drawn manually around the extent of panel arrays and inverters by USGS and LBNL analysts using ArcGIS software.
  • Stage 3 - Quality assurance and quality control (QA/QC), troubleshooting, and in-depth investigations: Various QA/QC processes were employed to ensure the highest achievable level of accuracy. These included having different analysts inspect each other's work, comparing the USPVDB to several other datasets of U.S. LSS, and running statistical tests to find outliers.
  • Stage 4 - Populating facility attributes: Additional data attributes were appended for each facility, drawing on EIA and several other data sources.
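
The announcement does not say which statistical tests were run in Stage 3. As one plausible sketch only, the example below flags facilities whose capacity density (MW per hectare) falls outside Tukey's fences (1.5 times the interquartile range). The record keys `name`, `capacity_mw`, and `area_ha`, the function name, and all values are hypothetical and are not taken from the USPVDB.

```python
import statistics


def flag_density_outliers(records, k=1.5):
    """Return names of records whose capacity density lies outside Tukey's fences.

    Each record is a dict with hypothetical keys "name", "capacity_mw",
    and "area_ha". Density is capacity divided by array area.
    """
    densities = {r["name"]: r["capacity_mw"] / r["area_ha"] for r in records}
    q1, _, q3 = statistics.quantiles(densities.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [name for name, d in densities.items() if d < low or d > high]


# Invented records: facility H reports ten times the capacity of its peers
# for the same array area, which suggests a data-entry or digitizing error.
records = [
    {"name": "A", "capacity_mw": 3.0, "area_ha": 10.0},
    {"name": "B", "capacity_mw": 3.1, "area_ha": 10.0},
    {"name": "C", "capacity_mw": 3.2, "area_ha": 10.0},
    {"name": "D", "capacity_mw": 3.3, "area_ha": 10.0},
    {"name": "E", "capacity_mw": 3.4, "area_ha": 10.0},
    {"name": "F", "capacity_mw": 3.5, "area_ha": 10.0},
    {"name": "G", "capacity_mw": 3.6, "area_ha": 10.0},
    {"name": "H", "capacity_mw": 30.0, "area_ha": 10.0},
]
suspects = flag_density_outliers(records)
```

A flagged record is a candidate for manual review rather than a confirmed error, which fits the role of the in-depth investigations described in Stage 3.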

Figure 2. USPVDB development process

USPVDB datasets are publicly available for use in research, policy analysis, decision-making, and general exploration.

The USPVDB data and its associated viewer may be used by government agencies, scientists, private companies, and other stakeholders for a variety of analyses that were not previously possible. Further, the ease of access to the data will unlock new opportunities for a wide variety of stakeholders to learn about and engage with trends in LSS deployment across the U.S. The USPVDB is available in geospatial (shapefile, GeoJSON) and tabular (CSV) versions, and can be downloaded as an entire dataset or in parts via an application programming interface (API). In addition, an interactive web viewer enables a number of unique opportunities for data exploration and visualization, such as viewing deployment over time and by project size.
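
As a minimal sketch of the kind of exploration the tabular version supports, the example below totals installed capacity by year and accumulates it over time using only the Python standard library. The column names `name`, `year`, and `capacity_mw_dc`, the function names, and the sample rows are placeholders invented for illustration; consult the USPVDB documentation for the actual field names before adapting this.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows standing in for a downloaded CSV file.
SAMPLE_CSV = """name,year,capacity_mw_dc
Facility A,2019,5.0
Facility B,2020,12.5
Facility C,2020,2.5
Facility D,2021,20.0
"""


def capacity_by_year(csv_text):
    """Sum capacity (MWdc) per installation year, returned in year order."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[int(row["year"])] += float(row["capacity_mw_dc"])
    return dict(sorted(totals.items()))


def cumulative(totals):
    """Convert per-year totals into a running cumulative total."""
    running = 0.0
    result = {}
    for year in sorted(totals):
        running += totals[year]
        result[year] = running
    return result


annual = capacity_by_year(SAMPLE_CSV)
installed_to_date = cumulative(annual)
```

To run this against a real download, replace `io.StringIO(csv_text)` with an open file handle and adjust the column names to match the file.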

We thank the U.S. Department of Energy Solar Energy Technologies Office for its support of this work, as well as the numerous individuals and organizations who generously provided data and information and reviewed our work.