
Data methodology

Updated June 11, 2026

How we turn hundreds of inconsistent government permit feeds into one clean, queryable, honestly-labeled dataset — and exactly what we will and won't claim about it.

In short: PermitStack ingests U.S. building permits directly from official government open-data portals every day, normalizes every source into one schema, classifies each permit into ~20 categories, rejects impossible (future-dated) records, and labels every jurisdiction with a data_status so you always know whether its data is current. Today: 49M+ permits, 522 active jurisdictions, 24 historical-archive sources, 50 states.

49M+ permit records
522 active jurisdictions
24 historical-archive sources
50 states

Where the data comes from

Every permit in PermitStack originates from a government authority having jurisdiction — a city or county building department — and is published through that government's own open-data infrastructure. We do not buy data from a reseller and we do not infer permits from imagery. We read each jurisdiction's official feed directly.

U.S. jurisdictions publish on a handful of common platforms, and we built a dedicated connector for each:

A small number of very large statewide datasets — for example California's NEM solar interconnection data (~2.7M records) and New York's NY-Sun program (~550K) — run through dedicated bulk loaders but land in the exact same schema. You can see every jurisdiction, its platform, and its current freshness on the coverage page.

Daily ingestion cadence

We run ingestion every day, beginning at 03:00 UTC. Each run is incremental: for every jurisdiction we ask its source only for permits added or changed since our last successful pull, which keeps the load light on the upstream portal and lets a newly issued permit appear in our API within days of issuance. Statewide bulk feeds refresh on their own schedule (the largest, California solar, monthly), and we re-run any feed in full when a source migrates or changes shape.
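The incremental pull described above can be illustrated with a minimal sketch. It is not the production connector: the `updated_at` field name, the `PullResult` type, and the in-memory filtering are all assumptions made for illustration. A real connector would push the "changed since" filter into the upstream query rather than filtering locally.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass
class PullResult:
    records: list                      # rows added or changed since the last pull
    new_watermark: Optional[datetime]  # persist this only after a successful load


def incremental_pull(rows: Iterable[dict],
                     last_success: Optional[datetime],
                     updated_field: str = "updated_at") -> PullResult:
    """Keep rows modified after last_success and compute the next watermark.

    `rows` stands in for whatever the upstream portal returns (hypothetical).
    A first-ever pull (last_success is None) keeps every row.
    """
    changed = [r for r in rows
               if last_success is None or r[updated_field] > last_success]
    if changed:
        new_watermark = max(r[updated_field] for r in changed)
    else:
        new_watermark = last_success  # nothing new: keep the old watermark
    return PullResult(changed, new_watermark)


rows = [
    {"permit_number": "A-1", "updated_at": datetime(2026, 6, 9, tzinfo=timezone.utc)},
    {"permit_number": "A-2", "updated_at": datetime(2026, 6, 11, tzinfo=timezone.utc)},
]
last_success = datetime(2026, 6, 10, tzinfo=timezone.utc)
result = incremental_pull(rows, last_success)
print([r["permit_number"] for r in result.records])
```

Advancing the watermark only after a successful load means a failed run is simply retried from the same point on the next day.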

This daily cadence is one of our core advantages. For comparison, Shovels.ai — the category leader — refreshes roughly twice monthly (publicly reported, as of June 2026). We update faster and we publish that fact rather than burying it.

Normalization into one schema

Raw permit feeds disagree about almost everything: field names, date formats, status vocabularies, coordinate projections, even what counts as a "permit." Our loader maps each source's columns onto a single canonical record, so a permit from a Socrata city in Texas and an ArcGIS county in California come out identical in our API. Normalization includes:
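As a rough illustration of the column mapping, the sketch below normalizes two differently shaped source rows into one record. The source names, source column names, the `permit_number` field, and the list of date formats are hypothetical; only `date_issued` is a field named on this page. The epoch-millisecond branch reflects how ArcGIS feature services commonly return dates, and is included as an assumption about the input.

```python
from datetime import date, datetime, timezone
from typing import Optional

# Hypothetical per-source maps: canonical field -> source column name.
FIELD_MAPS = {
    "example_socrata_city": {"permit_number": "permit_num",
                             "date_issued": "issue_date"},
    "example_arcgis_county": {"permit_number": "PERMITNO",
                              "date_issued": "ISSUED"},
}

# Illustrative string formats; a real loader would carry many more.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S")


def parse_date(value) -> Optional[date]:
    """Return a date for a string or epoch-millisecond value, else None."""
    if value in (None, ""):
        return None
    if isinstance(value, (int, float)):
        # Treat numbers as epoch milliseconds in UTC (assumption).
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc).date()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(str(value).strip(), fmt).date()
        except ValueError:
            continue
    return None  # unparseable dates become null rather than guessed


def normalize(source: str, raw: dict) -> dict:
    """Map one raw source row onto the canonical field names."""
    mapping = FIELD_MAPS[source]
    record = {canonical: raw.get(column) for canonical, column in mapping.items()}
    record["permit_number"] = str(record["permit_number"]).strip()
    record["date_issued"] = parse_date(record["date_issued"])
    return record


texas = normalize("example_socrata_city",
                  {"permit_num": " 2025-0042 ", "issue_date": "06/01/2025"})
california = normalize("example_arcgis_county",
                       {"PERMITNO": "2025-0042", "ISSUED": 1748736000000})
print(texas == california)
```

Keeping the per-source differences in a declarative map means a new jurisdiction on an already supported platform needs a new map entry, not new parsing code.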

Classification into ~20 categories

Permit "types" are free text and wildly inconsistent — a roof replacement might be filed as REROOF, RE-ROOF, or a sentence in a description field (see reroof vs. re-roof). Our classifier reads each permit's type and description and assigns one of ~20 normalized categories, so you can query "all solar permits" or "all roofing permits" across every jurisdiction with a single filter instead of guessing local spellings.
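To make the idea concrete, here is a minimal rule-based sketch. It is not the PermitStack classifier, whose internals are not described on this page: the regular expressions and the first-match-wins ordering are assumptions, and only the category names are taken from the table of categories.

```python
import re

# Ordered (category, pattern) rules; the first match wins.
# Patterns are illustrative, not the production rule set.
RULES = [
    ("SOLAR", re.compile(r"\b(solar|photovoltaic|pv)\b", re.IGNORECASE)),
    ("EV_CHARGER", re.compile(r"\b(ev charger|evse)\b", re.IGNORECASE)),
    ("ROOFING", re.compile(r"\b(re[-\s]?)?roof(ing)?\b", re.IGNORECASE)),
]


def classify(permit_type: str, description: str = "") -> str:
    """Assign a normalized category from free-text type and description."""
    text = f"{permit_type or ''} {description or ''}"
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return "OTHER"  # fallback bucket when no rule matches


for permit_type, description in [
    ("REROOF", ""),
    ("RE-ROOF", ""),
    ("BLDG-RES", "Tear off and replace roof shingles"),
    ("ELEC", "Install rooftop solar PV system"),
    ("MISC", "Replace mailbox"),
]:
    print(permit_type, "->", classify(permit_type, description))
```

Rule order matters in a scheme like this: a rooftop solar install mentions both a roof and solar, so the more specific rule has to run first.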

Current category counts across the dataset:

Category         | Plain meaning    | Records
OTHER            | Other            | 17,201,635
ELECTRICAL       | Electrical       | 6,308,600
SOLAR            | Solar            | 5,935,469
PLUMBING         | Plumbing         | 4,759,253
INTERIOR_REMODEL | Interior Remodel | 3,215,412
ROOFING          | Roofing          | 2,388,748
HVAC             | HVAC             | 2,317,455
MECHANICAL       | Mechanical       | 2,142,565
RENOVATION       | Renovation       | 1,789,952
NEW_CONSTRUCTION | New Construction | 1,382,646
ADDITION         | Addition         | 1,087,590
SIGN             | Sign             | 673,937
FENCE            | Fence            | 656,612
POOL             | Pool             | 579,318
DEMOLITION       | Demolition       | 566,999
FIRE_ALARM       | Fire Alarm       | 535,441
FOUNDATION       | Foundation       | 252,709
GRADING          | Grading          | 131,545
BATTERY          | Battery          | 45,168
EV_CHARGER       | EV Charger       | 31,109

Each major category also has a glossary entry — for example solar, roofing, HVAC, electrical, and new construction — and several map to a dedicated landing page like solar permit data.

Future-date rejection & data integrity

Upstream feeds are not clean. Several publish placeholder or typo dates — a permit "issued" in the year 2099, or filed next month. We do not pass those through. During ingestion we reject impossible future-dated values on date_filed, date_issued, and date_completed, nulling the bad date and logging it rather than letting a garbage timestamp pollute a roof-age or freshness calculation. A failed record in a batch never takes down the good records around it.
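The rejection step can be sketched as follows. The three field names come from the paragraph above; the function names, the logger name, and the choice of "later than today" as the cutoff are assumptions for illustration, not the production rule.

```python
import logging
from datetime import date
from typing import Optional

logger = logging.getLogger("ingest")  # hypothetical logger name
DATE_FIELDS = ("date_filed", "date_issued", "date_completed")


def reject_future_dates(record: dict, today: Optional[date] = None) -> dict:
    """Return a copy of record with any future-dated field set to None."""
    today = today or date.today()
    cleaned = dict(record)
    for field in DATE_FIELDS:
        value = cleaned.get(field)
        if value is not None and value > today:
            logger.warning("nulling future %s=%s", field, value)
            cleaned[field] = None
    return cleaned


def clean_batch(records: list, today: Optional[date] = None):
    """Clean each record independently so one bad row cannot sink the batch."""
    good, failed = [], []
    for record in records:
        try:
            good.append(reject_future_dates(record, today))
        except Exception as exc:  # e.g. a non-date value in a date field
            logger.error("skipping record: %s", exc)
            failed.append(record)
    return good, failed


today = date(2026, 6, 11)
batch = [
    {"id": 1, "date_issued": date(2026, 6, 1)},
    {"id": 2, "date_filed": date(2026, 5, 1), "date_issued": date(2099, 1, 1)},
    {"id": 3, "date_filed": "next month"},  # malformed: not a date object
]
good, failed = clean_batch(batch, today)
print(len(good), len(failed))
```

Only the offending date is nulled; the rest of the record, including its other valid dates, is kept.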

Honest labeling: the data_status flag

This is the part we care most about. Coverage counts are easy to inflate by leaving dead cities in the total. We don't. Every jurisdiction carries a data_status flag with one of three values:

This flag is exposed in API responses and rendered on the coverage page next to every jurisdiction, with a freshness indicator showing how recently each source published. The discipline is simple: a "no permit found" should never silently mean "we quietly stopped getting data here." We'd rather tell you a jurisdiction is archived than let you draw a wrong conclusion from a gap.
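On the consuming side, the same discipline might look like the sketch below. It assumes only that a jurisdiction's data_status arrives as a string; the literal values "active" and "archived" are placeholders chosen for illustration, not documented API values, so the caller passes in whichever statuses it treats as current.

```python
def interpret_empty_result(data_status: str, current_statuses: frozenset) -> str:
    """Explain what an empty permit search means for a given jurisdiction.

    current_statuses holds the data_status values the caller treats as
    up to date (placeholder values here; check the API docs for real ones).
    """
    if data_status in current_statuses:
        return "no permit on record; jurisdiction data is current"
    return f"inconclusive; jurisdiction data_status is {data_status!r}"


CURRENT = frozenset({"active"})  # placeholder value, an assumption
print(interpret_empty_result("active", CURRENT))
print(interpret_empty_result("archived", CURRENT))
```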

What we will and won't claim

We report live counts, not round marketing numbers: the figures on this page are read from the production database when the page is generated. Permit data is a near-complete but not total record of construction activity, because some work is unpermitted and coverage varies by jurisdiction. We say so, we label it, and we point you to the coverage page so you can verify before you rely on a city. For the vocabulary behind any field referenced here, see the building permit glossary.

Build on honestly-labeled permit data

49M+ U.S. building permits, refreshed daily, with transparent per-jurisdiction data status. Free tier, no credit card.
