Aedifice Research · Methodology

How I produce these reports.

First-person notes on data, joins, embodied-carbon calibration, field anchoring, review, and reproducibility — from a practitioner who does the analysis himself.

How this work happens

I do the research. I pull the data from the authoritative public sources — NYC Open Data, MapPLUTO, DSNY, BIC, DOB, LL84, LPC — and load it into a Supabase mirror so the rows are queryable. I write the SQL, I write the Python pipeline, I write the charts, and I write the prose. Every headline number in every chapter ties back to a named resource ID and a transform I can re-run.

When the data falls short I say so. When I reach for a factor — an embodied-carbon intensity, a diversion rate, a jobs multiplier — I cite the source, and when the source publishes a range rather than a point value I write the range. When a claim needs to be anchored in practice, I anchor it in the practice I have seen, most recently the 2025 facade restoration of the Woolworth Building, and disclose the anchor. That is the whole method. The sections below are the expanded form.

01
Data ingestion
I work from authoritative public sources: the NYC Open Data resources for MapPLUTO, DOB filings, DSNY tonnage, BIC haulers, LL84 benchmarking, and LL97 emissions data; the Landmarks Preservation Commission designation feed; the DCAS property feed; and the certificate-of-occupancy feed. Each resource is pulled at the time of publication, saved with its resource ID and the pull date, and mirrored into Supabase so the row counts quoted in the text match the exact version of the source the pipeline read.
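As an illustration of the pull-and-stamp step, here is a minimal sketch in Python. It is not the pipeline's actual code: the `PullRecord` fields, the `record_pull` helper, the `xxxx-xxxx` resource ID, and the toy CSV payload are all hypothetical, and the SHA-256 digest is an added assumption about how one might detect a silently revised source.

```python
import csv
import hashlib
import io
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class PullRecord:
    """One entry in a hypothetical pull manifest."""
    resource_id: str  # dataset identifier as published by the portal
    pulled_on: str    # ISO date of the pull
    row_count: int    # data rows read, header excluded
    sha256: str       # digest of the raw bytes, to detect silent revisions


def record_pull(resource_id: str, raw_csv: bytes, pulled_on: date) -> PullRecord:
    """Stamp a raw CSV export with what is needed to cite that exact version."""
    rows = list(csv.reader(io.StringIO(raw_csv.decode("utf-8"))))
    return PullRecord(
        resource_id=resource_id,
        pulled_on=pulled_on.isoformat(),
        row_count=max(len(rows) - 1, 0),
        sha256=hashlib.sha256(raw_csv).hexdigest(),
    )


# Placeholder ID and toy payload; a real run would read the portal's export.
sample = b"bbl,yearbuilt\n1000010010,1913\n1000010011,1920\n"
rec = record_pull("xxxx-xxxx", sample, date(2025, 1, 15))
print(json.dumps(asdict(rec), indent=2))
```

Storing the row count next to the resource ID and pull date is what lets a quoted count be checked against the mirrored table later.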
Beyond New York I use the Inventory of Carbon and Energy (ICE) database, the Carbon Leadership Forum's WBLCA v2 benchmarks, the IEA's AI for Climate and Energy series, the BLS Occupational Employment and Wage Statistics (OEWS, formerly Occupational Employment Statistics), the EU EPBD recast, and the Danish Bygningsreglement (building regulations). Each is named wherever one of its numbers is used.
02
Joins and identifiers
In New York the canonical identifiers are BBL (Borough-Block-Lot), BIN (Building Identification Number), DOB job number, and ACRIS document ID. Most of the city's building feeds are keyed on BBL; a minority publish BIN; a small number publish both. PLUTO carries BBL but no BIN, which is why building-level joins against DOB BIN feeds require a BBL round-trip through a table that publishes both (for example DOB Safety Violations).
When identifiers are missing or malformed I quantify the gap in the chapter's narrative rather than suppress it. The same discipline generalises to other jurisdictions; each jurisdiction's identifiers are documented when it enters the research agenda.
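The BBL round-trip and the gap count can be sketched as follows. This is an illustration under stated assumptions, not the production query: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, the table names (`pluto`, `crosswalk`, `dob_feed`) and the `make_bbl` helper are hypothetical, and every identifier in the toy rows is made up. The one fixed fact it relies on is the BBL layout: one borough digit, five block digits, four lot digits.

```python
import sqlite3


def make_bbl(borough: int, block: int, lot: int) -> str:
    """Compose the 10-character BBL: 1-digit borough, 5-digit block, 4-digit lot."""
    return f"{borough}{block:05d}{lot:04d}"


con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE pluto (bbl TEXT PRIMARY KEY, yearbuilt INTEGER);
    CREATE TABLE crosswalk (bbl TEXT, bin TEXT);      -- from a feed with both keys
    CREATE TABLE dob_feed (bin TEXT, job_number TEXT);
    """
)
# All identifiers below are made up for the example.
con.executemany("INSERT INTO pluto VALUES (?, ?)",
                [(make_bbl(1, 123, 22), 1913), (make_bbl(3, 45, 1), 1931)])
con.executemany("INSERT INTO crosswalk VALUES (?, ?)",
                [(make_bbl(1, 123, 22), "1001394"), (make_bbl(3, 45, 1), "3000777")])
con.executemany("INSERT INTO dob_feed VALUES (?, ?)",
                [("1001394", "J1"), ("3000777", "J2"), ("4000999", "J3")])

# LEFT JOINs keep DOB rows whose BIN has no crosswalk entry, so the gap is visible.
# DISTINCT guards against a crosswalk source that repeats the same (bbl, bin) pair.
rows = con.execute(
    """
    SELECT d.job_number, d.bin, p.bbl, p.yearbuilt
    FROM dob_feed AS d
    LEFT JOIN (SELECT DISTINCT bbl, bin FROM crosswalk) AS x ON x.bin = d.bin
    LEFT JOIN pluto AS p ON p.bbl = x.bbl
    ORDER BY d.job_number
    """
).fetchall()

matched = sum(1 for r in rows if r[2] is not None)
unmatched_share = 1 - matched / len(rows)
print(f"{matched} of {len(rows)} DOB rows matched; {unmatched_share:.1%} unmatched")
```

An inner join would silently drop the unmatched row; the left join is what makes the unmatched share a number that can be reported.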
03
Analysis stack
Once the data is in Supabase I analyse it in SQL (PostgreSQL) and Python (pandas, scikit-learn, and lifelines for survival analysis). Charts are produced in matplotlib with Helvetica Neue registered as the font and svg.fonttype = "path", which converts text to outlines so glyphs render the same in every browser. SVGs are generated per chapter and versioned next to the pipeline.
Each headline figure is written back to a headline-numbers.json per chapter, which the TSX pages import rather than hard-code — so the site and the pipeline cannot drift.
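A minimal sketch of the write-back, assuming a flat JSON object keyed by headline name. The file name `headline-numbers.json` comes from the text above; the `write_headline_numbers` helper, the key names, and the provenance fields inside each entry are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_headline_numbers(chapter_dir: Path, numbers: dict) -> Path:
    """Write one chapter's headline figures to headline-numbers.json."""
    out = chapter_dir / "headline-numbers.json"
    # sort_keys keeps diffs stable between pipeline runs.
    out.write_text(json.dumps(numbers, indent=2, sort_keys=True) + "\n",
                   encoding="utf-8")
    return out


# Hypothetical entry: the value plus the provenance needed to cite it.
numbers = {
    "lots_analysed": {
        "value": 3,
        "resource_id": "xxxx-xxxx",
        "pulled_on": "2025-01-15",
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = write_headline_numbers(Path(tmp), numbers)
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(loaded["lots_analysed"]["value"])
```

Because the page reads the same file the pipeline wrote, a changed number shows up as a diff in one place rather than as a mismatch between text and chart.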
04
Embodied carbon
Embodied-carbon estimates combine three inputs: (1) gross floor area from the jurisdiction's authoritative cadastre, (2) era-specific construction typology derived from year built and building class, and (3) per-typology carbon-intensity factors calibrated to the Carbon Leadership Forum's baseline studies, the Inventory of Carbon and Energy (ICE) database, the RICS Whole Life Carbon Assessment standard, and peer-reviewed material-life research.
Where regional Environmental Product Declaration (EPD) data exists, I localise the intensity factor and document the adjustment inline.
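The three-input combination can be illustrated with a short sketch. Everything numeric here is a placeholder: the factor ranges are invented for the example and are not values from ICE, CLF, or RICS, and the `typology` rule (a 1940 cut-off, then a building-class prefix) is a hypothetical stand-in for the real classification. The only fixed constant is the square-foot to square-metre conversion, included on the assumption that the cadastre reports floor area in square feet.

```python
from dataclasses import dataclass

SQFT_TO_M2 = 0.09290304  # exact: (0.3048 m) squared


@dataclass(frozen=True)
class FactorRange:
    low: float   # kgCO2e per m2 of gross floor area
    high: float


# PLACEHOLDER ranges for illustration only; not values taken from ICE, CLF or RICS.
FACTORS = {
    "prewar_masonry": FactorRange(300.0, 500.0),
    "postwar_concrete": FactorRange(400.0, 700.0),
    "office_steel": FactorRange(500.0, 900.0),
}


def typology(year_built: int, building_class: str) -> str:
    """Hypothetical rule: era first, then building class for later buildings."""
    if year_built < 1940:
        return "prewar_masonry"
    if building_class.upper().startswith("O"):  # toy rule: treat 'O' classes as offices
        return "office_steel"
    return "postwar_concrete"


def estimate_tco2e(gfa_sqft: float, year_built: int,
                   building_class: str) -> tuple[float, float]:
    """Return (low, high) embodied carbon in tonnes CO2e for one building."""
    factor = FACTORS[typology(year_built, building_class)]
    gfa_m2 = gfa_sqft * SQFT_TO_M2
    return gfa_m2 * factor.low / 1000.0, gfa_m2 * factor.high / 1000.0


low, high = estimate_tco2e(100_000, 1913, "O4")
print(f"{low:,.0f} to {high:,.0f} tCO2e")
```

Returning a (low, high) pair rather than a single figure keeps the source's published range intact through to the estimate.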
05
Field anchoring
I calibrate abstract numbers against the buildings I have been in. The 2025 facade restoration of the Woolworth Building — a project I participated in and have written about publicly — is the nearest field anchor for anything facade- or preservation-adjacent. When a chapter uses numeric values that derive from field observation (unit material quantities, take-off rates, crew compositions) the anchor is named in the chapter body or the chapter's technical section.
06
Review
Before publication every report is read by at least one external professional — a facade engineer, preservation architect, materials scientist, policy analyst, or carbon accountant, chosen for the report's subject — whose role is to challenge assumptions and flag overreach. Reviewer names and affiliations appear in each report.
07
Uncertainty
Every headline number is published with a confidence band. Known sources of error — identifier-join failures, typology misclassification, intensity-factor regional mismatch, sentinel-year data-quality bias in the cadastre — are enumerated in the chapter's technical section and propagated through the final estimate. When a source of error cannot be quantified, I say so.
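One common way to propagate several error sources into a single band is Monte Carlo simulation; the sketch below shows the idea and is not a description of the method used in the reports. The error magnitudes (a 5% floor-area error, a 10% typology misclassification rate) and both intensity ranges are assumptions chosen for the example, as is the 9,000 m2 floor area.

```python
import random
import statistics


def simulate_band(n_draws: int = 10_000, seed: int = 42) -> tuple[float, float, float]:
    """Return the (5th, 50th, 95th) percentile of a simulated estimate in tCO2e."""
    rng = random.Random(seed)  # fixed seed so the band is reproducible
    draws = []
    for _ in range(n_draws):
        # Assumed +/-5% error on the cadastre's floor area.
        gfa_m2 = 9_000.0 * rng.uniform(0.95, 1.05)
        # Assumed 10% chance the typology is wrong, which swaps the factor range.
        lo, hi = (400.0, 700.0) if rng.random() < 0.10 else (300.0, 500.0)
        # Intensity drawn uniformly from the (placeholder) range, in kgCO2e per m2.
        factor = rng.uniform(lo, hi)
        draws.append(gfa_m2 * factor / 1000.0)
    cuts = statistics.quantiles(draws, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[9], cuts[18]


p5, p50, p95 = simulate_band()
print(f"{p50:,.0f} tCO2e (90% band {p5:,.0f} to {p95:,.0f})")
```

An error source that cannot be given a distribution has no place in the loop, which is the case the text handles by stating it in words instead.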
08
Open code
The analysis code for every published report is linked from the report's technical section under an open license. Re-running the code against the current source data must reproduce the report's numbers within the disclosed uncertainty. When it does not, a correction follows.
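The reproduce-or-correct rule can be expressed as a simple check. This is a sketch under assumptions: the `needs_correction` helper, the headline names, and the low/high band layout are hypothetical, and the figures are invented.

```python
def needs_correction(published: dict, rerun: dict) -> list[str]:
    """List headline numbers whose rerun value falls outside the published band."""
    failures = []
    for name, band in published.items():
        value = rerun.get(name)
        # A missing rerun value counts as a failure to reproduce.
        if value is None or not (band["low"] <= value <= band["high"]):
            failures.append(name)
    return sorted(failures)


# Hypothetical headline names and values.
published = {
    "embodied_carbon_tco2e": {"low": 2_700.0, "high": 4_700.0},
    "demolition_tonnage": {"low": 120.0, "high": 150.0},
}
rerun = {"embodied_carbon_tco2e": 3_650.0, "demolition_tonnage": 163.0}

flagged = needs_correction(published, rerun)
print(flagged)
```

Run against the values above, the check flags only the tonnage figure, since 163 lies outside the 120 to 150 band.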
09
Reproducibility
My explicit standard: any careful reader with intermediate data skills must be able to reproduce the headline numbers in one afternoon, from the primary sources, on a laptop.
10
Limitations
What this method does not capture: operational-emissions dynamics beyond annual benchmarking, informal and off-permit material flows, buildings whose filings fall outside DOB jurisdiction, and structural-typology misclassification beyond the reported rate. I state these openly in every report rather than let them quietly distort the headline.
