Risks and Governance

The honest chapter. Where Building Intelligence breaks, where it creates new risk, and the oversight framework that would let the automation be trusted.

AI's value in the circular-economy transition is real but contingent. The tools exist; the data is beginning to exist; the governance is the bottleneck. That is where the next decade's most leveraged policy work lives.

Abstract

The preceding four chapters argued that AI is a practical lever on New York's circular-economy transition: the inference stack is adequate, the public datasets are ML-ready, the global standards to make the stack portable are in place. This chapter is the balance sheet. Every claim in the report depends on AI systems deployed into a regulatory environment where the cost of a wrong answer is measured in hundreds of thousands of dollars per building, where the training data is systematically skewed toward over-represented typologies, and where the incumbent software vendors have every incentive to lock the resulting market.

We enumerate six risk categories that a responsible Building Intelligence practice has to treat: hallucinated code compliance, training-data bias, the energy cost of inference itself, data ownership ambiguity, regulatory capture by incumbents, and false confidence in categories the underlying data does not cover. Each risk has a corresponding governance answer that is mostly clerical work — model cards, disclosure requirements, audit processes, procurement templates, open-standard preferences, data-rights declarations. We close with a four-element framework that ties the six answers together and relates it to existing AI-governance literature (NIST AI RMF, EU AI Act, EO 14110, Anthropic's Responsible Scaling Policy). The report ends on the working claim of Chapter 1: the binding constraint is measurement. Governance is how the measurement gets built without becoming a new form of capture.

1. Six risks, six governance answers

Risk 1 — Hallucinated code compliance

Large language models generate plausible-sounding but wrong building-code interpretations. The failure mode is specific: the model asserts, in confident prose, that an envelope retrofit on a pre-1938 Class A-2 building does or does not require a variance; the assertion is wrong; the owner proceeds; the Department of Buildings issues a stop-work order; the rework cost lands in the mid-six figures. Stanford HAI's LegalBench and HELM evaluations (Stanford HAI, 2023–2024) find that frontier models produce legally incorrect outputs at rates between twelve and thirty percent on specialist-code tasks, depending on jurisdiction and domain, which is well above the tolerance for a compliance verdict. Pilot reports from AEC firms deploying LLMs against ICC and NYC construction code corroborate the pattern (practitioner interviews, 2024–2025).

Governance answer. No model output should clear as a compliance verdict without a licensed professional in the loop. Every AI compliance tool sold into the building market should publish a confidence threshold below which its output is surfaced as a recommendation, not a decision, and a measured accuracy rate against a held-out code-interpretation benchmark. The rule is the one NIST AI RMF already proposes (NIST, 2023): the risk class of the output determines the required human oversight.
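The gating rule can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical tool that attaches a confidence score to each output; the names `ModelOutput` and `route_output` and the 0.90 threshold are assumptions made for the example, not taken from any vendor API or from NIST AI RMF.

```python
from dataclasses import dataclass

# Hypothetical published threshold; a real tool would derive it from its
# measured accuracy on a held-out code-interpretation benchmark.
CONFIDENCE_THRESHOLD = 0.90


@dataclass(frozen=True)
class ModelOutput:
    question: str
    answer: str
    confidence: float  # model-reported score in [0, 1]


def route_output(output: ModelOutput, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return the review lane for a model output.

    Neither lane is an automatic compliance verdict: both end with a
    licensed professional, in line with the human-in-the-loop rule.
    """
    if not 0.0 <= output.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if output.confidence < threshold:
        return "recommendation"  # surfaced as advisory only
    return "pending_professional_signoff"  # candidate verdict, still reviewed
```

The design point is that the threshold changes how an output is labeled, not whether a professional sees it.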

Risk 2 — Training-data bias toward over-represented typologies

The LL84 training set is not representative of the city it will be used to govern. Three numbers, verified directly against the live Supabase mirror of NYC Open Data in April 2026, frame the problem. First, LL84 reaches 27,922 distinct BBLs in 2024 against 858,644 PLUTO parcels — 3.3 percent of the registry. Second, of the 28,173 BBLs the city's own sustainability-CBL roster flags as required to report, only 17,410 actually did so in 2023 — a 61.8 percent compliance rate, not the ninety-plus figure that advocacy communications routinely cite. Third, the reporting pool itself is skewed by construction era: LL84 2023 rows split 4.17 to 1 pre-1991 versus post-1991, so any model trained on this data will over-fit twentieth-century construction and systematically mispredict the thermal behavior of post-1991 buildings, which happen to be the cohort where heat-pump economics and envelope-continuity assumptions diverge most from the older stock.
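The arithmetic behind these three figures can be reproduced from the counts quoted above. The sketch uses only the numbers in this section; the variable names are illustrative.

```python
# Counts quoted in the text (April 2026 check against the NYC Open Data mirror).
ll84_bbls_2024 = 27_922
pluto_parcels = 858_644
required_bbls = 28_173
reported_bbls_2023 = 17_410
pre_to_post_1991 = 4.17  # pre-1991 rows per post-1991 row

coverage = ll84_bbls_2024 / pluto_parcels
compliance = reported_bbls_2023 / required_bbls
# A ratio of r:1 means a share of r / (r + 1).
pre_1991_share = pre_to_post_1991 / (pre_to_post_1991 + 1)

print(f"LL84 coverage of PLUTO: {coverage:.1%}")           # 3.3%
print(f"2023 reporting compliance: {compliance:.1%}")      # 61.8%
print(f"Pre-1991 share of LL84 rows: {pre_1991_share:.1%}")  # 80.7%
```

Put as a share, the 4.17 to 1 split means roughly four in five training rows describe pre-1991 buildings.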

The AI Now Institute's sectoral-bias work (AI Now, 2022–2024) documents the same pattern in other public-data settings: when a regulated subset is used to train a tool applied to the broader population, the unregulated population carries the error. The building-sector version is cleaner because the distribution is numerically traceable to a single roster — and because the gap is large enough that a disaggregated-validation requirement cannot be hand-waved.

Governance answer. Disaggregated validation is the standard response. Any AI tool sold into NYC building compliance should publish accuracy by PLUTO building-class bucket, by construction era, and by occupancy type — not a single aggregate score. Third-party audit of the training-data distribution, on the model-card pattern Mitchell et al. (2019) established, should be a procurement requirement for public-sector purchases.
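A small sketch shows why a single aggregate score is not enough. The validation rows below are made up for illustration, and `accuracy_by_bucket` is a hypothetical helper, not part of any existing tool.

```python
from collections import defaultdict


def accuracy_by_bucket(records, key):
    """Group (predicted, actual) pairs by `key` and return accuracy per group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        bucket = rec[key]
        totals[bucket] += 1
        hits[bucket] += int(rec["predicted"] == rec["actual"])
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Made-up validation rows, skewed toward pre-1991 buildings like the LL84 pool.
records = [
    {"era": "pre-1991", "predicted": "pass", "actual": "pass"},
    {"era": "pre-1991", "predicted": "pass", "actual": "pass"},
    {"era": "pre-1991", "predicted": "fail", "actual": "fail"},
    {"era": "pre-1991", "predicted": "pass", "actual": "pass"},
    {"era": "post-1991", "predicted": "pass", "actual": "fail"},
    {"era": "post-1991", "predicted": "fail", "actual": "fail"},
]

overall = sum(r["predicted"] == r["actual"] for r in records) / len(records)
by_era = accuracy_by_bucket(records, "era")

print(f"overall: {overall:.2f}")  # 0.83
print(by_era)                     # {'pre-1991': 1.0, 'post-1991': 0.5}
```

In this toy set the aggregate looks respectable while the under-represented era is right only half the time, which is the pattern the disaggregation requirement is meant to expose. The same helper works for building class or occupancy type by changing `key`.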

Risk 3 — Inference energy vs. abatement delivered

Training and inference have real carbon footprints. Patterson et al. (2021) estimated training-run emissions for frontier models ranging from tens to hundreds of tonnes of CO₂-equivalent per run (arXiv:2104.10350); Google's 2024 sustainability disclosure reported Gemini-class training and serving energy at the megawatt-hour scale (Google, 2024). The MIT Energy Initiative's 2023–2024 data-center work finds that inference, not training, now dominates lifetime energy for deployed models (MITEI, 2023).

The building-sector case is quantitative. A retrofit-recommendation model whose lifetime training plus inference footprint is on the order of five megawatt-hours, deployed against a portfolio whose retrofit recommendations avoid five hundred megawatt-hours, is a net winner by two orders of magnitude. A frontier-model chat agent that consumes a hundred kilowatt-hours per advisory query and whose typical recommendation is a ten-dollar sensor install is a net loser. The difference is not detectable without disclosure.
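The comparison reduces to one ratio: energy avoided per unit of energy the tool consumes. The sketch below uses the figures from the paragraph above for the first case. For the second case the text gives only the per-query cost, so the 20 kWh of savings attributed to the sensor install is an assumed placeholder, not a measured value.

```python
def net_energy_ratio(tool_kwh: float, avoided_kwh: float) -> float:
    """Energy avoided per unit of energy the tool consumes (above 1 is a net win)."""
    if tool_kwh <= 0:
        raise ValueError("tool energy must be positive")
    return avoided_kwh / tool_kwh


# Case 1, from the text: 5 MWh lifetime footprint, 500 MWh avoided.
retrofit_ratio = net_energy_ratio(5_000, 500_000)

# Case 2: 100 kWh per query is from the text; the 20 kWh of avoided
# energy is an ASSUMED placeholder for the sensor install's effect.
chat_ratio = net_energy_ratio(100, 20)

print(retrofit_ratio)  # 100.0
print(chat_ratio)      # 0.2
```

Neither input to the ratio is observable from outside the vendor, which is why the disclosure has to supply both.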

Governance answer. Any AI tool sold into the building-compliance market should publish its training energy budget, its measured per-query inference energy, and the abatement claim it supports. The disclosure is the same format Google, Meta, and Microsoft already publish for their internal stacks (Google, 2024); requiring it of vendors is a small regulatory step.

Risk 4 — Data ownership ambiguity

Who owns the energy-use data in ll84_monthly_energy? The building owner submits it; the utility generates it; the tenant pays for most of it; the city publishes it. The same ownership question applies to facade imagery captured under FISP, boiler telemetry piped through manufacturer clouds, and occupancy sensors installed by property managers. The current regime answers the question inconsistently: LL84's disclosure rules treat the data as publishable in aggregate; GDPR (EU Regulation 2016/679) treats tenant-resolvable energy traces as personal data; FTC guidance on connected-device data sits somewhere between the two (FTC, 2023).

Governance answer. A building-level data-rights declaration, adapted from the HIPAA pattern for physical plant, would resolve the ambiguity with one legal artifact per building: what is collected, by whom, for what purpose, with what onward-sharing permissions, for how long. ULI's 2023 work on data-sharing in the built environment sketches the outline; the policy work is to make it binding.
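As a data structure, the declaration is small. The following is a sketch of one possible shape, with one field per question in the paragraph above; the class names, field names, and example values are all hypothetical and do not reflect the ULI outline or any existing legal form.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class DataStream:
    what: str                        # what is collected
    collected_by: str                # by whom
    purpose: str                     # for what purpose
    onward_sharing: tuple[str, ...]  # who may receive it downstream
    retain_until: date               # for how long


@dataclass
class DataRightsDeclaration:
    building_id: str  # e.g. a BBL; the value below is a placeholder
    streams: list[DataStream] = field(default_factory=list)

    def may_share(self, what: str, recipient: str, on: date) -> bool:
        """True only if a declared stream names the recipient and is within retention."""
        return any(
            s.what == what and recipient in s.onward_sharing and on <= s.retain_until
            for s in self.streams
        )


declaration = DataRightsDeclaration(
    building_id="EXAMPLE-BUILDING",
    streams=[
        DataStream(
            what="monthly_energy_use",
            collected_by="utility",
            purpose="LL84 benchmarking",
            onward_sharing=("city_agency",),
            retain_until=date(2030, 12, 31),
        )
    ],
)

print(declaration.may_share("monthly_energy_use", "city_agency", date(2026, 1, 1)))
print(declaration.may_share("monthly_energy_use", "third_party_vendor", date(2026, 1, 1)))
```

The default is deny: a stream or recipient that the declaration does not name cannot be shared.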

Risk 5 — Regulatory capture of AI tools by incumbents

Vendor-locked compliance platforms are the obvious capture vector. A city that mandates LL97 compliance filings through a single-vendor digital-twin provider has redirected a public-compliance flow into private rent, and the rent is paid by every covered building for the life of the regulation. The pattern to avoid is the one BIM followed: an open standard (IFC) existed, was starved of public procurement, and the market consolidated around a closed-file format whose rent has been extracted from the AEC sector for twenty-five years (Stigler Center / ProMarket coverage of AEC platform concentration, 2022–2024).

Governance answer. Public procurement of Building Intelligence tools should prefer open standards — IFC and BCF from buildingSMART International, ISO 19650 for information management, and open-API implementations over closed ones. The EU's EPBD recast (Directive (EU) 2024/1275) already points in this direction by requiring digital building logbooks in open, machine-readable form. New York's version of that requirement is the lever.

Risk 6 — False confidence in under-measured categories

The AI trained on LL84 will happily produce a monthly tenant-energy prediction, a commercial-waste-tonnage estimate, or a building-level embodied-carbon figure. None of these quantities is measured at building resolution in New York today (see Chapter 1). The model output is plausible; the underlying data does not support it. Gal and Ghahramani (2016, arXiv:1506.02142) showed that a network's reported confidence is not a measure of model uncertainty: a model can be highly confident on inputs far from anything in its training data. In the usual terms, the gap is epistemic uncertainty, which comes from missing or unrepresentative data and shrinks as coverage improves, as distinct from aleatoric uncertainty, the irreducible noise in the quantity itself. The two are routinely conflated at the output layer. The MIT-IBM Watson AI Lab's work on confidence calibration (2023) makes the same point in applied form.

Governance answer. Structured uncertainty disclosure at the output layer — a required field alongside every building-level prediction that reports the training-data coverage, the measured calibration error, and the epistemic status of the estimate. The AI has to report its own ignorance. The convention already exists in weather forecasting (ensemble spread, calibration diagrams) and in clinical ML (prediction intervals). Porting it to building-sector inference is design work, not research work.
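One way to picture the required field is as a record attached to every prediction, with the calibration figure computed from a held-out set. The sketch below assumes binned expected calibration error (ECE) as the calibration metric, which is one common choice rather than something the report prescribes; the record's field names and status labels are hypothetical.

```python
from dataclasses import dataclass


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


@dataclass(frozen=True)
class DisclosedPrediction:
    value: float
    unit: str
    training_coverage: float  # share of comparable buildings in the training data
    calibration_error: float  # e.g. ECE on a held-out set
    epistemic_status: str     # hypothetical labels: "measured", "modeled", "extrapolated"


# Toy held-out results: the model says 0.9 four times and is right three times.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])

prediction = DisclosedPrediction(
    value=1250.0,  # illustrative number only
    unit="kWh/month",
    training_coverage=0.0,  # no building-level tenant data exists to train on
    calibration_error=ece,
    epistemic_status="extrapolated",
)
```

With this record, a tenant-energy estimate still appears, but it arrives labeled as an extrapolation with zero training coverage rather than as a bare number.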

**Figure 5.1.** The six-risk governance matrix, mapping six risk categories in Building Intelligence to their corresponding governance answers. Each row pairs a failure mode in AI-for-buildings with the oversight mechanism that addresses it. The answers cluster into four categories (disclosure, audit, open standards, ownership), which we develop into a framework in Section 2.

#	Risk	Failure mode	Exposure	Governance answer
1	Hallucinated code compliance	Plausible-sounding but wrong code interpretations	$500k+ rework per wrong verdict on a single building	Human-in-loop review; published confidence thresholds
2	Training-data bias	LL84 training set is 4.17:1 pre-1991 vs post-1991; only 61.8% of required BBLs actually report; 39% of reporter BBLs don't resolve to PLUTO at all	Models systematically mispredict post-1991 thermal behavior and under-serve the roughly 97% of the registry outside LL84	Disaggregated validation by class and era; third-party audit of training-data distribution
3	Inference energy vs. abatement delivered	AI tool consumes more energy than the retrofit it recommends saves	Net-negative carbon from the very tools sold as a climate solution	Published training + inference energy budgets per tool sold into compliance markets
4	Data ownership ambiguity	Unclear rights over energy, facade imagery, boiler telemetry, occupancy sensors	Tenant privacy violations; vendor re-sale of city-regulated data	Building-level data-rights declaration, modeled on HIPAA
5	Regulatory capture by incumbents	Closed compliance checkers and single-vendor digital twins redirect public work into private rent	The BIM-captured-by-Autodesk pattern, repeated for Building Intelligence	Public procurement preference for open standards (IFC, BCF, ISO 19650)
6	False confidence in under-measured categories	AI predicts tenant energy, commercial waste, embodied carbon with no underlying data	Policy built on model outputs whose epistemic status is unreported	Structured uncertainty disclosure at the output layer

2. A governance framework in four elements

The six governance answers above cluster. Four categories cover them all, and each has an analogue in the existing AI-governance literature — which means the building-sector framework is not a clean-sheet design but an adaptation of frameworks that already have regulatory precedent.

Element 1 — Disclosure

Every AI tool sold into the building-compliance market publishes a model card (Mitchell et al., 2019) covering training data, evaluation benchmarks, known limitations, and the training plus measured inference energy budget. The disclosure obligation maps directly onto NIST AI RMF 1.0's “Map” and “Measure” functions (NIST, 2023) and onto Anthropic's Responsible Scaling Policy disclosure commitments (Anthropic, 2024 revision). In building procurement terms, the model card becomes a line item in the RFP response.

Element 2 — Audit

City and state agencies, or independent auditors under contract, periodically sample AI compliance verdicts and compare them to human ground truth. Published accuracy rates, disaggregated by building class, become part of the procurement record. The audit mechanism is the one the EU AI Act codifies for “high-risk” systems (Regulation (EU) 2024/1689): ex-ante conformity assessment plus ex-post monitoring. Building-compliance AI, where a wrong verdict carries six-figure consequences, is a strong candidate for equivalent treatment, by analogy with the Act's Annex III categories.
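The sampling step can be sketched as follows. A fixed seed makes the draw reproducible for the procurement record, and a Wilson score interval reports how much the sample size limits what the audit can claim. Both choices are assumptions made for illustration; the function names are hypothetical and no agency protocol is implied.

```python
import math
import random


def sample_for_audit(verdict_ids, k, seed):
    """Draw k verdicts without replacement; the seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(list(verdict_ids), k)


def wilson_interval(agreements, n, z=1.96):
    """Wilson score interval for the agreement rate (z=1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= agreements <= n:
        raise ValueError("need 0 <= agreements <= n and n > 0")
    p = agreements / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Illustrative audit: 100 of 5,000 verdicts sampled; reviewers agree with 90.
audit_sample = sample_for_audit(range(5_000), k=100, seed=2026)
low, high = wilson_interval(agreements=90, n=100)
print(f"agreement rate 0.90, interval [{low:.3f}, {high:.3f}]")
```

Running the same interval per building class would give the disaggregated accuracy rates the audit is meant to publish, with wider intervals flagging classes where too few verdicts were sampled.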

Element 3 — Open standards

Public procurement strongly prefers open formats (IFC, BCF, buildingSMART's data-dictionary standards, ISO 19650 for information management) over proprietary ones. The principle is the one White House Executive Order 14110 (2023) applied to federal AI procurement before the order was rescinded in January 2025: interoperability is a security property, not just a convenience. In the building sector, it is also the anti-capture mechanism. A city whose LL97 filings can only be read by a single vendor's software has, practically, outsourced the legal status of the regulation to that vendor.

Element 4 — Ownership

A building-level data-rights declaration, adapted from the HIPAA pattern for physical plant, resolves the ambiguity Risk 4 documents. It is one legal artifact per building, revocable by the owner, naming every data stream, every processor, and every downstream use. The ULI data-sharing framework (ULI, 2023) sketches the format; the governance work is to make it the default.

These four elements are mutually reinforcing. Disclosure without audit is marketing. Audit without open standards produces audits against proprietary benchmarks. Open standards without ownership leave the data's legal status unresolved. Ownership without disclosure makes it impossible for an owner to know what they are consenting to. The framework works as a system or it does not work at all.

Implications

1. Governance decides whether AI becomes infrastructure or capture.

Without the four-element framework, AI in buildings becomes another vector for platform concentration — proprietary compliance checkers, closed digital twins, vendor-locked data schemas — of the kind BIM produced under Autodesk. With the framework, it becomes the backbone of the circular transition: a readable, auditable, portable substrate over which retrofit, reuse, and deconstruction decisions are made in public view. The choice is determined not by the technology but by the procurement templates and disclosure rules adopted in the next three to five years.

2. The governance work is mostly clerical.

None of the four elements requires a research breakthrough. Model cards are a template. Audit protocols are a contract. Open-standard preferences are a two-paragraph change to a city RFP. Building-level data-rights declarations are a legal form. The work is unglamorous, largely invisible to capital, and sits across the seams of agencies that do not currently coordinate: the Department of Buildings, the Mayor's Office of Climate and Environmental Justice, the Department of City Planning, the Public Advocate, the City Comptroller. Public sponsorship is the only mechanism that reliably produces this kind of unglamorous cross-agency work.

3. The tools exist; the data is beginning to exist; the governance is the bottleneck.

Chapter 1 established that about one percent of New York's buildings are machine-readable. Chapters 2 and 3 showed that the AI toolkit is adequate for the readable subset today and extends to the unreadable subset with modest sensing investment. Chapter 4 established that the global standards to make the stack portable are already in force. What is missing is the policy framework that would prevent the transition from being captured in the process of being built. That is where the next decade's most leveraged building-sector policy work lives.

A research program for responsible Building Intelligence

Chapter 1 closed with the pyramid: 1.08 million buildings, about one percent machine-readable, an order-of-magnitude drop at every tier. This chapter closes the report by naming the condition under which the pyramid can be inverted. Getting from one percent coverage to eighty percent coverage is partly an AI problem — the models, the inference patterns, the text classifiers that turn violations text into condition indices at citywide scale. It is partly a sensing problem — sub-meters, lidar sleds, thermal imagery at urban resolution. But neither the AI nor the sensing produces a circular built environment on its own. The binding step is the governance that decides who can read the data, who can act on it, on what standards, with what oversight, and with what recourse when the automation is wrong.

Responsible Building Intelligence is the discipline that takes all three seriously at once: the inference, the measurement, and the governance. Aedifice Research will continue to document each of the three in subsequent reports. The work of Report No. 02 ends here. The work the report points to is the work of the next decade.

How to cite

Edwards, J. (2026). Machine-Readable Buildings: How AI Accelerates the Circular Economy in New York. Chapter 5 — Risks and Governance. Aedifice Research, Report No. 02. Retrieved from https://aedifice-research.vercel.app/research/publications/machine-readable-buildings/chapter-5-risks-governance.

Selected references

AI Now Institute. (2022–2024). Sectoral bias in public-data AI systems. New York University.
Anthropic. (2024). Responsible Scaling Policy (revision). Anthropic PBC.
buildingSMART International. (2024). IFC, BCF, and ISO 19650 open standards for the built environment.
Directive (EU) 2024/1275 of the European Parliament and of the Council on the energy performance of buildings (recast).
Executive Order 14110. (2023). Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. The White House.
Federal Trade Commission. (2023). Guidance on connected-device and consumer data.
Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. arXiv:1506.02142.
Google. (2024). 2024 Environmental Report. Alphabet Inc.
MIT Energy Initiative. (2023–2024). Data-center and inference energy publications.
Mitchell, M., et al. (2019). Model cards for model reporting. Proceedings of FAT*.
National Institute of Standards and Technology. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST.
Patterson, D., et al. (2021). Carbon emissions and large neural network training. arXiv:2104.10350.
Regulation (EU) 2024/1689 of the European Parliament and of the Council (the AI Act).
Stanford HAI. (2023–2024). LegalBench and HELM evaluations. Stanford University.
Urban Land Institute. (2023). Data-sharing and privacy in the built environment.

End of Report No. 02