How a framing plan becomes a load-rated model

Seven stages. The model reads, groups and estimates inputs; deterministic code verifies what it can and computes everything structural. This page shows the actual mechanism: the real prompts and the real source, served live from the running process, so what you read here cannot drift from what executes.

Scope: simply supported, uniformly loaded single spans. W-shapes per AISC 360-16 F2/G2.1 (compact, Cb = 1.0); open-web joists per published SJI total-safe-load tables (ASD). Channels, HSS, angles and unknown sections are recognized and refused with the reason on screen. No continuity, no point loads, no snow drift, no composite action.

The design rule. The model never computes a structural number. It reads pixels and judges groupings, the two things code cannot do here. Every model claim that can be checked is checked: cited dimension strings must appear verbatim in the text layer, claimed grid-bay spans must match measured bubble positions, load quotes must verify in the notes. Claims that fail verification are demoted to a weaker confidence tier on screen, never silently dropped, never shown as confirmed.

Stages A and B

Member extraction · code first; VLM only when pixels are all there is

A page with a text layer is read by a deterministic grammar: every word is matched against the member-designation patterns (W/C/MC/HSS/L shapes, K/H/LH/DLH/KCS joists), EXISTING prefixes are merged from the same text line, multi-word HSS sizes are fused. On the development sheet this grammar produced the 187-label ground truth that the raster prototype was scored against; here it is the extractor, so vector extraction has no model error at all.
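Here is a minimal Python sketch of the vector grammar idea. It is not the engine.py code. The SHAPE pattern, the extract_labels name and the (text, line_id) word tuples are hypothetical, and the pattern covers only a representative subset of designations, with HSS and angle thicknesses written as fractions. It shows the two merges described above: an EXISTING prefix on the same text line, and an HSS size split across two words.

```python
import re

# Hypothetical designation grammar: a representative subset, not the engine.py patterns.
SHAPE = re.compile(
    r"""
      (?:W|C|MC)\d+(?:\.\d+)?[xX]\d+(?:\.\d+)?          # W14x22, C10x15.3, MC12x10.6
    | HSS\d+(?:\.\d+)?[xX]\d+(?:\.\d+)?[xX]\d+/\d+      # HSS6x6x1/4
    | L\d+(?:\.\d+)?[xX]\d+(?:\.\d+)?[xX]\d+/\d+        # L4x4x1/4
    | \d+(?:K|H|LH|DLH|KCS)\d+                          # 18K3, 16H4, 32LH06, 30KCS4
    """,
    re.VERBOSE,
)


def extract_labels(words):
    """words: (text, line_id) tuples in reading order. Returns member labels with an EXISTING flag."""
    labels, i = [], 0
    while i < len(words):
        text, line = words[i]
        existing = False
        # An EXISTING prefix on the same text line attaches to the following word.
        if text.upper() == "EXISTING" and i + 1 < len(words) and words[i + 1][1] == line:
            existing = True
            i += 1
            text, line = words[i]
        # Fuse an HSS size split across two words, e.g. "HSS6x6x" + "1/4".
        if (text.upper().startswith("HSS") and not SHAPE.fullmatch(text)
                and i + 1 < len(words) and words[i + 1][1] == line
                and SHAPE.fullmatch(text + words[i + 1][0])):
            text = text + words[i + 1][0]
            i += 1
        if SHAPE.fullmatch(text):
            labels.append({"label": text, "existing": existing})
        i += 1
    return labels
```

For example, the words EXISTING and W14x22 on one line, followed by HSS6x6x and 1/4 on the next, yield W14x22 (flagged existing) and HSS6x6x1/4.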

Image-only pages get the raster path: 3x3 overlapping tiles at 150 DPI, each sent to the vision model with the prompt below. The model returns label text plus 0-1000 normalized coordinates (its trained convention); code converts those back to PDF points through the known tile rectangle and dedupes across overlaps. This path inherits the measured behavior of the raster-reader prototype (95 percent recall on its scored sheet) and its caveats.
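The raster path's coordinate bookkeeping can be sketched as follows. The sketch assumes page rectangles in PDF points with a top-left origin, which is how PyMuPDF reports them, so no y-flip is needed. The names make_tiles, to_pdf and dedupe, the 25 percent overlap and the 12 pt dedupe radius are all illustrative assumptions. None of them is taken from the running engine.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float


def make_tiles(page: Rect, n: int = 3, overlap: float = 0.25) -> list[Rect]:
    """Split a page into n x n tiles, each grown by `overlap` of a tile's size and clipped to the page."""
    tw, th = (page.x1 - page.x0) / n, (page.y1 - page.y0) / n
    pad_x, pad_y = overlap * tw, overlap * th
    return [
        Rect(max(page.x0, page.x0 + c * tw - pad_x),
             max(page.y0, page.y0 + r * th - pad_y),
             min(page.x1, page.x0 + (c + 1) * tw + pad_x),
             min(page.y1, page.y0 + (r + 1) * th + pad_y))
        for r in range(n) for c in range(n)
    ]


def to_pdf(x_norm: float, y_norm: float, tile: Rect) -> tuple[float, float]:
    """Map the model's 0-1000 tile-normalized coordinates back to page points."""
    return (tile.x0 + x_norm / 1000.0 * (tile.x1 - tile.x0),
            tile.y0 + y_norm / 1000.0 * (tile.y1 - tile.y0))


def dedupe(hits, radius_pt: float = 12.0):
    """Keep one copy of a label reported by several overlapping tiles (same text, nearby position)."""
    kept = []
    for label, x, y in hits:
        if not any(k == label and math.hypot(kx - x, ky - y) <= radius_pt for k, kx, ky in kept):
            kept.append((label, x, y))
    return kept
```

Because the model reports positions relative to the tile image, the rendering DPI never enters the conversion. Only the tile rectangle in page points does.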

engine.py · extract_vector_members · pulled live via inspect from the running module

the exact raster extraction prompt

Stage C

Scale, anchored to the sheet's own written dimensions · code

Spans and tributary widths need points-per-foot. The printed scale note ("1/8" = 1'-0"") is not trusted: sheets get plotted off-scale. Instead, the sheet's own dimension strings are the anchor, an idea kept from the predecessor project and generalized so it no longer needs grid bubbles:

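A dimension CHAIN is three or more collinear dimension texts. Each text sits at the midpoint of its own segment. The distance between the first and last label centers therefore covers half of the first segment, every interior segment in full, and half of the last: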

pt_per_ft = (distance from first to last label center) / (f1/2 + f2 + ... + f(n-1) + fn/2)

The long baseline averages out the per-label centering offsets that make adjacent-pair ratios noisy. On Garrett S-1.2 the two chains land at 9.0034 and 8.9989 pt/ft against the stated 9.0 (0.03 percent off). Chains vote; the dominant cluster's median wins; fewer than 2 agreeing chains or spread past 2 percent leaves the scale UNKNOWN and the affected members unrated.
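The chain formula and the vote can be sketched as follows. The sketch assumes each chain arrives as ordered label centers plus the parsed length of each segment. The sliding-window clustering in vote_scale is one plausible reading of "the dominant cluster" and may not match what scale_from_chains does. Both function names are hypothetical.

```python
import math
import statistics


def chain_scale(centers, feet):
    """Points-per-foot from one dimension chain.

    centers: (x, y) label centers in PDF points, in chain order.
    feet: each segment's written length, in the same order.
    Only the end labels enter the distance; interior segments contribute their full lengths.
    """
    if len(centers) < 3 or len(centers) != len(feet):
        raise ValueError("a chain needs 3+ collinear labels, one length per label")
    (xa, ya), (xb, yb) = centers[0], centers[-1]
    baseline_pt = math.hypot(xb - xa, yb - ya)
    baseline_ft = feet[0] / 2 + sum(feet[1:-1]) + feet[-1] / 2
    return baseline_pt / baseline_ft


def vote_scale(estimates, tol=0.02):
    """Median of the largest group of chain estimates whose spread is within tol, else None (UNKNOWN)."""
    xs = sorted(estimates)
    best, lo = [], 0
    for hi in range(len(xs)):
        while xs[hi] > xs[lo] * (1 + tol):  # shrink the window until its spread is within tol
            lo += 1
        if hi - lo + 1 > len(best):
            best = xs[lo:hi + 1]
    return statistics.median(best) if len(best) >= 2 else None
```

Consider labels at 0, 90 and 180 pt on a 10 ft + 10 ft + 10 ft chain. The baseline is 5 + 10 + 5 = 20 ft, which gives 9.0 pt/ft. The two Garrett chain values above would vote to a median of about 9.001. A lone chain, or two chains more than 2 percent apart, returns None.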

The raster path asks the vision model for one or two clear dimensions per tile WITH the pixel endpoints of their extension lines, derives pt-per-ft from each, and requires at least two independent readings agreeing within 2 percent. Units are converted by code (the prompt forbids the model from converting). Either path's result is shown with its source, and the workspace has an override box; an override re-derives tributary widths and re-rates, engine only.
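The raster-side arithmetic can be sketched as follows. The sketch assumes the dimension text arrives exactly as drawn and that the extension-line endpoints have already been mapped to PDF points. The feet-inch grammar is a hypothetical subset of drafting formats. raster_scale also simplifies "two independent readings" to "two readings", because it does not track which tile each reading came from.

```python
import math
import re

# Feet-inch strings such as 24'-0", 6'-2 1/4", or 20' (a hypothetical subset of drafting formats).
FT_IN = re.compile(r"(\d+)'(?:\s*-?\s*(\d+)(?:\s+(\d+)/(\d+))?\")?")


def parse_ft_in(text):
    """Convert a written dimension to decimal feet in code, never by the model; None if unparseable."""
    m = FT_IN.fullmatch(text.strip())
    if not m:
        return None
    inches = int(m.group(2) or 0)
    if m.group(3):
        inches += int(m.group(3)) / int(m.group(4))
    return int(m.group(1)) + inches / 12.0


def raster_scale(readings, tol=0.02):
    """readings: ((x0, y0), (x1, y1), text) per dimension, endpoints in PDF points.

    Returns the mean of the first pair of sorted readings that agree within tol, else None.
    """
    estimates = []
    for (x0, y0), (x1, y1), text in readings:
        feet = parse_ft_in(text)
        if feet:
            estimates.append(math.hypot(x1 - x0, y1 - y0) / feet)
    estimates.sort()
    for a, b in zip(estimates, estimates[1:]):
        if b <= a * (1 + tol):
            return (a + b) / 2
    return None
```

For instance, the on-center note value 6'-2 1/4" parses to 6.1875 ft, and a printed scale note such as 1/8" is rejected rather than guessed at.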

engine.py · merge_dims + find_chains + scale_from_chains · live source; note the Revit split-token handling and the elevation veto

the exact raster scale prompt

Stage D

Spans · model groups + cites; code verifies

Which beams form a bank, and what they span, is genuine judgment over the drawing, so it is the model's job. The model sees the full sheet image, the member list with real coordinates, the dimension strings, and the measured grid bays. It must place every label in a group or exclude it with a reason, and it must state how it got each span:

claimed basis	code verification	resulting tag
dimension_string + verbatim quote	quote found in the text layer (whitespace-insensitive) AND its parsed value matches the claimed span within 3 percent	measured
grid_spacing	claimed span matches an adjacent grid-bay distance measured from drawn bubble circles at the derived scale	measured
any basis + named support members	the claimed span is cross-checked against the measured separation of the named support labels; disagreement past 15 percent downgrades the span and prints the conflict on the member (a verbatim quote can verify and still be misapplied; this catches that)	downgrade on conflict
scaled_estimate, or any failed verification	none possible	model-grouped (shown in blue, editable)
label the model could not place	n/a	assumed: not rated until edited
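The verification rules in the table reduce to a few comparisons. In this sketch, SpanClaim and tag_span are hypothetical names. parse_ft stands in for whatever feet-inch parser the engine uses. grid_tol is an assumed tolerance, because the table does not state one for grid bays. Unplaced labels (the assumed tier) never reach this function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpanClaim:
    basis: str                              # "dimension_string", "grid_spacing" or "scaled_estimate"
    span_ft: float
    quote: Optional[str] = None             # verbatim dimension text, for dimension_string claims
    support_sep_ft: Optional[float] = None  # measured separation of the named support labels, if any


def _close(a: float, b: float, tol: float) -> bool:
    return abs(a - b) <= tol * abs(b)


def _squash(s: str) -> str:
    return "".join(s.split())  # whitespace-insensitive comparison


def tag_span(claim: SpanClaim, text_layer: str, grid_bays_ft: list[float],
             parse_ft: Callable[[str], Optional[float]], grid_tol: float = 0.03):
    """Return (tag, notes) for one model span claim, following the verification table."""
    tag, notes = "model-grouped", []
    if claim.basis == "dimension_string" and claim.quote:
        value = parse_ft(claim.quote)
        if (_squash(claim.quote) in _squash(text_layer)
                and value is not None and _close(value, claim.span_ft, 0.03)):
            tag = "measured"
    elif claim.basis == "grid_spacing":
        if any(_close(claim.span_ft, bay, grid_tol) for bay in grid_bays_ft):
            tag = "measured"
    # The support cross-check applies to any basis: a verified quote can still be misapplied.
    if claim.support_sep_ft is not None and not _close(claim.span_ft, claim.support_sep_ft, 0.15):
        tag = "model-grouped"
        notes.append(f"span {claim.span_ft:.2f} ft vs support separation {claim.support_sep_ft:.2f} ft")
    return tag, notes
```

Because the support check runs last and only ever lowers the tag, a conflict cannot be hidden by a quote that verified.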

A span edit can apply to the whole group, because a joist bank is one span and one correction should fix all of its members at once. Members can also be selected directly on the plan, one click or a shift-drag box, and edited together; that path re-runs the same deterministic engine and nothing else.

the exact stage-D prompt template

engine.py · grid_bays · deterministic bay measurement, used both in the prompt and in verification

Stage E

Tributary widths · code, a stated proxy

For each plan member, code measures the perpendicular distance to the nearest parallel member label on each side, halves both, and sums. For an interior joist in a regular bank that is exactly the joist spacing; for an interior girder it is half the left bay plus half the right bay, which is its tributary when load arrives uniformly. This is a proxy and the UI says so: it trusts label positions, a missed neighbor inflates it, edge members get a single-sided value flagged as such, and members with no parallel neighbor get no tributary and are not rated. Where the sheet prints an on-center note ("W14x22 AT 6'-2 1/4" OC") the model may cite it; the note is verified verbatim and then beats the proxy.
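The label-spacing proxy can be sketched for a single bank. The sketch assumes the bank's members are already known to be parallel and that each label is reduced to one coordinate measured perpendicular to the members. bank_tributaries is a hypothetical name. Its flags mirror the caveats in the paragraph above.

```python
def bank_tributaries(offsets_pt, pt_per_ft):
    """Tributary width per member from label spacing in one bank of parallel members.

    offsets_pt: each member label's position measured perpendicular to the members, in PDF points.
    Returns (width_ft or None, flag) per member.
    """
    out = []
    for i, y in enumerate(offsets_pt):
        below = [y - o for j, o in enumerate(offsets_pt) if j != i and o < y]
        above = [o - y for j, o in enumerate(offsets_pt) if j != i and o > y]
        gaps = [min(side) for side in (below, above) if side]  # nearest neighbor on each side
        if not gaps:
            out.append((None, "no parallel neighbor: not rated"))
        elif len(gaps) == 1:
            out.append((gaps[0] / 2 / pt_per_ft, "single-sided (edge member)"))
        else:
            out.append(((gaps[0] + gaps[1]) / 2 / pt_per_ft, "proxy: half of each neighbor gap"))
    return out
```

At 9 pt/ft, labels at 0, 54 and 108 pt give the interior member 6.0 ft and the two edge members a flagged 3.0 ft. If the middle label is dropped (a missed neighbor), the remaining two each report 6.0 ft. That is the inflation the UI warns about.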

engine.py · bank_spacing_trib · live source, including the honesty comments

Stage F

Loads · model quotes; code verifies; defaults flagged

Code finds the candidate notes pages (any page with LIVE/DEAD/SNOW LOAD and PSF), the model extracts roof DL and LL with verbatim quotes, and code verifies each quote in the cited page's text layer, the same pattern the pre-RFI prototype uses for its findings. A verified value is shown with its quote receipt. A value the sheet does not state falls back to a default (20 psf DL, 20 psf LL) flagged until you confirm or edit it; image-only inputs always get flagged defaults because there is no text layer to verify against. On the Garrett set this stage reads "LIVE LOAD = 40 PSF" with a verified quote and honestly reports that the dead-load buildup is not stated.
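The quote check and the fallback order can be sketched as follows. The sketch assumes the same whitespace-insensitive match used for span quotes in Stage D. It also assumes that a quote which fails to verify keeps the model's value but is demoted and flagged, per the design rule. resolve_load, PSF and DEFAULT_PSF are hypothetical names.

```python
import re

PSF = re.compile(r"(\d+(?:\.\d+)?)\s*PSF", re.IGNORECASE)
DEFAULT_PSF = {"DL": 20.0, "LL": 20.0}


def _squash(s):
    return "".join(s.split())


def resolve_load(kind, value, quote, page_text):
    """Return (psf, status) for a roof DL or LL reading.

    page_text is None for image-only pages, which have no text layer to verify against.
    """
    if value is None or page_text is None:
        return DEFAULT_PSF[kind], "default, flagged until confirmed or edited"
    m = PSF.search(quote or "")
    # The quote must appear in the cited page AND state the same number the model reported.
    if quote and _squash(quote) in _squash(page_text) and m and float(m.group(1)) == value:
        return value, f'verified: "{quote}"'
    return value, "unverified model reading, flagged"
```

On a notes page that prints LIVE LOAD = 40 PSF, a model reading of 40 with that quote comes back verified with its receipt. A missing dead load falls back to the flagged 20 psf default.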

the exact stage-F prompt template

Stage G

Rating · code only, with code references on every line

rater.py computes line loads from (DL + added dead) and (LL + added live) times the tributary width, adds member self-weight to dead explicitly, and checks:

members	checks	basis
W-shapes (14-shape two-source-verified AISC v15.0 table)	flexure F2 (yielding + LTB, Cb = 1.0), shear G2.1, deflection L/240 total	LRFD 1.2D+1.6L and 1.4D, plus ASD D+L
joists 16H4, 16H5 (SJI 1961 H-series), 32LH06, 32LH07 (SJI 45th Ed.), 30KCS4 (KCS moment/shear)	total load vs published total safe load; published L/360 live figure where the table prints one	ASD as published (the tables have no LRFD basis, and the report says so)
channels, HSS, angles, sections not in the verified tables	refused with the reason on screen; never silently rated

Existing members (EXISTING prefix) default to Fy = 36 ksi, a stated assumption for pre-1986 steel, editable per member. W-shape bracing defaults to Lb = 0 (deck or joists brace the flange), a stated assumption switchable to Lb = span per member.
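The core W-shape arithmetic for the default case can be sketched as follows. With Lb = 0, the F2.1 yielding limit governs and lateral-torsional buckling does not enter. This is a sketch, not rater.py. Only LRFD strength and the L/240 total-load deflection are shown; ASD and the Lb = span LTB branch are left out. The shear line assumes the web meets the G2.1(a) limit h/tw ≤ 2.24√(E/Fy), so φv = 1.0 and Cv1 = 1.0. Section properties are passed in rather than looked up, and WShape and rate_w_sketch are hypothetical names.

```python
from dataclasses import dataclass

E_KSI = 29000.0


@dataclass
class WShape:
    name: str
    d: float    # depth, in
    tw: float   # web thickness, in
    zx: float   # plastic modulus, in^3
    ix: float   # moment of inertia, in^4
    plf: float  # self-weight, lb/ft


def rate_w_sketch(sec, span_ft, trib_ft, dl_psf, ll_psf, fy=50.0):
    """Utilizations for a simply supported, uniformly loaded, fully braced (Lb = 0) compact W-shape."""
    w_d = (dl_psf * trib_ft + sec.plf) / 1000.0  # kip/ft, self-weight added to dead explicitly
    w_l = ll_psf * trib_ft / 1000.0              # kip/ft
    w_u = max(1.2 * w_d + 1.6 * w_l, 1.4 * w_d)  # governing LRFD combination
    m_u = w_u * span_ft ** 2 / 8.0               # kip-ft, midspan
    v_u = w_u * span_ft / 2.0                    # kips, at the support
    phi_mn = 0.9 * fy * sec.zx / 12.0            # F2.1 yielding: Mn = Mp = Fy * Zx, in kip-ft
    phi_vn = 1.0 * 0.6 * fy * sec.d * sec.tw     # G2.1(a): Vn = 0.6 Fy Aw Cv1, Aw = d * tw
    w_svc = (w_d + w_l) / 12.0                   # kip/in, service D + L
    span_in = span_ft * 12.0
    delta = 5.0 * w_svc * span_in ** 4 / (384.0 * E_KSI * sec.ix)
    return {
        "flexure": m_u / phi_mn,
        "shear": v_u / phi_vn,
        "deflection": delta / (span_in / 240.0),
    }
```

For an EXISTING member, the same call would simply take fy=36.0, matching the stated default above.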

Why the sliders are instant and exact: for a simply supported uniform member every demand is linear in the added area load, so each check is stored as

demand(p_dead, p_live) = base + k_dead * p_dead + k_live * p_live
utilization = demand / capacity   (capacity is constant)

computed once by the engine. The browser recomputes utilization at slider time with three multiplications per check; nothing is approximated and no model is anywhere near it. The same coefficients drive the colored plan, the table, and the CSV export.
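Here is a Python transcription of the slider math; the real recompute runs in the browser. LinearCheck, moment_checks and governing are hypothetical names, and only the midspan-moment check is shown. Each load combination is stored as its own linear check. That keeps the governing utilization exact, because the maximum is taken after each combination is evaluated, not folded into the coefficients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCheck:
    base: float      # demand with zero added load (existing DL, LL and self-weight)
    k_dead: float    # demand per psf of added dead load
    k_live: float    # demand per psf of added live load
    capacity: float  # constant for a given member and check

    def utilization(self, p_dead: float, p_live: float) -> float:
        return (self.base + self.k_dead * p_dead + self.k_live * p_live) / self.capacity


def moment_checks(w_d0, w_l0, trib_ft, span_ft, phi_mn):
    """Midspan-moment checks (kip-ft) for LRFD 1.2D+1.6L and 1.4D; w_d0 and w_l0 in kip/ft."""
    m_per_w = span_ft ** 2 / 8.0  # midspan moment per kip/ft of uniform line load
    per_psf = trib_ft / 1000.0    # kip/ft of line load added by 1 psf over the tributary width
    return [
        LinearCheck((1.2 * w_d0 + 1.6 * w_l0) * m_per_w, 1.2 * per_psf * m_per_w,
                    1.6 * per_psf * m_per_w, phi_mn),
        LinearCheck(1.4 * w_d0 * m_per_w, 1.4 * per_psf * m_per_w, 0.0, phi_mn),
    ]


def governing(checks, p_dead, p_live):
    """Slider-time work: a few multiply-adds per check, then the worst combination."""
    return max(c.utilization(p_dead, p_live) for c in checks)
```

Adding 10 psf of dead load here moves each combination by exactly k_dead * 10, the same result as recomputing the line loads from scratch.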

rater.py · rate_w_shape · live source; fixtures re-run the calc-copilot hand checks against AISC Manual values

rater.py · rate_joist · SJI table arithmetic: next row up, conservative; self-weight on the demand side
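The joist table lookup rule can be sketched as follows. The rows argument stands in for the carried SJI table rows, and any numbers used with it here are placeholders, not published SJI values. rate_joist_sketch is a hypothetical name. Refusing spans outside the tabulated range is an assumption, chosen to be consistent with "never silently rated".

```python
import bisect


def rate_joist_sketch(rows, span_ft, trib_ft, dl_psf, ll_psf, self_wt_plf):
    """rows maps tabulated span (ft) -> (total safe load plf, L/360 live load plf or None).

    Uses the next tabulated span at or above the actual span (a longer row carries less)
    and puts the joist self-weight on the demand side. Returns None outside the table's range.
    """
    spans = sorted(rows)
    if not spans or span_ft < spans[0]:
        return None
    i = bisect.bisect_left(spans, span_ft)  # first tabulated span >= actual span
    if i == len(spans):
        return None
    row = spans[i]
    total_safe, live_l360 = rows[row]
    result = {"row_ft": row, "total": ((dl_psf + ll_psf) * trib_ft + self_wt_plf) / total_safe}
    if live_l360 is not None:  # only where the table prints an L/360 live figure
        result["live_l360"] = ll_psf * trib_ft / live_l360
    return result
```

A 21 ft joist is therefore checked against the 22 ft row. That is the conservative "next row up" choice, because the published capacity falls as span grows.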

The trust story

Editable inputs are the product

A drawing never states everything a rating needs, so any tool that claims to rate beams without a human in the loop is lying somewhere. The contract here: every input the model produced is on screen with its confidence tag and its provenance sentence; every input is editable, from its group row or by selecting members on the plan; every edit re-runs only the deterministic engine and re-tags the value "edited". The engineer corrects three or four inputs instead of transcribing two hundred, and the output is a calc sheet whose every line carries a code reference.

engine.py · rate_all · the only path from inputs to verdicts, for the pipeline and for edits alike

Limits

What this does not do, on purpose

limit	consequence
single-span, simply supported, uniform load only	girders loaded by a few concentrated joist reactions are approximated by their uniform tributary equivalent; that is usually conservative for midspan moment with full coverage, and wrong for partial-coverage or one-sided loading. The predecessor project modeled true point loads; this one trades that for generality and says so.
Cb = 1.0, no continuity, no cantilevers	conservative for most roof framing; cantilevers come out as separate short members if labeled, or wrong if not. Edit or ignore those.
tributary = bank spacing of labels	a proxy that inherits label-extraction quality; flagged per member, edge members flagged single-sided.
no snow drift, no wind on equipment, no seismic mass change	the added-load sliders model uniform area load only; drift zones near steps need an engineer.
joists only from the carried verified table rows	other joists are recognized and listed as not rated; never green.
scale unknown means unrated members	no silent fallback to the printed scale note when written dimensions cannot confirm it.

Why the predecessor of this exact idea failed, and what is different. An earlier project in this repo turned the same Garrett sheet into member verdicts and was judged not demoable: its AI stage re-read text that the text layer already provided (decorative), its geometric heuristics encoded one firm's drafting habits but were presented as general, and several claims shipped before being checked against the artifact. This build's answers: the model only ever works where code cannot (raster pixels, grouping judgment, reading prose notes) and its checkable claims are machine-verified; geometry is limited to two stated, inspectable measurements (dimension chains, label spacing) that are tagged as proxies rather than presented as understanding; and every claim on this page is either served live from the running code or carries the measured number next to it. The honest residue is visible in the product: model-grouped spans are blue, not green, and the gray column says why each refusal happened.

Appendix

Cost, caps, caching

model	…
metering	token usage from each API response is the accounting source of truth; the budget is checked before every call; hard cap per job, abort before overspend. A vector sheet costs roughly $0.10 to $0.40 (one grouping call with thinking, one loads call); a raster sheet adds ~13 tile calls.
caching	responses cached by (model + prompt + images) hash. Bundled samples share a cache, so repeat demo runs replay instantly and free; uploads always run live.
restart safety	jobs persist to runs/ (job.json + events.jsonl + results.json); a reload resumes from the URL on the page the run analyzed; a dead event stream shows a banner, not a spinner.
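The caching and metering rules can be sketched as follows. cache_key and Budget are hypothetical names. The length-prefixed hash is one way to make the (model + prompt + images) key unambiguous. Checking a worst-case cost estimate before each call is an assumption about how "abort before overspend" is enforced. Per-token prices are parameters, not real rates.

```python
import hashlib


def cache_key(model: str, prompt: str, images: list[bytes]) -> str:
    """Hash of (model + prompt + images); identical inputs replay the cached response."""
    h = hashlib.sha256()
    for part in [model.encode("utf-8"), prompt.encode("utf-8"), *images]:
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps part boundaries unambiguous
        h.update(part)
    return h.hexdigest()


class Budget:
    """Per-job hard cap; spend is charged from the token usage each API response reports."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def check(self, worst_case_usd: float) -> None:
        """Call before every request; refuse any call that could push spend past the cap."""
        if self.spent_usd + worst_case_usd > self.cap_usd:
            raise RuntimeError("job budget would be exceeded; aborting before the call")

    def record(self, input_tokens: int, output_tokens: int,
               usd_per_input_token: float, usd_per_output_token: float) -> None:
        """Charge the usage reported by the API response (the accounting source of truth)."""
        self.spent_usd += input_tokens * usd_per_input_token + output_tokens * usd_per_output_token
```

Without the length prefix, a model name "ab" with prompt "c" would hash the same bytes as "a" with "bc". With it, the two keys differ.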