Beam rater
methodology

How a framing plan becomes a load-rated model

Seven stages. The model reads, groups and estimates inputs; deterministic code verifies what it can and computes everything structural. This page shows the actual mechanism: the real prompts and the real source, served live from the running process, so what you read here cannot drift from what executes.

Scope: simply supported, uniformly loaded single spans. W-shapes per AISC 360-16 F2/G2.1 (compact, Cb = 1.0); open-web joists per published SJI total-safe-load tables (ASD). Channels, HSS, angles and unknown sections are recognized and refused with the reason on screen. No continuity, no point loads, no snow drift, no composite action.

Pipeline (input: framing plan PDF or image)
A+B  members    code: text-layer grammar or VLM tiles (raster)
C    scale      code: written-dimension chains, abort on spread
D    spans      model groups + cites; code verifies citations
E    tributary  code: bank spacing (stated proxy, editable)
F    loads      model quotes notes; code verifies verbatim
G    rating     code only: AISC 360-16 + SJI tables, linear coeffs

Every input lands in an editable table with a confidence tag (measured / model-grouped / assumed); edits re-run stage G only. Sliders re-rate client-side because demands are linear in added psf. No member is shown green without a complete, stated input chain; everything unrated says why.

The design rule. The model never computes a structural number. It reads pixels and judges groupings, the two things code cannot do here. Every model claim that can be checked is checked: cited dimension strings must appear verbatim in the text layer, claimed grid-bay spans must match measured bubble positions, load quotes must verify in the notes. Claims that fail verification are demoted to a weaker confidence tier on screen, never silently dropped, never shown as confirmed.
Stages A and B

Member extraction · code first; VLM only when pixels are all there is

A page with a text layer is read by a deterministic grammar: every word is matched against the member-designation patterns (W/C/MC/HSS/L shapes, K/H/LH/DLH/KCS joists), EXISTING prefixes are merged from the same text line, multi-word HSS sizes are fused. On the development sheet this grammar produced the 187-label ground truth that the raster prototype was scored against; here it is the extractor, so vector extraction has no model error at all.
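
A minimal sketch of such a designation grammar, assuming each text-layer word arrives as a plain string. The regular expressions and the names `classify` and `extract_labels` are hypothetical illustrations, not the patterns in engine.py, and EXISTING-prefix merging is left out.

```python
import re

# Hypothetical patterns for the designation families named above.
SHAPE = re.compile(r"^(W|MC|C)(\d+)x(\d+(?:\.\d+)?)$")           # W14x22, MC12x10.6
JOIST = re.compile(r"^(\d+)(DLH|LH|KCS|K|H)(\d+)$")              # 16H4, 32LH06, 30KCS4
HSS_ANGLE = re.compile(r"^(HSS|L)(\d[\d./-]*(?:x\d[\d./-]*)+)$")  # HSS6x6x1/4, L4x4x3/8


def classify(token):
    """Return (family, normalized_label), or None if the token is not a member."""
    t = token.strip().upper().replace("X", "x")
    m = SHAPE.match(t)
    if m:
        return (m.group(1), t)
    m = JOIST.match(t)
    if m:
        return (m.group(2), t)
    m = HSS_ANGLE.match(t)
    if m:
        return (m.group(1), t)
    return None


def extract_labels(words):
    """Scan words in reading order, fusing a bare 'HSS' with the size word after it."""
    labels = []
    i = 0
    while i < len(words):
        word = words[i]
        if word.upper() == "HSS" and i + 1 < len(words):
            word, i = word + words[i + 1], i + 1
        hit = classify(word)
        if hit:
            labels.append(hit)
        i += 1
    return labels


print(extract_labels(["W14x22", "16H4", "HSS", "6x6x1/4", "NOTE", "32LH06"]))
```

Because every step is a pattern match, the same sheet yields the same labels on every run.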

Image-only pages get the raster path: 3x3 overlapping tiles at 150 DPI, each sent to the vision model with the prompt below. The model returns label text plus 0-1000 normalized coordinates (its trained convention); code converts those back to PDF points through the known tile rectangle and dedupes across overlaps. This path inherits the measured behavior of the raster-reader prototype (95 percent recall on its scored sheet) and its caveats.
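
The coordinate conversion and overlap dedupe described above can be sketched as follows. The function names and the 12 pt dedupe tolerance are assumptions made for illustration; the running code may use different rules.

```python
from math import hypot


def tile_to_pdf(nx, ny, tile_rect):
    """Map 0-1000 normalized tile coordinates to PDF points.

    tile_rect is (x0, y0, x1, y1): the tile's rectangle on the page, in points.
    """
    x0, y0, x1, y1 = tile_rect
    return (x0 + nx / 1000.0 * (x1 - x0), y0 + ny / 1000.0 * (y1 - y0))


def dedupe(hits, tol_pt=12.0):
    """Keep one copy of each label that overlapping tiles reported twice.

    hits is a list of (text, x_pt, y_pt). Two hits are treated as the same label
    when the text matches and the centers lie within tol_pt (assumed tolerance).
    """
    kept = []
    for text, x, y in hits:
        seen = any(t == text and hypot(x - kx, y - ky) <= tol_pt for t, kx, ky in kept)
        if not seen:
            kept.append((text, x, y))
    return kept


# The same label seen near the right edge of one tile and the left edge of the next.
a = tile_to_pdf(900, 500, (0, 0, 400, 300))
b = tile_to_pdf(150, 500, (300, 0, 700, 300))
print(dedupe([("16H4", *a), ("16H4", *b)]))
```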

engine.py · extract_vector_members pulled live via inspect from the running module
the exact raster extraction prompt
Stage C

Scale, anchored to the sheet's own written dimensions · code

Spans and tributary widths need points-per-foot. The printed scale note ("1/8" = 1'-0"") is not trusted: sheets get plotted off-scale. Instead, the sheet's own dimension strings are the anchor, an idea kept from the predecessor project and generalized so it no longer needs grid bubbles:

a dimension CHAIN is 3+ collinear dimension texts. Each text sits at the midpoint of its own segment, so:

  pt_per_ft = (distance from first to last label center) / (f1/2 + f2 + ... + f(n-1) + fn/2)

The long baseline averages out the per-label centering offsets that make adjacent-pair ratios noisy. On Garrett S-1.2 the two chains land at 9.0034 and 8.9989 pt/ft against the stated 9.0 (0.03 percent off). Chains vote; the dominant cluster's median wins; fewer than 2 agreeing chains or spread past 2 percent leaves the scale UNKNOWN and the affected members unrated.
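
The chain formula and the vote can be sketched as below. This is an illustration under stated assumptions, not the scale_from_chains source: the clustering rule (the largest set of estimates within 2 percent of some member) is a simple stand-in, and the function names are hypothetical.

```python
from math import hypot
from statistics import median


def chain_pt_per_ft(chain):
    """chain: ordered list of (x_pt, y_pt, feet) for 3+ collinear dimension labels."""
    if len(chain) < 3:
        raise ValueError("a chain needs at least 3 dimension texts")
    xa, ya, first_ft = chain[0]
    xb, yb, last_ft = chain[-1]
    baseline_pt = hypot(xb - xa, yb - ya)
    # Each label sits at the midpoint of its segment, so the end segments count half.
    feet = first_ft / 2 + sum(f for _, _, f in chain[1:-1]) + last_ft / 2
    return baseline_pt / feet


def vote_scale(estimates, max_spread=0.02, min_chains=2):
    """Median of the dominant cluster, or None when the scale must stay UNKNOWN."""
    best = []
    for center in estimates:
        cluster = [e for e in estimates if abs(e - center) <= max_spread * center]
        if len(cluster) > len(best):
            best = cluster
    if len(best) < min_chains:
        return None
    return median(best)


# Three segments of 10, 15 and 10 ft drawn at 9 pt/ft; labels sit at segment midpoints.
chain = [(45.0, 0.0, 10.0), (157.5, 0.0, 15.0), (270.0, 0.0, 10.0)]
print(chain_pt_per_ft(chain))          # 9.0
print(vote_scale([9.0034, 8.9989]))    # the two chain values quoted above agree
print(vote_scale([9.0, 12.0]))         # None: no two chains agree
```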

The raster path asks the vision model for one or two clear dimensions per tile WITH the pixel endpoints of their extension lines, derives pt-per-ft from each, and requires at least two independent readings agreeing within 2 percent. Units are converted by code (the prompt forbids the model from converting). Either path's result is shown with its source, and the workspace has an override box; an override re-derives tributary widths and re-rates, engine only.

engine.py · merge_dims + find_chains + scale_from_chains live source; note the Revit split-token handling and the elevation veto
the exact raster scale prompt
Stage D

Spans · model groups + cites · code verifies

Which beams form a bank, and what they span, is genuine judgment over the drawing, so it is the model's job. The model sees the full sheet image, the member list with real coordinates, the dimension strings, and the measured grid bays. It must place every label in a group or exclude it with a reason, and it must state how it got each span:

claimed basis | code verification | resulting tag
dimension_string + verbatim quote | quote found in the text layer (whitespace-insensitive) AND its parsed value matches the claimed span within 3 percent | measured
grid_spacing | claimed span matches an adjacent grid-bay distance measured from drawn bubble circles at the derived scale | measured
any basis + named support members | the claimed span is cross-checked against the measured separation of the named support labels; disagreement past 15 percent downgrades the span and prints the conflict on the member (a verbatim quote can verify and still be misapplied; this catches that) | downgrade on conflict
scaled_estimate, or any failed verification | none possible | model-grouped (shown in blue, editable)
label the model could not place | n/a | assumed: not rated until edited
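
The dimension_string row can be sketched as a small check. The parser below handles only feet-inch strings such as 24'-6" or 6'-2 1/4", and the names `parse_feet` and `tag_span` are hypothetical; the grid-bay and support cross-checks are not shown.

```python
import re

# Feet, then optional whole inches with an optional fraction, e.g. 24'-6" or 6'-2 1/4".
DIM = re.compile(r"""(\d+)'(?:\s*-?\s*(\d+)(?:\s+(\d+)/(\d+))?")?""")


def parse_feet(dim):
    """Return the dimension in decimal feet, or None if it does not parse."""
    m = DIM.fullmatch(dim.strip())
    if not m:
        return None
    feet, inches, num, den = m.groups()
    total_in = float(inches or 0)
    if num:
        total_in += int(num) / int(den)
    return int(feet) + total_in / 12.0


def squash(s):
    """Remove all whitespace so the quote match is whitespace-insensitive."""
    return re.sub(r"\s+", "", s)


def tag_span(claim_ft, quote, text_layer, tol=0.03):
    """'measured' only if the quote exists verbatim and parses to the claimed span."""
    if squash(quote) not in squash(text_layer):
        return "model-grouped"
    value = parse_feet(quote)
    if value is None or abs(value - claim_ft) > tol * claim_ft:
        return "model-grouped"
    return "measured"


text_layer = "W14x22   24' - 6\"   16H4"
print(tag_span(24.5, "24'-6\"", text_layer))   # measured
print(tag_span(30.0, "24'-6\"", text_layer))   # model-grouped: value does not match
print(tag_span(25.0, "25'-0\"", text_layer))   # model-grouped: quote not on the sheet
```

A failed check demotes the claim to the weaker tag instead of dropping it, which mirrors the rule that unverified claims stay visible.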

A span edit can apply to the whole group, because a joist bank is one span and one correction should fix all of its members at once. Members can also be selected directly on the plan, one click or a shift-drag box, and edited together; that path re-runs the same deterministic engine and nothing else.

the exact stage-D prompt template
engine.py · grid_bays: deterministic bay measurement used both in the prompt and in verification
Stage E

Tributary widths · code, a stated proxy

For each plan member, code measures the perpendicular distance to the nearest parallel member label on each side, halves both, and sums. For an interior joist in a regular bank that is exactly the joist spacing; for an interior girder it is half the left bay plus half the right bay, which is its tributary when load arrives uniformly. This is a proxy and the UI says so: it trusts label positions, a missed neighbor inflates it, edge members get a single-sided value flagged as such, and members with no parallel neighbor get no tributary and are not rated. Where the sheet prints an on-center note ("W14x22 AT 6'-2 1/4" OC") the model may cite it; the note is verified verbatim and then beats the proxy.
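
A minimal sketch of the bank-spacing proxy, assuming each member in a set of parallel members has already been reduced to a perpendicular offset in feet. The function name and flag strings are hypothetical, and the verified on-center note override is not shown.

```python
def tributary_widths(offsets_ft):
    """offsets_ft: {label: perpendicular offset in feet} for one set of parallel members.

    Returns {label: (width_ft or None, flag)}.
    """
    out = {}
    for name, pos in offsets_ft.items():
        others = [p for n, p in offsets_ft.items() if n != name]
        below = [pos - p for p in others if p < pos]
        above = [p - pos for p in others if p > pos]
        if below and above:
            # Half the gap to the nearest neighbor on each side.
            out[name] = (min(below) / 2 + min(above) / 2, "two-sided")
        elif below or above:
            out[name] = (min(below or above) / 2, "single-sided")
        else:
            out[name] = (None, "no parallel neighbor")
    return out


bank = {"J1": 0.0, "J2": 6.0, "J3": 12.0, "J4": 18.0}
for label, result in tributary_widths(bank).items():
    print(label, result)
```

In this example the interior joists J2 and J3 get 6.0 ft, the bank spacing, while the edge joists J1 and J4 get 3.0 ft with the single-sided flag.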

engine.py · bank_spacing_trib: live source, including the honesty comments
Stage F

Loads · model quotes · code verifies, defaults flagged

Code finds the candidate notes pages (any page with LIVE/DEAD/SNOW LOAD and PSF), the model extracts roof DL and LL with verbatim quotes, and code verifies each quote in the cited page's text layer, the same pattern the pre-RFI prototype uses for its findings. A verified value is shown with its quote receipt. A value the sheet does not state falls back to a default (20 psf DL, 20 psf LL) flagged until you confirm or edit it; image-only inputs always get flagged defaults because there is no text layer to verify against. On the Garrett set this stage reads "LIVE LOAD = 40 PSF" with a verified quote and honestly reports that the dead-load buildup is not stated.
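
The verify-or-default rule can be sketched as follows. The claim structure, status strings and the name `resolve_load` are hypothetical; collapsing whitespace runs before matching is also an assumption about how strict "verbatim" is.

```python
DEFAULTS_PSF = {"DL": 20.0, "LL": 20.0}  # the fallback values stated above


def normalize(s):
    """Collapse whitespace runs so line breaks in the text layer do not break a match."""
    return " ".join(s.split())


def resolve_load(kind, claim, page_text):
    """Return the load to use, with a status the UI can show.

    claim: None, or {"psf": float, "quote": str} as extracted by the model.
    page_text: text layer of the cited page, or None for image-only input.
    """
    if claim and page_text and normalize(claim["quote"]) in normalize(page_text):
        return {"psf": claim["psf"], "status": "verified", "receipt": claim["quote"]}
    # Unstated, unverifiable, or image-only: use the default and flag it.
    return {"psf": DEFAULTS_PSF[kind], "status": "default, flagged", "receipt": None}


page = "ROOF LOADS:\n  LIVE LOAD = 40 PSF"
print(resolve_load("LL", {"psf": 40.0, "quote": "LIVE LOAD = 40 PSF"}, page))
print(resolve_load("DL", None, page))
```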

the exact stage-F prompt template
Stage G

Rating · code only, with code references on every line

rater.py computes line loads from (DL + added dead) and (LL + added live) times the tributary width, adds member self-weight to dead explicitly, and checks:

members | checks | basis
W-shapes (14-shape two-source-verified AISC v15.0 table) | flexure F2 (yielding + LTB, Cb = 1.0), shear G2.1, deflection L/240 total | LRFD 1.2D+1.6L and 1.4D, plus ASD D+L
joists 16H4, 16H5 (SJI 1961 H-series), 32LH06, 32LH07 (SJI 45th Ed.), 30KCS4 (KCS moment/shear) | total load vs published total safe load; published L/360 live figure where the table prints one | ASD as published (the tables have no LRFD basis, and the report says so)
channels, HSS, angles, sections not in the verified tables | refused with the reason on screen; never silently rated
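
The demand side of these checks can be sketched for a simply supported, uniformly loaded span. This is an illustration under stated assumptions, not the rater.py source: function and parameter names are hypothetical, and capacities (F2, G2.1, SJI table values) are left out.

```python
def line_loads_plf(dl_psf, ll_psf, trib_ft, self_wt_plf, add_dead_psf=0.0, add_live_psf=0.0):
    """Area loads times tributary width, with member self-weight added to dead."""
    w_dead = (dl_psf + add_dead_psf) * trib_ft + self_wt_plf
    w_live = (ll_psf + add_live_psf) * trib_ft
    return w_dead, w_live


def demands(w_dead, w_live, span_ft, i_in4=None, e_ksi=29000.0):
    """Factored moment and shear, plus service deflection when I is supplied."""
    w_u = max(1.2 * w_dead + 1.6 * w_live, 1.4 * w_dead)  # governing LRFD line load, plf
    w_s = w_dead + w_live                                 # service (ASD D+L) line load, plf
    out = {
        "Mu_kipft": w_u * span_ft ** 2 / 8 / 1000,        # wL^2/8
        "Vu_kip": w_u * span_ft / 2 / 1000,               # wL/2
    }
    if i_in4:
        w_kip_in = w_s / 1000 / 12
        span_in = span_ft * 12
        out["defl_in"] = 5 * w_kip_in * span_in ** 4 / (384 * e_ksi * i_in4)  # 5wL^4/(384EI)
        out["defl_limit_in"] = span_in / 240              # L/240 total
    return out


# Example inputs chosen for illustration: 20 psf DL, 40 psf LL, 6 ft tributary,
# a 22 plf member on a 20 ft span, and an assumed I of 199 in^4.
w_d, w_l = line_loads_plf(20.0, 40.0, 6.0, 22.0)
print(demands(w_d, w_l, 20.0, i_in4=199.0))
```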

Existing members (EXISTING prefix) default to Fy = 36 ksi, a stated assumption for pre-1986 steel, editable per member. W-shape bracing defaults to Lb = 0 (deck or joists brace the flange), a stated assumption switchable to Lb = span per member.

Why the sliders are instant and exact: for a simply supported uniform member every demand is linear in the added area load, so each check is stored as

demand(p_dead, p_live) = base + k_dead * p_dead + k_live * p_live
utilization = demand / capacity    (capacity is constant)

computed once by the engine. The browser recomputes utilization at slider time with three multiplications per check; nothing is approximated and no model is anywhere near it. The same coefficients drive the colored plan, the table, and the CSV export.
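
A minimal sketch of how such coefficients could be derived and reused, written in Python for consistency with the engine even though the slider code runs in the browser. It assumes each load combination is stored as its own check, because a maximum over combinations is only piecewise linear; the function names and the capacity value are hypothetical.

```python
def linear_coeffs(demand_fn):
    """Recover (base, k_dead, k_live) from three evaluations of a linear demand.

    demand_fn(p_dead, p_live) takes added area loads in psf.
    """
    base = demand_fn(0.0, 0.0)
    k_dead = demand_fn(1.0, 0.0) - base
    k_live = demand_fn(0.0, 1.0) - base
    return base, k_dead, k_live


def utilization(coeffs, capacity, p_dead, p_live):
    """Re-rate one check from stored coefficients; no engine call needed."""
    base, k_dead, k_live = coeffs
    return (base + k_dead * p_dead + k_live * p_live) / capacity


def moment_12d_16l(p_dead, p_live, trib_ft=6.0, span_ft=20.0, self_wt_plf=22.0):
    """Example check: 1.2D + 1.6L midspan moment in kip-ft for one member."""
    w_dead = (20.0 + p_dead) * trib_ft + self_wt_plf
    w_live = (40.0 + p_live) * trib_ft
    return (1.2 * w_dead + 1.6 * w_live) * span_ft ** 2 / 8 / 1000


coeffs = linear_coeffs(moment_12d_16l)
capacity = 100.0  # hypothetical capacity in kip-ft, for illustration only
print(coeffs)
print(utilization(coeffs, capacity, 10.0, 5.0))
print(moment_12d_16l(10.0, 5.0) / capacity)  # direct evaluation for comparison
```

The last two printed values agree to floating-point precision, which is the sense in which the slider result is exact rather than approximated.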

rater.py · rate_w_shape: live source; fixtures re-run the calc-copilot hand checks against AISC Manual values
rater.py · rate_joist: SJI table arithmetic: next row up, conservative; self-weight on the demand side
The trust story

Editable inputs are the product

A drawing never states everything a rating needs, so any tool that claims to rate beams without a human in the loop is lying somewhere. The contract here: every input the model produced is on screen with its confidence tag and its provenance sentence; every input is editable, from its group row or by selecting members on the plan; every edit re-runs only the deterministic engine and re-tags the value "edited". The engineer corrects three or four inputs instead of transcribing two hundred, and the output is a calc sheet whose every line carries a code reference.

engine.py · rate_all: the only path from inputs to verdicts, for the pipeline and for edits alike
Limits

What this does not do, on purpose

limit | consequence
single-span, simply supported, uniform load only | girders loaded by a few concentrated joist reactions are approximated by their uniform tributary equivalent; that is usually conservative for midspan moment with full coverage, and wrong for partial-coverage or one-sided loading. The predecessor project modeled true point loads; this one trades that for generality and says so.
Cb = 1.0, no continuity, no cantilevers | conservative for most roof framing; cantilevers come out as separate short members if labeled, or wrong if not. Edit or ignore those.
tributary = bank spacing of labels | a proxy that inherits label-extraction quality; flagged per member, edge members flagged single-sided.
no snow drift, no wind on equipment, no seismic mass change | the added-load sliders model uniform area load only; drift zones near steps need an engineer.
joists only from the carried verified table rows | other joists are recognized and listed as not rated; never green.
scale unknown means unrated members | no silent fallback to the printed scale note when written dimensions cannot confirm it.

Why the predecessor of this exact idea failed, and what is different. An earlier project in this repo turned the same Garrett sheet into member verdicts and was judged not demoable: its AI stage re-read text that the text layer already provided (decorative), its geometric heuristics encoded one firm's drafting habits but were presented as general, and several claims shipped before being checked against the artifact. This build's answers: the model only ever works where code cannot (raster pixels, grouping judgment, reading prose notes) and its checkable claims are machine-verified; geometry is limited to two stated, inspectable measurements (dimension chains, label spacing) that are tagged as proxies rather than presented as understanding; and every claim on this page is either served live from the running code or carries the measured number next to it. The honest residue is visible in the product: model-grouped spans are blue, not green, and the gray column says why each refusal happened.
Appendix

Cost, caps, caching

model
metering | token usage from each API response is the accounting source of truth; the budget is checked before every call; hard cap per job, abort before overspend. A vector sheet costs roughly $0.10 to $0.40 (one grouping call with thinking, one loads call); a raster sheet adds ~13 tile calls.
caching | responses cached by (model + prompt + images) hash. Bundled samples share a cache, so repeat demo runs replay instantly and free; uploads always run live.
restart safety | jobs persist to runs/ (job.json + events.jsonl + results.json); a reload resumes from the URL on the page the run analyzed; a dead event stream shows a banner, not a spinner.
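
The cache key and the pre-call budget check can be sketched as below. The SHA-256 choice, the length-prefix framing, the `Budget` class and the per-million-token prices passed to it are all assumptions made for illustration; the running service may account differently.

```python
import hashlib


def cache_key(model, prompt, images):
    """Hash of (model + prompt + images); images is a list of raw bytes."""
    h = hashlib.sha256()
    for part in (model.encode("utf-8"), prompt.encode("utf-8"), *images):
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps part boundaries unambiguous
        h.update(part)
    return h.hexdigest()


class Budget:
    """Hard per-job cap: check before every call, record usage after each response."""

    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def check(self, worst_case_usd):
        """Abort before the call if its worst-case cost would pass the cap."""
        if self.spent_usd + worst_case_usd > self.cap_usd:
            raise RuntimeError("job budget would be exceeded; aborting before the call")

    def record(self, input_tokens, output_tokens, usd_per_mtok_in, usd_per_mtok_out):
        """Add the cost implied by the token usage reported in an API response."""
        self.spent_usd += (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1e6


budget = Budget(cap_usd=1.00)
budget.check(worst_case_usd=0.50)            # passes: nothing spent yet
budget.record(100_000, 10_000, 3.0, 15.0)    # hypothetical prices per million tokens
print(round(budget.spent_usd, 2), cache_key("model-x", "prompt", [b"png-bytes"])[:12])
```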