The scoring model

Every parameter, every formula, in the open.

The free scorecard and the paid audit run on the same model. Here is all of it: the twelve levers with their parameters, the formulas, the conservatism cap, and a worked example you can check by hand. The paid audit replaces every benchmark default below with a measured value from your own data.

part of the Token-Efficiency Standard v0.1

The twelve levers

applicable is the fraction of total spend a lever can touch. sLow / sExp / sHigh are the low, expected, and high savings rates on that applicable slice. weight is the lever's contribution to the score. Every range sits inside published, vendor-neutral benchmark evidence and leans conservative on the expected case.

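| lever      | tier | applicable | sLow | sExp | sHigh | weight |
|------------|------|------------|------|------|-------|--------|
| visibility | 0    | 1.00       | 0.02 | 0.04 | 0.06  | 1.5    |
| caps       | 0    | 1.00       | 0.02 | 0.05 | 0.10  | 1.2    |
| zombies    | 0    | 1.00       | 0.05 | 0.10 | 0.18  | 1.2    |
| routing    | 1    | 0.50       | 0.40 | 0.55 | 0.70  | 1.8    |
| caching    | 1    | 0.40       | 0.20 | 0.30 | 0.40  | 1.5    |
| batch      | 1    | 0.25       | 0.45 | 0.50 | 0.50  | 1.2    |
| arbitrage  | 1    | 0.20       | 0.20 | 0.30 | 0.40  | 1.2    |
| hygiene    | 2    | 1.00       | 0.08 | 0.15 | 0.25  | 1.3    |
| semantic   | 2    | 0.20       | 0.10 | 0.20 | 0.35  | 0.8    |
| distill    | 2    | 0.15       | 0.40 | 0.60 | 0.85  | 0.9    |
| triage     | 3    | 1.00       | 0.05 | 0.12 | 0.20  | 1.4    |
| finops     | 3    | 1.00       | 0.00 | 0.03 | 0.06  | 1.4    |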

Routing carries the heaviest weight and the largest effective saving (applicable × sExp = 0.275), because an oversized model burns 4 to 20 times the tokens of a right-sized one on routine work. finops has a near-zero direct rate but a high weight, because it is what keeps every other saving from creeping back.

The formulas

Adoption maps an answer to a number: no = 0, partial = 0.5, yes = 1.0.

$$
\text{TES-Score} = \operatorname{round}\left( 100 \times \frac{\sum_i \text{weight}_i \cdot \text{adoption}_i}{\sum_i \text{weight}_i} \right)
$$
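The score formula can be sketched in a few lines of Python. The weights and the adoption mapping come straight from this page; the names `WEIGHTS`, `ADOPTION`, and `tes_score`, and the sample answers, are illustrative assumptions. The page does not say how exact halves are rounded, so this sketch simply uses Python's built-in `round`, which rounds halves to the nearest even integer.

```python
# Weights copied from the lever table; adoption values from the mapping above.
WEIGHTS = {
    "visibility": 1.5, "caps": 1.2, "zombies": 1.2,
    "routing": 1.8, "caching": 1.5, "batch": 1.2, "arbitrage": 1.2,
    "hygiene": 1.3, "semantic": 0.8, "distill": 0.9,
    "triage": 1.4, "finops": 1.4,
}
ADOPTION = {"no": 0.0, "partial": 0.5, "yes": 1.0}


def tes_score(answers: dict[str, str]) -> int:
    """Weighted adoption share, scaled to 0-100 and rounded."""
    total = sum(WEIGHTS.values())
    earned = sum(WEIGHTS[name] * ADOPTION[answers[name]] for name in WEIGHTS)
    return round(100 * earned / total)


# Hypothetical estate: routing fully adopted, caching partial, the rest "no".
answers = {name: "no" for name in WEIGHTS}
answers["routing"] = "yes"
answers["caching"] = "partial"
score = tes_score(answers)  # (1.8 + 0.75) / 15.4 is about 0.166, so 17
```

Because the denominator is the sum of all weights, an estate that answers yes everywhere scores 100 and one that answers no everywhere scores 0.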

Levers overlap (a cached token is also a routed token), so we do not sum them. We compound the spend remaining after each lever, then apply a hard cap:

$$
\text{recoverable} = 1 - \prod_i \bigl( 1 - \text{applicable}_i \cdot \text{rate}_i \cdot (1 - \text{adoption}_i) \bigr)
$$
$$
\text{recoverable} = \min(\text{recoverable},\ 0.65)
$$

The 0.65 cap is a deliberate conservatism guard. No matter what the inputs imply, the model never claims more than 65% of current spend is recoverable. Savings in dollars are the recoverable fraction times annual spend, computed three times for the low, expected, and high cases.
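As a rough sketch of the compounding and the cap, assuming the table values above and treating `rate` as the sLow, sExp, or sHigh column depending on the case, the calculation might look like this. The function names, the `cap=None` option, and the sample estate are assumptions made for illustration, not part of the published model.

```python
from math import prod
from typing import Optional

# (applicable, sLow, sExp, sHigh) per lever, copied from the table above.
LEVERS = {
    "visibility": (1.00, 0.02, 0.04, 0.06),
    "caps":       (1.00, 0.02, 0.05, 0.10),
    "zombies":    (1.00, 0.05, 0.10, 0.18),
    "routing":    (0.50, 0.40, 0.55, 0.70),
    "caching":    (0.40, 0.20, 0.30, 0.40),
    "batch":      (0.25, 0.45, 0.50, 0.50),
    "arbitrage":  (0.20, 0.20, 0.30, 0.40),
    "hygiene":    (1.00, 0.08, 0.15, 0.25),
    "semantic":   (0.20, 0.10, 0.20, 0.35),
    "distill":    (0.15, 0.40, 0.60, 0.85),
    "triage":     (1.00, 0.05, 0.12, 0.20),
    "finops":     (1.00, 0.00, 0.03, 0.06),
}
CAP = 0.65
CASES = {"low": 1, "expected": 2, "high": 3}  # tuple index of each rate


def recoverable(adoption: dict[str, float], case: str,
                cap: Optional[float] = CAP) -> float:
    """Compound the spend left after each lever, then apply the cap."""
    idx = CASES[case]
    remaining = prod(
        1 - params[0] * params[idx] * (1 - adoption[name])
        for name, params in LEVERS.items()
    )
    raw = 1 - remaining
    return raw if cap is None else min(raw, cap)


def savings(annual_spend: float, adoption: dict[str, float]) -> dict[str, float]:
    """Dollar savings for the low, expected, and high cases."""
    return {case: annual_spend * recoverable(adoption, case) for case in CASES}


# Hypothetical estate: routing adopted, caching partial, the rest not adopted.
adoption = {name: 0.0 for name in LEVERS}
adoption["routing"] = 1.0
adoption["caching"] = 0.5
by_case = savings(1_000_000, adoption)
```

A fully adopted lever contributes a factor of exactly 1 to the product, so it removes nothing further from the remaining spend. That is how the overlap between levers is handled without double counting.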

A worked example

Take a $5,000,000 estate that answers no to all twelve levers (the maximally wasteful corner). Every adoption value is 0, so the score is 0, band Tokenmaxxer. Compounding the expected rates leaves about 27% of spend remaining, a raw recoverable fraction of about 0.73, which the cap pulls to 0.65, so the expected recoverable figure is 0.65 × $5,000,000 = $3.25M. Any real estate has some levers in place, which lifts the product term, drops the fraction below the cap, and opens up the low-to-high range. The point of the cap is that even the worst case cannot produce an absurd number.
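To check the expected case by hand, multiply the twelve factors 1 − applicable × sExp in table order. With every adoption value at 0, the (1 − adoption) term drops out. The figures below are rounded to three decimals.

```latex
\begin{aligned}
\prod_i \bigl(1 - \text{applicable}_i \cdot \text{sExp}_i\bigr)
  &= 0.96 \times 0.95 \times 0.90 \times 0.725 \times 0.88 \times 0.875 \\
  &\quad \times 0.94 \times 0.85 \times 0.96 \times 0.91 \times 0.88 \times 0.97
   \approx 0.273 \\
\text{recoverable}_{\text{raw}} &\approx 1 - 0.273 = 0.727 \\
\text{recoverable} &= \min(0.727,\ 0.65) = 0.65 \\
\text{savings}_{\text{exp}} &= 0.65 \times \$5{,}000{,}000 = \$3{,}250{,}000
\end{aligned}
```

Running the same product over the sLow column gives about 0.461, so the low case comes out near 0.54, which is already under the cap before it is applied.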

What changes in the paid audit

The defaults above exist to size the prize from self-reported answers. In a paid engagement we make three changes: we replace each lever's applicable share with the measured share from your spend cube, we substitute measured savings rates where your own pilots or our prior engagements give them, and we drop the cap entirely, because the savings then come from measurement rather than the model. Every re-tune is recorded with its old value, new value, and source, so the estimate stays auditable.
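One way to picture the re-tune record is a small append-only log. This is only a sketch: the `Lever` and `Retune` classes, the `retune` helper, the 0.35 measured share, and the source label are all hypothetical, and the page does not describe how the audit actually stores its changes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Lever:
    applicable: float
    s_low: float
    s_exp: float
    s_high: float


@dataclass(frozen=True)
class Retune:
    lever: str
    param: str
    old: float
    new: float
    source: str


def retune(levers: dict[str, Lever], log: list[Retune], lever: str,
           param: str, new: float, source: str) -> dict[str, Lever]:
    """Return updated levers and append an (old, new, source) record to the log."""
    old = getattr(levers[lever], param)
    log.append(Retune(lever, param, old, new, source))
    return {**levers, lever: replace(levers[lever], **{param: new})}


# Benchmark default for routing, from the table above.
levers = {"routing": Lever(0.50, 0.40, 0.55, 0.70)}
log: list[Retune] = []

# The 0.35 share and the source label are made-up illustration values.
levers = retune(levers, log, "routing", "applicable", 0.35,
                "spend cube (hypothetical)")
```

Keeping the lever records immutable and returning a new mapping means the benchmark defaults are never overwritten in place, so any estimate can be traced back through the log.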
