Validation & Evidence

Framework Validation

Why the SFR framework must be tested before it can be trusted.

A proposed standard that produces no independent evaluation record is an assertion. A proposed standard that demonstrates repeatable classifications across independent evaluators has begun to build the evidence base that earns trust and advances toward ratification. This section explains why validation matters, what the validation infrastructure consists of, and what must exist before the framework can advance from Stage 1 Community Review to Stage 2 Independent Evaluation.

Why Repeatability Matters


A classification framework has one non-negotiable requirement: two independent evaluators applying the same methodology to the same system with the same evidence must reach the same classification outcome. If they do not, the framework is not measuring what it claims to measure: either the classification criteria are ambiguous, the evidence hierarchy is not being applied consistently, or the methodology produces evaluator-dependent results. None of these conditions is acceptable in a framework that aspires to standards status.

Repeatability is not a procedural nicety. It is the structural property that distinguishes a standard from an opinion. An evaluation of a simulation system conducted by one evaluator is evidence. An evaluation conducted by two independent evaluators that reaches the same result is reproducible evidence. A body of such evidence, accumulated across multiple systems and evaluators, is the foundation on which a formal standard is built.

The SFR framework's evaluation methodology defines three criteria (Causative Accuracy, Temporal Coherence, Human Response Relevance) and a four-tier evidence hierarchy. These definitions are the mechanism by which repeatability is intended to be achieved. But whether they actually produce repeatable results in practice — across real systems, real evidence, and real evaluators who were not involved in writing the framework — is an empirical question that has not yet been answered.

Repeatability cannot be claimed; it has to be demonstrated. The validation infrastructure exists to generate that demonstration.
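As an illustration of how repeatability could be quantified, the sketch below computes raw agreement and Cohen's kappa (agreement corrected for chance) between two evaluators who classified the same set of systems. The framework does not prescribe this statistic; it is swapped in here as a common choice for two-rater categorical agreement. The function names and the placeholder labels "X" and "Y" are assumptions for illustration and do not come from the SFR documents.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of systems on which both evaluators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both evaluators used one identical label for every system.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical classification outcomes for four systems (labels are placeholders).
evaluator_a = ["X", "X", "Y", "Y"]
evaluator_b = ["X", "X", "Y", "X"]

print(percent_agreement(evaluator_a, evaluator_b))  # 0.75
print(cohens_kappa(evaluator_a, evaluator_b))       # 0.5
```

A kappa of 1.0 means the two evaluators agreed on every system; values near 0 mean their agreement is no better than chance. What threshold would count as "repeatable" for the SFR framework is not stated in the source and is left open here.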

Why Independent Evaluation Matters


An evaluation conducted by the framework's authors is not independent. It may still be technically rigorous, but it carries an inherent limitation: the authors know what the framework is intended to produce. Their application of the criteria may be unconsciously guided by that knowledge. Confirmation bias in evaluation is not a personal failing — it is a structural risk that applies to any framework whose evaluation is conducted only by those who designed it.

Independent evaluation means evaluation conducted by parties who had no involvement in the design of the framework, who are applying the criteria from the published documentation alone, and who receive no guidance from the authors during the evaluation process. The result of independent evaluation is more informative than any internally conducted evaluation because it tests whether the published framework is clear enough, complete enough, and unambiguous enough for someone who did not write it to apply it correctly.

Where independent evaluators apply the methodology and reach different conclusions from each other, the disagreement itself is data. It identifies the specific criteria, conditions, or evidence-type combinations where the methodology is ambiguous. Those are the gaps that must be resolved before the framework can claim repeatability. Without independent evaluation, those gaps remain in the methodology, undetected.
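To show how disagreement can be treated as data, the following sketch tallies, per criterion, how often two evaluators reached different determinations across a set of evaluated systems. The three criterion names come from the framework; the determination values ("met", "not met"), the dictionary layout, and the function name are hypothetical and chosen only for illustration.

```python
from collections import Counter

CRITERIA = (
    "Causative Accuracy",
    "Temporal Coherence",
    "Human Response Relevance",
)


def disagreements_by_criterion(paired_evaluations):
    """Count, for each criterion, the systems on which two evaluators differed.

    paired_evaluations: list of (evaluation_a, evaluation_b) tuples, where each
    evaluation is a dict mapping criterion name -> determination.
    """
    tally = Counter()
    for evaluation_a, evaluation_b in paired_evaluations:
        for criterion in CRITERIA:
            if evaluation_a[criterion] != evaluation_b[criterion]:
                tally[criterion] += 1
    # Report every criterion, including those with zero disagreements.
    return {criterion: tally[criterion] for criterion in CRITERIA}


# Hypothetical determinations for two systems, each assessed by two evaluators.
pairs = [
    (
        {"Causative Accuracy": "met", "Temporal Coherence": "met",
         "Human Response Relevance": "not met"},
        {"Causative Accuracy": "met", "Temporal Coherence": "not met",
         "Human Response Relevance": "not met"},
    ),
    (
        {"Causative Accuracy": "met", "Temporal Coherence": "met",
         "Human Response Relevance": "met"},
        {"Causative Accuracy": "met", "Temporal Coherence": "not met",
         "Human Response Relevance": "met"},
    ),
]

for criterion, count in disagreements_by_criterion(pairs).items():
    print(f"{criterion}: {count}")
```

In this made-up data, every disagreement falls on a single criterion, which is the kind of concentration that would point to ambiguous wording in that criterion's definition.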

What Independent Evaluation Tests

Whether the framework criteria are clear enough to apply without author guidance. Whether the evidence hierarchy resolves ambiguous cases consistently. Whether two evaluators who follow the methodology agree on what the evidence shows.

What Independent Evaluation Does Not Do

It does not certify the system being evaluated. It does not endorse the evaluator. It does not produce a score or ranking. It produces a classification determination, an evidence tier, and a record — nothing more.

How Evidence Strengthens Standards Development


Standards development is not a design process — it is an evidence accumulation process. The normative documents define what the standard requires. The validation process tests whether the standard's requirements are measurable, consistent, and reproducible. Evidence accumulated through validation is the feedback mechanism that closes the loop between normative intent and practical reality.

Specifically, a growing body of evaluation records does three things for the SFR framework:

The evaluation record infrastructure exists to accumulate evidence. Evidence is what earns a standard its standing.

Validation Infrastructure


The SFR Validation & Evidence layer consists of five documents, each addressing a distinct aspect of the evidence accumulation process.

Readiness for Stage 2: Independent Evaluation


The Adoption Roadmap defines Stage 1 as Community Review and Stage 2 as Independent Evaluation. Advancing from Stage 1 to Stage 2 requires specific conditions to be met. These conditions are listed below with their current status.

Stage 2 Readiness Assessment — SFR v0.9 Draft
Stage 1 → Stage 2: Required Conditions
  • At least two organizations from different implementation pathway types have engaged in community review
    Required: Documented feedback from at least two distinct organization types (manufacturer, university, motorsport, aviation, rehabilitation, military, or researcher) following review of the normative corpus.
    Status: Not Met
  • Substantive community review feedback has been collected and documented
    Required: Written feedback on the normative corpus from external reviewers, specifically on classification criteria, evaluation methodology, and canonical definitions.
    Status: Not Met (not yet collected)
  • Any substantive feedback has been resolved or documented as a known gap
    Required: Review feedback assessed and either incorporated into the normative corpus or formally acknowledged as a gap with a resolution path.
    Status: Not Met (not yet applicable; no feedback collected)
  • Evaluation infrastructure is published and structurally complete
    Required: Evaluation record template, inter-evaluator agreement tracking structure, and results registry are defined and publicly accessible.
    Status: Partial (this document set constitutes that infrastructure; operational use has not yet begun)
  • At least one organization has volunteered a system for independent evaluation
    Required: A system owner or operator has formally submitted a system for evaluation under the Pilot Validation Program. This constitutes the transition trigger from Stage 1 to Stage 2.
    Status: Not Met (not yet submitted)

The framework currently satisfies none of the Stage 2 advancement conditions in full; one, the evaluation infrastructure, is partially met. That infrastructure is published, though not yet in operational use. Advancement depends on community engagement: specifically, organizations from the implementation pathway types reviewing the normative corpus, providing feedback, and, in at least one case, volunteering a system for independent evaluation through the Pilot Validation Program.
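The readiness assessment above can be expressed as a simple gate. The sketch below is a minimal illustration, assuming that every condition must be fully met before Stage 2 begins and that "Partial" does not count as met. The class names, the shortened condition labels, and the "Met" status value are assumptions introduced here; the source shows only "Not Met" and "Partial".

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NOT_MET = "Not Met"
    PARTIAL = "Partial"
    MET = "Met"  # assumed value; not shown in the current assessment


@dataclass(frozen=True)
class Condition:
    name: str
    status: Status


# Statuses as reported in the SFR v0.9 Draft readiness assessment.
CONDITIONS = [
    Condition("Community review by two pathway types", Status.NOT_MET),
    Condition("Review feedback collected and documented", Status.NOT_MET),
    Condition("Feedback resolved or documented as known gap", Status.NOT_MET),
    Condition("Evaluation infrastructure published and complete", Status.PARTIAL),
    Condition("System volunteered for independent evaluation", Status.NOT_MET),
]


def blocking(conditions):
    """Names of conditions that are not yet fully met."""
    return [c.name for c in conditions if c.status is not Status.MET]


def ready_for_stage_2(conditions):
    """True only when no condition is blocking (assumed all-must-be-met rule)."""
    return not blocking(conditions)


print(ready_for_stage_2(CONDITIONS))  # False
for name in blocking(CONDITIONS):
    print("blocking:", name)
```

Treating "Partial" as blocking keeps the gate conservative: a condition moves the framework forward only once it is met in full.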

Stage 2 cannot begin through declaration. It begins when evidence accumulation begins.

Evidence Before Authority

A framework that asks organizations to reference it in procurement documents, research publications, and policy instruments is asking for a form of trust. That trust should not be granted on the basis of the framework's internal consistency alone — it should be earned through a demonstrated record of independent, repeatable application. The validation infrastructure defined here is the mechanism by which that record is built. It is not complete. It does not yet contain results. But its existence is the first structural step toward a framework that has earned the standing it is proposing to occupy.

The standard must be tested. The testing must be independent. The record must be public. This is the evidence infrastructure that makes all three possible.