Evaluation Process — SFR Framework

Section 1

Criterion Assessment Structure

Each of the three fundamental criteria is assessed independently against the required inputs collected during the reference test. For each criterion, the evaluator assigns one of three outcomes: Pass, Fail, or Insufficient Data. These outcomes are determined by the presence, absence, or ambiguity of required input data — not by subjective impression.

Criterion	Outcome	Condition
A Causative Accuracy	PASS	Motion telemetry (R1) and physics telemetry (R2) confirm that motion output during reference events is derived from live physics state. Actuator telemetry (R3) shows no post-processed or scripted motion profiles. The motion event corresponds directly to the physics event that caused it.
	FAIL	Motion telemetry reveals scripted, canned, or post-processed motion that does not correspond to the live physics state. Or: motion is present but actuator telemetry confirms it is generated from a source other than the physics model output. Or: motion at the cockpit reference point does not originate at the vehicle's center of mass.
	INSUFFICIENT DATA	Physics telemetry (R2) or actuator telemetry (R3) is unavailable, below quality threshold, or contains gaps that prevent comparison with motion telemetry. Causative source cannot be confirmed or denied from available evidence.
B Temporal Coherence	PASS	Synchronization measurements (R4) confirm that the motion cue arrives in the correct temporal relationship to the physics event across all reference events. Combined Axis event (Event 3) data confirms that independent axes do not corrupt each other's timing under simultaneous demand.
	FAIL	Synchronization data reveals that the motion cue arrives after a delay that exceeds the valid temporal relationship for the channel. Or: visual cue precedes the motion cue by a measurable interval under Event 2 conditions. Or: simultaneous axis demand in Event 3 produces timing degradation in one or both axes.
	INSUFFICIENT DATA	Synchronization measurements (R4) are missing, use different time bases without verified offset correction, or contain gaps during reference events. Temporal relationship between channels cannot be established from available evidence.
C Human Response Relevance	PASS	Control system measurements (R5) confirm that the participant's control corrections occur at Event 4 (Limit-State Threshold) in a pattern consistent with physical sensation response rather than visual anticipation. The correction onset timing and pattern are consistent with a vestibularly driven response.
	FAIL	Control system measurements reveal that the participant's control corrections at the limit state are delayed relative to the physics event in a pattern consistent with visual response rather than physical sensation response. Or: participant completes the limit-state event without any measurable correction, indicating the physical cue is insufficient to trigger a response.
	INSUFFICIENT DATA	Control system measurements (R5) are unavailable or do not capture the relevant response window. The response pattern cannot be classified as physical or visual from available evidence alone.

A system classification requires a determination on each criterion. Insufficient Data on any criterion prevents a final In-the-Loop classification, regardless of Pass results on other criteria.

Section 2

Borderline System Handling

Borderline cases arise when a system's architecture or available data does not produce a clear Pass or Fail determination on one or more criteria. The following guidance applies to four common borderline scenarios.

Hybrid Architectures

A hybrid architecture is a system that meets some but not all structural requirements of the In-the-Loop Standard. For example: physics-driven motion (Req. 1 met) but rotation not resolved at center of mass (Req. 2 not met). In this case, Criterion A is evaluated against what the data shows, not what the system's documentation claims. A hybrid architecture does not receive partial credit. Criteria are assessed on evidence, not on architectural intent.

Rule: Evaluate on evidence. Partial structural compliance does not produce a partial Pass.

Incomplete Required Data

When one or more required inputs are unavailable for a specific criterion but available for others, the evaluation proceeds on the criteria for which required data exists. The criterion with missing data is assigned Insufficient Data. The evaluator must not infer a Pass from evidence that does not directly address the criterion in question.

Rule: Insufficient Data is the correct outcome for missing required inputs. It is not a soft Fail — it is an open determination pending evidence.
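
The missing-input rule can be illustrated with a minimal sketch. The mapping of criteria to required inputs is read from the Section 1 table; including R1 under Criterion A is an assumption (the table names R1 in the Pass condition but lists only R2 and R3 in the Insufficient Data condition). The names `Outcome`, `REQUIRED_INPUTS`, and `gate_on_inputs` are hypothetical and are not part of the framework.

```python
from enum import Enum
from typing import Optional, Set


class Outcome(Enum):
    PASS = "Pass"
    FAIL = "Fail"
    INSUFFICIENT_DATA = "Insufficient Data"


# Required inputs per criterion, as read from the Section 1 table.
# Assumption: R1 is treated as required for Criterion A because the
# comparison in that criterion is made against motion telemetry.
REQUIRED_INPUTS = {
    "A": {"R1", "R2", "R3"},
    "B": {"R4"},
    "C": {"R5"},
}


def gate_on_inputs(criterion: str, usable_inputs: Set[str]) -> Optional[Outcome]:
    """Return INSUFFICIENT_DATA when a required input is missing.

    A return value of None means only that the criterion may proceed to
    evidence-based assessment. It never implies a Pass.
    """
    missing = REQUIRED_INPUTS[criterion] - usable_inputs
    return Outcome.INSUFFICIENT_DATA if missing else None
```

In this sketch, a session with usable R1, R2, R4, and R5 would leave Criterion A at Insufficient Data while Criteria B and C proceed to assessment.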

Missing Telemetry for One Axis

When actuator or motion telemetry is available for some axes but not others during a reference event, the evaluator may complete the assessment for the axes with available data. For Criterion B assessment, if the key axis for the reference event (e.g., yaw for Event 2) has missing telemetry, that event is invalid for Criterion B purposes and must be repeated or noted as Insufficient Data for that criterion.

Rule: Partial axis telemetry supports a partial assessment. The criterion defaults to Insufficient Data only if the missing axis is required for the specific criterion being assessed.
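
A minimal sketch of the partial-axis rule follows. Axis names are plain strings, and the function names are hypothetical. The only key-axis pairing taken from the text is yaw for Event 2; any other pairing would have to come from the Reference Test Methodology.

```python
from typing import Set


def assessable_axes(required_axes: Set[str], axes_with_telemetry: Set[str]) -> Set[str]:
    """Axes for which a partial assessment can be completed."""
    return required_axes & axes_with_telemetry


def event_valid_for_criterion_b(key_axis: str, axes_with_telemetry: Set[str]) -> bool:
    """An event counts toward Criterion B only if its key axis has telemetry.

    When this returns False, the event must be repeated or the criterion
    recorded as Insufficient Data.
    """
    return key_axis in axes_with_telemetry
```

For example, an Event 2 run with telemetry for pitch and roll but not yaw would still support a partial assessment on pitch and roll, while `event_valid_for_criterion_b("yaw", {"pitch", "roll"})` would be false.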

Partially Compliant Systems

A system may Pass Criteria A and B but Fail Criterion C, or any other combination. The overall classification reflects the weakest criterion result. A system that Passes two criteria but Fails one cannot be classified as In-the-Loop. The final classification is determined by the classification logic defined in Section 5.

Rule: In-the-Loop requires Pass on all three criteria. Any Fail on any criterion results in Surface-Level or Out-of-the-Loop classification depending on the structural nature of the failure.

Section 3

Evidence Hierarchy

When evaluating a system, evidence is weighted according to its source. Higher-tier evidence takes precedence: where lower-tier evidence contradicts higher-tier evidence, the higher-tier evidence governs. Discrepancies between tiers are recorded in the evidence summary.

Tier 1 (Highest Authority): Measured Telemetry

Calibrated, timestamped data captured by instrumentation during reference events. Includes motion telemetry (R1), physics telemetry (R2), actuator telemetry (R3), synchronization measurements (R4), and control system measurements (R5). Tier 1 is the primary basis for all criterion assessments. It cannot be overridden by any other tier.

Tier 2 (High Authority): System Architecture Documentation

Manufacturer or developer-supplied specifications, block diagrams, physics model documentation, and actuator specifications. Used when Tier 1 data is unavailable for a specific parameter. May support an assessment but cannot produce a Pass determination on its own if Tier 1 data is absent. Establishes the basis against which Tier 1 measurements are interpreted.

Tier 3 (Limited Authority): Observed Behavior

Structured observation of system behavior during evaluation, without instrumentation. Includes evaluator notes on motion timing, visual synchronization, and driver response patterns during reference events. May support Insufficient Data determinations or provide context for Tier 1 and Tier 2 evidence. Cannot produce a Pass determination. May support a Fail determination when behavior clearly contradicts a Pass condition.

Tier 4 (Lowest Authority): Manufacturer Claims

Marketing materials, product descriptions, claimed specifications, testimonials, and unverified assertions about system performance. Tier 4 evidence carries no weight in classification determinations. It may be cited in the evidence summary for context, but it cannot support, modify, or override any criterion assessment at any other tier. Absence of a Tier 4 claim is not evidence of absence of the condition.

Tier 4 evidence cannot override Tier 1. A manufacturer's claim that a system is in-the-loop is not evidence that it is. Tier 1 measurement determines the outcome.
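
The tier rules above can be sketched as a small lookup. This is an illustration under stated assumptions, not a definitive implementation: the names `Tier`, `pass_permitted`, and `governing_tier` are hypothetical, and the sketch covers only what the text states outright (a Pass needs Tier 1 evidence, and Tier 4 carries no weight).

```python
from enum import IntEnum
from typing import Iterable, Optional


class Tier(IntEnum):
    # Lower number means higher authority, matching the tier numbering.
    MEASURED_TELEMETRY = 1
    ARCHITECTURE_DOCUMENTATION = 2
    OBSERVED_BEHAVIOR = 3
    MANUFACTURER_CLAIMS = 4


def pass_permitted(tiers_present: Iterable[Tier]) -> bool:
    """A Pass determination is possible only when Tier 1 evidence exists."""
    return Tier.MEASURED_TELEMETRY in set(tiers_present)


def governing_tier(tiers_present: Iterable[Tier]) -> Optional[Tier]:
    """Highest-authority tier that carries weight, or None.

    Tier 4 is excluded because it carries no weight in determinations.
    """
    weighted = [t for t in tiers_present if t != Tier.MANUFACTURER_CLAIMS]
    return min(weighted) if weighted else None
```

Under this sketch, a criterion backed only by architecture documentation and manufacturer claims is governed by Tier 2 and cannot reach a Pass.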

Section 4

Repeatability Requirement

Repeatability is not a quality goal for SFR evaluations; it is a structural requirement. An evaluation that cannot be reproduced under the same conditions is not a valid evaluation; it is a single observation. For a classification to be credible, it must satisfy the following three repeatability conditions simultaneously. The condition labels R1 to R3 below are local to this section and are distinct from the required inputs R1 to R5 referenced in Sections 1 and 3.

R1: Same Methodology. The evaluation must be conducted according to the Reference Test Methodology (reference vehicle, reference events, reference conditions, measurement procedures). Any deviation from the reference methodology means that evaluation session does not satisfy the repeatability requirement.

R2: Same Test Conditions. The system under evaluation must be in the same operational configuration across all sessions. Software version, hardware configuration, calibration state, and environmental conditions must be documented and held constant. A classification obtained on a non-standard configuration does not apply to the standard configuration.

R3: Same Classification Outcome. When the same system is evaluated by a different evaluator under the same methodology and test conditions, the classification must be the same. If two evaluators reach different classifications from the same evidence, the evidence is insufficient for a definitive determination and the system is classified as Insufficient Data pending resolution of the discrepancy.

The repeatability requirement is what distinguishes a classification standard from an opinion. It requires that the methodology, not the evaluator, determines the outcome.

A classification that changes depending on who performs the evaluation is not a classification. It is a judgment call. The repeatability requirement eliminates judgment calls from the classification process.
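
The second and third repeatability conditions lend themselves to a mechanical check, sketched below. The `SessionConfig` fields mirror the four items the text says must be held constant; representing each as a string is an assumption, as are the names `same_test_conditions` and `reconcile`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SessionConfig:
    # Hypothetical record of the configuration items that must be held constant.
    software_version: str
    hardware_configuration: str
    calibration_state: str
    environmental_conditions: str


def same_test_conditions(sessions: Sequence[SessionConfig]) -> bool:
    """True when every session used an identical documented configuration."""
    return len(set(sessions)) <= 1


def reconcile(classifications: Sequence[str]) -> str:
    """Combine independent evaluators' classifications of the same evidence.

    Agreement returns the shared classification. Any disagreement returns
    "Insufficient Data", pending resolution of the discrepancy.
    """
    if not classifications:
        raise ValueError("at least one classification is required")
    if len(set(classifications)) == 1:
        return classifications[0]
    return "Insufficient Data"
```

In this sketch, two evaluators who both report Surface-Level yield Surface-Level, while one In-the-Loop and one Surface-Level yield Insufficient Data.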

Section 5

Evaluation Output Format

Every SFR evaluation produces a standardized output record in the following format. No numerical scores are assigned. The output contains three components: a Classification Result, Supporting Findings per criterion, and an Evidence Summary.

SFR Evaluation Output Record — Standard Format

Classification Result

In-the-Loop | Surface-Level | Out-of-the-Loop

One result is assigned. In-the-Loop requires Pass on all three criteria. Surface-Level applies when one or more criteria Fail and the system has physics-derived motion present. Out-of-the-Loop applies when no physics-derived motion is delivered to the participant.

Supporting Findings — Per Criterion

Criterion A Causative Accuracy

PASS or FAIL or INSUFFICIENT DATA

+ one-sentence finding statement

Criterion B Temporal Coherence

PASS or FAIL or INSUFFICIENT DATA

+ one-sentence finding statement

Criterion C Human Response Relevance

PASS or FAIL or INSUFFICIENT DATA

+ one-sentence finding statement

Evidence Summary

Tier level used for each criterion assessment. Data quality notes. Required inputs present/absent. Optional inputs used (if any). Discrepancies between evidence tiers (if any). Date of evaluation, reference event set used, system configuration at time of evaluation.

The output record contains no numerical scores. A Classification Result of In-the-Loop means all three criteria passed. Any other result specifies which criteria failed or produced Insufficient Data so that the nature and location of the limitation is clear.
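
The classification logic described for the output record can be sketched as a single function. This is a hedged illustration: the function name, the boolean `physics_derived_motion` flag, and the use of `None` for a result that stays open are all assumptions. The text states that Insufficient Data prevents an In-the-Loop result but does not name the result for a system with no Fail and at least one Insufficient Data, so the sketch leaves that case undetermined.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "Pass"
    FAIL = "Fail"
    INSUFFICIENT_DATA = "Insufficient Data"


def classify(a: Outcome, b: Outcome, c: Outcome,
             physics_derived_motion: bool) -> Optional[str]:
    """Map three criterion outcomes to a Classification Result."""
    outcomes = (a, b, c)
    # In-the-Loop requires Pass on all three criteria.
    if all(o is Outcome.PASS for o in outcomes):
        return "In-the-Loop"
    # Any Fail yields Surface-Level or Out-of-the-Loop, depending on
    # whether physics-derived motion reaches the participant.
    if any(o is Outcome.FAIL for o in outcomes):
        return "Surface-Level" if physics_derived_motion else "Out-of-the-Loop"
    # Only Pass and Insufficient Data remain: no final classification yet.
    return None
```

The per-criterion outcomes are still reported alongside the result, so a reader of the record can see which criterion failed or lacked data.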

The output format is standardized so that results from different evaluators and different sessions can be compared on a common basis.

Process Statement

Classification Without Numbers

The SFR evaluation process at v0.9 deliberately avoids numerical scoring. This is a sequencing decision, not a limitation. Before assigning numbers to a classification system, the classification logic must be repeatable without numbers. A framework that cannot produce consistent Pass/Fail determinations will not produce consistent scores. The goal of this sprint is to establish repeatable classification before introducing numerical scoring in a future version.

When the three reference methodology documents (Reference Test Methodology, Evaluation Inputs, and this document) produce consistent, repeatable classifications across independent evaluators, the framework is ready to define the numerical scoring layer that sits above it. Until then, structural classification is the foundation.

Repeatability before scoring. Structure before numbers. Classification before rating.
