Evaluation Record Template

The standards-report format for each completed classification.

This page defines the evaluation record format used for each completed SFR classification. The template is the canonical structure for all records entered in the Results Registry. Every field is required. The template is shown below with field descriptions and example placeholder values.

About the Evaluation Record


The evaluation record is the primary output of an SFR classification evaluation. It documents the system being evaluated, the evidence available, the determination for each criterion, and the resulting classification. Records are completed independently by each evaluator and then compared to produce the inter-evaluator agreement assessment.

The record format is designed to be unambiguous in its required fields, so that two evaluators filling out independent records for the same system can be systematically compared without requiring judgment about what each evaluator meant by their entries. All determination fields use controlled vocabulary (Pass / Fail / Insufficient Data). All evidence fields reference the tier classification from the Evidence Hierarchy.
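As an illustration of why controlled vocabulary makes records directly comparable, the following minimal Python sketch compares the criterion determinations from two independent records. The dictionary layout (criterion letter mapped to determination string) and the function name `compare_determinations` are assumptions made for this example; the framework does not prescribe a machine-readable format.

```python
from enum import Enum


class Determination(Enum):
    """Controlled vocabulary for criterion determinations."""
    PASS = "Pass"
    FAIL = "Fail"
    INSUFFICIENT_DATA = "Insufficient Data"


# The three criteria named in Part C of the template.
CRITERIA = ("A", "B", "C")


def compare_determinations(record_a, record_b):
    """Return the criteria on which two records disagree.

    Each record is a hypothetical dict such as {"A": "Pass", "B": "Fail", ...}.
    Any value outside the controlled vocabulary raises ValueError, so no
    interpretation of free-text entries is ever attempted.
    """
    disagreements = {}
    for criterion in CRITERIA:
        a = Determination(record_a[criterion])
        b = Determination(record_b[criterion])
        if a is not b:
            disagreements[criterion] = (a.value, b.value)
    return disagreements
```

An empty result means the two evaluators agree on every criterion; a non-empty result names exactly which determinations need to be reconciled.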

The evaluation record is not a score sheet. It is a structured evidence record. Its value is in what it captures, not in whether the result is "good" or "bad."

Record Template


The following is the complete evaluation record template, shown with placeholder values indicating the type of entry expected in each field.

SFR Evaluation Record
Simulation Fidelity Rating Framework — Classification Determination
SFR v0.9 Draft
Evaluation ID SFR-EVAL-[NNNN]
Evaluation Date YYYY-MM-DD
Evaluator ID IND-[XX] (Independent)
Framework Version SFR v0.9 Draft / June 2026
Record Status Draft / Final / Disputed
Companion Record SFR-EVAL-[NNNN]-B
Part A — System Identification
System Identifier Anonymized system identifier assigned by program coordinator (e.g., SYS-2026-001). Not the manufacturer name or model designation.
System Type General system type description: e.g., "Vehicle simulator, 6-DOF motion platform, enclosed cockpit"
Application Domain Primary intended use domain: e.g., motorsport training, aviation training, rehabilitation, research
Submitting Organization Organization type only (e.g., "University research program"). Individual names not included in public record.
Test Location Country / region where evaluation was conducted
Part B — Evidence Sources
Tier 1 Evidence List telemetry sources captured: e.g., "Physics output logs, actuator command logs, synchronized timestamp capture across channels." State "None available" if absent.
Tier 2 Evidence List architecture documents received: e.g., "Physics model specification v2.3, actuator datasheet, synchronization architecture diagram." State "None provided" if absent.
Tier 3 Evidence Describe structured observation conducted: e.g., "Reference test sequence observed in person by both evaluators. Video recorded." State "None" if not applicable.
Tier 4 Evidence Note any manufacturer claims received: e.g., "Vendor provided technical brochure claiming physics-based motion. Not used for any determination." Note all Tier 4 material even if not used.
Highest Tier Available State the highest evidence tier available across all criteria: Tier 1 / Tier 2 / Tier 3 / Tier 4
Part C — Criterion Determinations
Criterion A — Causative Accuracy
Determination [ Pass / Fail / Insufficient Data ]
Evidence Tier Used State the tier of evidence on which the determination is based
Basis One to three sentences explaining the specific evidence and reasoning that produced this determination. Quote specific documents or data points.
Limitations Note any evidentiary limitations that affected confidence in this determination, or state "None identified."
Criterion B — Temporal Coherence
Determination [ Pass / Fail / Insufficient Data ]
Evidence Tier Used State the tier of evidence on which the determination is based
Basis One to three sentences explaining the specific evidence and reasoning that produced this determination. Quote specific documents or data points.
Limitations Note any evidentiary limitations that affected confidence in this determination, or state "None identified."
Criterion C — Human Response Relevance
Determination [ Pass / Fail / Insufficient Data ]
Evidence Tier Used State the tier of evidence on which the determination is based
Basis One to three sentences explaining the specific evidence and reasoning that produced this determination. Quote specific documents or data points.
Limitations Note any evidentiary limitations that affected confidence in this determination, or state "None identified."
Part D — Final Classification
Classification Determination
[ In-the-Loop / Surface-Level / Out-of-the-Loop / Withheld (Insufficient Data) ]
Derivation: Classification is derived from criterion determinations according to the SFR Evaluation Process document. All three criteria Pass → In-the-Loop. Any criterion Fail → Surface-Level or Out-of-the-Loop depending on motion presence. Any Insufficient Data → classification withheld pending additional evidence.
Part E — Evaluator Notes
Methodology Notes Any observations about how the methodology was applied, ambiguities encountered, or interpretations made. This is particularly important where the evaluator made a judgment call that another evaluator might make differently.
Framework Gap Observations Any observations about places where the published framework criteria were unclear, ambiguous, or insufficient to produce a confident determination without additional interpretation. These entries are the primary input to methodology refinement.
Evidence Gap Observations Any observations about evidence types that were needed but not available, or that would have changed the determination if present.
Part F — Supporting Documentation Index
Document 1 Document name / type / date / tier classification
Document 2 Document name / type / date / tier classification
Document 3+ Additional documents as applicable
Telemetry Files Reference to telemetry capture files if Tier 1 evidence was collected. File format, timestamp range, channels captured.
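The derivation rule stated in Part D of the template can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure from the SFR Evaluation Process document. Two points are assumptions of this sketch: the function name `candidate_classifications`, and the choice to check Insufficient Data before Fail when both occur in the same record (the template text does not state a precedence). Because the template does not say which of Surface-Level or Out-of-the-Loop corresponds to motion being present, the Fail branch returns both candidates and leaves that choice to the evaluator.

```python
VALID_DETERMINATIONS = {"Pass", "Fail", "Insufficient Data"}


def candidate_classifications(determinations):
    """Return the classification(s) consistent with three determinations.

    `determinations` holds the Criterion A, B and C entries as strings
    drawn from the controlled vocabulary.
    """
    values = list(determinations)
    if len(values) != 3:
        raise ValueError("expected exactly three criterion determinations")
    for value in values:
        if value not in VALID_DETERMINATIONS:
            raise ValueError(f"not in controlled vocabulary: {value!r}")

    # ASSUMPTION: Insufficient Data takes precedence over Fail.
    if "Insufficient Data" in values:
        return ["Withheld (Insufficient Data)"]
    # The motion-presence rule that picks between these two is defined
    # elsewhere (SFR Evaluation Process document) and is not encoded here.
    if "Fail" in values:
        return ["Surface-Level", "Out-of-the-Loop"]
    return ["In-the-Loop"]
```

Returning a list rather than a single value keeps the sketch honest about the part of the rule it does not encode.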

Field Definitions and Controlled Vocabulary


Field | Required | Controlled Vocabulary
Evaluation ID | Yes | Format: SFR-EVAL-[4-digit sequence]. Assigned by program coordinator before evaluation begins.
Criterion Determination | Yes | Exactly one of: Pass / Fail / Insufficient Data. No other values accepted.
Evidence Tier Used | Yes | Exactly one of: Tier 1 / Tier 2 / Tier 3 / Tier 4. Must be the highest tier that was available and actually used for the determination.
Final Classification | Yes | Exactly one of: In-the-Loop / Surface-Level / Out-of-the-Loop / Withheld (Insufficient Data). Derived from criterion determinations, not independently chosen.
Record Status | Yes | Draft: evaluation complete, not yet compared. Final: compared with companion record, agreement confirmed. Disputed: comparison complete, disagreement unresolved.
Framework Gap Observations | Optional | Free text. No controlled vocabulary. Substantive observations are required only where the evaluator encountered a methodological ambiguity; otherwise "None identified" is a valid entry.

Records that use non-controlled vocabulary in determination fields are not valid evaluation records and cannot be entered in the Results Registry. Controlled vocabulary exists to enable systematic comparison.
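A pre-submission check for the controlled-vocabulary fields listed above might look like the following Python sketch. The record layout and key names (`evaluation_id`, `criteria`, `determination`, `tier`, `final_classification`, `record_status`) are hypothetical, chosen only for this example; the Results Registry's actual intake format is not described on this page.

```python
import re

# Format from the field table: SFR-EVAL- followed by a 4-digit sequence.
EVALUATION_ID = re.compile(r"SFR-EVAL-\d{4}")

DETERMINATIONS = {"Pass", "Fail", "Insufficient Data"}
TIERS = {"Tier 1", "Tier 2", "Tier 3", "Tier 4"}
CLASSIFICATIONS = {
    "In-the-Loop",
    "Surface-Level",
    "Out-of-the-Loop",
    "Withheld (Insufficient Data)",
}
STATUSES = {"Draft", "Final", "Disputed"}


def validate_record(record):
    """Return the names of fields that violate the controlled vocabulary.

    An empty list means the record passed these checks only; free-text
    fields are not inspected.
    """
    errors = []
    if not EVALUATION_ID.fullmatch(str(record.get("evaluation_id", ""))):
        errors.append("evaluation_id")
    criteria = record.get("criteria", {})
    for name in ("A", "B", "C"):
        entry = criteria.get(name, {})
        if entry.get("determination") not in DETERMINATIONS:
            errors.append(f"criteria.{name}.determination")
        if entry.get("tier") not in TIERS:
            errors.append(f"criteria.{name}.tier")
    if record.get("final_classification") not in CLASSIFICATIONS:
        errors.append("final_classification")
    if record.get("record_status") not in STATUSES:
        errors.append("record_status")
    return errors
```

Returning every failing field at once, rather than stopping at the first, lets an evaluator correct a record in a single pass.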

The Record Is the Evidence

An evaluation that is not recorded in a structured, comparable format has limited value for framework development. The evaluation record is the mechanism by which individual evaluation results become part of the evidence base — by which a judgment made by an evaluator in one location becomes reproducible and comparable to a judgment made by a different evaluator about a different system in a different location. The template exists to make that comparison possible.

Every field is required. Controlled vocabulary is mandatory. The record format is the evidence infrastructure.