RS-Fold  ·  φ-lattice

Glass-box protein folding. Every distance derived from the golden ratio. No training data. No free parameters. Watch the physics work.

pip install rsfold

rsfold fold --sequence NLYIQWLKDGGPSSGRPPPS --output result.pdb
rsfold benchmark --output results.json

Source on GitHub  ·  MIT License  ·  Benchmarks →

[Interactive viewer panels: Energy Trace, Contact Map]

What Is This?

RS-Fold is a protein structure prediction engine built entirely from first principles. Unlike statistical methods that learn patterns from known structures, RS-Fold derives every geometric constraint from a single mathematical object: the golden ratio φ = (1+√5)/2.

The protein you see above was folded in your browser, right now, using only the amino acid sequence as input. No neural network. No database lookup. No server. Pure physics running in JavaScript.

The result is a “glass-box” model: every force, every distance, and every contact decision can be traced back to a specific theorem proved in Lean 4.

The φ-Lattice

Every distance in the model is a power or root of φ applied to measured bond lengths. Nothing is fitted.

Quantity | Value | Derivation
Cα–Cα backbone | 3.85 Å | φ² × 1.47 Å
Helix i→i+4 | 6.23 Å | φ × backbone
β-sheet interstrand | 4.90 Å | √φ × backbone
Helix bundle packing | 10.08 Å | φ² × backbone
Contact budget | N / φ² | ≈ 0.38 N long-range contacts (about 38% of the residue count)
Radius of gyration | (N/φ)^(1/3) × 3.85 Å | Compact globule scaling from chain length
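The arithmetic behind these entries can be checked with a short script. This is a minimal sketch, not code from the rsfold package: the function names are hypothetical, and rounding the contact budget down to an integer is an assumption, since the page only gives N / φ².

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio, about 1.618
BASE_BOND = 1.47               # angstrom, the measured bond length quoted above

def lattice_distances():
    """Return the phi-lattice distances (in angstrom) listed above."""
    backbone = PHI**2 * BASE_BOND
    return {
        "ca_ca_backbone": backbone,                     # phi^2 x 1.47
        "helix_i_i4": PHI * backbone,                   # phi x backbone
        "beta_interstrand": math.sqrt(PHI) * backbone,  # sqrt(phi) x backbone
        "helix_bundle": PHI**2 * backbone,              # phi^2 x backbone
    }

def contact_budget(n_residues):
    """N / phi^2, rounded down (the rounding rule is an assumption)."""
    return int(n_residues / PHI**2)

def radius_of_gyration(n_residues):
    """(N / phi)^(1/3) x backbone, in angstrom."""
    return (n_residues / PHI) ** (1 / 3) * PHI**2 * BASE_BOND

for name, value in lattice_distances().items():
    print(f"{name}: {value:.2f} angstrom")
```

For the 20-residue example sequence, this gives a budget of 7 contacts and a radius of gyration of roughly 8.9 Å.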

How It Works

1. Encode

Each amino acid is represented as an 8-channel chemistry vector (volume, charge, polarity, H-bond donors, H-bond acceptors, aromaticity, flexibility, sulfur content). These are physical observables, not learned embeddings.

2. DFT-8 Spectral Analysis

A sliding 8-point Discrete Fourier Transform extracts frequency content from each chemistry channel. The dominant DFT mode, its amplitude, and its phase at each residue become a W-token, the “recognition fingerprint” of that position.
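A sliding 8-point DFT over one chemistry channel might look like the sketch below. It is an illustration only, not the rsfold implementation. It makes three assumptions: only full 8-residue windows are used (the page does not say how chain ends are handled), the DC term is excluded, and modes 1 to 4 are searched because higher modes mirror them for real input. All function names are hypothetical.

```python
import cmath
import math

WINDOW = 8  # length of the 8-point DFT window

def dft8(window):
    """Plain 8-point DFT of a list of 8 real numbers."""
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / WINDOW)
            for n, x in enumerate(window))
        for k in range(WINDOW)
    ]

def dominant_mode(window):
    """Return (mode, amplitude, phase) of the strongest mode k in 1..4."""
    spectrum = dft8(window)
    k = max(range(1, WINDOW // 2 + 1), key=lambda m: abs(spectrum[m]))
    return k, abs(spectrum[k]), cmath.phase(spectrum[k])

def sliding_tokens(signal):
    """One (mode, amplitude, phase) triple per full 8-residue window."""
    return [
        dominant_mode(signal[i:i + WINDOW])
        for i in range(len(signal) - WINDOW + 1)
    ]

# Usage: a synthetic channel that oscillates with period 4 (mode 2).
channel = [math.cos(2 * math.pi * 2 * n / WINDOW) for n in range(12)]
tokens = sliding_tokens(channel)
```

In this toy input every window reports mode 2 with amplitude 4, and the phase advances by π/2 each time the window slides by one residue.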

3. Predict Contacts

Residue pairs are scored by phase coherence, amplitude resonance, mode compatibility, and chemistry gating (charge attraction, H-bonds, aromatic stacking). The top N/φ² contacts are kept — a budget derived from the contact theorem, not tuned.
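The budget step can be sketched independently of the scoring terms. The example below assumes pair scores already exist as a dictionary mapping (i, j) to a number. The scoring itself is not reproduced here, and `select_contacts` is a hypothetical name. Rounding the budget down is also an assumption.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def select_contacts(scores, n_residues):
    """Keep the highest-scoring residue pairs, up to a budget of N / phi^2.

    scores: dict mapping (i, j) residue-index pairs to a score (higher = better).
    """
    budget = int(n_residues / PHI**2)  # rounding down is an assumption
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:budget]]

# Usage with made-up scores for an 8-residue chain (budget = 3).
example_scores = {(0, 4): 0.9, (1, 6): 0.2, (2, 7): 0.7, (0, 7): 0.5, (3, 7): 0.1}
kept = select_contacts(example_scores, n_residues=8)
```

Because the budget depends only on chain length, the number of kept contacts is fixed before any scores are computed.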

4. Minimize J-Cost

The energy function is the Recognition Science cost J(r) = ½(r + 1/r) − 1 applied to distance ratios. Backbone bonds, helix contacts, tertiary contacts, sterics, and compactness all use J-cost. Gradient descent with momentum drives the structure to the φ-lattice minimum.
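To make the cost concrete, the sketch below applies J to a single distance ratio r = d / target and relaxes that one distance by gradient descent with momentum. It is a one-dimensional toy under stated assumptions, not the full 3D minimizer. The learning rate, momentum, and step count are illustrative values, and the function names are hypothetical.

```python
def j_cost(r):
    """J(r) = 1/2 (r + 1/r) - 1, zero at r = 1 and symmetric under r -> 1/r."""
    return 0.5 * (r + 1.0 / r) - 1.0

def j_grad(r):
    """dJ/dr = 1/2 (1 - 1/r^2)."""
    return 0.5 * (1.0 - 1.0 / r**2)

def relax_distance(d0, target, lr=0.5, momentum=0.8, steps=500):
    """Relax one distance d toward its target by minimizing J(d / target)."""
    d, velocity = d0, 0.0
    for _ in range(steps):
        grad = j_grad(d / target) / target   # chain rule: dJ/dd
        velocity = momentum * velocity - lr * grad
        d += velocity
    return d

# Usage: a stretched Calpha-Calpha distance relaxes back to 3.85 angstrom.
relaxed = relax_distance(d0=6.0, target=3.85)
```

Since J(2) = J(1/2) = 0.25, stretching a distance to twice its target costs the same as compressing it to half.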

RS-Fold vs AlphaFold

Aspect | RS-Fold | AlphaFold
Approach | First-principles physics | Deep learning on MSA + templates
Parameters | Zero (all derived from φ) | ~93 million trained weights
Training data | None | ~170,000 PDB structures
Typical RMSD | 8–16 Å | ~1 Å
Explainability | Every force has a Lean proof | Attention weights (opaque)
Novel folds | Can design folds not in PDB | Limited to evolutionary space
Speed | ~30 ms in browser | Minutes on GPU
Runs offline | Yes (browser or CLI) | Requires server + GPU

RS-Fold does not compete with AlphaFold on accuracy. It answers a different question: why does a protein fold the way it does, not just what shape does it take? The glass-box mechanism enables protein design from first principles — including folds that evolution never explored.

Machine-Verified Derivation Chain

Every geometric constant traces back to a Lean 4 theorem with zero sorry placeholders, meaning no proof step is left unproved.

T5 J-cost uniqueness: J(x) = ½(x + 1/x) − 1 is the unique solution to the RCL
T6 φ forced: the golden ratio is uniquely pinned by self-similarity on the discrete ledger
T7 8-tick cycle: minimal period = 2^D = 8 for D = 3 spatial dimensions
D2 φ-geometry: Cα–Cα = φ²×1.47Å, helix pitch, β-rise (matches PDB <2% error)
D5 Contact budget: max contacts ≤ N/φ² from the DFT-8 neutral subspace
D9 Jamming frequency: f_jam = 1/(τ₀·φ¹⁹) ≈ 14.65 GHz

Empirical Validation

10/10 Helix Design

10 helical sequences designed from φ-geometry alone. All formed helices when cross-validated with ESMFold and AlphaFold. 10/10 negative controls (Pro insertions) disrupted the helix as predicted.

PDB Geometry <2% Error

Derived bond lengths (Cα–Cα = 3.85 Å, H-bond = 2.85 Å) match the Protein Data Bank to within 2%.

Contact Quantization

PDB contact-distance histograms show peaks at φ⁰, φ¹, φ², and φ^2.5 Å (about 1.00, 1.62, 2.62, and 3.33 Å), exactly the φ-ladder rungs the theory predicts.

× Static W-Token Contacts

W-token-based contact prediction was not better than random in ablation studies. The theory now rests on the 8-tick dynamic clock, not static sequence encoding. This is disclosed as a falsified claim.

Install the CLI

For longer sequences or batch processing, use the Python package.

pip install rsfold

rsfold fold --sequence NLYIQWLKDGGPSSGRPPPS --output result.pdb
rsfold benchmark --output results.json
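For batch processing, one option is to drive the CLI from Python. The sketch below assumes `rsfold` is on the PATH and uses only the `fold`, `--sequence`, and `--output` arguments shown in the commands above. The helper names and the one-file-per-sequence naming scheme are hypothetical.

```python
import subprocess

def fold_command(sequence, output_path):
    """Build the argument list for one `rsfold fold` call."""
    return ["rsfold", "fold", "--sequence", sequence, "--output", output_path]

def fold_batch(sequences, run=subprocess.run):
    """Fold each named sequence into <name>.pdb, stopping on the first failure.

    sequences: dict mapping an output name to an amino acid sequence.
    run: injectable runner (defaults to subprocess.run) so the loop can be tested.
    """
    for name, sequence in sequences.items():
        run(fold_command(sequence, f"{name}.pdb"), check=True)
```

Calling `fold_batch({"example": "NLYIQWLKDGGPSSGRPPPS"})` would then write `example.pdb`. Passing `check=True` makes a failing fold raise an exception instead of being skipped silently.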

Source: github.com/jonwashburn/recognition-science  ·  License: MIT  ·  Benchmark results →