
Training a machine learning model on antibody-antigen binding requires affinity measurements that span several orders of magnitude with high resolution. Companies building sequence-affinity models do not use naïve synthetic libraries for this. They design libraries to yield binders across a deliberate range of affinities, spanning tight, moderate, and weak binders, so that enrichment scores and rank-ordered bins can serve as affinity proxies in the training set. That proxy is flawed in two ways: its resolution is limited, and it carries a systematic kinetic bias.
Binned enrichment scores are a crude proxy for affinity, and beyond their limited resolution they carry a systematic error. Binders with fast koff dissociate from the antigen before cells reach the detector, so they sort into low or negative bins and are labeled non-binders in the training dataset. Many of them are not non-binders. They bind with measurable affinity; they were simply not bound at the moment of detection. Because the assay architecture removes them before labeling begins, fast off-rate sequence space is absent from the training data. The model learns slow koff sequences as the definition of binding, because it never saw anything else.
This post covers the established display methods, what they actually measure versus what they enrich, and how a workflow combining yeast display affinity maturation with SPOC high-throughput SPR produces the kinetic data these models need to tell binders apart and learn biological interactions more accurately.

Affinity measurement platforms compared across KD resolution floor, library size, data type, and throughput. Display-based methods (phage, ribosome, mRNA, yeast, mammalian) produce enrichment scores, not quantitative affinities. Tite-Seq returns apparent KD values but saturates below ~300 pM and requires impractical volumes at that limit. SPOC HT-SPR and KinExA resolve into the picomolar range with full kon, koff, and KD per variant. Chaser denotes a solution-phase kinetics extension of SPR in which a competing ligand is introduced after the association phase to measure true solution-phase koff, removing rebinding artifacts inherent to standard surface-based SPR (Quinn, Anal. Chem. 2025).
Phage Display
Phage display is the most widely used platform for antibody discovery. Libraries exceed 10¹¹ unique clones. Selection works by biopanning: iterative cycles of antigen binding, washing, and amplification. Stringency is controlled by reducing antigen concentration and increasing wash harshness across rounds.
Phage display excels as a selection platform for identifying hits, but it does not generate KD values. Quantitative affinity measurement requires a separate, orthogonal assay: hits are confirmed by SPR or BLI after selection. Primary outputs from naïve libraries vary widely by target, library design, and panning strategy, typically in the nM-to-sub-µM range, with tighter affinities usually obtained after maturation. With CDR walking, chain shuffling, or small perturbation mutagenesis combined with deep sequencing, campaigns can reach low-picomolar KD. The picomolar number is confirmed by SPR post-selection, not by phage display itself.
Each panning round includes stringent wash steps. Phage particles displaying variants with fast koff dissociate from the antigen during the wash and are lost. Selection systematically enriches for slow off-rate binders regardless of KD, which may be acceptable for therapeutic discovery campaigns but is suboptimal for ML training. A variant with fast kon and fast koff can have an equilibrium affinity equal to or better than that of a slow kon/slow koff variant, yet be depleted across successive rounds. The selected pool is kinetically biased before any measurement is made.
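The depletion described above can be sketched with a first-order dissociation model. This is an illustration, not a description of any specific panning protocol: it assumes bound complexes decay as exp(-koff × t) during the wash with no rebinding, and the rate constants, the 10-minute wash, and the three rounds are all hypothetical values chosen so that both variants share a 1 nM KD.

```python
import math

def fraction_retained(koff_per_s: float, wash_s: float) -> float:
    """Fraction of initially bound complexes still bound after a wash,
    assuming first-order dissociation and no rebinding."""
    return math.exp(-koff_per_s * wash_s)

def kd_molar(kon_per_M_s: float, koff_per_s: float) -> float:
    """Equilibrium dissociation constant, KD = koff / kon."""
    return koff_per_s / kon_per_M_s

# Two hypothetical variants with the same KD (1 nM) but different kinetics.
variants = {
    "slow_on_slow_off": {"kon": 1e5, "koff": 1e-4},
    "fast_on_fast_off": {"kon": 1e7, "koff": 1e-2},
}

WASH_SECONDS = 600   # assumed total wash time per round
ROUNDS = 3           # assumed number of panning rounds

for name, k in variants.items():
    per_round = fraction_retained(k["koff"], WASH_SECONDS)
    # Relative retention compounds multiplicatively across rounds;
    # amplification between rounds rescales the pool but not this ratio.
    cumulative = per_round ** ROUNDS
    print(f"{name}: KD = {kd_molar(k['kon'], k['koff']):.1e} M, "
          f"retained per round = {per_round:.3g}, "
          f"after {ROUNDS} rounds = {cumulative:.3g}")
```

Under these assumptions the slow variant keeps most of its bound fraction through each wash while the fast variant keeps well under 1%, even though the two have the same equilibrium affinity.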
For ML training data, phage display produces sequences and binary binding calls, not a KD landscape.
Yeast Display
Yeast display attaches antibody fragments to the Aga2p surface anchor on Saccharomyces cerevisiae. FACS sorts cells based on fluorescent antigen binding. KD is measured by equilibrium titration: cells are labeled across a series of antigen concentrations and MFI is plotted against concentration to extract a dissociation constant.
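As a rough sketch of the titration fit, the snippet below fits a 1:1 binding isotherm, MFI = background + amplitude × c / (c + KD), to synthetic data. The grid-search fitting routine, the function names, and the 5 nM example binder are assumptions made for illustration; routine workflows typically use a nonlinear least-squares fitter instead.

```python
def occupancy(conc_M: float, kd_M: float) -> float:
    """Fractional occupancy for 1:1 binding at equilibrium."""
    return conc_M / (conc_M + kd_M)

def fit_kd(concs_M, mfi, log10_kd_min=-11.0, log10_kd_max=-5.0, steps=601):
    """Grid-search KD on a log scale; for each candidate KD, solve amplitude
    and background by ordinary linear regression of MFI on occupancy."""
    n = len(concs_M)
    best = None
    for i in range(steps):
        log_kd = log10_kd_min + (log10_kd_max - log10_kd_min) * i / (steps - 1)
        kd = 10 ** log_kd
        x = [occupancy(c, kd) for c in concs_M]
        mx, my = sum(x) / n, sum(mfi) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue
        amp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, mfi)) / sxx
        bg = my - amp * mx
        sse = sum((bg + amp * xi - yi) ** 2 for xi, yi in zip(x, mfi))
        if best is None or sse < best[0]:
            best = (sse, kd, amp, bg)
    _, kd, amp, bg = best
    return kd, amp, bg

# Synthetic, noise-free titration for a hypothetical 5 nM binder.
TRUE_KD, TRUE_AMP, TRUE_BG = 5e-9, 1000.0, 50.0
concs = [10 ** e for e in (-10.5, -10, -9.5, -9, -8.5, -8, -7.5, -7, -6.5)]
signal = [TRUE_BG + TRUE_AMP * occupancy(c, TRUE_KD) for c in concs]

kd_fit, amp_fit, bg_fit = fit_kd(concs, signal)
print(f"fitted KD = {kd_fit:.2e} M")
```

The fit recovers KD only when the concentration series brackets it; if every point sits well above KD, the occupancy values are all near 1 and the curve carries almost no information about KD.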
FACS titrations in routine workflows become unreliable in the sub-nM range. The practical floor depends on antigen prep quality, nonspecific background, display level, and assay design. Below it, noise from non-specific binding, pipetting imprecision, and dead-cell signal exceeds the antigen-specific signal. Off-rate enrichment (labeling cells with antigen, then chasing with unlabeled competitor) can enrich binders below this range but does not produce a KD value. It identifies slow off-rate variants without quantifying them.
FACS sorting also introduces a wash-dependent bias. Cells are washed between labeling and sorting. Variants with fast koff dissociate antigen during that interval and score lower fluorescence than their KD would predict. Slow off-rate variants are systematically favored. Two variants with identical KD but different kinetics will sort into different fluorescence bins. The selected pool does not represent the full kinetic diversity of the library.
Library sizes reach 10⁸–10⁹, constrained by yeast transformation efficiency. Yeast display provides a key advantage over phage: FACS gating allows sorting on the ratio of antigen signal to expression tag signal, correcting for display level variation across clones. This normalization is crucial to separate expression from binding when comparing affinities across a diverse population.
Yeast display in scFv format introduces a Gly-Ser linker that constrains VH-VL geometry. For some antibodies, this alters the paratope relative to the native Fab or IgG format. Fab display formats are available and increasingly used, but library sizes are smaller and the workflow is more involved.
Tite-Seq
Tite-Seq is a pooled deep mutational scanning method built on yeast display. Cells are labeled at 8–11 antigen concentrations spanning several orders of magnitude and sorted into four fluorescence bins at each concentration. Bins are deep-sequenced using NGS. Full binding titration curves are inferred for thousands of sequence variants simultaneously, using mean bin number as a proxy for mean cellular fluorescence.
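The mean-bin proxy can be illustrated with a small calculation. This is a simplified sketch: the read counts are invented, the function name is hypothetical, and it omits the normalization for cells sorted per bin and sequencing depth that a real analysis would apply.

```python
def mean_bin(read_counts):
    """Read-weighted mean bin index (bins numbered from 1) for one variant
    at one antigen concentration."""
    total = sum(read_counts)
    if total == 0:
        raise ValueError("variant has no reads at this concentration")
    return sum((i + 1) * n for i, n in enumerate(read_counts)) / total

# Hypothetical read counts for one variant: rows are antigen concentrations
# (low to high), columns are the four fluorescence bins (dim to bright).
counts_by_concentration = [
    [90, 8, 2, 0],
    [40, 45, 12, 3],
    [5, 30, 50, 15],
    [0, 4, 26, 70],
]

# One mean-bin value per concentration traces out the titration curve.
curve = [mean_bin(row) for row in counts_by_concentration]
print(curve)
```

With four bins the proxy is bounded between 1 and 4, so any variant that already sits in the top bin at the lowest concentration yields a flat curve.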
The quantitative lower detection limit is approximately 300 pM. Even reaching that limit requires antigen concentrations low enough that the volumes needed become impractical at scale. Below 300 pM, essentially all cells sort into the top fluorescence bin at the lowest workable antigen concentrations. The titration curve cannot be resolved. Variants with KD values of 10 pM, 50 pM, and 150 pM all appear saturated and cannot be distinguished from one another.
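The volume problem follows from antigen depletion: at low concentrations, the displayed antibody on the cells can outnumber the antigen molecules in the tube. A back-of-the-envelope sketch, assuming 1e7 cells per sample, 5e4 displayed copies per cell, and a 10-fold molar excess of antigen (all three are illustrative assumptions, not values from the Tite-Seq protocol):

```python
AVOGADRO = 6.022e23

def min_labeling_volume_L(n_cells, copies_per_cell, antigen_M, fold_excess=10.0):
    """Smallest labeling volume that keeps antigen in the stated molar
    excess over displayed antibody, so free antigen stays near nominal."""
    displayed_mol = n_cells * copies_per_cell / AVOGADRO
    return fold_excess * displayed_mol / antigen_M

# Assumed values: 1e7 cells per sample, 5e4 displayed copies per cell.
for antigen_M in (1e-9, 1e-10, 1e-11):
    v = min_labeling_volume_L(1e7, 5e4, antigen_M)
    print(f"{antigen_M:.0e} M antigen -> {v * 1000:.0f} mL")
```

Each 10-fold drop in antigen concentration raises the required volume 10-fold, which is why resolving picomolar binders by titration quickly runs into liter-scale labeling reactions under these assumptions.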
The dynamic range is roughly three to four orders of magnitude, from ~300 pM to ~1 µM (a factor of about 3,300). Within that window, Tite-Seq provides simultaneous KD estimates for thousands of variants. Outside it, the method produces rank-order saturation, not quantitative values.
The original Tite-Seq paper (Adams et al., eLife 2016) demonstrated the method on CDR1H and CDR3H single-point mutants of a single scFv parental. MAGMA-seq (Koska et al., Nat Commun 2024) extended the approach to Fab format and multi-antibody, multi-antigen pools, with barcoded libraries across up to 10 parental antibodies screened against two antigens simultaneously. The quantitative floor for current MAGMA-seq demonstrations is in the low-nanomolar range, reflecting the affinities of the antibodies tested. Antibodies with KD above 1 nM generate sufficient bin variance to reconstruct reliable curves. By analogy to Tite-Seq and yeast display titration physics, variants tighter than ~300 pM are expected to saturate the assay, but this has not been benchmarked across a broad panel in the 2024 paper.
Both methods are limited to scFv or Fab formats and require the target antigen in fluorescently labeled form. The sorting step includes a wash, and the same fast koff bias that affects clonal yeast display applies here. Variants with fast dissociation rates lose antigen signal before cells reach the detector, compressing their apparent fluorescence toward lower bins. The bin score encodes a mix of equilibrium affinity and off-rate, not KD alone.
Mammalian Cell Display
Mammalian cell display presents antibodies on the surface of HEK293T or CHO cells in scFv, Fab, or full IgG format. FACS-based equilibrium titration gives KD values through the same curve-fitting approach used in yeast display.
FACS-based mammalian display has the same general limitations as yeast display for very tight binders. Sub-nM affinities usually require confirmation off-platform by SPR.
The wash-dependent fast koff bias is present here as well. Cells are washed between antigen labeling and sorting. Variants with fast off-rates dissociate during that window and are scored lower than their equilibrium KD would indicate. The selection enriches for slow off-rate binders across all displayed formats, including full IgG.
Library sizes are 10⁶–10⁷, constrained by mammalian transfection efficiency. Mammalian display is used for affinity maturation and functional screening of pre-selected leads. Full IgG display preserves glycosylation, native HC-LC pairing, and bivalent geometry, which makes it most relevant for selecting in the format that will advance to the clinic.
Ribosome Display and mRNA Display
Ribosome display and mRNA display are cell-free selection platforms. Ribosome display produces ternary complexes of mRNA, ribosome, and nascent protein selected against biotinylated antigen in solution. mRNA display covalently links the mRNA to the protein via a puromycin linker, improving complex stability.
Library sizes are 10¹²–10¹⁴ for ribosome display and 10¹³–10¹⁵ for mRNA display. Solution-phase selection can be performed at antigen concentrations in the low-picomolar range. Published campaigns have reached sub-100 pM KD values confirmed by SPR post-selection.
Neither platform generates KD values directly. Quantification requires reformatting and orthogonal measurement. Single-chain formats (scFv) are most common; Fab ribosome display has been demonstrated but requires managing two polypeptide chains and is more involved.
Both platforms involve wash steps during selection against immobilized or bead-captured antigen. Ternary complexes with fast koff dissociate during washing and are lost. The selection bias toward slow off-rate binders applies here as it does in cell-based display, and the effect compounds across iterative selection rounds.
The Shared Limit
Every display method covered here shares two limitations when used for ML training data generation. The first is the quantitative floor: Tite-Seq reaches approximately 300 pM but requires impractical antigen volumes to get there; standard yeast and mammalian display FACS titration becomes unreliable below approximately 1 nM; MAGMA-seq demonstrations to date have been concentrated in the nanomolar regime. Below these thresholds, display-based measurements compress to rank-order saturation.
The second limitation runs through the entire measurement, not just the floor. Every selection step that includes a wash preferentially depletes fast off-rate binders. A variant with a fast kon and fast koff dissociates antigen during the wash interval and scores lower fluorescence than its KD warrants. By the time a library has gone through multiple selection rounds, the surviving pool is enriched for slow koff variants and depleted of fast koff variants, independent of equilibrium affinity. Models trained on this data do not see the full kinetic landscape. They learn an affinity-proxied landscape that is both compressed at the tight end and kinetically filtered throughout.
The Missing Binders
Every display-based method covered here shares a structural bias: wash steps during selection deplete fast koff variants before any measurement is made. This is not a calibration error or a sensitivity limitation. It is a selection artifact built into the workflow.
A variant with a fast koff dissociates from the antigen during the wash interval. In cell-based display, it loses fluorescent antigen signal before reaching the FACS detector and sorts into a low or negative bin. In bead-based ribosome or mRNA display, the ternary complex dissociates before the capture step and is washed away. In phage panning, the phage particle elutes during the wash and is lost from the amplified pool. Across all formats, the outcome is the same. Fast koff variants are assigned low scores, depleted across iterative rounds, or discarded entirely.
The labels they receive are wrong. A fast koff binder that dissociates during the wash scores near zero fluorescence. In a training dataset, that variant is labeled a non-binder. It is not. It binds. It binds with measurable affinity. Its KD may be identical to a slow koff variant that scores high fluorescence and gets labeled a strong binder. The difference between them is kinetic, not thermodynamic. The display assay cannot distinguish the two.
Models trained on these datasets learn that certain sequence features correspond to non-binding. Some of those features do not correspond to non-binding. They correspond to fast dissociation. The model has no way to know the difference, because the training data never included a fast koff binder with a measured affinity. That class of molecule was removed from the dataset before labeling began.

Display-based methods lose information at four levels: fast off-rate binders are depleted during selection (A), continuous affinity values are discretized into FACS bins (B), distinct KD values collapse into the same bin (C), and kinetic differences between variants are never captured (D). SPOC HT-SPR retains both fast and slow off-rate binders, measures affinity as a continuous variable, resolves distinct KD values quantitatively, and returns full kon and koff per variant. The downstream difference is a model trained on a compressed, kinetically filtered approximation of affinity space using display methods versus one trained on the full SPR kinetic landscape.
The bias compounds. Libraries selected through multiple FACS or panning rounds are progressively depleted of fast koff variants. By the time a training library is assembled from display-enriched sequences, fast koff sequence space is severely underrepresented. The model trains on a kinetically filtered slice of affinity space and generalizes as if that slice were complete.
The SPOC Workflow for ML Training
SPOC’s data generation workflow uses a two-stage process. Yeast display affinity maturation generates a sequence-diverse population of variants enriched for binding. FACS selection at progressively reduced antigen concentrations identifies the high-affinity fraction. This stage uses display-based methods for what they do well: large-scale, rapid enrichment.
The enriched variants move downstream to SPR on SPOC’s HyperSynaptiKx platform. SPOC has reported that its cell-free expression system produces VHH, scFv, and Fab and that SPR measures full kinetic profiles (kon, koff, and KD) for the top variants per campaign. The SPOC workflow processes 1,152–2,304 variants per campaign, with a quantitative lower limit for SPR KD measurement of approximately 1–10 pM. The detection floor extends into the femtomolar regime using the Chaser method and mixed-phase assay (Quinn, Anal. Chem. 2025).
SPOC SPR measures kinetic binding parameters using concentration-dependent sensorgrams, independent of changes in protein expression levels. There is no wash step between binding and detection. A fast koff variant produces a distinct, steeply declining dissociation curve. It is quantified. It is not washed away, binned near zero, or labeled a non-binder. Its kon, koff, and KD are measured directly alongside slow koff variants with the same equilibrium affinity. Two variants that would be indistinguishable by FACS, or that would sort into opposite ends of the binding distribution, produce separate, fully characterized kinetic profiles by SPR.
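To make the contrast concrete, here is a minimal simulation of the standard 1:1 Langmuir sensorgram model for two variants with the same KD. It is a textbook sketch, not SPOC's analysis pipeline: the rate constants, analyte concentration, Rmax, and injection time are hypothetical, the data are noise-free, and koff is recovered with a simple log-linear regression.

```python
import math

def association(t_s, conc_M, kon, koff, rmax=100.0):
    """Response during analyte injection for a 1:1 Langmuir interaction."""
    kobs = kon * conc_M + koff
    req = rmax * kon * conc_M / kobs
    return req * (1.0 - math.exp(-kobs * t_s))

def dissociation(t_s, r0, koff):
    """Response after the injection ends: single-exponential decay."""
    return r0 * math.exp(-koff * t_s)

def estimate_koff(times_s, responses):
    """Recover koff as the negative slope of ln(response) versus time."""
    logs = [math.log(r) for r in responses]
    n = len(times_s)
    mt, ml = sum(times_s) / n, sum(logs) / n
    slope = (sum((t - mt) * (l - ml) for t, l in zip(times_s, logs))
             / sum((t - mt) ** 2 for t in times_s))
    return -slope

# (kon in 1/(M*s), koff in 1/s); both hypothetical variants have KD = 1 nM.
variants = {"slow": (1e5, 1e-4), "fast": (1e7, 1e-2)}
CONC_M = 1e-8                      # assumed analyte concentration
times = [float(t) for t in range(0, 301, 10)]

for name, (kon, koff) in variants.items():
    r0 = association(300.0, CONC_M, kon, koff)   # response at end of injection
    decay = [dissociation(t, r0, koff) for t in times]
    print(name, f"koff_est = {estimate_koff(times, decay):.2e} 1/s")
```

Both variants yield a measurable decay curve in this model, and the 100-fold difference in koff shows up directly in the slope rather than as a missing data point.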
The training data that comes out of this workflow contains both classes of binder. Fast koff sequence space is represented. Slow koff sequence space is represented. The model sees the full kinetic distribution across the affinity range, not a wash-filtered approximation of it.
Tite-Seq can push the floor to ~300 pM but requires impractical antigen volumes to get there and still cannot resolve kinetic differences at equilibrium. Standard yeast or mammalian display FACS titration becomes unreliable below ~1 nM and cannot process thousands of variants per campaign. Neither approach separates variants with the same equilibrium KD but different kinetic profiles. High-throughput SPR instruments can run 96 flow cells in parallel, but feeding them requires upstream enrichment, followed by binder synthesis and purification. Yeast display affinity maturation provides that enrichment. Each component handles what it is suited for. SPOC bridges these critical gaps.