A Patient-First Foundation Model for Computational Pathology
¹CSSE, Concordia University, Montréal, Canada; ²Mila – Québec AI Institute, Montréal, Canada; ³CHUM, Université de Montréal, Montréal, Canada; ⁴IRIC, Université de Montréal, Montréal, Canada; ⁵Université de Montréal, Montréal, Canada; ⁶Canada CIFAR Chair, Polytechnique Montréal, Montréal, Canada; ⁷Dept. of Pathology, McGill University, Montréal, Canada
Checkpoint and task definitions are downloaded automatically from HuggingFace on first use.
Computational pathology needs whole-slide image (WSI) foundation models that transfer across diverse clinical tasks, yet current approaches remain largely slide-centric, often depend on private data and expensive paired-report supervision, and do not explicitly model relationships among multiple slides from the same patient. We present MOOZY, a patient-first pathology foundation model in which the patient case, not the individual slide, is the core unit of representation. MOOZY explicitly models dependencies across all slides from the same patient via a case transformer during pretraining, combining multi-stage self-supervision with scaled low-cost task supervision. In Stage 1, we pretrain a vision-only slide encoder on 77,134 public slide feature grids using masked self-distillation. In Stage 2, we align these representations with clinical semantics using a case transformer and multi-task supervision over 333 tasks from 56 public datasets, including 205 classification and 128 survival tasks across four endpoints. Across sixteen held-out tasks, MOOZY improves macro weighted F1, balanced accuracy, and macro weighted ROC-AUC relative to PRISM by +4.19%, +7.93%, and +6.95%, respectively. MOOZY is also parameter efficient with 85.77M parameters, 14× smaller than GigaPath. These results suggest that patient-level pretraining yields transferable embeddings, providing a path toward scalable patient-first histopathology foundation models.
Stage 1 (top): A frozen patch encoder extracts per-patch features arranged into a spatial grid. Multi-scale crops are sampled with spatial augmentations and block-based masking. A student slide encoder and EMA teacher are jointly trained via CLS-level self-distillation and masked patch prediction. Stage 2 (bottom): The pretrained slide encoder produces per-slide embeddings; a case transformer aggregates them into a unified case embedding, routed to task-specific classification and survival heads.
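The EMA teacher in Stage 1 can be illustrated with a minimal sketch of the standard exponential-moving-average weight update used in self-distillation. This is not the authors' training code: the `ema_update` helper, the toy parameter dictionary, and the momentum value are all assumptions chosen for illustration, and real encoders hold tensors rather than short Python lists.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, momentum: float) -> None:
    """In-place EMA: teacher <- momentum * teacher + (1 - momentum) * student."""
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    for name, student_values in student.items():
        teacher_values = teacher[name]
        for i, s in enumerate(student_values):
            teacher_values[i] = momentum * teacher_values[i] + (1.0 - momentum) * s


# Toy weights standing in for the slide-encoder parameters (hypothetical).
student = {"proj": [1.0, 2.0]}
teacher = {"proj": [0.0, 0.0]}

# MOMENTUM is an assumed value for illustration, not taken from the paper.
MOMENTUM = 0.9
ema_update(teacher, student, MOMENTUM)
print(teacher)
```

In this kind of scheme the teacher receives no gradients; it only trails the student, which is why a momentum close to 1 yields slowly changing targets.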
MOOZY is trained entirely on public data. Stage 1 uses 77,134 slide feature grids (53,286 at 20× and 23,848 at 40×) extracted from ~1.67 billion patches across ~31.8 TB of raw WSI data. Stage 2 uses 41,089 supervised cases (45,179 unique whole-slide images) across 333 tasks from 56 datasets — all 32 TCGA cohorts, all 10 CPTAC cohorts, REG, BC-Therapy, BRACS, CAMELYON17, DHMC Kidney, DHMC LUAD, EBRAINS, IMP Colorectum, IMP Cervix, MBC, MUT-HET-RCC, NADT Prostate, NAT-BRCA, and PANDA. Supervision covers 205 classification and 128 survival tasks across four endpoints (OS, DSS, DFI, PFI) and 23 anatomical sites.
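The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only recombines numbers stated in the text; the variable names are illustrative, and the two derived ratios are rough because the patch count is itself approximate.

```python
# Stage 1: slide feature grids by magnification.
grids_20x = 53_286
grids_40x = 23_848
stage1_total = grids_20x + grids_40x

# Stage 2: supervised tasks by type.
classification_tasks = 205
survival_tasks = 128
total_tasks = classification_tasks + survival_tasks

# Stage 2: datasets (TCGA and CPTAC cohorts plus the individually named sets).
tcga_cohorts = 32
cptac_cohorts = 10
other_datasets = [
    "REG", "BC-Therapy", "BRACS", "CAMELYON17", "DHMC Kidney", "DHMC LUAD",
    "EBRAINS", "IMP Colorectum", "IMP Cervix", "MBC", "MUT-HET-RCC",
    "NADT Prostate", "NAT-BRCA", "PANDA",
]
total_datasets = tcga_cohorts + cptac_cohorts + len(other_datasets)

# Derived ratios (approximate, since ~1.67 billion patches is a rounded figure).
patches_per_grid = 1.67e9 / stage1_total
slides_per_case = 45_179 / 41_089

print(f"Stage 1 grids: {stage1_total}")
print(f"Stage 2 tasks: {total_tasks}, datasets: {total_datasets}")
print(f"Patches per grid (approx.): {patches_per_grid:,.0f}")
print(f"Slides per supervised case: {slides_per_case:.2f}")
```

The totals agree with the stated 77,134 grids, 333 tasks, and 56 datasets, and the supervised set averages about 1.1 slides per case.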
(a) MLP-probe weighted F1 across sixteen held-out tasks. Brackets show [min–max] per task (center = min, outer ring = max).
(b) Linear-probe weighted F1 across the same sixteen held-out tasks.
(c) Macro-averaged weighted F1 (MLP probe) vs. total parameter count (log scale). Bubble size indicates total parameters.
Frozen-feature MLP probe on sixteen held-out tasks. Bold = best, underline = second best. Mean ± std over 5 folds.
| Task | Metric | CHIEF | GigaPath | PRISM | Madeleine | TITAN | MOOZY |
|---|---|---|---|---|---|---|---|
| Residual Cancer Burden | F1 | 0.46 | 0.45 | 0.46 | 0.51 | 0.43 | 0.56 |
| | AUC | 0.60 | 0.55 | 0.58 | 0.63 | 0.58 | 0.74 |
| | Bal. Acc | 0.44 | 0.40 | 0.43 | 0.48 | 0.38 | 0.51 |
| TP53 Mutation | F1 | 0.82 | 0.76 | 0.85 | 0.84 | 0.87 | 0.87 |
| | AUC | 0.81 | 0.76 | 0.85 | 0.85 | 0.91 | 0.86 |
| | Bal. Acc | 0.83 | 0.76 | 0.84 | 0.84 | 0.88 | 0.86 |
| BAP1 Mutation | F1 | 0.86 | 0.84 | 0.80 | 0.85 | 0.84 | 0.89 |
| | AUC | 0.75 | 0.63 | 0.71 | 0.78 | 0.82 | 0.79 |
| | Bal. Acc | 0.75 | 0.66 | 0.66 | 0.75 | 0.75 | 0.78 |
| ACVR2A Mutation | F1 | 0.89 | 0.80 | 0.85 | 0.89 | 0.87 | 0.91 |
| | AUC | 0.80 | 0.74 | 0.83 | 0.76 | 0.79 | 0.91 |
| | Bal. Acc | 0.80 | 0.65 | 0.81 | 0.81 | 0.76 | 0.90 |
| Histologic Grade | F1 | 0.71 | 0.77 | 0.73 | 0.75 | 0.73 | 0.78 |
| | AUC | 0.71 | 0.77 | 0.67 | 0.74 | 0.71 | 0.75 |
| | Bal. Acc | 0.73 | 0.77 | 0.73 | 0.74 | 0.73 | 0.77 |
| KRAS Mutation | F1 | 0.77 | 0.77 | 0.72 | 0.81 | 0.80 | 0.85 |
| | AUC | 0.76 | 0.72 | 0.61 | 0.70 | 0.80 | 0.80 |
| | Bal. Acc | 0.74 | 0.76 | 0.63 | 0.77 | 0.81 | 0.79 |
| IDH Status | F1 | 0.92 | 0.94 | 0.91 | 0.92 | 0.94 | 0.97 |
| | AUC | 0.96 | 0.97 | 0.95 | 0.96 | 0.97 | 0.99 |
| | Bal. Acc | 0.92 | 0.94 | 0.91 | 0.91 | 0.94 | 0.97 |
| Treatment Response | F1 | 0.53 | 0.51 | 0.57 | 0.49 | 0.49 | 0.58 |
| | AUC | 0.70 | 0.68 | 0.69 | 0.59 | 0.60 | 0.68 |
| | Bal. Acc | 0.48 | 0.40 | 0.51 | 0.35 | 0.37 | 0.48 |
| BRCA PAM50 Subtype | F1 | 0.67 | 0.68 | 0.70 | 0.68 | 0.72 | 0.63 |
| | AUC | 0.83 | 0.83 | 0.85 | 0.84 | 0.87 | 0.80 |
| | Bal. Acc | 0.51 | 0.52 | 0.57 | 0.53 | 0.58 | 0.50 |
| HNSC mRNA Subtype | F1 | 0.54 | 0.60 | 0.60 | 0.59 | 0.62 | 0.55 |
| | AUC | 0.75 | 0.77 | 0.78 | 0.77 | 0.79 | 0.72 |
| | Bal. Acc | 0.55 | 0.61 | 0.59 | 0.58 | 0.61 | 0.54 |
| UCEC Genomic Subtype | F1 | 0.55 | 0.57 | 0.63 | 0.57 | 0.66 | 0.56 |
| | AUC | 0.75 | 0.76 | 0.81 | 0.75 | 0.82 | 0.74 |
| | Bal. Acc | 0.53 | 0.54 | 0.60 | 0.55 | 0.62 | 0.52 |
| Synaptophysin Grade | F1 | 0.78 | 0.79 | 0.76 | 0.81 | 0.80 | 0.79 |
| | AUC | 0.61 | 0.74 | 0.50 | 0.68 | 0.64 | 0.61 |
| | Bal. Acc | 0.64 | 0.69 | 0.58 | 0.65 | 0.68 | 0.65 |
| RAS/BRAF Status | F1 | 0.89 | 0.89 | 0.84 | 0.88 | 0.90 | 0.94 |
| | AUC | 0.68 | 0.48 | 0.56 | 0.33 | 0.80 | 0.75 |
| | Bal. Acc | 0.68 | 0.68 | 0.57 | 0.66 | 0.68 | 0.85 |
| Keratinizing SCC Grade | F1 | 0.74 | 0.68 | 0.72 | 0.69 | 0.72 | 0.73 |
| | AUC | 0.75 | 0.69 | 0.74 | 0.72 | 0.72 | 0.72 |
| | Bal. Acc | 0.73 | 0.68 | 0.72 | 0.69 | 0.71 | 0.72 |
| Non-Keratinizing SCC Grade | F1 | 0.80 | 0.78 | 0.75 | 0.80 | 0.81 | 0.77 |
| | AUC | 0.66 | 0.61 | 0.52 | 0.66 | 0.71 | 0.66 |
| | Bal. Acc | 0.79 | 0.75 | 0.72 | 0.79 | 0.78 | 0.73 |
| Primary vs. Metastasis | F1 | 0.92 | 0.92 | 0.92 | 0.93 | 0.93 | 0.93 |
| | AUC | 0.62 | 0.60 | 0.66 | 0.75 | 0.77 | 0.69 |
| | Bal. Acc | 0.57 | 0.58 | 0.63 | 0.63 | 0.64 | 0.65 |
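To connect the per-task rows to the headline numbers, the sketch below macro-averages the weighted-F1 rows for PRISM and MOOZY and computes the relative gain. The values are copied from the table above; the helper names are illustrative. Because the table entries are rounded to two decimals, the gain comes out at about 4.2%, close to but not identical to the +4.19% quoted in the abstract.

```python
from statistics import mean

# Weighted-F1 rows from the MLP-probe table, in task order (16 tasks).
f1_prism = [0.46, 0.85, 0.80, 0.85, 0.73, 0.72, 0.91, 0.57,
            0.70, 0.60, 0.63, 0.76, 0.84, 0.72, 0.75, 0.92]
f1_moozy = [0.56, 0.87, 0.89, 0.91, 0.78, 0.85, 0.97, 0.58,
            0.63, 0.55, 0.56, 0.79, 0.94, 0.73, 0.77, 0.93]


def macro_average(per_task_scores):
    """Unweighted mean over tasks: every task counts equally."""
    return mean(per_task_scores)


def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


macro_prism = macro_average(f1_prism)
macro_moozy = macro_average(f1_moozy)
gain = relative_gain_percent(macro_moozy, macro_prism)

# Number of tasks on which MOOZY's F1 is strictly higher than PRISM's.
wins = sum(m > p for m, p in zip(f1_moozy, f1_prism))

print(f"PRISM macro F1: {macro_prism:.3f}")
print(f"MOOZY macro F1: {macro_moozy:.3f}")
print(f"Relative gain:  {gain:.1f}%")
print(f"MOOZY ahead on {wins} of {len(f1_moozy)} tasks")
```

The MOOZY macro average reproduces the 0.769 reported in the ablation table, and the per-task comparison shows that the three molecular-subtype tasks (BRCA PAM50, HNSC mRNA, UCEC genomic) are where PRISM stays ahead on F1.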
Macro-average across sixteen held-out tasks. Each entry averages over five MIL architectures (MeanMIL, ABMIL, CLAM, DSMIL, TransMIL).
| Metric | Backbone | UNI v2 | Phikon v2 | CONCH v1.5 | MUSK | MOOZY |
|---|---|---|---|---|---|---|
| F1 (weighted) | 0.723 | 0.722 | 0.714 | 0.740 | 0.720 | 0.769 |
| ROC-AUC (weighted) | 0.707 | 0.707 | 0.697 | 0.720 | 0.695 | 0.763 |
| Balanced Acc | 0.643 | 0.637 | 0.625 | 0.661 | 0.637 | 0.702 |
Macro-average across sixteen held-out tasks. Each stage and the case aggregator are toggled independently.
| Setting | Stage 1 | Stage 2 | Case Agg. | F1 | AUC | Bal. Acc |
|---|---|---|---|---|---|---|
| Stage 1 only | ✓ | ✗ | ✗ | 0.743 | 0.715 | 0.662 |
| Stage 2 only w/o case agg. | ✗ | ✓ | ✗ | 0.743 | 0.721 | 0.664 |
| Stage 2 only | ✗ | ✓ | ✓ | 0.731 | 0.697 | 0.659 |
| MOOZY w/o case agg. | ✓ | ✓ | ✗ | 0.749 | 0.737 | 0.682 |
| MOOZY | ✓ | ✓ | ✓ | 0.769 | 0.763 | 0.702 |
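The ablation is easier to read as differences between rows. The sketch below subtracts the relevant settings to isolate what each component adds; the numbers come from the table above, while the `delta` helper and the dictionary layout are illustrative.

```python
# (F1, AUC, Bal. Acc) per ablation setting, copied from the table.
ablation = {
    "Stage 1 only": (0.743, 0.715, 0.662),
    "Stage 2 only w/o case agg.": (0.743, 0.721, 0.664),
    "Stage 2 only": (0.731, 0.697, 0.659),
    "MOOZY w/o case agg.": (0.749, 0.737, 0.682),
    "MOOZY": (0.769, 0.763, 0.702),
}


def delta(baseline, variant):
    """Per-metric difference variant - baseline, rounded to table precision."""
    return tuple(round(v - b, 3)
                 for b, v in zip(ablation[baseline], ablation[variant]))


# Adding the case aggregator on top of both stages.
case_agg_with_stage1 = delta("MOOZY w/o case agg.", "MOOZY")
# Adding the case aggregator when Stage 1 pretraining is skipped.
case_agg_without_stage1 = delta("Stage 2 only w/o case agg.", "Stage 2 only")
# Adding Stage 2 supervision (no case aggregator) on top of Stage 1.
stage2_on_stage1 = delta("Stage 1 only", "MOOZY w/o case agg.")

for label, d in [
    ("case agg. (with Stage 1)", case_agg_with_stage1),
    ("case agg. (without Stage 1)", case_agg_without_stage1),
    ("Stage 2 on top of Stage 1", stage2_on_stage1),
]:
    print(f"{label:30s} dF1={d[0]:+.3f} dAUC={d[1]:+.3f} dBalAcc={d[2]:+.3f}")
```

Read this way, the case aggregator adds 0.020 F1 when Stage 1 pretraining is present but lowers all three metrics when it is absent, which is consistent with the aggregator depending on a pretrained slide encoder.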
A board-certified pathologist reviewed attention maps across 20 representative WSIs sampled from eight held-out evaluation cohorts and five encoders. MOOZY achieved the lowest mean semantic gap score (1.00, versus 1.38 for TITAN and 1.75 for PRISM) and near-balanced tumor vs. non-tumor attention (shift 2.63), suggesting broad, diagnostically relevant coverage.
Dimensionality reduction of slide embeddings from four encoders. MOOZY shows the clearest class separation on cancer-type tasks.
@inproceedings{kotp2026moozypatientfirstfoundationmodel,
title={MOOZY: A Patient-First Foundation Model for Computational Pathology},
author={Kotp, Yousef and Trinh, Vincent Quoc-Huy and Pal, Christopher and Hosseini, Mahdi S.},
booktitle={European Conference on Computer Vision (ECCV)},
year={2026},
url={https://arxiv.org/abs/2603.27048},
}