ICML 2026 accepted paper

HDTree

Generative Modeling of Cellular Hierarchies for Robust Lineage Inference

HDTree learns tree-structured latent representations with a unified hierarchical codebook and validates inferred cell lineages through diffusion-based generation conditioned on hierarchical paths.

Zelin Zang, WenZhe Li, Yongjie Xu, Chang Yu, Changxi Chi, Jingbo Zhou, Zhen Lei, Stan Z. Li

Framework overview. HDTree combines semantic encoding, a Hierarchical Tree Codebook, and a diffusion decoder for tree-conditioned generative validation and lineage analysis.

Paper Overview

The paper studies hierarchical representation learning for cellular differentiation, where the goal is not only to cluster cells but also to infer robust developmental trajectories and validate them generatively.

Unified Hierarchical Codebook

A shared vector-quantized tree codebook avoids branch-specific neural modules and decouples tree depth from network size.
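
As a rough illustration of how one shared codebook can represent a tree without branch-specific modules, the sketch below stores every node vector in a single flat list and quantizes an embedding by descending from the root to the nearest child at each level. The complete binary tree, the heap-style indexing, the greedy descent, and the names `make_codebook` and `quantize_path` are all illustrative assumptions, not the paper's implementation.

```python
import math
import random


def make_codebook(depth, dim, seed=0):
    """Return one flat list of node vectors for a complete binary tree.

    Node 0 is the root; the children of node i are 2*i + 1 and 2*i + 2
    (heap-style layout, an assumption made for this sketch).
    """
    rng = random.Random(seed)
    n_nodes = 2 ** (depth + 1) - 1
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_nodes)]


def quantize_path(z, codebook):
    """Greedily descend from the root, picking the nearest child each level.

    Returns the list of node indices from the root down to a leaf.
    """
    path = [0]
    node = 0
    while 2 * node + 1 < len(codebook):
        children = (2 * node + 1, 2 * node + 2)
        # Nearest child under Euclidean distance (assumed metric).
        node = min(children, key=lambda c: math.dist(z, codebook[c]))
        path.append(node)
    return path


codebook = make_codebook(depth=3, dim=4)
z = [0.1, -0.3, 0.7, 0.2]  # hypothetical encoder output
path = quantize_path(z, codebook)
print(path)  # root-to-leaf node indices, one per tree level
```

In this layout, deepening the tree only adds rows to `codebook`; the code that reads it stays the same, which is one way to picture tree depth being decoupled from network size.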

Diffusion-Based Validation

A path-conditioned diffusion decoder reconstructs and generates samples along learned hierarchical trajectories.

Lineage Inference

The learned tree is transformed into a weighted graph, enabling shortest-path inference between progenitor and differentiated states.
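
One way to picture this step is to treat each tree node as a graph vertex, attach a weight to every parent-child edge, and run Dijkstra's algorithm between a progenitor node and a differentiated node. The sketch below is a minimal version under stated assumptions: the toy tree, the node names, and the edge weights are invented for illustration, and the paper's actual weighting scheme is not reproduced here.

```python
import heapq


def tree_to_graph(parent_edges):
    """Build an undirected weighted adjacency map from (parent, child, weight)."""
    graph = {}
    for parent, child, weight in parent_edges:
        graph.setdefault(parent, {})[child] = weight
        graph.setdefault(child, {})[parent] = weight
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_weight, [nodes]) or (inf, [])."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            cand = dist + w
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                prev[nbr] = node
                heapq.heappush(heap, (cand, nbr))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


# Hypothetical toy tree: (parent, child, edge weight).
edges = [
    ("root", "progenitor", 1.0),
    ("progenitor", "intermediate", 0.5),
    ("intermediate", "neuron", 0.75),
    ("intermediate", "muscle", 1.25),
    ("root", "other", 2.0),
]
graph = tree_to_graph(edges)
cost, route = shortest_path(graph, "progenitor", "muscle")
print(cost, route)  # 1.75 ['progenitor', 'intermediate', 'muscle']
```

In a pure tree the path between two nodes is unique, so Dijkstra is more than strictly needed; it is used here so the same function still works if the weighted graph carries additional edges.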

Motivation

Existing hierarchical methods either lack generative validation or rely on branch-specific modules that become unstable for deep or sparse hierarchies.

Motivation. Compared with hierarchical clustering and deep tree generation models, HDTree is designed to improve stability, generative capability, accuracy, and training scalability for hierarchical modeling.

Paper Figures

Representative figures from the paper: the model architecture, lineage inference case study, and tree-conditioned generation.

Method overview. The encoder, Hierarchical Tree Codebook, and diffusion decoder are optimized with SCL, HQL, and DDP losses.
Lineage case study (TreeVAE versus HDTree on C. elegans). HDTree produces lineage structures that align better with developmental-time ground truth than TreeVAE.
Generative validation. HDTree simulates transitions along learned paths on MNIST and C. elegans.

Representative Results

The paper evaluates tree performance, clustering performance, reconstruction, lineage alignment, ablations, and computational cost across general and single-cell datasets.

General Benchmarks

On MNIST, HDTree reports dendrogram purity (DP) 91.9, leaf purity (LP) 96.6, clustering accuracy (ACC) 96.6, and normalized mutual information (NMI) 92.4, while improving reconstruction metrics over TreeVAE.

Single-Cell Benchmarks

On Limb, HDTree reaches DP 41.0, LP 57.2, ACC 55.0, and NMI 46.6 under the paper's evaluation setting.

Lineage Ground Truth

On C. elegans, HDTree improves temporal-neighborhood consistency over TreeVAE by more than 4 points in later developmental windows.

Public Release

The public repository is a minimal reproducibility package for MNIST and Limb. Other paper datasets and lineage case studies are not included in this release.

Released checkpoint             | DP      | LP      | ACC     | NMI     | Notes
MNIST                           | 0.93262 | 0.97310 | 0.97310 | 0.92999 | Representative single-run checkpoint
MNIST reconstruction validation | 0.93454 | 0.97380 | 0.97380 | 0.93277 | 7000 validation samples, reconstruction loss 44.97935, log likelihood -13.98696
Limb                            | 0.41029 | 0.58370 | 0.52860 | 0.49042 | K=10, batch size 1000, exaggeration 0.5, nu 0.3

Reproduce

The README contains full setup, data placement, training, and checkpoint validation instructions.

pip install -r requirements.txt
scripts/smoke_test.sh
scripts/train_mnist.sh
scripts/train_limb.sh
pip install huggingface_hub
huggingface-cli download zelinzang/HDTree-ICML-checkpoints --local-dir .
scripts/validate_checkpoint.sh mnist checkpoints/mnist/hdtree_mnist_best_epoch59_acc0.97570.pth

Citation

Use the final proceedings citation when it becomes available.

@inproceedings{zang2026hdtree,
  title = {HDTree: Generative Modeling of Cellular Hierarchies for Robust Lineage Inference},
  author = {Zang, Zelin and Li, WenZhe and Xu, Yongjie and Yu, Chang and Chi, Changxi and Zhou, Jingbo and Lei, Zhen and Li, Stan Z.},
  booktitle = {International Conference on Machine Learning},
  year = {2026},
  url = {https://github.com/zangzelin/code_HDTree_icml}
}