ScHiCAtt: Enhancing Single-Cell Hi-C Data Resolution Using Attention-Based Models

The spatial organization of chromatin is fundamental to gene regulation and essential for proper cellular function. The Hi-C technique remains the leading method for unraveling 3D genome structures; however, limited resolution, data sparsity, and incomplete coverage in single-cell Hi-C data pose significant challenges for comprehensive analysis. We propose ScHiCAtt (Single-cell Hi-C Attention-Based Model), which leverages attention mechanisms to capture both long-range and local dependencies in Hi-C data, significantly enhancing resolution while preserving biologically meaningful interactions. By dynamically focusing on regions of interest, attention mechanisms effectively mitigate data sparsity and enhance model performance in low-resolution contexts. Extensive experiments on Human and Drosophila single-cell Hi-C data demonstrate that ScHiCAtt consistently outperforms existing methods in terms of computational and biological reproducibility metrics across various downsampling ratios. Our results also show superior generalization across different chromosomes of the same cell type, as well as across cell types, species, and from single-cell to bulk Hi-C data, highlighting the robustness and adaptability of our approach.
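To make the attention idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention applied to the rows of a toy contact patch. It is not the ScHiCAtt architecture: the function name `self_attention`, the random projection matrices, the patch size, and the Poisson-generated contacts are all illustrative assumptions. The sketch only shows how each genomic bin can weight every other bin in the patch, near or distant, when building its output representation.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the rows of x.

    x: (n_bins, d_in) array; each row is one genomic bin's contact profile.
    Returns (output, weights); weights[i, j] is how strongly bin i attends to bin j.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v, weights


rng = np.random.default_rng(0)
n_bins, d_model = 40, 16  # hypothetical patch size and projection width

# Toy sparse, symmetric contact patch standing in for a low-coverage scHi-C submatrix.
counts = rng.poisson(0.3, size=(n_bins, n_bins))
patch = np.triu(counts) + np.triu(counts, 1).T

x = np.log1p(patch.astype(float))  # log-transform the raw counts

# Random projections stand in for learned weights (assumption, untrained).
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(n_bins, d_model)) for _ in range(3))

out, attn = self_attention(x, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over all bins in the patch, so a trained model can assign weight to distant bins as readily as to neighboring ones, which is how attention can capture long-range and local dependencies together.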

Description of the Two Models

In this study, we developed two models to evaluate the generalization of scHi-C data enhancement across different chromosomes, cell types, and species.

Human Cell Model

The first model is trained on one human cell type, oligodendrocytes (ODC) from the human brain prefrontal cortex (PFC), and tested on different chromosomes of the same cell type, on a different human cell type, and on other species.

Training Data:

Drosophila Model

The second model is trained on Drosophila data and evaluates how well the scHi-C data enhancement approach generalizes across species, specifically from Drosophila to human.

Training Data: