pfizer-opensource/phenotype_reversion

Daniel R. Wong et al.

correspondence: daniel.wong@pfizer.com

Machine Learning and Computational Sciences, Pfizer Worldwide Research Development and Medical, 610 Main Street, Cambridge, Massachusetts 02139, USA

Repository for the manuscript entitled 'Phenotypic reversion and target prioritization for cellular inflammation via representation learning with foundation models'

Dataset

The dataset can be downloaded from Zenodo (DOI: 10.5281/zenodo.18792213).

Place all the files from that data repository in your local project directory: phenotype_reversion/data/

For convenience, the single cell embeddings as well as UMAP reductions of those embeddings for the different single cell foundation models have also been provided in the same data repository.

imru_full.h5ad contains the transcript count matrix. 

The 'gene_target' column gives the gene that was knocked down via CRISPRi, while the 'condition' column indicates whether the cells were treated with IL-1B & TNFa. If the cytokines were introduced, condition is 'Treated'; otherwise it is 'Untreated'.

The controls in the 'gene_target' column are denoted as no target ('NO-TARGET') or safe target ('SAFE_TARGET').

.obs contains keys: ['condition', 'cellID', 'cell_treat', 'n_genes', 'nGene', 'nUMI', 'log10GenesPerUMI', 'mitoRatio', 'gene_target', 'guide', 'welltag', 'flask']
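
As a minimal sketch of how the 'condition' and 'gene_target' columns can be used together, the snippet below counts control and perturbed cells per condition. It assumes pandas is installed; the function name summarize_obs and the toy table are hypothetical, and the control labels are copied as spelled in this README, so check them against the actual file.

```python
import pandas as pd

# Control labels as spelled in this README; verify against the real .obs.
CONTROL_LABELS = ("NO-TARGET", "SAFE_TARGET")


def summarize_obs(obs: pd.DataFrame) -> pd.DataFrame:
    """Count control and perturbed cells for each treatment condition."""
    is_control = obs["gene_target"].isin(CONTROL_LABELS)
    group = is_control.map({True: "control", False: "perturbed"})
    return pd.crosstab(obs["condition"], group)


# Toy stand-in for adata.obs (illustrative rows, not real data).
toy_obs = pd.DataFrame(
    {
        "condition": ["Treated", "Treated", "Untreated", "Untreated", "Untreated"],
        "gene_target": ["NO-TARGET", "EMCN", "SAFE_TARGET", "VWF", "VWF"],
    }
)
summary = summarize_obs(toy_obs)
print(summary)
```

With the real data, the same function could be applied to anndata.read_h5ad("phenotype_reversion/data/imru_full.h5ad").obs, assuming the anndata package is available in your environment.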

The treated and untreated conditions share the same controls and roughly the same set of perturbations, with these exceptions:

Treated has unique perturbations: ['SH3PXD2A', 'EMCN', 'RAB11FIP3', 'CKAP5', 'POLR2K']

Untreated has unique perturbations: ['VWF', 'HNRNPA3', 'TRIM13', 'CORO1C', 'DDX27', 'PACSIN2'] 
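
The condition-specific perturbations listed above could be recomputed with a simple set difference. This is a sketch only: the helper name condition_specific_targets is hypothetical, and the toy pairs stand in for the (condition, gene_target) values found in .obs.

```python
from collections import defaultdict


def condition_specific_targets(pairs):
    """Map each condition to the gene targets seen only under that condition."""
    targets = defaultdict(set)
    for condition, gene_target in pairs:
        targets[condition].add(gene_target)

    unique = {}
    for condition, genes in targets.items():
        # Union of the targets observed under every other condition.
        others = set().union(
            *(g for c, g in targets.items() if c != condition)
        )
        unique[condition] = sorted(genes - others)
    return unique


# Toy (condition, gene_target) pairs, not real data.
toy_pairs = [
    ("Treated", "EMCN"),
    ("Treated", "NO-TARGET"),
    ("Untreated", "VWF"),
    ("Untreated", "NO-TARGET"),
]
unique_targets = condition_specific_targets(toy_pairs)
print(unique_targets)
```

On the real data, the pairs could be built with zip(adata.obs['condition'], adata.obs['gene_target']).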

For all *processed.h5ad files, adata.X contains the embedding as a NumPy array and adata.obsm['umap'] contains the coordinates of UMAP applied to that embedding, i.e. adata.obsm['umap'] = UMAP(adata.X).
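
A processed file could be read and sanity-checked roughly as follows. This is a sketch under assumptions: load_processed and check_alignment are hypothetical helper names, the path is supplied by the caller, and anndata is assumed to be installed (it is imported lazily so the check itself only needs NumPy).

```python
import numpy as np


def load_processed(path):
    """Return (embedding, umap_coords) from a *processed.h5ad file."""
    import anndata as ad  # lazy import; only needed when reading a file

    adata = ad.read_h5ad(path)
    # Per the README, .X holds the embedding and .obsm['umap'] its UMAP.
    return np.asarray(adata.X), np.asarray(adata.obsm["umap"])


def check_alignment(embedding, umap_coords):
    """Verify both arrays are 2-D with one row per cell; return the cell count."""
    if embedding.ndim != 2 or umap_coords.ndim != 2:
        raise ValueError("expected 2-D arrays")
    if embedding.shape[0] != umap_coords.shape[0]:
        raise ValueError("embedding and UMAP have different numbers of cells")
    return embedding.shape[0]


# Toy arrays standing in for a loaded file (shapes are illustrative).
toy_embedding = np.zeros((4, 16))
toy_umap = np.zeros((4, 2))
n_cells = check_alignment(toy_embedding, toy_umap)
print(n_cells)
```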

Software Installation

We provide a .yml file called perturb_foundation.yml with all necessary dependencies to perform the analyses. Create the environment with:

conda env create -f perturb_foundation.yml -n perturb_foundation

Analyses

Run parallel_eval.sh to perform all analyses and replicate the findings presented in the paper.

analysis.py - creates initial plots of each method's embedding space and computes the target prioritizations for that method.

misc.py - helper functions that set up the ChatGPT and DE approach rankings.

downstream.py - runs Enrichr, GSEA, and other method-agnostic analyses after analysis.py completes.

About

repository for MLCS + IMRU publication on phenotype reversion with single cell foundation models
