TEAM: A MULTIPLE TESTING ALGORITHM ON THE AGGREGATION TREE FOR FLOW CYTOMETRY ANALYSIS

Authors

Pura, JA; Li, X; Chan, C; Xie, J

Abstract

In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to detect the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (PDFs), one before and one after the stimulus; the goal is to pinpoint the regions where these two PDFs differ. Further screening of these differential regions can be performed to identify enriched sets of responsive cells. In this paper we model the identification of differential density regions as a multiple testing problem. First, we partition the sample space into small bins. In each bin we form a hypothesis to test whether the two PDFs differ. Second, we develop a novel multiple testing method, called TEAM (testing on the aggregation tree method), to identify those bins that harbor differential PDFs while controlling the false discovery rate (FDR) at the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fine to coarse resolution. The procedure achieves the statistical goal of pinpointing density differences to the smallest possible regions. TEAM is computationally efficient and can analyze large flow cytometry data sets in much less time than competing methods. We applied TEAM and competing methods to a flow cytometry data set to identify T cells responsive to cytomegalovirus (CMV)-pp65 antigen stimulation. With additional downstream screening, TEAM successfully identified enriched sets containing monofunctional, bifunctional, and polyfunctional T cells. Competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM is asymptotically valid, powerful, and robust.
Overall, TEAM is a computationally efficient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.
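
The fine-to-coarse idea described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' TEAM procedure: the function names, the equal-count binning of the pooled sample, the one-sided binomial test, the per-layer Benjamini-Hochberg step, and the pairwise merging rule are all assumptions chosen for illustration, and applying Benjamini-Hochberg separately in each layer does not by itself guarantee overall FDR control.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvals, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            cutoff = rank
    return set(order[:cutoff])


def fine_to_coarse_scan(control, stimulated, bin_size=32, layers=3, alpha=0.05):
    """Hypothetical illustration: flag intervals enriched for stimulated cells.

    Returns a list of (low, high, layer) tuples.
    """
    # Pool both samples and sort; label 1 marks a stimulated observation.
    pooled = sorted([(x, 0) for x in control] + [(x, 1) for x in stimulated])
    p0 = len(stimulated) / len(pooled)  # expected stimulated share under the null

    # Layer 1: bins hold bin_size consecutive pooled observations
    # (half-open index ranges; a trailing partial bin is dropped).
    n_bins = len(pooled) // bin_size
    bins = [(b * bin_size, (b + 1) * bin_size) for b in range(n_bins)]

    flagged = []
    for layer in range(1, layers + 1):
        if not bins:
            break
        # One-sided test per bin: are there more stimulated cells than expected?
        pvals = []
        for lo, hi in bins:
            k = sum(label for _, label in pooled[lo:hi])
            pvals.append(binom_upper_tail(k, hi - lo, p0))
        rejected = benjamini_hochberg(pvals, alpha)
        for i, (lo, hi) in enumerate(bins):
            if i in rejected:
                flagged.append((pooled[lo][0], pooled[hi - 1][0], layer))

        # Aggregate: merge contiguous pairs of unrejected bins for the next,
        # coarser layer. A survivor with no contiguous partner is dropped here.
        survivors = [b for i, b in enumerate(bins) if i not in rejected]
        merged, i = [], 0
        while i + 1 < len(survivors):
            a, b = survivors[i], survivors[i + 1]
            if a[1] == b[0]:
                merged.append((a[0], b[1]))
                i += 2
            else:
                i += 1
        bins = merged
    return flagged


# Synthetic example: both samples are spread evenly over [0, 1), and the
# stimulated sample has 300 extra observations packed into [0.50, 0.52).
control = [i / 1000 for i in range(1000)]
stimulated = [(i + 0.5) / 1000 for i in range(1000)]
stimulated += [0.5 + 0.02 * (j + 0.5) / 300 for j in range(300)]
flagged = fine_to_coarse_scan(control, stimulated)
```

In this synthetic example the finest layer already isolates bins inside the enriched interval, so the coarser layers only revisit regions where no difference was found at the finer resolution. A faithful implementation would replace the per-layer threshold with the one derived in the paper.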