Acute myeloid leukemia xenograft success prediction: Saving time

HIGHLIGHTS • Extracellular phenotype, apoptosis, or cell cycle profile at thawing cannot predict acute myeloid leukemia (AML) xenograft potential.• Leukemic long-term culture-initiating cell content distinguishes fast and delayed engraftment potential.• Defined here is a 1-week carboxyfluorescein succinimidyl ester-based assay that predicts AML xenograft success.

The in vivo xenotransplantation assay in NOD/SCID IL-2Rγ common chain null (NSG) mice is currently the model most frequently used to study the biology of leukemiainitiating cells (LICs); however, a substantial proportion of samples from patients with acute myeloid leukemia (AML) with a good prognosis fail to engraft in mice. Other newly described humanized mouse models such as NSG-SG3M and MISRTG mice might improve such sample engraftment [1]. Yet, we recently evidenced the extinction of myelodysplastic syndrome propagating cells (MDS-PCs) using NSG-SG3M mice, which suggests that human cytokine stimulation might exhaust the LIC compartment of particular leukemias [2]. Alternatively, we found that subcutaneous implantation of gelatin sponges seeded with human stromal cells allows engraftment of good-risk AML in NSG mice. However, as observed by others using subcutaneous humanized ossicles, these ectopic leukemic grafts do not invade recipient bone marrow [3][4][5]. Because all these models are either not fully characterized or not fully available, the straightforward intravenous NSG model is still the most commonly used model. Here we further investigated xenograft failure in this model and developed a flow cytometry-based assay that allow prediction of the xenograft potential of a noncharacterized AML sample.

Cells
AML cells were obtained after receipt of informed consent from St Bartholomew's Hospital. Details of the patient samples are listed in Supplementary Table E1 (online only, available at www.exphem.org). Co-culture experiments were previously described [6]. AML samples were collected at diagnosis, and mononucleated cells were isolated within 24 hours after collection by Ficoll-Paque Plus density gradient (GE Healthcare, France). Cord blood (CB) cells were obtained after receipt of informed consent from the Royal Free Hospital (UK). Both AML and CB sample collections were approved by the East London ethical committee and in accordance with the Declaration of Helsinki. Three to 5 different CB samples were pooled, and mononuclear cells were obtained by density centrifugation. Lineage markers expressing cells were depleted using StemSep columns and human progenitor enrichment cocktail (StemCell Technologies, Vancouver, BC, Canada). CD34 + CD38 − cells (hematopoietic stem progenitor cells [HSPCs]) and CD34 + CD38 + cells (hematopoietic progenitor cells [HPCs]) were sorted on a MoFlo cell sorter (DakoCytomation Colorado, Fort Collins, CO, USA) or a BD FACS Aria (BD Biosciences, UK). Gates were set up to exclude nonviable cells and debris. Briefly, lineagedepleted recovered cells were washed twice and stained with anti-CD34 Percp, anti-CD38 PE-cy7, AlexaFluor647-conjugated Annexin-V (Invitrogen), and DAPI (4',6-diamidino-2-phenylindole). The purity of sorted fractions was assessed to ensure the sort quality. The stromal cell line mesenchymal MS-5 and the human osteosarcoma cell line SaOS-2 were obtained from the DSMZ cell bank (Braunschweig Germany) and maintained in Iscove's modified Dulbecco's medium (IMDM) containing 10% fetal calf serum (FCS) + 2 mmol/L L-glutamine or in McCoy's 5a medium containing 15% FCS + 2 mmol/L L-glutamine, respectively. Human umbilical vein endothelial cells (HUVECs) obtained from Lonza were propagated in endothelial growth medium-2, EGM-2-MV (Lonza, UK) in culture dishes coated with type I collagen (StemCell Technologies). MS-5, SaOS-2, and HUVEC feeders were cultured in their respective media and subcultured when reaching 80% confluence. Sca-1, CD56, and CD31 were identified as specific markers for 100% of MS-5, SaOS-2, and HUVEC, respectively, and used for feeder exclusion in fluorescence-activated cell sorting (FACS) analysis. All three antibodies were from BD Pharmingen (Oxford Science Park, UK).

Adoptive transfer of human hematopoietic cells in immunodeficient mice
NOD/SCID (NS) and NSG mice were a kind gift of Dr. Leonard Shultz. All animal experiments were performed in accordance with Home Office and CRUK guidelines. Adult NS or NSG mice were injected intravenously with 10 7 T-depleted mononuclear AML cells.
In the current study, we define nonengrafter [NE] samples as samples for which 10 7 CD3 + -depleted AML MNCs injected cells were not able to engraft at a detectable level (cutoff: 0.1%) of human myeloidonly leukemic population CD45 + CD33 + CD19 − and murine CD45 − , 12 weeks after injection into NSG mice. For newborn xenograft, 2.5 to 10 × 10 6 AML cells were injected into day 2 neonate NSG mice via an intrahepatic (IH) or facial (FV) vein. For some engrafter (E) samples, LIC and non-LIC phenotypes were functionally defined by xenograft experiments with sorted subpopulations (Supplementary Figure E1, online only, available at www.exphem.org). Mouse bone marrow cells were collected and analyzed by flow cytometry as detailed previously [7,8].
Long-term culture Co-culture experiments were performed as previously described [9,10] as bulk culture or using a limiting dilution analysis (LDA) both on confluent monolayers of MS-5, supplemented with recombinant human interleukin (IL)-3, granulocyte colony-stimulating factor (G-CSF), and TPO (MS-5 + 3GT) (20 ng/ml each; Peprotech, London, U.K., http://www.peprotech.com) in MyeloCult H5100 (StemCell Technologies, Vancouver, BC, Canada, http://www.stemcell.com). Cells were cultured at 37°C in 5% CO2-humidified incubators. Cells were plated in 20 replicates in 96-well microplates containing confluent MS-5 monolayers. Half the medium was done twice a week without disrupting the established feeders. After 5 weeks, LTC medium was replaced by methylcellulose H4435 (StemCell Technologies). After an additional 2 weeks, each well was scored as negative if no colonies were present. To determine the frequency of leukemia long-term culture-initiating cells (L-LTC-ICs), LDA was calculated using LCalc software (StemCell Technologies) according to Poisson statistics and the method of maximum likelihood.
Fluorescence dilution factor AML cells (1 to 10 × 10 5 ) were thawed and stained with 0.8 µmol/L carboxyfluorescein diacetate succinimidyl ester (CFSE) (Invitrogen, UK) for 10 min at 37°C in phosphate-buffered saline (PBS). Cells were washed twice and incubated in duplicate on pre-established confluent MS-5 (or SAOS-2 or HUVEC) layers at 37°C in Myelocult medium (StemCell Technologies) without cytokine supplementation. Cells were incubated 18 hours before assessing the initial input CFSE fluorescence intensity to allow for the turnover of CFSElabeled proteins to stabilize [11]. After 18 hours and 1 week, wells were harvested by trypsinization and stained with anti-Sca-1-PE (for MS-5 staining), or anti-CD56-PE (for SaOS-2 staining), or anti-CD31-PE (for HUVEC staining) and anti-CD45-APC-Cy7 antibodies (BD Pharmingen). The CFSE median fluorescence index (MFI) was measured by FACS at 18 hours and day 7 on viable (Annexin V and DAPI negative) human hematopoietic cells (CD45 positive and Sca-1 negative). FDF was defined as the ratio of the 18-hour CFSE MFI to the 1-week CFSE MFI. AML heterogeneity evidenced through FACS scattered light measurements meant that the width of the labeled input population exceeded the limits for peak resolution, even under optimal instrument conditions [12,13]. As we reported previously [10], larger cells labeled more brightly with CFSE compared with smaller blasts within the same sample. Consequently, as the cells divide, the width of each division peak overlaps heavily with previous and subsequent peaks, preventing accurate peak resolution. Instead, we defined the fluorescence dilution factor (FDF) as the ratio of the 18-hour CFSE MFI to that of the D7 CFSE MFI. The CFSE MFI was measured on viable (Annexin V-Alexa Fluor 647 and DAPI negative) human hematopoietic cells (CD45-APC-Cy7 positive and Sca-1-PE or CD56-PE or CD31-PE negative, excluding residual normal lymphocytes CD45 high /SCC low for all analyses. At day 7, the same procedure was applied. Cytometric calibration was controlled using CountBright beads. Flow cytometry analysis FACS analysis was performed with a BD LSRII flow cytometer (BD Biosciences, UK). After 1 to 5 weeks of co-culture, nonadherent and adherent cells were harvested through trypsinization. Recovered cells were resuspended and stained in Annexin binding buffer (BD Biosciences) and with anti-SCA-1-PE or anti-CD56-PE or anti-CD31-PE, and anti-CD45-APC-Cy7, anti-CD34-Percp, anti-CD38-PE-Cy7 antibodies, and Lin-FITC (BD Biosciences), as well as with AlexaFluor647-conjugated Annexin-V (Invitrogen) and DAPI (Sigma). Only viable (both DAPI and Annexin-V negative fraction) human hematopoietic cells (CD45-APC-Cy7 positive and Sca-1-PE or CD56-PE or CD31-PE negative) were assessed for all analyses. Cell viability was expressed as the percentage of DAPIand Annexin-V-negative cells within human hematopoietic cells.
Cell cycle analysis. Cells were fixed in 2% paraformaldehyde for 10 min, permeabilized with 0.1% Triton X-100 for 10 min, resuspended in 500 µL of PBS with 2% FCS and 1 µg/mL DAPI for 15 minutes, and processed for cell cycle analysis on an LSRII analyzer (BD Pharmingen). Cells in G1 were those with 2N DNA content and Ki-67-positive cells, cells in S were those with a DNA content ranging from 2N to 4N, and cells in G2/M were those with a 4N DNA content.
Fold expansion. After 1 to 5 weeks of co-culture, well content was harvested by trypsinization and stained, and CountBright absolute counting beads were used to assess by FACS the total number of viable human hematopoietic cells following the manufacturer's recommendations. For LTC, no accumulative count was applied because hematopoietic cell adhesions vary from one feeder to another and no precise semipopulation could be operated.

Statistics
Data were analyzed for statistical significance using the nonparametric unpaired Mann-Whitney or paired two-tail test. The Kaplan-Meier method was used to establish patients' overall survival curves, and the Mantel-Cox log rank test was used to assign a significance level. Observed differences were regarded as statistically significant if the calculated two-sided p value was <0.05.

Mathematical model
Engraftment status was encoded as a binary variable and a logistic predictive model was built for this variable based on the interaction of FDF and viability. More precisely, R function generalized linear models (GLMs) was used to fit a logistic model with family binomial and predictive formulas. Engraftment status = viability + FDF/viability. The test reveals that use of the full model (i.e., including a separate index term) did not have a significant effect on the resultant model (as assessed by the R function analysis of variance) and so was omitted. The empirical false detection rate for various predictive thresholds was calculated by explicitly calculating the number of false predictions made when applying the model to the observed data. All calculations were performed using R 3.0.1.

Results and Discussion
We analyzed 45 AML samples, 18 of which (40%) could not give rise to leukemic engraftment when injected intravenously into NOD/SCID mice (NS), β2-microglobulin-deficient (B2m(null)) NS or NSG mice [8,14] (Supplementary Table E1, online only, available at www.exphem.org). As reported before by our group [8], we confirm, in this new cohort of patients, a strong negative correlation between potential and patient survival (Supplementary Figure E2A). The newborn NSG model, although described as an even more permissive xenograft assay [15], revealed itself helpless because none of the five nonengrafting (NE) samples tested in this model produced a detectable graft (Supplementary Figure E2B). We first wondered whether intrinsic differences between NE and engrafter (E) mice might be identified to explain engraftment success or failure. We tested the hypothesis of a lower frequency of LICs in the NE samples; however, no differences were observed phenotypically in terms of the proportion of CD34 + or CD34 + CD38 − cells when comparing 27 E and 18 NE mice (Supplementary Table E1). Also, similar heterogeneous viability and a similar mean percentage of cells in the G0 or G2/M phases were quantified in both the E and NE groups (Supplementary Figure E1). We questioned if analyzing the bulk AML population might conceal the actual LIC signal. To test this hypothesis, we compared apoptosis level and cell cycle profile in the same sample, between LICs and bulk AML of E samples. LIC phenotype was determined for 10 E samples by assessing the xenograft potential of different sorted fractions (Supplementary Table E1; Supplementary Figure E3, online only, available at www.exphem.org). Heterogeneous LIC phenotype was observed, confirming already published data [7,16,17]. Two samples were even censored because xenograft potential was found in all cytometry quadrants defined by CD34 and CD38 expression. For the 8 remaining samples, the apoptosis level or cell cycle profile was very similar for the LIC enclosing sub-compartment and the entire sample population, suggesting those parameters are surprisingly shared between LICs and the non-LICs (Supplementary Figure E1). When combined, these data suggested that cell phenotype, viability, or cell cycle profiling markers after thawing cannot account for the engraftment defect in NE samples.
We recently determined that good-risk AML patient samples at diagnosis have a five to seven times lower frequency of leukemic long-term culture-initiating cells (L-LTC-ICs) compared with intermediate-and poor-risk AML samples [10]. Long-term cultures revealed that NE samples also have a lower content of L-LTC-ICs. Consequently, 8.41 times more NE cells were required at day 0 to obtain positive wells at week 5 as compared with E samples (Fig. 1A, p < 0.05). NE samples also yielded, on average, 2.7 times fewer leukemic cells than E samples at week 5 (Fig. 1B, p < 0.05). Plotting L-LTC-IC frequency against fold expansion, determined at week 5, highlighted the tight relationship between these two parameters and allowed differentiation of the low-expanding, L-LTC-IC-impoverished NE samples from the high-expanding, L-LTC-IC-enriched E samples (Fig. 1C). Favorable-risk leukemic cells were recently reported to require up to 1 year to establish a detectable graft in recipients [6]. Effectively, two NE mice that failed to engraft in NS or NSG mice at 12 weeks were also investigated at 30 or 40 weeks. For the two samples, a distinct leukemic engraftment was seen at weeks 30 and 40, respectively (Fig. 1D). Thus, conventional intravenous inoculation distinguishes short-and long-latency engrafter mice, rather than discern absolute engraftment potential. However, we cannot exclude the acquisition of an addition mutation(s) over this duration nor that the leukemic cells might have modified the bone marrow microenvironment to reach leukemia onset [18,19].
To investigate further the dynamics of leukemic stem/ progenitors ex vivo, we next exploited the newly defined fluorescence dilution factor (FDF) using carboxyfluorescein succinimidyl ester (CFSE) staining to track cell division over 7 days [10]. This parameter, which has a value ≥1, measures the median dye dilution of CFSE for the nonapoptotic leukemic cell population. We previously found that the FDF strongly correlates with leukemic stem/progenitor enrichment in the sample and identified that NE samples have a lower FDF compared with E samples (p < 0.001). We further state here that the mean FDF values for the E and NE groups are not influenced by co-culture of the samples with human osteoblastic or endothelial cells. In contrast, sorted normal hematopoietic stem/progenitor cells (HSPCs) or normal hematopoietic progenitor cells (HPCs) clearly had a lower FDF when cultured with osteoblastic SaOS-2 cells and a higher FDF when cultured with endothelial human umbilical vein endothelial cells (HUVECs) as compared with the MS-5 coculture system ( Fig. 2A, p < 0.001). These new results suggest that primary AML expansion ex vivo is self-controlled and that its measurement can distinguish E and NE samples.
Lastly, we wonder if the CFSE-based assay could be used to predict xenograft success. When FDF was plotted as a function of total leukemic cell viability determined on day 7, we discovered a distinct pattern between 19 E and 14 NE samples. In the NE samples, lower proliferation was associated with lower viability, and contrarily, in the E samples, increased proliferation was associated with higher viability (Fig. 2B). Of note, the viability after 1 week or the fold expansion determined by the cell count did not differentiate E and NE samples regardless of the co-culture system used (Supplementary Figure E4, online only, available at www.exphem.org), which further indicates the utility of FDF as a predictive variable. With use of the observed data, a logistic regression model was constructed and used to generate a single predictive parameter (Ɵ) for any given data set, with prediction thresholds calculated from the observed data: . .
The nonengrafter prediction tool implementing this equation in Excel is available in Supplementary Table E2 (online only, available at www.exphem.org). If θ > 0.6363636 then the sample status is predicted to be "engrafter." If θ < 0.3535354, then status is predicted to be "nonengrafter." If 0.3535354 ≤ θ ≤ 0.6363636, then no prediction is made. For this training cohort, this modeling forecast engraftment status for 80% of the cases with 90% exactness independent of prognostic group (Fig. 2C). To validate our modeling, we wonder if this engraftment prediction would be as efficient as the actual tested patient derived xenograft status in predicting patient outcome (illustrated in Supplementary  Figure E2A). FDF and viability after 1 week were determined on another cohort of 53 patient samples. A prediction was made using the above equation at the 0.05 FDR level for 70% of the sample. No prediction could be made for 16 samples. For the remaining samples, the observed patient survival times, upon the E and NE predictions, resulted in a  significant difference in median survival (p < 0.026 level) (twosided Wilcoxon test, Fig. 2D). This result is evidences of the reliability of our prediction calculator in stratifying AML aggressiveness. Our results indicate that the initial size of the leukemic stem/progenitor compartment, as well as the ex vivo cycling behavior, correlates with the xenograft potential of the sample. Thus, the actual stemness of the cycling subpopulation identified ex vivo should be further investigated, notably through their transcriptome analysis using the leukemia stem cell gene signature defined recently [20,21]. This could resolve whether the divisions observed are related to self-renewal potential. Alternatively, good outcome-related NE samples might also undergo more differentiation than poor outcome-related E samples. However, based on our recent analysis, using similar ex vivo culture conditions, we did not observe any increase in differentiation markers [22]. Furthermore, the fact that NE could engraft in NSG mice by extending the observation period (Fig. 1D) [6] suggests that these samples are not simply more differentiated than E samples.
Although a successful xenograft can be anticipated from poor-prognosis and good-prognosis patient samples, our technique should be especially helpful in including intermediaterisk samples, which represent the majority of samples, and in maintaining the standard experimental duration at 12 weeks. Furthermore, among the 3Rs Principles of Humane Experimental Technique by Russell and Burch (replacement, reduction, and refinement) our methods and the Ɵ value should serve for the reduction and refinement principles, by minimizing the number of animals used for preselecting E samples and by avoiding unnecessary failed xenografts with NE samples.