Table 1. Number of features in each block before and after preprocessing.

Block | Before preprocessing | After preprocessing |
---|---|---|
Metabolome | 865 | 854 |
Microbiome | 634 | 180 |
Clinical | 21 | 21 |
1 Introduction
Diabetes affects over 37 million Americans and one in five veterans. Diabetics are prone to the development of non-healing wounds on their feet, which can often lead to lower-limb amputation. While much is known about the wound microbiome, very few studies have investigated the interplay between microbes and metabolites in the diabetic wound microenvironment. Further, studies integrating omics-level microbiome and metabolome datasets are largely non-existent. The present work aims to address this deficit by bringing together two -omics data blocks (microbiome and metabolome) with a clinical block to predict whether or not wounds heal.
Two analytic objectives will be addressed. The primary objective (see Section 2.4 and Section 3) is to see whether a wound healing signature can be detected that integrates across data blocks. The secondary objective (see Section 2.3) is to explore whether the analysis might benefit from a multilevel approach.
2 Methods
2.1 Data
Forty-five debridement samples were collected from 13 patients during the normal course of wound treatment at the Boise Veterans Affairs Medical Center. 16S rRNA sequencing and ultra-high-performance liquid chromatography/tandem accurate mass spectrometry were then used to characterize wound microbiomes and metabolomes, respectively. Clinical data were extracted from patients’ medical records. The outcome measure was whether the sample was taken from a wound that failed to heal (non-healing) or from a wound that progressed to healing and remained closed for more than 30 days (healing).
2.2 Preprocessing
Data from the microbiome and metabolome blocks were processed using procedures appropriate for those data types.
The following additional steps were taken prior to modeling: features with near-zero variance were excluded from all three blocks (microbiome, metabolome, clinical); features in the microbiome block with OTU counts less than 1% of the total were excluded; features in the microbiome block underwent the centered log-ratio transformation (Lê Cao et al., 2016); and features in all blocks were standardized to zero mean and unit variance (Lê Cao & Welham, 2022). Table 1 summarizes the number of features in each block before and after preprocessing.
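As a minimal illustration of the transformation and scaling steps, the Python sketch below applies a centered log-ratio transform to one sample's OTU counts and standardizes one feature column. It is not the code used for this analysis. The pseudocount `offset` is an assumption (the text does not say how zero counts were handled), and the input values are made up.

```python
import math

def clr(counts, offset=1.0):
    """Centered log-ratio transform of one sample's OTU counts.

    `offset` is a hypothetical pseudocount that avoids log(0);
    the source does not state which offset, if any, was used.
    """
    logs = [math.log(c + offset) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [v - mean_log for v in logs]

def standardize(column):
    """Scale one feature column to zero mean and unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / (n - 1)
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in column]

# Made-up inputs, for illustration only.
sample_counts = [120, 30, 0, 850]
clr_values = clr(sample_counts)
scaled = standardize([2.0, 4.0, 6.0, 8.0])
```

By construction, the transformed values of a sample sum to zero, which is why the transformation is described as centered.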
2.3 Multilevel decomposition
Data in this project are multilevel: samples are nested within patients. To determine whether discrimination between healers and non-healers might benefit from a multilevel approach, two PCAs were conducted on each of the -omics data blocks: one without and one with multilevel decomposition (Liquet et al., 2012). The results are shown in Figure 1 and Figure 2, where the plotted numbers are patient identifiers. In neither block does multilevel decomposition appear to reduce clustering by patient or improve clustering by healing status. This is true for the metabolome data even when outliers are removed. The overall pattern suggests that samples from the same patient are not strongly correlated. Consequently, the multiblock model outlined below does not employ multilevel decomposition.
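The idea behind multilevel decomposition can be sketched as removing each patient's own mean from that patient's samples, so that only within-patient variation remains. The Python sketch below does this for a single feature. It is a simplified illustration under that assumption, not the procedure of Liquet et al. (2012) as implemented in any package, and the values and patient labels are made up.

```python
from collections import defaultdict

def within_subject_deviation(values, patient_ids):
    """Subtract each patient's mean from that patient's samples (one feature)."""
    groups = defaultdict(list)
    for v, pid in zip(values, patient_ids):
        groups[pid].append(v)
    means = {pid: sum(vs) / len(vs) for pid, vs in groups.items()}
    return [v - means[pid] for v, pid in zip(values, patient_ids)]

# Made-up feature values for five samples from three hypothetical patients.
values = [5.0, 7.0, 1.0, 3.0, 10.0]
patients = ["p1", "p1", "p2", "p2", "p3"]
within = within_subject_deviation(values, patients)
```

A patient who contributes a single sample, such as `p3` here, has a within-patient deviation of zero for every feature.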
2.4 Multiblock model
The healing versus non-healing outcome was modeled as a function of three data blocks (metabolome, microbiome, and clinical) using multiblock sparse partial least squares discriminant analysis (sPLS-DA; Lê Cao & Welham, 2022; R Core Team, 2024; Rohart et al., 2017; Singh et al., 2019). The optimal number of components per block was determined by fitting an sPLS-DA model to each block individually and using seven-fold cross-validation with 100 repeats to select the number of components associated with the lowest balanced error rate. For the metabolome and microbiome blocks the optimal number of components was four; for the clinical block it was two. Based on these results, two components were used in the multiblock model. While not ideal for the -omics blocks, the smaller number of components allowed for a simpler solution and acted as a hedge against overfitting.
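For reference, the balanced error rate is the average of the per-class error rates, so the smaller class counts as much as the larger one. A minimal Python sketch follows; the labels and predictions are made up and do not come from the study data.

```python
def balanced_error_rate(y_true, y_pred):
    """Average of the per-class misclassification rates."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        per_class.append(wrong / len(idx))
    return sum(per_class) / len(per_class)

# Made-up example: 4 healing and 6 non-healing samples, one healing sample missed.
y_true = ["healing"] * 4 + ["non-healing"] * 6
y_pred = ["healing", "healing", "healing", "non-healing"] + ["non-healing"] * 6
ber = balanced_error_rate(y_true, y_pred)
```

In this example the overall error rate is 1/10, whereas the balanced error rate is (1/4 + 0)/2 = 0.125, because the single error falls in the smaller class.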
The value of the between-block weights in the design matrix was set to 0.1. This prioritized predictive accuracy while still allowing the model to learn correlations between data blocks.
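A between-block design of this kind can be sketched as a square matrix with one row and one column per block. The Python sketch below builds it as a nested dictionary. Setting the diagonal to zero is an assumption about the usual convention, and `make_design` is a hypothetical helper rather than a function from any package.

```python
def make_design(blocks, weight=0.1):
    """Square between-block design: `weight` off the diagonal, 0 on it."""
    return {a: {b: (0.0 if a == b else weight) for b in blocks} for a in blocks}

design = make_design(["metabolome", "microbiome", "clinical"])
for name, row in design.items():
    print(name, row)
```

Weights near zero emphasize discrimination of the outcome, while weights near one emphasize agreement between blocks.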
Finally, seven-fold cross-validation with 100 repeats was used to select the number of features per component and block associated with the lowest balanced error rate. The tuning grid was constructed in the same way for every block, running from one feature to half the number of features available in the block after preprocessing (see Table 1). The resulting numbers of selected features are displayed in Table 2.
Table 2. Number of features selected in each block.

Block | Selected features |
---|---|
Metabolome | 420 |
Microbiome | 91 |
Clinical | 16 |
Total | 527 |
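The tuning grid described in Section 2.4 can be sketched as follows. The text gives only the endpoints of the grid (one feature, and half the features remaining after preprocessing), so the number of grid points (`n_points`) and their even spacing are assumptions made for illustration. The block sizes are taken from Table 1.

```python
def tuning_grid(n_features, n_points=5):
    """Evenly spaced candidate feature counts from 1 to n_features // 2.

    `n_points` and the even spacing are hypothetical; the source states
    only the endpoints of the grid.
    """
    upper = max(1, n_features // 2)
    if n_points < 2 or upper == 1:
        return [1]
    steps = n_points - 1
    return sorted({1 + (i * (upper - 1)) // steps for i in range(n_points)})

# Features remaining after preprocessing (Table 1).
block_sizes = {"metabolome": 854, "microbiome": 180, "clinical": 21}
grids = {block: tuning_grid(n) for block, n in block_sizes.items()}
```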
3 Results
Final model performance was assessed using seven-fold cross-validation with 100 repeats. This gave classification error rates of 8.39% when only component one was considered and 5.38% when both components were considered. These numbers indicate good performance: with both components, roughly 19 out of every 20 samples were correctly classified. Moreover, the small decrease in error from one to two components suggests that adding further components would lead to only marginal improvements.
Panel B of Figure 3 summarizes the multi-omic signature the model learned. This representation plots each of the 527 selected features according to its correlation with components one and two of its data block. Features positioned closer to the outer dashed circle play a larger role in predicting the outcome. The predictive multiblock signature can be read off by considering the position of each feature relative to the others (see panel A of Figure 3 for a primer). The most prominent part of the signature is arrayed along the horizontal. In particular, Enterococcus is highly negatively correlated with component one, and Methylobacterium shows a more modest, positive correlation with component one. These two microbiome features have mirror-image relationships with the clusters of metabolite features on the left and right of the figure: Enterococcus is positively correlated with the metabolite features on the left and negatively correlated with those on the right, while Methylobacterium is positively correlated with the metabolite features on the right and negatively correlated with those on the left. The second part of the signature is arrayed along the vertical: tryptamine and a group of ceramides are negatively correlated with the microbiome features towards the bottom of the figure and positively correlated with the microbiome features towards the top.
Figure 4 offers an alternative representation of the multi-omic signature. Rather than showing all features selected into the model (as in Panel B of Figure 3), Figure 4 displays only features whose correlation with at least one other feature exceeds 0.38 in absolute value. The two parts of the signature are now represented as separate clusters: the larger contains Enterococcus and Methylobacterium, and the smaller centers on tryptamine and the ceramides. Note that even though Methylobacterium is portrayed as having only a small number of connections to the nearby metabolome features, it actually connects to every feature that Enterococcus does. These associations are not shown in Figure 4, however, because they do not meet the ±0.38 cutoff.
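The cutoff rule can be sketched in a few lines of Python: a feature is retained if at least one of its pairwise correlations is larger than 0.38 in absolute value. The feature names and correlation values below are made up, and `features_passing_cutoff` is a hypothetical helper, not the routine that produced Figure 4.

```python
def features_passing_cutoff(corr, cutoff=0.38):
    """Return features with at least one pairwise |correlation| above `cutoff`.

    `corr` maps (feature_a, feature_b) pairs to a correlation value.
    """
    keep = set()
    for (a, b), r in corr.items():
        if a != b and abs(r) > cutoff:
            keep.update((a, b))
    return sorted(keep)

# Made-up correlations between hypothetical OTU and metabolite features.
corr = {
    ("otu_a", "metab_x"): 0.52,
    ("otu_a", "metab_y"): -0.41,
    ("otu_b", "metab_x"): 0.20,
    ("otu_b", "metab_y"): -0.38,
}
kept = features_passing_cutoff(corr)
```

Here `otu_b` is dropped even though it is correlated with both metabolites, which mirrors the situation in which real associations fall just short of the display threshold.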
3.1 Enterococcus & Methylobacterium
Figure 5 provides a fuller view of the way in which Enterococcus and Methylobacterium combine with other model features to predict healing. Panel A is a heat map of correlations between 20 metabolite and clinical features (y-axis) and Enterococcus and Methylobacterium (x-axis). Features with higher expression in healers are printed in bold. The y-axis features are those with the ten most positive and the ten most negative correlations with Enterococcus. Panel B positions all metabolite and clinical features in a two-dimensional space defined by their correlations with Enterococcus (x-axis) and Methylobacterium (y-axis). This clearly demonstrates the mirror-image relationship that the two OTUs have with other model features.
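The selection rule for the y-axis features of Panel A amounts to ranking features by their correlation with a reference feature and taking both ends of the ranking. A minimal Python sketch, using made-up feature names and values and a smaller `k` than the ten used for the figure:

```python
def extreme_features(corr_with_ref, k=10):
    """Return the k most negative and k most positive features.

    `corr_with_ref` maps feature name -> correlation with a reference feature.
    The two lists do not overlap when there are at least 2 * k features.
    """
    ranked = sorted(corr_with_ref, key=corr_with_ref.get)
    most_negative = ranked[:k]
    most_positive = ranked[-k:][::-1]
    return most_negative, most_positive

# Made-up correlations with a hypothetical reference OTU.
corr_with_ref = {"f1": 0.9, "f2": -0.7, "f3": 0.1, "f4": -0.2, "f5": 0.6}
most_negative, most_positive = extreme_features(corr_with_ref, k=2)
```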
3.2 Tryptamine & the ceramides
The upper-right feature cluster in Figure 4 illustrates the portion of the multi-omic signature centered around tryptamine and the ceramides. Figure 6 offers more detail by giving the relationships among features. In particular, it indicates that tryptamine and the ceramide group show very similar patterns of correlation with respect to other model features.
4 Conclusions
This work addressed two main objectives. The first was to determine whether samples from healing versus non-healing wounds could be discriminated using integrative -omics methods. The results demonstrate that a multiblock sPLS-DA model can learn to accurately classify the two sample types. Further, the model identified a multi-omic signature defined on component one by the opposition between Enterococcus and Methylobacterium and their associated metabolites, and on component two by the relationship between a ceramide/tryptamine group and a group of metabolite and clinical features.
The second objective was to investigate whether the analysis might benefit from a multilevel approach. Multilevel and non-multilevel PCAs were performed on the metabolome and microbiome data blocks separately. The results for both blocks indicated that multilevel decomposition did not reduce clustering by patients or improve clustering by outcome.
Objectives for further work include (1) following up on the multilevel analysis by examining how adding a multilevel decomposition to the sPLS-DA model affects results, and (2) experimenting with increasing amounts of regularization in the sPLS-DA model to determine whether similar multi-omic signatures can be detected using fewer features.