Integration of microbiome, metabolome, and clinical markers to predict wound healing

wound healing
-omics integration
diabetes
matrix factorization
Author

Jeremy Boyd

Published

May 11, 2024

Modified

July 30, 2024

1 Introduction

Diabetes affects over 37 million Americans and one in five veterans. Diabetics are prone to the development of non-healing wounds on their feet, which can often lead to lower-limb amputation. While much is known about the wound microbiome, very few studies have investigated the interplay between microbes and metabolites in the diabetic wound microenvironment. Further, studies integrating omics-level microbiome and metabolome datasets are largely non-existent. The present work aims to address this deficit by bringing together two -omics data blocks (microbiome and metabolome) with a clinical block to predict whether or not wounds heal.

Two analytic objectives will be addressed. The primary objective (see Section 2.4 and Section 3) is to see whether a wound healing signature can be detected that integrates across data blocks. The secondary objective (see Section 2.3) is to explore whether the analysis might benefit from a multilevel approach.

2 Methods

2.1 Data

45 debridement samples were collected from 13 patients during the normal course of wound treatment at the Boise Veterans Affairs Medical Center. 16S rRNA sequencing and ultra-high-performance liquid chromatography/tandem accurate mass spectrometry were then utilized to determine wound microbiomes and metabolomes, respectively. Clinical data were extracted from patients’ medical records. The outcome measure was whether the sample was taken from a wound that failed to heal (non-healing), or from a wound that progressed to healing and remained closed for greater than thirty days (healing).

2.2 Preprocessing

Data from the microbiome and metabolome blocks were processed using procedures appropriate for those data types.

These additional steps were taken prior to modeling: features with near zero variance were excluded from all three blocks (microbiome, metabolome, clinical); features in the microbiome block with OTU counts less than 1% of the total were excluded; features in the microbiome block underwent the centered log transformation (Lê Cao et al., 2016); features in all blocks were standardized to zero means and unit variances (Lê Cao & Welham, 2022). Table 1 summarizes the number of features in each block.

Table 1: Feature counts before and after preprocessing.
Block Before preprocessing After preprocessing
Metabolome 865 854
Microbiome 634 180
Clinical 21 21

2.3 Multilevel decomposition

Data in this project are multilevel—i.e., samples are nested within patients. To determine whether discrimination between healers and non-healers might benefit from a multilevel approach, two PCAs were conducted on each of the -omics data blocks: one without and one with multilevel decomposition (Liquet et al., 2012). The results are shown in Figure 1 and Figure 2, where the plotted numbers are patient identifiers. Neither multilevel decomposition appears to reduce clustering by patients, or improve clustering by healing status. This is true for the metabolome data even when outliers are removed. The overall data pattern suggests that samples from the same patients are not correlated. Consequently, the multiblock model outlined below does not employ multilevel decomposition.

Figure 1: Metabolome PCA with and without multilevel decomposition. Plotted numbers are patient identifiers.
Figure 2: Microbiome PCA with and without multilevel decomposition. Plotted numbers are patient identifiers.

2.4 Multiblock model

The healing versus non-healing outcome was modeled as a function of three data blocks (metabolome, microbiome, and clinical) using multiblock sparse partial least squares discriminant analysis (sPLS-DA; (Lê Cao & Welham, 2022; R Core Team, 2024; Rohart et al., 2017; Singh et al., 2019)). The optimal number of components per block was determined by fitting sPLS-DA models for each block individually and using seven-fold cross-validation with 100 repeats to select the number of components associated with the lowest balanced error rate. For the metabolome and microbiome blocks the optimal number of components was four; for the clinical block it was two. Based on these results, two components were used in the multiblock model: while not ideal for the -omics blocks, using a smaller number of components was beneficial in that it allowed for a simpler solution and acted as a hedge against overfitting.

The value of the between-block weights in the design matrix was set to 0.1. This prioritized predictive accuracy while still allowing the model to learn correlations between data blocks.

Finally, seven-fold cross-validation with 100 repeats was used to select the number of features per component and block associated with the lowest balanced error rate. The data grid that was explored was uniform across blocks, running from one feature to half the number of features in the block available after preprocessing (see Table 1). This led to the number of selected features displayed in Table 2.

Table 2: Selected feature counts
Block N
Metabolome 420
Microbiome 91
Clinical 16
Total 527

3 Results

Final model performance was assessed using 7-fold cross-validation with 100 repeats. This gave classification error rates of 8.39% when only component one was considered, and 5.38% when both components were considered. These numbers indicate good performance: roughly 19 out of every 20 samples were correctly classified. Moreover, the small decrease in error from one to two components suggests that the addition of further components would lead to only marginal improvements.

Panel B of Figure 3 summarizes the multi-omic signature the model learned. This representation plots each of the 527 selected features according to their correlation with components one and two of their data block. Features positioned closer to the outer dashed circle play a larger role in predicting the outcome. The predictive multiblock signature can be read off by considering the position of each feature relative to the others (see panel A of Figure 3 for a primer). The most prominent part of the signature is arrayed along the horizontal. In particular, Enterococcus is highly negatively correlated with component one, and Methylobacterium shows a more modest, positive correlation to component one. Each of these microbiome features have mirror-image relationships to the clusters of metabolite features on the left and right of the figure: Enterococcus is positively correlated to the metabolite features on the left and negatively correlated to the metabolite features on the right, while Methylobacterium is positively correlated to the metabolite features on the right and negatively correlated to the features on the left. The second part of the signature is arrayed along the vertical: tryptamine and a group of ceramides are negatively correlated with the microbiome features towards the bottom of the figure, and positively correlated with the microbiome features towards the top.

Figure 3: Circle plots represent features selected into the model, as well as relationships among features. (A) An example circle plot. Gray dots are model features plotted in 2D space, indicating their correlation with model components one (x-axis), and two (y-axis). The dashed circles are guides representing correlations of ±0.5 (inner circle) and ±1 (outer circle). Features closer to the outer circle are more important for predicting the outcome. The angle made by connecting two features through the origin gives the sign of their correlation. Acute angles indicate positive correlations (red and green), obtuse angles represent negative correlations (purple), and right angles indicate no correlation (blue). Moreover, the length of the connecting lines gives the magnitude of the correlation. For example, the angle made by the red and green lines is identical, but the red lines are longer, indicating a stronger positive correlation between the red features relative to the green features. (B) A circle plot with all 527 features selected into the model. This indicates a two-part multi-omic signature. In the first part, Enterococcus and Methylobacterium are negatively associated with one another, and have mirror-image relationships to the metabolite clusters on the left and right of the figure. In the second part, tryptamine and a ceramide group are negatively/positively correlated with the microbiome features towards the bottom/top of the graph. Only the features with the largest correlations to components one and two in each block are labeled.

Figure 4 offers an alternative representation of the multi-omic signature. Rather than showing all features selected into the model (as in Panel B of Figure 3) Figure 4 only displays features that have correlations with at least one other feature above/below ±0.38. The two parts of the signature are now represented as separate clusters—the larger containing Enterococcus and Methylobacterium, and the smaller centering around tryptamine and the ceramides. Note that even though Methylobacterium is portrayed as having only a small number of connections to the nearby metabolome features, it actually connects to every feature that Enterococcus does. These associations are not shown in Figure 4 however, because they do not meet the ±0.38 cutoff.

Figure 4: A network graph representing the two-part mult-omic signature discovered by the model. The first part consists of Enterococcus and Methylobacterium and their pattern of correlations to the metabolite cloud in the lower left of the graph. The second part centers around tryptamine and the ceramides and their associations with the features in the upper right of the graph. Only model features with at least one correlation to another feature above/below ±0.38 are shown. Colors indicate the data block each feature is from. Features with higher expression in healers are plotted as circles; those with higher expression in non-healers are plotted as triangles.

3.1 Enterococcus & Methylobacterium

Figure 5 provides a fuller view of the way in which Enterococcus and Methylobacterium combine with other model features to predict healing. Panel A is a heat map of correlations between 20 metabolite and clinical features (y-axis) and Enterococcus and Methylobacterium (x-axis). Features with higher expression in healers are printed in bold. Y-axis features were chosen for having the ten most positive or ten most negative correlations with Enterococcus. Panel B positions all metabolite and clinical features in 2D space representing their correlations with Enterococcus (x-axis) and Methylobacterium (y-axis). This strikingly demonstrates the mirror-image relationship that the two OTUs have with other model features.

Figure 5: (A) Model features with the ten most positive and ten most negative correlations to Enterococcus. The signs on these correlations are flipped with respect to Methylobacterium. Feature names written in bold have higher expression in healers. (B) The relationship between correlations with Enterococcus and correlations with Methylobacterium is highly regular. Each circle represents a selected metabolite or clinical feature. The black line is a linear model relating the correlations that these features have with Enterococcus to the correlations that they have with Methylobacterium. Methylobacterium correlations are roughly the Enterococcus correlations multiplied by -0.5. All features shown on the y-axis in panel A are labeled in panel B.

3.2 Tryptamine & the ceramides

The upper-right feature cluster in Figure 4 illustrates the portion of the multi-omic signature centered around tryptamine and the ceramides. Figure 6 offers more detail by giving the relationships among features. In particular, it indicates that tryptamine and the ceramide group show very similar patterns of correlation with respect to other model features.

Figure 6: Heat map representing correlations between tryptamine, ceramides, and other model features. Features are included on the y-axis if they have at least one correlation with an x-axis feature above/below ±0.38. Feature names printed in bold have higher expression in healers. Parenthetical information after microbiome features indicates the taxonomic level at which its identity was resolved—e.g., (f)amily, (o)rder. Microbiome features with no parenthetical information were resolved at the genus level.

4 Conclusions

This work addressed two main objectives. The first was to determine whether samples from healing versus non-healing wounds could be discriminated using integrative -omics methods. The results demonstrate that a multiblock sPLS-DA model can learn to accurately classify the two sample types. Further, the model identified a multi-omic signature defined on component one by the opposition between Enterococcus and Methylobacterium and their associated metabolites, and on component two by the relationship between a ceramide/tryptamine group and a group of metabolite and clinical features.

The second objective was to investigate whether the analysis might benefit from a multilevel approach. Multilevel and non-multilevel PCAs were performed on the metabolome and microbiome data blocks separately. The results for both blocks indicated that multilevel decomposition did not reduce clustering by patients or improve clustering by outcome.

Objectives for further work include (1) following up on the multilevel analysis by looking to see how adding a multilevel decomposition to the sPLS-DA model affects results, and (2) experimenting in the sPLS-DA model with increasing amounts of regularization to determine whether similar multi-omic signatures can be detected using fewer features.

5 References

Lê Cao, K.-A., Costello, M.-E., Lakis, V. A., Bartolo, F., Chua, X.-Y., Brazeilles, R., & Rondeau, P. (2016). MixMC: A multivariate statistical framework to gain insight into microbial communities. PLOS ONE, 11(8), e0160169. https://doi.org/10.1371/journal.pone.0160169
Lê Cao, K.-A., & Welham, M. (2022). Multivariate data integration using R: Methods and applications with the mixOmics package. CRC Press.
Liquet, B., Lê Cao, K.-A., Hocini, H., & Thiébaut, R. (2012). A novel approach for biomarker selection and the integration of repeated measures experiments from two assays. BMC Bioinformatics, 13(1), 325. https://doi.org/10.1186/1471-2105-13-325
R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Rohart, F., Gautier, B., Singh, A., & Lê Cao, K.-A. (2017). mixOmics: An R package for ’omics feature selection and multiple data integration. PLOS Computational Biology, 13(11), e1005752. https://doi.org/10.1371/journal.pcbi.1005752
Singh, A., Shannon, C. P., Gautier, B., Rohart, F., Vacher, M., Tebbutt, S. J., & Lê Cao, K.-A. (2019). DIABLO: An integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics, 35(17), 3055–3062. https://doi.org/10.1093/bioinformatics/bty1054