Machine learning clarifies stress-based biosimilar degradation

Machine learning shows promise as a complementary approach to chromatographic techniques (separation of mixtures) to assess biosimilarity and stability, according to a recent study.

Researchers compared machine learning with chromatographic analysis in a study of 3 biosimilars of trastuzumab and their reference product (Herceptin) under control and stress conditions. They concluded that the machine learning results correlated with the chromatographic data and revealed patterns explaining the effects of pH and heat stress conditions.

Trastuzumab, a monoclonal antibody directed against the human epidermal growth factor receptor 2 (HER2), is approved for the treatment of metastatic breast cancer, early breast cancer and metastatic gastric cancer. Investigators found that the biosimilars showed great similarity under control conditions, but “differences in degradation patterns were detected under … forced degradation conditions” in the study.

First, the physicochemical characteristics of the reference product and the trastuzumab biosimilars (approved for use in Egypt and designated as B1, B2 and B3 in the study) were determined by size exclusion chromatography, cation exchange chromatography and peptide mapping. The biologics were evaluated under control conditions and under pH and heat stress. Investigators then used unsupervised machine learning techniques to find patterns in the chromatographic data.

Chromatographic analysis

The authors stated that primary structure and size and charge variants are quality attributes that may affect the quality, safety and efficacy of biologic drugs, including trastuzumab. These attributes were similar in the biosimilars and the reference product under control conditions, the authors found.

Thermal and pH stresses, the authors note, are among the most studied stress conditions in forced degradation studies because they directly affect the size and charge variant profiles of monoclonal antibodies (MAbs) through deamidation and oxidation. Under heat and pH stress, the researchers found differences in how the products degraded.

Size variants

Based on size exclusion chromatography, B2 and B3 showed a tendency to form high and low molecular weight variants under acidic and basic stress, and B2 showed degradation of 83% after 2 weeks under acid stress. Under heat stress, B3 showed the greatest degradation, 39% after 2 weeks.

Charge variants

Under acid stress, degradation of the main variant at 2 weeks ranged from 19.9% for the reference product to 93% for B2. Under basic stress, all samples showed a comparable increase in the abundance of acidic variants. Under heat stress, the distribution of charge variants of B2 and B3 was similar to that of the reference product, while B1 showed a greater abundance of acidic variants.

Principal component analysis

Investigators used unsupervised machine learning techniques, which find patterns in the data without prior training or predefined subcategories. Principal component analysis (PCA) is a method of reducing the complexity of large data sets to a small number of components that explain the largest share of the variance in the data set.

To identify patterns in the data, the authors plotted the size exclusion chromatography and cation exchange chromatography data on two-dimensional coordinates representing the 2 components (PC1 and PC2) that explained the most variance. Principal component analysis of the chromatographic and peptide mapping data of the control samples did not show any outliers, which, according to the authors, supports the biosimilarity of the products.

The plot of the control and acid-stressed samples showed that the control samples were separated along the principal component 1 (PC1) axis, while the stressed samples were distributed along the PC2 axis. Samples of the same product were clustered “significantly next to each other,” the authors said, and their PCA results on control and acid-stressed samples suggest that 41% of the variance in the data was due to applied stress and 25% was due to inherent differences in the chromatographic profiles of the products.
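As a rough illustration of how PCA projects chromatographic profiles onto 2 components and reports the share of variance each explains, here is a minimal Python sketch. It is not the authors' implementation: the sample names and peak-area percentages are invented for illustration, the function name pca is hypothetical, and the components are found by simple power iteration with deflation rather than by a library routine such as the one in scikit-learn.

```python
import math

def pca(rows, n_components=2, iters=500):
    """Return (scores, explained_variance_ratios) for a list of feature rows."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_var = sum(cov[i][i] for i in range(d))
    components, ratios = [], []
    for _ in range(n_components):
        # Power iteration: repeated multiplication converges to the
        # eigenvector with the largest remaining eigenvalue.
        v = [float(j + 1) for j in range(d)]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        ratios.append(lam / total_var)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    # Project each centered sample onto the components (PC1, PC2, ...).
    scores = [[sum(r[j] * v[j] for j in range(d)) for v in components]
              for r in x]
    return scores, ratios

# Hypothetical peak-area percentages: [high MW, monomer, low MW].
# These values are made up and are NOT data from the study.
profiles = {
    "ref_control": [0.5, 99.0, 0.5],
    "b1_control": [0.6, 98.8, 0.6],
    "b3_control": [0.4, 99.1, 0.5],
    "ref_acid": [3.0, 90.0, 7.0],
    "b3_acid": [2.0, 93.0, 5.0],
    "b2_acid": [8.0, 70.0, 22.0],
}

scores, ratios = pca(list(profiles.values()))
for name, (pc1, pc2) in zip(profiles, scores):
    print(f"{name}: PC1={pc1:.2f}, PC2={pc2:.2f}")
print("explained variance ratios:", [round(r, 4) for r in ratios])
```

Because each invented row sums to 100%, the 3 features carry only 2 independent directions, so the 2 components account for all of the variance in this toy example. Real chromatographic data would typically leave some variance unexplained, as in the study's reported 41% and 25%.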

Clustering analysis

The investigators also applied 2 clustering techniques, k-means and density-based spatial clustering of applications with noise (DBSCAN), to the first 2 PCs of their principal component analysis. According to the authors, cluster analysis is “an unsupervised exploratory technique aimed at finding a natural clustering in the data so that the elements of the same cluster are more similar to each other than to those of different clusters.”
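To make the k-means step concrete, the sketch below clusters a handful of two-dimensional points standing in for PC1 and PC2 scores. The coordinates and group labels in the comments are invented for illustration and are not the study's data, and the function name kmeans is hypothetical. For reproducibility it uses a deterministic farthest-first choice of starting centers, a simplification of the random or k-means++ initialization that libraries usually apply.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm on tuples of coordinates; returns (labels, centers)."""
    # Farthest-first initialization (a deterministic simplification):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical (PC1, PC2) scores, NOT values from the study.
points = [
    (-2.0, 0.1), (-2.1, -0.1), (-1.9, 0.0),          # stand-in: reference product
    (0.0, 1.0), (0.1, 1.1), (0.2, 0.9), (0.0, 1.2),  # stand-in: B1 and B3
    (2.0, -1.0), (2.1, -1.1), (1.9, -0.9),           # stand-in: B2
]

labels, centers = kmeans(points, k=3)
print(labels)
```

K-means requires the number of clusters to be chosen in advance and always assigns every point to a cluster, which is one reason it can be useful to compare it with a density-based method.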

Due to the “inherent variability” and “large number of possible structural variants” of monoclonal antibodies, the authors said, machine learning-assisted approaches have “great value” in assessing their critical quality attributes. They cited previous research using PCA to reveal trends in data on the biosimilarity and stability of other biologics, recombinant human growth hormone and infliximab.

K-means clustering of the unstressed samples separated the products into 3 groups, with the reference product and B2 each forming their own group and B1 and B3 assigned to the same group. DBSCAN separated each product into its own cluster.

K-means clustering separated the control samples and pH-stressed samples into different groups, although the B2 control samples were grouped with the stressed reference product and B3 samples. Cluster analysis suggested that B3 was most similar to the reference product under acid stress, while B2 was most similar under heat stress, and all products had a similar response to basic pH stress. The greatest variability between control samples was between the reference product and B2.
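For readers unfamiliar with DBSCAN, the following Python sketch shows how a density-based method can assign each tight group of points to its own cluster and flag isolated points as noise, without being told the number of clusters. The coordinates, the eps and min_samples values, and the function name dbscan are assumptions chosen for illustration; none of them come from the study.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is also a core point
    return labels

# Hypothetical (PC1, PC2) scores, NOT values from the study.
points = [
    (-2.0, 0.1), (-2.1, -0.1), (-1.9, 0.0),  # stand-in: reference product
    (0.0, 1.0), (0.1, 1.1), (-0.1, 0.9),     # stand-in: B1
    (0.8, 1.0), (0.9, 1.1), (0.7, 0.9),      # stand-in: B3
    (2.0, -1.0), (2.1, -1.1), (1.9, -0.9),   # stand-in: B2
    (5.0, 5.0),                              # isolated point
]

labels = dbscan(points, eps=0.4, min_samples=3)
print(labels)
```

In this toy data the 2 neighboring groups are farther apart than eps, so they remain separate clusters, whereas a k-means run limited to 3 clusters would have to merge 2 of the 4 groups.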

Finally, the application of principal component and cluster analyses to the collective dataset of all applied chromatographic techniques supported the biosimilarity of the products, the authors said. This principal component analysis did not identify any sample that was significantly different from the others; k-means identified 3 clusters (reference product, B1 + B3 and B2), and DBSCAN identified 4 clusters, one containing each product.

The authors concluded that their results supported the biosimilarity of the products analyzed and pointed out that, with regard to the charge and size profiles of the products studied, B2 showed higher variability than B1 and B3 compared with the reference product under control and stress conditions. They said the chromatographic fingerprints and machine learning results “were correlated and could reveal patterns related to the effect of different stress conditions on the different products studied.” They recommended that future studies explore other machine learning tools to interpret physicochemical data on biologics.

For further reading

The European Medicines Agency reports on a pilot program to adapt the development of biosimilars and eliminate unnecessary testing, and the World Health Organization is developing guidelines to support the concept of adaptation.


Shatat SM, Al-Ghobashy MA, Fathalla FA, Abbas SS, Eltanany BM. Coupling of chromatographic profiling of trastuzumab with machine learning tools: a complementary approach for the assessment of biosimilarity and stability. J Chromatogr B Analyt Technol Biomed Life Sci. 2021;1184:122976. doi:10.1016/j.jchromb.2021.122976

