Making Sense of Cannabis Strains Through Chemometrics

April 4, 2019
Abstract / Synopsis: 

The cannabis industry is constrained by the continued use of acronyms and nonstandard abbreviations for strain naming in lieu of a science-based, standardized classification convention or lexicon. The rapidly expanding industry is evolving toward an evidence-based model of medicine in which the chemical and genotypic profiles of cannabis cultivars can be correlated with sensory perception and pharmacological activity using multivariate analysis. Applying chemometric tools can not only authenticate a given cannabis cultivar but also provide a quality control mechanism for both cannabis flower and any resulting cannabis-based drugs. Using chemometrics on cannabinoid and terpenoid expression data to segregate accessions into clusters provides the initial model on which to support targeted sequencing based on cosegregation of genetic markers associated with key agronomical and pharmacological traits. Such authenticated cannabis products command higher prices at both the wholesale and retail levels.

What is Chemometrics?

Chemometrics is the use of statistical and mathematical methods to improve our understanding of chemical data (11). For the statistical analysis of chemical data, one often looks at multiple variable inputs (chemical components) and their interactions. Chemical data often do not fulfill classical model assumptions: for instance, there may be fewer observations than variables, or the variables may be correlated with one another. Multivariate data analysis, which is the simultaneous observation of more than one characteristic for a set of data, is therefore particularly well suited to exploratory analyses that interpret patterns in the data and develop models. These models can then be routinely applied to future data to predict the same parameters of interest. Whether the goal is to discriminate among edible oils and fats by Fourier transform-infrared (FT-IR) spectroscopy (12) or to detect the adulteration of virgin olive oil using mid-IR spectral data (13), the models are applicable to data obtained from any of a number of analytical instruments.
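
The two violations of classical assumptions mentioned above can be illustrated with a small numerical sketch. The data here are randomly generated and purely hypothetical (5 samples, 8 compounds, with two compounds made to co-vary); the names `X`, `Xc`, `rank`, and `corr_01` are illustrative and do not come from the cited work. The point is only that, with fewer observations than variables, the covariance matrix cannot have full rank, which is one reason classical regression-style models break down on this kind of data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: 5 samples (observations) x 8 compounds (variables).
n_samples, n_compounds = 5, 8
X = rng.normal(size=(n_samples, n_compounds))

# Mimic two co-expressed compounds: variable 1 is variable 0 plus small noise.
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n_samples)

# Mean-center each variable, as is done before estimating covariances.
Xc = X - X.mean(axis=0)

# The rank of the centered data (and hence of the 8 x 8 covariance matrix)
# cannot exceed n_samples - 1, so the covariance matrix is singular.
rank = np.linalg.matrix_rank(Xc, tol=1e-8)

# Correlation between the two co-expressed variables.
corr_01 = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

print(f"rank of centered data: {rank} (variables: {n_compounds})")
print(f"correlation between variables 0 and 1: {corr_01:.3f}")
```

Projection methods such as PCA sidestep this problem because they work with a small number of uncorrelated components rather than with the full set of original variables.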


Classical Applications of Chemometrics

Principal component analysis (PCA) is one of the most common and simplest means to reduce information from multiple variables (for example, cannabinoid and terpenoid profiles) into synthetic variables (principal components) that summarize the variation in the data, each explaining a certain percentage of the observed patterns. Cluster analysis (CA) provides another classic means to separate samples into groups that share a common property. These popular methods have been applied to clinical data for disease diagnosis (14), have served as an efficient and powerful tool for quality control and authentication of different herbs (15), and have been used to identify the origin of consumer goods (16) and to classify cannabis cultivars into chemovars (17). To take it one step further, one could model the observed clustering or structure based on chemical profiles to derive stable methods of naming taxonomic or pharmacological groups.
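
As a minimal sketch of how PCA condenses a chemical profile into a few components, the following example computes principal components through a singular value decomposition of the mean-centered data. The terpenoid values are invented for illustration, and the function name `pca` and the choice of four terpenoids are assumptions, not part of any cited study. In practice, variables are often also scaled to unit variance before PCA; that step is omitted here for brevity.

```python
import numpy as np

def pca(X, n_components=2):
    """Return PCA scores, loadings, and explained-variance ratios via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components]                       # variable weights per PC
    explained = (s ** 2) / np.sum(s ** 2)              # fraction of variance
    return scores, loadings, explained[:n_components]

# Hypothetical terpenoid profiles (% w/w); rows are samples, columns are
# myrcene, limonene, beta-caryophyllene, terpinolene. Values are invented.
profiles = np.array([
    [0.80, 0.10, 0.20, 0.02],
    [0.75, 0.12, 0.22, 0.03],
    [0.15, 0.60, 0.30, 0.02],
    [0.18, 0.55, 0.28, 0.01],
    [0.10, 0.08, 0.10, 0.70],
    [0.12, 0.10, 0.12, 0.65],
])

scores, loadings, explained = pca(profiles, n_components=2)

for i, ratio in enumerate(explained, start=1):
    print(f"PC{i} explains {ratio:.1%} of the variance")
```

Plotting the two columns of `scores` against each other gives the familiar PCA score plot, in which samples with similar profiles fall close together, while `loadings` shows which compounds drive each component.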

Chemometrics has been applied for the past two decades for many diverse purposes, including classifying river water samples in Poland for pollution monitoring (18), identifying adulterants in freeze-dried coffee (19), and verifying the authenticity of food based on quality attributes (20). The application of chemometrics to cannabis has been a natural extension. Because chemical constituents can vary for any crop group depending on growing environment, harvest time, and subsequent drying and curing conditions, it seems reasonable to apply any of a number of available analytical techniques. Those techniques include high-performance liquid chromatography (HPLC) coupled to mass spectrometry (MS) and gas chromatography (GC) coupled to MS, which can be used to establish the profile of cannabis-based chemicals through chemotyping to assure the repeatability and quality of pharmacologically active compounds in a given formulation. The concept of equivalence in herbal formulations originated in Germany (21) as a way to establish clinically proven reference material.


Chemometrics Applied to Cannabis Breeding

Classical approaches to breeding are based on the selection of particular traits of interest in a large set of germplasm. The objectivity of the selection pressure exerted by plant breeders will depend on the techniques available to decipher cryptic traits. Chemical expression is temporarily cryptic for volatile compounds, such as terpenoids, and permanently cryptic in the case of cannabinoids that do not have perceivable odors. As such, the ability to detect particular molecules at early stages has the potential to speed up breeding efforts and reduce the financial burden of extensive breeding experiments.

A prime example of a targeted breeding effort in cannabis was undertaken by Napro Research (22). Starting from a foundational breeding program developed by Ryan Lee, and using only conventional breeding techniques, the team selected from thousands of individual plants and from a large set of varieties to develop lines producing high levels of terpenoids and resin. These plants were initially screened for cannabinoids and terpenoids, and about 20 varieties of interest were thus selected for particular breeding goals, including rare traits and high or interesting essential oil production. Using scaled metrics, the authors prioritized breeding pairs and followed the offspring through several generations to stabilize the traits of interest. They were the first to demonstrate the ability to produce cannabis accessions with divergent cannabinoid chemotypes, with type I, II, and III plants expressing identical terpenoid profiles. The ability to modulate cannabinoid ratios while maintaining stable essential oil production offers many promises for both the medical and recreational markets. This information was further refined into color-coded archetypes based on the full chemical profile of each variety and can be found online (47).


Everyday Application of Chemometrics to Cannabis

Beyond the hope of accelerating targeted breeding programs, insight into the chemical expression of individual cannabis varieties offers great promise for reaching a consensus on product nomenclature. Categorizing and naming biodiversity is an artificial convention that humanity has agreed upon; objective metrics such as chemical profiles provide a repeatable means to classify organisms below the species level.

As large state- or nationwide datasets containing chemical information on cannabis varieties from a large number of cultivators emerge, so does our capacity to synthesize this information into statistically supported groups. At the consumer level, this will translate into a meaningful system relying on multiple lines of evidence, such as genetic lineage and the chemical expression of terpenoids with given pharmacological properties. Such a system will help guide the end user and the medical professional toward an enhanced understanding of the cause-effect relationship between chemical profile and intended pharmacological effects.
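
To make the idea of grouping samples by chemical profile concrete, here is a toy sketch of average-linkage agglomerative clustering written directly with NumPy. The six terpenoid profiles are invented, the function name `agglomerative_clusters` is hypothetical, and the number of clusters is supplied by hand. Assessing whether groups are statistically supported (for example, by resampling) is not shown, and for real datasets an established library routine such as those in `scipy.cluster.hierarchy` would normally be preferred.

```python
import numpy as np

def agglomerative_clusters(X, n_clusters):
    """Average-linkage agglomerative clustering on Euclidean distances."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]   # start: one cluster per sample
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between members of the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]                            # b > a, so index a is stable
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Hypothetical terpenoid profiles (% w/w); columns are myrcene, limonene,
# beta-caryophyllene, terpinolene. Values are invented for illustration.
profiles = np.array([
    [0.80, 0.10, 0.20, 0.02],
    [0.75, 0.12, 0.22, 0.03],
    [0.15, 0.60, 0.30, 0.02],
    [0.18, 0.55, 0.28, 0.01],
    [0.10, 0.08, 0.10, 0.70],
    [0.12, 0.10, 0.12, 0.65],
])

labels = agglomerative_clusters(profiles, n_clusters=3)
print(labels)
```

In this toy data, samples dominated by the same terpenoid end up sharing a label, which is the kind of grouping a chemovar-based naming system would build on.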


Basis for Proprietariness

Assigning a chemoprofile and associated genotype to a cannabis cultivar would enhance the value of that cultivar and, with genotyping, make it possible to acquire proprietary status through cultivar registration, trademarking, and patenting. In that vein, a number of utility patent applications have been filed with regard to specialty cannabis, its chemical profile, and the processes used to generate it (23–25). Plant patent applications have also been filed for hybrid (26–28) and heirloom (29) varieties, making claims for particular cannabinoid or terpenoid expression profiles, such as the variety Avidekel with a high amount of CBD (16.3%) and a very low amount of THC (0.8%).

Besides making for more defensible intellectual property (IP), having a chemical and genetic fingerprint of a cultivar would allow for authentication of the product at any time in the future, and cultivar registration would prevent the reuse of the same cultivar name. Standardized analytical methods and data analytics are required to routinely characterize the large range of biologically active secondary metabolites made by the cannabis plant. Because the chemical profile can be influenced by the growing conditions and environment, it is important to also trace unique genetic markers associated with the desired chemotype. The associated genetic markers can be acquired from genotyping data and are required for the future breeding of cultivars specific for pharmacological use, fiber, food, or fuel.
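
One simple way to picture authentication against a stored chemical fingerprint is to compare a new sample's profile with the registered reference profile using cosine similarity. The sketch below is only an illustration of that idea: the profiles are invented, the function names and the 0.98 threshold are hypothetical, and it covers the chemical side only. As noted above, growing conditions can shift the chemical profile, so a real scheme would pair such a check with genetic markers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must contain at least one non-zero value")
    return dot / (norm_a * norm_b)

def matches_reference(sample, reference, threshold=0.98):
    """Return True if the sample is close enough to the reference profile.

    The threshold is an arbitrary illustrative value, not a validated cutoff.
    """
    return cosine_similarity(sample, reference) >= threshold

# Hypothetical terpenoid profiles (% w/w): myrcene, limonene,
# beta-caryophyllene, terpinolene. Values are invented for illustration.
reference = [0.80, 0.10, 0.20, 0.02]       # registered fingerprint
same_cultivar = [0.75, 0.12, 0.22, 0.03]   # new lot claimed to be the cultivar
other_cultivar = [0.15, 0.60, 0.30, 0.02]  # mislabeled product

print(matches_reference(same_cultivar, reference))
print(matches_reference(other_cultivar, reference))
```

Cosine similarity compares the relative proportions of compounds rather than their absolute amounts, so it is less sensitive to overall potency differences between lots than a plain distance would be.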

  1. J. Sawler, J.M. Stout, K.M. Gardner, D. Hudson, J. Vidmar, L. Butler, et al., PLoS One 10, e0133292 (2015).
  2. P. Henry, PeerJ PrePrints 3, e1980, doi: 10.7287/peerj.preprints.1553v2 (2015).
  3. R. Lynch, D. Vergara, S. Tittes, K. White, C.J. Schwartz, M.J. Gibbs, T.C. Ruthenburg, K. deCesare, D.P. Land, and N.C. Kane, Crit. Rev. Plant Sci. 35, 349–363, (2015).
  4. P. Henry, PeerJ PrePrints 5, e3307v1, (2017).
  5. C. Dufresnes, C. Jan, F. Bienert, J. Goudet, and L. Fumagalli, PLoS ONE 12(1), e0170522, (2017).
  6. K. McKernan, Y. Helbert, V. Tadigotla, S. McLaughlin, J. Spangler, L. Zhang, and D. Smith, bioRxiv doi: (2015).
  7. L. Eriksson, T. Byrne, E. Johansson, J. Trygg, and C. Vikström, Multi- and Megavariate Data Analysis Part 1: Basic Principles and Applications, Second Ed. (Umetrics, Umeå, Sweden, 2006).
  8. L. Eriksson, E. Johansson, N. Kettaneh-Wold, J. Trygg, C. Wikström, and S. Wold, Multi- and Megavariate Data Analysis Part 1: Basic Principles and Applications, Second Ed. (Umetrics, Umeå, Sweden, 2006).
  9. A. Gilbert and J.A. DiVerdi, PLoS One 13(2), e0192247, (2018).
  10. D. Sweeney, “Mendocino County divided into cannabis appellations,” North Bay Business Journal (2016).
  11. M. Otto, Chemometrics. Statistics and Computer Application in Analytical Chemistry, (Wiley-VCH, New York, 1998).
  12. Y. Hong, et al., Food Chem. 93, 25–32 (2004).
  13. G. Gurdeniz and B. Ozen, Food Chem. 116, 519–525 (2009).
  14. N.A. Dang, H.G. Janssen, and A.H. Kolk, Bioanalysis 5(24), 3079–3097 (2013).
  15. H.A. Gad, S.H. El-Ahmady, M.I. Abou-Shoer, and M.M. Al-Asisi, Phytochemical Analysis 24(1), 1–24 (2012).
  16. I. Geana, A. Iordache, R. Ionete, A. Marinescu, A. Ranca, and M. Culea, Food Chem. 13, 1125–113 (2013).
  17. A. Hazekamp, K. Tejkalova, and S. Papadimitriou, Cannabis and Cannabinoid Research DOI: 10.1089/can.2016.0017 (2016).
  18. T. Kowalkowski, R. Zbytniewski, J. Szpejna, and B. Buszewski, Water Research 40, 744–752 (2006).
  19. R. Briandet, E.K. Kemsley, and R.H. Wilson, J. Science of Food and Agriculture 71, 359–366 (1996).
  20. L.M. Reid, C.P. O’Donnell, and G. Downey, Trends Food Sci. Technol. 17, 344–353 (2006).
  21. V.E. Tyler, J. Nat. Prod. 62, 1589–1592 (1999).
  22. M.A. Lewis, E.B. Russo, and K.M. Smith, Planta Med. 84, 225–233 (2018).
  23. E. De Meijer, “Cannabis sativa plants rich in cannabichromene and its acid, extracts thereof and methods of obtaining extracts therefrom.” Google Patents, (2011).
  24. M.A. Lewis, M.D. Backes, and M. Giese, “Breeding, production, processing and use of specialty cannabis.” Google Patents, (2015).
  25. M.W. Giese and M.A. Lewis, “Systems, apparatuses, and methods for classification.” Google Patents, (2016).
  26. Y. Cohen, “Cannabis plant named ‘avidekel’.” Google Patents, (2014).
  27. Y. Cohen, “Cannabis plant named erez.” Google Patents, (2014).
  28. Y. Cohen, “Cannabis plant named midnight.” Google Patents, (2014).
  29. S.W. Kubby, “Cannabis plant named ‘Ecuadorian Sativa’.” Google Patents, (2016).
  30. O. Aizpurua-Olaizola, U. Soydaner, E. Öztürk, D. Schibano, Y. Simsir, P. Navarro, N. Etxebarria, and A. Usobiaga, J. Natural Products 79, 324–331 (2016).
  31. M. Sexton and J. Ziskind, “Sampling cannabis for analytical purposes.” (2013).
  32. D.J. Potter, Drug Testing Anal. 58, S54–S61, http:// (2013).
  33. C. Orser, S. Johnson, M. Speck, A. Hilyard, and I. Afia, Natl. Prod. Chem. Res. DOI: 10.4172/2329-6838.1000304 (2017).
  34. K.W. Hillig, Biochem. Syst. Ecol. 32, 875–891 (2004).
  35. A. Hazekamp and J.T. Fischedick, Drug Test. Anal. 4, 660–667 (2012).
  36. J.T. Fischedick, A. Hazekamp, T. Erkelens, et al., Phytochemistry 71, 2058–2073 (2010).
  37. E.B. Russo, Frontiers in Pharmacology 7, 1–19 (2016).
  38. B. Russo, Psychopharmacology 165, 431–432 (2003).
  39. E.B. Russo, Br. J. Pharmacology 163, 1344–1364 (2011).
  40. S. Elzinga, J. Fischedick, R. Podkolinski, et al., Nat. Prod. Chem. Res. 3, 1–9 (2015).
  41. G. Buchbauer, in Handbook of Essential Oils: Science, Technology and Applications, K.H.C. Baser and G. Buchbauer, Eds. (CRC Press, Boca Raton, Florida, 2010) pp. 235–280.
  42. J.K. Booth, J.E. Page, and J. Bohlmann, PLOS One (2017).
  43. R.E. Schultes, W.M. Klein, T. Plowman, et al., “Cannabis: An Example of Taxonomic Neglect,” Botanical Museum Leaflets, Harvard University 23, 337–367 (1974).
  44. S. Johnson, A. Hilyard, P. Henry, S. Tholson, A. Everett, M. Speck, and C. Orser, “Terpenoid Chemoprofiles Distinguish Drug-type Cannabis sativa L. Cultivars in Nevada,” The Emerald Conference (Poster presentation), San Diego, California, 2018.
  45. Medreleaf (2018) MR2017002
  46. MGC (2018) Blockchained DNA: The Information Chain for Advanced Growers and Regulators,


Cindy Orser, PhD, is with Digipath Labs in Las Vegas, Nevada. Philippe Henry, PhD, is with VSSL Enterprises in Kelowna, British Columbia, Canada. Direct correspondence to [email protected].


How to Cite This Article

C Orser and P Henry, Cannabis Science and Technology 2(2), 38-47 (2019).