Making Sense of Cannabis Strains Through Chemometrics in Review

April 4, 2019
Figure 2: Terpenoid chemovar clusters.
Figure 3: Cannabinoid and terpenoid clustering.
Figure 4: Principal component analyses based on 21 SNPs identified using the chemometric model-based approach described above. The 70 accessions shown were genotyped at about 10,000 SNPs using Medicinal Genomics’ Strainseek V2 assay. Loading extraction was used to isolate SNPs associated with the most parsimonious chemovar classification model based on terpene expression, which has three major groups: myrcene-dominant (red), limonene-dominant (blue), and terpinolene-dominant (green) accessions.
Abstract / Synopsis: 

The cannabis industry is constrained by the continued use of acronyms and nonstandard abbreviations for strain naming in lieu of a science-based, standardized classification convention or lexicon. The rapidly expanding industry is evolving toward an evidence-based model of medicine in which the chemical and genotypic profiles of cannabis cultivars can be correlated with sensory perception and pharmacological activity using multivariate analysis. Applying chemometric tools can not only authenticate a given cannabis cultivar but also provide a quality control mechanism for both cannabis flower and any resulting cannabis-based drugs. Using chemometrics on cannabinoid and terpenoid expression data to segregate accessions into clusters provides the initial model on which to base targeted sequencing, drawing on the cosegregation of genetic markers associated with key agronomical and pharmacological traits. Such authenticated cannabis products command higher prices at both the wholesale and retail level.

Terpenoids as Distinguishing Analytes

Because the vast majority of cannabis grown in the U.S. is drug type I, defined as having a cannabidiolic acid (CBDA)–tetrahydrocannabinolic acid (THCA) ratio of <0.5%, various efforts have been underway to make sense of cannabis strain names by applying data analytics to broader cannabis flower chemoprofiles. It turns out that cannabis terpenoids not only confer pharmacologically important attributes but also provide a basis for a secondary nomenclature, applied after classification by cannabinoid content. We, and others, have shown that although unique terpenoid chemoprofile patterns exist and can be assigned to a cannabis cultivar, the absolute amount of any given terpenoid can be influenced by genetic, epigenetic, environmental, and cultivation factors (30–33).
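
The distinction between a stable chemoprofile pattern and variable absolute amounts can be illustrated with a short sketch. It assumes that a "pattern" means each terpenoid's share of the total terpenoid content; the function name, the units, and the batch values are hypothetical and are not taken from the cited studies.

```python
def relative_profile(terpenes_mg_per_g):
    """Convert absolute terpenoid amounts into fractions of the total terpenoid content."""
    total = sum(terpenes_mg_per_g.values())
    if total <= 0:
        raise ValueError("total terpenoid content must be positive")
    return {name: amount / total for name, amount in terpenes_mg_per_g.items()}


# Hypothetical batches of one cultivar: batch_b carries half the absolute
# terpenoid content of batch_a (for example, because of growing conditions).
batch_a = {
    "beta-myrcene": 8.0,
    "limonene": 4.0,
    "beta-caryophyllene": 2.0,
    "alpha-pinene": 2.0,
}
batch_b = {name: amount / 2 for name, amount in batch_a.items()}

profile_a = relative_profile(batch_a)
profile_b = relative_profile(batch_b)
```

Both batches yield the same relative profile (beta-myrcene makes up half of the total in each), even though their absolute amounts differ by a factor of two.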

Terpenes, or more accurately terpenoids, contribute the aromatic properties of cannabis and of essential oils from many other plant species. The particular terpenoids associated with any given plant species turn out to be fairly specific; for example, limonene in lemons and beta-myrcene in mangoes. In cannabis, there is a range of potential terpenoids based on the genetics and expression profiles of a given cannabis cultivar. Terpenoids can also modulate the medicinal or recreational attributes of a given cannabis cultivar.

State-mandated cannabis testing regulations have produced a large database from the analysis, including terpenoid analysis, of thousands of individual cannabis flower samples from artificially restricted geographical regions. The resulting detailed chemical database can serve as the basis for a chemotaxonomic classification scheme that does not depend on conjectural cultivar naming by strain. Of the roughly 140 terpenoids identified in cannabis, there seems to be consensus in the literature that between 17 and 19 are the most useful in defining a cannabis chemotype (34–36), and perhaps as few as three (33,37). Terpenoid content in the cured flower can range from 0.5% to 3% (36).

The obsession with cannabinoids, in particular THC and CBD content, fueled by growers and consumers alike, has overshadowed the importance of the terpenoid profile and content in specific cannabis cultivars. Today we know that terpenoids can be used to distinguish cannabis cultivars (17,33,37–40). Terpenoids demonstrate effects on the brain at very low ambient air levels in animal studies (41), and it is conjectured that terpenoids contained in cannabis also contribute to pharmacological activity as part of the entourage effect.

Broader chemotaxonomic classification schemes for cannabis cultivars have been reported based largely on cannabis strains grown in restricted geographic regions, such as in California under an unregulated testing environment (36) or in the Netherlands from strains grown by a single grower or collected from multiple commercial sources (17). In the only large-scale study, 2237 individual cannabis flower samples, representing 204 individual strains across 27 cultivators in a tightly regulated Nevada cannabis testing market, were analyzed across 11 cannabinoids and 19 terpenoids (32). Even though 98.3% of the samples were from drug type I cannabis strains by CBDA–THCA ratio of <0.5%, PCA of the combined terpenoid and cannabinoid dataset resulted in three distinct clusters that were distinguishable by terpene profiles alone, suggesting that just three terpenoid cluster assignments account for the diversity of drug-type cannabis strains currently being grown in Nevada (Figure 1). The inclusion of cannabinoid chemoprofile data did not add any further resolution beyond the three terpenoid clusters (Figure 2).
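
The PCA step described above can be sketched in a few lines. This is a minimal illustration, not the workflow used in the cited study: it extracts the leading components by power iteration with deflation (a library routine would normally be used instead), and the three group centers, the noise level, and the choice of three analytes are hypothetical values chosen only to mimic myrcene-, limonene-, and terpinolene-dominant samples.

```python
import math
import random


def center(rows):
    """Subtract each column mean so every analyte has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(cov, iterations=500):
    """Power iteration: (eigenvalue, unit eigenvector) for the largest eigenvalue."""
    p = len(cov)
    start = random.Random(0)  # fixed seed keeps the sketch deterministic
    v = [start.uniform(0.5, 1.5) for _ in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


def pca(rows, n_components=2):
    """Return (components, scores) for the first n_components principal components."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(n_components):
        eigenvalue, v = leading_component(cov)
        components.append(v)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(r[j] * c[j] for j in range(len(c))) for c in components]
              for r in centered]
    return components, scores


# Hypothetical data; columns are [beta-myrcene, limonene, terpinolene].
rng = random.Random(42)
centers = {
    "myrcene": [8.0, 2.0, 0.5],
    "limonene": [2.0, 7.0, 0.5],
    "terpinolene": [1.0, 1.0, 9.0],
}
labels, samples = [], []
for group, c in centers.items():
    for _ in range(10):
        labels.append(group)
        samples.append([x + rng.gauss(0, 0.3) for x in c])

components, scores = pca(samples)
```

Plotting the two columns of `scores` against each other, colored by `labels`, would give a score plot of the kind the figures summarize. With real laboratory data, scaling each analyte before PCA is a common additional step that this sketch omits.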


Terpenoid Clustering

In a previously published large-scale Nevada PCA study (33), the combined dataset resolved into three clusters distinguishable by terpene profiles alone (Figure 2); cannabinoid content was of no distinguishing value (see Figure 3). Just three to five terpenoids (beta-myrcene, beta-caryophyllene, limonene, terpinolene, and gamma-terpinene) were enough to discriminate among the clusters, which strongly suggests that a further delineation could be made if all independent cannabis testing laboratories were required to analyze just these three to five terpenoids (33). Importantly, when individual cultivars such as Gorilla Glue #4 or Golden Goat are compared by name across studies, similar trends in chemoprofiles persist. Here we propose a further delineation of drug type I into three subtypes based on limited terpene chemoprofiling:

  • Type IA: beta-myrcene, α-pinene, limonene, beta-caryophyllene
  • Type IB: gamma-terpinene, terpinolene, ocimene
  • Type IC: limonene, beta-myrcene, beta-caryophyllene, α-pinene (BLDT)
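
One way to picture how such subtypes might be assigned is a lookup on the most abundant terpenoid. The article lists the terpenoids for each subtype but does not state a decision rule, so the rule below (the single most abundant terpenoid decides, with the first-listed terpenoid of Types IA and IC treated as dominant) is an assumption made purely for illustration, as are the function name and the sample values.

```python
# Assumed mapping from the most abundant terpenoid to a proposed subtype.
# The source gives no explicit decision rule; this mapping is illustrative only.
DOMINANT_TO_SUBTYPE = {
    "beta-myrcene": "IA",
    "gamma-terpinene": "IB",
    "terpinolene": "IB",
    "ocimene": "IB",
    "limonene": "IC",
}


def assign_subtype(profile):
    """Return the subtype implied by the most abundant terpenoid, or None if unmapped."""
    if not profile:
        raise ValueError("profile is empty")
    dominant = max(profile, key=profile.get)
    return DOMINANT_TO_SUBTYPE.get(dominant)


# Hypothetical terpenoid contents (percent of dry flower weight).
myrcene_sample = {"beta-myrcene": 0.9, "limonene": 0.4, "beta-caryophyllene": 0.3, "alpha-pinene": 0.1}
terpinolene_sample = {"terpinolene": 0.8, "ocimene": 0.3, "beta-myrcene": 0.2}
unmapped_sample = {"beta-caryophyllene": 0.7, "limonene": 0.2}

subtypes = [assign_subtype(s) for s in (myrcene_sample, terpinolene_sample, unmapped_sample)]
```

Here `subtypes` is `["IA", "IB", None]`. Returning `None` for a profile dominated by a terpenoid outside the mapping keeps the sketch from forcing every sample into one of the three groups.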

Others have shown that replicate batches of the same cannabis cultivar produce remarkably consistent chemoprofiles (36) and that the three distinct genetic groups of broad leaflet drug type (BLDT), narrow leaflet drug type (NLDT), and hemp also show distinct terpenoid profiles overall (3). In the Nevada study, conducted in an artificially restricted geographic region, individual cannabis cultivars remained remarkably consistent in terpenoid profile, even across different cultivators, and clustered into one of the three groups.

It is interesting that over more than 70 years of covert cannabis breeding, primarily selecting for high THC content, the diversity and prevalence of terpenoids have seemingly been maintained. Now that the terpene synthase genes and transcriptome have been described for Cannabis sativa L., focused marker-assisted breeding programs will be able to modulate terpenoid content, create cultivars with standardized terpenoid profiles at the desired CBDA–THCA ratio (42), and finally start to address the long taxonomic neglect of the cannabis plant (43).

Future cannabis data analytic studies should start with established, stable genetic fingerprints of all included cultivars and keep precise records of growth conditions, to help explain any variability in terpenoid analytical testing data.

There is currently no adopted cannabis classification system based on terpenoid profiles, even though several scientists have promoted the idea. Numerous robust datasets now clearly reveal the clustering power of a handful of terpenoids, yet the existing vernacular classification system, largely based on the phenotype of the strain, continues to be heavily promoted by several commercial entities, spreading confusion.


We Only Know What We Know

We can look to established agricultural commodity industries as models to see how new research findings—such as elucidating terpene pathways in the case of cannabis or introduction of pest-resistance in the case of wheat—can easily be incorporated, lead to registration of a new cultivar, and only strengthen the intellectual property position of specific cultivars. Even though we do not have a complete assemblage of the genes involved in what we believe are the pertinent pathways that contribute to the pharmacological activities of cannabis, nor do we have a complete picture of the human genetic variants that contribute to a therapeutic outcome, we have a starting point with terpenoids. Marketing and branding will require a more reliable experience or therapeutic outcome, and, therefore, tighter controls on authenticating what is actually being grown and processed.


Targeted SNP Assays to Mix and Match

Using chemometrics on cannabinoid and terpenoid expression data to segregate accessions into clusters provides the initial model on which to base targeted sequencing, drawing on the cosegregation of genetic markers associated with key traits of interest. Correlating the expression of diagnostic terpenes with variation at genetic loci thus offers the opportunity to identify informative mutations or SNPs in the cannabis genome that are associated with, for example, terpenoid expression (Figure 4) (4,44). These genetic markers can be assayed in a time- and cost-effective manner using several variations of the polymerase chain reaction (PCR), as well as microfluidic approaches that enable high-throughput genotyping.

A recent example includes 21 informative SNPs associated with terpene expression in cannabis samples from Nevada (Figure 4). The data come from more than 2000 samples from 115 strains and 37 cultivators, typed at 19 terpenoid and 11 cannabinoid markers (33). The chemotypic data, coupled with genome-wide genetic data from 70 accessions, were used to constrain the structure of the dataset for use with the overfitting discriminant analysis of principal components (DAPC) algorithm described by Henry (2). The information contained in the 21 selected SNPs provides a broad means of classifying accessions into their respective groups based on their dominant terpene expression. Seven accessions in total appear to be misclassified relative to their inferred terpene group, five of which (Cherry Diesel, Helen Back #2, Sour Kush, Outer Space, and New York Sour Diesel) are predicted to express terpenes not typically seen based on the chemotypic data alone. This discrepancy may also result from varying environmental factors during cultivation, because the data came from plants grown in different cultivation locations rather than in a common garden.
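
The idea of picking out a small set of SNPs that track terpene group membership can be sketched with a much simpler statistic than the DAPC loading extraction used by the authors. The stand-in below ranks each SNP by eta-squared, the share of its genotype variance explained by group, and keeps the top few. It is not the published method; the SNP identifiers, the 0/1/2 genotype coding (count of the alternate allele), and the nine accessions are all hypothetical.

```python
def eta_squared(genotypes, groups):
    """Share of a SNP's genotype variance explained by group membership (0 to 1)."""
    n = len(genotypes)
    grand_mean = sum(genotypes) / n
    total_ss = sum((g - grand_mean) ** 2 for g in genotypes)
    if total_ss == 0:
        return 0.0  # a monomorphic SNP carries no information
    by_group = {}
    for g, label in zip(genotypes, groups):
        by_group.setdefault(label, []).append(g)
    between_ss = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                     for v in by_group.values())
    return between_ss / total_ss


def rank_snps(genotype_table, groups, top_k):
    """Return the top_k SNP ids ordered by decreasing eta-squared."""
    scored = {snp: eta_squared(calls, groups) for snp, calls in genotype_table.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


# Hypothetical accessions: three per dominant-terpene group.
groups = ["myrcene"] * 3 + ["limonene"] * 3 + ["terpinolene"] * 3

# Hypothetical genotype calls, coded as the count of the alternate allele.
genotype_table = {
    "snp_A": [0, 0, 0, 2, 2, 2, 1, 1, 1],  # tracks the groups perfectly
    "snp_B": [0, 1, 2, 0, 1, 2, 0, 1, 2],  # unrelated to the groups
    "snp_C": [0, 0, 1, 2, 2, 1, 1, 1, 1],  # partially informative
}

informative = rank_snps(genotype_table, groups, top_k=2)
```

In this toy table `informative` is `["snp_A", "snp_C"]`. A univariate ranking like this ignores correlations between SNPs, which is one reason a multivariate method such as DAPC is used in practice.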

One particular advantage of using genetic tools as a proxy for the above-mentioned chemometric approach is that relatively simple equipment can be used to type these molecular markers, making it a method of choice for rapid and cost-effective field and laboratory determination of cannabis origin and of key agronomical and pharmacological traits. Quantitative PCR (qPCR) is considered the gold standard for accurate, sensitive, and fast quantification of nucleic acid sequences and has been established and validated for a broad range of applications, including genotyping, pathogen detection, and DNA methylation analysis. Laboratory equipment is currently available for under $6000 for a dual-channel system (48).


  1. J. Sawler, J.M. Stout, K.M. Gardner, D. Hudson, J. Vidmar, L. Butler, et al., PLoS One 10, e0133292 (2015).
  2. P. Henry, PeerJ PrePrints 3, e1980, doi: 10.7287/peerj.preprints.1553v2 (2015).
  3. R. Lynch, D. Vergara, S. Tittes, K. White, C.J. Schwartz, M.J. Gibbs, T.C. Ruthenburg, K. deCesare, D.P. Land, and N.C. Kane, Crit. Rev. Plant Sci. 35, 349–363 (2015).
  4. P. Henry, PeerJ PrePrints 5, e3307v1 (2017).
  5. C. Dufresnes, C. Jan, F. Bienert, J. Goudet, and L. Fumagalli, PLoS One 12(1), e0170522 (2017).
  6. K. McKernan, Y. Helbert, V. Tadigotla, S. McLaughlin, J. Spangler, L. Zhang, and D. Smith, bioRxiv doi: (2015).
  7. L. Eriksson, T. Byrne, E. Johansson, J. Trygg, and C. Vikström, Multi- and Megavariate Data Analysis Part 1: Basic Principles and Applications, Second Ed. (Umetrics, Umeå, Sweden, 2006).
  8. L. Eriksson, E. Johansson, N. Kettaneh-Wold, J. Trygg, C. Wikström, and S. Wold, Multi- and Megavariate Data Analysis Part 1: Basic Principles and Applications, Second Ed. (Umetrics, Umeå, Sweden, 2006).
  9. A. Gilbert and J.A. DiVerdi, PLoS One 13(2), e0192247 (2018).
  10. D. Sweeney, “Mendocino County divided into cannabis appellations,” North Bay Business Journal (2016).
  11. M. Otto, Chemometrics. Statistics and Computer Application in Analytical Chemistry, (Wiley-VCH, New York, 1998).
  12. Y. Hong, et al., Food Chem. 93, 25–32 (2004).
  13. G. Gurdeniz and B. Ozen, Food Chem. 116, 519–525 (2009).
  14. N.A. Dang, H.G. Janssen, and A.H. Kolk, Bioanalysis 5(24), 3079–3097 (2013).
  15. H.A. Gad, S.H. El-Ahmady, M.I. Abou-Shoer, and M.M. Al-Asisi, Phytochemical Analysis 24(1), 1–24 (2012).
  16. I. Geana, A. Iordache, R. Ionete, A. Marinescu, A. Ranca, and M. Culea, Food Chem. 13, 1125–113 (2013).
  17. A. Hazekamp, K. Tejkalova, and S. Papadimitriou, Cannabis and Cannabinoid Research DOI: 10.1089/can.2016.0017 (2016).
  18. T. Kowalkowski, R. Zbytniewski, J. Szpejna, and B. Buszewski, Water Research 40, 744–752 (2006).
  19. R. Briandet, E.K. Kemsley, and R.H. Wilson, J. Science of Food and Agriculture 71, 359–366 (1996).
  20. L.M. Reid, C.P. O’Donnell, and G. Downey, Trends Food Sci. Technol. 17, 344–353 (2006).
  21. V.E. Tyler, J. Nat. Prod. 62, 1589–1592 (1999).
  22. M.A. Lewis, E.B. Russo, and K.M. Smith, Planta Med. 84, 225–233 (2018).
  23. E. De Meijer, “Cannabis sativa plants rich in cannabichromene and its acid, extracts thereof and methods of obtaining extracts therefrom.” Google Patents, (2011).
  24. M.A. Lewis, M.D. Backes, and M. Giese, “Breeding, production, processing and use of specialty cannabis.” Google Patents, (2015).
  25. M.W. Giese and M.A. Lewis, “Systems, apparatuses, and methods for classification.” Google Patents, (2016).
  26. Y. Cohen, “Cannabis plant named ‘avidekel’.” Google Patents, (2014).
  27. Y. Cohen, “Cannabis plant named erez.” Google Patents, (2014).
  28. Y. Cohen, “Cannabis plant named midnight.” Google Patents, (2014).
  29. S.W. Kubby, “Cannabis plant named ‘Ecuadorian Sativa’.” Google Patents, (2016).
  30. O. Aizpurua-Olaizola, U. Soydaner, E. Öztürk, D. Schibano, Y. Simsir, P. Navarro, N. Etxebarria, and A. Usobiaga, J. Natural Products 79, 324–331 (2016).
  31. M. Sexton and J. Ziskind, “Sampling cannabis for analytical purposes.” (2013).
  32. D.J. Potter, Drug Testing Anal. 58, S54–S61, http:// (2013).
  33. C. Orser, S. Johnson, M. Speck, A. Hilyard, and I. Afia, Natl. Prod. Chem. Res. DOI: 10.4172/2329-6838.1000304 (2017).
  34. K.W. Hillig, Biochem. Syst. Ecol. 32, 875–891 (2004).
  35. A. Hazekamp and J.T. Fischedick, Drug Test. Anal. 4, 660–667 (2012).
  36. J.T. Fischedick, A. Hazekamp, T. Erkelens, et al., Phytochemistry 71, 2058–2073 (2010).
  37. E.B. Russo, Frontiers in Pharmacology 7, 1–19 (2016).
  38. B. Russo, Psychopharmacology 165, 431–432 (2003).
  39. E.B. Russo, Br. J. Pharmacology 163, 1344–1364 (2011).
  40. S. Elzinga, J. Fischedick, R. Podkolinski, et al., Nat. Prod. Chem. Res. 3, 1–9 (2015).
  41. G. Buchbauer, in Handbook of Essential Oils: Science, Technology and Applications, K.H.C. Baser and G. Buchbauer, Eds. (CRC Press, Boca Raton, Florida, 2010), pp. 235–280.
  42. J.K. Booth, J.E. Page, and J. Bohlmann, PLOS One (2017).
  43. R.E. Schultes, W.M. Klein, T. Plowman, et al., “Cannabis: An Example of Taxonomic Neglect,” Botanical Museum Leaflets, Harvard University 23, 337–367 (1974).
  44. S. Johnson, A. Hilyard, P. Henry, S. Tholson, A. Everett, M. Speck, and C. Orser, “Terpenoid Chemoprofiles Distinguish Drug-type Cannabis sativa L. Cultivars in Nevada,” The Emerald Conference (Poster presentation), San Diego, California, 2018.
  45. Medreleaf (2018) MR2017002
  46. MGC (2018) Blockchained DNA: The Information Chain for Advanced Growers and Regulators,


Cindy Orser, PhD, is with Digipath Labs in Las Vegas, Nevada. Philippe Henry, PhD, is with VSSL Enterprises in Kelowna, British Columbia, Canada. Direct correspondence to [email protected].


How to Cite This Article

C Orser and P Henry, Cannabis Science and Technology 2(2), 38-47 (2019).