Making Sense of Cannabis Strains Through Chemometrics

April 4, 2019
Figure 1: Map of the geographical distribution of the cannabis gene pools. Boxed labels indicate area of origin from which the cannabis plant spread alongside humanity. North American drug-type varieties (MJ) are likely stabilized poly-hybrids of BLD (C. sativa afghanica) and NLD (C. sativa indica).
Abstract / Synopsis: 

The cannabis industry is constrained by the continued use of acronyms and nonstandard abbreviations for strain naming in lieu of a science-based, standardized classification convention or lexicon. The rapidly expanding industry is evolving toward an evidence-based model of medicine in which cannabis cultivars’ chemical and genotypic profiles can be correlated with sensory perception and pharmacological activities using multivariate analysis. Applying chemometric tools can not only authenticate a given cannabis cultivar but also provide a quality control mechanism for both cannabis flower and any resulting cannabis-based drugs. Using chemometrics on cannabinoid and terpenoid expression data to segregate accessions into clusters provides the initial model on which to base targeted sequencing, guided by cosegregation of genetic markers associated with key agronomical and pharmacological traits. Such authenticated cannabis products command higher prices at both the wholesale and retail level.

The proliferation of cannabis strain names following the legalization of cannabis in individual U.S. states has led to confusion and a lack of transparency for the consumer at the dispensary. Many factors contribute to this state of disorganization, but it is primarily a consequence of the covert nature of the industry over the past 70 years, during which amateur plant breeders created hybrid strains of undocumented heritage. The result is a largely indefensible distinction between indica and sativa, even though the vast majority of cannabis sold in dispensaries still holds on to this unsupportable demarcation.

Beyond the vernacular conventions, the genus Cannabis harbors immense genetic diversity that is thought to segregate into four main gene pools (Figure 1): Narrow Leaf (European) Hemp (NLH; C. sativa sativa), Broad Leaf (Chinese) Hemp (BLH; C. sativa chinensis), Narrow Leaf Drug-type (NLD; C. sativa indica), and Broad Leaf Drug-type (BLD; C. sativa afghanica). This proposed clustering into subspecies has recently gained support from investigations using genetic markers suitable for within-species comparison, such as single nucleotide polymorphisms (SNPs) (1–4) and microsatellites or simple sequence repeats (SSRs) (5). Such endeavors have yielded some congruent patterns, and in general the authors agree that the cannabis strains found in the current medical and recreational markets in North America (hereafter referred to as MJ strains) are extensively hybridized plants (four-way poly-hybrids) with NLD and BLD ancestry, and that high-cannabidiol (CBD) varieties incorporate some portion of the European or Chinese hemp gene pool or novel mutations in cannabinoid synthesis pathways.

Further complications arising from a century of black-market cannabis breeding have caused much speculation as to the origins of particular traits (such as the origin of minor cannabinoids, for example, cannabichromene [CBC], tetrahydrocannabivarin [THCV], and cannabidivarin [CBDV]). Of particular note, accessions of divergent origins may display similar traits as a result of intense selective pressures imposed by cultivators. One such example is that modern drug-type cannabis has accumulated multiple copies of the tetrahydrocannabinolic acid synthase (THCAS) gene, which is potentially responsible for the higher expression of THC in commercial cannabis accessions compared to heirloom varieties (6).

North American cannabis strain naming needs a structured classification scheme based on horticultural and agronomic standards. The cannabis industry is constrained by the lack of a science-based, standardized classification convention and by the continued use of acronyms and nonstandard abbreviations. The ramifications include a lack of product quality control, which becomes a hit-or-miss process that depends on the sophistication of the entity, the specific state regulations, and the enforcement of those regulations. As a result, the cannabis consumer or patient often has no real idea of the composition, consistency, or comparability of the cannabis product they purchase. The persistence of the current vernacular nomenclature, combined with the classification of cannabis chemovars as sativa or indica, is scientifically indefensible based on peer-reviewed findings (1).

Today, powerful data analytic tools are available to bring definition to cannabis strains and to provide meaningful information, including a basis for intellectual property. The rapidly expanding acceptance of legal cannabis in the U.S. on a state-by-state basis, together with the Canada-wide legal cannabis industry, has drawn serious new scientific attention to this flagrant deception at the consumer level. These tools include high-resolution mass spectrometry to determine the chemical profile of cannabis strains, principal component analysis (PCA) to analyze the data (7,8), genotyping to identify unique SNPs associated with the particular chemical profile of a given cannabis cultivar, and, coming soon, sensory profiling (9).
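As a rough illustration of how PCA condenses a chemical profile into a few scores, the sketch below computes the first principal component of a small terpenoid table in plain Python. The sample names, the three terpenoids, and every value are invented for the example, and the power-iteration routine is a simplification chosen to avoid external libraries; it is not the workflow used by the authors or by the cited texts (7,8).

```python
import math

# Hypothetical terpenoid content (mg/g) for six illustrative samples.
# Columns: myrcene, limonene, terpinolene. All values are invented.
SAMPLES = {
    "A1": [8.1, 1.2, 0.2],
    "A2": [7.6, 1.5, 0.3],
    "A3": [8.4, 1.0, 0.1],
    "B1": [1.1, 0.9, 6.8],
    "B2": [0.8, 1.3, 7.2],
    "B3": [1.4, 1.1, 6.5],
}


def mean_center(rows):
    """Subtract each column mean so PCA describes variation around the centroid."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (n - 1 denominator) of mean-centered rows."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iterations=200):
    """Power iteration: unit loading vector and eigenvalue of the largest component."""
    p = len(cov)
    # Arbitrary start vector; it only needs a nonzero overlap with PC1.
    v = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue


def pc1_scores(samples):
    """Return each sample's PC1 score and the fraction of variance PC1 explains."""
    names = list(samples)
    centered = mean_center([samples[n] for n in names])
    cov = covariance(centered)
    loadings, eigenvalue = first_component(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    scores = {
        n: sum(x * l for x, l in zip(row, loadings))
        for n, row in zip(names, centered)
    }
    return scores, eigenvalue / total_variance


scores, explained = pc1_scores(SAMPLES)
for name, score in scores.items():
    print(f"{name}: PC1 score = {score:+.2f}")
print(f"PC1 explains {explained:.1%} of the total variance")
```

In this toy table the myrcene-dominant samples and the terpinolene-dominant samples fall on opposite sides of zero along the first component. The sign of a principal component is arbitrary, so only the separation is meaningful, not which group is positive.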

The opportunity now exists to connect the human experience of sensory perception via our olfactory receptors with the quantifiable chemical phenotype of individual cannabis cultivars, as determined by chemical analysis and verified through genotyping, and to one day arrive at an identifiable physiological endpoint. As Mendocino County, California seeks to establish cannabis appellations based on soil and microclimate in the same vein as viniculture (10), it is imperative to apply validated scientific principles to assign a standard lexicon with concise descriptors. It will no longer be enough to apply only one descriptor to identify a cannabis cultivar; chemical analysis combined with genotyping and human smell will complete the denomination and eliminate misconceptions and, worse, consumer fraud.

The rapidly expanding world cannabis market, which is growing at a much faster pace than the state-by-state adoption in the U.S., should motivate cannabis cultivators to adopt a uniform cannabis classification. While U.S. states are starting up cannabis programs one by one with little cooperation or standardization, entire countries are doing so with economic efficiency at the same pace. Globally, about 2.25% of the population consumes cannabis. Together, medical and recreational cannabis make up a multibillion-dollar global industry. Lawful medical cannabis programs have already been implemented in Canada, Mexico, the United Kingdom, the Netherlands, Australia, Germany, Italy, Israel, Poland, the Czech Republic, Spain, Greece, Colombia, Uruguay, Peru, and South Africa, with many other countries in quick pursuit regardless of global drug policy, which the World Health Organization is poised to re-evaluate in 2018.


Why Do We Need This?

The main barrier to the adoption of a new cannabis nomenclature will be changing human behavior, given that cannabis variants have been introduced, named, and hybridized at will until now. The motivating event for adoption of a new nomenclature will occur in Canada and California, where legal, regulated cannabis came online in 2018. California, the biggest agricultural economy in the United States, will be the largest recreational cannabis market, and Canada is already a global cannabis exporter. Cannabis cultivation will rapidly mature, and a cannabis registry will be one important part of that process as the industry evolves toward big agriculture on one side and boutique cannabis growers on the other; in both instances, cannabis cultivar authentication will be key to success and an important means of keeping market share.

An added value of having authenticated raw cannabis material is to help progress our understanding of the medical benefits of a particular cannabis chemotype, with the ultimate goal of correlating chemotypes with specific pharmacological outcomes. Ultimately it will not be the cultivar name that will be sought out, but the chemoprofile it produces, and the sensory perception that it elicits, at which point cultivars sharing the same chemoprofile could be combined prior to extraction and formulation.
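One way to picture "cultivars sharing the same chemoprofile" is to group samples whose profiles lie within a small distance of each other. The sketch below does this with a simple single-linkage rule over Euclidean distances. The cultivar labels, the terpenoid fractions, and the 0.1 threshold are all hypothetical, and the article does not prescribe this particular grouping method.

```python
import math

# Hypothetical relative terpenoid profiles (fractions of total terpenoids).
# Cultivar labels and values are invented for illustration only.
PROFILES = {
    "Cultivar 1": {"myrcene": 0.62, "limonene": 0.20, "caryophyllene": 0.18},
    "Cultivar 2": {"myrcene": 0.60, "limonene": 0.22, "caryophyllene": 0.18},
    "Cultivar 3": {"myrcene": 0.10, "limonene": 0.65, "caryophyllene": 0.25},
    "Cultivar 4": {"myrcene": 0.12, "limonene": 0.63, "caryophyllene": 0.25},
    "Cultivar 5": {"myrcene": 0.30, "limonene": 0.10, "caryophyllene": 0.60},
}


def distance(p, q):
    """Euclidean distance between two profiles; missing compounds count as 0."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def group_by_chemoprofile(profiles, threshold):
    """Single-linkage grouping: profiles closer than `threshold` share a group."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance(profiles[a], profiles[b]) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# The 0.1 threshold is an assumption; a real cutoff would need validation.
for members in group_by_chemoprofile(PROFILES, threshold=0.1):
    print(members)
```

With these invented numbers, Cultivars 1 and 2 form one group, Cultivars 3 and 4 another, and Cultivar 5 stands alone, regardless of what the cultivars happen to be called.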


The Approach and Challenge

Regulators should institute broader chemical profiling, genotyping, and mandatory cannabis cultivar registration, with specific criteria to be met before a cultivar is ever grown. Currently, only Nevada and Massachusetts require terpenoid analysis on every cannabis sample. Further confounding the situation for the recreational cannabis consumer and the medical marijuana patient is the sole reliance on tetrahydrocannabinol (THC) content to establish the inherent value of flower, rather than taking the entirety of the plant’s pharmacologically active chemoprofile into account. Both state regulators and cannabis testing laboratories can help relieve the growing uncertainty. Testing laboratories that go beyond simply quantifying cannabinoid potency and quality assurance are in a unique position to demystify cannabis strains through expanded chemometrics, genotyping, and consumer education. Both approaches aim to give cannabis consumers more confidence in what they are purchasing. On the upside, findings indicate that cannabis consumers are willing to pay a premium for genotyped, authenticated flower.
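The effect of leaning on THC alone can be shown numerically. THC is present at far higher levels than any terpenoid, so it dominates a raw comparison of two flowers; autoscaling (z-scoring each variable), a common chemometric preprocessing step, gives each compound equal weight. The sketch below assumes four invented flower results in percent dry weight and is meant only to illustrate the idea, not to represent any laboratory's procedure.

```python
import math
import statistics

# Hypothetical flower results in percent dry weight; names and values are invented.
# Columns: THC, myrcene, limonene.
FLOWERS = {
    "Flower A": [22.0, 0.90, 0.10],
    "Flower B": [22.0, 0.10, 0.85],
    "Flower C": [18.0, 0.88, 0.12],
    "Flower D": [19.0, 0.15, 0.80],
}


def autoscale(table):
    """Z-score each column: subtract the mean, divide by the sample std. deviation."""
    columns = list(zip(*table.values()))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return {
        name: [(x - m) / s for x, m, s in zip(row, means, sds)]
        for name, row in table.items()
    }


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


scaled = autoscale(FLOWERS)

# Raw data: A looks most like B because they share the same THC value.
raw_ab = euclidean(FLOWERS["Flower A"], FLOWERS["Flower B"])
raw_ac = euclidean(FLOWERS["Flower A"], FLOWERS["Flower C"])

# Autoscaled data: A looks most like C because their terpenoid profiles match.
scaled_ab = euclidean(scaled["Flower A"], scaled["Flower B"])
scaled_ac = euclidean(scaled["Flower A"], scaled["Flower C"])

print(f"raw:    d(A,B) = {raw_ab:.2f}, d(A,C) = {raw_ac:.2f}")
print(f"scaled: d(A,B) = {scaled_ab:.2f}, d(A,C) = {scaled_ac:.2f}")
```

In this example the raw comparison pairs Flower A with Flower B, which has the same THC but an opposite terpenoid profile, while the autoscaled comparison pairs Flower A with Flower C, which matches it in terpenoids.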

Although considerable chemoprofiling data have been gathered by various cannabis laboratories and groups, the lack of standardization in the analytical methods used to collect those data means one can never be fully confident in cross-analyses. Likewise, because sequencing approaches in genotyping are not standardized, not all genetic sequence data are comparable. Standardization of analytical methodologies is urgently needed.


  1. J. Sawler, J.M. Stout, K.M. Gardner, D. Hudson, J. Vidmar, L. Butler, et al., PLoS One 10, e0133292 (2015).
  2. P. Henry, PeerJ PrePrints 3, e1980, doi: 10.7287/peerj.preprints.1553v2 (2015).
  3. R. Lynch, D. Vergara, S. Tittes, K. White, C.J. Schwartz, M.J. Gibbs, T.C. Ruthenburg, K. deCesare, D.P. Land, and N.C. Kane, Crit. Rev. Plant Sci. 35, 349–363 (2015).
  4. P. Henry, PeerJ PrePrints 5, e3307v1 (2017).
  5. C. Dufresnes, C. Jan, F. Bienert, J. Goudet, and L. Fumagalli, PLoS One 12(1), e0170522 (2017).
  6. K. McKernan, Y. Helbert, V. Tadigotla, S. McLaughlin, J. Spangler, L. Zhang, and D. Smith, bioRxiv doi: (2015).
  7. L. Eriksson, T. Byrne, E. Johansson, J. Trygg, and C. Vikström, Multi- and Megavariate Data Analysis Part 1: Basic Principles and Applications, Second Ed. (Umetrics, Umeå, Sweden, 2006).
  8. L. Eriksson, E. Johansson, N. Kettaneh-Wold, J. Trygg, C. Wikström, and S. Wold, Multi- and Megavariate Data Analysis Part 1: Basic Principles and Applications, Second Ed. (Umetrics, Umeå, Sweden, 2006).
  9. A. Gilbert and J.A. DiVerdi, PLoS One 13(2), e0192247 (2018).
  10. D. Sweeney, “Mendocino County divided into cannabis appellations,” North Bay Business Journal (2016).
  11. M. Otto, Chemometrics. Statistics and Computer Application in Analytical Chemistry, (Wiley-VCH, New York, 1998).
  12. Y. Hong, et al., Food Chem. 93, 25–32 (2004).
  13. G. Gurdeniz and B. Ozen, Food Chem. 116, 519–525 (2009).
  14. N.A. Dang, H.G. Janssen, and A.H. Kolk, Bioanalysis 5(24), 3079–3097 (2013).
  15. H.A. Gad, S.H. El-Ahmady, M.I. Abou-Shoer, and M.M. Al-Asisi, Phytochemical Analysis 24(1), 1–24 (2012).
  16. I. Geana, A. Iordache, R. Ionete, A. Marinescu, A. Ranca, and M. Culea, Food Chem. 13, 1125–113 (2013).
  17. A. Hazekamp, K. Tejkalova, and S. Papadimitriou, Cannabis and Cannabinoid Research DOI: 10.1089/can.2016.0017 (2016).
  18. T. Kowalkowski, R. Zbytniewski, J. Szpejna, and B. Buszewski, Water Research 40, 744–752 (2006).
  19. R. Briandet, E.K. Kemsley, and R.H. Wilson, J. Science of Food and Agriculture 71, 359–366 (1996).
  20. L.M. Reid, C.P. O’Donnell, and G. Downey, Trends Food Sci. Technol. 17, 344–353 (2006).
  21. V.E. Tyler, J. Nat. Prod. 62, 1589–1592 (1999).
  22. M.A. Lewis, E.B. Russo, and K.M. Smith, Planta Med. 84, 225–233 (2018).
  23. E. De Meijer, “Cannabis sativa plants rich in cannabichromene and its acid, extracts thereof and methods of obtaining extracts therefrom.” Google Patents, (2011).
  24. M.A. Lewis, M.D. Backes, and M. Giese, “Breeding, production, processing and use of specialty cannabis.” Google Patents, (2015).
  25. M.W. Giese and M.A. Lewis, “Systems, apparatuses, and methods for classification.” Google Patents, (2016).
  26. Y. Cohen, “Cannabis plant named ‘avidekel’.” Google Patents, (2014).
  27. Y. Cohen, “Cannabis plant named erez.” Google Patents, (2014).
  28. Y. Cohen, “Cannabis plant named midnight.” Google Patents, (2014).
  29. S.W. Kubby, “Cannabis plant named ‘Ecuadorian Sativa’.” Google Patents, (2016).
  30. O. Aizpurua-Olaizola, U. Soydaner, E. Öztürk, D. Schibano, Y. Simsir, P. Navarro, N. Etxebarria, and A. Usobiaga, J. Natural Products 79, 324–331 (2016).
  31. M. Sexton and J. Ziskind, “Sampling cannabis for analytical purposes.” (2013).
  32. D.J. Potter, Drug Testing Anal. 58, S54–S61, http:// (2013).
  33. C. Orser, S. Johnson, M. Speck, A. Hilyard, and I. Afia, Natl. Prod. Chem. Res. DOI: 10.4172/2329-6838.1000304 (2017).
  34. K.W. Hillig, Biochem. Syst. Ecol. 32, 875–891 (2004).
  35. A. Hazekamp and J.T. Fischedick, Drug Test. Anal. 4, 660–667 (2012).
  36. J.T. Fischedick, A. Hazekamp, T. Erkelens, et al., Phytochemistry 71, 2058–2073 (2010).
  37. E.B. Russo, Frontiers in Pharmacology 7, 1–19 (2016).
  38. B. Russo, Psychopharmacology 165, 431–432 (2003).
  39. E.B. Russo, Br. J. Pharmacology 163, 1344–1364 (2011).
  40. S. Elzinga, J. Fischedick, R. Podkolinski, et al., Nat. Prod. Chem. Res. 3, 1–9 (2015).
  41. G. Buchbauer, in Handbook of Essential Oils: Science, Technology and Applications, K.H.C. Baser and G. Buchbauer, Eds. (CRC Press, Boca Raton, Florida, 2010) pp. 235–280.
  42. J.K. Booth, J.E. Page, and J. Bohlmann, PLOS One (2017).
  43. R.E. Schultes, W.M. Klein, T. Plowman, et al., “Cannabis: An Example of Taxonomic Neglect,” Botanical Museum Leaflets, Harvard University 23, 337–367 (1974).
  44. S. Johnson, A. Hilyard, P. Henry, S. Tholson, A. Everett, M. Speck, and C. Orser, “Terpenoid Chemoprofiles Distinguish Drug-type Cannabis sativa L. Cultivars in Nevada,” The Emerald Conference (Poster presentation), San Diego, California, 2018.
  45. Medreleaf (2018) MR2017002
  46. MGC (2018) Blockchained DNA: The Information Chain for Advanced Growers and Regulators,


Cindy Orser, PhD, is with Digipath Labs in Las Vegas, Nevada. Philippe Henry, PhD, is with VSSL Enterprises in Kelowna, British Columbia, Canada. Direct correspondence to [email protected].


How to Cite This Article

C. Orser and P. Henry, Cannabis Science and Technology 2(2), 38–47 (2019).