The phytochemical diversity of commercial Cannabis in the United States
Read the video transcript or watch the video version of a talk I gave summarizing the results of a recent study I co-authored about the chemistry of commercial cannabis.
Below is an auto-generated transcript of a talk I gave in November 2021 (beware of typos, etc!). It’s based on a recent study I co-authored about the chemical diversity of commercial Cannabis in the US. You can watch the video version here:
Well thank you everyone for coming and thank you guys for inviting me to this is super exciting for me i'm going to talk today the title of the talk is the phytochemistry of commercial cannabis in the united states and really what the talk will center on is largely some research that has been recently put online that i did with the university of colorado and some scientists over there on the chemistry of commercial cannabis in the united states and this is all rooted in a lot of the work that i do at leafly.
so just by way of background my name is nick and by day i'm the director of science and innovation at leafly which if you don't know what leafly is it's a technology company in the legal cannabis industry it's the world's largest cannabis information resource a big part of leafly is what you would call a three-sided marketplace so it's an e-commerce company not unlike something like amazon where we connect cannabis consumers to either the retailers or the brands that are actually providing legal access to those products in the space and there's also as we'll see a very large strain database and content arm to leafly and so we provide a lot of educational resources to the consumer by night i run a podcast a long-form science podcast called mind and matter and i talk about a lot of stuff that's related to the topics i'll touch on today and many other things broadly speaking it's about how drugs biotechnology and the latest science impact our minds and bodies.
before i get into uh before i get into the talk itself my background is really in academic science so i've been at leafly and in legal cannabis industry for a little over five years and i started out before that about a decade in academia so i did my bachelors of science focused mainly on molecular genetics at the university of wisconsin i was really trying to understand how animals are built um from from molecules all the way up to the full organism and how they change over time i then went to harvard university to do my phd which was in systems neuroscience so how does the nervous system how does the brain actually work in awake behaving animals how does it create perceptions and behaviors and things like that and throughout that whole time i was very interested in cannabis i was very interested in psychoactive drugs both you know in their own terms how do they actually produce the psychoactive effects and the changes in states of consciousness that we have how does that work in the brain and also as a neuroscientist you often use these things as tools to actually you know probe what the brain is doing.
so long story short i'm an academic guy by training i have this sort of hardcore science background and then i get transposed into the legal cannabis industry and i'm going to kind of tell the story of of how that happened in many ways and how the type of mindset that i brought into the private sector into this very interesting and dynamic space translated into some of the work that i will share with you today so we will go over some background we will talk about the cannabis plant and some salient aspects of it we will talk about the legal cannabis industry and what i'll call traditional cannabis knowledge then i will spend the most most of the talk going into the research that i've done with some of my collaborators on the actual what the actual latest science of cannabis chemistry is saying and how that relates to some of that quote-unquote traditional cannabis knowledge and i'll spend a little time talking about the implications of that for for cannabis consumers in the legal industry and we'll try to leave some time for q&a i am going to go through a lot of slides just to be clear so this will be very information dense i think we will post it at some point after today so you can go back to it.
i like to start out with this slide to get people grounded in thinking about cannabis because in many ways cannabis as an organism is a lot like domestic dogs there are many many different dog breeds rottweilers great danes chihuahuas you name it they look different they act different lots of phenotypic diversity and what's super interesting about that is every single dog on the planet today no matter how well behaved or annoying it is no matter how big or how small it is no matter what kind of spots it has they were all uh evolved they were all domesticated from something that looked and acted very much like what we would recognize today as a as a wolf right so domestication can create phenotypic diversity very rapidly on evolutionary time scales we can go from something very much like a wolf to all of the diversity you see in dogs today in just a few thousand years and cannabis is basically the same in that sense we have taken something that was growing in the wild in central asia originally we've spread it all over the world and we have engaged in this process of domestication breeding and interbreeding different plants for different phenotypes and we've created a wonderful and beautiful diversity in this plant you can see that diversity you can smell that diversity you can taste it and you can even feel it allegedly when you smoke some of these different strains as they're called different strains of cannabis are the analog of the different breeds of dog that you see walking around and this right here this slide it it shows you the what i will call the fundamental mythos of the cannabis industry the entire industry since it's been legal and since well before that has been predicated on this categorization system here and the basic idea is that the way things look is related to the way they act and so the idea with indica hybrid and sativa are that there are three basic types of cannabis plants out there.
indica plants tend to look a certain way they tend to be shorter plants their leaves tend to be broader and so on and so forth sativa plants look a different way they tend to be taller and narrower for example and hybrid plants are in between they're a hybrid of the two and what people will tell you is that the way these things look is directly and clearly related to the way they will make you feel and so indicas are said to make you more calm and have more calming or sedative effects sativas are said to have more energizing effects hybrids are in the middle if you have ever walked into a dispensary or if you ever walk into one in the future anywhere in any state legal market here in the united states very likely what you will see is a bunch of cannabis flower products a bunch of different strains each of them will be categorized indica hybrid or sativa and you will have someone called a bud tender who's typically although not always going to be someone in their early mid-20s and typically although not always someone with very little science education and typically although not always someone paid minimum wage and they will tell you with a very high level of confidence how all of this stuff works indica plants will make you sleepy they will help if you have an issue with insomnia say sativa plants will make you feel you know energized and euphoric sometimes it even gets way way more specific than that they'll recommend individual strains with very interesting names for very specific effects or even in some ways acting almost like a doctor the same way that a doctor would prescribe you a drug in order to help with a certain ailment people are effectively being prescribed legal cannabis products in stores to help with something that might be a fairly innocuous or very serious actually.
The entire brand of leafly originally was actually created around this so the guys that started leafly actually live here in seattle they started approximately a decade ago and they came up with what became known as the tile system it was meant to reflect what i just told you and so you know as medical patients going into a dispensary for the first time you know even to this day if you walk into one of these places it's very overwhelming much of the time you see dozens maybe even hundreds of these things called strains each one belongs to one of these three groups and leafly really came up with a clever visualization system to make something very complicated and overwhelming very simple purple ones are indica they make you sleepy red ones are sativa they make you energized and we can think of all of these hundreds and hundreds of strains that we're being told about as tiles or nodes in something very much like the periodic table of chemistry and we can arrange it like this and now we've got a very clear map of how we're going to feel when we actually ingest these products i can tell you firsthand you know i've heard early big shot investors who got into leafly say things like one of the reasons we invested in leafly is because it was one of the only companies out there that was taking a scientific approach but again this visualization system reflects the central mythos of the industry and again really what it's saying is the way that plants look is directly and reliably related to the way they make you feel and so we're gonna explore that idea here.
so as a scientist what i understood before i came in and as i was coming in was there's something very interesting going on here this is a diverse plant you can see the diversity you can smell the diversity it's right there in front of you um the diversity and the effects is also very interesting that's something that many people experience firsthand including myself i have many anecdotes i can tell you to that effect and i was always captivated by this mystery what is the logic there how can we understand this diversity and understand the different kinds of psychoactive and medicinal effects this plant has and how it relates to this diversity that we're seeing with our eyes.
i knew the answer to that as almost any scientist would is going to come from taking a chemistry first view because the way that something like this makes you feel the way that a product or a drug makes you feel is going to be a function a direct function of the chemistry inside of it right basically what cannabis is this plant is making a cocktail of drugs that are responsible both for its sensory attributes its psychoactive or recreational effects on the mind as well as its medicinal properties and that chemistry is contained in these very um beautiful structures on the flower of the female plant this is what we call the bud or the nug or just flower of the plant these things called trichomes and they're these small translucent structures that you see here and these secrete the essential oils of the plant.
If you zoom way way in this is what you see this is a microscopy image one of my favorite microscopy images of all time it comes from a study in 2019 that was looking at cannabis trichomes and so what you see here is a structure that's almost hair like you could almost think of it like a hair structure that has almost like a sweat gland on it you've got a stalk here of plant cells that are holding up this head this top of the structure here and inside of that top part of this are secretory cells that are specialized to synthesize and secrete the molecules of interest here the cannabinoid acids the terpenes and some other things and basically this is a little bubble of essential oil a bubble of greasy oil and it contains all of the fun stuff all of the stuff that has psychoactive effects all the stuff that has medicinal effects and all of the stuff that you smell.
So what are those things well the two main classes of plant chemical compounds that you talk about here are the cannabinoids like thc and cbd and the terpenes like myrcene or beta-caryophyllene or any number of other ones that we could mention and these are two related classes of compounds the plant actually makes cannabinoid acids those cannabinoid acids are then activated or decarboxylated into the active psychoactive compound thc or the non-intoxicating but still psychoactive compound cbd there's also a number of other cannabinoids associated with this plant these compounds are bigger they are not volatile so you don't smell these compounds these are the principal psychoactive agent of the plant thc and some of its close chemical cousins cbd and other things which are also very interesting the terpenes on the other hand are a distinct but related class of compounds and these ones are volatile so they float up into the air and these are what you smell when you actually smell cannabis and each cannabis plant each different type of plant will produce a unique cocktail of all these things.
So we know that whatever particular chemical cocktail is produced by a particular plant is going to be largely what's dictating its psychoactive effects its medicinal potential as well as its sensory attributes the way that it smells the way that it tastes and all these things and so as a scientist you look at this and you say well this is where it's at you have to understand the logic by which the chemistry is different from different types of plants of cannabis and once you understand that you'll have a very clear map of how to think about differences in sensory attributes differences in psychoactive effects as well as the medical side of this the other thing i will mention before diving in further is we also know that for this plant as for many other creatures that you can study plants and animals there's going to be certain biological constraints that you expect when it comes to this chemistry and long story short when you look at the biochemistry of this plant there are certain biochemical cascades that happen in the plant the exact set of molecules that a particular plant produces is going to be determined by its genetics by the types of enzymes it has and all of that stuff is going to dictate how much thc acid it has compared to cbd acid is it going to have mostly one of those mostly the other one is it going to have a combination of each and it actually turns out that there's not freedom to vary in every direction here there's going to be some serious constraints on the ratios that can be produced based on the biochemistry and the genetics of the plant you're talking about so we'll keep that in mind we're going to see certain patterns of molecules we want to understand what those patterns are and then use that to think about all of the other fun stuff.
So what we're essentially asking here let's let's translate the language and the the central mythos of the cannabis industry into a more scientific hypothesis what you're saying if you believe and if you proclaim that indicas cause one effect sativas cause the opposite effect and hybrids cause another effect somewhere in the middle is you're saying that on average when you go and consume a product belonging to each category each of those is going to have a different set of chemical compounds inside of it or a different ratio of those compounds some of them might have more thc some of them might have more cbd they might have a different pattern of terpene molecules but if they reliably cause different effects they should reliably also have a different chemical composition so instead of thinking about the morphology and then jumping to conclusions about the effects i thought to myself well i'll just come in and i will get the data on the chemistry and i will disentangle all of this stuff and that's the story i'm going to tell you today.
But we have to take a slight detour because there is a bit of a hiccup in the road so i released a study this is after i got to leafly and i started learning about the industry and i thought to myself well this is going to be a very simple problem to solve all i have to do is go talk to the labs that are legally testing all these products that have all of this chemistry data i will tell them how exciting an exercise this will be to understand this problem and really dig into the science here they'll share the data with me i'll figure out what the patterns are and then we're off to the races so long story short i did a study with my friend Michael Zoorob back at the school of government at Harvard and the reason we had to do this was i started literally calling labs and saying hey guys i'm a scientist i'm super excited about this let's look at the data and let's discover some stuff here and you know my naïveté coming from academia into the private sector was i didn't realize that everyone was going to say no i'm not just handing you my data so i was like okay that's a problem how do i get this data well it turned out i could do a FOIA request in Washington and i could get all of the data at least the potency data associated with the major cannabinoids from all the labs in the state of Washington and what Michael and i demonstrated was something that was already known in the industry at the time and this is known as sort of an open secret in the industry which is there's a problem with the labs and i'm going to come to that.
But i'll just say first we looked at THC and CBD content across these categories we said all right let's look at the indicas the hybrids and sativas if you google this stuff you'll find resources out there that say things like indicas have more THC than sativas you'll find things that say the exact opposite you'll find things like indicas have more CBD that's why they're sedating you get all of this contradictory information so we said let's just look at the data we classify everything into hybrid indica or sativa look at the THC content look at the CBD content which i'm not showing you here and what you find is no difference.
So two things might explain that one there might not actually be a difference there two the difference might come from some of the minor cannabinoids and the terpenes that we could not look at because the data was not there but as we discovered there was another major problem if you actually look at the labs separately so what i'm showing you here is a graph of total THC content the distribution of total THC content in the state of Washington over a given period of time across six different labs and notice that each of these violins each of these bubbles here sits at a different location and notice that they also kind of have different shapes to them in some cases so on average the labs will kick out about 18 percent THC all the way up to about 23 percent THC and this is data that's a few years old now so that's interesting some labs are producing higher potency levels on average and i can tell you that we did all the statistical analysis that one would do and we showed that this difference between the labs persists even when you control for the different producers and the different types of products that get submitted to these labs.
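To make the idea of "controlling for the producer" a little more concrete, here is a minimal Python sketch. It is not the analysis from the study: the records, lab names, and producer names are all invented, and the method (subtracting each producer's own average before comparing labs) is only a crude stand-in for a proper regression with producer and product-type controls.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (lab, producer, total THC percent). All values invented.
samples = [
    ("lab_a", "producer_1", 17.5),
    ("lab_a", "producer_2", 18.5),
    ("lab_a", "producer_1", 18.5),
    ("lab_f", "producer_1", 22.5),
    ("lab_f", "producer_2", 23.5),
    ("lab_f", "producer_2", 22.5),
]


def raw_lab_means(records):
    """Average reported THC per lab, ignoring who grew the product."""
    by_lab = defaultdict(list)
    for lab, _producer, thc in records:
        by_lab[lab].append(thc)
    return {lab: mean(values) for lab, values in by_lab.items()}


def producer_adjusted_offsets(records):
    """Average of (result minus that producer's overall mean) for each lab.

    A positive offset means the lab tends to report higher numbers than
    other labs do for the same producers.
    """
    by_producer = defaultdict(list)
    for _lab, producer, thc in records:
        by_producer[producer].append(thc)
    producer_mean = {p: mean(values) for p, values in by_producer.items()}

    offsets = defaultdict(list)
    for lab, producer, thc in records:
        offsets[lab].append(thc - producer_mean[producer])
    return {lab: mean(values) for lab, values in offsets.items()}


raw = raw_lab_means(samples)
offsets = producer_adjusted_offsets(samples)
print(raw)
print(offsets)
```

In this toy data the gap between the two labs survives the adjustment, which is the shape of the result described above: the difference is not explained away by which producers use which lab.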
I'm just going to jump to the headline here what we showed formally is that absolutely statistically speaking these labs are producing different results what was already known at the time that explains this is there's a strong economic incentive to produce higher numbers for your clients labs that produce higher levels get more business that is a fact if you produce high levels you will get higher market share and in fact that's exactly what happened with lab e and lab f here lab f was subsequently suspended and shut down because they were found to be engaging in malpractice as a business lab e was suspended and i'm not sure if they're still around or not but as fate would have it the two labs that got suspended also were the ones producing the highest thc levels and this is this is again it's it's um an open secret in the industry it's been covered by leafly it's been covered elsewhere it happens all the time.
It's really because of this think about when you go to the store you walk around target or you walk around the grocery store or whatever why do you see so many things that are priced like this 19.99, 99.99 well it's because marketers are taking advantage of the way our brains work when you see a sign like this what your brain is probably doing is it's saying wow this product isn't even 20 bucks right 19.99 just feels so much cheaper than 20 bucks and something kind of similar psychologically is happening in the cannabis industry when a consumer walks into a dispensary what they're really doing and this has been formally documented is they're by and large trying to optimize the amount of thc they get per dollar that they spend so if you've got two cannabis flower products on the shelf right in front of you and they're the same price one of them is 21 percent and one of them is 18 percent which one are you gonna pick most consumers pick the one that's 21 percent when you talk to retailers and producers what you will discover is the producers are trying to sell to the stores they want their product in the stores the retailers will tend to tell those producers do not bring me cannabis flower that is less than 20 percent and the reason is that will sell slower and at a lower price because the consumer is going to gravitate towards those higher thc percentages that problem just cascades all the way back through the supply chain and what you get is producers shopping around from lab to lab and then bringing all their business to the one that gives them the highest number that's why we see patterns like this.
So this is now data that we analyzed at leafly because i discovered okay if i want to understand cannabis from a chemistry first perspective i need data from labs but because all the labs are not reliable i need to be very diligent about the labs that i actually take data from and so i have looked at data from dozens and dozens of labs across North America at this point here's a representative sample of eight and again average thc distributions per lab this is what you see there's some labs that give average percentages around 15 percent some that give average percentages up to 24 percent and notice that right the shape is different and notice that we kind of get a weirder funkier shape as you go up in average percentage here this lab 4 here this is someone who's producing an average thc percentage of 20 percent quite a coincidence that it's right at 20 percent but also it's got this weird bubble up here this giant over representation of products that happen to be above 25 percent and again these psychologically satisfying numbers 20 25 30 are what people are after that's how they spend their dollar and that's manifesting here in the results the labs are producing look at this lab over here number one just blatantly skewed such that the majority of products are above 25 percent no biologically literate person who's familiar with cannabis would accept that as a legitimate measurement of the sample population in that state so you can't trust the data from all the labs.
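One simple way to put a number on that "weird bubble" just above a round threshold is to compare how many results land just above it with how many land just below it. The Python sketch below is only an illustration of that idea, with invented values and a hypothetical helper name (`bunching_ratio`); it is not the evaluation procedure used at Leafly, which the talk does not describe in detail.

```python
def bunching_ratio(values, threshold, window=1.0):
    """Ratio of results just above a threshold to results just below it.

    Counts values in [threshold, threshold + window) and in
    [threshold - window, threshold). Returns None when both bins are empty
    and float("inf") when only the lower bin is empty.
    """
    just_above = sum(1 for v in values if threshold <= v < threshold + window)
    just_below = sum(1 for v in values if threshold - window <= v < threshold)
    if just_below == 0:
        return None if just_above == 0 else float("inf")
    return just_above / just_below


# Invented total THC percentages for two hypothetical labs.
smooth_lab = [17.2, 18.1, 18.9, 19.4, 19.8, 20.3, 20.6, 21.5, 22.0]
suspicious_lab = [18.0, 19.9, 20.0, 20.1, 20.1, 20.2, 20.4, 20.8, 25.1]

smooth_ratio = bunching_ratio(smooth_lab, threshold=20.0)
suspicious_ratio = bunching_ratio(suspicious_lab, threshold=20.0)
print(smooth_ratio, suspicious_ratio)
```

A ratio far above what neighboring thresholds show would be a reason to look more closely at a lab. On its own it proves nothing, since a real distribution can also be lopsided around any given value.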
Long story short i spent a couple of years developing partnerships with labs across the country so the reason I have that data that you just saw is we actually require the labs to submit anonymized data samples to us so we statistically evaluate them and then we selectively work only with labs that we think are not clearly manipulating the data like some of the examples you just saw so we're very diligent about that. It's largely just because i mean not only is it the right thing to do in an ethical sense but you know i'm a scientist if i want to discover something i want to be able to trust the data that i'm actually working with so we do lots of stuff with that data.
One of the things that i did recently that i want to talk to you about is a research study that i did with some collaborators at the university of colorado that's what i'm going to tell you about next so we actually took that data from six different laboratory partners from six different states across the united states we pooled all that data so now we've got this very very large data set of cannabis chemistry that we have validated in different ways that i've done some diligence to acquire and we're going to tell you what we found we investigated questions like are indicas hybrids and sativas meaningfully different in any way that maps onto the chemistry are these strain names these hundreds and hundreds of strain names that you see in the industry do they actually mean anything at all in terms of the chemistry we completely dissected the chemical diversity that we see in commercial cannabis in the united states and compared it to the labeling systems and categorization systems that you actually see in the commercial marketplace and as you will imagine as i go through this this has implications not only for leafly and how we interface with our consumer audience as an education resource but also for how the entire cannabis industry is sort of organized and how it's regulated ultimately this also has implications for things like the medical side of it so i partnered with some people at the university of colorado in boulder there is a pre-print online that you can go read for free right now called the phytochemical diversity of commercial cannabis in the united states and i'm going to share with you some of the results from that.
So first let's just look at cannabinoids what do we see across the board when we look at average cannabinoid content in flower for commercial cannabis this is what we see total cannabinoid content is what's being measured here the first bubble here the first violin is total thc content so notice we get a nice smooth gaussian or normal looking distribution as one would suspect it does not have that weird shape like some of the ones that you saw before notice that the average here it's going to sit around 17 18 percent and that's what i would expect to see not only because i know that i trust these labs because i did the diligence there but also when you look at the scientific literature when scientists actually go in like real science labs not legal cannabis industry testing labs and they measure samples of cannabis from these different markets they get a similar result you don't get a result where the average is above 20 percent you get it right around this number notice that these other cannabinoids here these are all other cannabinoid molecules close cousins of thc but not thc that can be found in cannabis plants and sometimes are but notice that they're never really anything close to the abundance that you see for thc you'll get a total cbd content that's high in a minority of legal products but for the most part they've got very low cbd levels cbg is another interesting non-psychoactive cannabinoid it's sometimes there it's fairly often there at a low level one two three four percent maybe but very rare for it to go higher than that the other ones that you can talk about cbc cbn thcv occasionally even other ones by individual labs are measured they're very rarely present at anything other than very very minuscule levels.
now when you start to look at patterns what do you see if we plot thc versus cbd this is what we see so every dot here is an individual cannabis flower product that you might find on a shelf somewhere and we're showing you the relationship between thc and cbd levels that you'll find in that plant and so notice that you kind of get three buckets that just naturally come out of this data and this is actually also what we expect we see this in other papers you see this in the wild although the total abundance is not as high you get a lot of plants that are producing mostly thc with low cbd that's going to be 96 percent or more of the legal cannabis flower market thc dominant very low levels of cbd but you also get these things that we call balanced strains they've got a one to one or two to one or three to one ratio of thc and cbd and then of course you've got some products which we'll call cbd dominant high cbd levels and low thc levels.
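The three buckets can be written down as a simple rule on the THC to CBD ratio. The Python sketch below is only an illustration: the function name `chemotype` and the cutoff of 5 to 1 are assumptions chosen so that the one to one, two to one and three to one examples above fall in the balanced group. The talk does not state the exact boundaries used in the study.

```python
def chemotype(thc_pct, cbd_pct, dominance_ratio=5.0):
    """Assign a flower product to one of three buckets by THC:CBD ratio.

    dominance_ratio is an assumed cutoff: a product counts as dominant in
    one cannabinoid when it has at least that many times more of it.
    """
    if thc_pct <= 0 and cbd_pct <= 0:
        return "undetermined"
    if cbd_pct <= 0 or thc_pct / cbd_pct >= dominance_ratio:
        return "THC-dominant"
    if thc_pct <= 0 or cbd_pct / thc_pct >= dominance_ratio:
        return "CBD-dominant"
    return "balanced"


# Invented example products: (total THC percent, total CBD percent).
examples = [(20.0, 0.1), (8.0, 8.0), (9.0, 3.0), (1.0, 15.0)]
labels = [chemotype(thc, cbd) for thc, cbd in examples]
print(labels)
```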
But again 96% or more of everything is going to be thc we're going to focus on that group because that's what dominates the market and that's where those claims really come from around indica sativa and hybrid and the different strain names the the supposition people are making when you walk into that dispensary is that there are different types of thc dominant cannabis that have reliably different effects so let's look at how terpenes are spread out across those different strains we already know and i can tell you although i haven't shown you that the cannabinoids do not differ between indica, sativa and hybrid it's not like one has more thc and one has more cbd that's not the case but let's look at the terpenes there's a lot of diversity to look at there across the board let's just look at the most abundant terpenes
So again terpenes are these volatile compounds responsible for the aroma of the plant they also have very interesting pharmacological properties in many cases they are likely although this is still being studied having some modulatory impact on the actual psychoactive and or medicinal effects the plant has and there are dozens and dozens and dozens of terpenes these are the ones that are the most abundant notice that the basic pattern here is there's more or less a handful maybe a half dozen or so terpenes that you ever see at reasonably significant levels so it's going to be very uncommon to find an individual plant with one percent of its weight or higher for an individual terpene that would be a very high terpene level notice that most of these are sitting below one percent myrcene is the most abundant caryophyllene limonene humulene and so on and so forth a lot of them are found at very low levels in the plant these are the ones that are most abundant but just like the cannabinoids we need to look at the patterns the combination of these things that are found in individual plants because as i'll show you the terpenes can't be thought of one at a time necessarily because they're also correlated with each other or many of them are at least so what i'm showing you here is the correlation in the abundance of two terpenes so no matter what lab you look at in what state no matter what plant you look at what you tend to find is that when you've got high levels of this terpene called beta-caryophyllene which is interesting because it has anti-inflammatory effects you also tend to have high levels of this terpene called humulene.
So why am i showing you this why is this important well we actually know the biological basis for this relationship these are two terpene molecules co-produced by the same enzyme at a particular ratio and that's why you see this nice tight linear correlation between these two compounds and of course because that's coming from the biology we expect to see that at each and every lab that we look at if they're measuring things properly and in fact that's what we see so we know that there can be these correlations between individual pairs of terpenes but there are many many terpenes found in each individual plant what we want to do is understand the sort of patterns across all of those terpenes that we see and to do that we use a variety of analytical techniques i am not going to go into too much detail here what you're seeing are graphs that show you different patterns of terpenes that tend to manifest in commercial cannabis flower
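If two compounds really are made together at a roughly fixed ratio, two numbers summarize the relationship: the correlation coefficient (how tight the line is) and the slope of a line through the origin (the ratio itself). Here is a small Python sketch of both calculations. The measurements are invented for illustration, so the ratio that comes out says nothing about the real caryophyllene to humulene ratio.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise this divides by zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ratio_through_origin(xs, ys):
    """Least-squares slope k for the model y = k * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Invented percent-by-weight values for five hypothetical samples.
caryophyllene = [0.10, 0.20, 0.30, 0.40, 0.50]
humulene = [0.03, 0.07, 0.09, 0.13, 0.15]

r = pearson_r(caryophyllene, humulene)
k = ratio_through_origin(caryophyllene, humulene)
print(round(r, 3), round(k, 3))
```

A lab whose results for this pair showed a much looser correlation, or a very different slope, than other labs would be one more hint that something in its measurements is off.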
So what you're looking at on the left is a correlation matrix so just like i showed you the graph that showed you the correlation between beta-caryophyllene and humulene we can look at the correlation between every pair of terpenes that's found in these plants and we see interesting patterns emerge from that if we take a graph like that and we turn it into this network diagram on the right you can kind of see even if you don't know what this graph is you see how there's like different constellations that come out of there there's groups of terpenes that tend to be closely associated with one another.
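As a rough sketch of how a correlation matrix turns into a network, the Python below computes the correlation for every pair of terpenes and keeps an edge only where the correlation is strong. The terpene values are invented, the grouping they produce is not a finding from the study, and the 0.8 cutoff is an arbitrary assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Nested dict: matrix[a][b] is the correlation between terpenes a and b."""
    return {
        a: {b: pearson_r(profiles[a], profiles[b]) for b in profiles}
        for a in profiles
    }


def strong_edges(matrix, threshold=0.8):
    """List of (terpene_a, terpene_b, r) for each pair with r >= threshold."""
    names = list(matrix)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if matrix[a][b] >= threshold:
                edges.append((a, b, matrix[a][b]))
    return edges


# Invented levels of four terpenes across six hypothetical samples.
profiles = {
    "caryophyllene": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
    "humulene": [0.03, 0.07, 0.09, 0.13, 0.15, 0.19],
    "alpha_pinene": [0.30, 0.10, 0.20, 0.30, 0.10, 0.20],
    "beta_pinene": [0.15, 0.05, 0.10, 0.16, 0.04, 0.10],
}

matrix = correlation_matrix(profiles)
edges = strong_edges(matrix)
for a, b, r in edges:
    print(a, b, round(r, 3))
```

Drawing each kept pair as a line between two dots, with thicker lines for higher correlations, gives the kind of network picture described here.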
If this was a graph of your facebook data for example we could take all the data from our facebook profiles and we could make a graph like this based on who we interact with and who we engage with the dots that are close together connected by thick lines that would be like someone that you communicate with a lot on facebook if there's a thinner line and the dots are further away that's someone that you don't interact with so much on facebook that would be what the social network graph looked like of your behavior this is almost like the social network graph of the different cannabis terpenes so the bottom line is when you look at the data and you analyze it certain terpenes tend to travel with each other in these groups some people call them entourages because there's this idea that in cannabis you get something called an entourage effect that the particular combination of compounds in any one plant is going to be impactful insofar as that particular set of compounds that entourage causes a reliably different set of effects compared to some other entourage in some other plant what we're showing you here is there are in fact different entourages that you can identify in these plants so to look at patterns in an analytical way to understand how this data based purely on chemistry based purely on the analysis of the chemistry relates to those classification systems we talked about at the beginning the traditional knowledge in cannabis culture
We have to do something called dimensionality reduction and again i'm not going to go into a detail about the math here but to give you an intuition for how this kind of analysis works you can just think about looking at an object and then shining a light on it to cast a shadow and so here what i'm showing you on this slide is just some complicated looking 3d sculpture right it's spherical it's got this very interesting pattern to it and when you shine a light on something like this you of course get a shadow that you can project onto a flat two-dimensional surface and so what i want you to take away from this is we're taking a three-dimensional object it's got three spatial dimensions we're turning it into a two-dimensional object the shadow is a compressed 2d version of this 3d object but we can still look at that shadow and discern a lot about the object from which it comes right you can see a lot of pattern in that shadow and we take our three-dimensional problem we turn it into an easier to study two-dimensional problem and we see something like a shadow we're gonna do that but we're gonna do that for the chemistry and that's that's through something in this case called principal components analysis
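To make the shadow analogy a bit more concrete, here is a minimal sketch of principal components analysis in plain Python, projecting five-dimensional synthetic terpene profiles onto two dimensions. This is an illustration under stated assumptions, not the study's pipeline: the three group profiles and the noise level are invented, and the power-iteration helper `top_eigenpair` is a simple stand-in for what a numerical library would normally do.

```python
import math
import random

# Assumed average profiles (myrcene, caryophyllene, limonene, pinene, terpinolene).
GROUP_PROFILES = [
    [0.2, 0.5, 0.4, 0.05, 0.02],
    [0.8, 0.1, 0.1, 0.20, 0.02],
    [0.2, 0.1, 0.1, 0.10, 0.70],
]


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, rng, iterations=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    d = len(matrix)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca_2d(rows, rng):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, explained = [], []
    for _ in range(2):
        value, vector = top_eigenpair(cov, rng)
        components.append(vector)
        explained.append(value / total_variance)
        # Deflation: remove the direction just found before looking for the next.
        cov = [[cov[i][j] - value * vector[i] * vector[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return projected, explained


rng = random.Random(0)
samples = [[m + rng.gauss(0, 0.03) for m in GROUP_PROFILES[g]]
           for g in range(3) for _ in range(100)]
projected, explained = pca_2d(samples, rng)
print("fraction of variance kept by each of the 2 dimensions:",
      [round(e, 3) for e in explained])
```

Each entry of `projected` is the two-dimensional "shadow" of one sample, and `explained` reports how much of the original variation each of the two dimensions retains.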
So what you're seeing here every dot is a sample of cannabis flower and there are tens and tens and tens of thousands of samples here and we're kind of looking at the shadow of the data so for every dot it's a flower sample for which we've measured dozens of terpene molecules so we've got dozens of dimensions to this data set we've projected that data that has dozens of dimensions onto just two dimensions because that's a much easier problem that we can just look at with our own eyes and doing that allows us to actually explain most of the variance in that terpene profile data set with only two dimensions and then what we've done is we've gone ahead and we've given every dot one color for the most abundant terpene in that sample so all i want you to understand from this graph is you see a particular pattern here right there's dots down here there's kind of another cloud up here and the colors sort of clump together so all of these blue dots those are cannabis samples that have myrcene as the most abundant terpene and they're all kind of sitting in the same neighborhood the yellow dots stand for limonene the fuchsia dots stand for caryophyllene.
So all of those samples are high in those terpenes so notice that those two colors are sitting right on top of each other and that's because those two terpenes travel together we've got this other neighborhood up here this other cluster of orange that's a different kind of cannabis plant producing a different kind of terpene profile and so what we can do is we can actually take a data set like this and we can run it through clustering algorithms the types of algorithms that are commonly used every day in many of the websites and the apps that you're familiar with a social media website will use these to group people based on their behavior netflix will use these to group movies based on the types of people that watch them and so on and so forth and what we're going to do is we're going to actually categorize cannabis in a completely objective kind of scientific way we're going to say we're going to feed in the chemistry data and we're going to segment all of the flower products into different groups we're going to partition them into different groups using a clustering algorithm the algorithm doesn't care how much you paid for it the algorithm doesn't care how good people say the cannabis is it doesn't care about the strain name or the indica sativa thing it's just going to be based on the chemistry and when we do that one good way to partition the data turns out to be to partition it into thirds so one of these algorithms tells us that a good way to explain the chemistry here is to cut the data into three different clusters that's super interesting right because that sort of story that's told throughout the industry is based on this tripartite categorization system with indica hybrid and sativa so now what we want to do is we want to ask how does that system that the industry is using map onto this algorithmically defined set of clusters that's based entirely on the chemistry right how much does the science and the chemistry side of it actually 
match up to the industry side of it right and so to do that i'm just going to show you the same exact graph i'm going to change the graph on the left but i'm going to add purple for indica green for hybrid and red for sativa if that system is good at explaining the diversity of chemistry that we actually see in the data then we should see red dots in one part of the graph mostly by each other we should see purple dots in another part of the graph mostly by each other and then we should see the green dots somewhere in the middle for the hybrids that's not what we see here right by and large when you look at this it just looks like a mess right there's kind of dots all over the place and the colors do not cleanly segregate into different corners of the graph for the most part and what that means is that overall the indica/sativa/hybrid designation that is commonly used to categorize flower in the commercial space does not map onto the chemistry the chemical diversity that we see very well at all okay so indica sativa hybrid not very good at explaining chemistry.
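The clustering step described above, where only chemistry goes in and strain names or indica/sativa labels are ignored, can be sketched roughly as follows. The talk does not say which clustering algorithm was used, so k-means is an assumption here, as are the synthetic profiles, the noise level, and the deterministic farthest-point initialization (`farthest_point_init` is a hypothetical helper chosen to keep the toy example reproducible).

```python
import math
import random

# Assumed average profiles (myrcene, caryophyllene, limonene, pinene, terpinolene).
GROUP_PROFILES = [
    [0.2, 0.5, 0.4, 0.05, 0.02],
    [0.8, 0.1, 0.1, 0.20, 0.02],
    [0.2, 0.1, 0.1, 0.10, 0.70],
]


def distance(a, b):
    """Euclidean distance between two terpene profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(points, k):
    """Pick k starting centers by repeatedly taking the point farthest from the rest."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(distance(p, c) for c in centers)))
    return centers


def kmeans(points, k, iterations=50):
    """Plain k-means: assign each point to its nearest center, then re-average."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda i: distance(p, centers[i]))
                  for p in points]
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


rng = random.Random(0)
# 100 synthetic samples per assumed chemical type; no names or labels are used.
samples = [[m + rng.gauss(0, 0.03) for m in GROUP_PROFILES[g]]
           for g in range(3) for _ in range(100)]
labels, centers = kmeans(samples, k=3)
print("cluster sizes:", [labels.count(i) for i in range(3)])
```

With real lab data the number of clusters would itself have to be chosen and justified; here `k=3` is simply set to match the three groups described in the talk.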
But what are these groups then that we found we did identify these three different chemically distinct types of cannabis that's the graph on the right what are these groups if we take the average of all the dots in each group what do we see well we'll just say this is group one group one is high thc cannabis that is characterized by having particularly high levels of the terpenes beta-caryophyllene and limonene this is the profile if you look at it as a radar chart like this that you'll tend to see in cannabis flower products that belong to this group the ones that have this type of chemical profile.
The types of strains as they're called in the industry that tend to show up in this group are things with a glue or cake or sweet sounding stuff in the name so gorilla glue 4 a very popular cannabis strain wedding cake that was leafly strain of the year a couple years ago gelato a lot of the strains have those kinds of names with glues or cakes or ice cream you know ice cream cake is another one those strains often show up in this group high beta-caryophyllene high limonene they have that particular entourage of terpene profiles.
Group two has a different ratio of all these things the same compounds are present but at different ratios group two is characterized by very high levels of something called myrcene as well as relatively high levels of something called pinene right so it's a different cocktail of drugs essentially inside of the essential oil of these strains here is where you tend to see strain names like blue dream or granddaddy purple or nine pound hammer right so different kinds of strain names tend to exhibit these different kinds of entourages these different types of chemical signatures.
Group three the final one is characterized predominantly by high levels of this compound called terpinolene and here you see strain names like jack herer or super lemon haze or ghost train haze these are all types of strains that tend to have this type of profile now what i can tell you which is an area of controversy in the industry is there's a big open question around how reliable these strain names are because unlike other industries there's really no rules or regulations around these names i can grow a cannabis plant and if i'm licensed and i can sell it legally there's no restrictions on what i call it if someone's growing jack herer and putting a jack herer sticker on the bottle they don't have to prove that they genetically have something called the jack herer plant they don't have to prove that they have a particular chemical profile they can pretty much just pick whatever name they want and so you kind of get two extreme schools of thought here with lots of people in the middle would be the following someone like a bud tender will often tell you that the strain names are actually quite reliable that they are reliable indicators of the way that a product is going to make you feel jack herer feels one way it's said to be energizing granddaddy purple feels another way it's said to be sedating if that's true what that must mean is that the labels the strain name labels that we see on the products should match up in a statistically meaningful way to the underlying chemistry that we see now the opposite school of thought and you'll tend to see this with people that have more of my background right if you go talk to a physician or a scientist they'll basically hear these strain names and oftentimes start laughing like that you know people are just picking random names they're cute sounding names these are marketing tools this is completely and utterly meaningless and therefore if that's the case when we go look at the chemistry 
data we should see no no good correlation between the strain names that we see on the labels and the actual chemistry inside these products and so with the chemistry in hand we thought let's just look and see what's true so the way that we're going to do that so the way that
I'm going to display this to you and this is all in the paper if anyone really wants to geek out in the details is for every strain that we have a lot of data for and so these would be the strains that you see here are the ones we have the most data for dozens or many hundreds of samples in each case for each strain name here what we're going to do is we're going to compare all of the samples that share the same strain name so if you're looking at two blue dream samples for example maybe one comes from seattle maybe one comes from southern washington we want to compare those two and say how similar are they chemically speaking and then we're gonna compare each of those to a sample from california and a sample from florida and we're going to do that for all of the samples that we have now if the strain names are meaningful then when we pick two different blue dream products made by two different people they should tend to be reasonably similar in chemistry and when that's true we're gonna see bubbles show up near the top of this graph near the very top that means that things are relatively consistent and relatively reliable between products that share the same strain name across producers and across regions now if these strain names are totally bogus and people are just making stuff up and there's no relationship to the chemistry the bubbles should mostly fall right around this dotted line and what that means if something's close to the dotted line it means if you pick two products that share the same strain name they are no more similar to each other on average than if you just close your eyes and pick two random products so that's what we want to ask are the strain names doing better than random or not and here's what we see and so the answer to the question how reliable are strain names for commercial cannabis is it depends on the strain name notice that we see a nice smooth sort of spectrum going from left to right here there's a 
bunch of strains that are relatively above that dotted line something like white tahoe cookies or purple punch is relatively consistent if you go find purple punch at one place and you find it in another place odds are good that'll be the same basic thing right that would be akin to me going to get a glass of merlot down the street here in seattle i could fly to miami tomorrow and i could order a glass of merlot it's not going to be identical but it's not going to taste like chardonnay right so that would be like a strain that's closer to the top of the graph here now what are these strains over here these ones that don't have stars are not statistically different from the dotted line and that means that for something like pineapple express or tangie if you pick two tangie products off the shelf they're gonna be no more likely to be similar to each other than if you just pick two random products completely randomly so some strain names are relatively consistent to a pretty good degree some strain names are relatively consistent but not quite as consistent and of course some are no better than random chance so let's look at that in another view.
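The strain-name consistency comparison described above can be sketched as follows: average the chemical similarity of all pairs of samples sharing a name, and compare that to the average over all pairs regardless of name (the "dotted line"). The paper's actual similarity metric and statistical test are not given in the talk, so cosine similarity is an assumption, the data is synthetic, the strain labels are placeholders, and no significance testing (the stars on the graph) is attempted.

```python
import itertools
import math
import random

# Assumed average profiles (myrcene, caryophyllene, limonene, pinene, terpinolene).
GROUP_PROFILES = [
    [0.2, 0.5, 0.4, 0.05, 0.02],
    [0.8, 0.1, 0.1, 0.20, 0.02],
    [0.2, 0.1, 0.1, 0.10, 0.70],
]


def cosine_similarity(a, b):
    """Cosine of the angle between two terpene profiles (1.0 = same proportions)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_pairwise_similarity(profiles):
    """Average similarity over every pair of profiles in the list."""
    pairs = list(itertools.combinations(profiles, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


rng = random.Random(0)


def noisy(profile):
    """One synthetic lab measurement around an assumed average profile."""
    return [max(0.0, value + rng.gauss(0, 0.03)) for value in profile]


samples = []
# A name that always refers to the same chemical type.
samples += [("consistent strain", noisy(GROUP_PROFILES[0])) for _ in range(30)]
# A name that gets attached to all three chemical types equally often.
samples += [("inconsistent strain", noisy(GROUP_PROFILES[i % 3])) for i in range(30)]
# Everything else on the shelf.
samples += [("other", noisy(GROUP_PROFILES[i % 3])) for i in range(90)]

# Baseline: similarity of two products picked without regard to name.
baseline = mean_pairwise_similarity([profile for _, profile in samples])


def strain_consistency(name):
    """Average similarity among samples that share the given strain name."""
    return mean_pairwise_similarity([p for n, p in samples if n == name])


for name in ("consistent strain", "inconsistent strain"):
    print(name, round(strain_consistency(name), 2), "baseline", round(baseline, 2))
```

A name whose score sits well above the baseline behaves like the merlot example, while a name whose score is close to the baseline tells you little about what is in the jar.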
This is another dimensionality reduction graph this is another machine learning visualization technique that you can use to look at high dimensional data so this is our chemical map of cannabis flower in the united states it's very similar to what i showed you before the colors here you've got three colors the orange the blue and the fuchsia those are our three clusters right those are what our algorithm defined as the basic chemical phenotypes or chemical types of cannabis and what we've done here is we've shown you those neighborhoods and we've superimposed purple punch and tangie samples onto this map so what we're showing you are all of the purple punch samples in black diamonds that's a relatively consistent strain name and notice what that means here in this context most of the black diamonds are over here in this neighborhood right they're all in cluster a the same basic type of cannabis they're not all in identical locations there's some variation but you can clearly see that they're mostly there and there's just a couple of outliers that you spot in these other neighborhoods tangie this other strain name very different story those diamonds there indicating tangie are all over the place you find them here you find them there you find them there they're just kind of all over that's what we mean by a relatively reliable or chemically consistent strain name versus one that's unreliable the reliability of a given strain name depends very heavily on the particular name that you're talking about so this would be again to use my wine analogy you know purple punch would be something relatively consistent if i go try it in two different locations odds are pretty good it's going to be the same basic chemical profile not true for tangie and that's what you see when you look at the individual profiles right so here on the top i'm just showing you the individual terpene profiles for purple punch and on the bottom tangie tangie is all over the place purple punch much more 
consistent not perfect by any means but more consistent.
What happens if we superimpose on our map here the indica sativa hybrid structure so i told you overall it's not really good at explaining the chemistry but something interesting happens here at first blush you see that it basically just looks like chaos the red the green and the purple do not separate very well but look at this cluster up here this is cluster number three right number three out of the three that we've defined using algorithms and data notice that there's not that much purple up there and it kind of looks like there's more red as well when we actually quantify the breakdown of indica hybrid and sativa for each of these three clusters what we see is that for cluster number three and only cluster number three something interesting comes out there's an over-representation of what you would call sativa or sativa dominant strain names in that cluster and an under-representation of those that you would classify as indica through the commercial system and what we find when we sort of look at that more closely is that a certain subset of things that are commonly classified as sativas belonging to certain lineages of cannabis strains are over represented in that group it's things like jack herer and the jack strains blackjack j1 xj13 it's things like the lemon haze strains lemon haze super lemon haze and so on and so you know if i take the liberty of speculating for a moment what i think we could be looking at with a data set like this is perhaps what we're seeing are the remnants of something that used to be much clearer perhaps one day decades and decades ago there actually were two or three clearly distinguishable types of cannabis maybe one was what people called indica one was what people called sativa maybe this cluster number three up here because we see an over-representation of certain sativas maybe those lineages were the original sativas maybe this high terpinolene profile is what originally was discerned to be this more energizing 
group we don't know the answer to that but it could be true maybe things that people were classifying as indicas originally belong to one of these two other neighborhoods right and those are the more sedating ones we don't know the answer to those questions but what's wonderful about the present moment is they're actually testable so in principle you could get the genetics of these plants you could test the chemistry and if you were diligent and you did the properly controlled human studies you could actually discern whether or not certain types of chemical phenotypes different chemical phenotypes of cannabis cause different effects and there are scientists working on those kinds of problems today what we've provided here is sort of a map a chemical map of all of the common chemical profiles seen throughout commercial cannabis today and now i think the name of the game is really to go in and to determine what are the differences in the subjective effects that you might tend to get from strains and products that belong to these different chemically defined families and that's a really interesting topic
We in this paper right the whole point of this paper if you go and read it is to come up with a way to think about and to classify cannabis based on its actual chemistry not based on its morphology or what people say about it but based on its actual chemistry and that's the model that we come up with i'm not going to dwell on it but what does this actually mean when we start to think about the consumer side of this and we start to think about the industry side of this well we came to two conclusions from that work one the indica sativa hybrid classification system for the most part does not do a very good job at explaining the chemistry of these products right so that means if someone goes in and tells you with confidence indica will make you sleepy there's probably not a lot to that that's probably mostly just in their head when someone says well that you should have a particular strain name for some effect that you might want you know what we discovered with this was that that really depends on the strain name some strains are relatively consistent but none of them are perfectly consistent and some of them are not consistent at all if you walk into two different dispensaries expecting to get the same effect from two different tangie products you might be out of luck so if this is the structure of the industry today and this is the design implementation for the original instantiation of leafly what we had was this indica sativa hybrid system red is sativa they make you energized purple is indica they make you sleepy and there's a lot of green hybrids in the middle
If we actually change those tiles to a chemistry first viewpoint and we give you colors based on the chemistry we start to see right that there's not a good relationship between that system and the actual chemistry this just looks like a mess so if you want to make sense of this and use the chemistry first perspective you've got to use that data to reorganize everything.
When you do that what you see what i was showing you in that data before is what you're seeing in this cartoon version here that there actually is a kind of logic here there is kind of a periodic table of strains if you will that one could make based on the cannabinoid and the terpene content of these strains assuming that they're relatively consistent right so we've got a number of problems in the industry how do we get things to become more consistent how do we get things to be regulated in the appropriate way and how do we understand how the chemical logic the very rich very interesting chemical logic of this landscape actually influences things like the subjective effects people are after from both a recreational and a medicinal perspective.
So there's a lot more i could say on that this is the second last slide so what i'll just mention is you know at leafly you know we're this sort of three-sided marketplace and cannabis information resource we've got all of this data that's very interesting and what i'll call the leafly data ecosystem is a unique set of data that we get from different sources we get a lot of what i'll call demand side data from consumers what are they looking at what are they interested in what are they ordering online through leafly what are their purchase patterns that they're displaying behaviorally we have a lot of supply side data what is the market actually composed of we work with thousands and thousands of dispensaries and brands and we can see what's actually available on their menus on leafly so we know what products are out there what strain names are associated with them et cetera et cetera we know the demand side how are people behaving how are they actually buying these products and we actually know a lot about the effects in the cannabis chemistry we get people telling us what the subjective effects are by writing crowdsourced reviews which we collect on leafly just like you see reviews on yelp or amazon people can come in and review strains and products on leafly and all of that too can get rooted in the chemistry and we can understand the relationship between all of these things and use that both to power the functional side of leafly by putting a lot of this data into things like recommendation algorithms which can be very unique we can do that by offering data insights to clients so retailers or brands that want to sort of understand what the market looks like and what all this behavior looks like and you can even imagine some of the other applications that are out there that come from being able to correlate things like the chemistry with the subjective side of this and that might be of interest to people like biopharmaceutical companies who are interested in 
actually understanding the chemistry at a very very detailed level and using that for real drug development so with that i will say thank you again my name is nick jikomes i'm the director of science and innovation at leafly i also run a long form science podcast that comes out every week that's called mind and matter i often talk to people in the cannabis space i often talk to people in the psychedelic science space i talk to people about all sorts of things to do with drugs biotechnology and related areas of science you can find me at my website which is listed there and i also have a substack i do a free weekly newsletter which you can subscribe to and it will keep you updated on the podcast about the research world in so far as i'm plugged into it so a lot of the stuff that i just talked to you about as well anything that might be of interest that's in that general realm so i've got a fair amount of time left for q a so we can just jump right in.
To hear my answers to audience Q&A, watch the video version: