Patrick Short 00:02 Hi everyone, welcome to the Genetics Podcast. I'm excited to be here today with Dr. Veera Rajagopal, who is known on Twitter for his prolific tweet threads. He is @doctorveera, with "doctor" spelled out, D-O-C-T-O-R. He's a scientist at Regeneron focused on drug discovery in neuroscience and psychiatry, really focused on the genetic side of precision medicine and drug discovery. So first of all, Veera, thanks so much. We're going to have a special episode, wrapping up 2022 and looking on to 2023. So thank you for taking the time today.
Dr Veera Rajagopal 00:35 Thank you very much, Patrick, for having me on the podcast. I'm excited for today's conversation.
Patrick Short 00:40 Great. So we are going to give a quick intro to Veera and a little bit about his work, then we're going to spend some time recapping 2022. Again, if you don't follow him on Twitter and on his Substack, which is an email newsletter platform that we'll come back to a little later on, he does amazing tweet breakdowns of some of the latest science in genetics. And he's put together three of the areas where there have been some major interesting pieces of science and themes emerging in 2022. So we'll go through that first. And then we've both put together some, maybe call them predictions, or areas to focus on for 2023, that we'll cover off at the end of the episode. So to start off, I'd actually just love to hear how you got into the amazing tweet threads that you do, summarising some of these incredibly complex papers that take normal people six to 20 hours to read and digest, and you break them down into these beautiful 10-minute threads. How did that come about? Where did it start?
Dr Veera Rajagopal 01:37 So I guess I've been on Twitter since 2015, but I was not using it that much. Around 2019, when I actually finished my PhD, I started spending quite a lot of time on Twitter. And, you know, it just happened.
Just like that. I noticed that things I say sometimes resonate with people, and their feedback naturally reinforced my Twitter activity, so I started doing it on a regular basis. Yeah, so that's how it started. And then I kind of completely shifted to writing mostly about the papers that I read, and summarising them. One thing that I often notice on Twitter is that people tweet about papers by just retweeting them or just putting the title. But when you actually take some time to read a paper and give people two points or something, people like it more, because they find it more useful, particularly students. So I really enjoy this. It's mostly an activity for my own self-learning. When you look at an article or read an abstract, it might look very simple: it's clear, I can understand it. But when you try to summarise it in one or two sentences of your own, when you write about it, you start to think about loads of questions that you didn't think of when you were reading. So I found it very useful for understanding. And naturally, it became useful for the followers, the readers, as well. I'm trying to do as much as possible, like before, but I'm not doing it as much as when I was in academia. I'm trying to keep it up, so hopefully I'll be able to do it in the future as well. Yeah, definitely.
Patrick Short 03:41 And you've launched a Substack as well, which is, as I mentioned before, an email newsletter, maybe a little bit longer form. How did you decide to do that?
And is it going to be the same things you're posting on Twitter, but with a little bit more colour and less of the structure that tweet threads force on you with the character limit? What's your plan for that?
Dr Veera Rajagopal 04:02 I mean, I was always interested in writing blogs and everything. But I have this typical trait of novelty seeking and not much delayed gratification, and I lack discipline. So the reason why I was successful on Twitter is that it most often takes only a very short amount of time, and I'm able to keep my focus and get it out. But whenever I want to write long posts, I have to really follow a disciplined practice and take more time. I made so many attempts in the past to start a blog, I think I've even lost count of how many blog websites I started, but I kept failing at it; at some point I easily lose interest. But recently a lot of my friends kept telling me that I should start a Substack, that people will definitely love it. And with the recent changes at Twitter, the takeover and everything, people started migrating, and I was a bit worried that I might lose all these years of effort building up this network. So I thought it was high time that I started. So far I'm enjoying it, and it's kind of helping me to practise self-discipline, because it takes more time. People are more forgiving on Twitter if you make typos and grammatical mistakes, but not so in a blog post, so you need to work on it more. I really enjoy it. But in terms of what I'm planning to post there, I'm still not very sure about it.
But I think the thing I feel about my own attention, full focus only for a short time, is the same for others as well. The message will reach people if you say one or two things at a time, or just a few in a thread. But if you ask people to go to a blog post and read something that can take 15 to 20 minutes, not everyone can do that. So I'm trying to plan it in a way that I continue to do what I do on Twitter, as that's the most successful way of communicating science. But I will also spend time on more structured posts, where I put together multiple papers that fall under a theme and tell them in a typical storytelling way. I'm just trying to find out how things work out. So far I've done three posts. One was based on a list of papers, for Thanksgiving, and I also did one based on ChatGPT, where I just gave it an IQ test and summarised it, and people liked it. For the next post, I'm going to write a big summary of 2022 highlights. Hopefully I'll get it out before the New Year, or very close to the New Year.
Patrick Short 07:06 Yeah, excellent. I think you have at least six major areas. We're only going to cover three of those. So if people want to get the other three, they can go over to your Substack and subscribe. It's called GWAS Stories, right? G-W-A-S?
Dr Veera Rajagopal 07:19 Yeah. So GWAS Stories just came from the name GWAS storyteller. You know Eric Fauman from Pfizer, right? He describes himself as a GWAS whisperer. So, for fun, someday I just updated my profile to say GWAS storyteller. I didn't really plan to keep it, but people liked it, and I thought, okay, let's keep it like that. And then I gave a similar name, on a similar theme, to the Substack.
Patrick Short 07:56 We're not going to cover ChatGPT. It wasn't on either of our 2023 genetics lists, but I can't help but ask because I'm fascinated by it. What do you think about it? You've used it and written about it. What's your sense of it?
Dr Veera Rajagopal 08:09 I mean, I'm not very much a machine learning person. I have a lot of friends who are, and so far their response is that they are not as impressed as I am. But I think it's really amazing that an algorithm can converse with you like a human and you are able to get so much information out of it. And people already think there are hundreds of different ways you can use it. I think it's going to be very useful in an academic setting, to help with writing grant applications, even papers, and summarising. I was even worried that ChatGPT will take over my job of summarising, I think.
Patrick Short 08:58 Now you've given somebody that idea. I don't think it could, quite. I think it's amazing, and it's only going to get better. I do think it's mainly good at synthesising. I haven't seen... I mean, besides some funny examples, like asking it to write a poem in the style of Jar Jar Binks or something like that. It is always going to struggle with invention at the new frontier, right? But I think it's amazing. I'm with you. I'm not a card-carrying machine learning person, but I'm pretty impressed with what it can do.
Dr Veera Rajagopal 09:28 I even just ordered a book. One of the adorable uses is to write children's stories, and you can also use ChatGPT to create the appropriate prompts to create the illustrations in DALL-E, right? So there was this guy who used ChatGPT to write a small children's story, and then used ChatGPT to create the prompts for DALL-E, and he published the book on Amazon. I think it's been successful.
He's probably making a lot of money out of it. I mean, the moment I saw that tweet, I ordered it, because it kind of marks this occasion, the release of ChatGPT and how the world has been using it. So I have this book, and probably two years from now, when I look at the book, I can remember this time. So yeah, it's really great. From a layperson's perspective, I'd say it's mind-blowing what it can do.
Patrick Short 10:25 We'll see how frequently my wife listens to these episodes. We have a six-month-old daughter, and I am going to make her a Christmas storybook for the holidays. I'm going to do that tomorrow. And we'll see if she listens to this before I get...
Dr Veera Rajagopal 10:41 ChatGPT, and yeah.
Patrick Short 10:43 That's what I'm gonna do. Well, what I think I'll do is write the story, and then feed the prompts to DALL-E and see what happens. Just one more question about you before we dive into 2022 and 2023. In your day job, when you're not telling GWAS stories, you're a drug discovery scientist at Regeneron. What do you focus on? For those who aren't as embedded in the relationship between genetics and drug discovery, what is that all about? Why is genetics so important, and why are you so interested in and focused on this?
Dr Veera Rajagopal 11:14 Yeah, sure. So I joined the Regeneron Genetics Center in May 2021. Before that, I was in Denmark, where I did a PhD in psychiatric genetics and also a postdoctoral fellowship. So I studied the genetics of psychiatric disorders. But I have a medical background; I was trained as a physician. So I had a very broad interest in the genetics of all human diseases, and as I started to read more widely, I really started getting interested in how genetics is being used in drug discovery research.
One thing: when I moved my career from being a clinician to being a full-time scientist, there was always this feeling that you have spent so much time, more than 10 years, in clinical practice, and now you're not using that knowledge. So I found this field of drug discovery research an excellent opportunity where I can combine my expertise in genetics with my expertise in medicine to make discoveries, to do research that has more translational potential. Luckily, the time when I realised this was actually the time when the whole field started shifting towards this area, and a lot of pharma companies started investing in human genetics. So that's the background. I got this opportunity at Regeneron and moved there. At Regeneron, I'm part of the translational genetics group that focuses on neurological diseases, and also psychiatric and ophthalmological diseases. What we basically do is large-scale genetic association studies, where we focus a lot on rare variants based on exome sequencing data. One of the major goals for us is to identify naturally occurring mutations that protect against disease, and to understand the underlying mechanisms that lead to this protective effect, so that we can try to mimic that through drug design. A very good example is PCSK9, and we have so many other examples now. So this is the most important goal. But we do not just focus on that; we also look at other findings, risk-increasing findings as well, because sometimes they can open up a lot of biological understanding of the diseases.
And yeah, so that is my day-to-day job: to run large-scale genetic studies, to understand diseases, and to look for opportunities to help with drug discovery in the company.
Patrick Short 14:11 And one of the themes we're going to touch on later is the sheer growth in the volume of genetic studies, but also the transition from exomes to genomes and what that opens up. For those who aren't aware, the Regeneron Genetics Center has probably been the number one company that has taken up large-scale sequencing and embedded it into their drug discovery process. AstraZeneca, GSK, and a number of others have made very significant investments in genetics, but whenever you go to the American Society of Human Genetics, the major conference every year, it's like 10% of all of the talks are somebody from Regeneron talking about the amazing genetics they're doing.
Dr Veera Rajagopal 14:46 It's only recent, right? I just heard from one of my friends, a senior scientist on the pharma side, that he was impressed by the amount of representation of industry scientists at this year's meeting. He said that just a few years back, perhaps 10 or even fewer, people from industry presenting any genetics work, industry scientists, were treated like outcasts. But the tables have turned. It's such a big transformation.
Patrick Short 15:20 Yeah, it's a big shift, maybe one we'll come back to later. Okay, so, diving first into 2022, we talked about a couple of major milestones. We're going to dive into three different areas, and I'm going to let you introduce the first one, which you've titled "mind-blowing genetics". We'll talk through a couple of examples within this theme. So I'll turn it over to you to explain what you mean by this and talk through a couple of the examples.
Dr Veera Rajagopal 15:45 Yeah, so for the past few days I was just looking at all my old Twitter threads.
I've been doing this for two years now: at the end of the year, I look at all the Twitter threads, try to see what the biggest stories are, and put them under some themes. Last year I posted a few threads, and it was very successful, actually. So when you asked about this podcast, I started looking into it. It's very difficult to choose, as there are so many findings. From my personal perspective, these are the major ones, but I'm sure there are many other things that I'm not aware of, haven't read, or missed. I chose three major themes, with perhaps three studies in each theme to highlight. The first theme is called mind-blowing genetics: findings that really feel like jaw-dropping findings. The second is milestone achievements, which mark a stage where we have reached something, achieved some success, in the long progress of human genetics research. And the third theme is a step in the right direction, focusing on research that is happening in what we think is the right direction, such as research focused on non-European ancestries. So the first one is mind-blowing genetics. I chose three; there are so many. The first one, I think anyone will agree, is the biggest story of the year: this paper that came out a few weeks ago in Nature that showed how the plague, the Black Death, caused natural selection at a speed never seen before. Natural selection is like survival of the fittest for genes: it's only the genes, the variants, that give some kind of survival benefit that are passed on, generation after generation, right?
So that's the underlying theme of natural selection. We have learned about this in lots of contexts, like the sickle cell variant protecting against malaria, and we know a lot of other examples of natural selection. But those are all processes that have been happening over hundreds of thousands of years. We never came across an example where natural selection happened so drastically. Here, the work is based on ancient DNA research. They identified ancient DNA samples spanning three periods around the Black Death, the bubonic plague epidemic, particularly in Europe, in the middle of the 13th [14th] century. So that's the timeline. They collected samples from before, during and after it, from two places, London and Denmark, and they were able to extract the DNA, identify genetic variants, and look at how the allele frequencies have shifted. They had about 200 samples, I think; not so many, but sufficient to focus on a specific set of genes. They obviously looked at immune-related genes, and they identified a few loci that had drastically changed in allele frequency. The top one is on chromosome 5, where there are two genes, ERAP1 and ERAP2, that are involved in the antigen presentation process. The allele at ERAP2 is a splicing allele, and its frequency shifted from 30 percent to 70 percent just after the plague. You know, the Black Death eradicated 50 percent of the European population, right? So such a drastic change in allele frequency is jaw-dropping; it's never been seen before. I think that's the biggest story. And the other thing is that selection like this always comes with a trade-off.
The same allele that protected individuals from the Black Death is also putting individuals at increased risk for autoimmune diseases, which we know from today's GWAS studies, things like rheumatoid arthritis or ulcerative colitis, a lot of autoimmune diseases. So yeah, I think it's one of the biggest stories. People really went crazy.
Patrick Short 20:24 Like, people went crazy, and it blew my mind reading it, it really did. Like you said, carriers of the gene were essentially significantly more likely to survive the Black Plague, right? And like you said, that has knock-on effects hundreds of years later in some of the diseases, like Crohn's and ulcerative colitis, that are at much... Yeah, that one, I think it's a great choice. It blew my mind reading it that the technology has gotten so good as well, that we can do ancient DNA samples like that at a reasonable cost from 2000 years ago.
Dr Veera Rajagopal 20:55 It also fits with the Nobel Prize theme, right? Ancient DNA.
Patrick Short 21:03 And now the second one on your list. I'm glad you chose it. I know a couple of the authors on this, so I'm obviously biased, but I think it's really great.
Dr Veera Rajagopal 21:12 So yeah, two fascinating papers came from Matt Hurles' group, your PhD supervisor, right? So I had a bit of a hard time choosing which one, but I find this one more interesting, particularly for going into molecular mechanisms. Here, the authors looked at de novo mutations in the DDD cohort and also in the Genomics England cohort.
They looked into the factors that affect the germline mutation rate, that is, the number of de novo mutations in an individual. These come from mutations in the germline, in the sperm and the ovum, of the parents. On average, we know that about 70 to 90 new mutations arise every generation. So they looked at the distribution, and there are a few outliers. I think they had 20,000 trios, and 12 individuals had many more mutations than expected, a hypermutated genome. Some are really crazy; I think the highest number they observed is more than 400 mutations. We know that the most important factor affecting this de novo mutation rate is the parents' age, particularly the father's age. This explains about 90 percent of the variation, though based on this study it seems to be a little less than that. But here they are looking at other causes of this variation, particularly in the outliers. When they looked at these outliers and studied why they have so many mutations in the genome, they found that at least a couple of these individuals were born to parents who had a mutation in a DNA repair gene, the very kind of gene that governs the process that prevents this kind of mutation from passing on to the next generation, right? A lot of these new mutations arise because of replication errors. For one of the individuals, the father has xeroderma pigmentosum, which is a very well-known autosomal recessive condition caused by a DNA repair gene. They also found that some of the individuals' parents had chemotherapy just before conception. So these are the outliers, the rare causes of de novo mutations. And if you think about it, it's really amazing.
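[Editor's note: the outlier reasoning in this part of the conversation can be sketched numerically. The snippet below is a minimal illustration and not the method used in the paper. It assumes de novo mutation counts are roughly Poisson-distributed around a typical value of 80 (the midpoint of the 70 to 90 quoted in the conversation) and asks how many standard deviations away a given count lies. The function names, the threshold and the example counts are all hypothetical.]

```python
import math

# Assumed midpoint of the 70-90 new mutations per generation quoted in the conversation
TYPICAL_DNM_COUNT = 80


def poisson_z_score(count, mean=TYPICAL_DNM_COUNT):
    """Standard deviations above the mean, using the Poisson property variance == mean."""
    return (count - mean) / math.sqrt(mean)


def flag_hypermutated(counts, z_threshold=6.0):
    """Return the counts whose z-score exceeds a (hypothetical) threshold."""
    return [c for c in counts if poisson_z_score(c) > z_threshold]


# Hypothetical per-child de novo mutation counts, including one extreme value
example_counts = [72, 85, 91, 64, 410]
print(flag_hypermutated(example_counts))
```

Under these assumptions a count above 400 sits more than 30 standard deviations from the typical value, which is why such individuals stand out without any formal test. Real counts also depend on parental age, so a real analysis would likely adjust for that first.]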
With such a small sample, they were able to precisely identify the cause of such increased de novo mutations and paint a picture of the causes of de novo mutation. I think this line of research is going to pick up. There's one preprint that came out a few days ago that looked into the polygenic contribution to the de novo mutation rate, trying to look into its biology using GWAS. I think the sample size will increase, given that a lot of biobank and sequencing efforts are happening. I'm particularly excited about how this line of research is going to progress in the upcoming years.
Patrick Short 24:26 Yeah, I think it's also a great exemplar of how useful linked genetic and clinical data is, because a lot of this would have been really difficult to tease out without knowing that one of the parents had chemotherapy, or the clinical xeroderma pigmentosum that you mentioned. I think it's also an interesting, fundamental view of biology: if the typical genome has only 50 or 75 mutations, and then you have someone who has 400, why is that? That has huge impacts on disease risk, our children, and so on.
Dr Veera Rajagopal 25:03 Yeah. Why it attracted my interest is that it's kind of the same thing that we do day to day. We are trying to identify large-effect rare variants affecting a trait; basically, we are looking at the outlier individuals and what is pushing them to the end of the distribution, right? So it's very much the same here. But here the sample size is too low; if you were doing an ExWAS or anything, you would not be able to find it. But you can, just by zooming in and looking at an individual.
So, said a different way, this is very important, because I work in the psychiatric and neurodevelopmental field, where often we don't have enough statistical power to identify associations. So, more than looking at P values or anything, you need intuition and understanding to identify such outliers and guess the causes, things like that. So in a lot of ways it reflected what I did. And considering mutation as a phenotype in itself, I find that very interesting.
Patrick Short 26:13 Completely agree. The third one on your list is very hot off the presses, only a couple of days old, right? Tell us about it.
Dr Veera Rajagopal 26:22 The thread is still running, getting retweeted. I just posted the thread last night. I was actually trying to prepare for this podcast, but sometimes findings like this are so interesting that you can't resist, and you just ignore everything else. It's also my trait, I guess. I just put everything aside and sat for an hour and did this Twitter thread, even though I had so many other things to do. I think it's definitely a fantastic study. So what's happening in this study is that they have identified the first human case of a monogenic form of obesity caused by a mutation in a gene called ASIP. It's agouti signalling protein, and it is involved in the melanocortin signalling pathway, which is the major one responsible: genes in that pathway mainly cause these monogenic forms of obesity. The most fascinating thing about this study is that the very first knowledge of this protein came about 30 years ago from a lab animal model called agouti mice. This gene mainly determines the colouring of the skin and the fur.
These mice had a mutation that made this gene be expressed throughout the body. It's supposed to be expressed only in some specific tissues, mainly in the skin, where it has its role. This mutation causes the gene to be expressed throughout the body, including in the hypothalamus, where it starts to disturb the appetite centre, and the mice get hyperphagia and become fat. I guess it's a very popular mouse model; it's been used for a lot of research so far. But people have been searching for any kind of human genetic evidence for this gene for probably 30 years, and nothing had been found. This is the first time they are encountering a human case, and the features are almost a near phenocopy of the mice. This was a girl who had severe obesity and red hair, and all the other complications of obesity: diabetes, steatohepatitis, everything. As part of the routine clinical workup, they tried to look into the genes that are known to cause monogenic forms of obesity, but they did not find a mutation. This girl underwent gastrectomy surgery for the extreme obesity, so they collected an adipose tissue sample, and then they looked into the gene expression, the transcriptomic profile, comparing it with control tissue. And there it is. You don't even need any kind of statistics to look at it: there is one gene that's extremely highly expressed, like several hundred fold. And what is that gene? It's the very gene that we know causes obesity in the agouti mice. So they went back to the whole genome sequencing data and looked at the reads, and they could see there is a tandem duplication that switched the promoter of this ASIP gene with the promoter of a nearby gene. And that gene is a ubiquitous gene; it is expressed throughout the body.
Now this gene, ASIP, is under the promoter of that gene. So it is getting transcribed in every cell of the girl's body, throughout the body, including the hypothalamic neurons. ASIP is a natural antagonist of two melanocortin receptors, MC1R and MC4R. In the skin, it binds to the receptor, so you're not able to secrete the melanin, which causes this red skin and also red hair. And in the brain, it antagonises MC4R, which results in a very increased appetite. So the patient overeats and becomes obese. And it's amazing: they went back to a childhood obesity cohort to see if there were any other patients with the same mutation, and they found four patients with it. What are the odds of seeing the same mutation in four patients? Three girls and one boy, and the three girls had a very similar phenotype: red hair, hyperphagia, extreme obesity. It's mind-blowing, right? These are the findings I really like in human genetics. And there is a happy ending to it: there is a known FDA-approved drug, an MC4R agonist, that can be used to neutralise the antagonistic effect of this overexpressed gene. So it's a great paper, and I really loved it. I guess people are also loving it so far, from what I've seen.
Patrick Short 31:00 It'll get triple-digit retweets, for sure. And just for everybody listening, we'll add the links to all these papers in the show notes, so you can go and check them out, and we'll link to Dr. Veera's Twitter so you can read the threads as well. Your second major theme is milestone achievements. Maybe you could talk through the first. Yeah, the enormous GWAS study.
Dr Veera Rajagopal 31:21 This is the most important. For the milestone achievements, I chose three things, and the first one is one of the biggest findings when you narrow down the field to GWAS. If you ask me what the biggest finding in the GWAS field is, I will definitely choose this. This was a GWAS of height in a sample size of 5.4 million. What is special about this GWAS is that it's a saturated GWAS, meaning that at this 5.4 million sample size, we have found everything that you can find, all the genes and variants associated with height. We have reached saturation. In the past, we always believed that it was impossible to reach this state, because the sample size required to get all the genome-wide significant variants that explain the full height variation attributable to the whole genome would be practically infinite, right? So that's a remarkable moment. So, height [inaudible]. The major finding here concerns SNP heritability. SNP heritability is the variation in the phenotype explained by the SNPs, the common variants, and for height it is around 50 percent. You can estimate it using statistical methods even if you don't have any significant genetic variants. The twin heritability of height is 70 to 80 percent, and the SNP heritability is this 50 percent. But so far, the genome-wide significant variants explained only a small portion of the full SNP heritability, and this is one of the gaps that we have been trying to fill as we increase the sample size. In this GWAS, that gap is filled. So now we have reached it completely.
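[Editor's note: the heritability gaps described here can be laid out as simple arithmetic. The sketch below is illustrative only. The twin and SNP heritability values are the approximate figures quoted in the conversation (80 percent is the upper end of the quoted 70 to 80 percent range), while the 0.25 used for an earlier, smaller GWAS is a made-up placeholder and the function name and dictionary keys are hypothetical.]

```python
def heritability_gaps(twin_h2, snp_h2, gws_h2):
    """Split twin heritability into three parts:
    - variance explained by genome-wide significant (GWS) variants,
    - the rest of the SNP heritability, not yet pinned to significant variants,
    - the part of twin heritability that common SNPs do not capture at all.
    """
    if not (0.0 <= gws_h2 <= snp_h2 <= twin_h2 <= 1.0):
        raise ValueError("expected 0 <= gws_h2 <= snp_h2 <= twin_h2 <= 1")
    return {
        "explained_by_significant_hits": gws_h2,
        "still_hidden_in_common_snps": round(snp_h2 - gws_h2, 3),
        "not_captured_by_common_snps": round(twin_h2 - snp_h2, 3),
    }


# gws_h2=0.25 is a hypothetical value for an earlier, smaller GWAS
earlier = heritability_gaps(twin_h2=0.80, snp_h2=0.50, gws_h2=0.25)
# "Saturated" in the sense used in the conversation: significant hits account for all SNP heritability
saturated = heritability_gaps(twin_h2=0.80, snp_h2=0.50, gws_h2=0.50)
print(earlier)
print(saturated)
```

In this framing, saturation closes only the middle gap. The difference between twin heritability and SNP heritability remains, which is consistent with the focus on common variants in this part of the conversation.]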
And there are so many other important insights from this study. When you have such a large sample size, you can go back and see what sample sizes you need to get different kinds of insights from the data. For example, if I want to identify genes or pathways, do I need 5.4 million samples? No, you don't. So they give estimates of the sample sizes at which you reach saturation for different kinds of results. For example, you reach saturation for pathways, all the height-related pathways, at 250,000 samples; you are not going to find any new pathways beyond that sample size. If you are interested in identifying all the height-related genes, particularly those that cause Mendelian forms of height disorders, then you reach saturation at around 1.2 to 2 million samples; beyond that you are not going to find anything new. But when it comes to individual variants, you reach it at 5.4 million samples. Another important thing is that these variants, about 12,000 in total, are not distributed throughout the genome, as a lot of people expected under the omnigenic hypothesis; they are concentrated in only 21 percent of the genome. When we talk about a polygenic trait, we tend to think that genes throughout the genome are involved, that at some point in the future you will find every variant in the genome showing a significant P value. That is one of the common criticisms: that at some point you will end up implicating every gene. But that's not the case. Here they clearly show that it's only a part of the genome, and that you keep finding more and more variants only in the genes that are more related to height.
They have this cool plot where they show, in different colours, the variants associated with height at the same locus. There is one specific locus near a gene called ACAN, one of the most important loci, where 25 independent signals were identified. I think this is the moment people have been trying to reach for probably the past 10 to 15 years, and it's a huge success and motivation for the field. So yeah, it's definitely one of the important milestones of 2022. Patrick Short 35:42 I feel like I need to go back and reread it, because to be honest, when I first saw this, I thought, oh, it's another very large height GWAS. Dr Veera Rajagopal 35:52 Exactly, yeah. A lot of people thought that; there are so many million-sample GWAS that have come out this year, but nothing like this. This is very important. It marks a milestone in the GWAS journey. Patrick Short 36:05 So I guess this is almost the end of a quest in some ways, because one of the big criticisms of genome-wide association studies in general is: where does it end? Are we just going to sequence more and more? This is sort of showing that at a certain point you've found almost everything there is to be found, and you can start to dive into other biological questions. Dr Veera Rajagopal 36:27 I think it's very important to mention one thing here. There is still one more thing to be accomplished. Even though you have complete saturation, it is only in the European sample, right? It doesn't apply to the non-European samples, particularly for using the variants to predict the trait, the polygenic score. The polygenic score explains only a part of this variance in the non-European samples.
And this is because you don't have that many training samples from the other populations. So this further emphasises the importance of studying non-European populations. When I say we have found everything, it applies only to the European sample; I want to emphasise that here. Patrick Short 37:18 Yeah, I think that's very important to emphasise. And I think the point you made still stands, which is that this is about showing a general framework for knowing when you've reached saturation in any population or disease, right? And then you can apply that thinking or approach anywhere. So next on your list is the UK Biobank, which is an evergreen source of new and exciting threads for you. Dr Veera Rajagopal 37:44 The other biggest story is the whole genome sequencing. It overlaps last year and this year; I think the preprint was last year and the paper was published this year. The biggest story of last year was the exome sequencing, when papers were published from Regeneron and also from other companies like AstraZeneca. For the first time we were able to look at rare variant associations at scale, in 450,000 individuals. This year we are going beyond the exomes and looking at the full genomes. People have always focused on exomes, and that's the major focus for all the companies, so there was always this question: what is the benefit of looking beyond the exome? What are we going to learn that we haven't learned through exomes? There was this paper from deCODE, the flagship paper reporting on this data. I think it's a great paper; they show a lot of interesting findings that you would probably be able to find only using whole genomes.
The very first thing is that when we do exome sequencing, we are not really capturing all the coding regions of the genome. There are a lot of things we are missing. One is the regions that are transcribed but not translated, the untranslated regions such as the 3' UTRs, which have very important post-transcriptional regulatory roles. And they identified a lot of associations there. The other thing is that you can find structural variants, which are very difficult to identify using exome sequencing. That was one of the expectations for whole genomes, and I think they have a lot of beautiful examples of that. There are many other insights, and also imputation panels. You have a very high resolution of the variants, so they have built an imputation panel with which you can accurately impute variants from genotyping array data. Here in particular the deCODE scientists are really the experts. They are the ones who first successfully employed this imputation approach to identify variants all the way down to very rare allele frequencies in the Icelandic population, which has a unique genetic structure. They had whole genome sequences for only part of the population and used them to impute into the rest. So they are experts in this, and they have really shown off their skills in the UK Biobank. And there are so many other great findings. But the real questions are how this data is going to be used and what the challenges are. We'll come back to that later, when we talk about what to expect in 2023. Patrick Short 40:36 Yeah, and I think it's probably fair to say most of the biobanks... well, I don't know, I'll ask this to you.
Because I think people have different opinions on this. I hear many people say, actually, I'm going to keep doing exomes for a very long time, because they still don't cost very much at scale compared to genomes. But you don't know what you don't know. Do you have an opinion on whether everyone's going to go in the direction of UK Biobank, and it's going to be genomes all the way? Dr Veera Rajagopal 41:04 Yeah, I think the jury's still out. I remember the first time the discussion happened, the year before last, at ASHG, when they were first announcing the whole genomes. There was one presentation from an industry scientist, I forgot who, giving an initial look at what to expect. They did genetic associations with all the traits possible in the UK Biobank, and they found so many associations. Then they looked at which associations are independent of known GWAS loci. They did a conditional analysis to remove all the variants that we already know through GWAS, and almost everything is gone; very few are left. So that's one thing, right? We have been doing GWAS for a long time, so what is the whole genome going to bring beyond what we have learned using genotyping arrays and imputation? I think it's still not clear. One of the challenges would be what regulatory annotations we have to interpret the variants. We'll discuss this a little more when we look at 2023. Patrick Short 42:12 Yeah, definitely. The next one on your list is another UK Biobank story, and I am definitely with you on this one: proteomics at scale, and Olink in particular. Maybe you could talk a little bit more about that.
Dr Veera Rajagopal 42:22 Yeah. Most of the biggest stories always involve UK Biobank; they are the pioneers in all of this. There has been huge success with the UK Biobank industry collaboration in terms of whole exome sequencing and also whole genome sequencing. And the next biggest dataset, at least from the genomics side, is the proteomics, which is based on a collaboration between UK Biobank and many pharma partners. They have generated data on protein expression in the blood, around 1,500 proteins in the first phase, in about 50,000 participants. So it is a very big dataset. When we interpret genome-wide association loci, one very common thing we do is go back and look at whether the variant affects gene expression; we look at the GTEx database or other databases for eQTLs. But people often say it's important to also look at the protein, because things that you see happening at the mRNA level will not necessarily be reflected in the protein. We never had a large database, and then people started building one. The largest was from deCODE, I think 35,000, published about a year ago. It's a fantastic paper. And now we have 50,000 from the UK Biobank; if you combine deCODE and UK Biobank it's more than 80,000. That's an extremely big dataset, and so many possibilities and so much research can stem from it. We have this preprint, which will probably be published next year. They did some preliminary analyses to see what you can do with this kind of dataset. The very first thing is that a protein is a molecular trait that is very close to the DNA, so you have many-fold more power to identify genetic associations.
And this can help in refining your Mendelian randomization analysis, whether for drug targets or for epidemiological questions. You can also use the variant associations with the protein to assess the pathogenicity of the variants. You might be familiar with the massively parallel reporter assay, right? They use CRISPR to edit every possible base in a gene, make all the possible variants, and look at some readout: gene expression, or cell proliferation, a survival assay, and so on. So this is, in a way, a kind of MPRA, except that we already have the alleles, a list of lots of naturally occurring alleles, and here the readout is the protein. So it's very similar to an MPRA, and we have it for more than a thousand proteins. This is going to really refine all our GWAS findings. And I think deCODE in particular has a very special attachment to pQTLs and proteomics. If you look at Kári's talks, and I've listened to four or five of them, every time he somehow ends up at this proteomics and multi-omics data. Recently Kári also gave a fantastic talk at ASHG, at the DRIFT symposium that Regeneron organised, and he talked about one special use of proteomics: risk prediction, a proteomic risk score. He said it captures a different dimension than the genetic risk score. We use the polygenic risk score to identify disease risk or disease progression, and the proteomic score is not the same as the genetic score; it captures a different dimension, and the two will probably complement each other well. I think a lot of people are working on it. So there are endless possibilities.
And we will be seeing many papers next year using this data, at least at first from the industry scientists, I think. Patrick Short 46:17 Yeah. To me, what's interesting about it as well is just the sheer complexity of the number of time points and disease states and so on that you could test. Genetics is special in that you only need to do it once, but proteomics is special in that it tells you what's happening at a moment in time in a particular cell. So they are at two very different ends of the spectrum: you only need to genome sequence people once, but you could do proteomics every second of the day if you wanted to. It probably wouldn't tell you anything at that resolution, but you could do it every day and in every different tissue, and you'd probably learn a lot that way. So how we're going to wrestle with the question of prioritising our scarce resources to build these kinds of datasets is an interesting one. Dr Veera Rajagopal 47:00 One downside is that we are looking at the blood proteins. It's useful for a lot of secreted proteins and for metabolic diseases. But when it comes to the disease domains that I'm interested in, like neurological and psychiatric diseases, we have looked into it, and it's not clear how useful this is going to be. But apart from that, I think it has a lot of potential for other disease domains. Patrick Short 47:31 Yeah, definitely. This is maybe a good segue into the third bucket, because a lot of what we've talked about is centred on primarily European-ancestry populations. So maybe you could talk about the advances here. Dr Veera Rajagopal 47:45 Yeah. Even though it comes last, I think it is the most important theme, and we should always come back to it.
I chose three points to highlight here. The first one is the dataset: we now have the first large-scale rare variant exome dataset for a non-European population, a Latin American population. This is an effort from scientists in Mexico and scientists in Oxford, in collaboration with Regeneron, where the sequencing happened. It covers 150,000 individuals from Mexico. I think this is one of the important milestones in human genetics, where we finally have such large datasets for non-European populations as well. The preprint describing the initial analysis was posted this year, and it will probably be published sometime next year. Scientists in Mexico, in Oxford, and other collaborating researchers will probably use this for a long time, and we'll be learning more about the genetics of the Latin American population. It will also reduce the underrepresentation of Latin American populations in human genetic studies. One thing that was really amazing is how they recruited the participants. The samples were collected about 20 years ago, I think around 2000, and the scientists went house to house collecting them. Can you imagine? They went to something like 100,000 houses to collect these 150,000 samples. That is such a huge effort, and we should appreciate it. It's a big effort to invest in this. One of the major things about this is that it is an admixed population; that is, people have different proportions of ancestry because of admixture events between Europeans, Africans and Native Americans. So this is a very special population.
And now we are starting to realise more of the use cases of such a population. There are many ways to look at it. It can improve the statistical power for a lot of variants, for example those that are fixed in one of the ancestral populations. If there is a variant that is present in everyone in Africa but in no one in Europe, you will not have power to study it in either the African or the European population. But in an admixed American population it will be at an intermediate frequency, and this gives you more power. Also, when the same individual harbours different proportions of ancestry, you can remove the environmental effects and look at the genetic effects. It's always very challenging to compare genetic effects between populations, right? If I have a genetic variant, or a set of variants, affecting a trait, I cannot simply take two populations and compare the genetic effects, because there are so many other factors to account for. But if both ancestries are present in the same individual, then it's like a within-family kind of design, so you remove all the confounders. That is one of the exciting developments in statistical methods too. I think there is a paper from Bogdan's lab at UCLA that looked into this. One of the problems with the PRS is that the effect sizes differ across ancestries. What are the reasons for these different effect sizes? You cannot tell just by comparing individuals from different ancestries, but you can look at admixed individuals, calculate the PRS separately for the African-ancestry portion and the European-ancestry portion of the genome, compare the effect sizes, and get at the reason. So there are different ways you can use it.
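The power argument about fixed variants can be made concrete with a small sketch. It assumes the simplest possible model: the allele frequency in the admixed group is an ancestry-weighted average of the source frequencies, and genotype variance is 2p(1-p), which ignores any remaining structure within the admixed group. The 30/70 ancestry split is a made-up number for illustration only.

```python
def admixed_frequency(source_freqs, ancestry_proportions):
    """Expected allele frequency in an admixed population, taken as the
    ancestry-weighted average of the source-population frequencies."""
    if abs(sum(ancestry_proportions) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(f * w for f, w in zip(source_freqs, ancestry_proportions))


def genotype_variance(p):
    """Variance of the 0/1/2 allele count under Hardy-Weinberg: 2p(1-p)."""
    return 2.0 * p * (1.0 - p)


# Hypothetical variant: present in everyone in source A, absent in source B.
p_source_a, p_source_b = 1.0, 0.0

# Made-up average ancestry proportions for the admixed population.
p_admixed = admixed_frequency([p_source_a, p_source_b], [0.3, 0.7])

var_a = genotype_variance(p_source_a)        # no variation, so no power
var_b = genotype_variance(p_source_b)        # no variation, so no power
var_admixed = genotype_variance(p_admixed)   # intermediate frequency

print(var_a, var_b, round(var_admixed, 2))
```

With no genotype variation in either source population there is nothing to associate with the trait; the admixed group is the only one of the three where the variant can be tested at all.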
So I think a lot of great discoveries and findings will be coming out of this dataset. That's tremendous. So that's the first one. I have two more. The second is the polygenic risk score. It has been getting more and more attention, people are starting to use it, and I think it has already started entering the clinic. But we still haven't solved the problem of the poor portability of polygenic scores across different ancestries. Meanwhile, studies are coming out that highlight other challenges we should be aware of when we implement polygenic prediction in clinical practice. There are two of these that I found very important. One is a study published in Nature Medicine, I think early this year, based on the Uganda Genome Resource, from African scientists. If you construct a polygenic risk score based on a European population and then predict the trait in an African population, it's going to be poor; we know that, right? You can improve that by including African American samples in your training sample. But we always put the whole of African ancestry in a single bucket. Here they show that even if you construct a polygenic risk score using an African American training sample, the prediction can differ substantially between different sub-continental African ancestries. They looked at two sets of individuals, one from the Ugandan region and the other from the Zulu region, and they used the polygenic risk score for, I think, LDL cholesterol.
The polygenic risk score trained on an African American training sample explained about 8 percent of the variation in the Zulu population, but about 0.02 percent in the Ugandan population. Such a big difference, imagine. It's great that they were able to show and highlight it. When we talk about portability, we also have to look at fine-scale ancestry; we cannot just treat the African population as a single bucket and use that as a category to study the portability issues in polygenic risk scores. Very much in line with that, there is another paper that came out showing the same issue with admixed ancestry, coming back to the previous point about the Mexican dataset. When we talk about portability issues, we always talk about continental ancestry: the European population, the African population, the Asian population. But there are also populations of admixed ancestry, right? When you look at polygenic prediction in admixed individuals and separate them by their ancestry proportions, a polygenic score trained on a European sample performs well only in individuals who have relatively more European ancestry in their genome, and less well in individuals who have relatively more African ancestry. This is also an important issue we need to address. How are we going to make sure that the polygenic score performs well in admixed individuals? We always focus on homogeneous ancestry groups. So as much as we have studies that show the success stories, I think it's also important to have studies that bring out these challenges, so that other scientists are aware of them and can try to work out solutions. So that's the second one.
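For readers who have not computed one, a polygenic score is just a weighted sum of allele counts, and the "percent of variation explained" quoted in portability comparisons is typically the squared correlation between the score and the trait in the target cohort. The sketch below shows that calculation in plain Python. The weights, genotypes and trait values are invented toy numbers, and the function names are hypothetical; they are not taken from any of the studies discussed.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele counts (0, 1 or 2) for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(xs, ys):
    """Squared Pearson correlation: share of variance in ys explained by xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-variant weights, as if learned in a training population.
weights = [0.2, -0.1, 0.3]

# Toy target cohort: one row of allele counts per individual, plus a trait.
cohort_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2]]
cohort_trait = [0.4, 0.1, 0.9, 0.5]

scores = [polygenic_score(g, weights) for g in cohort_genotypes]
r2 = r_squared(scores, cohort_trait)

print(f"variance explained in this cohort: {r2:.2f}")
```

Running the same `weights` against a second cohort and comparing the two `r2` values is the basic shape of a portability comparison; real analyses also adjust for covariates such as age, sex and ancestry principal components.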
And the third one is probably a bit biased, because I'm from India, and one of my all-time concerns is the poor representation of the Indian population, and the South Asian population generally, in genetic studies. So any study that comes out of India excites me a bit more. There was one preprint that looked at the genetics of type 2 diabetes in the Indian population in comparison to the European population. My dad has type 2 diabetes and so does my mom, so I know I'm going to be diabetic very soon, because it's a genetic risk for us. And having spent time in clinical practice as a clinician in India, I know the high prevalence of type 2 diabetes; I spent so much time as a duty doctor treating patients with diabetic foot ulcers and everything. It's so common. We have some studies, but people haven't looked at the differences in the genetics between the Indian population and the European population; I think this is the first time they're looking at it. One of the main findings when it comes to diabetes in India is that it has a very early age of onset. People start to get diabetes at a much younger age, under 40 years, and this is not the case in the European population. As for the underlying genetics, they show that the heritability of the age of onset of type 2 diabetes is higher in the Indian population than in the European population.
One other interesting thing: they derived a polygenic score using the Indian, or South Asian, population as the training sample, and they show that it predicts very poorly in the European population but very well in the Indian population. We always hear about how European-based polygenic scores perform poorly in other populations, so it's refreshing to hear that the reverse is also true. It's very important. Patrick Short 57:45 Do the authors get into why the heritability is greater? I don't know as much about the genetic ancestry of India as I probably should, but I would imagine it's also a highly admixed population, and you're not talking about one population, right? Dr Veera Rajagopal 58:05 That's right. We have North Indian and South Indian populations, and you can categorise them further as you go deeper into the ancestry structure. I think dietary factors are one important part, and there are definitely evolutionary aspects to it. One other issue here is the sample they looked at. There was a very big difference: the average age of onset in the sample they studied from India was 40 years, but it was 60 years in the European sample.
So this is probably partly due to selection or ascertainment bias, but it is probably also a real difference. If you randomly sample people with type 2 diabetes from India and from Europe and look at the mean difference in age of onset, you will probably still see a difference, because we know this from epidemiological studies. And probably this is because people carry variants that have stronger effects on the genetic risk for type 2 diabetes, and we are just starting to see what those are. In the study they identified the same locus, I don't know if you're aware of TCF7L2, one of the strongest type 2 diabetes loci. It was identified very early in the GWAS timeline and is one of the most replicated findings, and they identified the same thing. But they also see that there are variants in the same locus that are seen only in the Indian population and not in the European population. So even with the same gene, which has similar biology in Europe and in India, there are variants that you will probably see only in the Indian population and that contribute more to the variation in disease in India. And this is one of the problems with polygenic risk score prediction, right? Our knowledge of the risk variants that are important for predicting a trait comes from the samples in which we did GWAS, which is Europeans. So we naturally miss some of the variants that are more important in the non-European context, and we don't give enough weight to those variants. So we are again hitting the same problem here. That's why it's important that we look at the genetic architecture in other populations as well. Patrick Short 1:00:25 Well, thank you, this will wrap up 2022.
To recap, we've gone through three themes and nine papers. I'm going to talk for a minute and give you a break, because I've had you talking the whole time, and let you take a quick breather before we head into 2023. I think we'll revisit a couple of these themes. We both sat down and each highlighted two areas where we think it's worth paying attention, where there's something interesting going on. So I'm going to go first with two of mine, and we're going to flip the script, so Veera, you can ask me any questions you want. If I don't know, I'll say so, and we can leave the audience to follow up with somebody who really does know. The first one that I'm really excited about is the rise of whole genome sequencing in newborn screening. As many people may know, there is a form of universal genetic testing in many countries using a heel prick test, which tests for a very small number of genes with very clear clinical benefit: if you pick something up early, you can do something about it. What we're talking about here is a step beyond that, which is to do whole genome sequencing on newborns. With most of the programmes, the focus is still on clinically actionable genes, so that if you find something, you can do something really important for the newborn, like putting them on a preventive therapy for a rare disease. Some programmes, I think, are thinking a little longer term, about what you might do with this in a predictive and preventive healthcare context. But why I think it's so exciting and important is that a couple of programmes have now received really significant funding to do this at a very large scale. In the UK, for example, there's been about 200 million pounds, which is about $250 million, for whole genome sequencing in newborns.
And why I think this is so important, and we touched on this earlier, is that DNA sequencing is a very special form of medical test where you only have to do it once, because your DNA doesn't change, if we're talking about the germline. So there's definitely a path towards a future where everyone is sequenced as a newborn, and that folds into our healthcare record and decision-making system in some way, so that the data can be used not just for early detection of very severe and rare diseases, but also for some of the things we've just been discussing, like polygenic risk scores and the prediction and prevention of disease. I think these kinds of programmes are also going to drive the costs further and further down. In a way, it's going to make genetics disappear into the background a little bit: eventually everybody will be sequenced, and maybe it won't be so special anymore. But I think it's going to push us towards this personalised medicine or precision medicine future, call it whichever you like, where by default everybody has the option to have their healthcare be genetically guided, which I think is really exciting. Today, as we've talked about, it's pretty patchy: whether you get genetic testing or not is highly dependent on geography, disease state and so on. So what do you think about that, and what questions do you have? Dr Veera Rajagopal 1:03:27 I'm not very familiar with this initiative. I was just wondering whether this is a test that's given to everyone who is born, or just in the hospital setting. What is the case, actually? Patrick Short 1:03:41 Yeah, it's a really good question. This is a pilot that is being run by Genomics England and the NHS, and they've been working on it for a really long time.
Actually, I we interviewed to chair genomics England on one of the previous podcasts episodes, and she gave a really clear overview of what's in scope and what's out of scope. And they've been doing a lot of work around making sure the questions and concerns of parents and and other stakeholders are heard. what's really different about this is I believe, they do intend to sequence all comers. So it will not be just critically ill infants, for example, which is a number of programmes that do rapid whole genomes, which of course, totally makes sense. But I think the problem that this is solving is those cases where if you'd known what the genetics was before the child becomes critically ill then it saves everybody a huge amount of heartache and concern. So flipside of that is you're going to test a lot of people who the result comes back negative in the sense that it's a very positive experience where you've been told you we've tested you and haven't found anything but a small fraction of babies or parents are going to get results that say, No, hopefully we found something that if we hadn't found it would become a disaster in a couple of weeks or months or years. Yeah. So I mean, I think it's definitely it. being shown that whole genome sequencing has a great value in a hospital setting, particularly, you know, to identify diagnosis or to rule out diagnosis. That's also an important case. And I think even tweeted about a clinical trial experimented there in a paediatric ICU unit. Radek, clearly found great value of saving cost and you know, and leading to a timely treatment intervention, but when it when you come to a population level, right, so what we do with this inborn errors of metabolism screening using yield records, and this being done, you know, for every child born like in Scandinavian countries, and also other countries, like we have this big biobank using whole genome sequencing in that setting. 
Unknown Speaker 1:05:49 And I'm not really sure how many parents want to learn about the genetic risk of a perfectly healthy child. I mean, let me ask you: if the hospital had offered whole genome sequencing for your kid, would you have accepted it? Would you want to do it, knowing that there is no concern, no issue with the health of your kid? Patrick Short 1:06:11 I personally would, but I'm 100% with you. I actually think one of the interesting things that's going to be learned through this programme is who opts in, who opts out, and why. Because there are plenty of good reasons not to as well. I personally would, but I think there are a lot of arguments not to. Unknown Speaker 1:06:29 Yeah, I think the other problem, maybe one of the common things people bring up when you talk about offering whole genome sequencing as a general health checkup, is that this information is problematic in terms of insurance, right? I mean, if the insurance company has information on this genetic risk and things like that, then that's going to change the scenario, right? If they know beforehand what issues we assume the child will have in the future, and things like that. So how are they going to solve that? Patrick Short 1:07:03 So I'll use this as one way to segue into my second point, because I think it's really relevant here. In the UK, the protections are a lot stronger than in the US. They're not perfect, though. So for example, I'm not an expert on this, but my understanding is that you still need to disclose information about Huntington's disease risk, for example, to insurers, so it's not that in the UK insurers can't do anything.
And in the US, as many people know, life insurers are able to ask about genetic risk, but health insurers, in my understanding, typically are not. There's no free ticket, but the UK protections are a little bit better. I don't know how this particular programme is planning to handle it. But I think this is going to be a really key aspect of the upfront counselling and discussion with parents: what are the risks of taking part? One of those is finding out something that is going to massively impact that. And I'll give you a chance to respond to this, but the next topic that I'm going to talk about is presymptomatic treatment for very significant progressive genetic diseases like ALS or Alzheimer's. And I think this is equally important there. If your family member has ALS, typically a rapidly progressive neurodegenerative disease, and they get genetically tested and they have a genetic form, and then you're deciding whether you want to get genetically tested to see if you carry it, you have the same major questions to consider: (a) do you want to know, and under what circumstances, and (b), in the US in particular, what are the implications of getting a test that says you have a very high likelihood of getting a very significant disease, when you may then not have another chance to take out life insurance or other cover? Unknown Speaker 1:08:45 Yeah, so it's a very exciting initiative. It's just that, when we evaluate the value of whole genome sequencing in a clinical setting, like the trial, the study that I tweeted about a couple of days ago, they look into how much money you save and how much of the disease burden you reduce. So you have these kinds of metrics to evaluate the outcome, right? In terms of economy, and also in terms of disease burden and everything.
I am not sure, I mean, I would really love to know what the impact of this will be. I think we can only know if we implement it. So definitely, that's one of the things to see: how it impacts the country's economy and things at a large scale. Patrick Short 1:09:31 Yeah, and you're so right about the health economics. I think, broadly, my prediction would be that 95-plus percent of parents won't have anything. I mean, I'll even say probably 99% won't have anything actionable. And their costs will be a couple of thousand pounds to the healthcare system, all in, for sequencing, counselling and processing. And then there's going to be 1% where the testing is hopefully going to help avoid something catastrophic, and what I don't know is how much that 1% costs. Somebody will be working hard to run these numbers, I think, to figure out whether we want to do this at scale. And one of the pet peeves that I have with these systems in general is that the health economic assessments often don't encompass some of the non-economic impacts, like saving one in 100 families, or one in 500, wherever it ends up being, from having a traumatic first six months of life. That's hard to put a number on, but we have to, because the health system has to weigh it up against something like genome sequencing in 50-year-olds to do cardiac PRS or something like that, and compare the cost benefit. Unknown Speaker 1:10:42 Yeah, so I think it's great. It will be really interesting to see how this pans out in the future. Patrick Short 1:10:46 So yeah, and my second one, which is actually related to that last point, is that we've seen a couple of drug approvals this year, in ALS and Alzheimer's. They don't appear to be silver bullets, and there's a lot of discussion around how well they work, and side effects and so on.
But one of the things that has seemed to emerge from both the trial and preclinical data is that early treatment in neurodegenerative disease with some of these recently approved compounds may be more effective than treating people who are already very advanced with disease. And why I think this is potentially really interesting is, for kind of obvious reasons, if we can intervene before we get into severe neurodegeneration, then that's a great thing. But it also opens some really challenging questions, like the one we just covered. If you have a family member who has Alzheimer's, and it's APOE e4, or a family member who has ALS, and it's SOD1 or one of the other genetic variants, you're going through the discussion of (a) whether you get tested, and (b), if you are tested and you're 25 years old and carrying that variant, how do we design trials that test this hypothesis that administering a drug to an otherwise healthy but presymptomatic population might work? And I am excited for that, because if it works, it'll be amazing. But it's also a big challenge to think through. Unknown Speaker 1:12:07 Yeah, I think definitely. So there's a lot of hype about the recent Alzheimer's drugs, right? I just remember, I think, one of the Nature news pieces on the Alzheimer's thing. One of the challenges, I guess, is that people feel the reason why all these drugs targeting the amyloid beta pathway are failing is that we are doing the trials at a very late stage. So we need to start early. But it's also very difficult to define early, right? We don't have proper biomarkers that accurately tell you, okay, this is the critical point, you have to make some intervention before this. We don't know that. And we don't have any good way to evaluate how the drug is working, apart from measuring the beta-amyloid plaques.
But we know that doesn't correlate well with disease outcomes. So there are many challenges. I think one of the solutions they offer is like what you said: we need to run the trials with early-onset cases. You can do it in two ways. One is to focus on monogenic conditions caused by known mutations, like APP mutations, identify carriers of these mutations, follow them up from a very early stage, an early age, and enrol them in the trials. And again, there is a problem with putting half of them in the placebo group, right? It's not fair. So they have this kind of design where they switch over the groups at some point. So I think they are trying out a lot of different ways. One is monogenic conditions. The other one is clinically identifying early-onset cases, even before cognitive impairment, using PET scans to identify the amyloid plaques and the earliest signs, and then enrolling them. So, yeah, I think that's definitely the next thing to do. And if it works, then that's really great, but there's just a lot of risk involved in that, and you have to think about the patients. Patrick Short 1:14:19 Yeah, and maybe this is a good segue to one of yours, which is coming back to the proteomics, because it's something you just said. I think genetics is an important but very imperfect predictor of something like Alzheimer's disease risk, but it may be that getting rich proteomic data could help to close the loop, to find a protein-based risk score, for example, that is a lot more predictive of onset than genetics. APOE e4 might tell you that you're more likely to get it eventually, but something that's nearer to a real-time biomarker, I think, is absolutely key.
So maybe you can talk about that and what you're excited about for 2023. Unknown Speaker 1:14:58 Yeah, I think you reminded me of one of the most interesting use cases for this. I think the compelling finding comes from the deCODE paper, the pQTL paper they published based on 35,000 people. One of the ways you can use the proteomics data is to identify biomarkers that can inform about disease progression, particularly when you are evaluating a drug, where it can tell you whether the drug worked or not. They use Mendelian randomization for this. When you test associations between protein levels and disease, you find two types of associations, right? One where the disease is likely leading to the change in the protein level, and the other where the protein is causal to the disease. We always get fixated on looking at only one of them, the causal associations, and ignore the other group of associations. But actually that second group of associations might be equally important, because when you're trying to evaluate some drugs, they can give you an unbiased marker. I think they have an example for psoriasis, with a biomarker, a DEFB protein, some protein in the skin. It's clearly a marker for psoriasis, and it doesn't have any causal association. It's a very clear, very beautiful example of this Mendelian randomization. And when you're trying out some drugs and looking at the outcome, you can probably use that as a marker. And I think they also have another example of this kind for osteoarthritis.
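The two directions of association described here (protein causing disease versus disease changing the protein level) can be sketched with a single-instrument Wald ratio run in both directions. This is only a minimal illustration of the idea, not the method used in the deCODE paper: the function names, the effect sizes and the 1.96 threshold are all hypothetical, and a real analysis would use many instruments and sensitivity checks.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Mendelian randomization estimate.

    beta_exposure: effect of the genetic variant on the exposure
    beta_outcome:  effect of the same variant on the outcome
    se_outcome:    standard error of beta_outcome
    Returns (estimate, standard error, z-score). The standard error is a
    first-order approximation that ignores uncertainty in beta_exposure.
    """
    if beta_exposure == 0:
        raise ValueError("the variant must affect the exposure")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se, estimate / se


def classify(forward_z, reverse_z, threshold=1.96):
    """Label a protein by which direction of effect has support."""
    forward = abs(forward_z) > threshold   # protein -> disease
    reverse = abs(reverse_z) > threshold   # disease -> protein
    if forward and not reverse:
        return "protein likely causal for disease"
    if reverse and not forward:
        return "protein likely a consequence of disease (candidate biomarker)"
    if forward and reverse:
        return "evidence in both directions"
    return "no clear evidence in either direction"


# Made-up numbers for illustration only.
# Forward: a pQTL variant (exposure = protein level, outcome = disease).
_, _, forward_z = wald_ratio(beta_exposure=0.5, beta_outcome=0.01, se_outcome=0.02)
# Reverse: a disease-risk variant (exposure = disease, outcome = protein level).
_, _, reverse_z = wald_ratio(beta_exposure=0.2, beta_outcome=0.08, se_outcome=0.01)

label = classify(forward_z, reverse_z)
print(label)
```

With these invented inputs only the reverse direction passes the threshold, which is the pattern described for the psoriasis marker: useful for tracking disease, but not a causal target.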
So that's one of the exciting use cases. We will be seeing a lot of these examples in the upcoming years, for different diseases where this can be useful. That is with individual proteins, but you can also use a polygenic-risk-score-like approach, combining all the protein associations, training machine learning models to build scores, and using those to monitor progression. That is another area where it might be useful in this context, the clinical trials context. Patrick Short 1:17:15 Yeah, completely agree. Let's wrap up here. I think this will probably be one of our longest episodes, which is good, because we're taking a little break over the holiday period, so people can chop this up into two or three and listen to it as they're working off their holiday meals. So we just touched a little bit on proteomics; maybe we can start with that one, and Olink in particular. Unknown Speaker 1:17:35 So like I mentioned to you before, when we think about what to look forward to in the future, I can divide it into two categories. One is what I would expect to come out in 2023, and the other is what I would be looking forward to on a little bit longer scale, like probably five years, 10 years. Because one year is nothing, most of the studies that I would expect to come out in 2023 are already out as preprints on bioRxiv this year. So if we focus on 2023, I think there are two things that I'm most excited about. One is the ways people are going to use the whole genome sequencing data from the UK Biobank, and the other one is how people are going to use the proteomics data. I think the whole genome sequence data has already been made available for everyone, all the researchers.
So there are so many use cases. People are going to use it to interpret their GWAS associations, and they might be able to get finer resolution at some of the GWAS loci. One of the important challenges in interpreting GWAS studies is to identify the causal genes at a GWAS locus, and the whole genome sequencing resolution might be able to help with that. And I think what people are most excited about in using whole genome sequencing is to finally go and look into the non-coding variant associations, right? So far we have mainly focused on the coding variant associations. We know rare variants can have very large effect sizes, and there are also non-coding variants which can have very large effect sizes, particularly non-coding variants in the regulatory regions. I think the deCODE paper shows it beautifully. We have always known that the exons are highly constrained, right? You see fewer mutations than you would expect. But this is also true for some of the non-coding regions, and the level of constraint there is sometimes even higher than what you see for exons. So what happens when you have a mutation in a region where you normally don't see any mutation at all? Probably it will have a very big effect, and we are just scratching the surface of such associations. That is one of the most exciting discoveries we will see. But one of the challenges is that we will be mostly restricted to traditional annotations, like promoter and enhancer, around the gene region. We also need finer annotations, like tissue-specific annotations and developmental-stage-specific annotations, to know where to look in the whole genome sequence, and that is one of the challenges.
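The notion of constraint mentioned here, seeing fewer mutations in a region than you would expect, can be illustrated with a simple observed/expected ratio. This is only a rough sketch with made-up region names and counts: real constraint metrics derive the expected count from a mutation-rate model and report uncertainty, neither of which is attempted here.

```python
def constraint_ratio(observed, expected):
    """Observed/expected variant count; lower values mean stronger constraint."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Hypothetical regions: (name, observed variants, expected variants).
regions = [
    ("exon_A", 12, 40.0),
    ("noncoding_B", 3, 30.0),
    ("intergenic_C", 48, 50.0),
]

# Rank from most to least constrained.
ranked = sorted(regions, key=lambda r: constraint_ratio(r[1], r[2]))

for name, observed, expected in ranked:
    print(f"{name}: o/e = {constraint_ratio(observed, expected):.2f}")
```

In this invented example the non-coding region comes out more depleted of variation than the exon, which is the situation described above where a new mutation would be expected to have a large effect.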
And so I guess people are probably already working on it. What I always say is that this is one of the big ideas if you're trying to start a company or anything, if you want to make use of all the UK Biobank data but also have something in house that can complement it. One exciting area is to build resources for these kinds of regulatory annotations and things like that. It will be extremely useful for capturing a lot of low-hanging fruit using the 500,000 people's whole genome sequences coming up in these years. So that is one other area, along with polygenic risk scores and lots of evolution-based analyses. And deCODE has created a beautiful imputation panel, and this is going to be very useful for a lot of non-European populations. They have built an imputation panel for the South Asian population, and I think this is one of the first large-scale imputation panels of that kind that has ever been made. So GWAS studies from all these areas will probably make use of this UK Biobank whole genome sequencing imputation panel. Just yesterday, I was tweeting about a paper that showed how to use low-coverage whole genome sequencing data and combine it with the UK Biobank imputation reference panel to get the best imputation performance. So that is one other use case. There are so many things, and we will see how people use it. So I'm very excited about this area for the next year, or the next few years. Patrick Short 1:21:46 We have UK Biobank and all of the industry funders of the sequencing to thank for this great, yeah. Unknown Speaker 1:21:52 I think it's a successful formula, right?
I mean, there's always this gap between academic research and industry research. This successful collaboration shows that it's possible to have this kind of collaboration with industry still making its profits and getting a return on the investment, while at the same time it's been incredibly useful for the academic research community and for society. So I think it's a great thing, a great example, all these collaborations and big datasets. Patrick Short 1:22:25 And maybe you can talk a little bit about Olink and proteomics. I only really heard of Olink for the first time maybe a year or two ago, but it seems like a really interesting approach. Maybe you could explain, if you know, which I suspect you do, because you know a lot about a lot of things, how that method for proteomics works compared to things like mass spec and others, and then what's different about it compared to some of the other tools that we've had available in the past. Unknown Speaker 1:22:49 Yeah, so you mean the way they measure the protein. I think this whole thing is based on antibody-based capture technology. There are two popular ways of measuring the proteins. One is aptamer-based, where they have these short chains of oligonucleotides. They identify these oligonucleotides from a huge, massive library, where they pick out the oligonucleotides with specificity towards proteins, then you use these to capture the protein in the blood, and then you measure this using regular DNA technology. I think deCODE used the aptamer-based approach, but in the UK Biobank, Olink is based on antibody-based capture, in what they call, I think, the proximity extension assay. They have a short nucleotide chain attached to the antibody, and I think they have two antibodies that target a single protein.
One of the problems with using antibodies to measure protein is cross-reactivity, right? It's very difficult to have an antibody so specific that it binds only this one protein. So by using two antibodies, you are effectively increasing the specificity. They have these oligonucleotides that bind when they come into proximity, and then it extends, like a PCR assay, and then you can measure that using a DNA array. The beauty of this is that you can do it at scale: you can have DNA barcodes and you can really scale up. That's one of the problems with proteomics, I think; the challenge is to scale, to measure thousands and thousands of proteins at industry scale. So that's the technology, and people are still evaluating how accurate it is and what the false positive findings are. But so far it's been great, and a lot of findings replicate across cohorts and across technologies. That's what the published papers have shown. So this dataset is going to be exciting, and I think at ASHG this year there were already presentations of this dataset in at least 15 to 20 abstracts, in posters and talks. There was also a talk from Chris Whelan, who I think is the main lead of the analytical working group for the proteomics in the UK Biobank, and that was an interesting presentation a few weeks ago. So there are a lot of exciting use cases. One is mainly in drug target discovery. This is going to enhance our resolution of genetic associations, because every time we find an association between a common variant and a trait, or a rare variant and a trait, the natural question is: how is this variant affecting the protein? Right?
Is it increasing the protein or decreasing the protein? Because that is the fundamental information that we need in order to propose a hypothesis about how you're going to treat the disease, right? So it's going to be very helpful for that. Then there is looking at the associations of the proteins themselves. There are two kinds of genetics: reverse genetics and forward genetics. In the classical genetics that we do, we start with the phenotype, do a genetic association, identify an association, then go back to the genome and the locus and identify the gene. Reverse genetics is where you start with the gene: you know this gene is important and has a specific role in the disease, and you ask what the consequence of mutations in this gene is in the population. Using this proteomics, you can actually combine the two. For example, you first discover the gene, then go back to the gene and look at associations at a finer level, with more power, associations with the actual proteins. Then you can identify the important variants: which missense variants are more important in this protein, or pathogenic, and which are the loss-of-function variants. Then you go back to the population datasets, look at these associations, and increase the power, right? Because now you're essentially reducing the number of variants to the ones truly affecting the protein. So that's one exciting area. And predictions, we already talked about. We have a lot of computational ways to predict the pathogenicity or deleteriousness of variants, right? We have different ways to say whether missense variants are pathogenic. And I think this is an excellent dataset for creating another such prediction score, and by combining it with other scores, you can even improve the performance of the prediction. So you can say whether a variant is affecting the protein and in what ways it is affecting the protein.
So that will be very useful in clinical genetics as well as in population genetics. Imagine going to a database like gnomAD and looking at a genetic variant, and also getting the information on what the effect of this variant is on the protein level. Is it increasing or decreasing it, and what is the scale of this association? It's going to be amazing if you have this information in the future. And I think there are a lot of other uses, like Mendelian randomization, both for drug discovery and in epidemiological studies. There are a lot of interesting questions you can answer, because now you can have stronger genetic instruments. Protein is very close to the DNA compared to a phenotype, which is a very distal consequence. So essentially, by using the protein, you can have stronger instruments for doing this causal association. Patrick Short 1:28:21 I couldn't agree more. I think the UK Biobank is the gift that keeps on giving. I just want to say thank you. This has been an amazing conversation. If I can quickly summarise: we talked about tweet threads and ChatGPT, we talked about natural selection through the Black Plague and its effects on the modern day, and a number of milestone achievements, including the largest height GWAS ever, which I learned is quite a definitive moment in the field. We talked a lot about exomes, genomes and proteomics, and a step in the right direction, as you put it, in some large-scale, more representative populations, including Mexico, India, and Africa. And then I think we just covered what's in store for 2023. But I'd like to say thank you for taking the time out of your holiday. Unknown Speaker 1:29:06 I'm really sorry for running so long. I think this is probably one of the longest. Patrick Short 1:29:09 No, this is like the Joe Rogan length we've gone for. We haven't hit the three-hour mark. I think we're well below two hours, which is great.
I, like you, love to talk about this stuff, so I could be here all day, all night. So thank you, and happy holidays. Unknown Speaker 1:29:23 Thank you very much. Thank you very much. Good to see you. Bye. Transcribed by https://otter.ai