The Genetics Podcast episode 39: How hundreds of scientists from 50 nations are collaborating on Slack to study genetics and Covid-19 with Dr Andrea Ganna ------------------------------------------------------------------------------------------------------------------------- --------- Patrick Short: Welcome to the genetics podcast. I'm really excited to be here today with Dr. Andrea Ganna. Andrea is a scientist and group leader at the Finnish Institute for Molecular Medicine, and he's also a researcher affiliated with the Broad Institute of Harvard Medical School and Massachusetts General Hospital. So Andrea's work generally speaking focuses on large scale genomic data analysis in a number of different common and complex diseases. But most recently he's been leading a worldwide effort to understand the role of genetics in COVID-19. The project's called the COVID-19 Host Genetics Initiative, or COVID-19 HG. And it's really a first of its kind large scale collaboration between hundreds of researchers around the world that are pooling together data and summary statistics from their research studies to try to get a better understanding of the genetics of this novel disease. So Andrea, welcome to the podcast. And I wonder if you could just maybe start by taking us back to the moment that you and your colleagues decided to start this initiative and why you decided to put it together. --------- Dr Andrea Ganna: Hey, yes. Thank you, Patrick, for inviting me to this podcast. So, so briefly I think the initiative started back in March. We were - the pandemic was coming from China and was traveling through Europe - didn't hit United States yet. And it clearly immediately became clear that there was a variation in the severity of COVID-19 in the - in patients. And so, you know, genetics, it's important in many diseases and we thought that probably had a role, we didn't know how important was the role, but we knew there was some kind of role. 
And so we wanted to start to explore that aspect and we put out a - very simply - a website and the tweet that was the, you know, the effort we put in this initiative in the beginning. There were collaboration at that time, only from Italy - one group from it and one group from US and one group from Finland. And so we didn't know if that's going to be a success, or that's not going to work out. But, but it turns out to - that was well received by the community. And that's kind of also the power of social media I would say, and we probably started this initiative at the right time when, when the pandemic was growing, but it was not immediately obvious the role of genetics. And so right now we are around 100 groups and more than thousand researcher in this initiative. So, been a very interesting journey. --------- Patrick Short: What did those groups and researchers look like? Do you have any kind of high-level statistics of whether they're academic research groups, people from pharmaceutical companies, industry biobanks, what, what kind of groups has the initiative attracted? --------- Dr Andrea Ganna: Yeah, so it's mostly academic groups. And we have - we cover more than 50 nations, so it's a very diverse set of, of researcher. We have also some pharma collaborators and some, and some other industry collaborators. But I think one of the most interesting things for me is that we're not only collecting research that are already part of the human genetic network, where we know each other quite well, but we are picking up a lot of clinician researcher, and other researcher not-necessarily interested in host genetics, but they turn out to have samples and want to contribute in this initiative. And by doing that, we give them the opportunity also to participate more in this, you mentioned it - the discovery effort that otherwise we will be cut out by normal channels. So, so that's, that's one of the advantage of having such a diverse pool of researcher. 
--------- Patrick Short: Yeah. So how does it actually work for the researchers who are participating? If you could explain to someone who's never been a part of one of these big consortium genetics projects - so everybody's collecting data more or less independently, but then pooling the data or at least parts of the data to make them larger, and in some cases, much larger sample size than anyone would have individually. Could you maybe just walk through how that, how that works? --------- Dr Andrea Ganna: Yeah, sure. So the researcher, the first things you can do is to go on the website and register. And when you register, you can describe your study, describe which assays you've been doing on your study, describe what's your research question. This is kind of new compared to other consortia. In this way, we not only, you know, put out the name of the people that are part of the consortium, but we also provide a database of studies that are out there and we actually provide the opportunity of the different study to contact each other - we have a contact button on the website. So if two study find that they have similar patient population and similar research question they can actually connect to each other. So that's, that's a cool feature. And then, then after you register you basically have two ways. One is that you share individual level data, and that's probably the - it's mostly done by smaller groups, which do not have computational capabilities in house, or groups that want their sample to be genotyped. And so we provide the genotype service but also the [inaudible] service. And the second way is that you compute in-house analysis according to a certain analysis plan, and then you share the summary statistics and then we put them together. The second type of analysis is the one chosen by large bio bank that can not share individual level data. Currently we have mostly study doing the second approach. 
So sharing summary statistics, but once the researcher use this approach, both number one or two, what we do is that we centralize the analysis. We put together the data, we do some quick quality control, and then we put out the result immediately to the community. --------- Patrick Short: So how many groups have submitted some kind of reasonable number - or maybe just at least one - sample so far? Has it been a small number of groups that are doing the vast majority of the samples, or what does the distribution look like in terms of the people who are contributing? --------- Dr Andrea Ganna: Yeah, so we have more than 20 studies that have contributed out of 200 study that have registered. So it's around 10% right now. It's, it's actually mostly from Europe and from - we have also study from Brazil, Qatar, and Korea. And US has been surprisingly not very well represented, at least in this, in the beginning. We have some from, from Partners Biobank in Boston and BioMe New York. So that's, that's the distribution more or less of the studies. --------- Patrick Short: Does the geographical distribution make the, make the analysis hard? Obviously you want to have a representative sample, but that also sometimes makes the analysis challenging. I'm not at all an expert in this, but I've seen people talking about this association with blood type and whether blood type does affect or doesn't affect severity. And there's a lot of discussion around whether blood type is correlated with ancestry, well - it is and how that affects the analysis. Have, do you know if any one in your consortium has been able to kind of get to the bottom of this or at least get an understanding of, you know, whether that association is, is likely to be real or whether it's due to different sized groups from different parts of the world kind of being put into the same analysis? --------- Dr Andrea Ganna: Yeah. So that's a good question. 
And so the ABO association, it's, it's interesting because of this concept of population stratification, and ancestry diversity between cases and control. And I think, you know in the article that was published in the New England Journal of Medicine a few weeks ago, they point to that hit, and 23andMe also seems to support that, while in our [inaudible] we don't find evidence. It's, it's difficult to say if maybe it has to do with the severity of the disease, maybe has to do with population stratification. We are now actually trying to understand how the results look like without, because in our study, there is also this New England Journal of Medicine summary statistics. So we're trying to see how the result looks like removing that one, and if there is variation across the studies. In UK Biobank, the signal doesn't seem to be strong, and we know the UK Biobank is generally well controlled, in terms of population stratification, but it's also more a mild COVID-19 population compared to their study. So one challenge that we are facing is, is that the analysis that seems to be more powerful is when we compare hospitalized COVID-19 cases with population controls, but the population controls is normally something that you want already to get from existing genotype cohorts, rather than, you know, genotype new population controls, which no one's collecting in this pandemic crisis. Right? So, so when you, when you take these population controls and you compare to your cases, there you have challenges in matching, you know, ancestry and so on. So that's, that's maybe what can introduce these artefacts. So that's the status. I don't think we arrive, arrive to the bottom of the problem, but that's [inaudible] explanation. --------- Patrick Short: Yeah, that's interesting, and it probably will take some time to tease it out. 
So, for people who aren't as familiar with the project, I know you're looking at very severe hospitalized cases compared to healthy controls, but also probably compared to milder cases. I think one thing that you all have done is, is really carefully looked at all the different potential phenotypes or characteristics of the, of the disease that you could study. I wonder if you could kind of go through a high level - what are the different tests that you're doing or planning to do, whether it's severe, mild, asymptomatic. I know you're also kind of using some of the existing machine learning models, like the spectrum model, potentially to predict cases where there may not have been a positive test, but you could effectively kind of infer that there was a positive test. Is that right? --------- Dr Andrea Ganna: Yeah. Yeah. So, so I think the, you know, we have three degree of severity. We look at basically individuals with COVID-19 that have assisted ventilation, then we look more in general for COVID-19 cases that are hospitalized, and then we look at COVID-19 cases, no matter what, just reported generally. And then we compare these three group with two main group. One is population controls, so everyone else, and the other one is COVID-19 positive, but not hospitalised for the first two analyses. And we have different combination, and we have also one, maybe I can speak later about that, about the predicted COVID-19 from symptoms. Until now the one that seems to give us the strongest signal is when we compare COVID-19 hospitalized versus population controls, there, we have a very clear peak on chromosome three. When we actually compare any COVID-19 versus population control that peak on chromosome three goes down, despite the sample size almost duplicate. It's, it's a balance, I guess, and we still need - we need to, you know, decide something or type a priori to allow people to do the analysis, but then we can use this phenotype to refine - are we want to proceed? 
For example, everyone will bet that, you know, when we compare COVID-19 severe with assisted respiratory support versus COVID-19 mild or moderate or not severe, we see a strong signal there and you know, it, it might be, but, and we have done that analysis but the sample size there is not great. And we don't see any amazing signal. So it's a fine balance between sample size and how accurate you want your phenotype to be, and we're still learning, basically. --------- Patrick Short: How many people in the severe group have you included, approximately, in the latest analysis and, and I'm interested in how that number compares to the number worldwide? Where could you go in terms of increasing the size of the sample, just by getting more research groups around the world (who are presumably already collecting a lot of this data) involved? Do you have any idea on the rough numbers? --------- Dr Andrea Ganna: So for the COVID-19 hospitalized, we have 3,200 cases; for the COVID-19 -all- we have almost 7,000 cases, and controls doesn't really matter, we have like a million or something like that. --------- Patrick Short: Yes. --------- Dr Andrea Ganna: It's easy to get, to get population controls. Where are we going? I think, you know, I think probably it's easier in the next round to reach the 10,000 cases in a month or so - a lot will depend on investment in, in, in genotype, by granting agency, especially in US which [inaudible] by the pandemic. I think in Europe there, you know, some large project you're probably aware on Genomics England to do a genome sequencing on 10,000. So probably with Europe, we cannot, you know, order 10, 20,000, but, but, but the big number, if they need to come, they will need to come from US. And there it depends a lot on what's the willingness of the granting agency to, to support that. 
And until now there've been, you know, the NIH has been very supportive or actually in their notice of interest they have highlighted that study that some grant on COVID-19 host genetics, they need to deposit the data in these initiatives. So we have a potential to grow both from the genotype side and now we're trying to gear up also from the sequencing side and see what, where are we going? But, but it's tricky I mean more we grow, more other challenges come into play. --------- Patrick Short: Like, like what, from an analysis perspective or just simply coordinating so many groups and sample? --------- Dr Andrea Ganna: I think from the analysis perspective you know, like this chromosome three signal is very clear and clean, you know, then they will more you - larger sample size you will not notice that people that are hospitalized for COVID-19 are not the general population. They have some characteristics that are sociodemographic that are different and that's not necessarily driven by disease maybe by comorbidity with COVID-19. And so what you pick up is really related to COVID-19 or related to other co-morbidities or, or sociodemographic information. So I think we see it in UK Biobank with some of these GWASs on behavioral traits. And so it's... yeah, but, but I mean, I think there are other values clearly to, to create predictors, for example, using [inaudible] score that would just benefit from sample size. --------- Patrick Short: Are you expecting to find - are you expecting this shift from genotyping to next generation sequencing? So Genomics England you mentioned earlier who, who we had on the podcast a couple of weeks ago is planning to do whole genomes. There are a number of people in the consortium, I think that are doing genomes or exomes, are you expecting there to be a huge difference stepping from genotypes to next-generation sequencing or, or not necessarily? 
Do you have any kind of predictions or thoughts based on the data you've seen so far? --------- Dr Andrea Ganna: Well, I mean, the, the idea is that exome and genome sequencing is valuable when we look at more rare form of the disease. So if you take young individuals without morbidity that might have a value, if you take the general population maybe, but probably on bigger sample size than genotyping, which would be hard to reach, so - difficult to say, but there is value and anyway it's an interesting exercise to be able to bring together all these genetic data across different platform and bring them together. --------- Patrick Short: Absolutely. I wonder if we could just talk for a minute about this association on chromosome three. I know it's kind of hot off the presses. It's only been really validated over the last week or two, but I wonder if you could just discuss what we know today about it and any potential association with the disease and then where we have to go next in terms of fine mapping and, and understanding what's, what's actually going on there. I think for people who aren't as familiar with genome-wide association studies, as you are, they probably don't realize all the work that has to happen after you find that initial peak, and that there's not always a single gene under there that you can tell exactly what's going on and and kind of understand the whole picture. --------- Dr Andrea Ganna: Yeah, so we have this chromosome, this peak on chromosome three, and I think it's right now quite strong. We see particularly strong when we look at, when we look at severe hospitalized cases versus control. And when we actually look - and the effect size, it's not too far away from the one reported on the New England Journal paper. So I think it's an odds ratio 1.5 to 1.7. So it's not bad. 
It's comparable to other, you know, top signal we got from, from complex diseases, but it's in the range of, you know, pleasant signal I would say - it's not this odds ratio 1.01 or something like that. So there's definitely something there. Now if all in a region of [inaudible] and in that region, there are several genes that might seem interesting. I think in terms of distance, the closest is these [inaudible] LZTFL1 and the other one is CXCR6. And then there is a couple of CCR3, CCR1, CCR9. So all these genes, you know, you might know the CCR5 in HIV, they have to do something with infection. That's a little bit beyond me exactly what's the function. On the paper people seem to speculate also on this SLC6A20, which interacts somehow with ACE2, which is clearly involved in - --------- Patrick Short: A receptor, yeah. --------- Dr Andrea Ganna: Yeah. So, so in terms of eQTL signals, nothing is strong supporter of any of these genes, I would say. In terms of association of this variant with other traits, it's been associated with monocytes and macrophage inflammatory protein. So it seems to fit probably the inflammation [inaudible] or something like that. But yeah, that's all, I think people are still trying to figure that out. --------- Patrick Short: Yeah. So if I could maybe recap, make sure I get it right. The part you mentioned about the odds ratio is, it's basically saying that people who carry one particular genetic variant at that location are 50 to 70% more likely to present with some kind of severe respiratory failure. So we have this link between some gene in that area and an increased likelihood of respiratory failure, but the challenge right now is there's a number of different genes in that general vicinity, and there's a lot of work to determine which one is actually kind of the, the cause rather than just kind of along for the ride from a genetic sort of statistical sense, is that right? 
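A short worked note on the recap above: an odds ratio of 1.5 to 1.7 multiplies the odds of the outcome, which matches "50 to 70% more likely" only when the outcome is rare among non-carriers. The sketch below is an illustration under stated assumptions, not part of the consortium's analysis. It uses the standard conversion RR = OR / (1 - p0 + p0 * OR), where p0 is the outcome risk in non-carriers. The function name and the baseline risks (1%, 10%, 30%) are hypothetical values chosen only to show the effect.

```python
def odds_ratio_to_risk_ratio(odds_ratio: float, baseline_risk: float) -> float:
    """Convert an odds ratio to a risk ratio.

    baseline_risk is the probability of the outcome among non-carriers
    (a hypothetical input here; the interview does not give one).
    """
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    # Derived from odds = p / (1 - p): carrier risk is
    # OR * p0 / (1 - p0 + OR * p0); dividing by p0 gives the risk ratio.
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)


# Odds ratios quoted in the interview, paired with hypothetical baseline risks.
for odds_ratio in (1.5, 1.7):
    for baseline_risk in (0.01, 0.10, 0.30):
        risk_ratio = odds_ratio_to_risk_ratio(odds_ratio, baseline_risk)
        print(f"OR={odds_ratio} baseline={baseline_risk:.2f} RR={risk_ratio:.3f}")
```

With a rare outcome the risk ratio stays close to the odds ratio, so the "50 to 70%" reading is a fair approximation. As the baseline risk grows, the increase in risk is smaller than the odds ratio suggests.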
--------- Dr Andrea Ganna: That's correct. Yes. --------- Patrick Short: So in terms of how this data and these findings can be useful going forward, what are you most excited about? So there, there may be additional hits as well besides this one, are you most excited about how this data's going to be used to help understand the fundamental biology of the disease, and therefore help to develop better treatments? Vaccines? Are you most excited about trying to do better personalized risk stratification, identifying people who are, you know, maybe fall in this high risk group from a genetic perspective, but wouldn't seem high risk otherwise, basically other things, you know, which, which of the above do you think are going to be the biggest impact or the most exciting parts of this? --------- Dr Andrea Ganna: I think if there is any impact, it's probably going to be on a more on the biology side. I don't think that the prediction side is super relevant because we're not really a disease that needs you know, long-term primary prevention or anything like that. And probably if there is a vaccination there will be other demographic that are more valuable than genetic to decide who to vaccinate if, if we will not vaccinate everyone. And once you are admitted in the hospital, I can imagine there are other biomarkers that are probably more valuable to predict if you will end up in an ICU. So I don't see that to be particularly valuable. In terms of biology might be valuable, but I'm not super expert into, into that. But in general, I think genetic is - it's a tool that it's part of the toolbox. And what I like of genetic is that almost every study has genetic. So if you want to understand if, you know, these, these COVID-19 is causal - or those biomarkers are related - causally related with COVID-19 or if two biomarker are somehow related to each other in COVID-19 space you don't need to have to have this measurable from the same study. 
You don't have to have measure of COVID-19 and the biomarker, you can use the genetic to link different - all mixed across different studies and maybe so to draw some causal inference out of that. So I think that, you know, what we are doing is more a service - providing, you know, a service that you know can be used to answer multiple research question. I don't think there is one primary one. --------- Patrick Short: Can you give an example of that linking either in this context or others where genetics can kind of be a bridge between maybe two, two different studies or two different types of measurement? --------- Dr Andrea Ganna: One example in the, in the causal inference space I think I saw a recent publication where if you do Mendelian randomization with the summary statistics on smoking, it looks that smoking is causally related with worse COVID-19 outcomes. While the epidemiological association actually said the opposite, that smoking is protective. So that's a nice example of how you can use genetics to draw a causal inference. --------- Patrick Short: So what does the genetics reveal in that case to - can you look for the link between genetics and... cause we know that genetics influences whether somebody is more or less likely to smoke or at least become addicted because of the differences [inaudible] --------- Dr Andrea Ganna: Yeah the GWAS of smoking, the GWAS of COVID-19 doesn't need to be measured on the same study and that you can use genetics to basically link the two. And you can do that with other biomarker as well. Let's say, I, I don't know to which extent you can push that, but you know, you say you have a cohort where you have genetically characterized a certain protein or a certain biomarker, independently if people have developed COVID-19 or not. And then you have your GWAS result for COVID-19. 
Then you can use Mendelian randomization approaches to link the two without having them measured on the same studies, just an example, but you can do genetic correlation, which is basically something we have seen until now in the epidemiological space, right? Most of the epidemiology that came out is, well, people with COVID-19 have more neurological disorder, people with COVID-19 have more of this and more of that. You can do that from, you know, if you have a strong enough signal in the, into genetic of COVID-19, you can have genetic correlation, and 99% of the time will give you exactly the same results that doing epidemiological correlation. --------- Patrick Short: Right. Interesting. Have you, have you had any surprises from the perspective of organizing this collaboration when you started off? I think you mentioned it was a website and a tweet and it kind of unfolded to where it is today with more than a thousand researchers in a Slack group. Have there been any either positive or negative surprises in terms of organizing and building consensus around how you do this kind of analysis with a group of that size has - it seems like at least from the outside, it's been a very successful kind of sociological experiment. And as far as large scale science goes, have you felt that way as well? --------- Dr Andrea Ganna: Yeah, I think we didn't have any major problem. I think people sometimes find the idea that we are not pursuing a publication directly a little bit unsettling. We know that in science, there needs to be some clear rules. Like, you know, I'm doing this because I get my co-authorship on a paper. And so, you know, when you say well you should contribute the data, because that's good, and then you get acknowledged on the website, but you know, we're not bringing paper right now, so it's not going to be your name on any paper. 
You know, some people have to cope with that, but I, and you know, the other challenge is what if I use your summary statistics, add, you know, 2000 sample and then publish a paper - you know, what, what does it mean? Can I do that? Theoretically you can do - why not? But where does the rules apply when you have to collaborate and you can pursue your own? So there are some challenges with this new model, I think where we make, and now we are doing in-silico analysis where we basically say run genetic correlation, run fine-mapping, and make - put them on the website immediately available. So how long we can push this, you know open collaboration where everything is shared on the website, and how much this clash against traditional publishing strategy - that's the fine balance. --------- Patrick Short: How did you decide to go about it this way? Was it given the, you know, given the enormity of the situation and the essential nature of having data out, as soon as possible, you felt like there was no other way to do it or, or have, you know, I think that you're probably a pretty big proponent of open science and getting the information out there as soon as possible, no matter what it is you're working on. So is this a, you know, an opportunity to try something radically different that also kind of was demanded by the times, because it is something that's worldwide and fast paced, and we don't have the time to wait for, you know, fights over authorship and those sorts of things, or how did it come about? --------- Dr Andrea Ganna: Yeah, I think there are two reasons. One is that we want people to share their data. And if you start to say, you know, we are gonna do, you know, this paper and this paper, then more people maybe might be persuaded to then do their own work before they share. So that was a reason but it didn't seem really to affect that decision too much. But the second one is that the situation is moving. 
I don't know if there's really a good point - it might be in a year or so where you say, well, we are there and now it's time to close it, but it's constantly evolving, and we need more sample - it's clear. I mean, like, you know, I think the last estimate of heritability I've seen is not significant. So we are not there yet I think, to do any, any conclusion or something that deserves to be put out there. I mean, some people will do it with a lot of, you know, bioRxiv or even a lot of publication all along the way, but we decide that I don't think it's a wise investment. That's nothing different than putting it on the website really. --------- Patrick Short: Do you have a - personally speaking - a kind of long-term view on what this is likely to evolve into? My, I can kind of give my personal view that I think it's, it's really, I, my understanding is it's really challenging to develop a vaccine on the kind of timescales that we're talking about. So it's likely to be 12 to 18 months before a vaccine is, is available. And I think the other layer to this is that there's not to date, been a successful coronavirus vaccine. So there is a question of, you know, whether it can be done or not. I think, I believe in human ingenuity, especially given the number of smart people that are working on this, that it will get solved, but I'm interested in - from your perspective, whether, especially living in a place where from an epidemiological perspective, from a public health perspective in Finland, you've gotten it under control relatively reasonably, but there are other parts of the world where it looks like it may just be that the virus is running its course for the foreseeable future until a vaccine is developed. Do you kind of think about how your work fits into the longer term likely scenarios, or are you just kind of trying to do the best science that you can now? And, and not worry too much about what's going to happen in two years, three years, et cetera. 
--------- Dr Andrea Ganna: Yeah, I wish I could say that was the first case, but I'm, I'm probably more likely the second option. I think, it's hard to imagine how these results can play in the global effort to find a vaccine, or how they can have that type of impact. I think, you know, I imagine this one as providing a solid service to the scientific community and, and trying to do that as most efficiently and you know, and openly as possible. And then it's, it's about others, I think, to, to use these results in a, in a good way. I don't think I can foresee an immediate impact on the global scale. --------- Patrick Short: Makes sense. How, how different have the responses of different consortium members been in your experience from... - because it seems like different - if we take biobanks, for example, different biobanks have very different setups in terms of how easy it is to re-contact participants in the biobank to collect longitudinal data. This UK biobank has started to run a very large scale study looking at whether the half a million people in the biobank have had a COVID-19 test, but not, not all biobanks are set up in the same way. Have you, have you seen any kind of models that have been particularly successful or other models that have been a little more slow moving, in responding to something new like this? --------- Dr Andrea Ganna: Yeah, I think the Dutch biobanks that do- or Dutch study have been doing an amazing job. I think Lifelines was already sending out [inaudible]. I mean, we got, they were one of the first to contribute and they - out of self reported COVID-19 cases - and they could prepare a questionnaire and send it out in a matter of weeks. So in the end that's happened also with other Dutch cohorts. So that was a nice example in how things can be, but I think they have a very good relationship with their participants as well. The US biobanks, they have been - I don't think they are set up for that. 
Maybe only [inaudible] to recontact participants and, and things like that., and in general, they have been very slow in obtaining permission and there is a - [inaudible] here link with electronic health records, my understanding is that there's been a lot of chart - manual chart review - to actually extract that. In Finland here we have you know, there was also a very fast [inaudible] registry and with, with FinnGen it was very fast and now there are some study that they are recontacting participants. --------- Patrick Short: Right, yeah. It's interesting how the - because there are a lot of different ways you can do it, right? You can ask people directly, but it may not be as reliable. You can link to healthcare records, but if they're, if they're not well organized or if they're not in electronic readable format, then that also doesn't work. So I suspect, I don't know how much of this will stick and how much of it will, how much of it will kind of go back to old habits. But I have, I guess I'm optimistic that we might be able to develop slightly more responsive models of doing research, like Lifelines that you described seems like a perfect example, that if you can recontact participants and spin up a new study, you know, it shouldn't have to be a global pandemic to kind of respond in an agile format to answer new questions, or try to collect a new form of data that might help answer a question. Is that - I feel like you're probably thinking the same way at FIMM as well? --------- Dr Andrea Ganna: Yeah. So this is what we are working in [inaudible] very hard to create a recontacting platform - that's clearly the future. And, you know, the future will, will also be country-specific - you know, there is a lot of distrust in some countries towards the government and towards some of these research studies. 
So we hope in Finland where this is relatively low - there's a low distrust, and a high trust - that the response is going to be good, but you know, probably the most vulnerable population are the less likely ones, which is gonna further bias. --------- Patrick Short: I actually wanted to ask about that. I saw in some of the tweets that went out the other day, that you have quite a wide international representation in COVID-19 HG. So I think some, some of the Qatar genome project is submitting samples, you've got people all across Europe. Has it been part of the strategy deliberately to make sure that you have representation from across the globe? And, and I mean, like you mentioned, I think it's, if it's not managed actively, then it's really difficult to be representative. Is that part of one of the programs of work that you all have going is to kind of reach out to places where, you know, aren't, they're not fully represented in the, in the study at the moment and make sure that you have good coverage? --------- Dr Andrea Ganna: Yeah. More than reach out we kind of made very clear on every call and as much as possible that we pay for genotyping, if you need it. And so if you are in Luxembourg and you contact me because you need to genotype and I'm like, mm, maybe you can afford it. But you know, if you are a researcher from Egypt, Kenya and you reach out and you know that that's the case - I think almost every researcher that I, that, you know, really need to genotype got support from us or from Erasmus, or from Illumina or - so, so this free genotyping and free support also for DNA extraction, I think makes a great difference in those countries. --------- Patrick Short: Yeah, that's great. And I think it, it just makes the results much more applicable, right? If you can, if you can be sure that it's not something that's isolated to one specific area of the world, but it's something that can be generalized, then it'll ultimately strengthen the science.
But before we close out here, I was wondering if you're managing to find time for any of your other work. Are you focused a hundred percent right now on COVID-19 HG and related work? Or are you also continuing some of your other common complex disease and trait work in the odd hours? --------- Dr Andrea Ganna: No I'm not, I'm not a hundred percent. I mean, probably like 20% on COVID-19 HG - it's, it's surprisingly light. So I always set it up as a relatively light organization. You know, mostly it's, I mean, there's now some more, more work for the [inaudible] and everything, but it's kind of, you know, automatic in a way - people submit summary statistics, fill an online form, everything gets automatically reported on the website. So it's a pretty light organization. Yeah. I mean, I'm continuing my work. Is there anything particularly that you're interested in? --------- Patrick Short: Nothing in particular, I'd be interested to hear, what else is it that you're working on? If - don't have to give anything secret away if you have something that's unpublished or exciting, but I'm, I'm curious to hear. --------- Dr Andrea Ganna: Well I mean, I really like to put data together and kind of collectionist of data. So I, I'm very excited about this FinRegistry project we've been pushing for a couple of years now, where we are putting together the, all the registries or at least all the largest nationwide registries in Finland into one place. And then we have a similar project in Sweden as well. So I think it's, you know, the number of questions you can ask here is really unlimited. One example is [inaudible], she's working on ongoing selection.
So lifetime reproductive success - how many children you have and so on only from an epidemiological perspective and, you know, the studies done until now maybe are like, you know, on a couple of diseases - we can do it on 2000 diseases, across 7 million people from Sweden and Finland, which have been followed for their entire reproductive life, because they were born between '56 and '77, and we can see how, you know, having any disease impacts the number of children - to the childless - we see if your brother and sister has more children or not. So we can look at balancing selection, sex-differential selection, by just using family design. And then, you know, when we will have more exome sequencing data, we can also then look at constrained genes and selection [inaudible] and see if the epidemiological observation fits the genetic observation. But there's so many things. I mean, I'm coming from a lot of background in genetics, but there are so many things you can do just from, you know, registry data and epidemiology which I find very exciting right now which I'm trying to focus on. --------- Patrick Short: Yeah. So, so it sounds like you're, you may be shifting ever so slightly into wider and wider variety of data that you can analyze, right? Because one of the, one of the exciting parts, but also challenges of registry data is, is you then kind of end up with sometimes a data overload, right? Where you've got all of a sudden records from, I don't know how many millions of people, but going hundreds of years. And some of it's structured, some of it's unstructured and, you know, that's not even kind of taking into account genetic data you might generate. So you're, are you excited by kind of analyzing these multi-variate types of data and making sense of it on a kind of population scale? Is that, is that an accurate summary? --------- Dr Andrea Ganna: Yeah, I think so.
And, and combining genetics, so not just treating genetics alone or like, you know, there's a lot of people that like look at genetics and genetic explain [inaudible], but genetics is just one other piece of the puzzle, I would say. And, you know, and it's nice to look at that in the context of all these epidemiological data that we have. [inaudible.] --------- Patrick Short: Great. Well, maybe just to close off here, I'm, I'm conscious of time. I was wondering if you could just take a second to think about what you think is the most exciting unsolved problem in, in genetics or medicine right now, whether it's something you're working on or others are working on, what are you most, what are you most excited about or you think will have the biggest impact? It can be specific, or it can be kind of more general if you think, but I'm interested to hear what, what you're most excited about. --------- Dr Andrea Ganna: Yeah, that's a, that's a good question. I think personally, I think that we have been unbelievably successful in, in genetics of complex traits, especially of behavioral and cognitive traits. We know very little about the biology of cognition. I mean, we know how [inaudible] works and we know how, you know, fMRI scans, they show some areas of the brain lighting up. But with genetics, we can really now take tails of cognitive polygenic score or, or risk-taking polygenic score or other, you know, behavioral traits. And there is a huge variation in the population, and look at those and try to understand the biology - maybe doing iPS work or, or functional follow-up of that. So I think bridging that, that behavioral genetic results with more biologists is I think something that interests me [inaudible] but yeah, that, there is a lot of other things in terms of diseases. But I think this is where, you know, there's more fascinating things to be found. --------- Patrick Short: Yeah.
Well, I think that the example you gave, I mean, it's such an incredibly complex area, right? How something as fundamental as genetics plays a role in behaviors as complex as, you know, risk-taking, or, you know, personality traits, those kinds of things. It's, there's an amazing and complex story to unravel between those. Well great, thanks. I really appreciate it. I think if people want to keep track of you, they can follow you, I know, on Twitter - you're @andganna, A N D G A N N A, and the COVID-19 Host Genetics Initiative is covid19hg.org. Is there anything else that you'd, you'd like to add? Are you hiring, are you looking for collaborators? Anything you'd like to shout out? --------- Dr Andrea Ganna: Yeah - we are hiring. Direct message me if you're interested in any of the things I spoke about and for the COVID-19 Host Genetics Initiative, we're certainly looking for more samples and more people getting involved so please reach out, and yeah - thank you, Patrick, for inviting me. --------- Patrick Short: Yeah, wonderful, thanks for being a part of it.