Patrick Short 0:03 Hi, everyone, welcome to the Genetics Podcast. I'm really excited to be here today with Dr. Jessica Kissinger, who is Distinguished Research Professor at the Department of Genetics at the University of Georgia. We are going to talk about humans today, but through the lens of parasites. Dr. Kissinger has worked for really the majority of her career across parasite genomics and evolutionary biology, and has built and led some of the leading databases in the space that help us to understand parasite biology. So Dr. Kissinger, thanks so much, first of all, for taking the time, and welcome to the podcast.
Dr. Jessica Kissinger 0:34 Thanks, Patrick. I'm glad to be here.
Patrick Short 0:36 I'd love to just start at the start: what drew you to evolutionary biology and got you into this field in the first place?
Dr. Jessica Kissinger 0:42 Oh, well, for evolutionary biology, I guess it would have to be the Leakeys and anthropology, sort of growing up with all the discoveries of hominid fossils at the Olduvai Gorge and thinking that all of that was majorly cool. And then I guess it was just being in the right place at the right time. I happened to be working in doctors' offices, doing insurance billing, and I used to get the CDC Morbidity and Mortality Weekly Report. And I found that to be fascinating, really: who was dying of what, and where. And, dating myself here, this was at a time when there was one of the first major Ebola outbreaks that was very widely televised, Legionnaires' disease was discovered, and the AIDS crisis was peaking. And, you know, this was a time when we were coming off of second- and third-generation antibiotics and the thought that infectious diseases were just going to be wiped out, with a lot of focus on cancer instead. And so this really rattled the whole medical world, the idea that things could change.
I mean, here we are coming off the COVID pandemic, and we take it for granted as we watch how flu and the COVID virus are changing. But to think about infectious diseases as evolving, back in the '80s, was not a thing. And I don't know, something just clicked in my head, and I was just like, oh, this is neat, there's going to be this continual, for lack of a better word, arms race. And I wanted to see it, and more importantly, I wanted to see it from the pathogen side. I hadn't picked parasites at that time; it was just pathogens writ large.
Patrick Short 2:25 So that's great. I think that's a really great framing of this. I'd love to hear, maybe through the lens of a couple of examples for people who aren't as familiar with parasite evolution and how different that is from human evolution: what is the problem that we're facing? And what are the tools that you and others in the field have built over the last couple of decades to go from this understanding that parasites evolve, and do so rapidly in this arms race, to where we are today?
Dr. Jessica Kissinger 2:51 Yeah, so in parasites, and I'll just add a little biological clarification here: when I say parasites, I'm talking about eukaryotic pathogens. So for the listeners, you might want to think about pathogens like Plasmodium, which causes malaria, or Toxoplasma, which causes toxoplasmosis; if you've ever been pregnant, or been around pregnant women, that's the "don't change the cat litter" one. That would be a eukaryotic pathogen. Cryptosporidium causes a lot of waterborne outbreaks. So they're eukaryotes, like us. And while we're very used to hearing about resistance to drugs in bacteria, well, resistance to drugs in these pathogens occurs as well, and the need to have drugs to cure these pathogens is also highly important. But they're a little bit more challenging at times to develop drugs for, because they are eukaryotes; in their biology they are like us, they're not bacterial or viral.
And so you have to worry a lot about off-target effects actually affecting us, the host, as well. So, in the look at parasite problems, and in particular resistance: the first drug that probably most people have heard of that was used widely to treat malaria was chloroquine. It was highly effective, very cheap, and used broadly. And resistance developed, likely more than once, and quickly spread around the globe, creating crises over how you can treat this very important disease that infects so many. And so actually, among the initial studies that were done, first was an incredibly important genetic cross, a historic cross made at the National Institutes of Health in Tom Wellems's group. Actually carrying out the life cycle involved mosquito bites of non-human primates to get all the way through the life cycle, to do a cross and get some parasite progeny, which were only in the dozens, to try and map the location of the resistance gene. This was, you know, I don't have the exact dates, but something like a 20-year undertaking, which then actually required, as history marched on, having a genome sequence to be able to home in on the locus, the area of the genome, to try and identify the gene that's responsible. Now, thinking about the evolution question that you asked, the perspective on this is that the importance of understanding populations, and diversity in populations, has become huge. And so it's become very important to go out and sample and survey populations of the parasites. And I'll give the analogy to COVID, right? We're looking for variants and genetic changes that are popping up globally. And so again, if there are areas in the world where you start seeing resistance to some of the later drugs that were developed, you know, what is it? What are the polymorphisms associated with this?
And so then you start looking for polymorphisms in these parasite genomes to begin to understand the origins, or possible variants that can then be taken into the lab and studied further to see if they're associated with this resistance phenomenon.
Patrick Short 5:59 And how advanced is this surveillance technology, for lack of a better word, for the parasite community compared to what we're used to with COVID now? And I'd also be interested in comparing that to pre-COVID, because obviously it's not quite a fair comparison, we've had a global pandemic. But what does the viral surveillance toolset look like compared to the parasite one in your world?
Dr. Jessica Kissinger 6:22 Well, there are definitely far greater resources and effort going into the COVID surveillance. That said, very talented researchers in the community have been able to develop markers that allow a sort of barcoding, think of it as a multilocus typing strategy, for a number of the species that infect humans. Although, you know, even for some of the species that infect humans, we didn't have genome sequences until a few years ago. So we are very far behind. The genome is much larger than a viral genome or a bacterial genome, though still not huge: about a 25 million base pair genome in the case of Plasmodium, 60 million base pairs in the case of Toxoplasma. Because of the importance of malaria, there is a larger genetic toolset available there, for population studies, gene expression studies, genetic systems, and some really cool new research that can actually let you do crosses now in mice that have humanised livers. And so that part's really cool. For other pathogens, the toolset is smaller. Toxoplasma is actually arguably one of the most successful parasites on the planet. Depending on the country you live in, 30 to 70% of the population is infected with it.
And if you're immunocompetent, or you're not pregnant, for the most part it goes to your brain, makes a little cyst, and there it sits, and you never know about it for the rest of your life. But on the genetic testing side, we are still missing global diversity. There are a few labs working on it; in terms of having a really nice typing system, many people are working on it. But it's not something that's routinely done. In terms of outbreaks, it's so ubiquitous that that doesn't happen. I'm deeply involved in a project right now to develop a set of multilocus typing markers for Cryptosporidium, and this is a waterborne pathogen. The pathogen is so hard to isolate or separate from its host that we're using a technique called hybrid capture, where you can take total faecal DNA, which includes everything you ate, your cells and the pathogen, and you can hybridise and pull the pathogen genome sequences out, enrich them and then sequence. So we're currently doing a global whole-genome survey of this to try and find the most variable sites from as many samples as we can get. And then from that, we will develop a smaller set of multilocus typing markers that can be used more broadly. But yeah, the parasites are lagging behind. I'll just add a factoid, because maybe many people have not heard of Cryptosporidium. The Bill and Melinda Gates Foundation did an amazing study, now close to 15 years ago, looking at the major causes of diarrhoea globally, especially in infants and young children. Cryptosporidium came in at number two, right behind rotavirus, to everyone's surprise. It is a much more prominent and significant pathogen for health than anyone knew. And so we're playing catch-up. We still can't culture the pathogen. It's propagated in immunosuppressed mice or cows. And so it's challenging.
Patrick Short 9:30 Maybe you could also just give us a general view of the number and variety of different pathogens that you study, because you have, I think, a unique position of running some of the largest collaborative databases in the world that look at genome sequences from a wide variety of different parts of the tree of life. So maybe you could just give us an overview: what are we working with, and are there different parts of the world where certain pathogens are bigger problems? And then maybe we can dive into one or two of the examples.
Dr. Jessica Kissinger 10:00 Sure. Well, first I want to back up. When we think about diversity, I think we're very opisthokont-biased. So we look at a human and a fungus, or a frog and a chicken and a fish, and think, oh, that's really different. But, you know, even down to the amoebae, we're on one branch of the eukaryotic tree. And then plants, trees, they're on another major branch. But there are, depending on how you slice and dice it, at least a dozen other major lineages of eukaryotes, the majority of which are protists, and many of which are free living. You find them in the oceans, out there swimming around; algae would be a typical example, or Paramecium, the ciliates. But a lot of them are pathogens: you find Trichomonas vaginalis; Giardia, an intestinal parasite; the trypanosomes, which are in a kinetoplastid lineage related to the euglenids and cause African sleeping sickness and, in South America, Chagas disease; those are vector transmitted. You can have schistosomiasis; in this case it's actually not a protist, it's a worm, but it is included in these resources that we'll discuss in a minute. And then there's a whole group, an entire phylum actually, called Apicomplexa. And all of the protists in this phylum, including Cryptosporidium, Toxoplasma and Plasmodium, which we've discussed so far, are pathogens of humans and animals.
I know I'm definitely leaving some out: the microsporidia. They were thought to be protists; it turns out now, with sequencing, that they're fungi. So all over the tree of life, parasitism and the ability to cause harm have arisen, you know, the ability to take up a lifestyle living in or on someone else. It's not unique to protists, but there are a lot of them. So, moving to the second part of your question, I've been very fortunate to be involved in an undertaking that is nothing that I ever said I would grow up and do. And that is being heavily involved in databases for pathogens. And they have an interesting story. But I think the story is important because it represents the philosophy of these resources, and of us as scientists. To make a long story short, it goes back to the discovery of a plastid in apicomplexan organisms, which should not have been there. It appears to have arisen, we now know, by secondary endosymbiosis of a red alga. And many genes that were present in that red alga, as happens in plants and in any other endosymbiotic event involving mitochondria or plastids, were transferred to the nuclear genome. And so you want to go searching in the nuclear genome to see what genes are there and targeted back to the organelle. And I say that because one of the first projects that I worked on, in the laboratory of David Roos at the University of Pennsylvania, was to clone and sequence this apicoplast genome and annotate it. And I discovered, you know, you finish your first genome, and this is in the '90s, and it was kind of done manually. You're real proud. And it was the most boring genome ever. Now, as a geneticist, it's hard to say a genome is boring, but it was really boring. It just had tRNAs, ribosomal genes, an RNA polymerase, a Clp protease and two unknown ORFs. That was it, which meant everything interesting had to have been transferred to the nuclear genome. Well, then you can think quite logically of strategies.
And already, independently, two labs had found at least one or two candidate genes that had an extension. So it's kind of like a ticketing system: you have to have a signal peptide, and then a transit peptide that gets the protein targeted to the right organelle within the cell, where it can carry out its function. And then the Clp protease that was present in here would cut that off, and you'd have a mature protein that can do its job. So we wanted to go looking for those. But these were very early genome days, right? The human genome had just come out. What was it? Was it 2000, '99? Somewhere in there, right. So the malaria genome was a very important genome. I think it cost $35 million in total, over many years. It was split up the old-fashioned way, by chromosome, across different sequencing centres around the world, much as the human genome project originally started. And the data were sitting on FTP sites scattered around the world. So here's when you reach a point in your scientific career when you need to learn to do something that you don't know how to do, and I had to learn to programme. I went to Cold Spring Harbor and took an amazing class with Lincoln Stein, kind of for biologists, by biologists, on how to programme in Perl. Now everything's in Python or Java,
Patrick Short 14:44 but in any case, that's what I learned in, actually: Perl.
Dr. Jessica Kissinger 14:48 And so, you know, it's powerful: you can make some websites, you can make some scripts. And I and another postdoc in the lab, who had better programming skills than I, did a number of very simplistic things. We just BLASTed all of these sequences, you know, a BLASTX search, and made an index of all the hits that we found and their start sites. We were looking for proteins that had an N-terminal extension, and we were looking for things that had homology to plants.
And so we asked whether the best BLAST hit was to a plant, all right? This is a sequence similarity search, for those who aren't familiar with BLAST. And so we just made this cute little website where you could go in and do keyword searches and look for N-terminal extensions. And this was building on prior work that had done a lot of things with these ancient things called ESTs, expressed sequence tags; now people would think of it as RNA-seq. So there were data available: there were ESTs in Toxo, but there was a genome for Plasmodium, still more than 100 million years separated, but the closest organism we had. So we started mining it, and it was really powerful. We were able to discover a lot of putative targets. I was able to work with a computer science student and a faculty member at Penn to develop a neural net that could identify these things and tell you which proteins were likely targeted back to the organelle. So it was a very exciting time, considering it was like the early 2000s at this point, or 1999. And long story short, you know, we present our findings at a meeting, all happy about the evolutionary story and being able to figure out the metabolism that was being carried out in this plastid. And then the malaria researchers are like, wait, wait, wait, you can mine the Plasmodium genome? There are tools to do this? And hence the database that is now known as PlasmoDB was born. And I don't want to go through all the history of it, but I raise it, as I said in my introduction, as a scientist, because it was a tool developed to meet a scientific need. And the tools that were developed were designed with answering a question in mind, and they were designed for mining data: not downloading it, not browsing it, but mining it. And this was a philosophy. So the databases have continued to grow. When I started my independent career here at UGA, I built databases for Toxoplasma.
And I started ones for Cryptosporidium and for Trypanosoma cruzi. And then the Wellcome Trust, NIH, a lot of funders got involved, and many communities wanted in. And right around, what was it, the early 2000s, the anthrax scare happened. And NIH realised that the need to have centralised information on pathogens was really important. And they created a programme called the Bioinformatics Resource Centers to handle bacterial, viral and eukaryotic pathogens, to collect all of this data. And so we put in a proposal and we won. We, the collective: it was a very large group. The lead institution was Pennsylvania, I was at Georgia, and it's grown to many other institutions since. And we now have 700-and-some genome sequences for nearly 200 different organisms. But to say genome sequences is an understatement. We have an additional 2,500-ish data sets: RNA-seq, proteomics, metabolomics, even some human immune response data, with the idea, again, that the data should be mined. So they're integrated, truly integrated. And they're processed through the same pipelines such that they're harmonised, so that researchers from any of the fields where the data are available can now go in and actually ask questions. I think we've all had those conversations at meetings: oh, I just wish I could find all of the genes or the proteins that were on at this time, or what changed in response to that drug, or, you know, if we could stop development here, or if we could keep it from being infectious there. And so you start thinking, just as a biologist: well, what characteristics would they have? Right? Would they need to be on at this time? Well, then you'd need to look at RNA-seq data. If you want to know that the protein was made, if it's a protein-coding gene, well, then you'd want to look at proteomics to confirm that it was made.
If you want to know if that gene's under selection, you'd want to look at population data, at nonsynonymous to synonymous ratios. Or if you know this drug is altering this pathway, you want to look at the metabolites, right, and see what's going on. Imagine an environment where you can do that by pointing and clicking, with the same kind of ease with which you might go to a travel website and say, I need to fly from here to here, and I want to change at this hub, and I don't want to pay more than this. When the data are integrated, you can do that. Right. And so it's been a true pleasure to be involved with these resources and watch them grow and watch them help the community carry out their research.
Patrick Short 20:06 There's been a similar parallel track in human genetics. We had Daniel MacArthur on a previous episode, who led one of the major human exome aggregation consortia, with a similar type of approach: one resource where anybody can go and ask a question. I was thinking about that as you were talking through this, and I was interested to understand, from your perspective: if you were able to grow the resource on only one axis, whether it's a greater diversity of samples, or a greater number, or more types of omics, whether it's RNA or proteomics, what's the most useful vector along which you'd think about growing that resource? Is it scale? Is it variety? Is it the types of different datasets that are linked together? Where's the bottleneck right now?
Dr. Jessica Kissinger 20:52 Oh, that's tough, because it kind of depends on the pathogen. Some definitely need scale. Others need new, hard-to-get datasets, where you realise that if you just had this data set, you could make a link from that data set to that data set and really have some insight. Overall, where would I grow?
Um, well, this is going to sound selfish, but I think just the resources. I think that the skill and the talent required to harmonise all of these data, to clean them, to make them trustworthy for use, right, for folks to be able to plan their experiments on them, is a huge undertaking. I mean, I'm just but a spokesman for a team of nearly 60 people behind this resource. And there's always more data, more data, more data, and some things you can definitely do better and more smartly. And with new technologies, you can get more bang for your buck. But the cleaning aspect, the aspect of marking them up ontologically so that, whatever organism you're looking at, you're comparing apples to apples and oranges to oranges, regardless of what that community calls that gene. Because we're really big proponents of leveraging orthology, since so many organisms are missing data types. If you can fill it in via knowledge of an orthologous gene in a related organism, you want to do that. That's a bottleneck right now, that's a huge bottleneck for us. People haven't solved how to, in an automated way, truly identify an individual datum and clearly say what it is. I mean, even across the world, the definition of what constitutes a fever, clinically, is different. Right. So just imagine trying to agree on the vocabularies and the terms and the semantics of the data that we want to integrate. Sorry, you got me on a roll now. But if you could do this right, think of the possibilities. Think of the ecosystem that could be created if pathogen datasets could talk with human datasets. And even better, and this is the axis that is growing, is looking at the host-pathogen interactions simultaneously, a lot of it in vitro, some in vivo. So you can actually begin to understand that dynamic and the interplay that's going on during the course of an infection. What is that kind of internal arms race? What does the pathogen do first?
How does the host respond? You know, what is going on during the course of something? That's amazing. But historically, right, the communities are very divided: the human people and the clinicians do this, and the basic researchers working on parasites do that. And to think about the power of bringing the data together, not necessarily in one resource, because that's not possible, there's too much of it, but having the tools that let you go and ask those queries dynamically, because datasets are self-describing, because the datasets can announce what they are and how you access them, and what vocabulary or ontology they're using. I imagine that world, I mean, like, wow.
Patrick Short 24:08 You know, I couldn't agree more. And how does data sharing work in the community? Is it a collaborative community? Is there a tension? I know in human genetics there's always a tension: there's collaboration, but there's also a degree of competition, where people don't always readily share, because there's a competitive aspect to publishing and commercialisation. How does it work in your community? And what are the barriers? Where does it work well? Where do you see opportunities for improvement?
Dr. Jessica Kissinger 24:41 I'll have to say communities, plural. We literally kind of added communities one at a time. Our very first community was Toxoplasma, then Plasmodium, and it just kind of kept moving on. I think in the beginning, yeah, there definitely was hesitancy. You may remember that in the early days of even the human genome, there were a lot of discussions and a lot of accords about who owned what, and when it had to be released, and who had the right to do what with it before a paper was published or not. I'm not going to rehash all of that history. But needless to say, other communities had the same concerns and went through that.
I think what we have learned, what I've witnessed time and time again, with more than a dozen communities now, is that at first there's a little hesitancy. There's like, you want to do what with our data? Can we be trusted? And I'm really proud of this, our team is really proud of this: we're honest brokers, we're viewed as honest brokers. Nothing goes live without the person giving their sign-off on the dataset and on it being represented correctly within the resource. But yes, there's hesitancy at first. But then once a dataset is shared, and it's just kind of fun, I shouldn't say fun, but it is fun, it's wonderful to watch the first datasets shared. The community is so thankful. The gratitude that the researcher received from sharing something, often pre-publication, more than made up for it. And we then saw other members of the community going, wait, wait, I have data too, and you go from asking for data to having so much data that you're running months behind in getting it loaded. And I've watched that cycle repeat itself again and again and again. And I think that we're kind of in a day and age, and if COVID didn't prove it to us, I don't know what will, when you share the data and you can have it in real time, and you can actually try and make some decisions that don't take five years. It's powerful, it's really powerful. And so I'm really hopeful that the tides are turning. But that said, you know, the hurdles to sharing data are real, right? And the funders, in the end, and the publishers are all, rightly so, urging everyone to share data and make them available. But we haven't really found that right mechanism of standards and platforms and ways yet. It's still pretty hard to share data in a way that it can be used. Anybody can dump data on a server somewhere. And knowing that you have a sequence similar to that sequence is great.
But if you don't know that that sequence came from this patient or this pathogen, in this year, in that country, it's useless. And so the metadata about the data are key, right? And that takes time. And I think that's undervalued currently, the amount of time and effort it takes to collect and share that data.
Patrick Short 27:36 Yes, and it makes so much sense to do it the way you're doing it. Because if one group or a couple of groups can do that for the community, then you save everybody from completely reinventing the wheel and spending those untold hours cleaning the data and describing it. And if you can just get everybody to cooperate on some of those most basic data hygiene aspects, then it saves everybody so much time and makes everything so much faster.
Dr. Jessica Kissinger 28:05 It does. And I'm glad. I mean, even in education: I work at a university, and at least in graduate school now, I think the students really have to take a data science and data literacy class. I think it's just essential, as well as learning some programming, or at a minimum R and some scripting. I think we all have to be as comfortable in front of the computer as at the bench now. And that's not to say everybody needs to make software or learn to programme, but we are all going to be dealing with data, and getting more savvy with that, you know, takes time.
Patrick Short 28:38 As we close out here, I want to take a slightly different direction. You mentioned to me prior to the call that you had recently received some personal genetic news that I imagine has impacted the way you think about, obviously, your personal life, but also your professional life, seeing it in a different light. It'd be great if you could share a little bit more about that diagnosis, what it's meant for you, and how it's changed the way you see the last decades of your life.
Dr. Jessica Kissinger 29:04 Yeah.
So long story short, I had to get a hip replacement. And I guess I'm considered a little bit young to need a hip replacement. I'd been having hip troubles for a long time, and I went to see a specialist. When we were leaving, the specialist wrote down this name of a syndrome, Ehlers-Danlos, on a piece of paper and said, I want you to look this up. So I go home and I get on Google or whatever, and I look things up, and I'm like, wow, that sounds like me. And for those who haven't heard of this syndrome, there are three categories of it, from very severe to what's called a hypermobility type. It's a syndrome because it can be caused by mutations in at least a dozen or more genes, and they're still trying to find more of them, but they're often related to collagen. So it affects your ligaments, and people are very flexible, those double-jointed people. I'm one of them. So in any case, I'm like, oh my God, that sounds like me. So I then had the adventure of going to a hospital, to the paediatrics unit, this was all just two years ago, for genetic testing, because they normally even test infants. And you go through all these phenotype measurements. I was measured in very weird ways, ratios of bones and this and that, all kinds of things, and they were like, yes, you phenotypically have Ehlers-Danlos. That means hypermobility, in category three, but you also have symptoms of type two, which has to do with scarring and your skin and things of this nature. And so they said, we want genetic testing. And so I'm like, oh, that's cool. I'm a geneticist. I like this. So they did the testing. And it came back that I had a variant of unknown significance. I never knew that term before, VUS. I'm not a human geneticist, right. So it came back.
It was a variant of unknown significance at first, but they were very excited about it, because they said, we have another patient that has the same variant of unknown significance and also has the disease. So they were like, can we test your parents? We want to test your parents. And so they tested my parents, my parents agreed, this was all quite the family genetics exercise, and they didn't have it. I'm a spontaneous mutation, and I have the phenotype. So that changed it from a variant of unknown significance to one of significance. And it's a splicing variant, it will affect splicing. So it's not in the exome, it's six nucleotides out. But, you know, it's in a collagen gene, my collagen 5A1 (COL5A1) gene, I guess. So I'm probably haploinsufficient; I haven't done the RNA studies to prove it. But in any case, for those who know me, I am always in a cast or in a brace. I've broken both ankles. I've now had my hip replacement. I'm on crutches, wrist braces, you name it. I have been a klutz. My father used to talk about keeping me in a little padded cotton box, I hurt myself so much. And now I know why. There is a reason: I literally can't hold my bones together. It was relieving. You know, I can't do anything about it. But I'm glad that I could contribute the second variant of unknown significance. And again, it's the power of sharing data. Right?
Patrick Short 32:18 Yeah. So I was going to say, what a great example of the power of data sharing. You've literally helped to make a profound discovery for yourself, and for that other person, and for others later down the line who may have the same result. Well, thank you. I think that's a really great note to end on. And it does tie the data sharing and data mining story in really nicely. First of all, I just want to say thank you for taking the time to be part of the podcast. I learned a lot. As you also noticed, we don't have very many non-human genetics episodes.
So it's great for me to learn a little bit more about something I know very little about. And yes, thanks again. And if people want to reach you, they can visit your website; type Dr. Jessica Kissinger into Google and it'll come up. Is there anything that you'd like to plug? You're hiring smart data scientists and scientists to work in your team, I imagine, always, to help solve some of these problems? Always. Always. Always. Y'all are great. Well, thank you. And thanks, everyone, for listening. We appreciate you taking the time to tune in every two weeks. If you have any feedback, then don't hesitate to let us know, and if you liked the episode, don't be afraid to share on social media or share directly with a friend. Thanks again and we'll see you next time.
Transcribed by https://otter.ai