Champions of Data + AI

Leaders driving data-driven innovation

EPISODE 14

Representation Matters in Data + AI

Organizations worldwide are focusing on greater diversity, equality and inclusion (DEI) in the workplace. But what about ensuring DEI within data? In this episode, Jeffrey Reid from Regeneron joins us to explore the struggles facing the genetics and healthcare industry when it comes to representation in data and the adverse impact it can have on AI algorithms if left unaddressed. Jeff also dives into his experience as a member of the LGBTQ+ community and how it influences his approach to being more inclusive with data in genomics.

Jeffrey Reid
VP, Chief Data Officer, Regeneron Genetics Center
Dr. Reid is the Chief Data Officer at the Regeneron Genetics Center where he leads a team developing and applying novel large-scale computational analysis tools, systems and methods to produce and analyze large genomic data sets with the goal of making precision medicine a reality. His primary focus is on maximizing the impact that the integration of EHR-derived phenotypes and genomic data can have in providing biological insights to drive drug discovery and improve patient outcomes. Dr. Reid has worked in all aspects of large-scale genomic sequence data production and analysis, and using his background in computational physics, has been an evangelist for cloud computing in genomics and the thoughtful application of data science techniques to next-generation sequencing problems. Dr. Reid received his Ph.D. in physics from The University of Washington and his bachelor’s degree from Harvey Mudd College. He lives in Stamford, Connecticut, with his husband Jim and three cats Sabrina, Lyndon and Emile.


Alex Mysak:
Welcome to episode 14 in the Champions of Data and AI series. I’m your host, Alex Mysak. And I’m joined today by our guest Jeffrey Reid, VP and Chief Data Officer at Regeneron Genetics Center. Organizations worldwide are increasing the focus on ensuring diversity, equality, and inclusion in the workplace. But what about ensuring DE&I within data? In this episode, we explore the struggles facing the genetics and healthcare industry when it comes to representation in data and the adverse impact it can have in AI algorithms when this is not addressed. Jeff and I will also dive into his experience as a member of the LGBTQ community and how it’s influencing his approach on being more inclusive with data in genomics. Let’s get started.

Alex Mysak:
Jeff, so in prepping for this session with you, I think we found that we shared a couple of mutual passions. One of those is diversity and inclusion, which we’ll get to later in the session, but the other, perhaps even more enjoyably was Dr. Who.

Jeffrey Reid:
Absolutely.

Alex Mysak:
So as a Brit, in order to warm our audience here up a little bit, I thought we’d start with a Dr. Who pop trivia quiz, and maybe some of your preferences around the series.

Jeffrey Reid:
I love it. Let’s do it.

Alex Mysak:
All right. So these are quite easy questions, but we’ll see how we do. How many doctors have there been, and which would be your favorite?

Jeffrey Reid:
Well, the “how many doctors have there been” conversation, I think, is a little controversial. Canonically I think there’s 13, but I don’t count the weird movie that was made between ’89 and 2005. You know, so personally I think that’s Paul McGann, but I wouldn’t go there. That said, my favorite doctor is Tom Baker. He’s the absolute best doctor ever. There’s no question.

Alex Mysak:
Yeah. Well, I was brought up in the Sylvester McCoy era, so he might be my favorite. And so one thing I learned in doing this trivia quiz for you is that Tardis is in fact an acronym. Do you know what Tardis stands for?

Jeffrey Reid:
Time and relative dimension in space.

Alex Mysak:
Awesome. I knew I had a real fan on my hands.

Jeffrey Reid:
This is something that really appealed to me when I was young. Right. I was watching this in high school and junior high. And just this idea that something’s bigger on the inside than the outside. That’s the first, I mean it’s not, I wouldn’t say it’s not used frequently in Sci-Fi, but certainly, just as an idea, that was where I first encountered it. And I always found that sort of concept fascinating of like a pocket dimension or something, and then that great episode where they, you find out that the engine of the Tardis is essentially a collapsing sun. It’s super cool.

Alex Mysak:
So the other thing I like about you is that you’re really into dentification. And so I noticed, in addition to the PhD that you’ve earned in physics, that according to Google Scholar, your work’s been cited almost 98,000 times. So how does that make you feel?

Jeffrey Reid:
I mean, it’s a little surreal because it’s kind of hard to wrap your head around that number. But certainly if you had told a very young me that as a scientist I would have contributed to work that had been cited almost a hundred thousand times, I would have been really excited, because science is something I’ve always been passionate about. And so, yeah, I mean, it feels really good. It feels like I’m doing what I should be doing, which is a nice feeling. And I certainly look forward to the work in the future that’s going to continue to have impact with people.

Alex Mysak:
So Jeff, you’ve been openly gay in the workplace since the mid-1990s. And I know when we were speaking, you shared with me that coming out in the workplace did pose some challenges for you professionally, triggering that move from academia into pharmaceuticals. Could you share with our viewers a little bit about that journey and your professional journey as an openly gay man?

Jeffrey Reid:
Yeah, absolutely. You know, I came out when I was in graduate school and it was very difficult for me just because I was raised in a relatively conservative Mormon family. But once I sort of got over the kind of coming out to my family piece, I really was quite at peace with it. I was very, very out as a graduate student, and my thesis advisor, he was a relatively tough person to work with, but he wasn’t like a problem in terms of my being gay. But then when I went and did a postdoc in Houston, it was a great place to be, to do the genetics that I do now. And I learned an enormous amount. And I met my husband there, which was not something I was really expecting. But yeah, working in a pretty conservative environment in a pretty conservative state, I found that there were some people on the faculty who would say things to colleagues who were friends of mine, like, oh, you know, Jeff’s a little too gay, which, it’s like, it’s not like I hid this.

Jeffrey Reid:
It isn’t like I can not do this. You know, that’s the other thing, I sort of am myself in spite of myself. Even if I tried to not be myself, I wouldn’t be, so it shouldn’t come as a surprise to them. But then at one point, there was this one anecdote that I’ve told before, where I was in a meeting with a possible funder for a startup company that was spinning out of one of the labs at the institution I was at. And we’re just having this casual conversation about our spouses. And one of the guys at the table had just got married. And so this investor, prospective investor, turned to me and said, well, you know, are you married? And you know, I’m me and I wanted to just respond to that question authentically. So I said, well, you know, my husband, and I call him my husband.

Jeffrey Reid:
We bought a house together. We formed a variety of legal relationships that tried to mimic what one could get with a marriage, which was not available at the time in Texas. And I heard about it later from my boss that it’s like, that’s, that’s not appropriate. You shouldn’t bring this sort of political discussion. And so it was just like that was one of the things where I was like, this is not an environment where I can really fully feel comfortable being myself. And there were some things about funding and academia that I found frustrating on top of that. And so a move seemed really, really good.

Jeffrey Reid:
And this opportunity at Regeneron came up. And when I went to them, I think the HR team just about fainted, because I said, I need to meet an out gay person who works here, who can like vouch for this place as an institution. And it took them a few minutes to sort of figure out how to do that, but they responded to it really, really well. And out of that experience, we started an employee interest group for LGBT employees at Regeneron. And since then, Regeneron has grown in our DEI practice a lot. And we now have a chief diversity officer and are just doing a lot in that space, both internally in the workplace, as well as with our science and broader society.

Alex Mysak:
Yeah. It’s actually been a big part of my journey, not as somebody gay, but as an ally because I’ve worked in financial services for the first 20 years of my career. And so it’s something that became increasingly acceptable. And I have to congratulate, I worked for Goldman Sachs at the time who always had a very forward progressive policy, I would say at least in the last 10 to 15 years on this topic. So I’m somewhat sorry that you had that experience, but obviously it led to a path in life that you would never have otherwise had. Thank you for sharing that with us.

Jeffrey Reid:
Did you ever experience gender discrimination? I mean, it can be tough to be a woman in finance, I’m sure. Right?

Alex Mysak:
People ask that a lot. And I actually never felt that I did at Goldman Sachs, and in finance in general I had the fortune to work for two companies, Goldman and then Standard Chartered Bank, where D&I is just part of the fabric of the company. So not really, but what I would say is, like the example that you described of being too gay, being a woman that looks younger than maybe her status has earned her means that you maybe have to work harder for a period of your career. So that might just be life, but that would be the only thing that I think has made it a little bit tougher, but obviously, hopefully everyone, my Stripe C6. So Jeff, the work that you and Regeneron are doing to help the world as we navigate this health crisis globally has been one area that we’ve spent a lot of time speaking to you about in the past, but there is another element of work that I know that you’re incredibly emphatic about. And so could you share a little bit more about the role that you’re playing in D&I related to genomic analysis and its importance?

Jeffrey Reid:
Yeah, so one of the things that I find really fascinating about genetics is it does really relate to this concept of diversity. Fundamentally, one of the ways you can quantify the diversity of a population is to look at the genetic diversity of that population, and for better or for worse we’ve associated certain genetic variants or certain patterns of genetic variants with certain ancestries relating to certain places. So we say there’s European-ancestry individuals, that’s me, and there’s African-ancestry individuals. And what we’ve seen is, scientifically, we know that there is variation that is enriched in some of these populations and that is not present as frequently in other populations. So we want to maximize the scientific opportunity, particularly around making discoveries as they relate to rare variants, which is one of the things that’s extremely important to us, because we’re looking for these functional, consequential variants that can give us an insight into what happens when a gene function is broken or when a gene function is kind of over-amplified, because we would like to try to mimic that with an antibody that can be an effective therapeutic.

Jeffrey Reid:
And so, we came pretty early at the Regeneron Genetics Center to a strategy where we wanted to have as diverse a sequenced population in our database as possible, and have really, really worked from the very beginning to try to make that happen, which is difficult because unfortunately there just aren’t as many cohorts of non-European-ancestry individuals that have been collected and created and followed by investigators all over the world. But we’ve partnered with researchers in Taiwan, with researchers in Mexico, really researchers all over the world, and have built one of the more ancestrally diverse populations of genetically sequenced individuals in the world and are using that to try to drive the discoveries that we’re making. So we’re continuing in that vein, but it really is, from a pure scientific genetics perspective, the right thing to do, to try to maximize your ability to make these discoveries by looking essentially at the whole panoply of human variation.

Alex Mysak:
That’s truly inspiring. I guess what all of the viewers here would love to hear is: what are some of the significant biases that you’re looking to try and remove from the data as a result of this work?

Jeffrey Reid:
Yeah, one of the things that I find a little frustrating is people will look at genetic associations and they will presume that a genetic association result must relate to biology. So a perfect example is, African-ancestry individuals in US culture have been essentially discriminated against since the founding of our country. And because of that, there are socioeconomic factors that correlate with the genetics of African ancestry in the US. And it has nothing to do with biology. It has everything to do with the fact that over the history of our country, if your skin is a certain color, you were treated a certain way. You were given less access to resources. You wouldn’t be given a loan, all of these things. So then when you start looking just in a completely naive way at genetic associations, particularly with socioeconomic factors, you have this huge confounder of societal impact.

Jeffrey Reid:
And so to me, that’s one really important thing to have front and center in mind: if you are finding associations that appear to be correlating with socioeconomic status, then you need to take that into consideration in your biological interpretation. You know, a great example of this is I saw a study a while back that suggested that the rates of cancer were higher in individuals who worked off-shift work, basically people who worked overnight. But does that mean that working at night gives you cancer? No, actually what that means is people who are less socioeconomically advantaged tend to be doing that work because that may be the only job that they can get, and having less money, having less access to care, having less ability to access good food and really the sort of virtues of working a day shift is more likely what’s causing that than some biological factor.
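The confounding argument above can be made concrete with a small stratified comparison. The sketch below uses counts invented purely for illustration (they are not from the study Jeff mentions, and the names `COUNTS`, `low_ses` and `high_ses` are hypothetical): night workers look riskier overall, yet within each socioeconomic stratum the night and day risks are identical.

```python
# Toy counts, invented for illustration only: (cancer cases, people) per group.
COUNTS = {
    ("night", "low_ses"): (80, 800),
    ("night", "high_ses"): (10, 200),
    ("day", "low_ses"): (20, 200),
    ("day", "high_ses"): (40, 800),
}

def crude_risk(shift):
    """Risk for a shift when socioeconomic status is ignored."""
    cases = sum(c for (s, _), (c, _n) in COUNTS.items() if s == shift)
    people = sum(n for (s, _), (_c, n) in COUNTS.items() if s == shift)
    return cases / people

def stratum_risk(shift, ses):
    """Risk for a shift within one socioeconomic stratum."""
    cases, people = COUNTS[(shift, ses)]
    return cases / people

# Naive comparison: night shift appears to carry higher risk.
crude_ratio = crude_risk("night") / crude_risk("day")

# Stratified comparison: the apparent effect vanishes within each stratum.
stratified_ratios = {
    ses: stratum_risk("night", ses) / stratum_risk("day", ses)
    for ses in ("low_ses", "high_ses")
}

print(f"crude risk ratio: {crude_ratio:.2f}")
print(f"stratified risk ratios: {stratified_ratios}")
```

In this toy table the crude ratio is about 1.5 only because night workers are mostly in the lower-income stratum, which is the kind of societal confounder described above.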

Jeffrey Reid:
And so trying to figure out how to understand these nuanced cultural factors that present themselves through the lens of genetics, when you look at genetics, is something I think we have to be very careful with. And it concerns me a bit in the context of AI and ML, where you can take all of this aggregated information and it can be very difficult to trace back to sort of where a particular insight is coming from. And we know that there’s this kind of bias-in, bias-out feature. If you have a biased training set, the model will learn that bias. And if you’re not aware of it, you may be reaching the wrong conclusions from the data that you have. And so, trying to figure out how to take these nuances of the data and capture them in the models, or at least have an awareness of how the models may be biased so you can capture them in terms of your interpretation of the outputs of the models, I think that’s a very difficult problem.

Jeffrey Reid:
And it’s one that frankly, I think we’re dealing with right now, today. If you look at polygenic risk scores, which are composites of risk for a disease over the whole of your genetics, there’s some evidence that if you train a polygenic risk score on European-ancestry individuals, it’s not as applicable to people of African ancestry, which shouldn’t be that surprising, because if you don’t use that data to train the model, how is it going to learn? So, yeah, I think this is a big challenge, and not just in genetics, in healthcare in general.
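A minimal sketch of the bias-in, bias-out point for polygenic risk scores follows. It assumes a score that is a weighted sum of allele counts, and it uses a deliberately crude way of estimating weights (carrier mean minus non-carrier mean); the genotypes, phenotypes and function names are all invented for illustration and do not represent Regeneron's methods. A variant that never appears in the training population gets zero weight, so the score misses its effect in a population where it is common.

```python
from statistics import mean

def fit_weights(genotypes, phenotypes):
    """Crude per-variant weight: mean phenotype of carriers minus non-carriers.

    A variant with no carriers (or no non-carriers) in the training set
    cannot be learned and gets weight 0.0.
    """
    weights = []
    for j in range(len(genotypes[0])):
        carriers = [p for g, p in zip(genotypes, phenotypes) if g[j] > 0]
        others = [p for g, p in zip(genotypes, phenotypes) if g[j] == 0]
        if not carriers or not others:
            weights.append(0.0)
        else:
            weights.append(mean(carriers) - mean(others))
    return weights

def risk_score(genotype, weights):
    """Polygenic score as a weighted sum of allele counts."""
    return sum(count * w for count, w in zip(genotype, weights))

def mean_abs_error(genotypes, phenotypes, weights):
    errors = [abs(risk_score(g, weights) - p) for g, p in zip(genotypes, phenotypes)]
    return mean(errors)

# Invented toy data: the phenotype is 1.0 * variant_0 + 2.0 * variant_1.
# Population A never carries variant_1; population B often does.
pop_a_geno = [(0, 0), (1, 0), (0, 0), (1, 0)]
pop_a_pheno = [0.0, 1.0, 0.0, 1.0]
pop_b_geno = [(0, 0), (0, 1), (1, 0), (0, 1)]
pop_b_pheno = [0.0, 2.0, 1.0, 2.0]

weights_a = fit_weights(pop_a_geno, pop_a_pheno)  # trained on population A only
error_a = mean_abs_error(pop_a_geno, pop_a_pheno, weights_a)
error_b = mean_abs_error(pop_b_geno, pop_b_pheno, weights_a)

print("weights learned from A:", weights_a)
print("error on A:", error_a, "error on B:", error_b)
```

The score fits population A perfectly and is systematically wrong on population B, simply because the training data never contained the variant that matters there.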

Alex Mysak:
So Jeff, in addition to bias, I would imagine the other major challenge that you deal with is scale. I was once lucky enough to meet one of the ministers in India who was in charge of their biometrics project for a billion Indians. And it was just mind-blowing, the kind of scale, which is the exact application that you’re talking about here, or similar. Can you talk to us about an example of how this scale is helping you on projects, or Regeneron in general, to solve problems?

Jeffrey Reid:
Yeah. So one of the key successes, or one of the key drivers of the success of the Regeneron Genetics Center, I would say, is Regeneron’s longstanding commitment to automation. So Regeneron as a company probably makes, well, maybe not the most in the world, but makes more transgenic mice than just about anyone else.

Alex Mysak:
Transgenic mice?

Jeffrey Reid:
Yes. So what we do is we take the mouse and we cut out certain parts of its genome. And we put in some other parts, and this lets us model human diseases in mice, which is really important because if you have a genetic insight in a human, you’d like to see that you can do experiments on a mouse that can validate that insight. As well as, very interestingly, we actually fully humanized the immune system of the mouse. So when we have an antibody that we are working on creating to make a therapeutic, once we’re done with the mouse modeling, we actually have fully human monoclonal antibodies. So we don’t have to do this additional step of humanizing a mouse antibody after we’ve sort of finished with that modeling work. So that’s been very important to the success of Regeneron. And it’s really useful for the Regeneron Genetics Center as a genetic discovery engine to have access to the validation platforms of these transgenic mice.

Jeffrey Reid:
But just the idea that automating things at scale is something that Regeneron knows how to think about and knows how to do, that’s what really led us to audaciously think that we could create this very large-scale sequencing effort where we’re sequencing at a pace of half a million samples a year or something like that. And that’s incredibly important. Previously we talked a little bit about that idea of using rare variation as a mechanism for discovering more about human biology. Well, rare variation is rare. So if you want to see a lot of rare variation, you have to sequence a lot of people and then you have to analyze all that, right? So if you have half a million people and they each have millions and millions of genetic variants, now all of a sudden you’re into very, very large data sets.

Jeffrey Reid:
And then we bring in phenotype information, information about people’s biology and their disease state. And so one of the ways we tackle that scale problem is we try to precompute things. So instead of trying to think of a problem or a question and just ask that one question, what we do is we create these high-throughput, scalable systems to sort of calculate every possible association between every genetic variant and every possible phenotype or disease state. And then it’s just a matter of querying that very large data set, but that has hundreds of billions to trillions of individual cells of information. And that’s difficult to attack without some serious attention to the scalability of the technology.
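The precompute-then-query pattern described above can be sketched in a few lines. This is only a toy under stated assumptions: the variant and phenotype names, the four-person cohort and the choice of a simple risk difference as the association statistic are all hypothetical, and a production pipeline would use proper statistical models on a distributed system rather than an in-memory dictionary.

```python
from itertools import product

# Invented toy cohort: which people carry each variant, and which have each phenotype.
EVERYONE = {"p1", "p2", "p3", "p4"}
CARRIERS = {"variant_A": {"p1", "p2"}, "variant_B": {"p3"}}
CASES = {"phenotype_X": {"p1", "p2"}, "phenotype_Y": {"p3", "p4"}}

def risk_difference(carrier_set, case_set, everyone):
    """Case rate among carriers minus case rate among non-carriers."""
    non_carriers = everyone - carrier_set
    if not carrier_set or not non_carriers:
        return None  # association cannot be estimated
    carrier_rate = len(carrier_set & case_set) / len(carrier_set)
    other_rate = len(non_carriers & case_set) / len(non_carriers)
    return carrier_rate - other_rate

def precompute_associations(carriers, cases, everyone):
    """Compute one statistic for every (variant, phenotype) pair up front."""
    return {
        (variant, phenotype): risk_difference(carriers[variant], cases[phenotype], everyone)
        for variant, phenotype in product(carriers, cases)
    }

# Pay the computation cost once...
association_table = precompute_associations(CARRIERS, CASES, EVERYONE)

# ...then answering any individual question is just a lookup.
print(association_table[("variant_A", "phenotype_X")])
```

With V variants and P phenotypes the table has V × P cells, which is how half a million people with millions of variants each leads to the hundreds of billions to trillions of cells mentioned above.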

Alex Mysak:
Indeed, and that’s one of the use cases that we love to see the pharmaceutical industry using us for. It really sounds like you’ve got everything covered and that you’ve thought of everything, but I’m sure that you have some strong views on what you think the gaps are. Whether it’s your organization, your peers, or the academic community, what are those gaps where you’d like to see focus related to D&I and genetics, biology, and medicine?

Jeffrey Reid:
I mean, I would say generally one of the things that I worry a lot about, particularly coming out of academia, is I think our training programs tend to reinforce the cultural biases that are out there. So there, growing up as a young gay closeted kid, it’s not like there were all these gay scientists out there who I could just look at and go, that’s the kind of thing that I want to be. I mean, I did have some awareness that there was some sort of queerness in Alan Turing’s background, and he’s sort of one of my personal scientific heroes. But so just making it clear to people that they can do this, people like them do this, whether they’re a poor kid or whether they’re in rural Kentucky or in Africa, or a young woman who’s growing up really interested in science, who maybe in the fifties or the sixties would not have gotten the message that it would be a great idea for her to become a scientist.

Jeffrey Reid:
And I’ve heard some just really horrifying stories from some of my colleagues, particularly when I was back at Baylor, who had been in science for a really long time, about a woman who is one of the technical leads walking into the conference room, and they asked her to get coffee. In the seventies, they just don’t even think. She’s a woman, so of course she’s here to get coffee. She’s not. So I think that representation within the training programs really, really matters. It’s sort of the pipeline problem, as people talk about in DEI: we’re not going to have real equity and parity in terms of the opportunity for diverse scientists to make discoveries if we don’t have training programs that bring diverse scientists forward. And I think again, in genetics, it’s so important scientifically to be able to utilize these ancestrally diverse datasets, and talking to those communities, getting the information from those diverse communities.

Jeffrey Reid:
That’s something that I think would really be best done with scientists who are from those communities. And so I think there’s an existential threat to healthcare equity, because so much data is pouring in, and so much, frankly, genetics is going to be included in the future of healthcare and the interpretation of the genome and risk and disease outcome. We really have to work on this lack of diversity within the sciences, particularly in healthcare, because if we don’t, we’re not going to be able to gather the data, talk to the people, get the information together that we need to fix that inequity. And we’re on the cusp of this transition to a much more personalized version of healthcare. And it seems somewhat cruel that that personalized version of healthcare, frankly, may not be as useful or accessible to people who are kind of outside of the sort of European-ancestry, Western, wealthy, first-world countries.

Alex Mysak:
So Jeff, thank you for that answer. When you’re talking about scale of data, it makes me very mindful of a couple of peers of yours that we had on prior sessions. Dan Jeavons, who’s the VP of computational science at Shell, said that in terms of data, more is not always more. And then in one of the last sessions we had Warren Breakstone, who is chief product officer of the S&P Market Intelligence data management solutions business. And he was talking to us about the rigorous data quality and sourcing practices that S&P go through to be a responsible data provider globally. It definitely jibed with a comment that you made to me, which is that, although AI is great, the data is critical and organizations need to focus on being far more data-centric than AI-centric. Could you please explain what you mean by that view and why that distinction is important for organizations to internalize?

Jeffrey Reid:
Absolutely. This is something that I occasionally might get a little worked up about. Just to start, I think the easiest way to describe it is, if we expect computers to learn from data, just like we expect people to learn from data, we should expect that that data should be right. It should be correct. It should be understandable. The cleaner, the more sensible, the more understandable it is, the easier it is for a computer to learn from it, the easier it is for a person to learn from. So I don’t really see this as any different than people learning from data. It’s just computers do it a little bit differently and maybe a little bit faster in some ways and slower in others. And so, if you have a big pile of dirty, messy data and you just sort of throw it in and you hope that you’re going to get some really great insights out, it’s just, it’s not sensible, right?

Jeffrey Reid:
Like, yes, there are algorithms that can extract insights from unclean or dirty data sources. But particularly in healthcare, if you look at electronic medical records, for example, one healthcare system may use the codes that they have for encoding what patient diseases are, what patient outcomes are, differently than another organization. One of my favorite anecdotes is, I have this weird, in remission now, autoimmune disease, an ANCA-related vasculitis. And so one of the things that that puts me at risk for is problems in the eye. So I was at my optometrist and he was coding incorrectly this autoimmune disease I had, because he was coding it as, at the time it was called, it’s got a different name now, but at the time it was called Wegener’s granulomatosis, Wegener’s granulomatosis with renal involvement, but I never really had renal involvement.

Jeffrey Reid:
So I leaned over and I said, no, that’s wrong. You should code it this way. And if you want to understand the barriers to fixing healthcare data, go ahead and correct your doctor on how he’s coding you, because he got a little surly and kind of told me it was none of my business how he decided to code my data, which just blew my mind, right? Because the way that we capture and utilize healthcare data, it is not data that is captured and utilized primarily for research and care. It is unfortunately, in the US anyway, primarily captured and utilized for billing. And so by just taking all that data and sort of jamming it together and assuming that the AI algorithm can sort it out, we are giving the algorithms a misunderstanding of the way the world works, because the data isn’t clean, it’s not harmonious, and it doesn’t actually capture all the things that we want it to capture.

Jeffrey Reid:
And some of it’s incorrect. So we have an entire clinical informatics team. Michael Cantor leads that for us. And he’s really just one of the world’s experts in this, trying to look across a whole bunch of different healthcare systems, trying to clean and harmonize that data so you can get it ready for either computers or people to learn from it. But somehow the idea that you have a cool new whizzbang AI algorithm, and it doesn’t matter what the data is, it’s going to like get all the insights out of the data, that sounds an awful lot like a pitch from a Silicon Valley startup that would really, really like somebody to give them money so they can generate some data. And I sometimes a bit cheekily point out that the people who focus on the algorithms tend to be the people with the smallest data sets, and the people who focus on the data tend to be the people who’ve done the work to put together the data sets.

Jeffrey Reid:
And frankly, if you have the data, you know the challenge. If you don’t have the data, maybe you don’t know it yet, but if you get lucky enough to get the data set you want to get, you will discover that pulling the data together, making it clean and harmonious and interoperable, the FAIR principles: findable, accessible, interoperable, and reusable, it’s very difficult. It’s a lot of work. It’s something that we’re working on every day, but it’s not like a problem that you fix. It’s more like brushing your teeth. It’s just something you kind of have to do at the beginning of the day and the end of the day, every day. But if you want to learn from the data, absolutely the best way to make the data valuable is to spend a lot of time and effort paying attention to how clean it is and how appropriate it is to the questions you’re asking.
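As one small illustration of what harmonizing codes across healthcare systems can involve, the sketch below maps each system's local code to a shared concept and sets aside anything it cannot map for human review. Every system name, code and concept label here is hypothetical and invented for the example; it is not a description of Regeneron's clinical informatics pipeline or of any real coding standard.

```python
# Hypothetical mapping from (source system, local code) to a shared concept.
# Every system name and code below is invented for illustration.
CODE_MAP = {
    ("system_a", "VASC-1"): "anca_vasculitis",
    ("system_a", "VASC-1R"): "anca_vasculitis_renal",
    ("system_b", "7731"): "anca_vasculitis",
}

def harmonize(records):
    """Split records into harmonized rows and rows that need human review."""
    harmonized, needs_review = [], []
    for record in records:
        concept = CODE_MAP.get((record["system"], record["code"]))
        if concept is None:
            # Unknown codes are flagged rather than silently guessed.
            needs_review.append(record)
        else:
            harmonized.append({"patient": record["patient"], "concept": concept})
    return harmonized, needs_review

records = [
    {"patient": "p1", "system": "system_a", "code": "VASC-1"},
    {"patient": "p2", "system": "system_b", "code": "7731"},
    {"patient": "p3", "system": "system_b", "code": "9999"},
]

harmonized, needs_review = harmonize(records)
print(harmonized)
print(needs_review)
```

A lookup like this only fixes differences in vocabulary. A record that was coded wrongly at the source, as in the optometrist anecdote, still maps cleanly to the wrong concept, which is part of why the cleaning work is never finished.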

Alex Mysak:
Well, you did read my mind because I was going to ask you about the best practices. And you did already admit that you’re very passionate and empathetic on the topic. So you answered that question of mine. As a final question as we wrap, I would love to bring a little bit of your personality back to our audience, if I can please, Jeff. So what career advice would you give to someone starting out on their journey in data and AI today? And then as a separate topic, as you reflect on your journey as an open member of the LGBTQ community, is there any advice that you would give to young gay people as they contemplate being out in the workplace?

Jeffrey Reid:
Yeah, no. I mean, two great questions. I think first of all, to the sort of general question of what advice to give to someone entering the field of data and AI, it’s really the same advice I give to anybody who is working in any field, and that’s to sort of follow your passion. If it’s something that is really exciting to you, if you find yourself thinking about it in the shower in the morning, on your way to work, although none of us go to work anymore. But if you used to think about it in the shower in the morning, on the way to work, or when you’re driving the car home at night, pulling over to like scribble some notes down, ’cause you have some great idea. If you’re connecting with the work that you’re doing in that way, then it cannot help but be a positive experience for you in the world.

Jeffrey Reid:
So I think if somebody is really struggling to care about what they’re working on, they probably should find something else to work on. But if you are so excited to figure out what the next iteration of the training is going to show you, or if you’re super psyched about some new idea about bringing in another data set, that’s maybe going to, can finally connect all the pieces and give you the insights you’re looking for. Then you know, you’re doing the right thing for you. And if you’re doing the right thing for you, it really cannot help but have an impact. And I think the other question is a bit of a corollary to that, like this question of, well, what do you do as an LGBTQ person? Like, how do you decide how transparent to be?

Jeffrey Reid:
One of the things that people don’t always realize is, like, coming out isn’t like a one-time thing, right? Every time you meet somebody, you sort of have to come out to them again and again and again. And so it can be a little bit uncomfortable, but the advice I go back to is Harvey Milk, who was this politician in San Francisco, one of the very first, if not the first, elected out gay officials in the US. And he got hate mail, and it was really, really, it was a bit rough for him. And so at one point, about a year before he was unfortunately assassinated, he thought it was so likely that he would be killed for being an out gay politician that he recorded this message, and you can look it up online.

Jeffrey Reid:
But the thing that stuck with me about that was he basically said, I want everybody to come out. So if that’s Harvey Milk’s thought as he’s recording a message to be played posthumously after he’s been murdered, because he thinks that this is very likely, and then that happens, if Harvey Milk is saying coming out is the positive way to respond to the challenges you have, I think you have to take that very seriously. And I know you have to be comfortable enough with coming out that you’re able to do it again and again and again, but the thing I would tell young people is, if you can’t do it for yourself, which you should do it for yourself, at least do it for Harvey, do it for the community, recognize that there is value in the struggle that you are going through.

Jeffrey Reid:
And maybe you will end up working in a job where they’re not as supportive as you would like them to be as I have been in the past. And that’s okay, because then you can move on to something better, but yeah come out, come out wherever you are. And it may be uncomfortable, but it really is a much better way to live than trying to hide yourself every day with the people who are really the people who you should be connecting with in a meaningful way.