How did you equip yourself in machine learning, a field that can seem pretty foreign and daunting to biologists?
If you had told college-aged Anne, “22 years from now, you’re going to be leading a research group focused on AI,” I would have said you’re insane. It would not have been possible to make this shift into machine learning without having made friends with machine learning experts — particularly Jones.
After he and I finished our training at MIT, we started a lab together at the Broad Institute in 2007, and we brainstormed a lot about how machine learning could help biologists. What allowed these ideas to percolate and develop was both of us hopping over the fence and getting familiar with the terminology and power of both sides, biology and computer science. It’s really a productive partnership.
And it’s not just Jones anymore. My group is about 50-50 in terms of people coming from the biology side versus the computational side.
You’ve had a lot of success in promoting interdisciplinary work.
I like bringing people together. My lab welcomes people who are curious and have different ideas — kind of the opposite of the toxic tech bro culture where it’s “we’re important, we do our thing, and don’t ask a question unless you want to get mocked.” When I realized it’s hard to be a woman in computer science, I realized immediately that it’s much harder to be in a racial minority in science in general.
We focus on whether the person has skills and interests that complement the group, whether they are curious about areas outside their domain, and whether they can communicate well with people who lack the same training. Without explicitly trying, my lab has been much more diverse than average for a computational lab at a top-tier institution, and the majority of the independent labs launched by my alumni are led by women or people from minoritized groups.
I wonder how many people don’t think they’re racist or sexist, but when hiring they think, “This guy talks like me, he understands our language and jargon, he understands our domain,” not to mention “he’s the kind of person I’d like to have a beer with.” You can see how that would end up with a group that is homogeneous not only in demographics but also in domain expertise and experience.
These days, your group focuses on developing image-based profiling tools to accelerate drug discovery. Why did you choose that?
Several lines of evidence helped solidify that mission. One came from head-to-head experiments in 2014 that showed image-based profiles could be just as powerful as transcriptional profiles.
Another was described in our 2017 eLife paper, where we overexpressed a couple hundred genes in cells and found that half of them had an impact on cell morphology. By grouping the genes based on the imaging data, you can see in one beautiful cluster analysis what has taken biologists decades to piece together about various signaling pathways: over here, all the genes related to the RAS pathway involved in cancer; over there, the genes in the Hippo pathway that regulates tissue growth, and so on.
Looking at that visualization and realizing we had reconstituted a lot of biological knowledge for this set of genes in a single experiment — maybe a couple of weeks’ work — was really remarkable to me. It made us decide to invest more time and energy into developing this research trajectory.
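As a rough illustration of what grouping genes by their imaging data can look like, here is a minimal sketch. It is not the pipeline from the eLife paper: the gene labels, feature values, correlation threshold, and the simple single-linkage grouping are all assumptions chosen for illustration. Each gene gets a short vector of made-up morphology features, and genes whose vectors correlate strongly land in the same group.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def group_by_profile(profiles, threshold=0.9):
    """Group genes whose profiles correlate at or above `threshold`.

    Uses single-linkage grouping via union-find: if A matches B and
    B matches C, all three end up in one group.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Walk up to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical profiles: placeholder gene labels and invented feature values,
# NOT measurements from any real experiment.
toy_profiles = {
    "RAS_1": [2.0, 1.0, -1.0, 0.0],
    "RAS_2": [1.8, 1.1, -0.9, 0.1],
    "HIPPO_1": [-1.0, 0.0, 2.0, 1.0],
    "HIPPO_2": [-0.9, 0.2, 1.8, 1.1],
}

print(group_by_profile(toy_profiles))
```

With real data each profile would hold hundreds or thousands of features per gene, and a hierarchical clustering method would likely replace the fixed threshold used here.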
In a 2018 Cell Chemical Biology paper, Janssen Pharmaceutica researchers dug up images sitting around from old experiments — where they had measured only the one thing they had cared about — and found that there was often enough information in those images to predict results from other assays the company conducted. About 37% of assay results could be predicted by machine learning using images they had lying around. This really got the attention of big pharma! Replacing a large-scale drug assay with a computational query saves millions of dollars each time.
In a consortium I helped launch in 2019, a dozen companies and nonprofit partners are working to create a massive Cell Painting data set of cells treated with more than 120,000 compounds and subjected to 20,000 genetic perturbations. The goal is to speed drug discovery by determining the mechanism of action of potential drugs before they go into clinical trials.
What are some examples of how image-based profiling can help find new drugs?
Recursion Pharmaceuticals is the company farthest along in using image-based profiling, with four drug compounds going into clinical trials. I serve on their scientific advisory board. Their basic approach is to say, let’s perturb a gene known to cause a human disease and see what happens to cells as a result. And if the cells change in any measurable way, can we find a drug that causes the unhealthy-looking cells to go back to looking healthy?
They’ve taken it a step further. Without even testing the drugs on the cells, they can computationally predict which disease phenotypes might be mitigated by which compounds, based on previous tests showing a compound’s impact on cells. I know this strategy works, because my lab has been working on the same thing in a project we just preprinted, though using relatively primitive computational techniques.
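One way to picture this kind of computational matching is sketched below. This is an assumption about the general idea, not a description of Recursion's or the lab's actual method: a disease is represented by how it shifts a cell's image-based profile away from healthy, each compound by how it shifts the profile of treated cells, and compounds are ranked by how directly their shift opposes the disease shift. The compound names and numbers are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_reversers(disease_shift, compound_shifts):
    """Rank compounds from most to least opposed to the disease shift.

    A cosine near -1 means the compound pushes the profile in the
    opposite direction to the disease, so it sorts first.
    """
    scores = {
        name: cosine(disease_shift, shift)
        for name, shift in compound_shifts.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical values: profile shift = perturbed profile minus healthy profile.
toy_disease_shift = [1.0, -2.0, 0.5]
toy_compound_shifts = {
    "cmpd_A": [-1.0, 2.0, -0.5],  # pushes opposite to the disease
    "cmpd_B": [1.0, -2.0, 0.5],   # mimics the disease
    "cmpd_C": [0.5, 0.0, -1.0],   # unrelated direction
}

print(rank_reversers(toy_disease_shift, toy_compound_shifts))
```

The top-ranked compounds in such a list would be candidates to test on cells, not confirmed treatments.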
I’ve been collaborating with Paul Blainey at MIT and J.T. Neal at the Broad Institute on this genetic bar-coding technique that would let us mix a bunch of genetic perturbations in cells and then use bar-coding to figure out which cell got which genetic reagent. That allows us to mix together 200 normal and 200 mutated human proteins in a single well that we can treat with a drug. For each well, we’re testing whether this drug is useful for any of these 200 diseases. So it’s 200 times cheaper than doing 200 individual drug screens.
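The bookkeeping behind a pooled, bar-coded well can be sketched as follows. This is a simplified illustration under stated assumptions, not the collaborators' actual technique: the barcode strings, perturbation labels, and single per-cell `feature` value are all hypothetical. Each cell in the mixed well is looked up by its barcode, and its measurement is filed under the perturbation it received.

```python
from collections import defaultdict


def demultiplex(cells, barcode_to_perturbation):
    """Group per-cell measurements by the perturbation each cell received.

    Cells whose barcode is not in the lookup table are skipped.
    """
    by_perturbation = defaultdict(list)
    for cell in cells:
        perturbation = barcode_to_perturbation.get(cell["barcode"])
        if perturbation is None:
            continue  # unknown or unreadable barcode
        by_perturbation[perturbation].append(cell["feature"])
    return dict(by_perturbation)


# Hypothetical lookup table and cells from one drug-treated well.
toy_barcodes = {
    "ACGT": "protein_1_normal",
    "TGCA": "protein_1_mutated",
}
toy_cells = [
    {"barcode": "ACGT", "feature": 0.9},
    {"barcode": "TGCA", "feature": 0.4},
    {"barcode": "ACGT", "feature": 1.1},
    {"barcode": "NNNN", "feature": 0.7},  # unreadable, dropped
]

print(demultiplex(toy_cells, toy_barcodes))
```

Once cells are sorted this way, the normal and mutated versions of each protein can be compared within the same well and under the same drug.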
We got internal funding to do a pilot with 80 drugs and are seeking funding to test about 6,800 drugs. If we do this well, it may be that about a year from now, the outcome of this experiment suggests actual drugs for these disorders that doctors could prescribe after reading our paper.
What excites you about the future of image-based profiling in biomedical research — and perhaps more broadly, about the future of AI in this realm?
We’re already at the point where implementing existing machine learning methods improves the drug discovery process. But I can see a future, beyond the current capabilities of image-based profiling, where you start gaining exponentially, in leaps and bounds.
All the machine learning algorithms we’re using were developed for social media to identify faces and for financial institutions to identify unusual transactions — that sort of thing. I think putting some more attention toward biological domains and cellular images specifically could really move things forward faster.