Phyllis Zhang (Chemistry & Physics and Computer Science ’23) and Ziyuan Zhao (Chemistry & Physics and Computer Science ’23) from the Hekstra Lab have been awarded the prestigious Hoopes Prize for undergraduate thesis research. The university-wide award celebrates undergraduates from all concentrations who produce outstanding work.
The Hekstra Lab investigates the internal motions of proteins as they carry out their functions. In recent years, machine learning has predicted static structures for many proteins, but these structures aren’t the whole story.
“Protein molecules consist of thousands of atoms and could, in principle, move in many different ways between different structural variants, or conformations,” MCB faculty member Doeke Hekstra explains. “In practice, the motions of proteins are usually small because their amino acid sequences have been ‘tamed’ by evolution: many weak interactions keep the protein structure mostly intact. On the one hand, this makes it possible to predict the structures of proteins from their sequences. On the other hand, these interactions are so subtle that it is hard to predict which motions are possible, even when these motions are important. This also makes it hard to design drug molecules or artificial enzymes that will work as intended.”
For her thesis research, Zhang built on previous crystallographic data. “During the pandemic, we learned about new crystallographic experiments in which thousands of crystals of a protein of interest are each soaked with a different ‘drug fragment,’” says Hekstra. “These fragments are small molecules that serve as a potential scaffold for drug design. The key to analyzing these experiments is to find which drug fragments would bind a protein, where they do so, and how binding changes the state of the protein.”
However, it can be difficult to compare crystallography data collected from different crystals. Factors such as temperature and sample handling create variation between crystals. Accounting for this variation is necessary to find the drug fragments that bind to the drug target of interest. Zhang applied her machine learning skills to correct for these artifacts, producing an algorithm named VALDO that finds the hidden drug fragments. “Coming up with the idea was the easy part,” Hekstra says. “Phyllis came in with a strong background in machine learning and efficiently and skillfully learned a lot of crystallographic theory, including about esoteric things like space groups, Fourier transforms, and structure refinement. Phyllis picked all of this up quickly.”
“Phyllis’s thesis work stood out due to its comprehensive approach in tackling the challenges of crystallographic drug fragment screens,” says applied physics (SEAS) graduate student Minhuan Li, who works in the Hekstra Lab and who mentored Zhang. “Her ability to integrate crystallographic theory and machine learning techniques in the development of the VALDO pipeline showcased a strong grasp of both domains.”
“The key findings of Phyllis’s work were remarkable,” Li continues. “Starting from a simple idea, the VALDO pipeline effectively accounted for crystal-to-crystal variation, improving the accuracy of detecting bound small molecules across thousands of datasets. Moreover, it addressed the limitations of existing software by utilizing reciprocal-space methods and considering the structural changes induced by drug fragments on the protein itself. As a result, VALDO outperformed the existing tool in many ways, particularly in cases involving structural alterations. These findings highlight the potential of VALDO to significantly enhance discovery in drug development.”
Li adds that working with Zhang was a pleasure. “Throughout our journey together, her unwavering dedication and insatiable intellectual curiosity have been evident,” Li says. “Her collaborative nature and vibrant personality have made her a valuable team member as well as a cherished friend.”
After graduation, Zhang is moving to New York to work full-time for Jane Street Capital’s quantitative research team. She adds that one of her favorite aspects of her Harvard experience was her time as a teaching fellow and that she hopes to eventually return to teaching. “While many here were incredibly supportive and encouraging, at times my ability to balance my teaching with coursework was met with skepticism and seeming contempt,” Zhang says. “But I believe that if you really love what you do, you can do it – at the very least, it’s worth fighting for! … Try to find wonderful, kind, and inspiring mentors and once you do, stick with them!”
Zhang also expressed gratitude toward her mentors, including Li and Hekstra. “Minhuan was the best mentor ever!” she says. “Not only was he a literal genius, he was also so incredibly sweet, creating a warm and supportive environment for me to learn, grow, and have fun! He broadened my academic knowledge – answering my many questions at any time of day over Slack and raving about new AI advances with me. Most of all, I was also so lucky to become friends with him.”
She adds, “Finally, thank you to my wonderful friend, Luca, who despite not knowing what the word ‘binding’ meant until one week before the thesis was due, was so excited for and so supportive of my research and I.”
In his thesis research, Zhao tackled questions about how protein conformations change. “Attempts to predict possible conformations that enable proteins to actually work face two major obstacles: the ability to predict relevant conformations, and providing explicit physical understanding of the relations between these conformations,” he says. “To address these obstacles, I describe a framework that combines ideas from coarse-graining and probabilistic modeling to generate conformations in an efficient and physically interpretable manner. I show that this approach works using a small model system, the alanine dipeptide. In particular, the coarse-grained representation of the peptide preserves its basic mechanical properties, and allows for identification of transition paths between conformations.”
Hekstra adds, “I discussed this problem extensively with Ziyuan. He spent a lot of time thinking independently about modeling of protein dynamics. I was getting nervous about how long it took him to decide on a senior thesis project! But, he came up with a clever idea. Usually, machine learning models simplify things by reducing the number of dimensions needed to describe their input data but don’t give us much understanding. On the other hand, we have simple ball-and-spring models that are easy to understand but often wrong. Ziyuan found a promising way to combine both.”
Zhao will be staying on at Harvard as a graduate student in the Systems, Synthetic, and Quantitative Biology (SSQB) program. “Reflecting on my time at Harvard, I’m glad that I experienced very different styles of research by working in various labs: Gunawardena lab, Zitnik lab, and then Hekstra lab,” he says. “There’s some tradeoff here – even though I don’t get to work deeply in one specific area, I have a better understanding of the broader fields and can gauge my interest in various possible research directions before going to graduate school.”
Hekstra adds, “I’m excited that he will remain at Harvard. Our conversations are always stimulating, and I hope to have many more. He’s hardly ever satisfied, but I hope that with graduation, he will find a few moments to enjoy what he has accomplished so far!”
Congratulations to Phyllis Zhang and Ziyuan Zhao!