David R. Liu
Department of Chemistry and Chemical Biology, Harvard University
Howard Hughes Medical Institute
Programming Human Biology Using Next-Generation Macromolecular and Small-Molecule Tools for the Understanding and Treatment of Disease
The past century of life sciences research has resulted in an emerging understanding of the ways in which DNA, RNA, proteins, and small molecules regulate information flow in living systems. This understanding has made increasingly realistic the possibility of not just reading but precisely manipulating biological information in humans by changing the structure of our genomes, altering gene expression patterns, engineering new conditions under which genes are expressed, or creating new circuits that interface exogenous synthetic molecules with gene expression programs. The purposeful manipulation of information flow in human cells and patients has the potential to: (i) reveal and validate causal relationships between genes, gene products, and human disease; and (ii) lead to breakthrough small-molecule or macromolecular therapeutics that address disease at the most fundamental level of our software, the human genome, as well as at the level of gene products.
While the vision of manipulating gene sequences, gene regulation, and gene products with molecular precision in mammalian cells and, eventually, in humans, has enormous potential to benefit society, a number of daunting challenges must be overcome before this vision can be fully realized. Perhaps the most significant of these challenges is how to create with a practical efficiency and success rate the many protein or nucleic acid machines that are needed to alter genomes, transcriptomes, or proteomes with a sufficiently high degree of selectivity and potency. To realize a vision in which arbitrary genes, transcripts, or proteins can be manipulated in mammalian cells requires fundamentally new approaches to generating, at an unprecedented scale, protein or nucleic acid machines with precise, tailor-made properties.
These approaches will likely exploit recently discovered natural proteins, coupled with state-of-the-art technologies such as phage-assisted continuous evolution (PACE) for rapidly evolving and engineering these proteins towards uses that advance the science of therapeutics. For example, the creation of robust platforms of programmable CRISPR (Cas9)-based or TALE-based genome editing and transcriptional regulation tools that are capable of turning "on", turning "off", or altering the nucleotide sequence of any combination of genes or regulatory sequences in the human genome represents an ambitious but well-defined goal that would have a major impact on illuminating disease biology and potentially treating genetic diseases.
In addition to evolving and engineering macromolecules, discovering and developing small molecules that can modulate the biological activities of targets validated using programmable genome engineering proteins is an essential activity to connect new biological insights to leads for therapeutic development. Some targets may only be addressable using macromolecular therapeutics by virtue of their binding energies and ability to catalyze transformations such as manipulating the covalent structure of genes and proteins. For other targets, however, small molecules will likely remain the most promising class of agents to modulate activities in therapeutically relevant contexts. Therefore, the development and application of new, highly efficient small-molecule discovery technologies such as the selection of DNA-encoded small-molecule libraries against many biological targets of interest in a single experiment will play crucial roles.
The activities needed to realize this vision can be classified into three phases:
Phase 1: Develop the tools. New methodologies and technologies to characterize, engineer, and evolve proteins will be developed and applied to transform natural components such as Cas9 or TALE domains into variants with the specificity, context independence, activity level, stability, cellular compatibility, and effector functions necessary to illuminate or address human disease. These effector functions will likely include DNA cleavage, transcriptional activation, transcriptional repression, epigenetic modification, and recombination to insert, delete, or replace alleles. While these activities have become a focus of several laboratories, including our own, many of the key developments and insights have either not yet been reported, or have only very recently been described. Importantly, TALE- and CRISPR-based systems are programmable using a simple code that relates target DNA sequences with TALE or CRISPR protein or RNA sequences. Because this programmability alone, while crucial, is insufficient to ensure that these tools will be robust or accessible enough to support Phase 2 and Phase 3 activities, methods to rapidly characterize and improve these tools must also be developed.
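The "simple code" mentioned above can be illustrated with a minimal sketch. It assumes the commonly cited correspondence between TALE repeat-variable diresidues (RVDs) and DNA bases (NI for A, HD for C, NN for G, NG for T) and the Streptococcus pyogenes Cas9 convention of a 20-nucleotide protospacer followed immediately by an NGG PAM. Neither detail is stated in this document, and the function names and example sequences are hypothetical. The sketch scans only the strand it is given and says nothing about specificity or activity, which is exactly the gap that characterization and evolution methods are meant to address.

```python
from typing import List, Tuple

# Assumption (not from this document): the commonly cited TALE RVD-to-base code.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def tale_rvds(target: str) -> List[str]:
    """Return one RVD per base of the target DNA site, read 5' to 3'."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def cas9_guides(dna: str, spacer_len: int = 20) -> List[Tuple[int, str]]:
    """Return (start index, guide RNA spacer) for every protospacer of length
    spacer_len that is directly followed by an NGG PAM on the given strand."""
    dna = dna.upper()
    hits = []
    # The last valid start leaves room for the spacer plus a 3-nt PAM.
    for start in range(len(dna) - spacer_len - 2):
        pam = dna[start + spacer_len : start + spacer_len + 3]
        if pam[1:] == "GG":  # "N" matches any base, so only check the two Gs
            spacer_rna = dna[start : start + spacer_len].replace("T", "U")
            hits.append((start, spacer_rna))
    return hits


if __name__ == "__main__":
    print(tale_rvds("TACG"))
    print(cas9_guides("ACGTACGTACGTACGTACGT" + "TGG"))
```

In both cases, retargeting the tool amounts to a lookup or a string match rather than a new protein design problem, which is what makes large sets of such tools plausible to generate.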
In addition to programmable DNA-binding proteins and protein-RNA complexes, other macromolecules capable of manipulating biological information flow in human cells including antibodies, proteases, sortases, recombinases, polymerases, and nucleases are also poised to play key roles in the understanding and treatment of disease. As is the case with TALE and CRISPR systems, a primary determinant of the likely impact of these proteins is our ability to engineer or evolve therapeutically relevant levels of activity, specificity, stability, and/or cell-state dependence. Therefore, general methods such as high-throughput specificity profiling or PACE that can efficiently characterize and improve diverse classes of proteins may prove especially valuable to Phase 1 efforts.
Finally, the development of rich collections of small molecules together with methods for their rapid screening or selection (in the case of DNA-encoded libraries) will power small-molecule discovery efforts that will yield new modulators of biological targets that play potential roles in human disease.
Phase 2: Discover the programs. Sets of evolved or engineered macromolecules or small molecules generated in Phase 1 will be used to discover and test causal relationships between genes and disease-associated pathways in mammalian cells. As Phase 1 methods become increasingly effective, and larger and larger sets of these tools become accessible, Phase 2 activities will transition from a hypothesis-testing mode (does gene A when upregulated and protein B when inhibited induce disease if gene C is mutated?) into a hypothesis-generating (“forward genetics”) mode with the goal of discovering sets of genes or proteins that when activated, repressed, or modified by macromolecules or small molecules alter the propensity of human cells to enter a diseased state.
Phase 3: Enable therapeutics. The knowledge from Phase 1 and Phase 2 will trigger new drug discovery efforts through the identification of new targets for small-molecule screening and development. In addition, the programming tools themselves, if sufficiently specific and active, have potential as macromolecular therapeutics. Phase 3 efforts therefore will aim to develop both small-molecule and macromolecular therapeutics that program human cells in the ways discovered in Phase 2. The macromolecular side of this phase will require characterizing and improving delivery (using new delivery technologies such as supercharged proteins), biodistribution, and immunogenicity, as well as conducting efficacy studies in cell culture and animal models of human disease.
Implementing this ambitious vision in a way that impacts society outside of the laboratory will foster—and require—a multidisciplinary, highly collaborative culture that seamlessly integrates chemists, synthetic biologists, macromolecule engineering and evolution experts, cell biologists, clinicians, bioinformaticists, industry experts, and entrepreneurs.
Specific examples of transformative applications include:
• Revealing the genetic dependencies of oncogenesis, infectious disease progression, and metabolic disorders in therapeutically relevant settings
• Programming the expression of sets of transcription factors that induce the differentiation or transdifferentiation of therapeutic cells (for example, pancreatic exocrine cells into beta cells in diabetics, white adipose tissue into brown fat in patients with metabolic disorders, or serotonergic neurons into dopaminergic neurons in Parkinson’s patients)
• Altering the structure of the genes in infected individuals to disrupt the life cycle of infectious disease agents (as a validated example, editing CCR5 in HIV patients)
• Programming cells containing cancer- or infectious disease-associated genetic changes to undergo apoptosis
• Implicating genes and gene combinations that grant resistance or sensitivity to known bioactive molecules for which there is no target known