The Big Picture: from genotype to phenotype

How does the information contained within the metagenome of an organism interacting with its environment specify behaviour and characteristics?
or
How does life happen, and can we simulate it?

Summary

A fundamental biological challenge is to understand how the information encoded by the metagenome of an organism is processed to produce the resulting behaviours and phenotypes. Simply: genes, made up of DNA, are transcribed into RNA and translated into proteins, which together form the vast majority of functional elements in an organism. Evolutionary processes ensure that these functional elements interact with their environment in a manner that is beneficial to the organism, using a variety of mostly organic, but also inorganic, molecules to catalyse reactions, recognise cellular signals, build cellular structures and pathways, and perform a host of other diverse biological functions.

Our research elucidates these processes by developing computational algorithms to model, annotate, and understand the relationships among and between the sequences, structures, functions, and interactions of proteins, DNA, RNA, and metabolites, from atoms and molecules to systems and proteomes, in the context of their environments. The goal is to develop a coherent picture of the mechanistic basis (interconnectedness) of molecular and organismal structure, function, networks, and evolution within a fundamental scientific framework.

The big picture, summed up in one sentence: we model interactomics at an atomic level by algorithmic approaches as well as by inferring homology (as we have done for the rice proteome and are actively doing for the proteomes from 1000 plants); a subset of this work is the CANDO compound-proteome 3D interaction matrix, two applications of which are drug discovery and nanotechnology.
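
To make the idea of a compound-proteome interaction matrix concrete, here is a minimal toy sketch in Python. It is an illustration only, not the CANDO implementation: the compound and protein names, the scores, and the use of cosine similarity to compare rows are all assumptions made for this example. Each row holds one compound's hypothetical interaction scores against a set of proteins, and compounds are ranked by how similar their rows are to a query compound's row.

```python
import math

# Hypothetical toy matrix: one row per compound, one column per protein.
# All names and scores below are invented for illustration only.
PROTEINS = ["protein_a", "protein_b", "protein_c"]
MATRIX = {
    "compound_1": [0.9, 0.1, 0.4],
    "compound_2": [0.8, 0.2, 0.5],
    "compound_3": [0.1, 0.9, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length score vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_similar(query, matrix):
    """Rank every other compound by the similarity of its row to the query's row."""
    query_row = matrix[query]
    scores = [
        (name, cosine(query_row, row))
        for name, row in matrix.items()
        if name != query
    ]
    # Most similar compound first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_similar("compound_1", MATRIX):
        print(f"{name}: {score:.3f}")
```

In this toy data, compound_2 ranks above compound_3 for the query compound_1 because their rows are nearly parallel. A real matrix would span an entire proteome, with scores derived from structural modelling rather than typed in by hand.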

Our goals are to develop novel methods and apply them in the following areas:

Structure: Predict atomic level three dimensional structures of biologically important molecules (such as proteins, DNA, RNA, and small molecules with emphasis on proteins) given their linear chemical structure (sequence).
Function: Predict function using the resulting models with the aid of existing experimental information.
Interaction: Predict interactions between and among these molecules, including biological substrates and inhibitors.
Systems: Predict the behaviour of pathways, systems, and whole cells by integrating the structure, function, and interaction information with the expression (copy number) of these molecules.
Evolution: Study the evolution of these biological molecules to ask and answer questions on the origin of, and the conditions necessary to nurture, life.
Design: Design novel biological entities not observed in nature using our predictive methodologies and verify the designed constructs at the bench.
Application: Apply the methodologies developed to study specific biological problems of interest in the areas of medicine and nanotechnology.
Infrastructure: Develop an infrastructure to publish the integrated information so that it is useful for biologists to pose and answer precise scientific questions about molecular, systems and organismal biology.

More detailed information on these methods is available as part of our ongoing research and in our list of publications.

Implications

We expect that research into these different bioinformatics disciplines will enable us to provide a comprehensive mapping and understanding of how organismal genomes specify phenotype in particular environments. This information will enable us to probe an organism's molecular, cellular, and physiological pathways with an exquisite degree of sensitivity and also help us understand and treat infectious, neoplastic, and inherited disease in an increasingly efficient and rational manner. The development of algorithms and tools to understand organismal genomes and proteomes will have practical utility for pharmacogenomics and genetic engineering, and will be of use to the general research community to pose and answer ever more precise biological questions. Specifically, one of the major applications of our research is precision shotgun drug discovery and repurposing using virtual cell multiscale models for disease diagnosis and intervention.

Understanding organismal biology from a genomic and proteomic perspective requires expertise in several scientific disciplines, including computing science, mathematics, physics, chemistry, and biology. The problems that need to be solved generally involve exploring large search spaces and finding objects of interest within those spaces, as well as managing the large amount of data produced and making predictions from analysis of the data. Thus our research is significant not only for answering biological questions, but also for solving problems of a similar nature in other scientific disciplines.

Long term goals

Our research involves integrating knowledge from the fields of computing science, mathematics, biology, physics, and chemistry to:

Achieve better understanding of protein structure, protein function, and molecular evolution.
Analyse genomes and study interactions of individual genes and their corresponding proteins to understand and model their roles in infectious and inherited disease.
Model complete cellular pathways and systems within an organism of interest using knowledge about the structure of proteins, protein expression, and protein-protein and protein-substrate interactions.
Develop therapeutics and molecular machines to improve human health and quality of life.
Devise simulations of all forms of life we can observe and others we can imagine.

Samudrala Computational Biology Research Group (CompBio) || Ram Samudrala || me@ram.org