The Big Picture: from genotype to phenotype

How does the information contained within the metagenome of an organism interacting with its environment specify behaviour and characteristics?
How does life happen, and can we simulate it?


A fundamental biological challenge is to understand how the information encoded by the metagenome of an organism is processed to produce the resulting behaviours and phenotypes. Simply put: genes, made up of DNA, are transcribed into RNA, which is then translated into proteins, and these proteins together form the vast majority of functional elements in an organism. Evolutionary processes ensure that these functional elements interact with their environment in a manner that is beneficial to the organism, using a variety of (mostly) organic and inorganic molecules to catalyse reactions, recognise cellular signals, build cellular structures and pathways, and perform a host of other diverse biological functions.
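
As a purely illustrative aside, the DNA to RNA to protein flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a method described on this page: the names `transcribe`, `translate`, and `CODON_TABLE` are hypothetical, the input is assumed to be a coding-strand DNA sequence read in frame from its first base, and the codon table is deliberately partial.

```python
# Toy sketch of transcription and translation (illustrative only).
# Assumption: the codon table below is a small subset of the standard
# genetic code, chosen for brevity; codons not listed are reported as "X".
CODON_TABLE = {
    "AUG": "M",                                       # methionine (start)
    "UUU": "F", "UUC": "F",                           # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",   # glycine
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",   # alanine
    "AAA": "K", "AAG": "K",                           # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",               # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE.get(codon, "X")  # "X" = not in toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCAAATAA")
protein = translate(mrna)
```

Here `mrna` is the transcript of the example gene and `protein` is its one-letter amino acid sequence. Real genomes add many layers this sketch ignores, such as regulation, splicing, and the environmental interactions discussed above.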

Our research elucidates these processes by developing computational algorithms to model, annotate, and understand the relationships among and between the sequences, structures, functions, and interactions of proteins, DNA, RNA, and metabolites, from atoms and molecules to systems and proteomes, in the context of their environments. The goal is to develop a coherent picture of the mechanistic basis (interconnectedness) of molecular and organismal structure, function, networks, and evolution within a fundamental scientific framework.

The big picture, summed up in one sentence: we model interactomics at an atomic level, both by algorithmic approaches and by inferring homology (as we have done for the rice proteome and are actively doing for the proteomes from 1000 plants); a subset of this work is the CANDO compound-proteome 3D interaction matrix, two applications of which are drug discovery and nanotechnology.

Our goals are to develop novel methods and apply them in the following areas:

More detailed information on these methods is available as part of our ongoing research and in our list of publications.


We expect that research into these different bioinformatics disciplines will enable us to provide a comprehensive mapping and understanding of how organismal genomes specify phenotype in particular environments. This information will enable us to probe an organism's molecular, cellular, and physiological pathways with an exquisite degree of sensitivity, and will also help us understand and treat infectious, neoplastic, and inherited disease in an increasingly efficient and rational manner. The development of algorithms and tools to understand organismal genomes and proteomes will have practical utility for pharmacogenomics and genetic engineering, and will help the general research community pose and answer ever more precise biological questions. Specifically, one of the major applications of our research is precision shotgun drug discovery and repurposing using virtual cell multiscale models for disease diagnosis and intervention.

Understanding organismal biology from a genomic and proteomic perspective requires expertise in several scientific disciplines, including computing science, mathematics, physics, chemistry, and biology. The problems that need to be solved generally involve exploring large search spaces and finding objects of interest within those spaces, as well as managing the large amounts of data produced and making predictions from analysis of those data. Thus our research is significant not only for answering biological questions but also for solving problems of a similar nature in other scientific disciplines.

Long term goals

Our research involves integrating knowledge from the fields of computing science, mathematics, biology, physics, and chemistry to:

Samudrala Computational Biology Research Group (CompBio)