The goal of the bioinformatics program in the Center for Genes, Environment and Health (CGEH) is to support and advance the science of bioinformatics and the computational analysis of complex genomic data. Easy access to state-of-the-art experimental technologies at CGEH gives individual researchers new opportunities to ask increasingly complicated biological questions of their data. The bioinformatics challenge is to establish strategies to manage and interpret the large volumes of data produced by these high-throughput instruments. The computational needs span from basic raw data analysis, to statistical testing and association, to annotating results using existing knowledge bases, to functional characterization of candidate genes, to useful visualization of the data at all scales.
Some examples of the computational analyses stemming from use of next-generation technologies include:
Genome-Wide Association Studies (GWAS)
Search among a large cohort of individuals for common genetic variants, such as single nucleotide polymorphisms (SNPs), that are associated with a trait, such as disease status. The computation involves reliable identification of genetic variation, tests for association with the trait, and characterization of the functional effect of the associated variants.
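As an illustration of the association-testing step, the sketch below runs a basic allelic chi-square test on a 2x2 table of allele counts in cases and controls. It is a minimal example under stated assumptions, not the Center's pipeline: the function name `allelic_association` and the counts are hypothetical, and a real study would also correct for multiple testing and population structure.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are alternate and reference alleles.
    Returns the chi-square statistic and its p-value.
    """
    observed = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele is more common in cases.
chi2, p_value = allelic_association(60, 40, 40, 60)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

In a genome-wide scan this test would be repeated for every variant, so the significance threshold has to be far stricter than the conventional 0.05.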
Linkage Analysis
Uses the fact that offspring inherit contiguous segments of a chromosome from their parents. Thus, if specific marker genes are found to be inherited along with a disease trait, the gene(s) responsible for the disease may be located near (i.e. linked to) the inherited marker gene on the same chromosome. The computation involves determining the genotypes of individuals, computing the probability of linkage between markers and genes under a given model of inheritance (recessive, dominant, etc.), and functional characterization of candidate disease genes.
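The probability of linkage is commonly summarized as a LOD score, the base-10 logarithm of the likelihood of the data at a given recombination fraction relative to the likelihood under no linkage (recombination fraction 0.5). The sketch below covers only the simplest case, in which each meiosis can be scored directly as recombinant or non-recombinant; the function names and the counts are hypothetical, and real pedigrees with unknown phase or incomplete penetrance need dedicated software.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known meioses.

    theta is the assumed recombination fraction, with 0 < theta <= 0.5.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def best_theta(recombinants, meioses):
    """Scan a grid of recombination fractions and return (theta, LOD) at the maximum."""
    grid = [i / 100 for i in range(1, 51)]
    return max(((t, lod_score(recombinants, meioses, t)) for t in grid),
               key=lambda pair: pair[1])


# Hypothetical family data: 1 recombinant observed in 10 informative meioses.
theta, lod = best_theta(1, 10)
print(f"max LOD = {lod:.2f} at theta = {theta:.2f}")
```

By convention, a LOD score of 3 or more is usually taken as evidence for linkage, so the toy data set here would not be sufficient on its own.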
Whole-Genome Sequencing (WGS)
Allows us to identify genetic variants (SNPs, insertions, deletions, copy number variants, structural variants) in individual samples, which is important for studying the genetic causes of disease. WGS can also determine the genome sequence of a newly sequenced organism, which has important implications for the study of rapidly evolving pathogenic organisms. The basic computational task is mapping the enormous volume of sequence reads to known genomes, or de novo assembly of the reads for unknown genomes. More advanced analyses detect and annotate genomic variation, and then compute its contribution to, and association with, the process of interest.
Gene Expression and Isoform Analysis
Investigates gene expression across samples. Next-generation technologies now offer the ability to study genes at the level of individual isoform expression. The computational challenge is to divide the expression level of a gene, measured over all isoforms, into the constituent expression levels of each isoform, compare these to known models of gene structure, possibly identify novel isoforms, and finally characterize the consequences of isoform-specific expression.
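One common way to split a gene's expression among its isoforms is an expectation-maximization (EM) procedure: reads that match several isoforms are shared out in proportion to the current abundance estimates, and the estimates are then updated from those shares. The sketch below is a deliberately simplified version that ignores isoform length and sequencing bias; the function name and the read counts are hypothetical.

```python
def estimate_isoform_fractions(read_compatibility, n_isoforms, n_iter=100):
    """Estimate the fraction of reads arising from each isoform with EM.

    read_compatibility: one collection of isoform indices per read, listing
    the isoforms that read could have come from.
    """
    fractions = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        for compatible in read_compatibility:
            weight = sum(fractions[i] for i in compatible)
            if weight == 0:
                continue  # read matches no isoform with nonzero abundance
            # E-step: share this read among its compatible isoforms.
            for i in compatible:
                counts[i] += fractions[i] / weight
        total = sum(counts)
        if total == 0:
            break
        # M-step: the new fractions are the normalized expected counts.
        fractions = [c / total for c in counts]
    return fractions


# Hypothetical gene with two isoforms: 6 reads unique to isoform 0,
# 2 reads unique to isoform 1, and 4 reads from a shared exon.
reads = [(0,)] * 6 + [(1,)] * 2 + [(0, 1)] * 4
print(estimate_isoform_fractions(reads, n_isoforms=2))
```

The shared reads end up assigned mostly to the isoform that the unique reads already favor, which is the behavior the decomposition described above relies on.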
MicroRNA Expression
Interrogates the control of gene expression by microRNAs, small RNA molecules that bind to the ends of gene transcripts and inhibit expression of those genes. The short length of microRNAs (~22 nucleotides) and their relatively low abundance make them difficult to measure, and the specificity of their control is not well validated. The computational task is to quantify expression and associate microRNAs with their target genes. Often these studies involve integration with gene expression data.
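A widely used heuristic for associating a microRNA with candidate target genes is seed matching: nucleotides 2 through 8 of the microRNA are expected to pair with a complementary site in the 3' end of the transcript. The sketch below applies that heuristic as an assumption; it is not a method named in the text above, the sequences are toy examples, and a seed match alone is weak evidence that usually needs conservation or expression data to support it.

```python
# Translation table for complementing RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(sequence):
    """Normalize a sequence to uppercase RNA letters."""
    return sequence.upper().replace("T", "U")


def seed_target_site(mirna):
    """Return the transcript site complementary to microRNA positions 2-8."""
    seed = to_rna(mirna)[1:8]
    return seed.translate(RNA_COMPLEMENT)[::-1]  # reverse complement


def find_seed_sites(mirna, utr):
    """Return 0-based start positions of seed-match sites in a 3' UTR sequence."""
    site = seed_target_site(mirna)
    utr = to_rna(utr)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Toy sequences for illustration only.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGG"
print(seed_target_site(mirna), find_seed_sites(mirna, utr))
```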
DNA Methylation
Investigates the presence of a methyl group (one carbon atom and three hydrogen atoms) attached to specific DNA bases in the genome. These chemical changes to the DNA affect how the expression of nearby genes is regulated. Differential patterns of methylation between phenotypic groups point to genes that may contribute to the differing phenotypes. The computational challenges here involve detecting methylation patterns, associating them with genes, characterizing those genes, and often integrating the data with gene or microRNA expression data.
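Detecting differential methylation can be sketched as comparing per-site methylation levels between two phenotypic groups. The example below assumes that each site has counts of methylated and unmethylated reads per sample and flags sites whose mean levels differ by at least a chosen threshold. The site names, counts, and the 0.2 threshold are hypothetical, and a real analysis would add a statistical test and a coverage filter.

```python
from statistics import mean


def methylation_level(methylated, unmethylated):
    """Fraction of reads at a site that support methylation."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("site has no read coverage")
    return methylated / total


def differential_sites(group_a, group_b, min_delta=0.2):
    """Return {site: mean level in A minus mean level in B} for sites
    whose absolute difference is at least min_delta.

    Each group maps a site name to a list of (methylated, unmethylated)
    read counts, one pair per sample.
    """
    hits = {}
    for site in sorted(group_a.keys() & group_b.keys()):
        level_a = mean(methylation_level(m, u) for m, u in group_a[site])
        level_b = mean(methylation_level(m, u) for m, u in group_b[site])
        delta = level_a - level_b
        if abs(delta) >= min_delta:
            hits[site] = delta
    return hits


# Hypothetical read counts for two sites in two samples per group.
cases = {"site1": [(8, 2), (9, 1)], "site2": [(5, 5), (6, 4)]}
controls = {"site1": [(2, 8), (3, 7)], "site2": [(5, 5), (5, 5)]}
print(differential_sites(cases, controls))
```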
Histone Modification
Examines chemical modifications to histones, the protein components of the chromosome around which DNA is wound. Changes in histone modification patterns affect how accessible DNA is to cellular machinery, which in turn controls gene expression. Combinations of multiple, distinct histone modifications define a code that the cell deciphers to determine gene expression. The basic computational need is to identify the areas of the genome affected by a particular modification, followed by a more complex integration of multiple modification profiles to determine the code at any particular genomic locus. The remaining challenge is to search for functional consequences by examining differences in, for example, the expression of nearby genes.
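Integrating several modification profiles to read the code at a locus can be sketched as an interval lookup: given the enriched regions (peaks) called for each modification, report which modifications overlap a position of interest. The mark names below are real histone modifications chosen for illustration, while the coordinates and the function name are hypothetical.

```python
def marks_at_locus(peaks_by_mark, chrom, position):
    """Return the set of modifications with a peak covering the position.

    peaks_by_mark maps a modification name to a list of
    (chromosome, start, end) tuples, with half-open intervals [start, end).
    """
    present = set()
    for mark, peaks in peaks_by_mark.items():
        for peak_chrom, start, end in peaks:
            if peak_chrom == chrom and start <= position < end:
                present.add(mark)
                break  # one overlapping peak is enough for this mark
    return frozenset(present)


# Hypothetical peak calls for three modifications.
peaks = {
    "H3K4me3": [("chr1", 100, 200)],
    "H3K27me3": [("chr1", 150, 300)],
    "H3K27ac": [("chr2", 100, 200)],
}
print(sorted(marks_at_locus(peaks, "chr1", 160)))
```

A linear scan is fine for a handful of peaks; genome-scale peak sets would normally use sorted intervals or an interval tree instead.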
Advances in both instrument technology and computational tools make it possible to investigate increasingly complex hypotheses at an unprecedented level of detail. Integrating analyses across multiple experiments aimed at distinct mechanisms of control offers the potential for a comprehensive understanding of the biological process of interest.
Learn more about the exciting research undertaken in the Center for Genes, Environment and Health.