Large-scale biology projects such as the sequencing of the human genome and gene expression surveys using RNA-seq, microarrays and other technologies have created a wealth of data for biologists. However, the challenge facing scientists is accessing and analyzing these data to extract useful information about the system being studied. This highly hands-on course focuses on using existing bioinformatic resources (mainly web-based programs and databases) to access these data and answer questions relevant to the average biologist.
Topics covered include multiple sequence alignments, phylogenetics, gene expression data analysis, and protein interaction networks, in two separate parts.
The first part, Bioinformatic Methods I (this one), deals with databases, Blast, multiple sequence alignments, phylogenetics, selection analysis and metagenomics.
The second part, Bioinformatic Methods II, covers motif searching, protein-protein interactions, structural bioinformatics, gene expression data analysis, and cis-element predictions.
This pair of courses is useful to any student considering graduate school in the biological sciences, as well as students considering molecular medicine. Both provide an overview of the many different bioinformatic tools currently available.
These courses are based on one taught at the University of Toronto to upper-level undergraduates who have some understanding of basic molecular biology. If you're not familiar with basic molecular biology, something like https://learn.saylor.org/course/bio101 might be helpful. No programming is required for this course.
Bioinformatic Methods I is updated regularly, and was completely revised for January 2020.
NCBI/Blast I
-In this module we'll be exploring the amazing resources available at NCBI, the National Center for Biotechnology Information, run by the National Library of Medicine in the USA. We'll also be doing a Blast search to find similar sequences in the enormous NR sequence database. We can use similar sequences to infer homology, which is the primary predictor of gene or protein function.
Blast II/Comparative Genomics
-In this module we'll continue exploring the incredible resources available at NCBI, the National Center for Biotechnology Information. We will be performing several different kinds of Blast searches: BlastP, PSI-Blast, and Translated Blast. We can use similar sequences identified by such methods to infer homology, which is the primary predictor of gene or protein function. We'll also be comparing parts of the genomes of a couple of different species to see how similar they are.
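No programming is needed for this module, but for readers curious about what a translated search involves, here is a minimal Python sketch. Translated Blast searches compare sequences at the protein level by translating nucleotide sequence in all six reading frames; the code below shows only that translation step, using the standard genetic code. The function names are illustrative and are not part of any Blast tool.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Translate a DNA sequence in three forward and three reverse frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

A real translated search then aligns each of these conceptual translations against protein sequences, which this sketch does not attempt.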
Multiple Sequence Alignments
-In this module we'll be doing multiple sequence alignments with Clustal (as implemented in MEGA), DiAlign, and MAFFT. Multiple sequence alignments can tell you where in a sequence the conserved and variable regions are, which is important for understanding the biology of the sequences under investigation. They also have practical applications, such as designing PCR primers that will amplify sequences from a number of different species.
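To make the idea of conserved and variable regions concrete, here is a small Python sketch that scores each column of an already-computed alignment. It assumes the alignment is a list of equal-length strings with "-" for gaps; the scoring rule (fraction of sequences sharing the most common residue) is one simple choice among many and is not what Clustal, DiAlign or MAFFT report.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, per column, the fraction of sequences sharing the commonest residue.

    Gaps ('-') never count as a residue, so a gapped column scores below 1.0.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_columns(alignment, threshold=1.0):
    """Return 0-based indices of columns whose score meets the threshold."""
    scores = column_conservation(alignment)
    return [i for i, score in enumerate(scores) if score >= threshold]


if __name__ == "__main__":
    toy_alignment = ["ATGCA", "ATGTA", "ATG-A"]  # made-up example data
    print(column_conservation(toy_alignment))
    print(conserved_columns(toy_alignment))
```

Runs of fully conserved columns are the kind of region one would look at when placing primers intended to work across species.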
Review: NCBI/Blast I, Blast II/Comparative Genomics, and Multiple Sequence Alignments
Phylogenetics
-In this module we'll be using the multiple sequence alignments we generated in the last lab to do some phylogenetic analyses with both neighbour-joining and maximum likelihood methods. The tree-like structure generated by such analyses tells us how closely sequences are related to one another, and suggests when in evolutionary time a speciation or gene duplication event occurred.
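Distance-based methods such as neighbour-joining start from a matrix of pairwise distances between the aligned sequences. The Python sketch below computes the simplest such distance, the proportion of differing sites (p-distance), ignoring any column where either sequence has a gap. The sequence names and data are made up for illustration, and the tree-building step itself is left out.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Build a symmetric nested-dict distance matrix from {name: sequence}."""
    names = list(aligned)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(aligned[a], aligned[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


if __name__ == "__main__":
    toy = {  # hypothetical aligned sequences
        "human": "ATGCATGC",
        "chimp": "ATGCATGA",
        "mouse": "ATGAAT-A",
    }
    for name, row in distance_matrix(toy).items():
        print(name, row)
```

Tree-building programs typically apply a substitution model to correct such raw distances before joining the closest pairs.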
Selection Analysis
-In this module we'll take a set of orthologous sequences from bacteria and use DataMonkey to analyze them for the presence of sites under positive, negative or neutral selection. Such an analysis can help us understand the biology of a set of protein coding sequences by identifying residues that might be important for biological function (those residues under negative selection) or those that might be involved in response to external influences, such as drugs, pathogens or other factors (residues under positive selection).
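Selection analyses of coding sequences rest on the distinction between synonymous changes (the encoded amino acid stays the same) and nonsynonymous changes (it differs). As a rough illustration only, the Python sketch below counts both kinds of codon difference between two aligned, in-frame coding sequences. This is not the method DataMonkey uses: proper analyses fit codon models across a phylogeny and normalise by the number of possible synonymous and nonsynonymous sites. The function name and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(cds_a, cds_b):
    """Count synonymous and nonsynonymous codon differences.

    Both inputs must be aligned, in frame, and a multiple of three long.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")

    counts = {"synonymous": 0, "nonsynonymous": 0}
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no information here
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


if __name__ == "__main__":
    # GCC -> GCT keeps alanine; AAA -> AGA changes lysine to arginine.
    print(classify_codon_changes("ATGGCCAAA", "ATGGCTAGA"))
```

An excess of nonsynonymous over synonymous change at a site, once properly normalised, is the kind of signal interpreted as positive selection, while the opposite pattern suggests negative selection.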
'Next Gen' Sequence Analysis (RNA-Seq) / Metagenomics
-In this module we'll explore some of the data that have been generated as a result of the rapid decrease in the cost of sequencing DNA. We'll be exploring a couple of RNA-Seq data sets that can tell us where any given gene is expressed, and also how that gene might be alternatively spliced. We'll also be looking at a couple of metagenome data sets that can tell us about the kinds of species (especially microbial species that might otherwise be hard to culture) that are in a given environmental niche.
Review: Phylogenetics, Selection Analysis, and 'Next Gen' Sequence Analysis (RNA-seq)/Metagenomics + Final Assignment