CS 495: Bioinformatics


Syllabus

You can download the syllabus here. The syllabus includes information on grading, course goals, and policies I will use in teaching this course.

Text

We will be using An Introduction to Bioinformatics Algorithms, by Jones and Pevzner.

FAQ

This is an FAQ I wrote about this class and the background required to take it.

Class Schedule

This is a tentative schedule for the class. I may change it depending on how the class progresses.

Week  Dates              Subject                     Notes
1     Jan. 18 - 22       Introduction
2     Jan. 25 - 29       Sequence Alignment
3     Feb. 1 - 5         Sequence Alignment
4     Feb. 8 - 12        Sequence Alignment
5     Feb. 15 - 19       Phylogeny
6     Feb. 22 - 26       Phylogeny                   Midterm 1
7     Mar. 1 - 5         Phylogeny
8     Mar. 8 - 12        Gene Expression
9     Mar. 15 - 19       Gene Expression
      Spring Break!
10    Mar. 29 - Apr. 2   Gene Expression             Midterm 2
11    Apr. 5 - 9         Markov Models
12    Apr. 12 - 16       Markov Models
13    Apr. 19 - 23       Markov Models
14    Apr. 26 - 30       Further Topics & Wrap-Up

Units

These are the topics we will cover in class.

Introduction

Text Reading: Ch. 1-3
Topics covered:
  • Introduction to molecular biology
  • Introduction to computer science

Sequence Alignment

Paper: Salyers et al. (2004)
Text Reading: 6.1 - 6.10
Assignment: Align sequences from several human gut bacteria, looking for antibiotic-resistance genes. (data)
Topics covered:
  • Dynamic programming methods for global & local alignments
  • Pairwise & multiple sequence alignment
  • Gaps, insertions/deletions
  • Linear & affine gap penalty functions
  • BLAST algorithm
  • Heuristics for multiple sequence alignment
  • Alignment statistics & substitution matrices (BLOSUM)
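
As a rough illustration of the dynamic programming idea behind global alignment, here is a minimal Java sketch of Needleman-Wunsch scoring with a linear gap penalty. The class name and the scoring values (MATCH, MISMATCH, GAP) are assumptions chosen for illustration; they are not taken from the text or from the course assignment, and the sketch computes only the score, not the alignment itself.

```java
/** Sketch of global alignment scoring (Needleman-Wunsch) with a linear gap penalty. */
class GlobalAlignment {
    // Illustrative scoring values (assumptions, not from the course or the text).
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    /** Returns the optimal global alignment score of sequences a and b. */
    static int score(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] dp = new int[n + 1][m + 1];

        // Aligning a prefix against the empty string costs one gap per character.
        for (int i = 1; i <= n; i++) dp[i][0] = i * GAP;
        for (int j = 1; j <= m; j++) dp[0][j] = j * GAP;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                int diag = dp[i - 1][j - 1] + sub; // align a[i-1] with b[j-1]
                int up = dp[i - 1][j] + GAP;       // gap in b
                int left = dp[i][j - 1] + GAP;     // gap in a
                dp[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return dp[n][m];
    }

    public static void main(String[] args) {
        System.out.println(score("ACGT", "AGT"));
    }
}
```

A local (Smith-Waterman) variant would clamp each cell at zero and take the maximum over the whole table; an affine gap penalty needs additional tables to track whether a gap is being opened or extended.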

Phylogeny

Paper: Lanciotti et al. (1999)
Text Reading: 10.5 - 10.11
Assignment: Place several related West Nile Virus sequences in a phylogenetic tree. (data) (WN genomes)
Topics covered:
  • Rooted vs. unrooted trees
  • Distance vs. parsimony vs. maximum likelihood
  • UPGMA
  • Neighbor Joining
  • Fitch's Algorithm
  • Searching for improved trees
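
To make the parsimony idea concrete, here is a minimal Java sketch of the bottom-up pass of Fitch's algorithm for a single site on a rooted binary tree. The Node class and method names are hypothetical, introduced only for this example. The sketch counts the minimum number of changes and does not reconstruct ancestral states.

```java
/** Sketch of the bottom-up pass of Fitch's algorithm for one site. */
class FitchParsimony {
    /** Hypothetical tree node: a leaf carries one base, an internal node has two children. */
    static final class Node {
        final Node left;
        final Node right;
        final int states; // bitmask over A=1, C=2, G=4, T=8 (leaves only)

        Node(char base) {
            int idx = "ACGT".indexOf(base);
            if (idx < 0) throw new IllegalArgumentException("Not a DNA base: " + base);
            this.left = null;
            this.right = null;
            this.states = 1 << idx;
        }

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
            this.states = 0;
        }

        boolean isLeaf() {
            return left == null;
        }
    }

    private int changes = 0;

    /** Returns the candidate state set for a node, counting a change when the child sets are disjoint. */
    private int candidateStates(Node node) {
        if (node.isLeaf()) return node.states;
        int l = candidateStates(node.left);
        int r = candidateStates(node.right);
        int common = l & r;
        if (common != 0) return common; // intersection is non-empty: no change needed
        changes++;                      // disjoint sets: one change, keep the union
        return l | r;
    }

    /** Minimum number of state changes needed to explain the leaves of this tree. */
    static int minChanges(Node root) {
        FitchParsimony fitch = new FitchParsimony();
        fitch.candidateStates(root);
        return fitch.changes;
    }

    public static void main(String[] args) {
        // Tree ((A,C),(A,G)): one change on each side of the root.
        Node root = new Node(new Node(new Node('A'), new Node('C')),
                             new Node(new Node('A'), new Node('G')));
        System.out.println(minChanges(root)); // prints 2
    }
}
```

For full sequences, the same pass is repeated for each aligned column and the per-site counts are summed.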

Gene Expression

Papers: Schena et al. (1995), Golub et al. (1999)
Text Reading: 10.1 - 10.4
Assignment: Separate leukemia patients into subgroups, using gene expression data. (training data) (test data)
Topics covered:
  • Microarrays
  • Distance metrics for gene expression data (GED)
  • Hierarchical clustering
  • k-means
  • Expectation maximization
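
Clustering methods such as hierarchical clustering and k-means depend on how distance between expression profiles is measured. The Java sketch below shows two common choices, Euclidean distance and Pearson correlation distance (1 - r). The class and method names are assumptions made for this example, and the profiles in main are made-up numbers, not data from the assignment.

```java
/** Sketch of two distance measures between gene expression profiles. */
class ExpressionDistance {
    private static void requireComparable(double[] x, double[] y) {
        if (x.length == 0 || x.length != y.length) {
            throw new IllegalArgumentException("Profiles must be non-empty and of equal length");
        }
    }

    private static double mean(double[] v) {
        double sum = 0.0;
        for (double value : v) sum += value;
        return sum / v.length;
    }

    /** Straight-line distance: sensitive to the absolute expression level. */
    static double euclidean(double[] x, double[] y) {
        requireComparable(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** 1 - Pearson r: 0 for profiles with the same shape, 2 for opposite shapes. */
    static double pearsonDistance(double[] x, double[] y) {
        requireComparable(x, y);
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0.0 || syy == 0.0) {
            throw new IllegalArgumentException("Correlation is undefined for a constant profile");
        }
        return 1.0 - sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] geneA = {1.0, 2.0, 3.0}; // made-up values
        double[] geneB = {2.0, 4.0, 6.0}; // same shape, twice the magnitude
        System.out.println(euclidean(geneA, geneB));
        System.out.println(pearsonDistance(geneA, geneB));
    }
}
```

The two measures can disagree: geneA and geneB above are far apart in Euclidean terms but have the same shape, so their correlation distance is zero.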

Markov Models

Paper: Bird (1987)
Text Reading: Chapter 11
Assignment: Find the CpG islands within a stretch of DNA. (HMM data) (sequence data)
Topics covered:
  • Markov assumption
  • Different orders of models
  • Basic Markov chains
  • Estimating model parameters
  • Hidden Markov Models
  • Forward, Viterbi, and Forward-Backward (Baum-Welch) Algorithms
  • Profile HMMs
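
As a small illustration of decoding with a hidden Markov model, here is a Java sketch of the Viterbi algorithm on a toy two-state model (background vs. island). Everything about the model is an assumption made for this example: the two-state structure is a simplification, and the start, transition, and emission probabilities are made-up values, not parameters from the assignment's HMM data. Log probabilities are used to avoid underflow on long sequences.

```java
/** Sketch of the Viterbi algorithm on a toy two-state HMM (0 = background, 1 = island). */
class ViterbiSketch {
    // All probabilities are made-up illustrative values, not estimates from real data.
    static final double[] START = {0.5, 0.5};
    static final double[][] TRANS = {{0.9, 0.1}, {0.1, 0.9}};
    // Emission order: A, C, G, T. The island state favors C and G.
    static final double[][] EMIT = {{0.3, 0.2, 0.2, 0.3}, {0.1, 0.4, 0.4, 0.1}};

    private static double logEmit(int state, char base) {
        int idx = "ACGT".indexOf(base);
        if (idx < 0) throw new IllegalArgumentException("Not a DNA base: " + base);
        return Math.log(EMIT[state][idx]);
    }

    /** Returns the most probable state path as a string of 'B' (background) and 'I' (island). */
    static String decode(String dna) {
        int n = dna.length();
        if (n == 0) return "";
        double[][] v = new double[n][2]; // v[t][s]: best log probability of a path ending in s at t
        int[][] back = new int[n][2];    // back[t][s]: previous state on that best path

        for (int s = 0; s < 2; s++) {
            v[0][s] = Math.log(START[s]) + logEmit(s, dna.charAt(0));
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < 2; s++) {
                int best = 0;
                double bestScore = v[t - 1][0] + Math.log(TRANS[0][s]);
                double alt = v[t - 1][1] + Math.log(TRANS[1][s]);
                if (alt > bestScore) {
                    best = 1;
                    bestScore = alt;
                }
                v[t][s] = bestScore + logEmit(s, dna.charAt(t));
                back[t][s] = best;
            }
        }

        // Trace back from the better final state.
        int state = v[n - 1][1] > v[n - 1][0] ? 1 : 0;
        char[] path = new char[n];
        for (int t = n - 1; t >= 0; t--) {
            path[t] = state == 1 ? 'I' : 'B';
            state = back[t][state];
        }
        return new String(path);
    }

    public static void main(String[] args) {
        System.out.println(decode("ATATATATCGCGCGCGCG"));
    }
}
```

The Forward algorithm has the same structure but sums over previous states instead of taking the maximum, which gives the total probability of the sequence rather than a single best path.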

Further Topics & Wrap-Up

Assignment: Independent project, if possible using the student's own data.
Topics covered (if time):
  • Fragment assembly
  • Protein structure
  • Time series analysis

Sample Code

These files illustrate some basic principles in Java.

These files put everything together, to illustrate the Central Dogma of Molecular Biology.
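
For readers without access to those files, the following is a separate minimal sketch (not the course's sample code) of the two steps of the Central Dogma in Java: transcription of a coding-strand DNA sequence into mRNA, and translation of mRNA into a protein using the standard genetic code. The class and method names are assumptions made for this example, and translation simply starts at the first base rather than searching for a start codon.

```java
/** Sketch of the Central Dogma: DNA -> mRNA -> protein (standard genetic code). */
class CentralDogma {
    // Standard genetic code, indexed by 16*first + 4*second + third with bases ordered U, C, A, G.
    private static final String BASES = "UCAG";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    /** Transcribes a coding-strand DNA sequence into mRNA by replacing T with U. */
    static String transcribe(String dna) {
        return dna.toUpperCase().replace('T', 'U');
    }

    private static int baseIndex(char base) {
        int idx = BASES.indexOf(base);
        if (idx < 0) throw new IllegalArgumentException("Not an RNA base: " + base);
        return idx;
    }

    /** Translates mRNA codon by codon from position 0, stopping at a stop codon or at the end. */
    static String translate(String mrna) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= mrna.length(); i += 3) {
            int index = 16 * baseIndex(mrna.charAt(i))
                      + 4 * baseIndex(mrna.charAt(i + 1))
                      + baseIndex(mrna.charAt(i + 2));
            char aminoAcid = AMINO_ACIDS.charAt(index);
            if (aminoAcid == '*') break; // stop codon
            protein.append(aminoAcid);
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        String mrna = transcribe("ATGGCCTAA");
        System.out.println(mrna);            // prints AUGGCCUAA
        System.out.println(translate(mrna)); // prints MA
    }
}
```

Packing the codon table into a 64-character string keeps the example short; a Map from codon to amino acid would be more readable in a larger program.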

Lectures

Past lecture notes and recordings are here. (Password required.)

Class-Only Information

Click here if you're in the class, for more useful information.

Other Resources

Any other links that might be useful to the class will be here.