Stage 1. Sequence Acquisition
- For phylogeny, DNA can be more informative than protein
- The protein-coding portion of DNA has synonymous and nonsynonymous substitutions
- Thus, some DNA changes do not have corresponding protein changes
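The distinction between synonymous and nonsynonymous substitutions can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the names `CODON_TABLE` and `classify_substitution` are hypothetical and chosen only for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as 'synonymous' or 'nonsynonymous'."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "nonsynonymous"


# GAA and GAG both encode Glu, so the DNA change is invisible at the protein level.
print(classify_substitution("GAA", "GAG"))
# GAA encodes Glu but GAT encodes Asp, so the protein changes as well.
print(classify_substitution("GAA", "GAT"))
```

A protein alignment would record only the second change, which is one reason the DNA sequences can carry more phylogenetic signal.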
Stage 2. Multiple Sequence Alignment (MSA)
- The fundamental basis of a phylogenetic tree
- If the MSA includes misaligned or nonhomologous sequences, it will still be possible to generate a tree, but the tree will not be biologically meaningful
- Confirm that all sequences are homologous
- Adjust gap creation and extension penalties as needed to optimize the alignment
- Restrict phylogenetic analysis to regions of the MSA for which data are available for all taxa
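Restricting the analysis to regions with data for all taxa can be sketched as a simple column filter. This is a minimal illustration, assuming the alignment is held as a dictionary of equal-length strings and that `-` and `?` mark missing data; the function name `complete_columns` is hypothetical.

```python
def complete_columns(alignment, gap_chars="-?"):
    """Keep only the alignment columns in which every taxon has a residue."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column.
    kept = [col for col in zip(*seqs) if not any(c in gap_chars for c in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


msa = {"taxonA": "ATG-CA", "taxonB": "ATGTCA", "taxonC": "A-GTCA"}
print(complete_columns(msa))
```

In this toy alignment the second and fourth columns contain a gap in at least one taxon, so only four columns remain for tree building.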
Stage 3. Models of DNA and Amino Acid Substitution
DNA models
- Jukes and Cantor model
- Kimura model
- Tamura and Nei model
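As an illustration of what a substitution model does, the Jukes and Cantor model corrects the observed proportion of differing sites $p$ for multiple hits at the same site: $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$. The sketch below is a minimal version that assumes two aligned DNA sequences and skips any site that is not A, C, G, or T in both; the function names are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("sequences are too divergent for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAT")
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT")
print(p, d)  # the corrected distance is slightly larger than p
```

The correction grows with divergence because more of the real substitutions are hidden by later changes at the same site.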
Protein models
Concept summary
- Substitution models
- DNA substitution mutations
- Transition: Interchanges of two-ring purines (A, G) or of one-ring pyrimidines (C, T) / Involve bases of similar shape
- Transversion: Interchanges of purine for pyrimidine bases / Involve exchange of one-ring & two-ring structures
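The two classes of substitution can be told apart by checking whether both bases belong to the same ring class. The Kimura two-parameter model listed above treats them separately: with $P$ the proportion of transitions and $Q$ the proportion of transversions, $d = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)$. The following is a minimal sketch, assuming aligned DNA sequences and hypothetical function names; it makes no attempt to handle saturated sequences.

```python
import math

PURINES = {"A", "G"}      # two-ring bases
PYRIMIDINES = {"C", "T"}  # one-ring bases


def substitution_type(a, b):
    """Classify a pair of bases as identical, transition, or transversion."""
    if a not in "ACGT" or b not in "ACGT":
        raise ValueError("bases must be one of A, C, G, T")
    if a == b:
        return "identical"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def kimura_2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    kinds = [substitution_type(a, b) for a, b in pairs]
    P = kinds.count("transition") / len(kinds)
    Q = kinds.count("transversion") / len(kinds)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


print(substitution_type("A", "G"))  # purine to purine
print(substitution_type("A", "C"))  # purine to pyrimidine
print(kimura_2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```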
Stage 4. Tree-Building Methods
- Tree
- Phylogenetic tree construction
- Input: a set of n species, a method for computing a score for a labeled tree
- Output: labeled tree with the optimal score
- Distance-based methods
- UPGMA (Unweighted Pair Group Method with Arithmetic mean)
- Input: Distance matrix showing distances between sequences
- Idea: Combine the two closest sequences into a cluster, then iterate until a single cluster remains
- Distance $d_{ij}$ between two clusters $C_i$ and $C_j$ is defined as the average distance between pairs of sequences, one from each cluster: $d_{ij} = \frac{1}{|C_i||C_j|} \sum_{p \in C_i,\, q \in C_j} d_{pq}$
- $|C_i|$, $|C_j|$: the number of sequences in each cluster
- Neighbor-joining
- Both UPGMA and neighbor-joining involve a distance metric (e.g. the number of amino acid changes between the sequences, or a distance score)
- Character-based methods
- Maximum parsimony
- Maximum likelihood
- Bayesian
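The UPGMA procedure described under the distance-based methods can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes pairwise distances are given as a dictionary keyed by `frozenset` pairs of taxon names, recomputes the average distance between clusters directly from the original pairwise distances, and returns only the tree topology as nested tuples, without branch lengths. The names `average_distance` and `upgma` are hypothetical.

```python
from itertools import combinations


def average_distance(members_i, members_j, dist):
    """Average pairwise distance between the sequences of two clusters."""
    total = sum(dist[frozenset((p, q))] for p in members_i for q in members_j)
    return total / (len(members_i) * len(members_j))


def upgma(names, dist):
    """Return the UPGMA topology as nested tuples of taxon names."""
    # Each cluster is (subtree, list of member sequences); leaves start alone.
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(
                clusters[ij[0]][1], clusters[ij[1]][1], dist
            ),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        # Replace the two merged clusters with their union and iterate.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


distances = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("A", "D")): 10,
    frozenset(("B", "C")): 6, frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
print(upgma(["A", "B", "C", "D"], distances))
```

In this toy matrix A and B are joined first, C joins the (A, B) cluster at an average distance of 6, and D is added last.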