This web page was produced as an assignment for Genetics 564, an undergraduate course at UW-Madison.
What is Gene Phylogeny?
Gene phylogeny uses gene sequences to infer how closely related different species are and how they have diverged over time. A gene phylogenetic tree shows the evolutionary relationships among species based on the similarities and differences in their gene sequences. Many methods can be used to measure the similarity between sequences; only a few are described here.
Generating a phylogenetic tree
Percent Identity
The 'percent identity' method measures the percentage of positions at which two sequences are identical. After two sequences have been aligned, the percentage of aligned positions that hold the same base in both sequences is calculated. Once the similarity scores are calculated, a phylogenetic tree can be generated using several different methods. 'Neighbor Joining' and 'Average Distance' are the methods discussed here [1] [2].
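The calculation can be illustrated with a minimal Python sketch. The function name percent_identity is hypothetical, and the choice to skip columns containing a gap ('-') is an assumption; tools differ in which positions they count in the denominator.

```python
def percent_identity(seq_a, seq_b):
    """Percent of compared columns that hold the same character.

    Both inputs must already be aligned (same length, '-' for gaps).
    Assumption: columns with a gap in either sequence are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumption, see lead-in)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0  # nothing to compare
    return 100.0 * matches / compared


print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Here the two sequences agree at seven of eight compared positions. Counting gapped columns as mismatches instead would lower the score for alignments with many gaps.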
BLOSUM Matrix
The 'BLOSUM matrix' method is used to calculate the similarity between aligned sequences. BLOSUM matrices score pairs of amino acids, so this method applies to translated (protein) sequences rather than to basepairs. After two sequences have been aligned, the BLOSUM matrix is used to assign a score to each pair of aligned residues, based on how often that pair is observed in alignments of related proteins compared with how often it would be expected to occur by random chance. The BLOSUM62 matrix used to assign scores can be found here. The scores at all the sites are summed to give a total score that defines the relatedness of the two sequences. A higher score indicates sequences that are more closely related. Once the similarity scores are calculated, a phylogenetic tree can be generated using several different methods. 'Neighbor Joining' and 'Average Distance' are the methods discussed here [2] [3].
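The summing step can be sketched in Python as follows. Only a small excerpt of BLOSUM62 (five amino acids) is included to keep the example short, and the values should be checked against the full matrix before use. The names BLOSUM62_EXCERPT, pair_score and alignment_score are hypothetical, and skipping gapped columns without a penalty is an assumption.

```python
# Excerpt of BLOSUM62 for the amino acids A, I, L, V and W only.
# Each unordered pair is stored once; pair_score() handles the symmetry.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4, ("W", "W"): 11,
    ("A", "I"): -1, ("A", "L"): -1, ("A", "V"): 0, ("A", "W"): -3,
    ("I", "L"): 2, ("I", "V"): 3, ("I", "W"): -3,
    ("L", "V"): 1, ("L", "W"): -2,
    ("V", "W"): -3,
}


def pair_score(a, b, matrix):
    """Look up the score for one aligned pair of residues."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # the matrix is symmetric


def alignment_score(seq_a, seq_b, matrix=BLOSUM62_EXCERPT):
    """Sum the matrix scores over all aligned, ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gaps are skipped, no gap penalty applied
        total += pair_score(a, b, matrix)
    return total


print(alignment_score("AILW", "AVLW"))
```

A conservative substitution such as I to V still adds a positive score, while an unlikely one such as A to W subtracts from the total.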
Neighbor Joining
The 'neighbor joining' method uses the similarity scores calculated by 'BLOSUM' or 'percent identity' to determine the relatedness between species. The amount of change in each lineage since two species diverged is estimated to determine the branch lengths of the phylogenetic tree. Because the two lineages are not assumed to have changed by the same amount, this creates a tree with varying branch lengths [2] [4].
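A minimal Python sketch of the standard neighbor joining procedure is given below. It assumes the similarity scores have already been converted to pairwise distances (for example, 100 minus percent identity); that conversion, the function names, the internal node labels (u1, u2, ...) and the five-taxon example distances are all illustrative assumptions, not data from this project.

```python
def make_distances(names, rows):
    """Turn a square distance table into a dict keyed by unordered pairs."""
    return {
        frozenset((names[i], names[j])): rows[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


def neighbor_joining(names, dist):
    """Return a list of (node, joined_to, branch_length) edges."""
    d = dict(dist)
    nodes = list(names)
    edges = []
    next_id = 1
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all other nodes.
        r = {x: sum(d[frozenset((x, k))] for k in nodes if k != x) for x in nodes}
        # Choose the pair that minimises Q = (n - 2) * d(x, y) - r(x) - r(y).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                x, y = nodes[i], nodes[j]
                q = (n - 2) * d[frozenset((x, y))] - r[x] - r[y]
                if best is None or q < best[0]:
                    best = (q, x, y)
        _, x, y = best
        dxy = d[frozenset((x, y))]
        # The two branch lengths may differ: no equal-rate assumption.
        len_x = dxy / 2 + (r[x] - r[y]) / (2 * (n - 2))
        len_y = dxy - len_x
        u = f"u{next_id}"
        next_id += 1
        edges.append((x, u, len_x))
        edges.append((y, u, len_y))
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (x, y):
                d[frozenset((u, k))] = (
                    d[frozenset((x, k))] + d[frozenset((y, k))] - dxy
                ) / 2
        nodes = [k for k in nodes if k not in (x, y)] + [u]
    # Connect the last two nodes directly.
    x, y = nodes
    edges.append((x, y, d[frozenset((x, y))]))
    return edges


example_names = ["a", "b", "c", "d", "e"]
example_rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
example_dist = make_distances(example_names, example_rows)
for edge in neighbor_joining(example_names, example_dist):
    print(edge)
```

In this example the first pair joined is a and b, and they receive different branch lengths even though they share a parent node.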
Average Distance
The 'average distance' method uses the similarity scores calculated by 'BLOSUM' or 'percent identity' to determine the most closely related species. The most closely related species are joined with equal branch lengths to create a node. The 'average distance' method assumes that both species have diverged equally from their common ancestor [4].
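The joining step can be sketched in Python using average-linkage clustering (commonly called UPGMA), which is assumed here to be what 'average distance' refers to. As with neighbor joining, the sketch assumes the scores have been converted to pairwise distances; the function name, node labels (n1, n2, ...) and the three-species distances are hypothetical.

```python
def average_distance_tree(names, dist):
    """Return a list of (node, parent, branch_length) edges.

    dist maps frozenset({x, y}) to the distance between x and y.
    """
    d = dict(dist)
    size = {x: 1 for x in names}        # number of species under each node
    height = {x: 0.0 for x in names}    # distance from each node to its tips
    clusters = list(names)
    edges = []
    next_id = 1
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                x, y = clusters[i], clusters[j]
                dxy = d[frozenset((x, y))]
                if best is None or dxy < best[0]:
                    best = (dxy, x, y)
        dxy, x, y = best
        u = f"n{next_id}"
        next_id += 1
        # The new node sits halfway between the pair, so both sides
        # are the same total distance from it (equal divergence).
        height[u] = dxy / 2
        edges.append((x, u, height[u] - height[x]))
        edges.append((y, u, height[u] - height[y]))
        size[u] = size[x] + size[y]
        # Distance to each other cluster is the size-weighted average.
        for k in clusters:
            if k not in (x, y):
                d[frozenset((u, k))] = (
                    size[x] * d[frozenset((x, k))] + size[y] * d[frozenset((y, k))]
                ) / size[u]
        clusters = [k for k in clusters if k not in (x, y)] + [u]
    return edges


example = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
for edge in average_distance_tree(["A", "B", "C"], example):
    print(edge)
```

Species A and B are joined first with equal branch lengths, in contrast to the unequal lengths that neighbor joining can assign.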
Discussion
Because of the length of the SEMA5A gene, the gene sequences for this project could not be aligned. For the phylogenetic tree results for the SEMA5A protein, see the Protein Phylogeny page.
References
[1] Fassler, J. (2011, July 14). BLAST Help. Retrieved March 7, 2015, from http://www.ncbi.nlm.nih.gov/books/NBK62051/
[2] Professor Ahna Skop's Website: http://genetics564.weebly.com/homology--phylogeny.html
[3] Eddy, S. (2004). Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology, 22, 1035-1036.
[4] Barton, N. (2007, January 1). Phylogenetic Reconstruction. Retrieved March 7, 2015, from http://evolution-textbook.org/content/free/contents/ch27.html#ch27-4-2