Phyloinformatics Summer of Code 2013/Summaries
For the seventh summer in a row, NESCent offered a number of internships aimed at introducing students to open-source software development. This summer, six interns from the 2013 Google Summer of Code™ program and one from the GNOME Outreach Program for Women each worked remotely on an evoinformatics project of their own choosing, under the guidance of an experienced mentor. NESCent’s 2013 Summer of Code students were Chanda Phelan, Paul Frandsen, Monica-Andreea Dragan, Imran Fanaswala, Joshua Lynch, Zheng Ruan and Yanbo Ye. Their projects ranged from machine learning for ecological genomics to phylogenetics in Biopython to visualization of evolutionary trees.
Student: Imran Aslam
Mentor(s): Andrew Rambaut, Marc Suchard, Stephane Guindon
Project: Extend PhyML to use the BEAGLE library
PhyML is phylogenetic software for inferring evolutionary relationships from genome sequences. These relationships are represented as trees; PhyML builds candidate trees and eventually presents the "best" tree to the user. To determine the "best" tree, PhyML needs to evaluate thousands of tree configurations, which is computationally demanding. This project's goal was to use the BEAGLE library to do the heavy lifting of the computations in PhyML.
Technically speaking, PhyML explores tree space by varying branch lengths, topology, substitution rate parameters, etc. Each step of this exploration requires a likelihood calculation. Under the Markovian assumption of independent events and time-reversibility, both (1) partial likelihood computations and (2) P-matrix updates (i.e. matrix multiplication) can be parallelized and are thus suitable for computation across cores. This project implemented these optimizations.
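To make the two parallelizable steps concrete, here is a minimal sketch in plain Python. It is not PhyML or BEAGLE code: it assumes the Jukes-Cantor model (chosen because its P-matrix has a closed form) and a tree with just two tips, and all function names are hypothetical. Each site's likelihood is computed without reference to any other site, which is why the per-site loop can be spread across cores.

```python
import math

STATES = "ACGT"

def jc69_p_matrix(t):
    """P-matrix under Jukes-Cantor for branch length t (substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e   # probability of ending in the starting state
    diff = 0.25 - 0.25 * e   # probability of ending in one particular other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def tip_partial(base):
    """Partial likelihood vector at a tip: 1 for the observed base, 0 elsewhere."""
    return [1.0 if s == base else 0.0 for s in STATES]

def combine(left, p_left, right, p_right):
    """Partial likelihoods at a parent node for one site, given its two children."""
    out = []
    for i in range(4):
        from_left = sum(p_left[i][j] * left[j] for j in range(4))
        from_right = sum(p_right[i][j] * right[j] for j in range(4))
        out.append(from_left * from_right)
    return out

def site_likelihood(base_a, base_b, t_a, t_b):
    """Likelihood of one alignment column on a two-tip tree (uniform root frequencies)."""
    root = combine(tip_partial(base_a), jc69_p_matrix(t_a),
                   tip_partial(base_b), jc69_p_matrix(t_b))
    return sum(0.25 * x for x in root)

def log_likelihood(seq_a, seq_b, t_a, t_b):
    """Sites are independent, so this sum could be split across cores."""
    return sum(math.log(site_likelihood(a, b, t_a, t_b))
               for a, b in zip(seq_a, seq_b))
```

Calling log_likelihood("ACGT", "ACGA", 0.1, 0.2) scores a four-site alignment. In a real search the P-matrices are recomputed whenever a branch length changes and the partial likelihoods are recomputed for every node above that branch.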
Student: Monica Dragan
Mentors: Anurag Priyam and Yannick Wurm
Project: Identifying problems with gene predictions
Genome sequencing has become very inexpensive. However, accurate gene prediction remains hard to achieve with existing technology. GeneValidator is a tool that identifies problems with gene predictions, based on their similarity to data from public databases. We apply a set of validation tests that pinpoint the problems in each prediction, providing evidence for how a gene should be curated or whether a predicted gene should be excluded from further analyses.
Student: Zheng Ruan
Mentors: Eric Talevich and Peter Cock
Project: Codon Alignment and Analysis in Biopython
A codon alignment is an alignment of nucleotide sequences in which the trinucleotides correspond directly to amino acids in the translated protein product. Codon alignments are widely used in calculating evolutionary parameters and in a variety of neutrality tests. During this project, I implemented a new Biopython module, CodonAlign, to construct codon alignments and run a number of analyses on them. The implementation can handle mismatches and frameshift events. Codon alignments can also be written in a variety of MSA formats for further analysis. The code is available at CodonAlign.
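The core idea of building a codon alignment can be sketched as follows: each residue in a gapped protein sequence consumes the next three nucleotides of the coding sequence, and each protein gap becomes a three-character gap. This is a minimal illustration in plain Python, not the CodonAlign module; the function name thread_codons is hypothetical, and unlike the module this sketch assumes an exact, frameshift-free correspondence between protein and nucleotides.

```python
def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Map an ungapped coding sequence onto a gapped protein sequence.

    Returns the nucleotide sequence with a triple gap inserted wherever
    the aligned protein has a gap.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(nucleotides) != 3 * residues:
        raise ValueError("nucleotide length must be 3x the number of residues")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)            # protein gap -> codon-sized gap
        else:
            codons.append(nucleotides[pos:pos + 3])
            pos += 3                          # advance one codon
    return "".join(codons)
```

For example, thread_codons("M-K", "ATGAAA") gives "ATG---AAA". Applying the function to every row of a protein alignment yields nucleotide rows of equal length whose columns line up codon by codon.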
Student: Yanbo Ye
Mentor(s): Mark Holder, Jeet Sukumaran, Eric Talevich
Project: Phylogenetics in Biopython: Filling in the gaps
Bio.Phylo is the phylogenetics package of Biopython, an open-source bioinformatics library. This project implemented several tree construction and consensus tree algorithms that were not available in previous versions, including UPGMA, NJ, MP, strict consensus, majority-rule consensus and Adam consensus, along with branch support and bootstrap algorithms. These algorithms are now available in two Bio.Phylo modules, TreeConstruction and Consensus.
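As an illustration of one of the listed algorithms, here is a minimal UPGMA sketch in plain Python. It is not the Bio.Phylo implementation: the function name, the list-of-lists distance matrix and the nested-tuple output are all assumptions made for this example, and branch lengths are omitted for brevity.

```python
from itertools import combinations

def upgma(names, matrix):
    """Cluster taxa by UPGMA.

    names  -- list of taxon labels
    matrix -- full symmetric distance matrix (list of lists), same order as names
    Returns the topology as nested tuples, e.g. ("C", ("A", "B")).
    """
    # Active clusters: id -> (subtree, number of tips it contains)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances, keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters
        i, j = min(dist, key=dist.get)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean of the two old distances
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Forget distances that involve the two merged clusters
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree
```

The size-weighted update is what distinguishes UPGMA from its unweighted variant: every original taxon contributes equally to the averaged distance, no matter how the clusters were assembled.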
Student: Paul Frandsen
Mentor(s): Rob Lanfear, Brett Calcott
Project: Extend PartitionFinder to automatically partition DNA and protein alignments
PartitionFinder uses heuristic algorithms to combine similar blocks of data, which are assigned a priori by the user, into a model partitioning scheme. This project implemented a new algorithm that uses site rates or site likelihoods to automatically partition the sites of DNA and protein alignments into a best-fit partitioning scheme.
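The summary does not say how sites are grouped, so the following is only a sketch of one plausible approach: one-dimensional k-means clustering of per-site rates, written in plain Python. The function name, the fixed number of groups k and the evenly spaced starting centres are all assumptions for illustration, not PartitionFinder's actual method. Choosing a best-fit scheme would also require comparing candidate groupings by model fit, which is not shown here.

```python
def partition_by_rate(site_rates, k=2, iterations=20):
    """Group alignment columns by evolutionary rate using 1-D k-means.

    site_rates -- one rate estimate per alignment column
    Returns a list of groups, each a list of column indices.
    """
    lo, hi = min(site_rates), max(site_rates)
    # Spread the initial centres evenly across the observed range of rates
    centres = [lo + (hi - lo) * (c + 0.5) / k for c in range(k)]
    groups = []
    for _ in range(iterations):
        # Assignment step: each site joins the nearest centre
        groups = [[] for _ in range(k)]
        for site, rate in enumerate(site_rates):
            nearest = min(range(k), key=lambda c: abs(rate - centres[c]))
            groups[nearest].append(site)
        # Update step: move each centre to the mean rate of its sites
        new_centres = [
            sum(site_rates[s] for s in g) / len(g) if g else centres[c]
            for c, g in enumerate(groups)
        ]
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return [g for g in groups if g]
```

With rates [0.1, 0.2, 2.0, 2.2, 0.15, 1.9] and k=2, the slow sites (columns 0, 1 and 4) and the fast sites (columns 2, 3 and 5) end up in separate groups, without the user having defined any blocks in advance.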
Student: Joshua Lynch
Mentors: Kathryn Iverson, Abu Zaher Md. Faridee
Project: Implementing Machine Learning Algorithms for Classification and Feature Selection in mothur
mothur is a bioinformatics tool developed for and by the microbial ecology community. This project implemented a new classification and feature selection command for mothur using the support vector machine algorithm.
Student: Chanda Phelan
Mentors: Gabriel Harp, Stephen A. Smith
Project: Phylet: Open Tree of Life Graph Visualization and Navigation