Phyloinformatics Summer of Code 2012/Summaries
Student software developers showcase their work
For the sixth summer in a row, NESCent offered a number of internships aimed at introducing students to open-source software development. This summer, seven interns from the 2012 Google Summer of Code™ program and one independent intern each worked remotely on an evoinformatics project of their own choosing, under the guidance of an experienced mentor. NESCent’s 2012 Summer of Code students included Abu Zaher Md. Faridee, Daniel Gates, Pulkit Goyal, Elliott Hauser, Islam Ismailov, Michael Landis, Anne Ménard and Justs Zarins. Their projects ranged from machine learning for ecological genomics to optimizing R code for Approximate Bayesian Computation to tools for metadata extraction for phylogenetics.
Student: Justs Zarins
Mentor(s): Andrew Rambaut, Karen Cranston, Benjamin Redelings
Project: MASTodon: Summary and visualization of phylogenetic tree sets
MASTodon is a Java application that looks for common subtrees in large sets of phylogenetic trees. It provides a user-friendly graphical interface, automatic pruning algorithms (with the option of implementing more), and powerful manual pruning options.
Student: Pulkit Goyal
Mentor(s): Matt Yoder, Christopher Baron
Project: jMatrixBrowse: GMaps-like Matrix Browsing
jMatrixBrowse is a jQuery plugin for browsing large matrices by dragging them around (like Google Maps). It renders a browsable canvas that can be panned and zoomed easily. The data is retrieved from an API that provides information about the matrix to be rendered and the values in its cells. An online demo is available.
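The kind of API described above can be sketched in a few lines. This is only an illustration, written in Python rather than JavaScript, and it is not jMatrixBrowse's actual API: the function names `matrix_info` and `get_window` and the shape of the returned dictionaries are hypothetical. The idea is that the client first asks for the matrix dimensions and then requests only the cells inside the window currently visible on the canvas.

```python
def matrix_info(matrix):
    """Return the overall dimensions of the matrix (hypothetical response shape)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    return {"rows": n_rows, "cols": n_cols}


def get_window(matrix, row, col, height, width):
    """Return the cells of a height x width window whose top-left corner is
    (row, col), clipped to the matrix bounds (hypothetical response shape)."""
    info = matrix_info(matrix)
    # Clamp the top-left corner so it lies inside (or at the edge of) the matrix.
    r0 = max(0, min(row, info["rows"]))
    c0 = max(0, min(col, info["cols"]))
    # Clip the bottom-right corner to the matrix bounds.
    r1 = min(info["rows"], r0 + height)
    c1 = min(info["cols"], c0 + width)
    cells = [matrix[r][c0:c1] for r in range(r0, r1)]
    return {"row": r0, "col": c0, "cells": cells}
```

Panning then amounts to calling `get_window` again with a shifted `row` and `col`, so only the newly visible cells need to be fetched.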
Student: Michael Landis
Mentor(s): Trevor Bedford, Andrew Rambaut
Project: Phylowood.js: Browser-based Interactive Animations of Ancestral Dispersal and Diversity Patterns
Student: Elliott Hauser
Mentor(s): Hilmar Lapp
Project: NeXML to MIAPA Mapping & ISAtab Transformation
Miapa-etl is a command line tool for converting TreeBASE NeXML files into ISAtab format, compatible with the open source ISAtools data curation suite. This project was a first step towards interoperability between the phylogenetics community and the larger life sciences community, which uses ISAtools to store, curate, and distribute scientific data and metadata. It was also one of the first attempts at mapping computational 'in silico' experiments, such as those recorded in NeXML, onto a wet lab 'in vivo' data model like the one in ISAtab. Full documentation can be found in the project Google folder.
Student: Anne Ménard
Mentor(s): Yann Ponty, Jim Procter
Project: Towards a fully RNA-aware alignment editor
JalView is multiple sequence alignment software that initially focused on amino acid sequences. The project consisted of extending JalView’s support for RNA structural features, with a special emphasis on tertiary interactions, which can now be loaded from specific XML files (RNAML format), annotated directly from 3D experimental models (PDB files), and displayed graphically using an integrated version of the Varna software. This joint view of sequence evolution, 3D architecture and tertiary modules should allow for integrated studies of molecular function.
Student: Islam Ismailov
Mentor(s): James Degnan, Tanja Stadler
Project: Ranked gene tree topologies probability computation
Ranked Gene Trees is an implementation of a polynomial-time algorithm for computing the probabilities of ranked gene tree topologies given a species tree. Ranked gene tree probabilities could be used to infer species trees, although inferring species trees is beyond the scope of the project. The idea is to consider ranked gene tree topologies, in which we distinguish the relative order of the node times on a gene tree, but not the real-valued branch lengths.
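To make the notion of a ranked topology concrete, the sketch below counts how many distinct rankings (orderings of the internal node times) a single unranked topology admits. It does not compute any probabilities and is not the project's algorithm. It assumes that no two internal nodes share the same time, and it represents trees as nested Python tuples with strings as leaves, a representation chosen here purely for illustration. The count uses the standard formula for orderings of a rooted tree: k! divided by the product, over internal nodes, of the number of internal nodes in the subtree rooted there, where k is the total number of internal nodes.

```python
from math import factorial


def count_rankings(tree):
    """Count the rankings of a rooted tree topology.

    `tree` is a nested tuple; leaves are strings. A ranking orders the
    internal nodes in time so that every node is older than its descendants.
    """
    subtree_sizes = []

    def internal_nodes(node):
        # Leaves contribute no internal nodes.
        if isinstance(node, str):
            return 0
        size = 1 + sum(internal_nodes(child) for child in node)
        subtree_sizes.append(size)
        return size

    total = internal_nodes(tree)
    denominator = 1
    for size in subtree_sizes:
        denominator *= size
    return factorial(total) // denominator


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(count_rankings(balanced))     # 2: either cherry can coalesce first
print(count_rankings(caterpillar))  # 1: the node order is forced
```

The balanced four-taxon tree therefore corresponds to two ranked topologies, while the caterpillar tree corresponds to only one.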
Student: Abu Zaher Md. Faridee
Mentor(s): Kathryn Iverson, Sarah Westcott
Project: Apply Machine Learning Algorithm(s) to Ecology Data
Random Forest for Mothur is our first attempt to analyse microbial ecology data with machine learning algorithms, so that microbial ecologists can identify bacterial populations that are associated with differences between health and disease. So far we have implemented the random forest algorithm, and we are now working on improving its performance.
Student: Daniel Gates
Mentor(s): Derrick Zwickl, Barb Banbury, Hilmar Lapp, Brian O'Meara
Project: Optimizing R code for ABC
TreEvo is an R-based program that uses approximate Bayesian computation to make inferences about continuous character trait evolution. Bayesian simulations in R can be very computationally intensive, so our project identified and fixed inefficiencies in the original TreEvo code while adding parallelization and checkpointing. Together, these improvements should make the program less computationally demanding, so that users need to sacrifice less power to complete simulations in a reasonable amount of time.
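For readers unfamiliar with the approach, the following is a minimal sketch of an approximate Bayesian rejection sampler, written in Python rather than R. It is a toy example and not TreEvo's code or model: the function name `abc_rejection`, the uniform prior on a standard deviation, the zero-mean normal simulator, and the choice of summary statistic are all assumptions made for illustration. Parameter values are drawn from the prior, data are simulated under each value, and a value is kept only when the simulated summary statistic falls within a tolerance of the observed one.

```python
import random
import statistics


def abc_rejection(observed, n_draws, tolerance, prior_max=5.0, seed=0):
    """Toy ABC rejection sampler for the standard deviation of
    zero-mean normal data (illustrative assumptions throughout)."""
    rng = random.Random(seed)
    # Summary statistic of the observed data.
    target = statistics.pstdev(observed)
    accepted = []
    for _ in range(n_draws):
        # Draw a candidate parameter from a uniform prior on [0, prior_max].
        sigma = rng.uniform(0.0, prior_max)
        # Simulate a data set of the same size under the candidate.
        simulated = [rng.gauss(0.0, sigma) for _ in observed]
        # Keep the candidate if the simulated summary is close enough.
        if abs(statistics.pstdev(simulated) - target) <= tolerance:
            accepted.append(sigma)
    return accepted
```

Every draw is independent of the others, which is why this kind of simulation lends itself to parallelization, and the list of accepted values can be saved periodically as a simple form of checkpointing.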