R Hackathon 1/Proposal
NESCent is sponsoring a hackathon focused on integration of comparative methods within the R statistical package in order to promote interoperability, the support of data exchange standards, and greater usability of tools and methods in evolutionary bioinformatics.
Comparative phylogenetic methods provide a rich and powerful way to understand the evolution of organismal traits. A wide variety of statistical methods and tools have been developed to rigorously test hypotheses about rates and modes of trait evolution, trait covariation, correlation of traits with ecological and environmental factors, host/parasite co-evolution, etc. From a scientist's perspective, analyzing such data is often inherently statistical in nature, and therefore the R statistical analysis package has emerged as a popular platform for implementation of these methods. However, many of the comparative methods are not easily embedded in a powerful statistical analysis platform such as R, and phylogenetic inference tools such as PAUP* and MrBayes are not easily integrated into analysis scripts written in R.
As an example of the potential for overcoming these problems, the Bioconductor software transformed the field of microarray analysis by putting rigorous data transformations, analysis methods, and statistical tests in the hands of bench researchers, and by enabling the rapid validation and dissemination of new methods. Bioconductor is a collection of freely available open source packages for R that interoperate on the basis of a common data model. The widespread use of Bioconductor has fostered a culture in which newly developed methods in microarray analysis are now typically published as Bioconductor packages, allowing them to be rapidly adopted into existing R-based analysis pipelines.
The many individual comparative analysis software development efforts in R and the growing number of users present an opportunity to address the common challenges of data exchange, interoperability, and usability. NESCent seeks to take advantage of this opportunity by sponsoring a hackathon, or codefest: an event at which programmers who otherwise do not have the opportunity to interact on a routine basis meet to collaboratively develop working code that furthers the goals of the larger open development community to which they belong. The hackathon will bring together different people and groups who have started to develop comparative phylogenetic methods in the R platform, or who would like to integrate their methods into, or interface a tool with, the R platform.
The following broad objectives have been defined. Participants will help to identify specific objectives and coding targets prior to and during the event.
- Ensuring compatibility and data flow between R packages, for example by agreeing on and implementing one or more common data models.
- Improving support for the input and output of data exchange standards and formats.
- Development of approaches to enable code re-usability and extensibility.
- Identifying new functionality that is well-suited to native integration into the R platform, and prototyping the integration of select targets.
- Identifying external software well-suited to interface with the R platform, and prototyping the interface for select targets. For instance, providing support for analysis of large data sets through an interface to existing C/C++ programs, or to broader capabilities through an interface to general-purpose phylogenetic analysis packages such as Mesquite.
- Development of end-user documentation.
- Identification of future research areas and the initiation of collaborations between different groups.
- Training of future developers and broadening the diversity of the software development community.
The hackathon will concentrate on writing code. All code and documentation will be made available immediately and freely to the community under an open-source (OSI-approved) license.
The hackathon will be organized around subgroups of 2-5 participants each. Each subgroup will be tasked with tackling a set of problems that advance one or more of the objectives of the hackathon.
Subgroups will emerge during the planning phase. At present we envision subgroups broadly in the following areas:
- Standard and interoperable data structures ("Data Structures")
- Data exchange formats and standards ("IO")
- Interfacing with external programs ("Interfacing")
- Visualization of data structures and analysis results ("Visualization")