R Hackathon 1
NESCent sponsored a hackathon (Dec. 10-14, 2007) focused on the integration of comparative methods within the R statistical package, with the aim of promoting interoperability, support for data exchange standards, and greater usability of tools and methods in evolutionary bioinformatics.
A citable poster describing the hackathon and its outcomes is available at Nature Precedings. Also, the history of the hackathon and the various results that arose from it over time have recently been written up in a preprint submitted to a conference: Cranston, Karen, Todd Vision, Brian O’Meara, and Hilmar Lapp. 2013. “A Grassroots Approach to Software Sustainability.” doi:10.6084/m9.figshare.790739.
Comparative phylogenetic methods provide a rich and powerful way to understand the evolution of organismal traits. A wide variety of statistical methods and tools have been developed to rigorously test hypotheses about rates and modes of trait evolution, trait covariation, correlation of traits with ecological and environmental factors, host/parasite co-evolution, etc. The R statistical analysis package has emerged as a popular platform for implementation of these methods.
The many individual software development efforts in R and the growing number of users presented an opportunity to address the common challenges of data exchange, interoperability, and usability. NESCent took advantage of this opportunity by sponsoring a hackathon, or codefest: an event at which programmers who otherwise do not have the opportunity to interact on a routine basis meet to collaboratively develop working code that furthers the goals of the larger open development community to which they belong. The hackathon brought together people and groups who had started to develop comparative phylogenetic methods in the R platform, or who wanted to integrate their methods into, or interface a tool with, the R platform.
The following broad objectives were defined. Participants then helped to identify specific objectives and coding targets prior to and during the event.
- Ensuring compatibility and data flow between R packages, for example by agreeing on and implementing one or more common data models.
- Improving support for the input and output of data exchange standards and formats.
- Developing approaches to enable code re-usability and extensibility.
- Identifying new functionality that is well-suited to native integration into the R platform, and prototyping the integration of select targets.
- Identifying external software well-suited to interface with the R platform, and prototyping the interface for select targets. For instance, providing support for analysis of large data sets through an interface to existing C/C++ programs, or to broader capabilities through an interface to general-purpose phylogenetic analysis packages such as Mesquite.
- Developing end-user documentation.
- Identifying future research areas and initiating collaborations between different groups. See overview of packages, including future development.
- Training of future developers and broadening the diversity of the software development community.
The hackathon concentrated on writing code. All code and documentation are made available immediately and freely to the community under an open-source (OSI-approved) license.
From the list of objectives, the following seven subgroups (SG) were formed.
- Diversification SG
- Divergence Time Estimation SG
- Documentation SG
- Trait Evolution SG
- Class Design SG
- Mesquite-R communication SG
- Input-Output SG
Participation was arranged by invitation and by self-nomination followed by review.
- The list of participants is on-line.
- We would also like to thank the following people, who contributed in the planning stages of the hackathon: Steve Kembel, Hilmar Lapp, Brian O'Meara, Samantha Price, Todd Vision, Amy Zanne.
- Wolfgang Huber, a core developer of Bioconductor, agreed to serve in an advisory role during the planning phase of the event.
- The hackathon took place from Dec 10-14, 2007 at NESCent.
Detailed planning steps are outlined and documented separately. In particular, the following activities took place:
- writing the proposal
- communicating through teleconferences
- discussing possible subgroups
- discussing the priorities of end-user and programmer goals
- discussing requirements and candidates for data standards
- collecting information for an overview of existing R packages
- collecting information on the current data representations of those packages
- collecting information on how to write R vignettes
- collecting information on S4 and other documentation