PhyloSoC:BioPerl integration of the NeXML exchange standard and Bio::Phylo toolkit/Benchmark Results

From NESCent Informatics Wiki

Benchmarking

The two main methods of each module (_parse and write) were benchmarked using the Benchmark.pm module included with Perl. Several data files were used, and the results were normalized against the most time-consuming method (i.e., Bio::Nexml::write is at the 100% mark and the rest are expressed relative to it).
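As a rough illustration of this procedure, the sketch below times two subs with Benchmark's timeit, prints each result with timestr, and expresses each CPU time as a percentage of the slowest one. The two workloads and the normalize helper are hypothetical stand-ins, not the script used to produce these results. Normalizing on CPU seconds and rounding to the nearest percent are also assumptions, since the page does not say how the percentages were derived.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical helper: express each timing as a percentage of the slowest one.
sub normalize {
    my (%seconds) = @_;
    my ($max) = sort { $b <=> $a } values %seconds;
    # Guard against an empty or all-zero set of timings.
    return map { ($_ => 0) } keys %seconds unless defined $max && $max > 0;
    return map { ($_ => sprintf('%.0f', 100 * $seconds{$_} / $max)) } keys %seconds;
}

# Stand-in workloads (assumptions); the real tests called _parse() and write().
my %workload = (
    parse => sub { my @fields = split /,/, join(',', 1 .. 5000); return scalar @fields },
    write => sub { my $text = join ',', 1 .. 1000; return length $text },
);

my %cpu;
for my $name (sort keys %workload) {
    my $t = timeit(50, $workload{$name});   # n=50, matching the runs reported here
    print "$name: ", timestr($t), "\n";
    $cpu{$name} = $t->cpu_p;                # user + system CPU seconds
}

my %pct = normalize(%cpu);
printf "%-6s Normalized : %s%%\n", $_, $pct{$_} for sort keys %pct;
```

Keeping the normalization in its own function means the same timings can be re-expressed against a different baseline without rerunning the benchmark.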

Representative results

Below are the results for two NeXML documents taken from the nexml.org website. Characters.xml contains two relevant matrices (matrices of type DNA, RNA, or Protein) and was used for the Bio::Nexml, Bio::AlignIO, and Bio::SeqIO tests. Trees.xml contains three trees and was used for the Bio::TreeIO and Bio::Nexml tests. The two files hold roughly the same amount of data.

Bio::TreeIO::nexml

_parse()

5 wallclock secs ( 4.67 usr + 0.04 sys = 4.71 CPU) @ 10.62/s (n=50) Normalized : 22%

write()

2 wallclock secs ( 2.10 usr + 0.01 sys = 2.11 CPU) @ 23.70/s (n=50) Normalized : 8%

Bio::AlignIO::nexml

_parse()

14 wallclock secs (13.28 usr + 0.07 sys = 13.35 CPU) @ 3.75/s (n=50) Normalized : 60%

write()

14 wallclock secs ( 9.25 usr + 0.08 sys = 9.33 CPU) @ 5.36/s (n=50) Normalized : 60%

Bio::SeqIO::nexml

_parse()

13 wallclock secs (13.13 usr + 0.08 sys = 13.21 CPU) @ 3.79/s (n=50) Normalized : 56%

write()

4 wallclock secs ( 3.90 usr + 0.03 sys = 3.93 CPU) @ 12.72/s (n=50) Normalized : 17%

Bio::Nexml

_parse()

11 wallclock secs (10.63 usr + 0.08 sys = 10.71 CPU) @ 4.67/s (n=50) Normalized : 47%

write()

23 wallclock secs (22.76 usr + 0.22 sys = 22.98 CPU) @ 2.18/s (n=50) Normalized : 100%

Analysis

Overall, the methods were reasonably efficient but on the slow side, with rates between 2.18 and 23.70 iterations per second. The benchmarks point to both Bio::AlignIO methods, Bio::Nexml::write, and Bio::SeqIO::_parse as the main candidates for optimization: each is normalized at 56% or higher.
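One way to check whether an optimization pays off is to run the old and new versions side by side with Benchmark's cmpthese. The sketch below is only an illustration: write_concat and write_join are hypothetical stand-ins for a "before" and "after" serialization step, not code from the modules benchmarked here, and the row data is made up.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up input: 500 tab-separated rows standing in for matrix data.
my @rows = map { join "\t", "taxon$_", 'ACGT' x 10 } 1 .. 500;

# Hypothetical "before" version: build the output by repeated concatenation.
sub write_concat {
    my $out = '';
    $out .= "$_\n" for @rows;
    return $out;
}

# Hypothetical "after" version: build the output with a single join.
sub write_join {
    return join '', map { "$_\n" } @rows;
}

# An optimization is only worth timing if the output is unchanged.
die "outputs differ\n" unless write_concat() eq write_join();

# Style 'none' suppresses cmpthese's own printing; the chart comes back as rows.
my $chart = cmpthese(2000,
    { concat => \&write_concat, join => \&write_join },
    'none');
printf "%-8s %s\n", $_->[0], $_->[1] for @$chart;
```

Checking that both versions return the same output before timing them keeps a faster but incorrect rewrite from looking like an improvement.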

  • Nice work, Chase --maj 12:22, 20 July 2009 (EDT)