21 5 / 2012

Going by the adage of don’t over-optimize — now is not the time to obsess over performance, so I’ll file this away for later.

One usage question I have is whether it is more desirable to have an object that neatly organizes all of the information about a single site, or an iterable that contains limited information about many sites. I imagine in the long run it would be great to be able to output a arbitrary combination of, for lack of better terms, depth and breadth. Given limited RAM and processing speed, smaller depth (i.e. less information about each site) would allow greater breadth (i.e. more sites). Would a reasonable short-term goal be to implement the highest depth/lowest breadth case as well as the lowest depth/highest breadth case? What’s the minimum depth? Just the ref/alt sequence?

Another consideration for limiting RAM usage are any pre-computed values (if I’m not mistaken, allele frequency can be calculated given number of samples with data and total number of samples).