Manually setting project for virtualenv
With virtualenv-burrito/virtualenvwrapper, the project associated with a virtualenv is specified in the file ~/.virtualenvs/VENVNAME/.project
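A minimal sketch of setting the association by hand, assuming the .project file holds nothing but the absolute path of the project directory. The function name set_project and the venv_home parameter are hypothetical, made up for illustration; they are not part of virtualenvwrapper.

```python
from pathlib import Path


def set_project(venv_name, project_dir, venv_home="~/.virtualenvs"):
    """Point a virtualenv at a project by writing its .project file.

    Assumption: the file contains just the absolute project path.
    set_project and venv_home are illustrative names only.
    """
    venv_dir = Path(venv_home).expanduser() / venv_name
    if not venv_dir.is_dir():
        raise FileNotFoundError(f"no virtualenv at {venv_dir}")
    # Store an absolute path so the association works from any directory.
    project_path = Path(project_dir).expanduser().resolve()
    project_file = venv_dir / ".project"
    project_file.write_text(f"{project_path}\n")
    return project_file
```

Calling set_project("VENVNAME", "~/code/myproject") would then (re)write ~/.virtualenvs/VENVNAME/.project; the project path here is only an example.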
Programmer, meet glob.
Copied from a Python library’s documentation (details changed to protect the innocent): “As an example of how useful Python libraries and syntax techniques can be, I will show how to create a list of all ‘ext’ files in the current directory using the os library:
    import os
    the_files = [file for file in os.listdir(os.getcwd()) if file[-4:]=='.ext']
”
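For comparison, here is roughly what the same task looks like with the standard-library glob module that the title alludes to. This is a sketch, not a drop-in replacement: the helper name list_ext_files is hypothetical, and unlike the slicing version, a "*" pattern skips hidden files whose names start with a dot.

```python
import glob
import os


def list_ext_files(directory=".", ext=".ext"):
    """Return the sorted names of entries in `directory` ending in `ext`."""
    # glob.escape keeps special characters in the directory name literal.
    pattern = os.path.join(glob.escape(directory), "*" + ext)
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


# For the current directory the whole thing collapses to one call:
the_files = glob.glob("*.ext")
```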
And the summer ends
The coordinate mapper, with updated documentation, is now located on this branch, awaiting the merging of Peter’s f_loc4 branch. I’ve written an entry on coordinate mapping for the Cookbook. Additionally, at Peter’s suggestion, I’ve written a clarification of strand as it relates to transcription and translation. It’s available here. It’s been a great...
The summer is winding to a close. I’ve spent this week busy with orientation events and meetings for my upcoming PhD program. I hope to have time to continue to contribute to Biopython in my spare time, and ideally I would like to use and expand Biopython as a portion of my research. I have been considering how to handle gene strandedness. As long as I’m correctly interpreting the...
Coordinate Mapping update
Following extensive discussion on the dev list of the pros and cons of configuration classes/modules, I have refactored my coordinate mapper to keep configuration as isolated as possible. All mapping functions use base 0 internally. Transformation to and from 1-based coords is handled by custom MapPosition objects. (They are currently separate from the Seq* positions but could probably subclass...
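The idea of keeping base 0 internally and converting only at the edges can be sketched as below. This is not the MapPosition class from the mapper; the Position class and its method names are hypothetical and exist only to illustrate the convention.

```python
class Position:
    """Hypothetical position wrapper; always stores base 0 internally."""

    def __init__(self, zero_based):
        if zero_based < 0:
            raise ValueError("position must be non-negative")
        self._pos = int(zero_based)

    @classmethod
    def from_one_based(cls, one_based):
        # 1-based input is shifted once, on the way in.
        return cls(one_based - 1)

    def to_zero_based(self):
        return self._pos

    def to_one_based(self):
        # ...and shifted back once, on the way out.
        return self._pos + 1
```

Because every conversion happens at construction or output, arithmetic between positions never has to ask which base a number is in.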
I have been expanding the coordinate mapper Reece posted to the dev list a couple of years ago. It’s currently living as a gist, although it has grown rather precipitously (over 300 lines each of code and testing). I may have gotten slightly carried away with the concept of “test-driven development,” but in this case, extensive testing is extremely critical. Note that as of...
I previously proposed the implementation of a method for PyVCF that would quickly scan the entire file and provide useful summary statistics. The idea is shamelessly copied from Brad’s GFF parser; for GFF, this method is helpful because the annotations on a sequence can vary widely. However, I no longer think this would be useful for VCF: Most importantly, the VCF headers generally...
Week 7: Variations
The highlight of this week was getting strep throat. I also gave myself a quick lesson in setuptools (working on an unrelated project), which is helpful for understanding how various projects do their building and testing. The fragmentation of standards is certainly not one of the things I love about Python (e.g. getopt/optparse/argparse; distutils/setuptools/distribute/distutils2). Reece...
Week 6: vertical slicing
This week, I wrote a script for PyVCF that can filter a file by sample as it’s being parsed. It’s currently named vcf_sample_filter.py. It’s designed to be functional from the command line, the Python interpreter, or as a module. I’ve written basic unit tests for the command line form. To minimize impact on the existing parser, I’m adding the necessary methods to...
Weeks 4 and 5
Given the choice of attending an aerospace conference or spending three days writing Python in the lobby of a hotel, I chose the latter, which turned out to be rather productive. I finished a prototype writer that reverses the VCF to SQL trip, discovering more of the peculiarities of the meta-format along the way. However, it seems that my SQL project may have been relegated to being a...
Weeks 2 and 3: More SQL
James raised some concerns about the difficulty of representing the VCF “metaformat” in SQL. I’ve taken these into consideration and am forging ahead. So far, some of the types of data fit more neatly into SQL than into a VCF row. I have redesigned my SQL schema with a two-pronged approach to tackle the flexibility of VCF: For the site, alt, and genotype tables, there are...
Weeks 0 and 1: SQL
I started implementing storage of VCF data in SeqRecord and SeqFeature. I digressed, spending a few days experimenting with overloading __getattr__() in lieu of manually writing properties. Then it occurred to me that if, as Reece pointed out, a variant doesn’t contain the actual sequence but a reference to the sequence, the advantages to using SeqRecord are minimal or possibly negative. ...
I’m going to use GitHub issues/milestones instead of trying to use some sort of calendar-based timeline. I have a very limited ability to predict how long things will take, so a date-linked timeline seemed like it would end up being “Lenna shuffles due dates around for half an hour every week.” Plus, the auto-closing and referencing of issues is really nice. Links: Variant...
A couple quick things: Met a grad student today who has written his own Python PDB parser - he only found out about Biopython two months ago. He has a collection of variations in alphabets used by proprietary software and I am trying to convince him to contribute it to Biopython. His lab does primarily NMR and molecular dynamics. Started a GitHub branch for the variant project. I did subclass...
New Vim setup
Vim on Mac has some very odd quirks. I’ve seen a version refuse to understand backspace, so that the only methods of deleting text are using x and d commands. Mouse support is limited at best, and explore windows aren’t resizable. I tried MacVim but got some odd errors (“error detected while processing function”). This post mentions the same error. I used the...
My most exciting news this week is that I’ve understood and immediately come to love the @property decorator. Forcing users to use an accessor without even knowing it? Genius! I’ve also been reading everything I can find on the internet about whether traditional OOP has a place within Python (namely interfaces and inheritance). Overall, I’m more confused than before I started,...
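A minimal sketch of what makes @property appealing: callers read and assign a plain attribute, while the class quietly routes both through accessor methods. The Variant class and its pos attribute are hypothetical, chosen only for illustration.

```python
class Variant:
    """Hypothetical class, used only to illustrate @property."""

    def __init__(self, pos):
        self.pos = pos  # assignment goes through the setter below

    @property
    def pos(self):
        # Called on every read of variant.pos
        return self._pos

    @pos.setter
    def pos(self, value):
        # Called on every assignment to variant.pos
        if value < 0:
            raise ValueError("pos must be non-negative")
        self._pos = value
```

From the outside, v = Variant(5) followed by v.pos = 7 looks like ordinary attribute access, yet v.pos = -1 raises ValueError because the setter runs behind the scenes.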
I’ve been reading a lot of blog posts and StackExchange discussions about interfaces, inheritance, and their place in Python, and how Python features such as duck typing impact those as well as polymorphism. I’m coming from an admittedly hodgepodge background where I’m used to C++ design for large projects and Python for small. So it’s taking me some time to evaluate the...
Going by the adage of don’t over-optimize — now is not the time to obsess over performance, so I’ll file this away for later. One usage question I have is whether it is more desirable to have an object that neatly organizes all of the information about a single site, or an iterable that contains limited information about many sites. I imagine in the long run it would be great...
isinstance() considered harmful: http://www.canonical.org/~kragen/isinstance/
implied interface: http://www.shindich.com/sources/patterns/implied.html
@property: http://docs.python.org/library/functions.html#property
decorator definition: http://docs.python.org/glossary.html#term-decorator
The property decorator is a way to force people to use an accessor without knowing about it. Amazing! It...
In the past week, I’ve continued to plan the overall structure of the Biopython Variant module (name pending, naturally). See my skeleton code and discussion on the mailing list. Brief summary of this post: I don’t think SeqFeature or an extension thereof would be appropriate for storing Variant data; therefore, I intend to make a new structure based on _Record and _Call in PyVCF....
Weeks -4 and -3: Spot IsA Dog; HasA Tail
“If I had an hour to solve a problem I’d spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.” Albert Einstein The second week of the GSoC community bonding phase overlapped with my finals week, so my work has been primarily limited to the cognitive stage. I’m glad for the enforced planning time before coding; I’ve made some important...
I’m pleased to announce that I will be spending the summer working on Biopython, funded by Google Summer of Code 2012! OBF announcement: http://news.open-bio.org/news/2012/04/students-selected-for-gsoc/ Project abstract: http://www.google-melange.com/gsoc/project/google/gsoc2012/lenna/23001
Fun with git
I currently have three (!) open GitHub pull requests for Biopython, and I also pointed out an open issue that had been fixed. Busy busy! One update was to allow automatic construction of a dict based on a remote file. Someone had already written the conversion script; I just worked on downloading the file. Struggled a bit with urllib etc. and couldn’t figure out how to get a file...
I’ve been reading The Pragmatic Programmer (http://pragprog.com/the-pragmatic-programmer) and one thing they really emphasize (other than Don’t Repeat Yourself) is trying to make sure the computer doesn’t get in your way while programming. I’ve made a few tweaks that should make my programming life more streamlined. In my ~/.inputrc, I added set editing-mode vi which...
Decided to test PyPy because the PLY lexer isn’t much use on Python implementations that can use the C module. Initial tests of the PLY CIF lexer with PyPy show that it’s marginally slower than with CPython. PyPy can’t import modules from within PDB due to NumPy dependency; however, most modules in PDB are compatible with NumPyPy. Determine least-ugly way to implement this. ...
Background Biopython has had a C CIF* parser for quite some time (2002 I believe) but it has been commented out of setup.py for a few years because the compiler required a link library and I don’t think Python has a great way to check for C libraries. The parser is written using flex,* which takes a fairly simple input file and makes generated C (default name lex.yy.c). This file is...
Virtualization pt 1: Debian
I finally gave in and made a Debian virtual machine (using VMware Fusion). Yes, I know, I should use VirtualBox, but I’ve never had good experiences with free software interaction with the OS X windowing system. Qt does an acceptable Cocoa interface, but Tcl/Tk and X window software are both unbearably slow. So when MacUpdate’s latest package included VMware, I decided to spring for...
Finding a libre 2D molecule editor: success!
After much blood, sweat, and tears, I’ve finally gotten gnome-chemistry-utils running on Mac OS X. Yes, really. Previous tribulations here. I decided to not make a CMake project because I figured there wouldn’t be Find scripts for most of the dependencies, and I eventually realized that everything I needed had a .pc somewhere. Major steps to getting it to find the dependencies: ...
Finding a libre 2D molecule editor: part 2
Right now, I’m trying to get gnome-chemistry-utils to install. Of course there’s no macport, so I’m currently in the ./configure -> install dependency via macports endless cycle. At least it’s got screenshots that show the simple line drawings I want, and it was updated in 2012. I’m keeping my fingers crossed. Fun things so far: requires goffice 0.9.0, macport...
Finding a libre 2D molecule editor
So I’ve been having fun trying to find a usable libre 2D molecule editor. I’m just trying to make diagrams of some simple pyridine-based molecules, nothing fancy. Avogadro seems a bit buggy. Odd graphics glitches (atom labels showing up as white squares, ball rendering very wrong), poor interface (no menus?). It also seems overly 3D oriented for my needs. I couldn’t find a...