About my project
I will spend the summer of 2012 writing a genomic variant interface for Biopython, funded by Google Summer of Code.
All known living organisms, from bacteria to plants to humans, store their genetic information in DNA, which is written with four letters: A, T, G, and C. A genome is the complete sequence of an organism’s DNA; it could be described as its source code. Recent advances in technology have made it easier to obtain full genomes. More than 99% of the genome is the same from one human to the next, but the small differences are very important: in some cases, a single changed letter can cause a serious disease.
A set of one organism’s genomic differences from the expected (reference) genome is called a genomic variant. Genomic variants can be stored in a variety of file formats, and there are existing Python parsers for several genomic variant file formats. The overall goal of my project is to create a unified Python interface for genomic variant files. This should allow researchers more convenient and consistent access to this important type of data.
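To make the idea of a unified interface more concrete, here is a minimal sketch of how a line from one common variant format, VCF, could be turned into a plain Python record. The `Variant` class and the `parse_vcf_line` function are hypothetical names chosen for illustration; they are not Biopython's API or the final design of this project. The sketch only assumes the standard VCF layout, in which the first five tab-separated columns are chromosome, position, ID, reference letters, and alternate letters.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    """One difference from the reference genome (hypothetical record type)."""
    chrom: str  # chromosome or contig name
    pos: int    # 1-based position on the reference genome
    ref: str    # letters found in the reference genome
    alt: str    # letters observed in this organism instead


def parse_vcf_line(line: str) -> List[Variant]:
    """Parse one VCF data line into Variant records (hypothetical helper).

    A VCF line may list several comma-separated alternate alleles,
    so one record is returned per alternate allele.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _identifier, ref, alt = fields[:5]
    return [Variant(chrom, int(pos), ref, allele) for allele in alt.split(",")]


if __name__ == "__main__":
    # A single-letter change: G in the reference, A in the sample.
    example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3"
    for variant in parse_vcf_line(example):
        print(variant)
```

A parser for a different file format could return the same `Variant` records, so code that analyses variants would not need to know which format the data came from.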
More information about genomes and genomic variation may be found at NCBI.
About Google Summer of Code
Google Summer of Code (GSoC) is a program in which students apply for funding from Google to work on an open source software project. The stipend is intended to replace a full-time job or internship. More information can be found on the GSoC website.
For 2012, the 8th year of Google Summer of Code, 1,212 students were accepted out of 4,258 applicants. 8.3% of the accepted students are women (source).
About open source software
In software, source code is essentially the set of instructions that makes the software do what it does. Much software is proprietary and closed source, i.e., users can neither see nor modify the source code. Open source software means that the source code is freely available and modifiable. Open source software is very important for progress in research because, instead of having to reinvent the wheel, developers can build on existing work and focus on writing new and exciting things.
The majority of open source software is free software in the financial sense, but some is also intended to be free in the same sense as free speech. More information can be found at the Open Source Initiative and the Free Software Foundation.
About Biopython
Python is a programming language first released in 1991 and distributed as free software. It is popular because it is flexible and easy to write but still powerful. Biopython is a collection of Python tools for biologists. It is part of the Open Bioinformatics Foundation.