ELIZABETH HEPPENHEIMER, Ph.D.
  • Popular Science Blog
  • Scientific Publications
  • Writing Portfolio
  • Media
  • Contact & CV
  • Bioinformatics
Bioinformatics
 Code posted here may not be the most elegant solution, but it typically gets the job done. That said, please check to ensure that any code taken from this website actually accomplishes the desired tasks. 
Bugs/Errors can be reported to elizabeth.heppenheimer@gmail.com

Linux/Unix Commands, Bash Scripting, AWK, etc.


Bash for loops 
How to repeat the same set of commands on multiple files without repeating yourself. Download my conceptual overview here.
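As a minimal sketch of the idea (not the downloadable overview itself), the loop below runs one command on every matching file. The scratch directory and the `sample*.txt` file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: three small text files in a scratch directory.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/sample1.txt"
printf 'c\nd\ne\nf\n' > "$workdir/sample2.txt"
printf 'g\nh\ni\n' > "$workdir/sample3.txt"

# Run the same command (here, a line count) on every .txt file.
for f in "$workdir"/*.txt; do
    # tr strips the padding that some versions of wc add.
    n=$(wc -l < "$f" | tr -d ' ')
    echo "$(basename "$f"): $n lines"
done > "$workdir/line_counts.out"

cat "$workdir/line_counts.out"
```

Swap the body of the loop for whatever command needs repeating; quoting `"$f"` keeps file names with spaces intact.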
Symbolic Links (aka soft links)
Avoid the temptation to make multiple copies of the same file! This is a bad idea for at least three reasons: 1. It's confusing, 2. Disk space is a limited resource, 3. It's definitely not necessary once you master soft links!
Overview & Examples
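A quick sketch of what a soft link looks like in practice. The directory and file names here are hypothetical, chosen only for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical layout: one real data file, one analysis directory.
workdir=$(mktemp -d)
mkdir "$workdir/raw_data" "$workdir/analysis"
echo "ACGT" > "$workdir/raw_data/reads.txt"

# ln -s TARGET LINK_NAME
# An absolute target path keeps the link valid wherever the link lives.
ln -s "$workdir/raw_data/reads.txt" "$workdir/analysis/reads.txt"

# Reading the link reads the original file; no second copy exists.
linked=$(cat "$workdir/analysis/reads.txt")
echo "$linked"

# Removing the link leaves the original file untouched.
rm "$workdir/analysis/reads.txt"
ls "$workdir/raw_data"
```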
Very Basic AWK 
Anyone who knows me knows that I'm nearly unhealthily obsessed with AWK, though many have tried to convince me that Python or Perl would be more worth my time. While that may be true, AWK is still an incredibly powerful text manipulation tool. I break down a few of the most basic AWK commands here.
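For a flavor of what basic AWK looks like, here is a small sketch that prints selected columns from a tab-separated file. The file `sites.tsv` and its contents are invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical three-column, tab-separated input.
workdir=$(mktemp -d)
printf 'chr1\t100\tA\nchr2\t200\tG\nchr1\t300\tT\n' > "$workdir/sites.tsv"

# Print only the first and third columns, keeping tabs as the separator.
cols=$(awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $3 }' "$workdir/sites.tsv")
echo "$cols"

# NF holds the number of fields on the current line; check it on line 1.
nf=$(awk -F'\t' 'NR == 1 { print NF }' "$workdir/sites.tsv")
echo "columns: $nf"
```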
Very Basic AWK: Part II 
Slightly more advanced AWK. I cover parsing files based on the value of a column.  Read about it here.
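A short sketch of parsing by column value, assuming a made-up tab-separated file whose third column is a numeric score. An AWK pattern with no action prints every line that matches.

```shell
#!/usr/bin/env bash
# Hypothetical input: chromosome, position, score.
workdir=$(mktemp -d)
printf 'chr1\t100\t0.91\nchr2\t200\t0.12\nchr1\t300\t0.55\n' > "$workdir/scores.tsv"

# Keep rows whose third column is greater than 0.5 (numeric comparison).
high=$(awk -F'\t' '$3 > 0.5' "$workdir/scores.tsv")
echo "$high"

# Keep rows whose first column is exactly the string "chr2".
chr2=$(awk -F'\t' '$1 == "chr2"' "$workdir/scores.tsv")
echo "$chr2"
```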
Sort Files & Identify Unique Values 
This comes up a lot for me and is fairly self-explanatory. A Bash script that assumes a single-column file is available here; the annotations within should be enough to adapt it to a multi-column file.
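One common way to do this, sketched with invented file names (the downloadable script may differ): `uniq` only collapses adjacent duplicates, so the file is sorted first.

```shell
#!/usr/bin/env bash
# Hypothetical single-column input with repeated values.
workdir=$(mktemp -d)
printf 'wolf\ncoyote\nwolf\ndog\ncoyote\nwolf\n' > "$workdir/species.txt"

# Unique values, alphabetically sorted.
unique_values=$(sort "$workdir/species.txt" | uniq)
echo "$unique_values"

# How often each value occurs, most frequent first.
counts=$(sort "$workdir/species.txt" | uniq -c | sort -k1,1nr)
echo "$counts"

# Multi-column file: pull out the column of interest with cut first.
printf 's1\twolf\ns2\tdog\ns3\twolf\n' > "$workdir/samples.tsv"
col2_unique=$(cut -f2 "$workdir/samples.tsv" | sort -u)
echo "$col2_unique"
```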
Remove a File Header 
Does anyone else feel like they have to do this nearly every day? It's only one line, but I have it formatted as a ready-to-run bash script here.
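One way to write that single line is with `tail`, sketched below on a made-up file. This is an illustration and not necessarily the command used in the downloadable script.

```shell
#!/usr/bin/env bash
# Hypothetical input with a one-line header.
workdir=$(mktemp -d)
printf 'id\tvalue\ns1\t10\ns2\t20\n' > "$workdir/with_header.tsv"

# tail -n +2 prints from the second line onward, which drops the header.
tail -n +2 "$workdir/with_header.tsv" > "$workdir/no_header.tsv"

cat "$workdir/no_header.tsv"
```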
Vlookup with AWK
Familiar with the VLOOKUP function in Excel? This AWK one-liner bash script will accomplish something similar if Excel is not an option.
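A sketch of the general lookup idiom in AWK, using two invented files. It may not match the downloadable script line for line. The first file is read into an array keyed on column 1, then each row of the second file gets the matching value appended (or `NA` when there is no match).

```shell
#!/usr/bin/env bash
# Hypothetical lookup table (sample -> species) and data file (sample -> score).
workdir=$(mktemp -d)
printf 's1\twolf\ns2\tcoyote\n' > "$workdir/lookup.tsv"
printf 's2\t0.4\ns1\t0.9\ns3\t0.1\n' > "$workdir/data.tsv"

# NR == FNR is true only while AWK is reading the first file.
joined=$(awk -F'\t' 'BEGIN { OFS = "\t" }
    NR == FNR { species[$1] = $2; next }                  # load the lookup table
    { print $0, (($1 in species) ? species[$1] : "NA") }  # annotate each data row
' "$workdir/lookup.tsv" "$workdir/data.tsv")

echo "$joined"
```

Unlike `join`, this approach does not need either file to be sorted.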
Basic Grep
Search a (possibly very large) file for a specific string (regular expression) and print the entire line that contains the expression you searched for. Useful for parsing log files, among other things. Short overview here.
Note: Though it is technically possible to grep for a number, you'd be better off using AWK for that.
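A few grep basics on a made-up log file, plus a small illustration of why numbers are awkward for grep: it matches text, so searching for `100` also hits `1000` and `2100`, while AWK can compare a column numerically.

```shell
#!/usr/bin/env bash
# Hypothetical log file.
workdir=$(mktemp -d)
printf 'INFO start\nERROR disk full\nINFO done\nerror: retry\n' > "$workdir/run.log"

# Print every line containing the string ERROR (case-sensitive).
errors=$(grep 'ERROR' "$workdir/run.log")
echo "$errors"

# -i ignores case, -n prefixes each hit with its line number.
errors_any_case=$(grep -i -n 'error' "$workdir/run.log")
echo "$errors_any_case"

# -c counts matching lines; ^ anchors the match to the start of the line.
n_info=$(grep -c '^INFO' "$workdir/run.log")
echo "$n_info"

# Numbers: grep treats 100 as text, AWK treats column 2 as a number.
printf 'chr1\t100\nchr1\t1000\nchr2\t2100\n' > "$workdir/positions.tsv"
grep_hits=$(grep -c '100' "$workdir/positions.tsv")
awk_hits=$(awk '$2 == 100 { n++ } END { print n + 0 }' "$workdir/positions.tsv")
echo "grep: $grep_hits, awk: $awk_hits"
```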
Pipes
Let's be honest: I write far too many intermediate files when I do any kind of computational work. Pipes let you use the output of one command as input for the next command, and they really aren't difficult to use. See a few quick examples and notes: here.
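A small sketch of a pipeline that would otherwise need several intermediate files. The sample sheet is invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sample sheet with a header line.
workdir=$(mktemp -d)
printf 'sample\tspecies\ns1\twolf\ns2\tcoyote\ns3\twolf\n' > "$workdir/samples.tsv"

# Drop the header, keep column 2, sort, count each value, tidy the output.
# Each | hands the output of one command straight to the next.
species_counts=$(tail -n +2 "$workdir/samples.tsv" \
    | cut -f2 \
    | sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }')

echo "$species_counts"
```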

Existing Programs, File Format Conversion, Specific Plotting Functions, etc.  


Calculate Number of SNPs per Chromosome 
Uses a plink .map file as input to calculate the number of SNPs per Chromosome. 
Written for the dog genome (38 chr + sex chr) but could be easily adapted for other genomes.
 Download here
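For comparison, a genome-agnostic sketch that is not the downloadable script: a plink .map file lists the chromosome code in its first column, so an AWK array can tally SNPs for whatever chromosome codes appear. The toy file below is invented.

```shell
#!/usr/bin/env bash
# Hypothetical toy .map file: chromosome, SNP ID, genetic distance, bp position.
workdir=$(mktemp -d)
printf '1\trs1\t0\t1000\n1\trs2\t0\t2000\n2\trs3\t0\t500\nX\trs4\t0\t700\n1\trs5\t0\t3000\n' > "$workdir/toy.map"

# Tally lines per chromosome code, then sort because AWK's
# "for (chr in count)" returns keys in no particular order.
snp_counts=$(awk '{ count[$1]++ }
    END { for (chr in count) print chr "\t" count[chr] }' "$workdir/toy.map" \
    | sort -k1,1)

echo "$snp_counts"
```

The final sort is a plain text sort, so chromosome 10 would be listed before chromosome 2 in a real file.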
Paired-End Read Mapping in Stampy 
Possibly my magnum opus when it comes to bash for loops: these commands will properly match read pairs (assuming there are many different read pairs in the same directory) for mapping. Use of this loop may require some editing depending on the file naming conventions, but it's a great starting point. This could also easily be adapted for other mapping programs or any context where separate files need to be recognized as pairs. Download here.
 I also have a for loop for single-end read mapping in Stampy, but it's much less impressive.
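A sketch of the pairing logic only, assuming a hypothetical `_R1.fastq` / `_R2.fastq` naming convention. The downloadable loop and its Stampy call may look quite different, so the mapping command itself is left as a placeholder comment.

```shell
#!/usr/bin/env bash
# Hypothetical empty read files for two samples.
workdir=$(mktemp -d)
touch "$workdir/wolfA_R1.fastq" "$workdir/wolfA_R2.fastq" \
      "$workdir/wolfB_R1.fastq" "$workdir/wolfB_R2.fastq"

# Loop over forward reads only and derive the mate's name from each one.
for r1 in "$workdir"/*_R1.fastq; do
    r2="${r1%_R1.fastq}_R2.fastq"        # strip the R1 suffix, add the R2 suffix
    sample=$(basename "$r1" _R1.fastq)   # sample name without path or suffix

    if [ ! -e "$r2" ]; then
        echo "no mate found for $r1, skipping" >&2
        continue
    fi

    # The mapping command for "$r1" and "$r2" would go here.
    echo "$sample: $(basename "$r1") + $(basename "$r2")"
done > "$workdir/pairs.out"

cat "$workdir/pairs.out"
```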
Chromosome Paint Plots in R 
Want to plot a chromosome (or whole genome) and color in certain segments according to ancestry, methylation frequency, etc.? This approach takes two input files: the first describes the base genome/chromosome, and the second specifies which coordinates to color in. Download the Example R Code, Chromosome File, and the Coordinates File.