Code posted here may not be the most elegant solution, but it typically gets the job done. That said, please check that any code taken from this website actually accomplishes the desired task. Bugs and errors can be reported to [email protected]
Linux/Unix Commands, Bash Scripting, AWK, etc.
Bash for loops How to repeat the same set of commands on multiple files without repeating yourself. Download my conceptual overview here.
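As a rough illustration of the idea (a sketch only, with hypothetical file names created in a scratch directory), a for loop can run the same commands on every matching file:

```shell
# Hypothetical setup: three small text files in a scratch directory.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/sample1.txt"
printf 'c\n' > "$workdir/sample2.txt"
printf 'd\ne\nf\n' > "$workdir/sample3.txt"

# Run the same commands on every .txt file without retyping them.
line_counts=""
for file in "$workdir"/*.txt; do
    name=$(basename "$file" .txt)   # file name without directory or extension
    n=$(wc -l < "$file")            # number of lines in this file
    n=$((n))                        # drop the padding some wc versions add
    line_counts="${line_counts}${name}:${n} "
    echo "$name has $n lines"
done
```

Quoting "$file" keeps the loop safe for file names that contain spaces.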
Symbolic Links (aka soft links) Avoid the temptation to make multiple copies of the same file! This is a bad idea for at least three reasons: 1. It's confusing, 2. Disk space is a limited resource, 3. It's definitely not necessary once you master soft links! Overview & Examples
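A minimal sketch of the idea, assuming a hypothetical data file that a second directory also needs to see:

```shell
# Hypothetical layout: one real file in data/, one link to it in analysis/.
workdir=$(mktemp -d)
mkdir "$workdir/data" "$workdir/analysis"
echo "chr1 100" > "$workdir/data/genome.txt"

# ln -s TARGET LINK_NAME creates a pointer to the original; nothing is copied.
ln -s "$workdir/data/genome.txt" "$workdir/analysis/genome.txt"

# Reading through the link returns the original file's contents.
link_contents=$(cat "$workdir/analysis/genome.txt")

# Changes to the original are visible through the link right away.
echo "chr2 200" >> "$workdir/data/genome.txt"
link_lines=$(wc -l < "$workdir/analysis/genome.txt")
link_lines=$((link_lines))
```

An absolute path is used as the target here because a relative target is resolved relative to the directory that holds the link, which is a common source of broken links.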
Very Basic AWK Anyone who knows me knows that I'm nearly unhealthily obsessed with AWK, though many have tried to convince me that Python or Perl would be more worth my time. While that may be true, AWK is still an incredibly powerful text manipulation tool. I break down a few of the most basic AWK commands here.
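For a flavor of what basic AWK looks like, here is a small sketch on a hypothetical whitespace-delimited file (the file name and contents are made up for illustration):

```shell
# Hypothetical three-column file: id, weight, color.
workdir=$(mktemp -d)
printf 'dog1 12 brown\ndog2 30 black\ndog3 25 white\n' > "$workdir/dogs.txt"

# $1, $2, ... refer to columns; print only the first and third.
cols=$(awk '{print $1, $3}' "$workdir/dogs.txt")

# Add up the second column; the END block runs after the last line.
total=$(awk '{sum += $2} END {print sum}' "$workdir/dogs.txt")

# NF is the number of fields on the current line, NR the line number.
nfields=$(awk 'NR == 1 {print NF}' "$workdir/dogs.txt")

echo "$cols"
echo "total weight: $total, columns: $nfields"
```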
Very Basic AWK: Part II Slightly more advanced AWK. I cover parsing files based on the value of a column. Read about it here.
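Filtering on a column value might look roughly like this; the file and its columns (SNP name, chromosome, allele frequency) are hypothetical:

```shell
# Hypothetical file: SNP name, chromosome, allele frequency.
workdir=$(mktemp -d)
printf 'snp1 1 0.05\nsnp2 2 0.40\nsnp3 1 0.30\n' > "$workdir/snps.txt"

# A condition with no action prints every line where it is true:
# keep only rows whose second column equals 1.
chr1_rows=$(awk '$2 == 1' "$workdir/snps.txt")

# Combine a condition with an action:
# print the name of each SNP whose third column is greater than 0.1.
common=$(awk '$3 > 0.1 {print $1}' "$workdir/snps.txt")

echo "$chr1_rows"
echo "$common"
```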
Sort Files & Identify Unique Values This comes up a lot for me and is fairly self-explanatory. A Bash script that assumes a single-column file is available here; the annotations within should be enough to adapt it to a multi-column file.
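A sketch of the general approach on a hypothetical single-column file (this is an illustration, not necessarily how the downloadable script does it):

```shell
# Hypothetical single-column file with repeated values.
workdir=$(mktemp -d)
printf 'chr2\nchr1\nchr2\nchr10\nchr1\n' > "$workdir/ids.txt"

# uniq only collapses adjacent duplicates, so sort first.
unique_ids=$(sort "$workdir/ids.txt" | uniq)

# uniq -c also reports how many times each value occurred.
counts=$(sort "$workdir/ids.txt" | uniq -c)

# sort -u sorts and de-duplicates in one step; count the results.
n_unique=$(sort -u "$workdir/ids.txt" | wc -l)
n_unique=$((n_unique))

echo "$counts"
echo "$n_unique unique values"
```

For a multi-column file, one option is to pull out the column of interest first (for example with `cut` or `awk '{print $2}'`) and pipe that into the same `sort | uniq` step.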
Remove a File Header Does anyone else feel like they have to do this nearly every day? It's only one line, but I have it formatted as a ready-to-run Bash script here.
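One common way to do this, shown as a sketch on a hypothetical file (the downloadable script may use a different command):

```shell
# Hypothetical table with a one-line header.
workdir=$(mktemp -d)
printf 'id value\na 1\nb 2\n' > "$workdir/table.txt"

# tail -n +2 starts printing at line 2, which skips the header.
tail -n +2 "$workdir/table.txt" > "$workdir/table.noheader.txt"

first_line=$(head -n 1 "$workdir/table.noheader.txt")
n_lines=$(wc -l < "$workdir/table.noheader.txt")
n_lines=$((n_lines))
echo "first data line: $first_line ($n_lines lines kept)"
```

Writing to a new file rather than overwriting the input keeps the original intact in case the header is needed later.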
Vlookup with AWK Familiar with the VLOOKUP function in Excel? This AWK one-liner, packaged as a Bash script, will accomplish something similar if Excel is not an option.
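A lookup of this kind is often written with the two-file AWK idiom below. This is a sketch under assumed inputs (a hypothetical lookup table and query list), not necessarily the one-liner in the linked script:

```shell
# Hypothetical inputs: a lookup table (key, value) and a list of keys to look up.
workdir=$(mktemp -d)
printf 'dog1 labrador\ndog2 poodle\ndog3 beagle\n' > "$workdir/lookup.txt"
printf 'dog3\ndog1\ndog4\n' > "$workdir/query.txt"

# NR == FNR is true only while AWK reads the first file, so that block
# loads the lookup table into an array. The second block then runs on the
# query file and prints the matching value, or NA when the key is missing.
result=$(awk 'NR == FNR {breed[$1] = $2; next}
              {print $1, (($1 in breed) ? breed[$1] : "NA")}' \
         "$workdir/lookup.txt" "$workdir/query.txt")

echo "$result"
```

The order of the file arguments matters: the lookup table has to come first so the array is filled before any query line is read.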
Basic Grep Search a (possibly very large) file for a specific string or regular expression and print every line that contains a match. Useful for parsing log files, among other things. Short overview here. Note: It is technically possible to grep for a number, but grep matches text rather than numeric values, so you'd be better off using AWK for that.
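A short sketch on hypothetical files, including the number caveat: grep treats a number as text, while AWK can compare it as a value.

```shell
# Hypothetical log file.
workdir=$(mktemp -d)
printf 'INFO start\nERROR disk full\nINFO done\nERROR timeout 42\n' > "$workdir/run.log"

# Print every line containing the string ERROR.
errors=$(grep 'ERROR' "$workdir/run.log")

# -c counts matching lines instead of printing them.
n_errors=$(grep -c 'ERROR' "$workdir/run.log")

# -n prefixes each match with its line number; [0-9]$ matches lines ending in a digit.
numbered=$(grep -n '[0-9]$' "$workdir/run.log")

# Why numbers are awkward for grep: the pattern 5 also matches 15 and 50.
printf '5\n15\n50\n' > "$workdir/values.txt"
grep_hits=$(grep -c '5' "$workdir/values.txt")
awk_hits=$(awk '$1 == 5' "$workdir/values.txt")

echo "$errors"
echo "grep matched $grep_hits lines; awk matched: $awk_hits"
```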
Pipes Let's be honest: I write far too many intermediate files when I do any kind of computational work. Pipes let you use the output of one command as the input for the next command, and they really aren't difficult to use. See a few quick examples and notes here.
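To make the contrast concrete, here is a sketch that finds the most frequent value in a hypothetical file, first with intermediate files and then with a single pipeline:

```shell
# Hypothetical input: one chromosome name per line.
workdir=$(mktemp -d)
printf 'chr1\nchr2\nchr1\nchr3\nchr1\n' > "$workdir/hits.txt"

# Without pipes: every step writes a file that the next step reads.
sort "$workdir/hits.txt" > "$workdir/sorted.txt"
uniq -c "$workdir/sorted.txt" > "$workdir/counts.txt"
sort -k1,1nr "$workdir/counts.txt" > "$workdir/ranked.txt"
top_with_files=$(head -n 1 "$workdir/ranked.txt" | awk '{print $2}')

# With pipes: the same steps chained together, no intermediate files.
top_with_pipes=$(sort "$workdir/hits.txt" | uniq -c | sort -k1,1nr | head -n 1 | awk '{print $2}')

echo "most frequent: $top_with_pipes"
```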
Existing Programs, File Format Conversion, Specific Plotting Functions, etc.
Calculate Number of SNPs per Chromosome Uses a plink .map file as input to calculate the number of SNPs per chromosome. Written for the dog genome (38 chr + sex chr) but could easily be adapted for other genomes. Download here.
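A plink .map file has one SNP per line with the chromosome code in the first column, so one possible approach is to count lines per value of column 1. The sketch below uses a made-up toy file and is not necessarily how the downloadable script works:

```shell
# Hypothetical toy .map file: chromosome, SNP ID, genetic distance, bp position.
workdir=$(mktemp -d)
printf '1 snpA 0 1000\n1 snpB 0 2000\n2 snpC 0 1500\nX snpD 0 500\n1 snpE 0 3000\n' > "$workdir/toy.map"

# Use the chromosome code as an array key and count one per line;
# the END block prints each chromosome with its total.
per_chr=$(awk '{count[$1]++} END {for (chr in count) print chr, count[chr]}' "$workdir/toy.map" | sort -k1,1)

echo "$per_chr"
```

Because the chromosome code is used as an array key, nothing here is specific to the number of chromosomes, so the same command should carry over to other genomes.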
Paired-End Read Mapping in Stampy Possibly my magnum opus when it comes to bash for loops, these commands will properly match read pairs (assuming there are many different read pairs in the same directory) for mapping. Use of this loop may require some editing depending on the file naming conventions, but it's a great starting point. This could also easily be adapted for other mapping programs or any context where separate files need to be recognized as pairs. Download here. I also have a for loop for single-end read mapping in Stampy, but it's much less impressive.
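The pairing logic could be sketched as follows. The `_R1.fastq` / `_R2.fastq` naming convention and the sample names are assumptions for illustration, and the mapping command itself is left as a placeholder rather than a real Stampy invocation:

```shell
# Hypothetical read files; dogC is missing its mate on purpose.
workdir=$(mktemp -d)
touch "$workdir/dogA_R1.fastq" "$workdir/dogA_R2.fastq" \
      "$workdir/dogB_R1.fastq" "$workdir/dogB_R2.fastq" \
      "$workdir/dogC_R1.fastq"

pairs=""
# Loop over the first-in-pair files and derive the mate's name from each one.
for r1 in "$workdir"/*_R1.fastq; do
    r2="${r1%_R1.fastq}_R2.fastq"          # swap the _R1 suffix for _R2
    sample=$(basename "$r1" _R1.fastq)     # sample name shared by both files
    if [ -f "$r2" ]; then
        pairs="${pairs}${sample} "
        # The mapping command for this pair would go here.
        echo "would map pair: $(basename "$r1") + $(basename "$r2")"
    else
        echo "skipping $sample: no mate file found" >&2
    fi
done
```

Checking that the mate exists before mapping avoids silently running a paired-end job on half a pair.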
Chromosome Paint Plots in R Want to plot a chromosome (or whole genome) and color in certain segments according to ancestry, methylation frequency, etc.? This approach takes two input files: one that defines the base genome/chromosome and a second that specifies which coordinates to color in. Download the Example R Code, Chromosome File, and the Coordinates File.