#Basic AWK Overview
#Written by E Heppenheimer, Spring 2018

#awk is technically a programming language and is very useful for manipulating text files
#Here, I break down the very basics of awk commands

#AWK interprets text in terms of "fields"
#"fields" are nearly synonymous with columns, though there is possibly a subtle distinction in the computer science world
#fields are specified with the dollar sign ($) followed by the field number
#that is, "$1" typically calls the first column of a file, "$2" calls the second column, etc.

#The default field separator, that is, what separates column 1 from column 2 within awk, is white space
#White space characters are most commonly spaces or tabs
#However, you can override this default and make the field separator essentially anything you want
#Common examples of custom field separators (i.e. non white space) include underscores (_) and dashes (-)
#But you could make it something really obscure: exclamation point (!), percent sign (%), etc.
#I'm not sure why you would want something obscure, but it is technically possible

#While fields in awk are usually specified by number, it is possible to specify fields in other ways
#For example, "$NF" specifies the last field of each row; NF is the number of fields in the current row, so this may be a different column for different rows!
#Second example, "$0" specifies the entire row, that is, all fields at once!

#AWK Example:
#Extracting a subset of fields (that is, columns) from a file
#Assuming a 6 column tab separated file, and you would like to make a new file that includes only the first two columns
#use
$ awk '{print $1 "\t" $2}' inputfile > outputfile
#this command essentially tells awk to read the input file, print the first column ($1), then print a tab ("\t"), then print the second column ($2)
#if the original input file looked like this:
#1 2 3 4 5 6
#the output file would read:
#1 2

#In addition to extracting columns, this basic syntax can be used to:
#reorder columns e.g.
$ awk '{print $2 "\t" $1}' inputfile > outputfile
#add different separators between columns e.g.
$ awk '{print $1 "_" $2}' inputfile > outputfile
#and many other similar tasks...

#Note: As the field separator in the original file was white space (a tab), we did not need to specify that in the awk command
#If the field separator is anything other than white space, it can be specified with the -F flag
#For example, if the field separator in the input file is a period (.)
$ awk -F "." '{print $1 "\t" $2 "\t" $3}' infile > outfile

#one thing to note is that you can't redirect awk's output onto its own input file
#the shell empties the output file before awk gets a chance to read it
#that is, you cannot name your output file the same as your input file and the following command DOES NOT WORK:
$ awk '{print $1 "\t" $2 "\t" $3}' example.txt > example.txt
#repeat: this command does NOT work; example.txt will be empty and you will have just deleted your data
#if you would like to overwrite the original file, use this instead:
$ awk '{print $1 "\t" $2 "\t" $3}' example.txt > temp.txt && mv temp.txt example.txt
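
#Worked example (a minimal sketch, not the only way to do this)
#It pulls together the pieces above: $NF, $0, the -F flag, and the temp-file trick for overwriting
#The scratch directory and the file names (example.txt, underscored.txt, temp.txt) are made up for illustration
#It assumes mktemp is available, which is the case on most Linux and macOS systems

```shell
# Hypothetical scratch directory and sample files, created only for this demonstration
workdir=$(mktemp -d)
printf '1\t2\t3\t4\t5\t6\n' > "$workdir/example.txt"
printf 'a_b_c\n' > "$workdir/underscored.txt"

# $NF is the last field of each row; NF on its own is the number of fields in that row
last_field=$(awk '{print $NF}' "$workdir/example.txt")
field_count=$(awk '{print NF}' "$workdir/example.txt")

# $0 is the entire row, printed unchanged
whole_row=$(awk '{print $0}' "$workdir/underscored.txt")

# -F sets a custom field separator, here an underscore
first_and_third=$(awk -F "_" '{print $1 "-" $3}' "$workdir/underscored.txt")

# "Overwrite" safely: write to a temporary file, then rename it over the original
awk '{print $1 "\t" $2 "\t" $3}' "$workdir/example.txt" > "$workdir/temp.txt" && mv "$workdir/temp.txt" "$workdir/example.txt"
kept=$(cat "$workdir/example.txt")

echo "last field: $last_field, field count: $field_count"
echo "whole row: $whole_row, first and third: $first_and_third"
```

#Because of the &&, mv only runs if awk exits successfully, so a failed awk command leaves example.txt untouched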