awk reminder

awk is a powerful tool for quickly analysing files or command output in bash.

Mean and median calculation

It is easy to compute a mean (for example on the last column of a file):

cat file | awk '{sum += $NF} END {printf "Average: %.3f\n", sum/NR}'

It can be extended to compute the population standard deviation (dividing by the number of values):

cat file | awk '{sum += $NF; sq += $NF^2} END {printf "Average: %.3f\nStddev:  %.3f\n", sum/NR, sqrt(sq/NR - (sum/NR)^2)}'
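
As a quick sanity check, the same formula can be run on a small data set whose result is known in advance. The sketch below uses hypothetical sample values (not from any real file) and also prints the sample standard deviation, which divides by N - 1 instead of N; the variable names mean, pop and samp are only illustrative.

```shell
# Hypothetical sample values, one per line (mean 5, population stddev 2).
stats=$(printf '%s\n' 2 4 4 4 5 5 7 9 | awk '
  { sum += $NF; sq += $NF^2 }
  END {
    mean = sum / NR
    pop  = sqrt(sq / NR - mean^2)                 # population: divide by N
    samp = sqrt((sq - sum^2 / NR) / (NR - 1))     # sample: divide by N - 1
    printf "Average: %.3f\nStddev (population): %.3f\nStddev (sample): %.3f\n", mean, pop, samp
  }')
echo "$stats"
```

The sample variant needs at least two values, otherwise it divides by zero.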

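It is also possible to compute a median; the trick is to first sort the column you intend to analyse, numerically (sort -n, since a plain sort compares text and would put 10 before 9). With an even number of lines, this prints the upper of the two middle values:

cat file | awk '{print $NF}' | sort -n | awk '{a[i++] = $NF} END {printf "Median: %.3f\n", a[int(NR/2)]}'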
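
If the usual convention for an even number of values is wanted (the mean of the two middle values), the END block can branch on the parity of NR. This is a minimal sketch on hypothetical sample data, not a replacement for a proper statistics tool:

```shell
# Hypothetical sample data: one value per line, even number of values.
median=$(printf '%s\n' 10 2 33 4 | sort -n | awk '
  { a[NR] = $1 }                                  # store sorted values, 1-based
  END {
    if (NR == 0) { exit }                         # nothing to report on empty input
    if (NR % 2) { m = a[(NR + 1) / 2] }           # odd count: the middle value
    else { m = (a[NR / 2] + a[NR / 2 + 1]) / 2 }  # even count: mean of the two middle values
    printf "%.3f\n", m
  }')
echo "Median: $median"
```

Here the sorted values are 2, 4, 10 and 33, so the median is the mean of 4 and 10.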

Find minimum and maximum

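The main trick is not to forget to initialise the min and max variables from the first line: an uninitialised awk variable compares as 0, so min would stay at 0 for all-positive data. Adding 0 forces a numeric comparison:

cat file | awk 'NR == 1 {min = max = $NF + 0} {v = $NF + 0; if (v < min) min = v; if (v > max) max = v} END {printf "Min: %.3f\nMax: %.3f\n", min, max}'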

Convert time units

awk does floating-point arithmetic natively, which opens up many possibilities. A concrete example is converting milliseconds into minutes and seconds (here the duration is the next-to-last field, and the second field is printed as an integer in front of it):

awk '{printf "%d, %d min %.3f s\n", $2, $(NF-1)/1000/60,($(NF-1)/1000-(int($(NF-1)/1000/60)*60))}'
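
To see what this produces, here is the same computation split into named steps and applied to one made-up input line. The line layout (label, id, duration, unit) and the variable names ms, mins and secs are assumptions for illustration only:

```shell
# Hypothetical input line: a label, an id, a duration in milliseconds and its unit.
line='job 42 125500 ms'
converted=$(echo "$line" | awk '{
  ms   = $(NF-1)                 # duration is the next-to-last field
  mins = int(ms / 1000 / 60)     # whole minutes
  secs = ms / 1000 - mins * 60   # remaining seconds, fraction included
  printf "%d, %d min %.3f s\n", $2, mins, secs
}')
echo "$converted"
```

125500 ms is 125.5 s, that is 2 whole minutes plus 5.5 s, so the line printed is "42, 2 min 5.500 s".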

Truncate and round

There are two ways to truncate (both drop the fractional part, rounding toward zero):

awk '{printf "%d\n", $NF}'

or

awk '{printf "%d\n", int($NF)}'

And to round:

awk '{printf "%.0f\n", $NF}'
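
One caveat: "%.0f" delegates rounding to the underlying C printf, which on many systems rounds exact halves to the nearest even number (so 2.5 may print as 2). When halves should always round away from zero, a small helper can be used instead. The round function below is a hypothetical helper, not an awk built-in, and the input values are made up:

```shell
rounded=$(printf '%s\n' 3.7 -3.7 2.5 -2.5 | awk '
  # Hypothetical helper: round half away from zero, independent of printf.
  function round(x) { return (x < 0) ? -int(-x + 0.5) : int(x + 0.5) }
  { printf "%s -> trunc %d, round %d\n", $1, int($1), round($1) }')
echo "$rounded"
```

This prints each value next to its truncated and rounded forms, for example "2.5 -> trunc 2, round 3".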