
Inspired by the first ever Destroy All Software screencast (and shamelessly copying its title), I decided to use the shell and the git command line tools to iterate over revisions and compute a statistic for each one. The task I have in mind is to rank commits by the ratio of their message length to the number of lines they change. This should make for an interesting ranking: at one end of the spectrum are small changes that needed long explanations, and at the other are short messages that explain massive edits.

First we’ll use git rev-list in a bash for-loop to list all the commits reachable from a given commit or commit range (assuming we’re in a git project directory).

for commit in $(git rev-list "$1"); do
	# use "$commit" here
done

Next we need the commit message length and the number of altered lines for each commit. The former can be obtained using git log with the --format=%B option, which prints only the raw commit message (subject and body). The latter uses the same command, but with an empty format argument (--format=) to suppress the usual output and the --shortstat flag to display a summary of the changed lines. wc -m then counts the characters in the message, and awk sums the inserted and deleted line counts.

git log -n 1 --format=%B $commit | wc -m
git log -n 1 --format= --shortstat $commit | awk '{print $4 + $6}'
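
To see why summing the fourth and sixth fields works, here is a small sketch that feeds a few illustrative --shortstat lines through the same awk program. The sample lines are made up for the example, and count_changes is a hypothetical helper name, not part of the script.

```shell
# Hypothetical helper: sums the insertion and deletion counts
# found on a single --shortstat line read from standard input.
count_changes() {
	awk '{print $4 + $6}'
}

# Both counts present: field 4 is the insertions, field 6 the deletions.
echo " 3 files changed, 514 insertions(+), 509 deletions(-)" | count_changes  # prints 1023

# Only one kind of change: field 6 is missing and awk treats it as 0.
echo " 1 file changed, 1 insertion(+)" | count_changes   # prints 1
echo " 1 file changed, 2 deletions(-)" | count_changes   # prints 2
```

When a commit only adds or only removes lines, the single count still lands in the fourth field, so the sum stays correct.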

All that remains is to divide these two metrics for each commit and sort by the resulting ratio. Here’s how the whole bash script looks:

#!/bin/bash

set -e

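# Walk every non-merge commit reachable from the argument (a commit or range)
for commit in $(git rev-list --no-merges "$1"); do
	# Commit message length in characters
	length=$(git log -n 1 --format=%B "$commit" | wc -m)
	# Added plus deleted lines, taken from the --shortstat summary
	changes=$(git log -n 1 --format= --shortstat "$commit" |\
		awk '{print $4 + $6}')

	ratio=$(echo "$length/$changes" | bc -l)
	echo "$ratio $commit"
done | sort -n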

Note that names are shortened and one line is split so that the whole code block fits better on the page. The only unexplained parts are the --no-merges flag for git rev-list, which is quite self-explanatory, and the ratio calculation, which uses bc -l instead of regular shell arithmetic. This is because bash arithmetic only supports integers, and the easiest way around that, in my opinion, is to use bc with the -l (--mathlib) option, which loads the predefined math routines and sets the default scale to 20 decimal places.
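
As a quick illustration of the difference, the sketch below divides the same two numbers both ways. The values are the 251 characters and 6 lines from the example commit discussed in this post; the bc call is skipped if bc is not installed.

```shell
# Example values: a 251-character message explaining 6 changed lines.
length=251
changes=6

# Bash arithmetic expansion is integer-only, so the fraction is dropped.
int_ratio=$(( length / changes ))
echo "bash: $int_ratio"

# bc -l keeps the fractional part of the result.
if command -v bc >/dev/null 2>&1; then
	echo "bc:   $(echo "$length/$changes" | bc -l)"
fi
```

With integer division, every commit whose message is shorter than its diff would end up with a ratio of 0, which makes the ranking useless for most of the list.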

To put this script to use, I ran it on one of my university projects, namely a Language Workbench Challenge, available on GitHub. The project is relatively small, but was done in a team and has a decent number of commits. The commit that comes out on top of the chart is this one, in which I changed a total of 6 lines (one of which is the addition of an empty line) and explained it in a total of 251 characters. At the other end we have a commit with 514 additions and 509 deletions clarified in 49 characters.

Running it on a more real-world project like the express framework leads to this commit (although the script also printed a few "Runtime error (func=(main), adr=6): Divide by zero" errors along the way, presumably for commits whose shortstat reports zero changed lines). In it, the author basically changes a single line while explaining it in 437 characters.
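
One way to avoid those errors is to skip commits that have nothing to divide by. The following is only a sketch under that assumption, not what the script above does; safe_ratio is a hypothetical helper name.

```shell
# Hypothetical helper: prints length/changes, or returns 1 without
# printing anything when the line count is empty or zero.
safe_ratio() {
	local length=$1 changes=$2
	# An empty or zero line count would make bc fail, so bail out early.
	if [ -z "$changes" ] || [ "$changes" -eq 0 ]; then
		return 1
	fi
	echo "$length/$changes" | bc -l
}
```

Inside the loop, the ratio line could then become `ratio=$(safe_ratio "$length" "$changes") || continue`, which moves on to the next commit instead of printing an error. This also works with set -e, because a failing command on the left side of || does not abort the script.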

While the results of the script may not be exactly useful, it is still a good and fun exercise in bash scripting and git command line tooling.


Milan Milanov

