Automating Hg Workflow

2018-05-10

Introduction

Mercurial (Hg) has a very powerful command-line interface. This interface can be used to automate any part of your development workflow that involves interacting with an Hg repository. In this post, I will introduce you to this interface and demonstrate some of its uses by showing how I automate one of my own interactions with Hg. Our objectives for this automation are to stay DRY as much as possible, and to:

  • Not repeat mechanical stuff that could easily be automated
  • Reduce errors involved in visually looking up changeset information and typing out changeset ids while merging to/from branches (usually default)
  • Systematically generate consistent messages that can be prefixed/suffixed to our commit messages. The instances where we want messages wrapped in a template include:
    • Committing to a particular branch with some meta information about the feature, ticket id, and other tags that can be picked up by team collaboration applications such as code review tools
    • Merging to default, where we want to carry over the branch name, ticket id and any meta information available from the branch automatically
    • Merging from default (once in a while you want to catch up with default)

Mercurial Commands

Revsets

Almost all Hg commands accept one changeset or a set of changesets as an argument. When this argument is not provided, an implicit default set of changesets is used. These changeset arguments are written in a form that Hg calls a revset.

Here are some examples of using revsets with the hg log command:

  1. Using the default revset, which displays the last commit in the local repo by any user.

    $ hg log --limit 1

    changeset:   29227:448ace9d726b
    tag:         tip
    user:        Guru Devanla <g@mail.com>
    date:        Sun May 13 10:13:33 2018 -0700
    files:       core-lib/cloop.py
    description:
    experimental ast implementation
  2. To view the last 2 changesets that were committed by user devanla

    $ hg log -r 'last(user(devanla), 2)'
    changeset:   29227:448ace9d726b
    tag:         tip
    user:        Guru Devanla <g@mail.com>
    date:        Sun May 13 10:13:33 2018 -0700
    files:       external/cloop.py
    description:
    experimental cloop in python with some help from ipython


    changeset:   29223:3cc421ca1c23
    user:        Guru Devanla <g@mail.com>
    date:        Fri May 11 12:39:12 2018 -0700
    files:       package/public/table_compositor/README.rst
    description:
    update table compositor README
  3. To view the latest changesets committed by user ‘devanla’ on any non-default branch. Notice that the changes are from 2 different branches, and neither of them is default

    $ hg log -r 'last(user(devanla) and not(branch(default)) and heads(all()) and not closed(), 2)'
    changeset:   27763:7c0af63767a4
    branch:      devanla/table-compositor/1010/improve-documentation
    user:        Guru Devanla <g@mail.com>
    date:        Thu Mar 29 11:35:38 2018 -0700
    files:       package/public/table_compositor/README.rst
    description: improve documentation of table-compositor



    changeset:   24297:db5591f91f97
    branch:      devanla/external/1911/update-commit-hooks
    user:        Guru Devanla <g@mail.com>
    date:        Fri Jan 05 15:43:19 2018 -0800
    files:       external/commit-hook.py
    description: update commit hooks to handle mis-named branches

As you can see, the -r argument to the commands used above can get quite complex, but it is still easy to understand what is going on. The -r argument is what we call the revset. Reading the documentation for revsets (hg help revsets) will be helpful. While reading it, keeping the following concepts in mind will help you understand it better.

  1. Each of the functions used in the argument to -r returns a set of changesets. e.g. heads(all()) returns all heads, and last(set) by default returns a set with one changeset, the last changeset in set. These functions are also referred to as predicates.
  2. These functions can be composed with other functions to filter/narrow down the changesets. e.g. last(heads(all()), 2) returns the last 2 changesets across all heads.
  3. You can combine the results of these composed functions using set operators, since each of them returns a set of changesets. e.g. last(heads(all()), 2) and not branch(default) would filter out the head on default. The documentation lists these under operators.

With this breakdown and the Mercurial documentation, you should be able to understand how to use revsets. Take a few minutes and read through the documentation now.
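The composition rules above can be sketched in a few lines of Python. This is only an illustration and not part of Mercurial: the helper names both, negate, last and log_command are made up for this post, and the script only assembles the revset string and prints the hg log command instead of running it.

```python
# Hypothetical helpers (not a Mercurial API): they only assemble revset strings.


def both(*predicates):
    """Intersect several revset expressions with the `and` operator."""
    return " and ".join(predicates)


def negate(predicate):
    """Exclude a revset expression with the `not` operator."""
    return "not {}".format(predicate)


def last(revset, count=1):
    """Wrap a revset in last(), keeping only the final `count` changesets."""
    return "last({}, {})".format(revset, count)


def log_command(revset):
    """Argument list for `hg log -r REVSET`, suitable for subprocess.run."""
    return ["hg", "log", "-r", revset]


if __name__ == "__main__":
    # Same query as the third example above, built from small pieces.
    revset = last(
        both("user(devanla)", negate("branch(default)"),
             "heads(all())", negate("closed()")),
        2,
    )
    print(revset)
    print(log_command(revset))
```

Building the expression from small pieces like this makes it easier to reuse the same predicates across several scripts, at the cost of one more layer to read through.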

Templates

One other concept you need to learn before we talk about automation is how to customize the printing of log messages. Note that the Mercurial documentation recommends always using the hg commands to interact with local and remote repositories, rather than relying on the internal Python-level API. In addition, to automate our interactions we usually need access to one particular changeset id. To achieve this we use the --template argument (or -T).

  1. For example, to view a list of changesets as one-liners, with the short form of node ids
    $ hg log -r "last(user(devanla), 2)" --limit 2 -T "{rev}: {node|short}: {desc}\n"

    29227: 448ace9d726b: experimental cloop
    29226: 1bdfad2d0c86: function refactor

Again, a lot more information about templates is available via hg help templates or in the online documentation.

Here is the outline you will need to quickly understand how to use the templating system:

  1. Every changeset in Mercurial has a set of attributes, like changeset id, branch, user, phase etc. Each of these attributes can be accessed and printed using a pre-defined keyword. e.g. node for changeset ids, user for user, phase for phases etc.
  2. The values provided by these keywords are strings, and they can be transformed by applying filters to them. I would call them transformations, as the term filter is a bit misleading.
  3. Filters may not be sufficient in all scenarios. For example: dealing with a list of values (the files updated in a changeset) where the names have to be joined into a single string, applying conditional filtering/formatting based on the values, or applying transformations to a list of values. For such scenarios we have ‘functions’ like if, files etc.
  4. A list operator % is also available, to process the return values of expressions that return lists.
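To show why templates matter for automation, here is a minimal Python sketch that asks hg for tab-separated output and parses it into dictionaries. The tab separator, the field choice and the names TEMPLATE, parse_log and hg_log are my own choices for this example; only the keywords rev, node, branch and desc and the filters short and firstline come from Mercurial's templating system.

```python
import subprocess

# Assumption for this sketch: one changeset per line, fields separated by tabs.
FIELD_SEP = "\t"
TEMPLATE = FIELD_SEP.join(
    ["{rev}", "{node|short}", "{branch}", "{desc|firstline}"]) + "\n"


def parse_log(output):
    """Turn template output into a list of dicts, one per changeset."""
    changesets = []
    for line in output.splitlines():
        if not line:
            continue
        # maxsplit=3 keeps any tabs inside the description intact
        rev, node, branch, desc = line.split(FIELD_SEP, 3)
        changesets.append(
            {"rev": int(rev), "node": node, "branch": branch, "desc": desc})
    return changesets


def hg_log(revset):
    """Run hg log with TEMPLATE (needs hg on PATH and a repository in cwd)."""
    result = subprocess.run(
        ["hg", "log", "-r", revset, "-T", TEMPLATE],
        check=True, capture_output=True, text=True)
    return parse_log(result.stdout)
```

Inside a repository, hg_log("last(user(devanla), 2)") would return the same two changesets as the one-liner example above, but as data that a script can inspect.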

Automating Our Tasks: Example

Armed with our knowledge of revsets and templates, we can either customize our output to view logs, or just let the template return the changeset id, which in turn can be piped to other functions in our automation scripts.

Putting together some of the things we learned about revsets and templates, we can write powerful queries to interact with the Hg repository.

  1. To get the last update of a user (refer to example above)

  2. To get the last update of a user across different branches (refer to example above)

  3. Get the last update of a user with a keyword in the commit message or file name. (refer to keyword in the revset documentation for more info)

    hg log -r "last(user(devanla) and keyword('norm'))"
  4. Combine the former queries with a template to return just the node id, using -T {node}
  5. Use the node id to perform some operation, like hg up to the last commit on a non-default branch that is not merged yet
    hg log -r 'last(user(devanla) and heads(all()) and not parents(merge()) and not closed() and not branch(default))'  -T "{node|short}" | xargs hg up
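The last pipeline can also be written as a small Python function, which makes it easier to handle the case where the revset matches nothing (with some xargs implementations, empty input still runs a bare hg up). This is a sketch under my own naming: run_hg and update_to_latest_head are hypothetical helpers, and the run parameter exists only so the logic can be exercised without a repository.

```python
import subprocess

REVSET = ("last(user(devanla) and heads(all()) and not parents(merge()) "
          "and not closed() and not branch(default))")


def run_hg(args):
    """Run an hg command and return its stdout (needs hg on PATH)."""
    return subprocess.run(["hg"] + list(args), check=True,
                          capture_output=True, text=True).stdout


def update_to_latest_head(revset=REVSET, run=run_hg):
    """Resolve `revset` to one short node id and update to it.

    Returns the node id, or None when the revset matches nothing.
    """
    node = run(["log", "-r", revset, "-T", "{node|short}"]).strip()
    if not node:
        # nothing matched: do not call `hg update` at all
        return None
    run(["update", "-r", node])
    return node
```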

Automatically wrapping commit messages in a template to achieve consistency

Now, let’s make use of this knowledge to automate our workflow. This automation will especially help us apply message templates to our commit messages consistently.

This will be our objective:

Objective #1

I usually work on a feature branch. For consistency, I always name a branch using a particular format, say, {user}/{package}/{ticket-id}/{feature-desc-in-short}. For example, a branch name in this format would be devanla/core-lib/19191/refactor-normalization-in-phase-2. Each time we commit a changeset to this branch, we want our commit message to include information about the ticket id, so that it looks like this: core-lib: update norm function to take an optimization param, #19191. Note that core-lib and 19191 are strings that were extracted from the branch name and used to ‘decorate’ the original commit message.

This method of referencing ticket ids in commit messages tends to help team collaboration tools link changesets to code review requests. Note that this template is just an example. The point is that we want to be able to tag each commit with information that is available in the branch name. This provides information at the commit level and could also provide the meta-information needed by team collaboration tools.

Objective #2:

Once in a while we want to catch up with default. The typical steps would be


    hg up branch-you-are-working-on
    hg pull
    hg merge default
    hg commit -m"catching-up-with-default"

We want to automate this step.

Objective #3:

The inverse of objective #2 is that once we are done working on changes on a feature branch, we want to merge those changes to default. While merging those changes, we want the commit message of the merge to follow a template. We want this template to capture the meta-information that is available in the branch name. For example, if we want to merge the branch devanla/core-lib/19191/refactor-normalization-in-phase-2, then we want the commit message of the merge to be ‘core-lib: merged, refactor-normalization-in-phase-2, refs #19191’.

Note, that we built this commit message entirely from the branch name. We will automate this step as well.
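As a minimal sketch of that transformation, assuming branch names always follow the {user}/{package}/{ticket-id}/{feature-desc-in-short} format described earlier, the merge message can be derived like this. The names BRANCH_RE and merge_message are illustrative only.

```python
import re

# Assumed branch format: {user}/{package}/{ticket-id}/{feature-desc-in-short}
BRANCH_RE = re.compile(
    r"^(?P<user>[^/]+)/(?P<package>[^/]+)/(?P<ticket>\d+)/(?P<feature>.+)$")


def merge_message(branch):
    """Build the merge commit message purely from the branch name."""
    match = BRANCH_RE.match(branch)
    if match is None:
        raise ValueError("branch name does not follow the format: " + branch)
    # str.format ignores the unused `user` group
    return "{package}: merged, {feature}, refs #{ticket}".format(
        **match.groupdict())


if __name__ == "__main__":
    print(merge_message(
        "devanla/core-lib/19191/refactor-normalization-in-phase-2"))
```

Raising an error for a branch that does not match is a deliberate choice here: it is safer to stop than to commit a merge with a half-formed message.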

Let’s Automate

Automate objective #1

Automating objective #1 is a little tricky. We will have to perform a number of steps that we have not discussed so far. I picked up this method from this post on SO.


    import re
    import curses

    # Example: devanla/core-lib/919191/refactor-norm-function
    # group 1 is the package name (hyphens allowed), group 2 is the ticket id
    pat = re.compile(r'[a-z]+/([a-zA-Z-]+)/([0-9]+)')


    def precommit_hook(repo, **kwargs):
        # keep a reference to the original repo.commitctx
        commitctx = repo.commitctx
        branch = repo[None].branch()
        ui = kwargs['ui']
        curses.setupterm()
        suffix_message = ''
        prefix_message = ''
        if branch != 'default':
            m = pat.match(branch)
            if m:
                tlt, redmine_ticket = m.groups()
                response = ui.prompt(
                    '\nUpdate message to "%s: [YOUR MESSAGE], refs #%s"? (Y/n)' %
                    (tlt, redmine_ticket))
                # anything other than n/N counts as a yes
                if response.strip().lower() != 'n':
                    suffix_message = ', refs #{}'.format(redmine_ticket)
                    prefix_message = tlt + ': '

        def updatectx(ctx, error):
            # wrap the user's message with the prefix and suffix
            ctx._text = '{}{}{}'.format(
                prefix_message,
                ctx._text,
                suffix_message)
            return commitctx(ctx, error)

        # monkeypatch the commit method
        repo.commitctx = updatectx

Here is what the script does. We start with a branch name that looks as follows: devanla/core-lib/191921/feature-to-improve-normalization.

Now say we make some changes and commit the changeset with the following message: hg commit -m"refactoring core norm functions args".

At that point this hook gets called and transforms the message using information available in the branch name. The script shown above also asks for confirmation before transforming the message. After the hook is applied, the final message will look like core-lib: refactoring core norm functions args, refs #191921.

Note that we attached the package name and ticket id to the commit message. This will have to change depending on each person’s workflow.
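Since the hook itself can only run inside Mercurial, it helps to keep the string handling in a plain function that can be checked on its own. The following is a sketch of that idea and not the hook shown above: decorate and BRANCH_RE are names I made up, and the branch format is assumed to be {user}/{package}/{ticket-id}/{feature-desc-in-short}.

```python
import re

# Assumed branch format: {user}/{package}/{ticket-id}/{feature-desc-in-short}
BRANCH_RE = re.compile(r"^[^/]+/(?P<package>[^/]+)/(?P<ticket>\d+)(?:/|$)")


def decorate(branch, message):
    """Wrap a commit message with the package name and ticket id.

    Branches that do not follow the format (e.g. default) are left alone.
    """
    match = BRANCH_RE.match(branch)
    if match is None:
        return message
    return "{}: {}, refs #{}".format(
        match.group("package"), message, match.group("ticket"))


if __name__ == "__main__":
    print(decorate(
        "devanla/core-lib/191921/feature-to-improve-normalization",
        "refactoring core norm functions args"))
```

A hook could then call a function like this instead of formatting the prefix and suffix inline, which keeps the Mercurial-specific part of the hook very small.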

Automating objective #2

To achieve objective #2 we use shell scripting. Since I use zsh, I have outlined the sample script using zsh syntax. To automate this objective we add a function to our zsh configuration that does the following:

  1. Gets the name of the current branch
  2. Makes sure we are not already on default (since we want to merge changes from default)
  3. Asks for confirmation
  4. Merges changes from default with a default commit message. Note that since we have our pre-commit hook enabled, the final message would look as follows: core-lib: catch-up-with-default, refs #191921. It is nice that our pre-commit hook is still effective for this script as well.

# function added to .zshrc or an equivalent file that brings it into zsh scope

cuwd () {
    local current_branch=$(hg identify -b)
    echo "$current_branch"
    if [[ $current_branch = "default" ]]; then
        echo "Yikes! Sorry. \n You should NOT be on default to perform this operation"
        return 1
    fi

    echo "Merging default to current branch"
    # vared presents the text in the line editor; Enter accepts it
    local temp='Press Enter to Continue, Ctrl-C to quit'
    vared temp
    # stop here if the merge fails, so we never try to commit a failed merge
    hg merge default || return 1
    hg commit -m"catch-up-with-default"
}

Using this kind of consistent message template also helps automate future reporting, for example filtering out changesets that are routine merges.
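As a small illustration of that kind of reporting, here is a Python sketch that separates routine catch-up merges from other changesets, given (node, description) pairs such as those produced by hg log with a template. The names ROUTINE_MARKER and split_routine_merges are hypothetical. Within hg itself, a revset such as desc("catch-up-with-default") should select the same changesets.

```python
# Marker taken from the commit message used by the cuwd function above.
ROUTINE_MARKER = "catch-up-with-default"


def split_routine_merges(changesets):
    """Partition (node, description) pairs into (routine, other) lists."""
    routine, other = [], []
    for node, desc in changesets:
        target = routine if ROUTINE_MARKER in desc else other
        target.append((node, desc))
    return routine, other
```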

Automating objective #3

Now, suppose you are done working on a feature and you are ready to merge to default. Each time we merge to default, we want to provide information on what this merge is about. Again, since we have all this information in the branch name, we can have a script that will help us build a message and commit the merge to default.

For example, if we have a branch named devanla/core-lib/10101/feature-2001, then when we merge to default we can generate a message that looks like this: core-lib: merged, feature-2001, refs #10101. Note that we nicely have a short description, the project name and the ticket id in the message of the merge commit.

This again can be achieved with a shell script, powered by our knowledge of revsets and templating.


merge_latest_branch_to_default () {
      # this script only allows merging to default
      # check and quit if you are not on default
      # note: lc_last_update and lc_all are not built-in hg commands; they must be
      # defined as aliases in your hgrc (definitions not shown here)
      local current_branch=$(hg identify -b)
      echo "$current_branch"
      if [[ $current_branch != "default" ]]; then
         echo "Yikes! Sorry. \n You should be on default to perform this operation"
         return 1
      fi
      hg lc_last_update
      local changeset_display="$(hg lc_last_update --template "{rev}: {node|short}: {branch}")"
      local changeset="$(hg lc_all --limit 1 --template "{node|short}")"
      local branchname="$(hg lc_all --limit 1 --template "{branch}")"
      # devanla/core-lib/10101/feature-2001 -> core-lib: merged, feature-2001, refs #10101
      local message="$(echo "$branchname" | sed -e 's|[^/]*/\([^/]*\)/\([0-9]*\)/\(.*\)|\1: merged, \3, refs #\2|')"
      echo "Merging changeset = $changeset_display"
      local temp='Press Enter to Continue, Ctrl-C to quit'
      vared temp
      # stop here if the merge fails, so we never try to commit a failed merge
      hg merge -r "$changeset" || return 1
      echo "\n\nMerge completed, do you want me to commit, using the following message\n"
      echo "\n\n------------------------------------------------"
      echo "$message"
      echo "\n---------------------------------------------------"
      local temp='Press Enter to Continue, Ctrl-C to quit'
      vared temp
      hg commit -m"$message"
      echo "Complete"
}

Again, this script is just an example of how an automatic merge can be crafted. You will have to write your own little script to accommodate your particular workflow.

Conclusion

We developers are lazy (and proudly so), and we seek to automate many of the tasks we perform. Automation also comes with the benefit of reducing errors and providing consistent results. In this post we saw how we can leverage some of the powerful command-line features of Hg to automate our interactions with Hg as we add more features and changesets to our code base.