Python bytes: vcs


Friday, March 26, 2010

On commit messages

I would like to address the issue of commit messages. Good commit messages can make finding bugs and understanding the timeline of a project easy, and bad ones can result in an infuriating waste of time reading diffs and trying to locate information.

First of all, all commits should be atomic; that is, they shouldn't include unrelated changes. Fixing a typo or spacing while fixing a bug in related code is acceptable, but fixing 6 bugs and adding 2 features in the same commit makes it hard for people to work out later what each change was for. A good rule of thumb is that if a summary of your changes can't fit in one line, the commit is probably too big.

The first line of the commit message is the most important part. This is especially true today, when many DVCSes show only the first line of the commit by default in their log command. The summary line should succinctly summarize what your change is and what it accomplishes. It need not be a full sentence, but just a bug number or a general statement ("fix this") is not appropriate. The best summary lines quickly inform anyone browsing the log of the purpose and changes in the commit. Summary lines should also never be wrapped. Nothing is more annoying than reading a summary line which is cut off in the middle by a line break. Simple typo fixes do not require complicated messages. Good examples:

fix #2345 by preventing add() from accepting strings

fix a segfault in foo_my_bars() #4563

fix spelling

add a Python interface to the tokenizer #3222

and bad ones:

test and a fix

ugg

bah

a huge change to Foo class

why does this not work?

bug #4543
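
Some of these rules can be checked mechanically. Here is a minimal sketch of such a check in Python. The function name `summary_problems`, the pattern, and the list of vague phrases are my own assumptions drawn from the examples above, not part of any tool. It won't catch everything: "a huge change to Foo class" passes, since judging atomicity needs a human.

```python
import re

# Hypothetical helper: the checks and the phrase list below are assumptions
# based on the examples in this post, not an established standard.
BUG_ONLY = re.compile(r"^(bug\s*)?#\d+$", re.IGNORECASE)
VAGUE = {"ugg", "bah", "fix this", "test and a fix"}


def summary_problems(message):
    """Return a list of complaints about a commit message's summary line."""
    lines = message.strip("\n").split("\n")
    summary = lines[0].strip()
    problems = []
    if not summary:
        problems.append("summary line is empty")
    if BUG_ONLY.match(summary):
        problems.append("summary is only a bug number")
    if summary.lower() in VAGUE:
        problems.append("summary says nothing about the change")
    if summary.endswith("?"):
        problems.append("summary is a question, not a description")
    # A non-blank second line suggests the summary was wrapped.
    if len(lines) > 1 and lines[1].strip():
        problems.append("summary appears to be wrapped onto a second line")
    return problems
```

Something like this could run from a commit hook, but treat it as a nudge rather than a gatekeeper.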

After the summary line can optionally come a body. A blank line should always separate the summary line from the body, and the different sections of the body from one another. Bodies should also always be line wrapped. The body can include any of the following:

Bullet points describing various aspects of the change in more detail.

A paragraph explaining how something was implemented or why it's written a certain way.

A reference to mailing list discussions or decisions that led to the commit.

Authors and attributions.

Any other significant information about the commit. For example, explain how it affects external components or might result in unexpected behavior.

Some projects follow the convention of listing affected files in bullet points and describing the individual changes to each. I personally find a prose summary of the changes in the body, read alongside a diff or the verbose version of the log (which shows changed files), more helpful than this technique.

Good examples of complete commit messages:


"""
normalize encoding before opening file #3242

This change requires that tokenizer.c be linked with the Unicode
library.
"""

"""
silence foo warnings by default

Approved by BDFL in
http://mail.python.org/pipermail/mailinglist/bladh.html
"""

"""
support unicode in shlex module #4523

This is implemented by providing a separate class for Unicode and
requiring a locale to be set before parsing commences.

Patch by J. Hacker and J. Programmer
"""

"""
boost the speed of keyword argument comparisons

This improves some function calls by over 30% by comparing for
identity before falling back to the regular comparison. stringobject.c
was modified to provide faster access to a string's value.
"""
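
The layout rules (an unwrapped summary, blank lines between sections, a wrapped body) are easy to automate. Below is a rough sketch using Python's standard `textwrap` module. The function name `format_commit_message` and the 72-column width are assumptions of mine; this post doesn't prescribe a particular width.

```python
import textwrap

WIDTH = 72  # assumed wrap width; pick whatever your project prefers


def format_commit_message(summary, paragraphs=()):
    """Join a summary line and body paragraphs into one commit message."""
    if "\n" in summary:
        raise ValueError("the summary line must not be wrapped")
    parts = [summary.strip()]
    for paragraph in paragraphs:
        # Collapse existing whitespace, then rewrap the paragraph.
        # Long words (such as URLs) are left intact rather than split.
        parts.append(textwrap.fill(
            " ".join(paragraph.split()),
            width=WIDTH,
            break_long_words=False,
            break_on_hyphens=False,
        ))
    # A blank line separates the summary from the body and each section.
    return "\n\n".join(parts)


print(format_commit_message(
    "silence foo warnings by default",
    ["Approved by BDFL in "
     "http://mail.python.org/pipermail/mailinglist/bladh.html"],
))
```

Because long words are never broken, a URL longer than the wrap width simply sits on its own line.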

Monday, October 13, 2008

First impressions of darcs

This week, I've been playing around with the relatively little-known distributed version control system, darcs. (That stands for David's Advanced Revision Control System.)

Darcs is based on the theory of patches developed by its creator, David Roundy. Simply put, darcs' fundamental type is a patch: a difference between two trees.

Creating a simple repo was quick and painless with "darcs initialize". I recorded a few patches easily, and was feeling quite happy about the fast pace with which darcs went about its business. Then, I decided to review my work. Apparently, darcs has no concept of a revision number; every "commit" is just a patch. This makes selecting patches to review rather difficult since everything is relative to the current state of the repo. Perhaps this isn't a problem in practice, though, because advanced patch matching (with regular expressions) is provided. Another thing I disliked was the lack of history when merging between repos. Although pulling patches is simple to do, nothing in the log besides the author's name indicates that a patch was pulled.

Obviously, this is just a first step into the exciting darcs world; I'll continue to use it for some of my projects, and report back later.

Friday, July 25, 2008

bzr vs. hg -- a different perspective

I'd say one of my favorite parts of F/OSS is the educational value I can get from it. There's much more practical programming knowledge in Python/import.c than in your average CS textbook.

My code curiosity recently turned to my favorite (D)VCSes, Mercurial and Bazaar. (I've tried reading Subversion, but the C is just too much.) Much of the world (including me) likes to battle over their merits, but I'd like to talk about what I saw in the source. (Disclaimer: I do not claim to have a good knowledge of either project's design philosophy or why things were done the way they were.)

On the superficial side, Bazaar has a lot more code than Mercurial does: about 1 MB with 2.5 MB of tests for Mercurial, and 6 MB with 6 MB of tests for Bazaar. I'm not going to make anything of it, though, for fear of condemnation. Mercurial's non-capitalized class names also drive me a bit crazy...

Overall, Mercurial appears to be a much simpler system, true to the wisdom of "do one thing well". Its single repository format is beautifully simple. Bazaar, on the other hand, has to have layers of abstraction in order to make access over many protocols and different repository, branch, and working tree formats possible. In exchange for this complexity, it gains much flexibility, which allows it to be a hybrid distributed and centralized VCS. Bazaar's source code also has many more comments and docstrings. Mercurial's looks rather bare in comparison.

Bazaar and Mercurial both seem to implement dirstate the same way. Both also have C extensions intended to speed them up.

In the end, they both work well and are a pleasure to use, so I'm not going to complain.