Author Archive for Rock

Tags – Subversion Conversion to Mercurial, Part 3

Well, if converting the trunk was a necessary appetizer, and converting branches was the meat and potatoes of Subversion to Mercurial conversion, then surely converting the tags is the dessert. We now understand all of the basics of converting the trunk and branches of a Subversion repository to Mercurial. All we need to do is pull over the tags that label specific revisions. Before we dive into dessert, let’s admire it a little by examining how tags are stored in both Subversion and Mercurial.

What Are Tags?

First, in Subversion, revisions are stored as a set of pointers to the files that make up a revision, so a tag is just a set of pointers to files at specific states in their history. A tag is created by copying the file pointers we want to another location in the repository directory structure. Of course, because this is a generic mechanism, tags might not only be tags. They could be branches, if further changes were made to the copied files. Additionally, a tag need not include all of the files in the trunk or in a branch.

Mercurial, on the other hand, records a tag as just a name associated with a revision id. That revision id specifies the entire state of the repository at that time, so by definition a tag includes all files in a repository. Additionally, tags in Mercurial are versioned, as any other file is. This means that your repository won’t know about tags created in another repository unless you’ve pulled the changes that create those tags, even though you may have the revisions specified by the tags.
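To make the Mercurial side concrete, here is a minimal sketch of reading tag entries. It assumes the usual .hgtags layout of one "node id, space, tag name" pair per line, with a later line for the same tag overriding an earlier one. The function name and the sample ids are made up for illustration.

```python
def parse_hgtags(text):
    """Return {tag name: node id} from .hgtags-style text (assumed layout)."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The node id comes first; the rest of the line is the tag name,
        # which may itself contain spaces.
        node, name = line.split(" ", 1)
        tags[name] = node  # a later entry for the same tag wins
    return tags


# Made-up ids; the third line moves release-1.0 to a newer revision.
sample = (
    "1111111111111111111111111111111111111111 release-1.0\n"
    "2222222222222222222222222222222222222222 release-1.1\n"
    "3333333333333333333333333333333333333333 release-1.0\n"
)
tags = parse_hgtags(sample)
```

Because the file is versioned like any other, a clone only sees the entries contained in the changesets it has actually pulled.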

Generically

As you can imagine, these fundamental differences make converting tags a very ambiguous process. Not any more ambiguous than converting branches, of course, but not any less either. The generic part of the algorithm that hg convert uses for converting tags, or the work done no matter what the type of the source or destination, is fairly straightforward. The converter asks the source converter object for the tags, as a simple dictionary from the tag name to the revision id. The source revision ids are then mapped to destination revision ids, as long as that revision wasn’t skipped in the conversion process, e.g. by a file mapping that excluded it. The destination converter object then records the tags in the destination repository. The converter then updates the revision map, so that future conversions parent new children to the revision that creates the tags, rather than its parent. This avoids the problem of branching to create the tags. It also means that all tags are created after all other revisions have been converted into the new repository, which may seem odd. Unfortunately, it’s the least ambiguous way to convert the tags.
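The mapping step can be sketched in a few lines. This is an illustration only, not the extension's code: the SKIP marker, the function name and the revision ids are assumptions chosen for the example.

```python
SKIP = "SKIP"  # hypothetical marker for revisions dropped during conversion


def map_tags(source_tags, revmap):
    """Translate {tag: source rev id} into {tag: destination rev id}."""
    dest_tags = {}
    for name, src_rev in source_tags.items():
        dest_rev = revmap.get(src_rev, SKIP)
        if dest_rev == SKIP:
            continue  # the tagged revision never made it into the destination
        dest_tags[name] = dest_rev
    return dest_tags


# r20 was skipped (say, by a filemap) and r30 was never converted at all.
revmap = {"svn-r10": "aaa111", "svn-r20": SKIP}
source_tags = {"v1.0": "svn-r10", "v1.1": "svn-r20", "v2.0": "svn-r30"}
converted = map_tags(source_tags, revmap)
```

Only v1.0 survives here, because it is the only tag whose revision exists in the destination.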

Finding the tags

Of course, this high-level description abstracts away the real work – getting the tags from the Subversion repository and putting them into the Mercurial repository. So let’s dive into the Subversion side first. Because of the way tags are recorded in Subversion repositories, as well as all of the non-tag things you can do with and to the files there, this is a rather tricky process, that can easily miss tags if the Subversion repository is in any way unconventional. Additionally, the algorithm may change if someone comes up with better heuristics for determining the tags in a Subversion repository. This comment before the algorithm for getting tags sums things up well:

# svn tags are just a convention, project branches left in a
# 'tags' directory. There is no other relationship than
# ancestry, which is expensive to discover and makes them hard
# to update incrementally.  Worse, past revisions may be
# referenced by tags far away in the future, requiring a deep
# history traversal on every calculation.  Current code
# performs a single backward traversal, tracking moves within
# the tags directory (tag renaming) and recording a new tag
# everytime a project is copied from outside the tags
# directory. It also lists deleted tags, this behaviour may
# change in the future.

The Subversion converter object goes through the svn log from the latest revision to the revision specified as the start revision in the converter config section (defaults to 0). From each log entry, it finds changes that are copies from one location to another, and sorts those copies from more specific to more general. Then it looks to see if the most generic copy is actually copying the tags directory itself. If so, we make sure that as we continue backward through the log we start looking in the old tags directory, rather than where it was moved to. Then, for each copy we check to see if the file(s) copied are copied into the tags directory.

After that check, we then look through all the copies seen so far, to see if the current copy is just a rename of an existing tag. If so, we update our record of the tag. The Subversion converter object then checks each of the files added in the revision against the pending tags. If any added file creates a tag that mixes files from different branches of the repository (that is, files from the trunk as well as files from a branch), then the tag is discarded, since it cannot be represented in Mercurial.

Finally the source converter object goes through each pending tag, and determines the name. If the tag is a rename of another tag, it leaves it in the pending list, and continues to the next tag. Next it gets the revision id for the tag, by looking at the source revision from which the tag was created. Finally, if it hasn’t yet been added to the official tags list, it is.
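A heavily simplified sketch of that backward traversal follows. It only handles copies directly into the tags directory and renames within it, and ignores moved tags directories and mixed-branch tags. The log structure, paths and function name are assumptions for illustration rather than the converter's real data model.

```python
def find_tags(log_newest_first, tags_dir="/tags"):
    """Walk the log backwards and return {tag name: (source path, source rev)}.

    Each log entry is (revnum, copies), where copies is a list of
    (dest_path, source_path, source_rev) tuples.
    """
    prefix = tags_dir + "/"
    tags = {}
    # Maps a path inside the tags directory to the tag name it ends up with.
    pending = {}
    for revnum, copies in log_newest_first:
        for dest, src, src_rev in copies:
            if not dest.startswith(prefix):
                continue  # not a copy into the tags directory
            name = pending.pop(dest, dest[len(prefix):])
            if src.startswith(prefix):
                # A rename inside the tags directory: keep following the
                # older path, but remember the final tag name.
                pending[src] = name
            elif name not in tags:
                tags[name] = (src, src_rev)
    return tags


log = [
    (12, [("/tags/1.0-final", "/tags/1.0-rc", 11)]),  # tag renamed
    (11, [("/tags/1.0-rc", "/trunk", 10)]),           # tag created
    (8, [("/tags/0.9", "/trunk", 7)]),
]
found = find_tags(log)
```

The renamed tag is reported under its final name but still points at the trunk revision it was originally copied from.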

Tagging the new repository

The work to put these tags into the new repository is much simpler. But first, one limitation of the tags conversion code is that all converted tags must be placed in a single cloned branch. This limitation exists because tags are not converted in line with other revisions, so the challenge of ensuring that changes to the .hgtags file are in all the right cloned branch repos is a tricky one. Each cloned branch could determine which tags apply to it, but then each cloned branch would have a different set of changes to the .hgtags file at its tip after the conversion, all of which would have to be merged when doing merges between these cloned branches.

So first, the destination converter object determines in which repository to place the updated tags. This is only necessary if cloned branches were requested (the convert.hg.clonebranches setting, typically passed with --config on the command line); otherwise there is only one repository to put them in. It then loads the old tags from the .hgtags file, and creates the full list of entries to be placed in that file from the tags dictionary retrieved from Subversion. If no new tags have been created, it returns. If some have, it saves the full list of entries to the file and commits the change to the repository.
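The "return early unless something changed" logic might look roughly like this. It is a sketch assuming the "node id, space, tag name" line layout of .hgtags; the function name and the short ids are hypothetical, and the actual commit is left out.

```python
def updated_hgtags(old_text, new_tags):
    """Return the new .hgtags text, or None when nothing changed.

    new_tags maps tag name -> destination node id.
    """
    old_lines = old_text.splitlines()
    seen = set(old_lines)
    added = [
        "%s %s" % (node, name)
        for name, node in sorted(new_tags.items())
        if "%s %s" % (node, name) not in seen
    ]
    if not added:
        return None  # every tag is already recorded, so no commit is needed
    # Existing entries keep their order; new entries are appended.
    return "\n".join(old_lines + added) + "\n"


old = "aaa v1.0\n"
unchanged = updated_hgtags(old, {"v1.0": "aaa"})
changed = updated_hgtags(old, {"v1.0": "aaa", "v1.1": "bbb"})
```

Appending rather than rewriting keeps the "last entry wins" reading of the file intact.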

Summing Up

Now we’ve gone over all the basics of converting a repository from Subversion to Mercurial: the trunk, the branches, and the tags. But we’ve just barely touched on the many different tricks of converting repositories, and cleaning them up after the fact. Tools like svnadmin dump with svndumpfilter, the mq extension to Mercurial, hg histedit, and the hgsubversion extension, not to mention the option of going through another VCS such as Git along the way, are all worth exploring when you run into issues with conversion. Though I don’t have concrete plans for writing about each of these, I will occasionally share tips and tricks as I learn about them. In the meantime, happy coding!

Branches – Subversion Conversion to Mercurial, Part 2

After reading about how the hg convert extension can convert the trunk of a Subversion repository to Mercurial, you’re probably thinking: “But we have more than just a single line of development! We branch our code! We merge it! We tie it into knots! It’s like a great monster!” Of course it is. You wouldn’t be good developers if it weren’t. If a simple trunk is like a single snake, unbroken from head to tail, then any actively developed repository is more like the Hydra, with more heads than you can count, and poisonous breath besides. Even worse, the Hydra is immortal, and so cannot be killed – sounds like a legacy codebase to me! Heracles couldn’t kill the Hydra, but he could bring it into submission, and put it to his own uses.

So let’s look at how hg convert can take the branches in a Subversion repository and bring them into a Mercurial repository, thus taming the beast.

Finding the Hydra’s Heads

Getting the head revision when we were just dealing with the trunk was easy – just find the latest revision under the trunk’s path. But now we’re dealing with the Hydra – we need to keep track of many heads, one for each branch. The source converter object (remember that one? It’s Subversion specific) does this work. After getting the head of the trunk, it lists the contents of the branches directory. This directory is either detected as a child of the source URL passed into the hg convert command, or it is specified in the convert configuration section, typically on the command line using the --config parameter. For each child of the branches directory, it checks whether it’s a directory, and finds the latest revision in that directory. As long as the latest revision wasn’t the one that created it (i.e. it’s a branch with no changes in it), it adds that latest revision to the set of heads.
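As a rough sketch of that head-finding pass, with the Subversion lookups replaced by an in-memory dictionary (the data layout and function name are assumptions for illustration):

```python
def find_heads(trunk_head, branches):
    """Return the head revisions for the trunk and its branches.

    branches maps name -> (is_dir, created_rev, latest_rev), standing in
    for what a listing of the branches directory would report.
    """
    heads = [trunk_head]
    for name in sorted(branches):
        is_dir, created_rev, latest_rev = branches[name]
        if not is_dir:
            continue  # a stray file in the branches directory
        if latest_rev == created_rev:
            continue  # the branch was created but never changed
        heads.append(latest_rev)
    return heads


branches = {
    "feature-a": (True, 5, 9),
    "empty": (True, 7, 7),      # no commits after creation
    "README": (False, 3, 3),    # not a directory
}
heads = find_heads(12, branches)
```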

As with the trunk, we then need to follow the parents of each head back to track down all of the revisions that need to be included in the convert. And as with the trunk, if hg convert has been run before against the same source, then we only track back till we find changes that have already been converted into the destination repository.

Sorting Things Out

The biggest challenge in fighting the Hydra isn’t the poisonous breath – Heracles overcame that with a simple cloth over his mouth and nose. No, it’s that when you cut off one of its heads, it grows back two more. So one of the first things you need to know is whether its heads will grow back in parallel or one after the other. When converting Subversion repositories to Mercurial, this concept corresponds to the sort order.

Now that we’re going to be converting multiple branches, the order in which we sort the changes to be imported is important. The hg convert extension offers three types of sorts: branchsort, datesort, and sourcesort. We can ignore sourcesort, because that only applies when importing from a Mercurial repository. The default for Subversion repositories is branchsort. This means that when importing from Subversion, the algorithm essentially sorts them as a depth first search. It imports one branch all the way to its head revision, then goes back and imports the next branch. In other words, the Hydra grows back one head, then grows a second. This is in contrast to datesort, which imports each revision in date order, or rather, the hydra’s heads grow in parallel. Datesort is more intuitive, and leads to repositories that are organized the way we expect them to be, with development going on in the trunk and in branches at the same time. Branchsort, however, actually creates smaller Mercurial repositories, because the diffs to files tend to be much smaller when they’re organized by branch, rather than intermingled.
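To see the two orderings side by side, here is a toy topological sort with a pluggable choice of the next revision. It is a simplification written for this post, not the extension's implementation: the "follow the revision just emitted" rule is only an approximation of branchsort, and all names and dates are invented.

```python
def toposort(parents, dates, mode="branchsort"):
    """Order revisions so that every parent precedes its children.

    parents maps rev -> list of parent revs; dates maps rev -> timestamp.
    """
    children = {rev: [] for rev in parents}
    for rev, ps in parents.items():
        for p in ps:
            children[p].append(rev)
    ready = [rev for rev, ps in parents.items() if not ps]  # the roots
    order, done, last = [], set(), None
    while ready:
        if mode == "datesort":
            nxt = min(ready, key=lambda r: dates[r])  # oldest first
        else:
            # branchsort: prefer a child of the revision just emitted, so
            # one line of development is followed all the way to its head.
            follow = [r for r in ready if last in parents[r]]
            nxt = (follow or ready)[0]
        ready.remove(nxt)
        order.append(nxt)
        done.add(nxt)
        last = nxt
        for child in children[nxt]:
            if all(p in done for p in parents[child]):
                ready.append(child)
    return order


# A trunk (t1, t2, t3) with a branch (b1, b2) forked from t1.
parents = {"t1": [], "t2": ["t1"], "b1": ["t1"], "t3": ["t2"], "b2": ["b1"]}
dates = {"t1": 1, "b1": 2, "t2": 3, "b2": 4, "t3": 5}
by_branch = toposort(parents, dates, "branchsort")
by_date = toposort(parents, dates, "datesort")
```

With branchsort the trunk is finished before the branch starts; with datesort the two lines of development are interleaved.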

Given that disk space is cheap, you will be happiest if you specify datesort, and only rerun hg convert with branchsort if you experience any problems with the resulting repository size.

Hacking at the Hydra

Heracles defeated the Hydra by cutting off its heads, then having his nephew cauterize the wounds so that new heads would not grow back. Finally he placed its one remaining immortal head under a heavy rock, trapping it.

The trick to defeating the Hydra is to limit the number of heads you’re dealing with. Unfortunately, normal conversions from Subversion to Mercurial will often leave more heads than expected. Remember, this is the Hydra – we should expect more heads than expected. In this case, it is because the convert extension cannot recognize Subversion merges. Doing so is a tricky problem because Subversion merges are so flexible. So after the conversion, a merge doesn’t produce a nice revision with two parents. Rather, it looks like any other revision with one parent, while the other parent is left dangling. Because Subversion branches are often closed when they are merged back into trunk development, this means that we’re leaving an extra head in the converted repository that need not be there. Even if the Subversion branch was used further after the merge, we’ve still lost an important piece of history by not recording the revision as a merge in the new repository. Fortunately, the convert extension does provide a manual workaround to this limitation: the splicemap. Like Heracles’ firebrand-wielding nephew, the splicemap can safely eliminate the Hydra’s heads. The splicemap does this by allowing you to specify the parents of any given revision.

As you might imagine, you can easily shoot yourself in the foot with this ability. You could rearrange revisions in any number of ways, creating an odd tree, or switching revisions around in ways that significantly increase the size of the converted repository. But rather than hacking up and grafting together a totally new Hydra, let’s just use it to get rid of a few of the Hydra’s heads. Its most obvious benefit is to specify the two parents of any merge operation, in most cases eliminating one head from the converted repository. It can also be used to bring together two disparate lines of development, which may occasionally be useful, e.g. when you realize that two separate repositories should really be combined into one.

The hg convert extension implements the splicemap using a simple lookup of revisions based on the ids specified. It then replaces the parents on a commit that has been retrieved from the source converter object, before having the destination converter object put the commit into the destination repository.

One trick to using the splicemap is understanding the revision format used in the splicemap file. For subversion repositories, it is important to get this right, or it will be as if the splicemap hadn’t even been specified. A subversion repository has its revision in the splicemap formatted like so:

svn:<uuid>/path/to/module@<revnum>

So for example, revision 931750 in the trunk of the official subversion repository would be specified like this:

svn:13f79535-47bb-0310-9956-ffa450edef68/subversion/trunk@931750
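As a rough illustration, revision ids in this format and a splicemap entry for a merge could be generated like this. The sketch assumes the splicemap layout of a child id, a space, and one or two comma-separated parent ids. The helper names, the parent revision number and the branch path are hypothetical.

```python
def svn_revid(uuid, path, revnum):
    """Build a revision id in the svn:<uuid>/path@<revnum> form."""
    return "svn:%s/%s@%d" % (uuid, path.strip("/"), revnum)


def splicemap_line(child, parents):
    """Format one entry: the child id, a space, comma-separated parents."""
    return "%s %s" % (child, ",".join(parents))


def parse_splicemap(text):
    """Return {child id: [parent ids]} from splicemap text."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        child, rest = line.split(None, 1)
        result[child] = [p.strip() for p in rest.split(",")]
    return result


uuid = "13f79535-47bb-0310-9956-ffa450edef68"
merge = svn_revid(uuid, "subversion/trunk", 931750)
# Hypothetical parents: the previous trunk revision and a branch head.
p1 = svn_revid(uuid, "subversion/trunk", 931749)
p2 = svn_revid(uuid, "subversion/branches/example", 931700)
line = splicemap_line(merge, [p1, p2])
spliced = parse_splicemap(line)
```

Generating the ids from one helper avoids the silent failure mode described above, where a slightly malformed id simply never matches.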

How Many Hydras?

In taming the Hydra of a subversion repository, we have an option that Heracles did not have: rather than having to eliminate Hydra heads by chopping them off and cauterizing the wounds, it’s as if Heracles could chop the Hydra up into n Hydras, each with one head, so it’s really more like a snake. Instead of a many headed Hydra, we end up with a bunch of one headed snakes. Indiana Jones might not like that solution, but it’s much easier for a hero like Heracles to tackle, because he can just deal with one at a time.

Likewise, instead of removing heads using the splicemap, we can split the repository into multiple repositories, each with one head. To understand the differences between having a bunch of named branches in one repository and having a separate repository for each branch, it is helpful to read Steve Losh’s excellent guide to branching in Mercurial.

If you decide to just create named branches in the destination repository, the source converter object records the branch that a given revision is on, and the destination converter object creates a commit with that named branch. Nothing too spectacular here.

Creating cloned branches, where there is a separate Mercurial repository for each Subversion branch, takes a little more work. For you, the little more work is just to enable the clonebranches option on the command line (for example with --config convert.hg.clonebranches=True). The converter, for its part, needs to make sure that each revision goes into the right repository. First, when copying each revision from the source to the destination, the converter object finds the branches that a revision’s parents are on. It then tells the destination converter object which branch the child revision is on, as well as the branches that the parent revisions are on. The destination converter object first selects the correct repository to commit the revision to. If it doesn’t exist yet (i.e. the revision is the first in this branch), then the destination repository is created. Then it needs to make sure that the destination repository has all of the revisions leading up to the one being copied, so it pulls all of the appropriate revisions in from each of the parents’ branches. Finally it is ready to commit the child revision to the appropriate branch repository.
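The bookkeeping can be modelled with plain sets, one per cloned repository. This is a toy model under stated assumptions (repositories as sets of revision names, a precomputed ancestors table, a made-up function name); the real converter creates and pulls between actual Mercurial repositories.

```python
def commit_to_clone(repos, ancestors, rev, branch, parent_branches):
    """Place rev in the clone for its branch, pulling parents in first.

    repos maps branch name -> set of revisions in that clone.
    ancestors maps rev -> set of all of its ancestors.
    parent_branches maps each parent rev -> the branch it lives on.
    """
    repo = repos.setdefault(branch, set())  # create the clone on first use
    for parent, parent_branch in parent_branches.items():
        if parent_branch != branch:
            # Pull the parent and its history from the other clone.
            needed = ancestors[parent] | {parent}
            repo |= needed & repos[parent_branch]
    repo.add(rev)


repos = {}
ancestors = {"t1": set(), "t2": {"t1"}, "b1": {"t1", "t2"}}
commit_to_clone(repos, ancestors, "t1", "default", {})
commit_to_clone(repos, ancestors, "t2", "default", {"t1": "default"})
# b1 starts a new branch from t2, so its clone needs t1 and t2 as well.
commit_to_clone(repos, ancestors, "b1", "feature", {"t2": "default"})
```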

Cleaning up the mess

Fighting the Hydra can be messy. Here are some things we can do to clean up once we’re done. One thing to note when doing Subversion to Mercurial conversions is that you’ll want to eliminate empty revisions. Typically, these occur because the Subversion revision either only changes Subversion properties (and not files), or because it only creates the directory at the root of the trunk or one of the branches. They can also occur when a filemap excludes every file a revision touches. However, the hg convert extension only tries to eliminate empty revisions when a filemap is specified. So the rule when doing conversions should be to always specify a filemap file, even if you just leave it empty. This makes sure that the hg convert extension still tries to eliminate empty revisions.

You may also want to eliminate branches from the history, preserving only the merge commit. The easiest way to do this is to not use the splicemap to merge the branch into development, and then strip that branch from the mercurial repository after the conversion is done. The trunk will still have the proper changes from the revision that performed the merge, but all of the history of how that revision was created in the branch will be gone.

Victorious

Hopefully, that’s enough information to both understand how conversion of branches works, as well as to successfully convert the unique repositories you’ve got on hand. Once you’re done you’ll still have to deal with the immortal Hydra, i.e. all your existing code. But hopefully you’ve made things more manageable along the way, making it easier to leverage all the goodness in that code. Like Heracles, it should now be possible to go forth and conquer other monsters using the Hydra’s venomous poison.

The Trunk – Subversion Conversion to Mercurial, Part 1

So you’ve got a bunch of code, the key to your company’s future, and like a good developer you’re keeping it under source control using Subversion. But you’ve heard about these new distributed version control systems, like Mercurial, and after doing some research you’ve decided to take the plunge. But now you face a challenge: how to get all that juicy code into a Mercurial repository?

Never fear! The wonderful Mercurial developers have created an excellent extension that can convert Subversion repositories to Mercurial. Because I work on the Kiln tool to import Mercurial repositories from existing code in a different source control system, I’ve been working to understand more about how the whole conversion process works. To get a good basis of understanding, let’s first look at how the extension will import a single line of development – the trunk. This is reasonable in cases where there are no branches or where all branches have already been merged into trunk, and you don’t care about which changes were made on which branches. I’ll explore the process of converting tags and branches in future posts.

A Generic Algorithm

self.ui.status(_("scanning source...\n"))
heads = self.source.getheads()
parents = self.walktree(heads)

The convert extension is designed to be a generic converter from many different repository types. The overall convert algorithm is handled at this generic level, while the details of retrieving specific revisions, files, and tags are handled by converter source objects that are specific to the source repository type. A destination converter object (in this case Mercurial-specific), does the work of writing the revisions, files, and tags to the new Mercurial repository.

So you fire up your trusty shell, and kick off:

hg convert svn://path/to/your/svn/repository --datesort

After parsing the command line, the converter creates the source and destination objects. We didn’t specify a filemap, but if a filemap had been specified, the converter would then wrap the source object, which is the subversion converter object, with a filemap source object. This filemap object uses the filemap file to adjust file paths before they are passed to or returned from the source repository object.

When you execute a typical hg convert command, the first output line you’ll see is this:

scanning source...

When this appears, the converter begins by asking the source object to get the heads, or the latest revisions, of that repository. Because we’re ignoring branches for now, the Subversion converter object will just get the latest revision under the trunk. After getting this revision, it walks backwards through the revisions until it reaches the beginning. As the converter retrieves each revision from the source converter object, it caches it and creates a map from the revision to a list of its parents. In the case of a single Subversion trunk, each revision will only have one parent.
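The backward walk amounts to a simple graph traversal. The following is a minimal sketch rather than the extension's own walktree: get_parents stands in for asking the source converter for a commit, and the already_converted argument is an assumption covering repeat runs.

```python
def walktree(heads, get_parents, already_converted=()):
    """Walk back from the heads, returning {rev: [parent revs]}.

    get_parents is a stand-in for fetching a commit from the source.
    """
    parents = {}
    visit = list(heads)
    while visit:
        rev = visit.pop()
        if rev in parents or rev in already_converted:
            continue  # seen before, or handled by an earlier run
        parents[rev] = get_parents(rev)
        visit.extend(parents[rev])
    return parents


# A three-revision trunk: each revision has exactly one parent.
history = {3: [2], 2: [1], 1: []}
full = walktree([3], history.__getitem__)
incremental = walktree([3], history.__getitem__, already_converted={1})
```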

Sorting the revisions

self.ui.status(_("sorting...\n"))
t = self.toposort(parents, sortmode)
num = len(t)

The converter needs to process the revisions in the right order. The hg convert command gives three sorting options: datesort, branchsort, sourcesort. The sourcesort option is not available when converting from Subversion. To perform either of the other sorts, the converter first creates a children map from the parents map, as well as a list of the roots, or revisions without parents. For our trunk-only conversion there will only be one root revision. Starting at this root revision, the converter chooses the next revision based on the ordering type. Then it adds any revisions whose parents are all in the ordering to the list of possibly next revisions, from which the next revision is chosen. For a subversion trunk-only conversion, there will only ever be one revision to choose from, regardless of the sort order. Therefore, I’ll discuss the differences between datesort and branchsort in part 2, on converting branches.

Importing changes

def copy(self, rev):
	commit = self.commitcache[rev]
	files, copies = self.source.getchanges(rev)
	parents = [self.map[p] for p in commit.parents]
	newnode = self.dest.putcommit(files, copies, parents, commit,
	                              self.source, self.map)
	self.source.converted(rev, newnode)
	self.map[rev] = newnode

Now that we’ve got this sorted list of revisions, the converter can start the process of converting each one individually. It does this by retrieving the appropriate changes from subversion and copying them to Mercurial.

When it initially walked the tree of changes, the subversion converter object stored the paths of files in each revision as well as the parent revisions. Because we’re looking at a trunk-only conversion, each revision will only ever have 1 parent. As the conversion proceeds, each of these revisions has the paths expanded. Each path is checked to see if it is a file, a directory, or a deleted item. File paths are recoded appropriately. Paths representing directories are expanded to include all files in the directory at that revision, and records of copied files and directories are also stored.

The Mercurial converter object then goes through the files and copies and retrieves the contents of each file from the subversion converter object. It uses the file contents to create the revision to be committed to the destination repository. That’s it. Your conversion is all done! Or is it? What if someone makes more changes to the subversion repository after you already performed the conversion?

Multiple hg convert runs

# Record converted revisions persistently: maps source revision
# ID to target revision ID (both strings).  (This is how
# incremental conversions work.)
self.map = mapfile(ui, revmapfile)

The hg convert extension supports multiple executions against the same source and destination repositories. This can be useful if you did one run of hg convert, and then later wanted to pull in further development from your subversion repository. This feature is primarily made possible by the revmap, a file that hg convert saves in the destination’s .hg directory. The revmap is just a simple map from revision ids in the source repository to revision ids in the destination repository. The hg convert extension reads this revmap in (if it exists) before beginning conversion. It uses the revmap to determine which revisions have already been converted, and accordingly begins with revisions that come after those already converted. One option, when running hg convert, is to specify where the revmap is – or where to save it if this is the first run against a given repository.
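A sketch of how such a map could drive an incremental run is shown below. It assumes one "source id, space, destination id" pair per line; the function names and the ids are invented for the example.

```python
def load_revmap(text):
    """Parse revmap text into {source rev id: destination rev id}."""
    revmap = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last space, in case the source id contains spaces.
        source, dest = line.rsplit(" ", 1)
        revmap[source] = dest
    return revmap


def pending(sorted_revs, revmap):
    """Keep only the revisions that an earlier run has not converted."""
    return [rev for rev in sorted_revs if rev not in revmap]


revmap = load_revmap("svn-r1 aaa111\nsvn-r2 bbb222\n")
todo = pending(["svn-r1", "svn-r2", "svn-r3", "svn-r4"], revmap)
```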

Another trick to consecutive hg convert runs is the authormap. The authormap is a file that allows you to change author names when converting from Subversion to Mercurial, which can be quite useful if you want to add additional information to Mercurial users, such as email addresses. The authormap, like the revmap, is stored in the destination .hg directory. On subsequent hg convert runs, this file is read in and used if no authormap is specified. If there is both an authormap specified on the command line and one in the destination .hg directory, the two are merged, with the one on the command line winning whenever there is a discrepancy.
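The merge rule is easy to express in code. This sketch assumes authormap lines of the form "source author = destination author"; the function names and the sample authors are made up.

```python
def parse_authormap(text):
    """Parse 'source author = destination author' lines into a dict."""
    authors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        src, dst = line.split("=", 1)
        authors[src.strip()] = dst.strip()
    return authors


def merge_authormaps(stored, command_line):
    """Combine both maps; the command-line entries win on conflict."""
    merged = dict(stored)
    merged.update(command_line)
    return merged


stored = parse_authormap("jdoe = Jane Doe <jane@example.com>\n")
given = parse_authormap(
    "jdoe = Jane Q. Doe <jane@example.com>\n"
    "bsmith = Bob Smith <bob@example.com>\n"
)
authors = merge_authormaps(stored, given)
```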

How the filemap works

fmap = opts.get('filemap')
if fmap:
	srcc = filemap.filemap_source(ui, srcc, fmap)
	destc.setfilemapmode(True)

One last aspect of conversion deserves consideration – the filemap. Implementation of the filemap uses an interesting design. The code for handling the filemap is in a filemap converter object, much like the subversion converter object. This filemap converter wraps the subversion converter and does the mapping in a way that both the subversion converter and the hg converter can be oblivious to its presence.

The filemap converter object handles two major pieces of functionality. First, it takes care of renaming files. The renaming of files is done by a filemapper, which keeps a map of from and to filenames. Whenever filenames are passed to or from the converter object, it does the mapping necessary.

The more interesting challenge is determining which files and revisions should actually be included in the conversion. First, the filemap converter checks to see if a given revision includes any files that are included in the filemap. If so, then the revision needs to be converted. But the revision also needs to have its parent updated to the correct revision. In Subversion, the parent of a given revision is simply the previous revision. Of course, that revision might not include any files in the filemap, and so be discarded during conversion. So the filemap converter also needs to reparent the new revision to the last included revision.
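For a linear trunk, the filtering and reparenting can be sketched as follows. The data layout, the include predicate and the function name are assumptions for illustration; the real filemap converter works on commit objects and handles branching history as well.

```python
def filter_revisions(revisions, included):
    """Drop revisions touching no included files and reparent the rest.

    revisions is an ordered list of (rev, files), each one implicitly the
    child of the revision before it. included is a predicate on a path.
    Returns a list of (rev, new_parent, kept_files).
    """
    kept = []
    last_kept = None
    for rev, files in revisions:
        wanted = [f for f in files if included(f)]
        if not wanted:
            continue  # nothing of interest: the revision is discarded
        kept.append((rev, last_kept, wanted))
        last_kept = rev
    return kept


revisions = [
    (1, ["src/a.py"]),
    (2, ["docs/x.txt"]),              # touches only excluded files
    (3, ["src/b.py", "docs/y.txt"]),
]
result = filter_revisions(revisions, lambda p: p.startswith("src/"))
```

Revision 3 ends up parented to revision 1, because revision 2 no longer exists in the converted history.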

Coming Soon …

This algorithm at its root is quite simple. But understanding what is going on in the simple case is essential to understanding what is happening when we make it more complicated with branches, tags, and the options associated with them. Part two will be a detailed look at how branches are converted from Subversion to Mercurial.

Commitments

In my post on how to care, I committed to start blogging more regularly, rather than sporadically. I began last week by spending an hour on my commute each day doing writing and research for the post I published Saturday. On Sunday I took a break from the entire internet. I plan to continue with that schedule. This coming week I’ll be eliminating some time-wasting websites from my life and replacing them with reviewing and acting on my next actions lists, including one for improving my blog. However, because I don’t want my blog to be primarily about blogging, or about my own personal growth, I’m not going to be publicly committing and reporting on my commitments here in the future. For those who care, they can see my current efforts on my public Daytum page.

Mercurial will make you a better developer

Since starting at Fog Creek, I’ve been learning about Mercurial from day one, since I’m working on Kiln. It was a big change from my work at Microsoft, where we used a VCS that was much closer to the Subversion model than the Mercurial model. One of my areas of focus in Kiln has been the import tool for teams migrating from Subversion. As I’ve tried to wrap my head around Subversion, Mercurial, and converting between the two, I’ve started to realize that many of the cultural differences between the two communities stem from basic technical strengths and weaknesses between the two products. Feel free to substitute Git for Mercurial, if that’s your cup of DVCS tea.

You could argue that the cultural differences led to the technical differences between the two camps. I suppose that’s probably true for the earliest contributors to the products, but it’s more likely that the technical strengths and weaknesses of each product appealed to those who naturally thought in certain ways, thus leading to the natural congregation of people with similar outlooks on creating software.

But enough of that. On to the differences.

Single project repositories vs. multiple project repositories

One change you’ll run up against, which was initially quite disconcerting for me, is that each project in Mercurial is contained in its own repository. I was used to one huge repository with different subdirectories for different projects. Consequently, because the code for Kiln is broken up into 5-10 separate repositories, I’ve spent the last few months asking others on my team if it wouldn’t be better to just combine some or all of our repositories. About once a month. I admit that I still think some combining would be good, but I’m beginning to understand more fully the mindset that leads to lots of small repositories.

This difference is one of the easiest to trace to basic differences in how the products work. Mercurial is much more narrowly focused, as a product, than Subversion is. Mercurial is all about tracking changes to a set of files. Subversion is all about tracking changes to each file separately. Mercurial tracks some repository wide information, such as branches, tags, and repository settings. Subversion allows you to branch, tag, and set properties on the whole repository or any subdirectory, or any random unrelated set of files, if you so desire.

Of the two, Subversion is far more general purpose in nature. It tracks changes to each file and directory separately, only keeping an overall revision number that tracks the chronological order of changes. Because of that, there are many features in Subversion that allow you to operate on a portion of the repository. You can check out a specific subdirectory, map files from another subdirectory, keep your working directory files at different revisions per file (called mixed revisions), and set properties on directories that apply to a directory and all of its children, such as which files to ignore when doing an svn status. It’s even possible to set different permissions for different parts of the repository.

In contrast, Mercurial manages a single set of files in a repository. Directories are not first class objects in Mercurial, as they are in Subversion; they’re just artifacts of file names. Although Mercurial internally tracks changes to each file separately, there is no way to put the working directory into a mixed revisions state. The DAG cannot handle that type of freedom. Of course, the fact that Mercurial requires you to download the entire repository history to create your own working directory also puts downward pressure on the size of a repository. And because permissions are the same throughout a single repository, if different code needs different permissions, it also needs to be in a different Mercurial repository. The same is true for other settings, such as which files to ignore when doing hg status.

All of these differences naturally lead to smaller repositories that typically contain one project in Mercurial, and larger repositories that typically contain many, if not all, projects in Subversion. If you’re coming from Subversion, you’re going to want to get used to it. Fortunately, it appeals to your innate desire to componentize. That is an innate desire, right? For me it is, and Mercurial makes it easier to do it at the project level. Of course, because Mercurial does less, it leaves to other systems the management of multiple repositories (see Bitbucket and Kiln).

Sam Hart, when he decided to switch from Subversion to Mercurial, discussed this exact phenomenon:

“If you’re like me, when you originally set up SVN you did so in the laziest way possible.

“Setting up SVN repos is more work than it should be. It involves using commands that you normally never have to touch (svnadmin), setting up new entries for those repos in your http server’s configuration files (if you’re using Apache and WebDAV), and setting up user permissions to those repos. Thus, the lazy way to set them up is to make one central SVN repo under which you have multiple sub-repos. This has the advantage of making your repository very easy to maintain. However [it] has a big disadvantage in that a user with write access to any sub-repo will have write access to the entire repo.

“In Hg, on the other hand, setting up a new repository is much easier, and maintaining multiple repositories more manageable. So, if you’re like me, you may be tempted to remedy past sins by splitting your single gargantuan SVN repo into smaller Hg repos.”

Commit often vs. commit when “ready”

Another change you’ll need to adjust to is to commit often. You’re probably used to making a bunch of related (or unrelated) changes, then doing some testing. You may build a version of your product and have others do testing. You’ll probably run automated tests, possibly multiple sets of automated tests. And finally, you’ll check in.

If you do this in a team using Mercurial, they’ll wonder where you disappeared to while your code was being written, complain about how large the code reviews are, and be frustrated at how slowly you iterate on your code towards a good solution.

On my team at Microsoft, we had a concept of a shippable chunk of software. This helped guide the creation of branches in our centralized VCS. We could work in the branch, possibly with one or two other developers, until we had something we could reasonably ship, then merge the branch back into the main development repository. Depending on the rules for checking in to a VCS, whether centralized or distributed, software teams develop an understanding, either explicit or implicit, of what a “committable chunk” is: what amount of code is worth committing, either for review or for sharing with others.

The key change in mindset for me has been to make my own “committable chunks” much smaller than they used to be. No longer do I make hundreds of changes in tens of files, tying up another developer for hours in code reviews. It’s easy to make frequent commits locally, and push those to a personal branch on Kiln regularly for review.

But DVCS’s don’t just make it easy to have smaller committable chunks. They make it easier to manage committable chunks of all sizes. Because I work against a personal repository and merging is so easy, I can commit almost minute by minute to my personal repository, push multiple times a day to the feature branch I’m working on, push occasionally from the feature branch to the main development branch, and handle multi-feature pushes from development to a stable release branch. Obviously, those are all possible with a CVCS, but they always took so much time and effort to manage the branches, do the merge, and verify that nothing broke. In practice, that meant that steps were left out, and things slipped through the cracks.

Now, my changes to code are clearer, my original intentions more obvious, and I feel far better with my code in source control. I can look at changes at a small granular level, or I can look at the big merges.

Branch always vs. branch rarely

Closely related to a cultural norm of small, frequent commits is a norm of branching. Every clone of a Mercurial repository introduces a new branch once a change has been made. It’s also easy to branch many times within that clone. When I first made the switch, I didn’t really understand this. I knew I could work separately from other developers in my own repository, but I didn’t think of it conceptually as branching. It was more like I had my own sandbox, which I could then merge with the main repository when I checked in. And the idea of easily branching within my own repository still seems new to me.

But I’m learning to embrace the value in branching. As with frequent commits, it’s the ease of merging that makes the benefits of branching so readily available. And I’m beginning to value the power of having branches within my local repository. I can work on bug fixes separate from major refactoring work, and easily (and quickly) switch between the two using a simple “hg up” command – even when I’m offline. That’s great for the times when I’m deep into feature code and a sudden urgent bug pops up that needs to be fixed and released immediately. I can also switch back and forth between work on two different features, which is great when I get stuck and just need a mental break from one of them. Also, it makes it super easy to prototype new ideas without messing with my regular development.

One counterpoint to the ease of branching is that it may isolate developers. John DeRosa registers his concern about this:

“Additionally, I think distributed SCMs like Mercurial have a not-yet-fully-appreciated problem in making it too easy to not [ever] check code back into the main pool.  With a local repository, a developer can feel protected from accidents and continue working happily for quite a long time.  And then, say a year down the road, he/she does a massive check-in and discovers an integration problem.  Branches, or a local repository that is effectively a private branch,  should be easy to make — but not too easy.”

Let me explain why I don’t buy it. First, “a year down the road”?! Seriously? It says something that you have to imagine a scenario so horrible and unlikely in order to envision easy branching as a bad thing. I think that the author likely didn’t realize how easy merging usually is with a DVCS like Mercurial. And he must have totally forgotten that this lone maverick developer could have been merging the main development line into her own repository every day or week. The right solution to this imagined (and barely imaginable) scenario is not to eliminate easy branching, since without it the lone developer will do the same thing, but be much more likely to lose her work because it won’t be stored in a repository. The right solution is to fix a broken culture that enables someone to go a full year with no accountability for their work.

Source code files vs. all code files

Another important difference relates to what files you put in the repository. Because Mercurial and other DVCS’s don’t handle versioning of large files well, it is much more tempting to store them in a different way. This most obviously manifests itself in the storage of built binaries. If they are largish, and you want to keep lots of copies of them (nightly builds backed up for QA purposes, or even just weekly or monthly builds) then your repository becomes quite large and unwieldy very quickly. These types of files typically don’t diff well, making diffs between versions very large, and because the files are very large themselves, it means that downloading the repository takes much longer.

In this area, Subversion currently has a clear advantage. Only the files in the working directory are downloaded to client computers, so storing the history of large binary files only requires storage scaling on the server. Bandwidth is significantly reduced. Because it’s fairly simple, many Subversion installations have used it to track changes to built binaries and other very large files. The challenges to scale and management are limited to one machine, the server.
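The bandwidth difference is easy to estimate on the back of an envelope. The Python sketch below uses purely hypothetical numbers (a 50 MB binary, 365 stored builds) and makes two simplifying assumptions: each stored build is kept essentially in full because binaries diff poorly, and compression is ignored. The function names are invented for this example.

```python
def clone_download_mb(build_size_mb, builds_kept):
    """Rough download size when the whole history comes with the repository.

    Assumes every stored build costs its full size because binary files
    diff poorly, and ignores compression.
    """
    return build_size_mb * builds_kept


def checkout_download_mb(build_size_mb):
    """Rough download size when only the current revision is fetched."""
    return build_size_mb


# Hypothetical numbers: one 50 MB binary, a nightly build kept for a year.
build_size_mb = 50
builds_kept = 365

print(clone_download_mb(build_size_mb, builds_kept))  # 18250
print(checkout_download_mb(build_size_mb))            # 50
```

Under these assumptions every new clone pays for the full year of builds, while a checkout pays only for the latest one. Real repositories will differ, but the gap grows with every build that is stored.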

Naturally, users of Mercurial push handling of these large files to other systems. Their VCS is the location for their source files, typically a bunch of text files, which external tools then build into large binaries which are almost never stored in the same or another Mercurial repository. It is true that efforts are underway to alleviate this weakness in Mercurial, though I’m sure some don’t see it as a problem at all. The bfiles extension is an attempt to provide a more centralized model for certain large files. Of course, it has tradeoffs, but the fact that it’s being actively developed indicates that, at least for many, the tradeoffs are worth it.

For now, I’m happy that this aspect of Mercurial motivates me to automate more, to maintain more of the components of my products as code (in some form) that is compiled (using some method) to create these human-unreadable products.

Conclusion

There are obviously different ways to look at these cultural and philosophical differences between Subversion users and Mercurial users. One might look over the differences and conclude that Subversion seems much more flexible than Mercurial. Therefore, it must be better. Another might see how much better Mercurial handles basic source control features, such as branching, merging, and tags, and conclude that it is therefore a better product. It’s pretty obvious to me that these two views are quite related.

Michael Haggerty makes this point quite well in his post Git, Mercurial, and Bazaar—simplicity through inflexibility. The discussion is about the merging differences between Git and Subversion, but the principles apply to Mercurial as well. He argues that the very flexibility of Subversion is what makes merging more burdensome:

“Starting with release 1.5, Subversion, ironically, supports a much more flexible model of merging than the DAG-based DVCSs. Changes from any commit can be merged to any branch at the single-file level of granularity, enabling all of the operations listed above and some even weirder things (for example, a change that was originally applied to one file can be “merged” onto a completely different file). If your workflow demands this sort of thing, Subversion might hold significant advantages for you.

“But there are also many disadvantage to Subversion’s flexibility:

  • Subversion’s merging model is more complicated than that of DAG-based VCSs, and therefore more complicated to implement and less predictable.
  • It is much harder to visualize the history of a Subversion project (contrast that to DVCSs, whose history can be displayed as a single DAG).
  • Subversion merges are innately slow, because of the large quantities of metadata that have to be manipulated.
  • The bookkeeping of SVN merge info requires more user conscientiousness, and mistakes are not as easy to spot and fix.”

While he doesn’t take a stand on which is better, a CVCS like Subversion, or a DVCS like Mercurial or Git, I will. Mercurial (and other DAG-based DVCS’s) provides a level of intrinsic guidance to developers through the limitations it has. Like many other great products, it is defined in part by what is not included. One might easily say that it is defined in large part by that. Products like the original iPod and iPhone both have this same feature. By focusing on the most important features, and specifically limiting users’ choice in other areas (changing batteries, how to buy and download apps, etc.), Apple created products that are wildly successful. True, they may not be as flexible as an Android, Blackberry, or Windows Phone. But they got the right things right.

And I think Mercurial is a step in that direction. I don’t think it’s there yet, but I don’t think anything else is any closer. Some other DVCS’s (Git, at least) are also heading in the right direction, though they may be coming from a different starting point. Mercurial creates a philosophical and cultural starting point because of the technical choices that define both its strengths and its weaknesses. That philosophical starting point is a fundamentally better starting point for software development. It leads to greater componentization, greater granularity of history, more productive use of development time, and more automation.

How To Care

Earlier this month, Merlin Mann wrote about one principle that is more important than we typically acknowledge: First, care. He begins by discussing the common challenge of staying focused on the important stuff. Staying focused is the easy thing, he reminds us: just do one thing at a time. Of course, he knows that an obviously tautological statement like that is not the solution, so he tells us what is: we must care more about the one thing we’re doing than anything else. If we don’t, then we’ll naturally flit from task to task without the sense of focus we desire. Eventually our lack of focus becomes the most important thing we focus on, but that just makes the problem worse. The only real solution is to care so much about something that the question changes from “How do I maintain focus?” to “How do I get rid of everything unrelated to the one important thing I’m working on?”

A False Start …

Fortunately, I read this article at just the right time – one month into developing my nightly post-mortem and planning habit. Because I actually did care about that habit, I recognized the truth in Merlin’s post. But I also recognized that the rest of my activities perfectly exemplify people focused on their lack of focus. And because I was less than a month away from starting a new habit, I knew what it should be: It was time to eliminate my focus issues.

Of course, I was thinking about it all wrong. And if you had actually read Merlin’s post, you’d know that. Go back, try again. The whole point of his post is that you cannot make eliminating distraction your focus. In reality, I needed to replace my bad habits with good ones. My bad habits consist of mindless internet surfing to a variety of sites. When I’m stuck on a problem at home or at work, I gravitate to a web browser and drown out my “stuckness” by reading all about politics, or watching movie trailers, or learning about some cool new technology, or following random news that isn’t really all that important. Oh, I usually get back to the problem eventually, and usually solve it … eventually. And then I go back to the mindless surfing.

Another False Start …

So I listed the triggers for mindless surfing, as well as the sites I would regularly visit. Once that list was in place I started thinking about what to replace it with. I came up with all kinds of ideas, ranging from practicing code katas, to working through my book reading list, to exercise, to doing a better job of reviewing and acting on my next actions lists, to cultivating my blog.

Over the last few weeks I tried to narrow this down. Okay … not really. What I really tried to do was come up with 8-9 small steps that replaced my lack of focus with 8-9 new habits. Focus Fail! Well, it wasn’t quite that bad, but I was trying to replace my lack of focus with a few different things, which would mean that I would still have a lack of focus. And of course, that made it hard to actually come up with the steps to take, and what I did come up with was unorganized and lacked, well, focus. And besides, it seemed about 10 times harder than my nightly post-mortem habit.

Lessons from False Starts …

However, in the process, I discovered some important benefits to Leo’s idea of anticipating the start of a new habit without actually starting it. First, you don’t jump into something prematurely. If I had tried to start with the ideas I had a week ago, or two weeks ago, I’m pretty sure I would have failed to keep up the changes. Oh, I may have stuck with them for a few weeks, or even the whole two months, but the lack of natural cohesion would have eventually broken up the habit.

Second, it’s pretty easy to try out some of the ideas you have by doing “test runs”. Basically, I tried one of the small steps I would take for a day or two, just to see what it was like. Writing last week’s post on the habit creation process was one of these experiments. The key to this is to make sure your “test runs” are not a priority. My priority throughout was still the nightly post-mortem, but with my spare mental energy I also tried out some of the ideas I had, and some of them are even sticking. However, they are not my current focus, and if they fall by the wayside that’s ok. Other activities haven’t stuck at all, or were obviously not useful, so they won’t be part of my next habit.

What I re-realized just a few days ago is that I need one thing to focus on. I knew when I began this process what it would probably be: developing my blogging into a regular habit, rather than something I do once every few months. But at the time, I just didn’t care enough. I didn’t care enough to overcome my fear of failure. I didn’t care enough to push off the other interesting things I could be doing with this magical free time I’ll be creating for myself. Which wouldn’t be created, of course, if I maintained my current lack of focus. I didn’t care enough to commit myself and really do the hard work it will take to develop a blogging habit.

How to Care …

When I realized that, I discovered another important truth: Merlin’s instruction to “First, care” can be spurred on by “Commit to something”. Caring is important. Caring about lots of different things isn’t going to solve your focus issues, however. But sometimes committing yourself wholeheartedly to something can increase how much you care. With the assumption that I’ll focus on this until it’s a lifelong habit, and then go to work on the many other failings and weaknesses in my life, I can truly focus on it and nothing else. I can care about it more than anything else. I know I’ll get to the other stuff, eventually, which is no worse than before I made the commitment. And I’ll only get better at committing and following through with practice, thus increasing my chances of actually succeeding at all the other stuff.

So that’s my commitment – make blogging a regular habit. But what to blog about? I don’t know exactly. I’ve enjoyed writing this post and the last about habit development. I’m a developer at Fog Creek, and that means there are a bunch of technical topics that interest me also, and maybe Fog Creek needs another blogger. I could just as easily blog about religion or politics, both important subjects to me. I don’t have any active hobbies to write about, as I’m not currently running and don’t have the money yet for flight lessons. Writing about one of those hobbies, or another one, could help me get motivated to do more related to that. Or I could blog about blogging (like I am right now!) and the things I’m doing to get better at it. Or not.

I guess it comes back to the motivations I have for blogging, beyond the fact that I just committed to doing it. I want to blog to become a better writer, but I can do that with any topic. I want to blog to create a public reputation within the software and business communities I’m a part of. I want to blog to explore topics that are interesting to me, because forcing myself to express and understand those ideas is an important way that I learn. I spent a year and a half teaching a Sunday School class and learned more about the scriptures because I had to express myself well in order to teach. I’d like to have a similar experience with other areas of knowledge. I want to become an active part of at least one online community.

It’s that last point that will probably make this habit a little bigger than just blogging. I don’t want to just talk into my blog, disconnected from the rest of humanity – I want to be involved in conversations. And that’s what I’m committing to. I’m still working out the individual steps, and over the next couple of months I’ll be committing to those steps each week here on my blog.

The 6 Changes Habit Creation Technique in Action

I first read the posts at 6Changes.com just before Christmas. At the time, I was preparing for the yearly planning that my wife and I do each January. I already wanted to make some changes in my own life, and 6changes.com was like a small revelation. It convinced me to tackle an important change, gave me a set of things to do, and happened to be at just the level of detail I needed. I’ve gone through enough attempts at self-improvement to know they don’t all stick. That naturally made me wary of some of the claims Leo makes, but it also helped me to recognize the truth of many of his ideas.

In summary, he advocates the following regimen for making a change in your life:

  1. Choose a habit to develop
  2. Only work on one at a time
  3. Build anticipation up to a starting date
  4. Commit publicly to the overall habit
  5. Break the habit into 8-9 small steps
  6. Choose habits and steps that can be done daily
  7. Add one of the small steps each week for two months
  8. Commit publicly to each step when you start it
  9. Report publicly on your progress

You can check out his site for more on the reasoning behind these steps.

I have spent the last 7 weeks following this plan step by step. I recount the experience here because a detailed blow-by-blow of one person’s attempt would have helped me when I started.

Preparation …

After reading through the ideas at 6changes.com, I wanted a habit that would help me continue developing new habits, as well as get me doing some of the simple things that I wanted to make sure got done each day. My nightly post-mortem and planning session was born. This includes a variety of small but important steps: writing in my journal, getting to inbox zero on my various inboxes, going through my tickler file, planning the next day, and nightly prayer.

I followed Leo’s advice and didn’t start right away. I thought through the small steps I wanted to take each week, and made plans to start the first full week of January, giving me a few days to recover from New Year’s and having family in town. I came up with 8 small changes to my nightly routine that, combined, would make a big difference. And because I was starting out so small, it was really quite easy. Also, I told my wife about the new goal, and I started tracking it at Daytum.

I broke my habit into the following small steps:

  1. Organize desk
  2. Record date in journal, scan inboxes, and clear daily plan
  3. Write in journal
  4. Process inbox items
  5. Process ticklers for tomorrow
  6. List things I want to get done tomorrow
  7. Prepare to pray
  8. Nightly prayer

Getting Started …

I already prayed each night before bed, so my first step was just to organize and clean up my desk, before my prayer. Organizing my work area took all of about 1 minute and typically just involved me plugging in my laptop, cell phone, and Zune. Sometimes there was more to put away, like when my son decided to do his homework on my desk and just leave it all there. But it never took more than a couple minutes to finish, so it was easy, even when I stayed up quite late.

The next week, I opened up a page in OneNote and recorded the date (Alt-Shift-D for you keyboard fanatics). Then I closed it. I scanned my inboxes, but did not process anything. And I looked at my calendar. Total added time: 30 seconds. That was the beginning of my journal writing habit. I made it to the end of the second week without missing a day.

The third week, I expanded my journal entry by actually writing a little about the day, sometimes spending a few minutes recounting something I’ve been thinking about or an interesting story. This added about 5 minutes to the process. My entries aren’t typically very long. Sometimes, I add some thoughts about what I hope to do the next day. So far, I haven’t missed a day writing a full journal entry since January 17th.

Building on the Habit …

Now to the most obviously beneficial change: processing my inboxes. This could have been daunting, initially, since I hadn’t been doing a good job of this. That is to say, all my inboxes were full of crusty old stuff that had been lying around for weeks or months. Even though I knew I would start working on this part of the resolution near the end of January, I didn’t try to get ahead of myself by changing my inbox processing earlier or doing a big purge the day before. I just eased into it. I figured if I went through one or two items each day from each inbox I’d be pretty close to inbox zero by the end of the week.

Of course, I went through a lot more than 1 or 2 items each day that first week. I had to, just to keep up. But it’s pretty easy to delete or archive all the random stuff that doesn’t require any action. And I made sure to finish each day with less in my inboxes than I had started with. Ever since then I’ve been at inbox zero in my personal email and physical inbox every single night. My work email is a slightly different beast that I tackle at work anyway. I do take care of the easy stuff at night though. And now that I’m current on all that it usually takes just a few minutes.

Next step: tickler file. Ever since first reading about the tickler file in Getting Things Done, I’ve thought it would be a great tool to use. Of course, using it absolutely requires some sort of daily habit, or it’s just another place to lose track of things that are important. Well, now I had a daily habit that I’d kept up for a month without fail, so I added going through my tickler file, which I keep in OneNote, each night. That was a huge change, in that I now had an easy way to remind myself of something at any point in the future.

The one modification I’ve made to the tickler described in GTD is the addition of four “week” folders for the four weeks of the month, which I go through on Sunday. Then I only need seven “day” folders instead of 31. It’s easy to remember to go through the longer time periods when I should because I put reminders to do so in the shorter time periods. For example, my Sunday tickler file has a reminder to go through the weekly tickler file, and my “4th week” tickler file has a reminder to go through the monthly tickler file, etc.
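The chain of reminders can be sketched in a few lines of Python. This is a rough illustration of the scheme described above, not how my OneNote notebook actually works: the `tickler_folders` function and the folder names are made up for the example, and the rule for mapping a date to a week folder (days 1-7 are week 1, and everything from the 22nd onward is week 4) is an assumption.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def tickler_folders(today):
    """Return the tickler folders to review on the given date."""
    # One of the seven "day" folders is reviewed every night.
    folders = [DAY_NAMES[today.weekday()]]
    if today.weekday() == 6:  # Sunday triggers the weekly folder
        # Assumption: days 1-7 are week 1, 8-14 week 2, 15-21 week 3,
        # and the 22nd onward all count as week 4.
        week = min((today.day - 1) // 7 + 1, 4)
        folders.append(f"week {week}")
        if week == 4:
            # The "4th week" folder carries the reminder for the monthly one.
            folders.append("monthly")
    return folders


print(tickler_folders(date(2010, 2, 15)))  # a Monday
print(tickler_folders(date(2010, 2, 14)))  # a Sunday in the second week
print(tickler_folders(date(2010, 2, 28)))  # a Sunday in the fourth week
```

Each longer period is only ever reached through a shorter one, so the nightly habit is the only thing that has to be remembered. One side effect of the week rule assumed here is that a month with Sundays on both the 22nd and the 29th would review the “4th week” and monthly folders twice.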

Once I had the habit of looking back by writing in my journal, and of dealing with the stuff at hand by processing my inboxes and tickler file, it was time to look forward by planning the next day. For me, it’s a really simple process that just involves listing in order the things I expect to do the next day. I usually include my plans for the commute (1.5 hours one way on the train and subway), focus goals for work, and how to spend the evening with my family. I also add reminders to my calendar for things I cannot forget. This addition makes it easy for me to just get up in the morning and go. I don’t have to think as much about what I should be doing.

Going Forward …

All that brings us up to Valentine’s Day. I’m spending this week and next improving my nightly prayer, which is still too perfunctory. And I’ve started working out a plan for the next habit to create. I have a lot more confidence going into the next one, because of the experiences I’ve had over the last 7 weeks.

Overall, it’s been a great experience, one that I hope to repeat in March and April.

Fog Creek Software, Inc.

Well, I’m now at Fog Creek. I love it here. The benefits they mention in their recruiting materials are all real and all great. I love the food, I love the snacks, I love the flexibility, I love the views of New York and the Hudson.
More important than that, I love the work.

Fog Creek

Microsoft

Over the last few years, working on Outlook at Microsoft, I became more and more aware of good engineering practices. At Microsoft, so much of product development is handled by others that, initially, it was easy to sequester myself in my office, write lots of code, then fix lots of bugs, and feel a certain sense of accomplishment. I had to learn what it meant to write code that could be localized, code that was considered to have quality, by the standards there. I also saw lots of code, much of it good and much of it bad. I became very familiar with old code and still managed to forget plenty of code that I wrote after only a few months.
But I also started learning about what good code could be. I started to develop a certain feel for good code, and also started to recognize the natural impediments to writing good code, some of which are inherent in shipping a product that people will buy. I learned about TDD and tried to practice it in my work. I shared the ideas and concepts with my team. I started to get impatient with the pace of adoption at Microsoft, while at the same time recognizing some of the reasons it was slow as valid.
As I approached my 6 year mark at Microsoft, I really started feeling like it was time for a change, both professionally and personally. Our family was ready to live somewhere that didn’t involve 8-9 months of rain each year. So I started feeling around for jobs. The Fog Creek opening landed on my radar at just the right time and, long story short, I started on September 14th.
One part of that long-story-made-short bears mentioning. After making the decision to take the job at Fog Creek, I was once again impressed at what a great place Microsoft is. My coworkers and management were very supportive of both me and the move. I’ve reflected on my time there and feel very good about what I learned, what I accomplished, and the products and features I was able to work on. I would definitely recommend it as a great place to work.
I’ve had a few friends and acquaintances ask me how I went about making this decision, and I felt like it would be good to share some of the reasoning that went into it for others. I’m not going to write a “10 reasons to switch jobs now” list or a “7 reasons small companies are better than large ones” list or a “How to land a dream job in New York City” essay. Rather, I’d like to talk about how I see my career and hopefully pull out some universal principles for any career in software development.

Own your career

First of all, you need to own your career. I’m not going to go into a lot of detail on this one, because any good advice on careers in general, or software development careers in particular, will offer plenty of detail on this one. I just want to call it out as a necessary pre-requisite to all the other ideas.

Know what you want

As my wife will tell you, I don’t normally know what I want. I tend to be pretty ambivalent about a lot of things. About 6 months before this career change, however, I started taking about 30 minutes a day just to think about and write about my career. I went back over all the notes I’d ever written down, all the goals I’d ever considered, all the crazy ideas I’d ever had. I organized them all and tried to figure out what made me tick. This was a pretty simple exercise, easy enough to do before breakfast (what can I say, I’m a morning person). But it paid huge dividends almost immediately. I went into work more excited (still at Microsoft). More importantly, I started to really care about where I would end up. I don’t know if my 30 minutes a day strategy would work for everyone, but you really need to become passionate about something, or your career won’t go anywhere.
For me, that passion boiled down to becoming a software craftsman. Which leads to another important principle in creating your ideal career: strive to be your very best. I connected with the ideal of being a software craftsman, which is admittedly still being defined. Whether that includes TDD as a pre-requisite, or whether that means trying to be a Duct Tape Programmer, is secondary to the ideal of crafting the best code possible within a set of interesting, creativity-inspiring constraints.

Broad and deep

A few years ago I had some interesting conversations with a friend of mine about whether education should initially take a student deep or broad. I don’t know that we ever answered that question, but we both agreed that a good education is ultimately both. At Microsoft, as at most big companies, I got to go deep. I learned what it meant to work on a single code base, in a single language. During my 6 years there I worked on two or three major feature areas. So much of the product development and marketing was handled by others, and many of them I never met or knew. I was able to gain a lot of knowledge about a very limited set of technologies, product areas, and functional roles. A big part of my desire for a change, any change, was to do something new. I wanted to come to a small company so I could at least be closer to the sales, marketing, and customer support side of the business. I wanted to change my technical focus, and learn more about new technologies: web development, new languages, whatever. It was time for me to broaden my experiences.

Make incremental improvements

When I told my wife that I had submitted my resume to a company based in New York City, her first reaction was incredulity: “New York City! I never want to live there!” I assured her it was just for interview practice, because I felt the same way. I bring this up because sometimes our career goals will take us unexpected places. I personally would love it if Joel woke up one day, realized how much more profitable Fog Creek could be if it were based in Fort Collins, Colorado, and decided to move the whole company there. But I doubt that will happen. And that’s ok. I realized, and so did my wife, that we weren’t going to get the perfect situation at this point (especially in this economy). We knew even before the job search began that there would be some compromises. As it is, I’m extremely happy with the compromises we made, and we’ll probably be at Fog Creek for quite a while. I expected to give up more and not get as much. But no, we don’t live in New York City. They couldn’t handle our three boys, so we’re actually enjoying a great place in New Jersey.

Sometimes, be the worst

I know this advice has been around for a while in the body of software development career advice (links!), but that’s another reason I made this move, and it’s already started to bear fruit. I was by no means the best on my team at Microsoft. But I’d been there a while and I was too comfortable. I knew, coming to Fog Creek, that it would not be comfortable, that I would stretch myself technically, socially, and in other ways. But that wasn’t just something I had to live with. It was a necessary change for me, one that I looked forward to. It was time to grow, to learn, to stretch.
In saying this, I recognize that you probably shouldn’t go through your whole career this way. It would probably get pretty depressing if you always felt that you didn’t quite measure up. Plateaus aren’t bad things. They allow us to step back, to look over the things we’ve learned, to rest and recuperate in a way. And being the best is also a valuable position to be in, when you can mentor others and learn from their fresh new ideas.

And so…

At this point, I’m very happy with the move. I’m excited to be working on Fog Creek’s new product, Kiln. I’m just as excited for the new version of Outlook to ship, and am itching to get my hands on a copy of Windows 7.
Life is good.

Whimsical Walk

In my last post I mentioned that I was doing some practice with the binary search algorithm. I wanted to approach it with a slightly different mindset and see what kind of an algorithm that led to. So I decided to think of it as a walk. I would go “visit” locations in the array and see if I should turn right or left at each one. Doing this in C meant that I could do some crazy stuff with arrays – nothing I would do in shipping code, but fun to play with nevertheless. Here is what I came up with:

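/* Walk to the middle of the region and decide whether to turn left or right.
 * The sign of size says which way the region extends from array:
 *   size > 0 covers array[0] .. array[size - 1]
 *   size < 0 covers array[size + 1] .. array[0]
 * On success, *poffset accumulates the index of value relative to the
 * original array. */
char ChopWalk(int value, int *array, int size, int *poffset)
{
    int walkto = size / 2;   /* truncates toward zero, so it stays inside the region */
    int direction = 0;
    int remaining = 0;
    char found = 0;

    if (size == 0)           /* nowhere left to walk */
        return 0;

    if (value == array[walkto])
    {
        found = 1;
    }
    else
    {
        /* Bounds of the region, relative to array. */
        int low = (size > 0) ? 0 : size + 1;
        int high = (size > 0) ? size - 1 : 0;

        if (value > array[walkto])
        {
            direction = +1;
            remaining = high - walkto;   /* elements still to the right */
        }
        else
        {
            direction = -1;
            remaining = walkto - low;    /* elements still to the left */
        }

        /* Only keep walking if there is somewhere to go; this also avoids
         * forming a pointer outside the array. */
        if (remaining > 0)
            found = ChopWalk(value, &array[walkto + direction], direction * remaining, poffset);
    }

    *poffset += walkto + direction;

    return found;
}

/* Returns the index of value in the sorted array, or -1 if it is absent. */
int Chop(int value, int *array, int size)
{
    int result = 0;
    if (ChopWalk(value, array, size, &result))
        return result;
    else
        return -1;
}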
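
An experiment like this is easier to trust when it can be compared against a plain, boring implementation. What follows is only a minimal sketch of that idea and was not part of the original exercise: the names ChopFn, ChopReference, and AgreesWithReference are made up for illustration, and the harness simply assumes a search function with the same signature as Chop above.

```c
#include <assert.h>

/* Hypothetical type for any search function shaped like Chop. */
typedef int (*ChopFn)(int value, int *array, int size);

/* Conventional iterative binary search over a sorted array, used as the
 * reference. Returns the index of value, or -1 if it is not present. */
int ChopReference(int value, int *array, int size)
{
    int low = 0;
    int high = size - 1;

    assert(size >= 0);

    while (low <= high)
    {
        int middle = low + (high - low) / 2;

        if (array[middle] == value)
            return middle;
        if (array[middle] < value)
            low = middle + 1;
        else
            high = middle - 1;
    }
    return -1;
}

/* Returns 1 if candidate agrees with the reference for every prefix of a
 * small sorted array (including the empty one) and for every value from
 * just below to just above the stored range; returns 0 otherwise. */
int AgreesWithReference(ChopFn candidate)
{
    int data[] = { 1, 3, 5, 7, 9, 11, 13, 15, 17 };
    int count = (int)(sizeof data / sizeof data[0]);
    int size;
    int value;

    for (size = 0; size <= count; size++)
        for (value = 0; value <= 18; value++)
            if (candidate(value, data, size) != ChopReference(value, data, size))
                return 0;
    return 1;
}
```

Passing Chop to AgreesWithReference exercises both even and odd array sizes, which is where a walk that turns left and right is most likely to lose track of an element.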

Practicing Binary Search

So for the last couple of weeks I’ve been practicing the binary search algorithm, as laid out by Dave Thomas here. I’ve done both iterative and recursive variations in C++ and then in C. I love the value of katas for learning new languages and features. I know both C++ and C, but doing this simple problem in both languages allowed me to work on learning new areas in each. First, in C++ I took the time to use the Standard Template Library (STL), because I’m not very familiar with it. I just used vectors in my tests, but it helped me to become familiar with the STL documentation and the concept of iterators as used in the STL.

Next, I did the same variations in C. This was fun because C is so basic, and it’s been a long time since I coded in straight C. You don’t worry about concepts like objects or functional programming (though they may help you think about the problem). It gets you really close to the underlying physical model of computation. You’re forced to think more about how the memory is laid out, and how the algorithm takes advantage of that, whereas with C++ and the STL the whole problem and the solution are more abstract, just slightly more removed from hardware.
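
To make the point about memory layout a little more concrete, here is a rough sketch of what a recursive variation in C could look like when it is written purely in terms of addresses. This is an illustration under my own assumptions, not the code from those practice sessions; ChopRange and ChopPointers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Recursive binary search over the half-open range [first, last) of a
 * sorted int array. Returns a pointer to the matching element, or NULL
 * if value is not in the range. */
const int *ChopRange(int value, const int *first, const int *last)
{
    const int *middle;

    assert(first <= last);

    if (first == last)       /* empty range */
        return NULL;

    /* The midpoint is just the address halfway between the two ends. */
    middle = first + (last - first) / 2;

    if (*middle == value)
        return middle;
    if (*middle < value)
        return ChopRange(value, middle + 1, last);
    return ChopRange(value, first, middle);
}

/* Same interface as the kata: index of value in array, or -1 if absent. */
int ChopPointers(int value, const int *array, int size)
{
    const int *hit = ChopRange(value, array, array + size);

    return hit ? (int)(hit - array) : -1;
}
```

The index only appears at the very end, as the distance between two pointers. Everything before that is arithmetic on addresses in one contiguous block of memory, which is the part that iterators tend to hide in C++.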

In addition to the languages themselves, it was also a chance to practice my use of tools. I used the work to hone my skills in vim, to understand our build system at work more fully, and to consider testing frameworks.