<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Becoming a craftsman &#187; programming</title>
	<atom:link href="http://rock.hymasfamily.org/blog/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://rock.hymasfamily.org/blog</link>
	<description></description>
	<lastBuildDate>Sun, 18 Apr 2010 01:19:54 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Tags &#8211; Subversion Conversion to Mercurial, Part 3</title>
		<link>http://rock.hymasfamily.org/blog/2010/04/16/tags-subversion-conversion/</link>
		<comments>http://rock.hymasfamily.org/blog/2010/04/16/tags-subversion-conversion/#comments</comments>
		<pubDate>Sat, 17 Apr 2010 01:56:15 +0000</pubDate>
		<dc:creator>Rock</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://rock.hymasfamily.org/blog/?p=100</guid>
		<description><![CDATA[Well, if converting the trunk was a necessary appetizer, and converting branches was the meat and potatoes of Subversion to Mercurial conversion, then surely converting the tags is the dessert. We now understand all of the basics of converting the trunk and branches of a Subversion repository to Mercurial. All we need to do is [...]]]></description>
			<content:encoded><![CDATA[<p>Well, if converting the trunk was a necessary appetizer, and converting branches was the meat and potatoes of Subversion to Mercurial conversion, then surely converting the tags is the dessert. We now understand all of the basics of converting the trunk and branches of a Subversion repository to Mercurial. All we need to do is pull over the tags that label specific revisions. Before we dive into dessert, let&#8217;s admire it a little by examining how tags are stored in both Subversion and Mercurial.</p>
<h4>What Are Tags?</h4>
<p>First, in Subversion, a revision is stored as a set of pointers to the files that make it up, so a tag is just a set of pointers to files at specific states in their history. A tag is created by copying the file pointers we want to another location in the repository directory structure. Of course, because tags are created with this generic copy mechanism, a tag might not only be a tag. It could become a branch, if further changes were made to the copied files. Additionally, a tag need not include all of the files in the trunk or in a branch.</p>
<p>Mercurial, on the other hand, records a tag as just a name associated with a revision id. That revision id specifies the entire state of the repository at that time, so by definition a tag includes all files in a repository. Additionally, Mercurial stores tags in the .hgtags file, which is versioned like any other file. This means that your repository won&#8217;t know about tags created in another repository unless you&#8217;ve pulled the changes that create those tags, even though you may have the revisions specified by the tags.</p>
<h4>Generically</h4>
<p>As you can imagine, these fundamental differences make converting tags a very ambiguous process. Not any more ambiguous than converting branches, of course, but not any less either. The generic part of the algorithm that hg convert uses for converting tags, the work done regardless of the source and destination types, is fairly straightforward. The converter asks the source converter object for the tags, as a simple dictionary from tag name to revision id. The source revision ids are then mapped to destination revision ids, as long as a revision wasn&#8217;t skipped in the conversion process, e.g. by a file mapping that excluded it. The destination converter object then records the tags in the destination repository. Finally, the converter updates the revision map so that future conversions attach new child revisions to the revision that creates the tags, rather than to its parent. This avoids creating a branch just to hold the tags. It also means that all tags are created after all other revisions have been converted into the new repository, which may seem odd. Unfortunately, it&#8217;s the least ambiguous way to convert the tags.</p>
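<p>A minimal Python sketch of that generic step follows. It assumes the revision map is a plain dict from source ids to destination ids, and it marks skipped revisions with a hypothetical SKIPREV sentinel. The function names are illustrative, not the convert extension&#8217;s own.</p>
<pre style="padding-left: 30px;" lang="python"># Hypothetical marker for source revisions skipped during conversion.
SKIPREV = "SKIP"


def convert_tags(source_tags, revmap):
    """Translate {tag name: source rev id} into {tag name: destination rev id}.

    Tags pointing at revisions that were skipped (e.g. excluded by a
    filemap) or never converted at all are dropped.
    """
    converted = {}
    for name, source_rev in source_tags.items():
        dest_rev = revmap.get(source_rev, SKIPREV)
        if dest_rev != SKIPREV:
            converted[name] = dest_rev
    return converted


def reparent_after_tagging(revmap, tags_parent, tags_commit):
    """Remap the source revision that pointed at tags_parent to the new
    tags commit, so a later incremental run builds on top of the tags
    commit instead of branching off its parent."""
    for source_rev, dest_rev in list(revmap.items()):
        if dest_rev == tags_parent:
            revmap[source_rev] = tags_commit
            break</pre>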
<h4>Finding the tags</h4>
<p>Of course, this high-level description abstracts away the real work &#8211; getting the tags from the Subversion repository and putting them into the Mercurial repository. So let&#8217;s dive into the Subversion side first. Because of the way tags are recorded in Subversion repositories, as well as all of the non-tag things you can do with and to the files there, this is a rather tricky process that can easily miss tags if the Subversion repository is in any way unconventional. Additionally, the algorithm may change if someone comes up with better heuristics for determining the tags in a Subversion repository. This comment, which precedes the tag-finding code in the Subversion converter, sums things up well:</p>
<pre style="padding-left: 30px;" lang="python"># svn tags are just a convention, project branches left in a
# 'tags' directory. There is no other relationship than
# ancestry, which is expensive to discover and makes them hard
# to update incrementally.  Worse, past revisions may be
# referenced by tags far away in the future, requiring a deep
# history traversal on every calculation.  Current code
# performs a single backward traversal, tracking moves within
# the tags directory (tag renaming) and recording a new tag
# everytime a project is copied from outside the tags
# directory. It also lists deleted tags, this behaviour may
# change in the future.</pre>
<p>The Subversion converter object goes through the svn log from the latest revision back to the start revision given in the convert configuration (the convert.svn.startrev setting, which defaults to 0). From each log entry, it finds changes that are copies from one location to another, and sorts those copies from more specific to more general. Then it checks whether the most general copy is actually copying the tags directory itself. If so, it makes sure that, as it continues backward through the log, it looks in the old tags directory rather than where that directory was moved to. Then, for each copy, it checks whether the copied files land in the tags directory.</p>
<p>After that check, it looks through all the copies seen so far to see whether the current copy is just a rename of an existing tag. If so, it updates its record of that tag. The Subversion converter object then checks each of the files added in the revision against the pending tags. If an added file would create a tag spanning different branches of the repository (i.e. files from the trunk as well as files from a branch), the tag is discarded, since it cannot be represented in Mercurial.</p>
<p>Finally, the source converter object goes through each pending tag and determines its name. If the tag is a rename of another tag, it leaves it in the pending list and moves on to the next tag. Otherwise, it gets the revision id for the tag by looking at the source revision from which the tag was created, and adds the tag to the official tags list if it isn&#8217;t there yet.</p>
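<p>The backward scan above can be sketched in a few lines of Python. This is a deliberately simplified model, not the converter&#8217;s code. The log is a hypothetical list of (revnum, copies) entries ordered newest first. The sketch ignores deleted tags, tags that mix branches, and moves of the tags directory itself, and it names each tag after its copy destination.</p>
<pre style="padding-left: 30px;" lang="python">def find_tags(log, tagspath="/tags"):
    """Simplified backward scan for tags in a Subversion log.

    log: entries ordered newest first, each a (revnum, copies) pair, where
    copies is a list of (dest_path, source_path, source_revnum) tuples.
    Returns {tag name: (source_path, source_revnum)}.
    """
    prefix = tagspath.rstrip("/") + "/"
    tags = {}
    renamed = {}  # older tag path: the name it was later renamed to
    for _revnum, copies in log:
        for dest, source, source_rev in copies:
            if not dest.startswith(prefix):
                continue  # not a copy into the tags directory
            # If a newer revision renamed this path, use the newer name.
            name = renamed.pop(dest, dest[len(prefix):])
            if source.startswith(prefix):
                # A copy within tags/ is a tag rename: keep the tag pending so
                # the older copy that created it supplies the revision.
                renamed[source] = name
                continue
            if name not in tags:  # scanning backward, so the newest tag wins
                tags[name] = (source, source_rev)
    return tags</pre>
<p>Take a log in which r5 copies /trunk@4 to /tags/v1 and r10 renames /tags/v1 to /tags/1.0. For that log, the sketch reports a single tag, 1.0, pointing at /trunk@4.</p>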
<h4>Tagging the new repository</h4>
<p>The work to put these tags into the new repository is much simpler. But first, one limitation of the tags conversion code is that all converted tags must be placed in a single cloned branch. This limitation exists because tags are not converted in line with other revisions, so the challenge of ensuring that changes to the .hgtags file are in all the right cloned branch repos is a tricky one. Each cloned branch could determine which tags apply to it, but then each cloned branch would have a different set of changes to the .hgtags file at its tip after the conversion, all of which would have to be merged when doing merges between these cloned branches.</p>
<p>So first, the destination converter object determines in which repository to place the updated tags. This is necessary if the --clonebranches option was specified on the command line; otherwise there is only one repository to put them in. It then loads the old tags from the .hgtags file, and builds the full list of entries to be placed in that file from the tags dictionary retrieved from Subversion. If no new tags have been created, it returns. Otherwise, it saves the full list of entries to the file and commits the changes to the repository.</p>
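<p>Here is a minimal Python sketch of that comparison. It assumes the tags dictionary already maps tag names to 40-hex-digit Mercurial node ids. Each .hgtags line holds a node id, a space, and a tag name. Reading the old file from the repository and committing the result are left out, and the function name is hypothetical.</p>
<pre style="padding-left: 30px;" lang="python">def updated_hgtags(old_text, tags):
    """Return the new .hgtags contents, or None when nothing changed.

    old_text: current .hgtags contents (empty string if the file is missing).
    tags: {tag name: 40-hex-digit node id} for the converted tags.
    """
    old_lines = sorted(old_text.splitlines(True))
    new_lines = sorted("%s %s\n" % (node, name) for name, node in tags.items())
    if new_lines == old_lines:
        return None  # no new or moved tags: skip the commit entirely
    return "".join(new_lines)</pre>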
<h4>Summing Up</h4>
<p>Now we&#8217;ve gone over all the basics of converting a repository from Subversion to Mercurial: the trunk, the branches, and the tags. But we&#8217;ve just barely touched on the many different tricks of converting repositories, and cleaning them up after the fact. Tools like svnadmin dump with svndumpfilter, the mq extension to Mercurial, hg histedit, and the hgsubversion extension are all worth exploring when you run into issues with conversion. So is routing the conversion through another VCS, such as Git. Though I don&#8217;t have concrete plans for writing about each of these, I will occasionally share tips and tricks as I learn about them. In the meantime, happy coding!</p>
]]></content:encoded>
			<wfw:commentRss>http://rock.hymasfamily.org/blog/2010/04/16/tags-subversion-conversion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Branches &#8211; Subversion Conversion to Mercurial, Part 2</title>
		<link>http://rock.hymasfamily.org/blog/2010/04/08/branches-subversion-conversion/</link>
		<comments>http://rock.hymasfamily.org/blog/2010/04/08/branches-subversion-conversion/#comments</comments>
		<pubDate>Thu, 08 Apr 2010 11:00:56 +0000</pubDate>
		<dc:creator>Rock</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://rock.hymasfamily.org/blog/?p=92</guid>
		<description><![CDATA[After reading about how the hg convert extension can convert the trunk of a Subversion repository to Mercurial, you&#8217;re probably thinking: &#8220;But we have more than just a single line of development! We branch our code! We merge it! We tie it into knots! It&#8217;s like a great monster!&#8221; Of course it is. You wouldn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>After reading about how the hg convert extension can <a href="http://rock.hymasfamily.org/blog/2010/03/18/the-trunk-subversion-conversion/">convert the trunk</a> of a Subversion repository to Mercurial, you&#8217;re probably thinking: &#8220;But we have more than just a single line of development! We branch our code! We merge it! We tie it into knots! It&#8217;s like a great monster!&#8221; Of course it is. You wouldn&#8217;t be good developers if it weren&#8217;t. If a simple trunk is like a single snake, unbroken from head to tail, then any actively developed repository is more like the <a href="http://en.wikipedia.org/wiki/Lernaean_Hydra">Hydra</a>, with more heads than you can count, and poisonous breath besides. Even worse, the Hydra is immortal, and so cannot be killed &#8211; sounds like a legacy codebase to me! <a href="http://en.wikipedia.org/wiki/Heracles#Labours_of_Heracles">Heracles</a> couldn&#8217;t kill the Hydra, but he could bring it into submission, and put it to his own uses.</p>
<p>So let&#8217;s look at how hg convert can take the branches in a Subversion repository and bring them into a Mercurial repository, thus taming the beast.</p>
<h4>Finding the Hydra&#8217;s Heads</h4>
<p>Getting the head revision when we were just dealing with the trunk was easy &#8211; just find the latest revision under the trunk&#8217;s path. But now we&#8217;re dealing with the Hydra &#8211; we need to keep track of many heads, one for each branch. The source converter object (remember that one? It&#8217;s Subversion-specific) does this work. After getting the head of the trunk, it lists the contents of the branches directory. This directory is either detected as a child of the source URL passed to the hg convert command, or specified in the convert configuration section. The configuration is typically set on the command line using the --config parameter. For each child of the branches directory, the source converter object checks whether it&#8217;s a directory, and if so finds the latest revision in that directory. As long as the latest revision isn&#8217;t the one that created the directory (i.e. the branch has no changes in it), it adds that latest revision to the set of heads.</p>
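<p>The head-finding rule can be sketched in Python as below. The sketch assumes a hypothetical, pre-computed listing. For each child directory of the branches directory, that listing gives the revision that created the directory and the latest revision that touched it. The real source converter object gets this information by querying Subversion. If the branches directory is not at the default location, it can be set with the convert.svn.branches setting.</p>
<pre style="padding-left: 30px;" lang="python">def find_branch_heads(trunk_head, branches):
    """Collect the heads to convert: the trunk head plus one head per branch.

    branches: {branch name: (created_rev, latest_rev)} for each directory
    under the branches directory (a hypothetical pre-computed listing).
    """
    heads = [trunk_head]
    for name in sorted(branches):
        created_rev, latest_rev = branches[name]
        if latest_rev == created_rev:
            continue  # branch was created but never changed: skip it
        heads.append(latest_rev)
    return heads</pre>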
<p>As with the trunk, we then need to follow the parents of each head back to track down all of the revisions that need to be included in the convert. And as with the trunk, if hg convert has been run before against the same source, then we only track back till we find changes that have already been converted into the destination repository.</p>
<h4>Sorting Things Out</h4>
<p>The biggest challenge in fighting the Hydra isn&#8217;t the poisonous breath &#8211; Heracles overcame that with a simple cloth over his mouth and nose. No, it&#8217;s that when you cut off one of its heads, it grows back two more. So one of the first things you need to know is whether its heads will grow back in parallel or one after the other. When converting Subversion repositories to Mercurial, this concept corresponds to the sort order.</p>
<p>Now that we&#8217;re going to be converting multiple branches, the order in which we sort the changes to be imported is important. The hg convert extension offers three types of sorts: branchsort, datesort, and sourcesort. We can ignore sourcesort, because that only applies when importing from a Mercurial repository. The default for Subversion repositories is branchsort. This means that when importing from Subversion, the algorithm essentially orders the revisions with a depth-first search. It imports one branch all the way to its head revision, then goes back and imports the next branch. In other words, the Hydra grows back one head, then grows a second. This is in contrast to datesort, which imports each revision in date order, or rather, the Hydra&#8217;s heads grow back in parallel. Datesort is more intuitive, and leads to repositories that are organized the way we expect them to be, with development going on in the trunk and in branches at the same time. Branchsort, however, actually creates smaller Mercurial repositories, because the diffs to files tend to be much smaller when they&#8217;re organized by branch, rather than intermingled.</p>
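<p>To make the difference concrete, here is a simplified Python model of the two orderings, not the extension&#8217;s own code. Both variants only emit a revision once all of its parents have been emitted. From there, datesort always takes the oldest ready revision. This sketch models branchsort as a stack that keeps following the most recently readied child. Revision dates are assumed to be sortable numbers.</p>
<pre style="padding-left: 30px;" lang="python">def order_revisions(parents, dates, mode):
    """Order revisions so every parent comes before its children.

    parents: {rev: [parent revs]}, dates: {rev: sortable timestamp},
    mode: "datesort" or "branchsort".
    """
    children = {rev: [] for rev in parents}
    for rev, revparents in parents.items():
        for p in revparents:
            children[p].append(rev)
    waiting = {rev: len(revparents) for rev, revparents in parents.items()}
    ready = sorted((rev for rev, n in waiting.items() if n == 0), key=dates.get)
    order = []
    while ready:
        if mode == "datesort":
            ready.sort(key=dates.get)
            rev = ready.pop(0)  # oldest ready revision first
        else:
            rev = ready.pop()  # branchsort: stay on the branch just extended
        order.append(rev)
        # Push newest children first so branchsort pops the oldest child next.
        for child in sorted(children[rev], key=dates.get, reverse=True):
            waiting[child] -= 1
            if waiting[child] == 0:
                ready.append(child)
    return order</pre>
<p>Consider trunk revisions 1, 2, 4, and 6, plus a branch made of revisions 3 and 5 off revision 2. Datesort yields 1, 2, 3, 4, 5, 6, while the branchsort model yields 1, 2, 3, 5, 4, 6, following the branch to its head before the trunk resumes.</p>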
<p>Given that disk space is cheap, you will be happiest if you specify datesort, and only rerun hg convert with branchsort if you experience any problems with the resulting repository size.</p>
<h4>Hacking at the Hydra</h4>
<p>Heracles defeated the Hydra by cutting off its heads, then having his nephew cauterize the wounds so that new heads would not grow back. Finally he placed its one remaining immortal head under a heavy rock, trapping it.</p>
<p>The trick to defeating the Hydra is to limit the number of heads you&#8217;re dealing with. Unfortunately, normal conversions from Subversion to Mercurial will often leave more heads than expected. Remember, this is the Hydra &#8211; we should expect more heads than expected. In this case, it is because the convert extension cannot recognize Subversion merges. Doing so is a tricky problem because Subversion merges are so flexible. So after the conversion, a merge doesn&#8217;t produce a nice revision with two parents. Rather, it looks like any other revision with one parent, while the other parent is left dangling. Because Subversion branches are often closed when they are merged back into trunk development, this means that we&#8217;re leaving an extra head in the converted repository that need not be there. Even if the Subversion branch was used further after the merge, we&#8217;ve still lost an important piece of history by not recording the revision as a merge in the new repository. Fortunately, the convert extension does provide a manual workaround to this limitation: the splicemap. Like Heracles&#8217; firebrand-wielding nephew, the splicemap can safely eliminate the Hydra&#8217;s heads. The splicemap does this by allowing you to specify the parents of any given revision.</p>
<p>As you might imagine, you can easily shoot yourself in the foot with this ability. You could rearrange revisions in any number of ways, creating an odd tree, or switching revisions around in ways that significantly increase the size of the converted repository. But rather than hacking up and grafting together a totally new Hydra, let&#8217;s just use it to get rid of a few of the Hydra&#8217;s heads. Its most obvious use is to specify the two parents of a merge, in most cases eliminating one head from the converted repository. It can also be used to bring together two disparate lines of development, which may occasionally be useful, e.g. when you realize that two separate repositories should really be combined into one.</p>
<p>The hg convert extension implements the splicemap as a simple lookup keyed by the revision ids given in the splicemap file. When a commit retrieved from the source converter object has an entry, its parents are replaced with the ones from the splicemap before the destination converter object puts the commit into the destination repository.</p>
<p>One trick to using the splicemap is understanding the revision format used in the splicemap file. For Subversion repositories, it is important to get this right, or it will be as if the splicemap hadn&#8217;t even been specified. A Subversion revision is written in the splicemap like so:</p>
<p style="padding-left: 30px;">svn:&lt;uuid&gt;/path/to/module@&lt;revnum&gt;</p>
<p>So for example, revision 931750 in the trunk of the official Subversion repository would be specified like this:</p>
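<p>A minimal Python sketch of that lookup follows. It assumes the entry format described in the convert extension&#8217;s help: a revision id, a space, then one or two comma-separated parent ids per line. Both function names are hypothetical, and blank lines are the only extra syntax handled.</p>
<pre style="padding-left: 30px;" lang="python">def parse_splicemap(text):
    """Parse "child parent1,parent2" lines into {child: [parents]}."""
    splicemap = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        child, parents = line.split(None, 1)
        splicemap[child] = [p.strip() for p in parents.split(",") if p.strip()]
    return splicemap


def spliced_parents(rev, parents, splicemap):
    """Parents to commit rev with: a splicemap entry overrides the source's."""
    return splicemap.get(rev, parents)</pre>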
<p style="padding-left: 30px;">svn:13f79535-47bb-0310-9956-ffa450edef68/subversion/trunk@931750</p>
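<p>If you are writing many splicemap entries, a tiny helper avoids typos in these keys. The sketch below has a hypothetical function name and just assembles the format shown above. The repository UUID is the &#8220;Repository UUID&#8221; value reported by svn info.</p>
<pre style="padding-left: 30px;" lang="python">def svn_revision_id(uuid, module, revnum):
    """Build a splicemap key of the form svn:UUID/path/to/module@REVNUM."""
    module = "/" + module.strip("/")  # exactly one leading slash, no trailing one
    return "svn:%s%s@%d" % (uuid, module, revnum)</pre>
<p>Calling it with the UUID above, the module subversion/trunk and revision 931750 reproduces the example key.</p>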
<h4>How Many Hydras?</h4>
<p>In taming the Hydra of a subversion repository, we have an option that Heracles did not have: rather than having to eliminate Hydra heads by chopping them off and cauterizing the wounds, it&#8217;s as if Heracles could chop the Hydra up into n Hydras, each with one head, so it&#8217;s really more like a snake. Instead of a many headed Hydra, we end up with a bunch of one headed snakes. Indiana Jones might not like that solution, but it&#8217;s much easier for a hero like Heracles to tackle, because he can just deal with one at a time.</p>
<p>Likewise, instead of removing heads using the splicemap, we can split the repository into multiple repositories, each with one head. To understand the differences between having a bunch of named branches in one repository and having a separate repository for each branch, it is helpful to read Steve Losh&#8217;s excellent <a href="http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/">branching in Mercurial guide</a>.</p>
<p>If you decide to just create named branches in the destination repository, the source converter object records the branch that a given revision is on, and the destination converter object creates a commit with that named branch. Nothing too spectacular here.</p>
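<p>As an illustration of the bookkeeping involved, here is a hypothetical Python helper that derives a named branch from a Subversion path. It assumes the standard trunk/branches layout, and it maps the trunk to Mercurial&#8217;s default branch. It is a sketch of the idea, not the converter&#8217;s code.</p>
<pre style="padding-left: 30px;" lang="python">def branch_for_path(path, trunk="/trunk", branches="/branches"):
    """Name the Mercurial branch for a Subversion path (hypothetical helper).

    Assumes the standard layout: the trunk maps to Mercurial's "default"
    branch and /branches/NAME/... maps to the named branch NAME.
    """
    path = path.rstrip("/")
    if path == trunk or path.startswith(trunk + "/"):
        return "default"
    prefix = branches.rstrip("/") + "/"
    if path.startswith(prefix):
        return path[len(prefix):].split("/", 1)[0]
    return None  # outside the usual layout: let the caller decide</pre>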
<p>Creating cloned branches, where there is a separate Mercurial repository for each Subversion branch, takes a little more work. For you, the little more work is just to specify --clonebranches on the command line. The converter, however, needs to make sure that each revision goes into the right repository. First, when copying each revision from the source to the destination, the converter finds the branches that the revision&#8217;s parents are on. It then tells the destination converter object which branch the child revision is on, as well as the branches that the parent revisions are on. The destination converter object first selects the correct repository to commit the revision to. If it doesn&#8217;t exist yet (i.e. the revision is the first in this branch), then the destination repository is created. Then it needs to make sure that the destination repository has all of the revisions leading up to the one being copied, so it pulls the appropriate revisions in from each of the parents&#8217; branch repositories. Finally, it is ready to commit the child revision to the appropriate branch repository.</p>
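<p>The following toy Python model captures that flow. It is an assumption-laden sketch. Each &#8220;repository&#8221; is just a set of revision ids, a pull copies the parents&#8217; ancestry, and a commit adds one id. No real Mercurial API is involved, and the class name is hypothetical.</p>
<pre style="padding-left: 30px;" lang="python">class ClonedBranches:
    """Toy model of --clonebranches: one set of revision ids per branch."""

    def __init__(self):
        self.repos = {}    # branch name: set of revision ids it contains
        self.parents = {}  # revision id: list of its parent revision ids

    def _ancestors(self, revs):
        """All revisions reachable from revs, including revs themselves."""
        seen, stack = set(), list(revs)
        while stack:
            rev = stack.pop()
            if rev not in seen:
                seen.add(rev)
                stack.extend(self.parents.get(rev, []))
        return seen

    def commit(self, rev, branch, parents):
        # Create the branch repository when its first revision arrives.
        repo = self.repos.setdefault(branch, set())
        # "Pull" everything the parents depend on, wherever it lives.
        repo |= self._ancestors(parents)
        self.parents[rev] = list(parents)
        repo.add(rev)
        return repo</pre>
<p>Suppose r4 starts a new feature branch off r2. Committing it creates the feature repository with r1, r2, and r4, but not the later trunk revision r3.</p>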
<h4>Cleaning up the mess</h4>
<p>Fighting the Hydra can be messy. Here are some things we can do to clean up once we&#8217;re done. One thing to note when doing Subversion to Mercurial conversions is that you&#8217;ll want to eliminate empty revisions. Typically, these occur because the Subversion revision either only changes Subversion properties (and not files), or only creates the directory at the root of the trunk or one of the branches. They can also be produced by a filemap that excludes every file a revision touched. However, the hg convert extension only tries to eliminate empty revisions when a filemap is specified. So the rule when doing conversions should be to always specify a filemap file, even if you just leave it empty. This makes sure that the hg convert extension still tries to eliminate empty revisions.</p>
<p>You may also want to eliminate branches from the history, preserving only the merge commit. The easiest way to do this is to not use the splicemap to merge the branch into development, and then strip that branch from the Mercurial repository after the conversion is done. The trunk will still have the proper changes from the revision that performed the merge, but all of the history of how that revision was created in the branch will be gone.</p>
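<p>The effect of dropping empty revisions can be sketched in Python as follows. This is a simplified model, not the filemap code itself. Each revision&#8217;s list of (already filtered) changed files is assumed to be known up front. An empty revision is skipped and its children are reattached to its first kept parent. Revisions that still join two distinct parents are kept as merges.</p>
<pre style="padding-left: 30px;" lang="python">def drop_empty_revisions(order, changes, parents):
    """Skip revisions with no changed files, reattaching their children.

    order: revision ids in conversion order; changes: {rev: [changed files]}
    after any filemap filtering; parents: {rev: [parent revs]}.
    Returns a list of (rev, new_parents) pairs for the revisions that remain.
    """
    kept = []
    replacement = {}  # dropped revision: the kept revision standing in for it
    for rev in order:
        new_parents = []
        for p in parents[rev]:
            p = replacement.get(p, p)
            if p is not None and p not in new_parents:
                new_parents.append(p)
        if changes[rev] or len(new_parents) > 1:
            kept.append((rev, new_parents))  # has changes, or is a real merge
        else:
            # Empty revision: its children will attach to its parent instead.
            replacement[rev] = new_parents[0] if new_parents else None
    return kept</pre>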
<h4>Victorious</h4>
<p>Hopefully, that&#8217;s enough information both to understand how branch conversion works and to successfully convert the unique repositories you&#8217;ve got on hand. Once you&#8217;re done you&#8217;ll still have to deal with the immortal Hydra, i.e. all your existing code. But hopefully you&#8217;ve made things more manageable along the way, making it easier to leverage all the goodness in that code. Like Heracles, you should now be able to go forth and conquer other monsters using the Hydra&#8217;s venom.</p>
]]></content:encoded>
			<wfw:commentRss>http://rock.hymasfamily.org/blog/2010/04/08/branches-subversion-conversion/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Trunk &#8211; Subversion Conversion to Mercurial, Part 1</title>
		<link>http://rock.hymasfamily.org/blog/2010/03/18/the-trunk-subversion-conversion/</link>
		<comments>http://rock.hymasfamily.org/blog/2010/03/18/the-trunk-subversion-conversion/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 00:28:36 +0000</pubDate>
		<dc:creator>Rock</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://rock.hymasfamily.org/blog/?p=84</guid>
		<description><![CDATA[So you&#8217;ve got a bunch of code, the key to your company&#8217;s future, and like a good developer you&#8217;re keeping it under source control using Subversion. But you&#8217;ve heard about these new distributed version control systems, like Mercurial, and after doing some research you&#8217;ve decided to take the plunge. But now you face a challenge: [...]]]></description>
			<content:encoded><![CDATA[<p>So you&#8217;ve got a bunch of code, the key to your company&#8217;s future, and like a good developer you&#8217;re keeping it under source control using Subversion. But you&#8217;ve heard about these new distributed version control systems, like Mercurial, and after doing some research you&#8217;ve decided to take the plunge. But now you face a challenge: how to get all that juicy code into a Mercurial repository?</p>
<p>Never fear! The wonderful Mercurial developers have created an excellent extension that can convert Subversion repositories to Mercurial. Because I work on the Kiln tool for importing existing code from other source control systems into Mercurial repositories, I&#8217;ve been working to understand more about how the whole conversion process works. To get a good basis of understanding, let&#8217;s first look at how the extension will import a single line of development &#8211; the trunk. This is reasonable in cases where there are no branches, or where all branches have already been merged into the trunk and you don&#8217;t care about which changes were made on which branches. I&#8217;ll explore the process of converting tags and branches in future posts.</p>
<h4>A Generic Algorithm</h4>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #008000;">self</span>.<span style="color: black;">ui</span>.<span style="color: black;">status</span><span style="color: black;">&#40;</span>_<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;scanning source...<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
heads = <span style="color: #008000;">self</span>.<span style="color: black;">source</span>.<span style="color: black;">getheads</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
parents = <span style="color: #008000;">self</span>.<span style="color: black;">walktree</span><span style="color: black;">&#40;</span>heads<span style="color: black;">&#41;</span></pre></div></div>

<p>The convert extension is designed to be a generic converter from many different repository types. The overall convert algorithm is handled at this generic level, while the details of retrieving specific revisions, files, and tags are handled by source converter objects that are specific to the source repository type. A destination converter object (in this case Mercurial-specific) does the work of writing the revisions, files, and tags to the new Mercurial repository.</p>
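<p>The split between the generic driver and the repository-specific objects can be pictured as two small interfaces. The Python sketch below uses the abc module. The method names getheads, getchanges, and putcommit appear in the convert code quoted in this post. The remaining methods and both class names are simplified assumptions for illustration, not the extension&#8217;s actual classes.</p>
<pre style="padding-left: 30px;" lang="python">from abc import ABC, abstractmethod


class ConverterSource(ABC):
    """Reads one repository type (Subversion, CVS, Git, ...)."""

    @abstractmethod
    def getheads(self):
        """Return the ids of the latest revisions."""

    @abstractmethod
    def getcommit(self, rev):
        """Return metadata for rev, including its parent ids."""

    @abstractmethod
    def getchanges(self, rev):
        """Return (files, copies) changed in rev."""

    @abstractmethod
    def gettags(self):
        """Return {tag name: revision id}."""


class ConverterSink(ABC):
    """Writes one repository type (here, Mercurial)."""

    @abstractmethod
    def putcommit(self, files, copies, parents, commit, source, revmap):
        """Write one revision and return its new id."""

    @abstractmethod
    def puttags(self, tags):
        """Record the converted tags."""</pre>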
<p>So you fire up your trusty shell, and kick off:</p>

<div class="wp_syntax"><div class="code"><pre class="dos" style="font-family:monospace;">hg convert svn://path/to/your/svn/repository  --datesort</pre></div></div>

<p>After parsing the command line, the converter creates the source and destination objects. We didn&#8217;t specify a filemap, but if one had been specified, the converter would then wrap the source object (the Subversion converter object) in a filemap source object. This filemap object uses the filemap file to filter and rename file paths before they are passed to or returned from the source repository object.</p>
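<p>Since the filemap comes up repeatedly in these posts, here is a small Python model of it. The include, exclude, and rename directives are the ones documented for hg convert, as is using &#8220;.&#8221; to rename into the repository root. The longest-matching-path rule and all function names are part of this sketch, not a description of the extension&#8217;s code.</p>
<pre style="padding-left: 30px;" lang="python">def parse_filemap(text):
    """Split filemap lines into include/exclude lists and a rename dict."""
    includes, excludes, renames = [], [], {}
    for line in text.splitlines():
        words = line.split("#", 1)[0].split()  # drop comments, then split
        if not words:
            continue
        if words[0] == "include":
            includes.append(words[1])
        elif words[0] == "exclude":
            excludes.append(words[1])
        elif words[0] == "rename":
            renames[words[1]] = words[2]
    return includes, excludes, renames


def _match_len(path, prefixes):
    """Length of the longest prefix equal to path or a parent dir, else -1."""
    best = -1
    for prefix in prefixes:
        if path == prefix or path.startswith(prefix + "/"):
            best = max(best, len(prefix))
    return best


def map_path(path, includes, excludes, renames):
    """Return the destination path for path, or None if it is filtered out."""
    inc = _match_len(path, includes)
    exc = _match_len(path, excludes)
    if includes and inc == -1:
        return None  # includes exist, but none of them covers this path
    if exc > inc:
        return None  # the most specific matching directive is an exclude
    matches = [src for src in renames if _match_len(path, [src]) != -1]
    if not matches:
        return path
    source = max(matches, key=len)  # most specific rename wins
    rest = path[len(source):].lstrip("/")
    target = renames[source]
    if target == ".":
        return rest  # "." renames into the repository root
    return target + "/" + rest if rest else target</pre>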
<p>When you execute a typical hg convert command, the first output line you&#8217;ll see  is this:</p>

<div class="wp_syntax"><div class="code"><pre class="dos" style="font-family:monospace;">scanning source...</pre></div></div>

<p>When this appears, the converter begins by asking the source object to get the heads, or the latest revisions, of that repository. Because we&#8217;re ignoring branches for now, the Subversion converter object will just get the latest revision under the trunk. After getting this revision, it walks backwards through the revisions until it reaches the beginning. As the converter retrieves each revision from the source converter object, it caches it and creates a map from the revision to a list of its parents. In the case of a single Subversion trunk, each revision will only have one parent.</p>
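<p>That backward walk amounts to a simple graph traversal. The Python sketch below is modeled on the description above. Here get_parents is a hypothetical callable standing in for the source converter object. The already_converted argument stands in for revisions that an earlier hg convert run recorded, and the walk stops at them.</p>
<pre style="padding-left: 30px;" lang="python">def walk_tree(heads, get_parents, already_converted=()):
    """Map every unconverted revision reachable from heads to its parents.

    get_parents(rev) stands in for asking the source converter object for a
    commit; already_converted stands in for revisions from earlier runs.
    """
    converted = set(already_converted)
    parents = {}
    to_visit = list(heads)
    while to_visit:
        rev = to_visit.pop()
        if rev in parents or rev in converted:
            continue  # seen before, or converted by a previous run
        parents[rev] = list(get_parents(rev))
        to_visit.extend(parents[rev])
    return parents</pre>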
<h4>Sorting the revisions</h4>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #008000;">self</span>.<span style="color: black;">ui</span>.<span style="color: black;">status</span><span style="color: black;">&#40;</span>_<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;sorting...<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
t = <span style="color: #008000;">self</span>.<span style="color: black;">toposort</span><span style="color: black;">&#40;</span>parents, sortmode<span style="color: black;">&#41;</span>
num = <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>t<span style="color: black;">&#41;</span></pre></div></div>

<p>The converter needs to process the revisions in the right order. The hg convert command gives three sorting options: datesort, branchsort, and sourcesort. The sourcesort option is not available when converting from Subversion. To perform either of the other sorts, the converter first creates a children map from the parents map, as well as a list of the roots, or revisions without parents. For our trunk-only conversion there will only be one root revision. Starting at this root revision, the converter chooses the next revision based on the ordering type. Then it adds any revisions whose parents are all in the ordering to the list of candidate next revisions, from which the next revision is chosen. For a Subversion trunk-only conversion, there will only ever be one revision to choose from, regardless of the sort order. Therefore, I&#8217;ll discuss the differences between datesort and branchsort in part 2, on converting branches.</p>
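<p>The first step, inverting the parents map and finding the roots, might look like this in Python. It is a sketch with a hypothetical function name. A parent missing from the map (for example, one converted by an earlier run) is treated as already handled, so its children count as roots.</p>
<pre style="padding-left: 30px;" lang="python">def children_and_roots(parents):
    """Invert {rev: [parents]} and list the revisions that can go first."""
    children = {rev: [] for rev in parents}
    roots = []
    for rev, revparents in parents.items():
        # Parents outside the map were converted earlier; they impose no order.
        pending = [p for p in revparents if p in parents]
        if not pending:
            roots.append(rev)
        for p in pending:
            children[p].append(rev)
    return children, roots</pre>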
<h4>Importing changes</h4>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #dc143c;">copy</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, rev<span style="color: black;">&#41;</span>:
	commit = <span style="color: #008000;">self</span>.<span style="color: black;">commitcache</span><span style="color: black;">&#91;</span>rev<span style="color: black;">&#93;</span>
	files, copies = <span style="color: #008000;">self</span>.<span style="color: black;">source</span>.<span style="color: black;">getchanges</span><span style="color: black;">&#40;</span>rev<span style="color: black;">&#41;</span>
	parents = <span style="color: black;">&#91;</span><span style="color: #008000;">self</span>.<span style="color: #008000;">map</span><span style="color: black;">&#91;</span>p<span style="color: black;">&#93;</span> <span style="color: #ff7700;font-weight:bold;">for</span> p <span style="color: #ff7700;font-weight:bold;">in</span> commit.<span style="color: black;">parents</span><span style="color: black;">&#93;</span>
	newnode = <span style="color: #008000;">self</span>.<span style="color: black;">dest</span>.<span style="color: black;">putcommit</span><span style="color: black;">&#40;</span>files, copies, parents, commit,
	<span style="color: #008000;">self</span>.<span style="color: black;">source</span>, <span style="color: #008000;">self</span>.<span style="color: #008000;">map</span><span style="color: black;">&#41;</span>
	<span style="color: #008000;">self</span>.<span style="color: black;">source</span>.<span style="color: black;">converted</span><span style="color: black;">&#40;</span>rev, newnode<span style="color: black;">&#41;</span>
	<span style="color: #008000;">self</span>.<span style="color: #008000;">map</span><span style="color: black;">&#91;</span>rev<span style="color: black;">&#93;</span> = newnode</pre></div></div>

<p>Now that we&#8217;ve got this sorted list of revisions, the converter can start the process of converting each one individually. It does this by retrieving the appropriate changes from subversion and copying them to Mercurial.</p>
<p>When it initially walked the tree of changes, the subversion converter object stored the paths of files in each revision as well as the parent revisions. Because we&#8217;re looking at a trunk-only conversion, each revision will only ever have 1 parent. As the conversion proceeds, each of these revisions has the paths expanded. Each path is checked to see if it is a file, a directory, or a deleted item. File paths are recoded appropriately. Paths representing directories are expanded to include all files in the directory at that revision, and records of copied files and directories are also stored.</p>
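<p>Directory expansion is the interesting part of that step. In this hypothetical Python sketch, snapshot is a dictionary standing in for the files that exist at the revision being converted. A path that is neither a file nor a directory in it is treated as deleted. Expanding deleted directories (which would need the previous revision&#8217;s contents) and path recoding are left out.</p>
<pre style="padding-left: 30px;" lang="python">def expand_paths(changed_paths, snapshot):
    """Expand changed Subversion paths into individual file paths.

    changed_paths: paths touched by a revision.
    snapshot: {file path: contents} for every file that exists at that
    revision (a hypothetical stand-in for querying the repository).
    """
    files = set()
    for path in changed_paths:
        if path in snapshot:
            files.add(path)  # a plain file
            continue
        prefix = path.rstrip("/") + "/"
        inside = [f for f in snapshot if f.startswith(prefix)]
        if inside:
            files.update(inside)  # a directory: take every file under it
        else:
            files.add(path)  # gone at this revision: record as a deletion
    return sorted(files)</pre>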
<p>The Mercurial converter object then goes through the files and copies and retrieves the contents of each file from the subversion converter object. It uses the file contents to create the revision to be committed to the destination repository. That&#8217;s it. Your conversion is all done! Or is it? What if someone makes more changes to the subversion repository after you already performed the conversion?</p>
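<p>To make the path-expansion step a little more concrete, here is a minimal Python sketch. It is not the converter&#8217;s actual code: an in-memory dictionary of hypothetical revisions (SNAPSHOTS) stands in for the Subversion bindings, the helper names kind, expand_paths, and build_commit are invented for illustration, and copy records and recoding are left out entirely.</p>
<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"># Hypothetical stand-in for a Subversion repository: each revision maps
# file paths to their contents.  (The real converter queries svn instead.)
SNAPSHOTS = {
    1: {'trunk/README': 'hello\n'},
    2: {'trunk/README': 'hello\n',
        'trunk/src/a.py': 'a = 1\n',
        'trunk/src/b.py': 'b = 2\n'},
    3: {'trunk/src/a.py': 'a = 1\n',
        'trunk/src/b.py': 'b = 2\n'},
}


def kind(rev, path):
    """Classify path at rev as 'file', 'dir', or 'deleted'."""
    files = SNAPSHOTS[rev]
    if path in files:
        return 'file'
    prefix = path.rstrip('/') + '/'
    if any(f.startswith(prefix) for f in files):
        return 'dir'
    return 'deleted'


def expand_paths(rev, changed_paths):
    """Turn the paths a revision touched into a sorted list of file paths.

    A directory is replaced by every file under it at that revision;
    files and deleted items are kept as they are.
    """
    result = set()
    for path in changed_paths:
        if kind(rev, path) == 'dir':
            prefix = path.rstrip('/') + '/'
            result.update(f for f in SNAPSHOTS[rev] if f.startswith(prefix))
        else:
            result.add(path)
    return sorted(result)


def build_commit(rev, changed_paths):
    """Map each expanded path to its contents, or None if it was deleted."""
    return {path: SNAPSHOTS[rev].get(path)
            for path in expand_paths(rev, changed_paths)}</pre></div></div>
<p>In this sketch, a revision that only records the directory trunk/src becomes a commit containing both files under it, and a path that no longer exists is passed along with None so the destination can delete it.</p>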
<h4>Multiple hg convert runs</h4>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># Record converted revisions persistently: maps source revision</span>
<span style="color: #808080; font-style: italic;"># ID to target revision ID (both strings).  (This is how</span>
<span style="color: #808080; font-style: italic;"># incremental conversions work.)</span>
<span style="color: #008000;">self</span>.<span style="color: #008000;">map</span> = mapfile<span style="color: black;">&#40;</span>ui, revmapfile<span style="color: black;">&#41;</span></pre></div></div>

<p>The hg convert extension supports multiple executions against the same source and destination repositories. This can be useful if you did one run of hg convert and later want to pull in further development from your Subversion repository. This feature is made possible primarily by the revmap, a file that hg convert saves in the destination&#8217;s .hg directory. The revmap is just a simple map from revision IDs in the source repository to revision IDs in the destination repository. The hg convert extension reads this revmap (if it exists) before beginning conversion. It uses the revmap to determine which revisions have already been converted, and begins with the revisions that come after them. One option when running hg convert is to specify where the revmap is &#8211; or where to save it if this is the first run against a given repository.</p>
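<p>Here is a minimal sketch of how a revmap makes incremental runs possible. It assumes a plain text file with one &#8220;source-id destination-id&#8221; pair per line, which I believe is close to what hg convert keeps in .hg/shamap. The function names and the revision IDs are hypothetical, and a callback stands in for the real commit step.</p>
<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import os
import tempfile


def load_revmap(path):
    """Read 'source_id dest_id' lines into a dict; no file means a first run."""
    revmap = {}
    if not os.path.exists(path):
        return revmap
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            if line:
                src, dest = line.rsplit(' ', 1)
                revmap[src] = dest
    return revmap


def record(path, src, dest):
    """Append one mapping right away so an interrupted run can resume."""
    with open(path, 'a') as f:
        f.write('%s %s\n' % (src, dest))


def pending(sorted_revs, revmap):
    """The source revisions, in order, that have not been converted yet."""
    return [rev for rev in sorted_revs if rev not in revmap]


def convert(sorted_revs, revmap_path, commit):
    """Convert only new revisions; commit(src) returns the new destination ID."""
    revmap = load_revmap(revmap_path)
    for src in pending(sorted_revs, revmap):
        dest = commit(src)
        record(revmap_path, src, dest)
        revmap[src] = dest
    return revmap


# Demo with made-up IDs: the second run only converts 'svn:3'.
revmap_path = os.path.join(tempfile.mkdtemp(), 'shamap')
converted = []


def fake_commit(src):
    converted.append(src)
    return 'hg-' + src


convert(['svn:1', 'svn:2'], revmap_path, fake_commit)
convert(['svn:1', 'svn:2', 'svn:3'], revmap_path, fake_commit)
print(converted)  # ['svn:1', 'svn:2', 'svn:3']</pre></div></div>
<p>Because each mapping is written as soon as its commit succeeds, an interrupted run in this sketch simply resumes after the last recorded revision the next time it is started.</p>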
<p>Another trick to consecutive hg convert runs is the authormap. The authormap is a file that lets you change author names when converting from Subversion to Mercurial, which can be quite useful if you want to add information to the Mercurial author names, such as email addresses. The authormap, like the revmap, is stored in the destination&#8217;s .hg directory. On subsequent hg convert runs, this file is read and used if no authormap is specified. If an authormap is specified on the command line and one also exists in the destination .hg directory, the two are merged, with the command-line one winning whenever they disagree.</p>
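<p>The merge rule just described fits in a few lines. This sketch assumes the &#8220;source = destination&#8221; line format that hg convert uses for authormaps, with blank lines and # comments ignored. The function names and sample authors are hypothetical, and malformed lines are simply skipped here.</p>
<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def parse_authormap(text):
    """Parse 'source = destination' lines, ignoring blanks and # comments."""
    authors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue  # this sketch silently skips malformed lines
        src, dest = line.split('=', 1)
        authors[src.strip()] = dest.strip()
    return authors


def merged_authormap(saved_text, cmdline_text=None):
    """Start from the map saved in .hg; the command-line map overrides it."""
    authors = parse_authormap(saved_text or '')
    if cmdline_text:
        authors.update(parse_authormap(cmdline_text))
    return authors


def map_author(authors, name):
    """Authors missing from the map keep their Subversion user name."""
    return authors.get(name, name)


saved = 'alice = Alice Liddell\nbob = Bob Smith\n'
cmdline = '# corrected spelling\nbob = Robert Smith\n'
authors = merged_authormap(saved, cmdline)
print(map_author(authors, 'bob'))    # Robert Smith
print(map_author(authors, 'carol'))  # carol</pre></div></div>
<p>Loading the saved map first and then updating it with the command-line map is what gives the command line the last word on any author that appears in both.</p>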
<h4>How the filemap works</h4>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">fmap = opts.<span style="color: black;">get</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'filemap'</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">if</span> fmap:
	srcc = filemap.<span style="color: black;">filemap_source</span><span style="color: black;">&#40;</span>ui, srcc, fmap<span style="color: black;">&#41;</span>
	destc.<span style="color: black;">setfilemapmode</span><span style="color: black;">&#40;</span><span style="color: #008000;">True</span><span style="color: black;">&#41;</span></pre></div></div>

<p>One last aspect of conversion deserves consideration &#8211; the filemap. The filemap&#8217;s implementation uses an interesting design. The code for handling the filemap is in a filemap converter object, much like the Subversion converter object. This filemap converter wraps the Subversion converter and does the mapping in such a way that both the Subversion converter and the hg converter can be oblivious to its presence.</p>
<p>The filemap converter object handles two major pieces of functionality. First, it takes care of renaming files. Renaming is done by a filemapper, which keeps a map from original filenames to new ones. Whenever filenames are passed to or from the converter object, the filemapper does the necessary mapping.</p>
<p>The more interesting challenge is determining which files and revisions should actually be included in the conversion. First, the filemap converter checks whether a given revision touches any files that are included by the filemap. If so, the revision needs to be converted. But the revision also needs its parent updated to the correct revision. In Subversion, the parent of a given revision is simply the previous revision. Of course, that previous revision might not touch any files in the filemap, and so be discarded during conversion. So the filemap converter also needs to reparent the new revision onto the most recent revision that was included.</p>
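<p>The wrapping design and the renaming step can be sketched as follows. This is a simplified illustration, not the extension&#8217;s code: FileMapper applies the longest matching rename prefix at directory boundaries (treating &#8220;.&#8221; as the repository root), FilemapSource wraps any object offering getchanges and getfile, and DictSource is a hypothetical in-memory source. The method names only loosely echo hg convert&#8217;s source interface, whose real signatures differ.</p>
<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">class FileMapper:
    """Rename paths using the longest matching 'rename' prefix."""

    def __init__(self, renames):
        self.renames = dict(renames)  # original prefix: new prefix

    def _match(self, path):
        # Try the whole path first, then each parent directory.
        parts = path.split('/')
        for i in range(len(parts), 0, -1):
            prefix = '/'.join(parts[:i])
            if prefix in self.renames:
                return prefix
        return None

    def __call__(self, path):
        prefix = self._match(path)
        if prefix is None:
            return path
        new, rest = self.renames[prefix], path[len(prefix):]
        if new == '.':
            # Renaming to '.' moves the files to the repository root.
            return rest.lstrip('/')
        return new + rest


class DictSource:
    """Hypothetical in-memory source: {rev: {path: contents}}."""

    def __init__(self, revs):
        self.revs = revs

    def getchanges(self, rev):
        return sorted(self.revs[rev])

    def getfile(self, path, rev):
        return self.revs[rev][path]


class FilemapSource:
    """Wraps another source so that callers only ever see renamed paths."""

    def __init__(self, source, mapper):
        self.source = source
        self.mapper = mapper
        self.original = {}  # renamed path: original path

    def getchanges(self, rev):
        renamed = []
        for path in self.source.getchanges(rev):
            new = self.mapper(path)
            self.original[new] = path
            renamed.append(new)
        return renamed

    def getfile(self, path, rev):
        # Translate back before asking the wrapped source for contents.
        return self.source.getfile(self.original.get(path, path), rev)</pre></div></div>
<p>Neither DictSource nor whatever consumes FilemapSource needs to know that renaming happens, which is the point of the wrapper design.</p>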
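<p>For a trunk-only history, the include check and the reparenting can be sketched like this. The sketch assumes a linear list of (revision, parent, touched files) tuples and a hypothetical make_included helper in which any exclude beats any include, which is simpler than the real filemap&#8217;s matching rules.</p>
<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def make_included(includes, excludes):
    """Build a path predicate (simplified: any exclude beats any include).

    With no includes, every path that is not excluded is kept.
    """
    def under(path, prefix):
        return path == prefix or path.startswith(prefix + '/')

    def included(path):
        if any(under(path, p) for p in excludes):
            return False
        return not includes or any(under(path, p) for p in includes)

    return included


def filter_and_reparent(revisions, included):
    """Drop revisions that touch no included file and reparent the rest.

    revisions is an ordered list of (rev, parent, files) tuples with a
    linear history.  Returns {rev: new_parent} for the kept revisions,
    where new_parent is the nearest kept ancestor (None if there is none).
    """
    kept = {}
    nearest_kept = {}  # rev: closest kept ancestor, or rev itself if kept
    for rev, parent, files in revisions:
        inherited = nearest_kept.get(parent)
        if any(included(f) for f in files):
            kept[rev] = inherited
            nearest_kept[rev] = rev
        else:
            nearest_kept[rev] = inherited
    return kept


history = [
    (1, None, ['trunk/docs/a.txt']),
    (2, 1, ['trunk/src/x.c']),
    (3, 2, ['trunk/docs/b.txt']),
    (4, 3, ['trunk/src/y.c']),
]
print(filter_and_reparent(history, make_included(['trunk/src'], [])))
# {2: None, 4: 2}</pre></div></div>
<p>Revision 3 only touches documentation, so it is dropped, and revision 4 is reparented onto revision 2, the most recent revision that survived.</p>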
<h4>Coming Soon …</h4>
<p>This algorithm at its root is quite simple. But understanding what is going on in the simple case is essential to understanding what is happening when we make it more complicated with branches, tags, and the options associated with them.  Part two will be a detailed look at how branches are converted from Subversion to Mercurial.</p>
]]></content:encoded>
			<wfw:commentRss>http://rock.hymasfamily.org/blog/2010/03/18/the-trunk-subversion-conversion/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Mercurial will make you a better developer</title>
		<link>http://rock.hymasfamily.org/blog/2010/03/06/mercurial-will-make-you-a-better-developer/</link>
		<comments>http://rock.hymasfamily.org/blog/2010/03/06/mercurial-will-make-you-a-better-developer/#comments</comments>
		<pubDate>Sat, 06 Mar 2010 12:09:32 +0000</pubDate>
		<dc:creator>Rock</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://rock.hymasfamily.org/blog/?p=72</guid>
		<description><![CDATA[Since starting at Fog Creek, I&#8217;ve been learning about Mercurial from day one, since I&#8217;m working on Kiln. It was a big change from my work at Microsoft, where we used a VCS that was much closer to the Subversion model than the Mercurial model. One of my areas of focus in Kiln has been [...]]]></description>
			<content:encoded><![CDATA[<p>Since starting at Fog Creek, I&#8217;ve been learning about Mercurial from day one, since I&#8217;m working on Kiln. It was a big change from my work at Microsoft, where we used a VCS that was much closer to the Subversion model than the Mercurial model. One of my areas of focus in Kiln has been the import tool for teams migrating from Subversion. As I&#8217;ve tried to wrap my head around Subversion, Mercurial, and converting between the two, I&#8217;ve started to realize that many of the cultural differences between the two communities stem from basic technical strengths and weaknesses of the two products. Feel free to substitute Git for Mercurial, if that&#8217;s your cup of DVCS tea.</p>
<p>You could argue that the cultural differences led to the technical differences between the two camps. I suppose that&#8217;s probably true for the earliest contributors to the products, but it&#8217;s more likely that the technical strengths and weaknesses of each product appealed to those who naturally thought in certain ways, thus leading to the natural congregation of people with similar outlooks on creating software.</p>
<p>But enough of that. On to the differences.</p>
<h4>Single project repositories vs. multiple project repositories</h4>
<p>One change you&#8217;ll run up against, which was initially quite disconcerting for me, is that each project in Mercurial is contained in its own repository. I was used to one huge repository with different subdirectories for different projects. Consequently, because the code for Kiln is broken up into 5-10 separate repositories, I&#8217;ve spent the last few months asking others on my team if it wouldn&#8217;t be better to just combine some or all of our repositories. About once a month. I admit that I still think some combining would be good, but I&#8217;m beginning to understand more fully the mindset that leads to lots of small repositories.</p>
<p>This difference is one of the easiest to trace to basic differences in how the products work. Mercurial is much more narrowly focused, as a product, than Subversion is. Mercurial is all about tracking changes to a set of files. Subversion is all about tracking changes to each file separately. Mercurial tracks some repository-wide information, such as branches, tags, and repository settings. Subversion allows you to branch, tag, and set properties on the whole repository or any subdirectory, or any random unrelated set of files, if you so desire.</p>
<p>Of the two, Subversion is far more general purpose in nature. It tracks changes to each file and directory separately, only keeping an overall revision number that tracks the chronological order of changes. Because of that, there are many features in Subversion that allow you to operate on a portion of the repository. You can check out a specific subdirectory, map files from another subdirectory, keep your working directory files at different revisions per file (called mixed revisions), and set properties on directories that apply to a directory and all of its children, such as which files to ignore when doing an svn status. It&#8217;s even possible to set different permissions for different parts of the repository.</p>
<p>In contrast, Mercurial manages a single set of files in a repository. Directories are not first-class objects in Mercurial as they are in Subversion; they&#8217;re just artifacts of file names. Although Mercurial internally tracks changes to each file separately, there is no way to put the working directory into a mixed-revision state. The DAG cannot handle that type of freedom. Of course, the fact that Mercurial requires you to download the entire repository history to create your own working directory also puts downward pressure on the size of a repository. And because permissions are the same throughout a single repository, if different code needs different permissions, it also needs to be in a different Mercurial repository. The same is true for other settings, such as which files to ignore when doing hg status.</p>
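<p>Because a Mercurial manifest is essentially a list of file paths, the directories you see are just whatever parents those paths imply, which is also why Mercurial cannot track an empty directory. A tiny sketch, using a plain list of strings as a stand-in for a real manifest:</p>
<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def implied_directories(manifest_paths):
    """Return every directory implied by a list of tracked file paths."""
    dirs = set()
    for path in manifest_paths:
        parents = path.split('/')[:-1]
        # Add 'a', 'a/b', 'a/b/c', ... for a file at a/b/c/name.
        for depth in range(1, len(parents) + 1):
            dirs.add('/'.join(parents[:depth]))
    return dirs


print(sorted(implied_directories(['src/a.py', 'src/lib/b.py', 'README'])))
# ['src', 'src/lib']</pre></div></div>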
<p>All of these differences naturally lead to smaller repositories that typically contain one project in Mercurial, and larger repositories that typically contain many, if not all, projects in Subversion. If you&#8217;re coming from Subversion, you&#8217;re going to want to get used to it. Fortunately, it appeals to your innate desire to componentize &#8211; that is an innate desire, right? For me it is, and Mercurial makes it easier to do it at the project level. Of course, because Mercurial does less, it leaves the management of multiple repositories to other systems (see <a href="http://bitbucket.org/">bitbucket</a> and <a href="http://fogcreek.com/kiln/">Kiln</a>).</p>
<p><a href="http://ww2.samhart.com/">Sam Hart</a>, when he decided to switch from Subversion to Mercurial, <a href="http://ww2.samhart.com/book/export/html/49">discussed this exact phenomenon</a>:</p>
<blockquote><p>&#8220;If you&#8217;re like me, when you originally set up SVN you did so in the laziest way possible.</p>
<p>&#8220;Setting up SVN repos is more work than it should be. It involves using commands that you normally never have to touch (svnadmin), setting up new entries for those repos in your http server&#8217;s configuration files (if you&#8217;re using Apache and WebDAV), and setting up user permissions to those repos. Thus, the lazy way to set them up is to make one central SVN repo under which you have multiple sub-repos. This has the advantage of making your repository very easy to maintain. However [it] has a big disadvantage in that a user with write access to any sub-repo will have write access to the entire repo.</p>
<p>&#8220;In Hg, on the other hand, setting up a new repository is much easier, and maintaining multiple repositories more manageable. So, if you&#8217;re like me, you may be tempted to remedy past sins by splitting your single gargantuan SVN repo into smaller Hg repos.&#8221;</p></blockquote>
<h4>Commit often vs commit when &#8220;ready&#8221;</h4>
<p>Another change you&#8217;ll need to adjust to is committing often. You&#8217;re probably used to making a bunch of related (or unrelated) changes, then doing some testing. You may build a version of your product and have others do testing. You&#8217;ll probably run automated tests, possibly multiple sets of automated tests. And finally, you&#8217;ll check in.</p>
<p>If you do this in a team using Mercurial they&#8217;ll wonder where you disappeared to while your code was being written, complain about how large the code reviews are, and be frustrated at how slowly you iterate on your code towards a good solution.</p>
<p>On my team at Microsoft, we had a concept of a shippable chunk of software. This helped guide the creation of branches in our centralized VCS. We could work in the branch, possibly with one or two other developers, until we had something we could reasonably ship, then merge the branch back into the main development repository. Depending on the rules for checking in to a VCS, whether centralized or distributed, software teams develop an understanding, either explicit or implicit, of what a &#8220;committable chunk&#8221; is: what amount of code is worth committing, either for review or for sharing with others.</p>
<p>The key change in mindset for me has been to make my own &#8220;committable chunks&#8221; much smaller than they used to be. No longer do I make hundreds of changes in tens of files, tying up another developer for hours in code reviews. It&#8217;s easy to make frequent commits locally, and push those to a personal branch on Kiln regularly for review.</p>
<p>But DVCS&#8217;s don&#8217;t just make it easy to have smaller committable chunks. They make it easier to manage committable chunks of all sizes. Because I work against a personal repository and merging is so easy, I can commit almost minute by minute to my personal repository, push multiple times a day to the feature branch I&#8217;m working on, push occasionally from the feature branch to the main development branch, and handle multi-feature pushes from development to a stable release branch. Obviously, those are all possible with a CVCS, but they always took so much time and effort to manage the branches, do the merge, and verify that nothing broke. In practice, that meant that steps were left out, and things slipped through the cracks.</p>
<p>Now, my changes to code are clearer, my original intentions more obvious, and I feel far better with my code in source control. I can look at changes at a small granular level, or I can look at the big merges.</p>
<h4>Branch always vs. branch rarely</h4>
<p>Closely related to a cultural norm of small, frequent commits is a norm of branching. Every clone of a Mercurial repository introduces a new branch once a change has been made. It&#8217;s also easy to branch many times within that clone. When I first made the switch, I didn&#8217;t really understand this. I knew I could work separately from other developers in my own repository, but I didn&#8217;t think of it conceptually as branching. It was more like I had my own sandbox, which I could then merge with the main repository when I checked in. And the idea of easily branching within my own repository still seems new to me.</p>
<p>But I&#8217;m learning to embrace the value in branching. As with frequent commits, it&#8217;s the ease of merging that makes the benefits of branching so readily available. And I&#8217;m beginning to value the power of having branches within my local repository. I can work on bug fixes separately from a major refactoring, and easily (and quickly) switch between the two using a simple &#8220;hg up&#8221; command &#8211; even when I&#8217;m offline. That&#8217;s great for the times when I&#8217;m deep into feature code and a sudden urgent bug pops up that needs to be fixed and released immediately. I can also switch back and forth between work on two different features, which is great when I get stuck and just need a mental break from one of them. Also, it makes it super easy to prototype new ideas without messing with my regular development.</p>
<p>One counterpoint to the ease of branching is that it may isolate developers. <a href="http://seeknuance.com/">John DeRosa</a> registers <a href="http://seeknuance.com/2008/07/06/mercurial-vs-subversion/">his concern</a> about this:</p>
<blockquote><p>&#8220;Additionally, I think distributed SCMs like Mercurial have a not-yet-fully-appreciated problem in making it too easy to not [ever] check code back into the main pool. With a local repository, a developer can feel protected from accidents and continue working happily for quite a long time. And then, say a year down the road, he/she does a massive check-in and discovers an integration problem. Branches, or a local repository that is effectively a private branch, should be easy to make &#8211; but not too easy.&#8221;</p></blockquote>
<p>Let me explain why I don&#8217;t buy it. First, &#8220;a year down the road&#8221;?! Seriously? It says something that you have to imagine a scenario so horrible and unlikely in order to envision easy branching as a bad thing. I think that the author likely didn&#8217;t realize how easy merging usually is with a DVCS like Mercurial. And he must have totally forgotten that this lone maverick developer could have been merging the main development line into her own repository every day or week. The right solution to this imagined (and barely imaginable) scenario is not to eliminate easy branching, since without it the lone developer will do the same thing, but be much more likely to lose her work because it won&#8217;t be stored in a repository. The right solution is to fix a broken culture that enables someone to go a full year with no accountability for their work.</p>
<h4>Source code files vs. all code files</h4>
<p>Another important difference relates to what files you put in the repository. Because Mercurial and other DVCS&#8217;s don&#8217;t handle versioning of large files well, it is much more tempting to store them some other way. This most obviously manifests itself in the storage of built binaries. If they are largish, and you want to keep lots of copies of them (nightly builds backed up for QA purposes, or even just weekly or monthly builds), then your repository becomes quite large and unwieldy very quickly. These types of files typically don&#8217;t diff well, so the stored differences between versions are large, and because the files themselves are large, downloading the repository takes much longer.</p>
<p>In this area, Subversion currently has a clear advantage. Only the files in the working directory are downloaded to client computers, so storing the history of large binary files only requires storage scaling on the server. Bandwidth is significantly reduced. Because it&#8217;s fairly simple, many Subversion installations have used it to track changes to built binaries and other very large files. The challenges to scale and management are limited to one machine, the server.</p>
<p>Naturally, users of Mercurial push handling of these large files to other systems. Their VCS is the location for their source files, typically a bunch of text files, which external tools then build into large binaries that are almost never stored in the same or another Mercurial repository. It is true that efforts are underway to alleviate this weakness in Mercurial, though I&#8217;m sure some don&#8217;t see it as a problem at all. The bfiles extension is an attempt to provide a more centralized model for certain large files. Of course, it has tradeoffs, but the fact that it&#8217;s being actively developed indicates that, at least for many, the tradeoffs are worth it.</p>
<p>For now, I&#8217;m happy that this aspect of Mercurial motivates me to automate more, to maintain more of the components of my products as code (in some form) that is compiled (using some method) to create these human-unreadable products.</p>
<h4>Conclusion</h4>
<p>There are obviously different ways to look at these cultural and philosophical differences between Subversion users and Mercurial users. One might look over the differences and conclude that Subversion seems much more flexible than Mercurial. Therefore, it must be better. Another might see how much better Mercurial handles basic source control features, such as branching, merging, and tags, and conclude that it is therefore a better product. It&#8217;s pretty obvious to me that these two views are quite related.</p>
<p><a href="http://softwareswirl.blogspot.com/">Michael Haggerty</a> makes this point quite well in his post <a href="http://softwareswirl.blogspot.com/2009/08/git-mercurial-and-bazaarsimplicity.html">Git, Mercurial, and Bazaar &#8211; simplicity through inflexibility</a>. The discussion is about the merging differences between Git and Subversion, but the principles apply to Mercurial as well. He argues that the very flexibility of Subversion is what makes merging more burdensome:</p>
<blockquote><p>&#8220;Starting with release 1.5, Subversion, ironically, supports a much more flexible model of merging than the DAG-based DVCSs. Changes from any commit can be merged to any branch at the single-file level of granularity, enabling all of the operations listed above and some even weirder things (for example, a change that was originally applied to one file can be &#8220;merged&#8221; onto a completely different file). If your workflow demands this sort of thing, Subversion might hold significant advantages for you.</p>
<p>&#8220;But there are also many disadvantage to Subversion&#8217;s flexibility:</p>
<ul>
<li>Subversion&#8217;s merging model is more complicated than that of DAG-based VCSs, and therefore more complicated to implement and less predictable.</li>
<li>It is much harder to visualize the history of a Subversion project (contrast that to DVCSs, whose history can be displayed as a single DAG).</li>
<li>Subversion merges are innately slow, because of the large quantities of metadata that have to be manipulated.</li>
<li>The bookkeeping of SVN merge info requires more user conscientiousness, and mistakes are not as easy to spot and fix.&#8221;</li>
</ul>
</blockquote>
<p>While he doesn&#8217;t take a stand on which is better, a CVCS like Subversion or a DVCS like Mercurial or Git, I will. Mercurial (and other DAG-based DVCS&#8217;s) provides a level of intrinsic guidance to developers through the limitations it has. Like many other great products, it is defined in part by what is not included. One might easily say that it is defined in large part by that. Products like the original iPod and iPhone both have this same feature. By focusing on the most important features, and specifically limiting users&#8217; choice in other areas (changing batteries, how to buy and download apps, etc.), Apple created products that are wildly successful. True, they may not be as flexible as an Android, Blackberry, or Windows Phone. But they got the right things right.</p>
<p>And I think Mercurial is a step in that direction. I don&#8217;t think it&#8217;s there yet, but I don&#8217;t think anything else is any closer. Some other DVCS&#8217;s (Git, at least) are also heading in the right direction, though they may be coming from a different starting point. Mercurial creates a philosophical and cultural starting point because of the technical choices that define both its strengths and its weaknesses. That philosophical starting point is a fundamentally better starting point for software development. It leads to greater componentization, greater granularity of history, more productive use of development time, and more automation.</p>
]]></content:encoded>
			<wfw:commentRss>http://rock.hymasfamily.org/blog/2010/03/06/mercurial-will-make-you-a-better-developer/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
