Which repository is more compact: Git or SVN?

Git is reputed to have more compact repositories than SVN. Is it really true? Let’s check for ourselves.

To run an honest comparison we will use the SubGit utility, which, unlike git-svn, preserves more SVN and Git concepts in both the SVN-to-Git and Git-to-SVN directions (in the Git-to-SVN direction git-svn simply loses commits on anonymous branches).

Original Git vs converted SVN.

As an example, let’s convert the Git.git repository to SVN format.
First, create an empty SVN repository:

$ svnadmin create git.svn

Second, create a full clone of the Git.git repository at git.svn/.git:

$ git clone --mirror git://github.com/gitster/git.git git.svn/.git
Cloning into bare repository 'git.svn/.git'...
remote: Counting objects: 197301, done.
remote: Compressing objects: 100% (68281/68281), done.
remote: Total 197301 (delta 135873), reused 184664 (delta 125681)
Receiving objects: 100% (197301/197301), 38.57 MiB | 927 KiB/s, done.
Resolving deltas: 100% (135873/135873), done.

Third, check the Git repository size before the translation starts:

$ du git.svn/.git -s -h
44M     git.svn/.git

Finally, run SubGit on the empty SVN repository:

$ subgit install git.svn

This takes quite a long time because of the repository size.

When the translation is complete, let’s compare the number of SVN revisions with the number of Git commits:

$ cd git.svn

$ svn info file://`pwd`
Path: git.svn
URL: file:///.../git.svn
Repository Root: file:///.../git.svn
Repository UUID: afa562f0-e173-47e2-83e6-2452fde0775f
Revision: 32338
Node Kind: directory
Last Changed Author: Junio C Hamano
Last Changed Rev: 32338
Last Changed Date: 2012-10-22 00:58:48 +0400 (Mon, 22 Oct 2012)

$ git rev-list --branches --tags | wc -l

As one can see, the translation is mostly lossless. The difference between the number of SVN revisions and the number of Git commits is explained by the fact that in Git, creating or deleting a branch or tag does not create a new commit, while in SVN it does create a new revision. To make sure that Git commits are mapped to SVN revisions, one can run:

$ mkdir .git/refs/notes
$ cp .git/refs/svn/map .git/refs/notes/commits
$ git log

and see an SVN revision for every Git commit.

Now let’s check how large (or small?) the resulting SVN repository is. Currently it contains not only the SVN data itself but also some SubGit files (including logs).

First, remove SubGit specific data from the repository:

$ subgit uninstall --purge .

Second, move the Git repository outside of the SVN repository:

$ mv .git ../git.git

Now we have the most honest SVN analog of the Git.git repository. Check its size:

$ du -s -h
1,8G    .

So the SVN repository is roughly 40 times larger than the Git one (1.8G vs 44M). Let’s illustrate this with a picture:

SVN repository size vs Git repository size

Original SVN vs converted Git.

One may say: “This comparison is not honest: the Git repository was natural, but the SVN repository was artificial.” OK, let’s convert some SVN repository to Git.
Unfortunately, I can’t convert the Subversion repository at apache.org because the Apache folks tend to ban people who generate too many requests. But I’ll try the SVNKit repository instead; SVNKit is another Subversion implementation. SVNKit already has an official Git repository, but I need the SVNKit repository locally anyway to estimate its size in SVN format.

Of course, I could run svnrdump on it to get a dump, but fortunately I already have a (not so fresh) dump of the SVNKit repository locally.

First, create an empty SVN repository:

$ svnadmin create svnkit.svn

Second, load the dump into it:

$ svnadmin load svnkit.svn < svnkit.dump 2> /dev/null > /dev/null

Third, note the number of revisions, the repository size and the dump size (the repository is not as large as Git.git, though):

$ cd svnkit.svn
$ svn info file://`pwd`
Path: svnkit.svn
URL: file:///.../svnkit.svn
Repository Root: file:///.../svnkit.svn
Repository UUID: 0a862816-5deb-0310-9199-c792c6ae6c6e
Revision: 7920
Node Kind: directory
Last Changed Author: semen
Last Changed Rev: 7920
Last Changed Date: 2011-09-15 19:25:27 +0400 (Thu, 15 Sep 2011)

$ du -s -h .
202M    .

$ du -h svnkit.dump
859M    svnkit.dump

Now let’s install SubGit into the repository:

$ subgit install .

The size of the translated Git repository (even with SubGit-related metadata) is:

$ du -s -h .git
74M     .git

We can also run “git gc”; this is fair, because Git runs it automatically sooner or later anyway:

$ git gc --prune
Counting objects: 143789, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (40515/40515), done.
Writing objects: 100% (143789/143789), done.
Total 143789 (delta 76663), reused 141252 (delta 75412)
Checking connectivity: 143789, done.

$ du -s -h .git
62M     .git

SVN repository size vs Git repository size

The difference is not as dramatic as for Git.git (202M vs 62M, roughly 3 times), but it is still significant. Why does this happen? AFAIK, it is because SVN keeps deltas between file contents in sequential revisions, while Git keeps deltas between the most similar contents. So it is natural to expect that the larger the repositories are, the more compact the Git repository is compared to the Subversion one, which is confirmed by our tests.
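To make this intuition concrete, here is a toy model in Java. It is only a sketch under our own simplifying assumptions, not how SVN or Git actually encode deltas: a file version is a list of lines, and the hypothetical deltaCost counts the lines of the new version that do not occur in the base. Storing each version as a delta against the previous one mimics the sequential scheme described above; picking the cheapest earlier version as the base mimics the “most similar contents” scheme.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical): sequential deltas vs deltas against the most similar version. */
public class DeltaToy {

    /** Hypothetical delta cost: number of target lines that do not occur anywhere in base. */
    static int deltaCost(List<String> base, List<String> target) {
        Set<String> baseLines = new HashSet<>(base);
        int cost = 0;
        for (String line : target) {
            if (!baseLines.contains(line)) {
                cost++;
            }
        }
        return cost;
    }

    /** Sequential scheme: every version is stored as a delta against the previous one. */
    static int sequentialSize(List<List<String>> versions) {
        int total = versions.get(0).size(); // the first version is stored in full
        for (int i = 1; i < versions.size(); i++) {
            total += deltaCost(versions.get(i - 1), versions.get(i));
        }
        return total;
    }

    /** Best-match scheme: every version is stored as a delta against the cheapest earlier one. */
    static int bestMatchSize(List<List<String>> versions) {
        int total = versions.get(0).size(); // the first version is stored in full
        for (int i = 1; i < versions.size(); i++) {
            int best = Integer.MAX_VALUE;
            for (int j = 0; j < i; j++) {
                best = Math.min(best, deltaCost(versions.get(j), versions.get(i)));
            }
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> a = List.of("x", "y", "z");
        List<String> b = List.of("x", "y", "w");
        // The file flips between two variants, e.g. a change that is reverted and re-applied.
        List<List<String>> history = List.of(a, b, a, b);
        System.out.println("sequential: " + sequentialSize(history)); // prints "sequential: 6"
        System.out.println("best match: " + bestMatchSize(history));  // prints "best match: 4"
    }
}
```

Because the previous version is always one of the candidates, the best-match total can never exceed the sequential one in this model; the gap grows as contents recur in the history.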

They say that Subversion 1.8 will have more compact repositories. Let’s wait and test!

SVNKit SvnRemoteXXX operations: one more common mistake

I’d like to describe one more mistake that one can run into while using SVNKit. Suppose you want to copy a file from a working copy to the repository directly. You write code like:

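final File file = ...;         // a file in the working copy
final SVNURL targetUrl = ...;  // the copy destination in the repository

final SvnOperationFactory svnOperationFactory = new SvnOperationFactory();
final SvnCopy copy = svnOperationFactory.createCopy();
copy.addCopySource(SvnCopySource.create(SvnTarget.fromFile(file), SVNRevision.BASE));
copy.setSingleTarget(SvnTarget.fromURL(targetUrl));
copy.run();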

You run this code and get:

org.tmatesoft.svn.core.SVNException: svn: E200007: Runner for 'org.tmatesoft.svn.core.wc2.SvnCopy' command have not been found; probably not yet implement in this API.
    at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:64)
    at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:51)
    at org.tmatesoft.svn.core.wc2.SvnOperationFactory.getImplementation(SvnOperationFactory.java:1340)
    at org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.java:1227)
    at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:291)

I agree that the stack trace is a bit cryptic. What it actually tells us is that the SvnCopy class can only copy from a remote repository to a working copy, or from a working copy to a working copy. To copy to a remote repository, one should use the SvnRemoteCopy class instead:

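final SvnRemoteCopy remoteCopy = svnOperationFactory.createRemoteCopy();
remoteCopy.addCopySource(SvnCopySource.create(SvnTarget.fromFile(file), SVNRevision.BASE));
remoteCopy.setSingleTarget(SvnTarget.fromURL(targetUrl));
remoteCopy.run(); // commits the copy directly to the repository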

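The same is true for the other SvnRemoteXXX operations: when an operation has to commit directly to the repository, use the SvnRemoteXXX class rather than its working copy counterpart.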
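Judging by the method name in the stack trace (SvnOperationFactory.getImplementation) and the error message, the failure happens while the factory looks for a runner that can execute the operation. The following Java sketch is a simplified toy model of such a lookup, written under our own assumptions: RunnerLookupToy, Operation, Runner and TargetKind are hypothetical and are not SVNKit classes. Each operation name has its own runners, and a copy operation whose target is a URL finds no applicable runner.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model (not SVNKit code): a factory that looks up a runner for each operation. */
public class RunnerLookupToy {

    /** Where the operation's target lives. */
    enum TargetKind { WORKING_COPY, URL }

    /** Hypothetical stand-in for an operation: its class name and its target kind. */
    record Operation(String name, TargetKind target) {}

    /** Hypothetical runner: a description plus a check whether it can handle an operation. */
    record Runner(String description, Predicate<Operation> applicable) {}

    private final Map<String, List<Runner>> runners = new HashMap<>();

    /** Registers a runner for operations with the given name. */
    void register(String operationName, Runner runner) {
        runners.computeIfAbsent(operationName, k -> new ArrayList<>()).add(runner);
    }

    /** Returns the first applicable runner, or fails when none of the registered runners fits. */
    Runner getImplementation(Operation op) {
        for (Runner runner : runners.getOrDefault(op.name(), List.of())) {
            if (runner.applicable().test(op)) {
                return runner;
            }
        }
        throw new IllegalStateException(
                "No runner for '" + op.name() + "' with a " + op.target() + " target");
    }

    public static void main(String[] args) {
        RunnerLookupToy factory = new RunnerLookupToy();
        // Copy runners only accept working copy targets...
        factory.register("SvnCopy",
                new Runner("copy into a working copy", op -> op.target() == TargetKind.WORKING_COPY));
        // ...while remote copy runners accept repository URL targets.
        factory.register("SvnRemoteCopy",
                new Runner("copy into a repository", op -> op.target() == TargetKind.URL));

        System.out.println(factory.getImplementation(new Operation("SvnRemoteCopy", TargetKind.URL)).description());
        try {
            factory.getImplementation(new Operation("SvnCopy", TargetKind.URL));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the fix is the same as in the text: asking for SvnRemoteCopy consults a different set of runners, one of which accepts a URL target.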