Git is reputed to have more compact repositories than SVN. Is it really true? Let’s check for ourselves.
To make the comparison fair, we will use the SubGit utility, which, unlike git-svn, preserves more SVN and Git concepts in both the SVN-to-Git and Git-to-SVN directions (in the Git-to-SVN direction, git-svn simply loses commits on anonymous branches).
Original Git vs converted SVN.
For example, let’s convert the Git.git repository to SVN format.
First, create an empty SVN repository:
$ svnadmin create git.svn
Second, create a full clone of the Git.git repository at git.svn/.git:
$ git clone --mirror git://github.com/gitster/git.git git.svn/.git
Cloning into bare repository 'git.svn/.git'...
remote: Counting objects: 197301, done.
remote: Compressing objects: 100% (68281/68281), done.
remote: Total 197301 (delta 135873), reused 184664 (delta 125681)
Receiving objects: 100% (197301/197301), 38.57 MiB | 927 KiB/s, done.
Resolving deltas: 100% (135873/135873), done.
Third, check the Git repository size before the translation starts:
$ du git.svn/.git -s -h
44M git.svn/.git
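Since `du -h` rounds its output, a plain integer is easier to compare later on. A minimal sketch, assuming a POSIX shell; the helper name `dir_size_kib` is made up for this illustration and is not part of Git, SVN or SubGit:

```shell
#!/bin/sh
# dir_size_kib DIR: print the disk usage of DIR in KiB as a plain integer.
# (Hypothetical helper name, introduced only for this example.)
dir_size_kib() {
    # -s: one summary line, -k: report in units of 1024 bytes
    du -sk "$1" | awk '{ print $1 }'
}

# Example: size of the directory given as the first argument
# (defaults to the current directory)
dir_size_kib "${1:-.}"
```

Calling `dir_size_kib git.svn/.git` before and after the translation would give two numbers that can be subtracted or divided directly.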
Finally, run SubGit on the empty SVN repository:
$ subgit install git.svn
This will take a rather long time because of the repository size.
When the translation is complete, let’s compare the number of SVN revisions with the number of Git commits:
$ cd git.svn
$ svn info file://`pwd`
Path: git.svn
URL: file:///.../git.svn
Repository Root: file:///.../git.svn
Repository UUID: afa562f0-e173-47e2-83e6-2452fde0775f
Revision: 32338
Node Kind: directory
Last Changed Author: Junio C Hamano
Last Changed Rev: 32338
Last Changed Date: 2012-10-22 00:58:48 +0400 (Mon, 22 Oct 2012)
$ git rev-list --branches --tags | wc -l
31573
As one can see, the translation is mostly lossless. The difference in the number of commits arises because creating or removing a branch or tag does not produce a new commit in Git, whereas in SVN it does produce a new revision. To make sure that Git commits are mapped to SVN revisions, one can run
$ mkdir .git/refs/notes
$ cp .git/refs/svn/map .git/refs/notes/commits
$ git log
and see an SVN revision for every Git commit.
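The gap between the two counts can also be computed rather than eyeballed. The sketch below assumes the `svn info` output format shown above; `revision_from_info` is a hypothetical helper name, and the sample input is a shortened copy of that output:

```shell
#!/bin/sh
# revision_from_info: read `svn info` output on stdin and print the value
# of the "Revision:" field. (Hypothetical helper name.)
revision_from_info() {
    awk -F': ' '$1 == "Revision" { print $2 }'
}

# Numbers taken from the transcript in this post.
svn_revisions=$(printf 'Path: git.svn\nRevision: 32338\nLast Changed Rev: 32338\n' | revision_from_info)
git_commits=31573

# Extra SVN revisions, which this post attributes to branch/tag
# creation and removal.
echo $((svn_revisions - git_commits))   # prints 765
```

In a real run the `printf` would be replaced by the actual `svn info` call, and `git_commits` by the output of the `git rev-list` pipeline.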
Now let’s check how large (or small?) the resulting SVN repository is. Currently it contains not only basic SVN data but also some SubGit files (including logs).
First, remove the SubGit-specific data from the repository:
$ subgit uninstall --purge .
Second, move the Git repository outside of the SVN repository:
$ mv .git ../git.git
Now we have the fairest possible SVN analog of the Git.git repository. Check its size:
$ du -s -h
1,8G .
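To put a number on the difference, the two sizes can be divided. This is only a rough sketch: it assumes that `du -h` reports binary units (so 1,8G is about 1843.2M), and `size_ratio` is a hypothetical helper name:

```shell
#!/bin/sh
# size_ratio BIG SMALL: print BIG / SMALL with one decimal place.
# Both arguments must use the same unit. (Hypothetical helper name.)
size_ratio() {
    # LC_ALL=C keeps the decimal separator a dot regardless of the locale
    LC_ALL=C awk -v big="$1" -v small="$2" 'BEGIN { printf "%.1f\n", big / small }'
}

# 1,8G is about 1.8 * 1024 = 1843.2M; the Git repository was 44M
size_ratio 1843.2 44   # prints 41.9
```

Since both inputs are rounded by `du -h`, the result is better read as "about 40 times larger" than as an exact figure.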
Let’s show this fact with a picture:
Original SVN vs converted Git.
One may say “this comparison is not fair: the Git repository was natural but the SVN repository was artificial”. OK, let’s convert an SVN repository to Git.
Unfortunately I can’t convert the repository of Subversion at apache.org, because the Apache guys tend to ban people who generate too many requests. Instead I’ll try the repository of SVNKit, which is another Subversion implementation. SVNKit already has an official Git repository, but I need the SVNKit repository locally anyway to estimate its size in SVN format.
Of course I could run svnrdump on it to get a dump, but fortunately I already have a (not so fresh) dump of the SVNKit repository locally.
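For readers without a local dump, the svnrdump route could look roughly like the sketch below. The URL is a placeholder, not the real SVNKit repository address, and `dump_repo` with its `print` mode is a hypothetical wrapper added so the command can be inspected without touching the network:

```shell
#!/bin/sh
# dump_repo URL FILE [MODE]: write a dump of the remote repository at URL
# into FILE. With MODE set to "print", the command is only shown.
# (Hypothetical helper; `svnrdump dump` itself is the real subcommand.)
dump_repo() {
    url=$1
    file=$2
    mode=${3:-run}
    if [ "$mode" = "print" ]; then
        printf 'svnrdump dump %s > %s\n' "$url" "$file"
    else
        svnrdump dump "$url" > "$file"
    fi
}

# Placeholder URL, shown in print mode only
dump_repo https://svn.example.com/repos/project project.dump print
```

Dumping a large remote repository this way generates many requests, which is exactly the concern raised above for apache.org.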
First, create an empty SVN repository:
$ svnadmin create svnkit.svn
Second, load the dump into it:
$ svnadmin load svnkit.svn < svnkit.dump 2> /dev/null > /dev/null
Third, note the repository size and the dump size (it is not as large as Git.git, though):
$ cd svnkit.svn
$ svn info file://`pwd`
Path: svnkit.svn
URL: file:///.../svnkit.svn
Repository Root: file:///.../svnkit.svn
Repository UUID: 0a862816-5deb-0310-9199-c792c6ae6c6e
Revision: 7920
Node Kind: directory
Last Changed Author: semen
Last Changed Rev: 7920
Last Changed Date: 2011-09-15 19:25:27 +0400 (Thu, 15 Sep 2011)
$ du -s -h .
202M .
$ du -h svnkit.dump
859M svnkit.dump
Now let’s install SubGit into the repository:
$ subgit install .
The size of the translated Git repository (even with SubGit-related metadata) is:
$ du -s -h .git
74M .git
We can also run “git gc”, which is fair enough, because Git will run it sooner or later anyway:
$ git gc --prune
Counting objects: 143789, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (40515/40515), done.
Writing objects: 100% (143789/143789), done.
Total 143789 (delta 76663), reused 141252 (delta 75412)
Checking connectivity: 143789, done.
$ du -s -h .git
62M .git
The difference is not as large this time but is still significant. Why does this happen? AFAIK it is because SVN stores deltas between file contents in sequential revisions, while Git stores deltas between the most similar contents. So it is natural to expect that the larger the repositories are, the more compact the Git repository is compared to the Subversion one, which our tests confirm.
They say that Subversion 1.8 will have more compact repositories. Let’s wait and test!
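To compare the two experiments on the same scale, the savings can be expressed as a percentage of the SVN size. This is a sketch using the rounded `du -h` figures from this post, with 1,8G taken as roughly 1843.2M; `percent_saved` is a hypothetical helper name:

```shell
#!/bin/sh
# percent_saved SVN_SIZE GIT_SIZE: print how much smaller the Git repository
# is, as a percentage of the SVN size. Both sizes must use the same unit.
# (Hypothetical helper name.)
percent_saved() {
    LC_ALL=C awk -v svn="$1" -v git="$2" \
        'BEGIN { printf "%.1f\n", (svn - git) * 100 / svn }'
}

percent_saved 1843.2 44   # Git.git: prints 97.6
percent_saved 202 74      # SVNKit before "git gc": prints 63.4
percent_saved 202 62      # SVNKit after "git gc": prints 69.3
```

Two data points are of course not enough to establish a trend, but they are consistent with the expectation that the larger repository benefits more.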