What SVNKit resources should be disposed?

As in a previous post about the reusability of SVNKit objects, I’d like to write about the most common mistakes related to closing resources.

SVNRepository instances should be closed

The SVNRepository class of SVNKit represents a connection to a remote or local repository. Its SVNRepository#closeSession method closes that connection. Unfortunately, people often forget to call it.

If an SVNRepository instance was obtained not from SVNRepositoryFactory but from an ISVNRepositoryPool, it should be closed explicitly if and only if it was created by passing mayReuse=false to ISVNRepositoryPool#createRepository. Otherwise the repository is controlled by the pool and is disposed together with the pool (by calling ISVNRepositoryPool#dispose).

For example, these ways of closing an SVNRepository are correct:

final SVNRepository svnRepository = SVNRepositoryFactory.create(url);
svnRepository.closeSession();

A non-reusable repository connection should be closed explicitly:

final ISVNRepositoryPool repositoryPool = new DefaultSVNRepositoryPool(null, null);
try {
    final SVNRepository svnRepository = repositoryPool.createRepository(url, false);
    svnRepository.closeSession();
} finally {
    repositoryPool.dispose();
}

A reusable repository connection will be closed together with the connection pool:

final ISVNRepositoryPool repositoryPool = new DefaultSVNRepositoryPool(null, null);
try {
    final SVNRepository svnRepository = repositoryPool.createRepository(url, true);
} finally {
    repositoryPool.dispose();
}

And of course every ISVNRepositoryPool instance should always be disposed in a finally block.

SVNClientManager instances should also be disposed with the dispose() method

I would say that this is the most common mistake. As I mentioned in a previous post, SVNClientManager aggregates an ISVNRepositoryPool instance and also implements this interface. So it must be disposed as well; otherwise the connections it creates won’t be closed.

All other classes that have dispose() or close() methods should be disposed

This goes without saying; I’m just reminding you of that evident fact.

Are SVNKit methods reenterable?

People often ask me which SVNKit objects can and which can’t be reused from different threads, or while another operation is running on those objects.

SVNRepository methods are not reenterable

This means that the same SVNRepository instance can’t be used from several threads at the same time. It also means that the same SVNRepository object can’t be used within a single thread from a callback passed to another of its methods while that method is still running.

For example, this (rather useless) code, which checks that the paths returned by the SVNRepository#log method really exist:

final SVNRepository svnRepository = SVNRepositoryFactory.create(url);
try {
    svnRepository.log(new String[]{""}, 1, 2, true, true, new ISVNLogEntryHandler() {
        @Override
        public void handleLogEntry(SVNLogEntry logEntry) throws SVNException {
            final long revision = logEntry.getRevision();
            final Map<String,SVNLogEntryPath> changedPaths = logEntry.getChangedPaths();
            for (Map.Entry<String, SVNLogEntryPath> entry : changedPaths.entrySet()) {
                final String path = entry.getKey();

                //WRONG!!! svnRepository object can't be reused!
                final SVNNodeKind kind = svnRepository.checkPath(path, revision);
                System.out.println(kind);
            }
        }
    });
} finally {
    svnRepository.closeSession();
}

fails with

java.lang.Error: SVNRepository methods are not reenterable
	at org.tmatesoft.svn.core.io.SVNRepository.lock(SVNRepository.java:2820)
	at org.tmatesoft.svn.core.io.SVNRepository.lock(SVNRepository.java:2811)
	at org.tmatesoft.svn.core.internal.io.fs.FSRepository.openRepositoryRoot(FSRepository.java:767)
	at org.tmatesoft.svn.core.internal.io.fs.FSRepository.openRepository(FSRepository.java:758)
	at org.tmatesoft.svn.core.internal.io.fs.FSRepository.checkPath(FSRepository.java:205)
	at org.tmatesoft.svn.test.InfoTest$1.handleLogEntry(InfoTest.java:150)
	at org.tmatesoft.svn.core.internal.io.fs.FSLog.sendLog(FSLog.java:332)
	at org.tmatesoft.svn.core.internal.io.fs.FSLog.runLog(FSLog.java:162)
	at org.tmatesoft.svn.core.internal.io.fs.FSRepository.logImpl(FSRepository.java:381)
	at org.tmatesoft.svn.core.io.SVNRepository.log(SVNRepository.java:1035)
	at org.tmatesoft.svn.core.io.SVNRepository.log(SVNRepository.java:940)
	at org.tmatesoft.svn.core.io.SVNRepository.log(SVNRepository.java:864)
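
To make the restriction more tangible, here is a toy model of such a guard. It is only a sketch under my own assumptions: the GuardedConnection class and its methods are invented for illustration and are not SVNKit code, it throws IllegalStateException where SVNKit throws java.lang.Error, and it models a single thread only. Every operation takes a lock on entry, so a callback that calls back into the same object is rejected, while a second object works fine.

```java
import java.util.function.Consumer;

/** Toy model of a non-reentrant connection (hypothetical, not SVNKit's code). */
class GuardedConnection {
    private boolean busy; // true while an operation is in progress

    private void lock() {
        if (busy) {
            throw new IllegalStateException("methods are not reenterable");
        }
        busy = true;
    }

    private void unlock() {
        busy = false;
    }

    /** Long-running operation that reports each entry to a callback. */
    void log(int entries, Consumer<Integer> handler) {
        lock();
        try {
            for (int i = 1; i <= entries; i++) {
                handler.accept(i);
            }
        } finally {
            unlock();
        }
    }

    /** Short operation that needs the connection exclusively. */
    String checkPath(String path) {
        lock();
        try {
            return "FILE"; // placeholder result
        } finally {
            unlock();
        }
    }

    public static void main(String[] args) {
        GuardedConnection primary = new GuardedConnection();
        GuardedConnection helper = new GuardedConnection();

        // Fine: the callback uses a second connection.
        primary.log(1, revision -> System.out.println(helper.checkPath("trunk")));

        // Rejected: the callback re-enters the connection that is still running log().
        try {
            primary.log(1, revision -> primary.checkPath("trunk"));
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In the SVNKit example above, the analogous way out would be to call checkPath on a second SVNRepository instance inside the handler.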

The same is true of reusing an SVNRepository object while committing to the repository.

SVNRepository#getCommitEditor starts a transaction. This transaction can be terminated in three ways:

  • By ISVNEditor#closeEdit call on the editor. In this case the transaction is committed (or rejected).
  • By ISVNEditor#abortEdit that terminates the transaction.
  • By any exception thrown by ISVNEditor methods.

In all other cases the transaction remains unfinished. While the transaction is not finished, the corresponding SVNRepository object can’t be reused. An example:

final SVNRepository svnRepository = SVNRepositoryFactory.create(url);
try {
    final ISVNEditor commitEditor = svnRepository.getCommitEditor("Commit message", null);
    commitEditor.openRoot(-1);

    //WRONG!!! svnRepository can't be reused until commitEditor.closeEdit(); is called
    svnRepository.checkPath("", -1);

    commitEditor.closeDir();
    commitEditor.closeEdit();
} finally {
    svnRepository.closeSession();
}

This code also fails with a similar stack trace. One of the most common mistakes is failing to abort the commit transaction when custom code throws an exception:

final SVNRepository svnRepository = SVNRepositoryFactory.create(url);
try {
    try {
        final ISVNEditor commitEditor = svnRepository.getCommitEditor("Commit message", null);
        commitEditor.openRoot(-1);

        //some code that can throw an exception
        if (2 + 2 == 4) {
            throw new SomeException();
        }

        commitEditor.closeDir();
        commitEditor.closeEdit();
    } catch (SomeException e) {
        e.printStackTrace();

        //the commit transaction should be closed here by commitEditor.abortEdit() call
    }
    //this call will fail because of unclosed transaction
    svnRepository.checkPath("", -1);
} finally {
    svnRepository.closeSession();
}

This code is incorrect because the catch block should call commitEditor.abortEdit(), which would abort the commit transaction (for that, commitEditor has to be declared outside the inner try block).

DefaultSVNRepositoryPool connections can’t be reused simultaneously

SVNKit uses the ISVNRepositoryPool interface to keep connections and reuse them between Subversion requests. This approach significantly improves SVNKit performance, but the connection pool should be used carefully.

DefaultSVNRepositoryPool is the implementation of ISVNRepositoryPool provided by SVNKit. It keeps a “repository root” -> SVNRepository instance map and, on each ISVNRepositoryPool#createRepository invocation, either returns an existing connection or creates a new one.

Note that ISVNRepositoryPool does not know whether any of the connections it keeps has an operation in progress: it returns a saved connection whenever the requested URL matches that connection’s repository root. And from the previous section you know that an SVNRepository instance can’t be reused while it is busy.

For example:

final ISVNRepositoryPool repositoryPool = new DefaultSVNRepositoryPool(null, null);
try {
    final SVNRepository svnRepository1 = repositoryPool.createRepository(url, true);
    final SVNRepository svnRepository2 = repositoryPool.createRepository(url, true);
    final ISVNEditor commitEditor = svnRepository1.getCommitEditor("Commit message", null);
    commitEditor.openRoot(-1);

    //WRONG!!! svnRepository2 is the same object as svnRepository1!
    svnRepository2.checkPath("", -1);

    commitEditor.closeDir();
    commitEditor.closeEdit();
} finally {
    repositoryPool.dispose();
}

This code fails because repositoryPool.createRepository(url, true); returns the same instance for the 2nd and all subsequent calls. Instead one should create the second connection with mayReuse=false and of course close it by hand afterwards because it won’t be closed on ISVNRepositoryPool#dispose:

final ISVNRepositoryPool repositoryPool = new DefaultSVNRepositoryPool(null, null);
SVNRepository svnRepository2 = null;
try {
    final SVNRepository svnRepository1 = repositoryPool.createRepository(url, true);
    svnRepository2 = repositoryPool.createRepository(url, false);
    final ISVNEditor commitEditor = svnRepository1.getCommitEditor("Commit message", null);
    commitEditor.openRoot(-1);

    //Correct, svnRepository2 is another connection
    svnRepository2.checkPath("", -1);

    commitEditor.closeDir();
    commitEditor.closeEdit();
} finally {
    repositoryPool.dispose();
    if (svnRepository2 != null) {
        //it should be closed by hand because it was created with mayReuse=false
        svnRepository2.closeSession();
    }
}

This code is correct, though not symmetric. You can often find this pattern inside SVNKit itself, in operations that use 2 connections at the same time.
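
The ownership rule behind this asymmetry can be summarised with a small model. This is a sketch under my own assumptions, not DefaultSVNRepositoryPool itself: ToyPool, Connection and the svn://host/repo URL are hypothetical, and the map is keyed by a plain string instead of a real repository root.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a connection pool (hypothetical, not SVNKit's implementation). */
class ToyPool {
    /** Stand-in for a repository connection. */
    static final class Connection {
        final String root;
        boolean closed;

        Connection(String root) {
            this.root = root;
        }

        void close() {
            closed = true;
        }
    }

    // Connections owned by the pool, one per repository root.
    private final Map<String, Connection> shared = new HashMap<>();

    Connection create(String root, boolean mayReuse) {
        if (!mayReuse) {
            // The pool does not remember this connection: the caller must close it.
            return new Connection(root);
        }
        // Same root and mayReuse=true: the very same object is handed out again.
        return shared.computeIfAbsent(root, Connection::new);
    }

    /** Closes only the connections that the pool owns. */
    void dispose() {
        for (Connection connection : shared.values()) {
            connection.close();
        }
        shared.clear();
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool();
        Connection first = pool.create("svn://host/repo", true);
        Connection second = pool.create("svn://host/repo", true);
        Connection separate = pool.create("svn://host/repo", false);

        System.out.println(first == second);   // true: one shared connection
        System.out.println(first == separate); // false: an independent connection

        pool.dispose();
        System.out.println(first.closed);      // true: closed by the pool
        System.out.println(separate.closed);   // false: still open
        separate.close();                      // so it is closed by hand
    }
}
```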

SVNClientManager and SVNXXXClient can’t be reused

There are several reasons for this. First, SVNClientManager implements and aggregates ISVNRepositoryPool, whose connections, as you now know, can’t be reused simultaneously. Second, because of the way SVNKit works, a client can’t be reused simultaneously for operations on working copies in the 1.7 format (otherwise you may get the error “svn: E200030: There are unfinished transactions detected in …”).

The reason is that SVNBasicClient encapsulates SvnOperationFactory, that encapsulates SVNWCContext, that encapsulates SVNWCDb, that contains

private Map<String, SVNWCDbDir> dirData;

This is a cache path->working_copy_root_data where “working_copy_root_data” is a structure that contains a working copy root path and a database object (SVNSqlJetDb). This database object contains “openCount”, a counter of transactions in progress that is increased when a transaction starts and decreased when it ends (in a thread-unsafe manner). If an operation has finished but openCount > 0 (for example, because the database is being used from another thread), you see the

svn: E200030: There are unfinished transactions detected in ...

exception. So SVNSqlJetDb objects can’t be shared among threads. The same is true of callbacks.

Instead of reusing an SVNClientManager or SVNXXXClient instance, one should create a separate instance per thread. In the callback case at least 2 instances are needed: one for the main operation and another for operations inside the callback. But note: these operations cannot modify the same working copy because

Several working copy modification operations cannot run simultaneously

This is more Subversion’s restriction than SVNKit’s. Before the WC 1.7 format, every Subversion working copy directory could be processed independently, which allowed modification operations to run in parallel as long as they worked on different directories.

Now every working copy modification operation locks the whole working copy until completion, and no other write operation can run at the same time.

But read-only operations do not lock anything and can be run at any time. The Subversion 1.7 working copy is based on transactions that move the working copy from one valid state to another valid state. So a read-only operation that runs during a write operation will find the working copy in some intermediate but valid state.

EOLs in Git and SVN

This post explains how Subversion handles line endings, how Git copes with the same problem, and how not to lose those settings during Git to SVN or SVN to Git translation.

When team members use different OSes with different default EOLs, it’s important to let them work on the same files without causing an EOL mess or other problems. Everybody knows that Windows Notepad doesn’t like LFs. But not everybody knows about CRLF problems in shell scripts:

$ echo '#!/bin/sh' > test.sh
$ echo >> test.sh
$ echo 'echo Hello world!' >> test.sh

$ bash test.sh
Hello world!
$ dash test.sh
Hello world!

$ unix2dos test.sh
unix2dos: converting file test.sh to DOS format ...

$ bash test.sh
test.sh: line 2: $'\r': command not found
Hello world!

$ dash test.sh
: not found test.sh:
Hello world!

Rather strange behaviour for the XXI century. To avoid these problems, let’s take care of line endings.

EOLs in Subversion

Subversion controls line endings using the svn:eol-style property. Its valid values are:

  • native — when checking out the file EOLs will be converted to the current system default EOL (CRLF on Windows, LF on Linux)
  • LF — when checking out the file EOLs will be converted to LFs
  • CRLF
  • CR
  • the property is not set — treat the file as binary, no EOL management
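
The list above can be read as a small conversion function. The following is only a sketch of that reading, not Subversion’s actual translation code: the EolStyle class and its method names are my own, and a null eolStyle stands for “the property is not set”.

```java
import java.util.regex.Matcher;

/** Sketch of the checkout-time conversion described by svn:eol-style (hypothetical helper). */
class EolStyle {
    /**
     * @param text     file content with arbitrary line endings
     * @param eolStyle "native", "LF", "CRLF", "CR", or null when the property is not set
     */
    static String toWorkingCopy(String text, String eolStyle) {
        if (eolStyle == null) {
            return text; // no property: the content is left untouched
        }
        String eol;
        switch (eolStyle) {
            case "native": eol = System.lineSeparator(); break;
            case "LF":     eol = "\n"; break;
            case "CRLF":   eol = "\r\n"; break;
            case "CR":     eol = "\r"; break;
            default:
                throw new IllegalArgumentException("Unknown svn:eol-style: " + eolStyle);
        }
        // CRLF is listed first so that it is not treated as a CR followed by an LF.
        return text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(eol));
    }

    /** Makes line endings visible when printing. */
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\nc\r";
        System.out.println(visible(toWorkingCopy(mixed, "LF")));   // prints a\nb\nc\n
        System.out.println(visible(toWorkingCopy(mixed, "CRLF"))); // prints a\r\nb\r\nc\r\n
        System.out.println(visible(toWorkingCopy(mixed, null)));   // prints a\r\nb\nc\r
    }
}
```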

Note that a file in the SVN repository can have arbitrary EOLs (and pristine files (see my post about pristine files) contain the same EOLs as the repository does); the conversion is performed when the working copy file is created. Usually Subversion clients take care that the repository contents correspond to svn:eol-style. I.e. a file with svn:eol-style=LF will have LF endings in the repository, and a file with svn:eol-style=CRLF will have CRLFs. In the special case svn:eol-style=native the file will be stored with LFs. But the Subversion remote API doesn’t check for EOL inconsistencies. If the repository is rather old or a buggy client was used to work with it, there’s a chance of meeting other combinations.

There’s one more Subversion property that relates to line endings: svn:mime-type. When you set svn:eol-style on a file, you tell Subversion that this file is a text file. If you also set svn:mime-type on the same file, its value should start with “text/”; otherwise Subversion will report an inconsistency between svn:eol-style and svn:mime-type and will refuse to work. One of the most common mistakes is to set svn:eol-style to some value and svn:mime-type to application/xml; use text/xml instead, because application/xml means “not human-readable XML”.

It is usually recommended to configure Subversion autoproperties so that svn:eol-style is set to native, and to set it to LF only for shell scripts.

EOLs in Git

Line endings in Git for individual files are controlled by the Git attributes “text” and “eol”. The first one tells Git that the file is not binary. Possible “text” attribute values are:

  • auto — check for binary (Git thinks that the file is binary if the first 8kb contains at least one zero byte)
  • the attribute is set — the file is treated as text
  • the attribute is unset — the file is treated as binary
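
The “auto” check can be sketched in a few lines. This is only an approximation of the rule as stated above, not Git’s code: BinaryCheck is a hypothetical class, the 8000-byte probe size is my assumption for “the first 8kb”, and real Git may apply additional heuristics.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of a "zero byte in the first few kilobytes" binary check (hypothetical helper). */
class BinaryCheck {
    // Assumption: only this many leading bytes are inspected.
    static final int PROBE_SIZE = 8000;

    static boolean looksBinary(byte[] content) {
        int limit = Math.min(content.length, PROBE_SIZE);
        for (int i = 0; i < limit; i++) {
            if (content[i] == 0) {
                return true; // a zero byte marks the file as binary
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] text = "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8);
        byte[] zeros = new byte[512]; // like a file filled from /dev/zero

        System.out.println(looksBinary(text));  // false
        System.out.println(looksBinary(zeros)); // true
    }
}
```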

The “eol” attribute is an analog of svn:eol-style. Its values are:

  • lf — working tree file line endings will be converted to LFs, the blob is assumed to contain LFs
  • crlf — the same but to CRLFs, the blob is assumed to contain LFs
  • if the attribute value is undefined (!eol) but the “text” attribute is set, the file line endings are specified by the core.eol config option (whose possible values are lf, crlf, native (default))
  • if the attribute is unset and the “text” attribute is unset too, the file is treated as binary (note that if the “eol” attribute is set, the “text” value is ignored and assumed to be set)

So for most files “/file text !eol” will be the best option (its meaning corresponds to svn:eol-style=native). For individual LF and CRLF files the settings will be “/lf_file eol=lf” and “/crlf_file eol=crlf”.

One may even use a *-rule to create an analog of Subversion autoproperties (this rule will be applied to every newly created file): “* text=auto !eol”. Here “text=auto” takes care of binary files.

But be careful: if the “eol” attribute is set, the blob should already contain LFs. Otherwise you’ll have a problem: to find out whether a file with the “eol” attribute has changed, Git converts it to LFs, calculates the SHA-1 of the result (as if it were a blob) and compares it to the corresponding hash in the database. If the blob in the object database contains CRLFs, these hashes won’t be equal:

$ git init repo
$ cd repo
$ echo "line 1" >> file
$ echo "line 2" >> file
$ unix2dos file
$ git add file
$ git commit -m "Added a file with CRLFs"

#now the database contains file with CRLFs, let's change its eol attribute
$ echo "/file eol=crlf" > .gitattributes

$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       .gitattributes

#let's say to Git it should reread the file contents
$ touch file

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   file
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       .gitattributes

#ok, we have some unexpected changes, let's try to discard them:
$ git reset --hard HEAD
HEAD is now at 14fd0ce Added a file with CRLFs

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   file
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       .gitattributes
no changes added to commit (use "git add" and/or "git commit -a")

#they can't be discarded!!

It looks like Git is a bit stupid here; it could be more tolerant of EOL settings changes. To cope with this, one can unset the attributes for the file so that it is treated as binary again, change its EOLs to LF (dos2unix), commit it to put it into the database, and then set “text” and/or “eol” again. But this is the only problem that appears with the attributes-based approach. I hope it will be fixed soon.
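
The hash mismatch itself is easy to reproduce outside Git. A blob id is the SHA-1 of the header “blob <size>” followed by a zero byte and the content, so the CRLF and LF variants of the same two lines get different ids. The BlobId class below is a hypothetical helper written only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Computes a Git blob id for raw content (illustrative helper, not part of Git). */
class BlobId {
    static String of(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Header: object type, content length in bytes, then a zero byte.
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = "line 1\r\nline 2\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] normalized = "line 1\nline 2\n".getBytes(StandardCharsets.US_ASCII);

        // The id of the blob as committed (CRLFs) ...
        System.out.println(BlobId.of(stored));
        // ... differs from the id Git computes after converting the file to LFs.
        System.out.println(BlobId.of(normalized));
    }
}
```

Since the working tree file is always converted to LFs before hashing, it can never match a stored blob that contains CRLFs, which is why the file stays “modified” even after a hard reset.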

Why not use git-svn if you care about EOLs?

Git-svn is a Perl script that tries to convert Git contents to Subversion and vice versa. But it doesn’t perform EOL conversion at all: it completely ignores the svn:eol-style property and Git attributes, converting Subversion repository contents to Git blobs without touching the corresponding properties and attributes.

As we know, Subversion keeps all text files with LFs if svn:eol-style=native. So while cloning a Subversion repository, git-svn converts those text files to blobs with LFs, and on Windows Git won’t convert their EOLs to native when checking out (which is inconsistent with the semantics of the Subversion property).

Should one set core.autocrlf?

People usually recommend the core.autocrlf=true config setting (whose effect is the same as setting the “* eol=crlf” attribute), but it is a weapon of mass destruction: it converts to CRLFs not only files with svn:eol-style=native but also files with svn:eol-style=LF. In addition, with core.autocrlf=true Git converts EOLs to LFs when adding any file to the Git database, and the file is sent to Subversion in this form. Even for files with svn:eol-style=CRLF, this results in inconsistencies between the Subversion file contents and the svn:eol-style property.

My answer is no. To convert svn:eol-style to Git attributes correctly one should set attributes carefully for each file. Git attributes have higher priority than core.autocrlf, so if the corresponding attributes are set core.autocrlf value is ignored.
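
To show what “carefully for each file” could mean, here is a sketch of the mapping from svn:eol-style to a .gitattributes line, following the correspondences described in this post. EolAttributes is a hypothetical helper and not how any particular tool does it; it ignores svn:mime-type, does not escape special characters in paths, and rejects values that this post gives no Git counterpart for (such as CR).

```java
/** Maps an svn:eol-style value to a .gitattributes line (hypothetical sketch). */
class EolAttributes {
    /**
     * @param path     path relative to the repository root, e.g. "native_file"
     * @param eolStyle "native", "LF", "CRLF", or null when the property is not set
     */
    static String forFile(String path, String eolStyle) {
        String attributes;
        if (eolStyle == null) {
            attributes = "-text"; // no svn:eol-style: treat the file as binary
        } else {
            switch (eolStyle) {
                case "native": attributes = "text !eol"; break;
                case "LF":     attributes = "eol=lf"; break;
                case "CRLF":   attributes = "eol=crlf"; break;
                default:
                    throw new IllegalArgumentException("No mapping for svn:eol-style=" + eolStyle);
            }
        }
        return "/" + path + " " + attributes;
    }

    public static void main(String[] args) {
        System.out.println(forFile("native_file", "native")); // prints /native_file text !eol
        System.out.println(forFile("lf_file", "LF"));         // prints /lf_file eol=lf
        System.out.println(forFile("crlf_file", "CRLF"));     // prints /crlf_file eol=crlf
        System.out.println(forFile("binary_file", null));     // prints /binary_file -text
    }
}
```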

How can I do that automatically?

Fortunately there’s a Git-SVN bridge called SubGit. In short: you install it into the repository and it converts Subversion revisions to Git commits and vice versa. Among other features it performs svn:eol-style+svn:mime-type conversion to “text” and “eol” attributes:

$ svnadmin create repo
$ subgit install repo
$ git clone repo repo-git
$ cd repo-git
$ echo "line 1" >> native_file
$ echo "line 2" >> native_file
$ cp native_file lf_file
$ cp native_file crlf_file
$ cp native_file binary_file
$ unix2dos crlf_file
unix2dos: converting file crlf_file to DOS format ...
$ cp native_file auto_native_file
$ dd if=/dev/zero of=auto_binary_file count=1
$ nano .gitattributes #edit .gitattributes this way:

$ cat .gitattributes
/native_file text !eol
/crlf_file eol=crlf
/lf_file eol=lf
/binary_file -text
/auto_native_file text=auto !eol
/auto_binary_file text=auto !eol

$ git add .gitattributes
$ git add *_file
$ git commit -m "Added different files"
$ git push origin master
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 353 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (4/4), done.
To /tmp/repo
 * [new branch]      master -> master

$ cd ../repo
$ svn proplist -v --depth infinity file://`pwd`
Properties on 'file:///tmp/repo/trunk/lf_file':
  svn:eol-style
    LF
Properties on 'file:///tmp/repo/trunk/crlf_file':
  svn:eol-style
    CRLF
Properties on 'file:///tmp/repo/trunk/native_file':
  svn:eol-style
    native
Properties on 'file:///tmp/repo/trunk/auto_binary_file':
  svn:mime-type
    application/octet-stream
Properties on 'file:///tmp/repo/trunk/auto_native_file':
  svn:eol-style
    native