Subversion remote API: committing without working copy

Can you do something similar with Git? I'm sure you can't. In my previous post I described the basics of the Subversion API. Now I'd like to give one more example of editor-based remote API usage: creating a commit on the fly.

Subversion has API bindings for the most popular programming languages. This time let’s use Java.

There are two ways to use Subversion from Java. The first one is the JavaHL API of Subversion, which has two implementations: native Subversion (compiled Subversion libraries + JNI) and SVNKit (a pure Java implementation). The advantages of the native implementation are performance and stability. The problem, however, is that if something goes wrong in the native implementation, the whole JVM crashes, while if something goes wrong in SVNKit, an exception is thrown. The second way is SVNKit's own API.

But since in this post I want to show the power of the remote API, the JavaHL interface doesn't suit: it provides only the client API (see the first picture of my previous post), and the client API requires a working copy to commit. In contrast, SVNKit provides all Subversion APIs for Java (just as native Subversion provides all APIs for C).

The central class of the SVNKit remote API is SVNRepository (it corresponds to svn_ra_session_t in the C interface). It represents a connection over a particular protocol to a particular URL. When you are done with an SVNRepository, the connection should be closed with SVNRepository#closeSession (unless you use ISVNRepositoryPool).

Let’s consider an example:

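import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.tmatesoft.svn.core.SVNCommitInfo;
import org.tmatesoft.svn.core.SVNException;
import org.tmatesoft.svn.core.SVNPropertyValue;
import org.tmatesoft.svn.core.SVNURL;
import org.tmatesoft.svn.core.internal.io.dav.DAVRepositoryFactory;
import org.tmatesoft.svn.core.internal.io.fs.FSRepositoryFactory;
import org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryFactoryImpl;
import org.tmatesoft.svn.core.io.ISVNEditor;
import org.tmatesoft.svn.core.io.SVNRepository;
import org.tmatesoft.svn.core.io.SVNRepositoryFactory;
import org.tmatesoft.svn.core.io.diff.SVNDeltaGenerator;

public class CommitWithoutWorkingCopy {

    public static void main(String[] args) {
        // register the protocol implementations: file://, http(s):// and svn://
        FSRepositoryFactory.setup();
        DAVRepositoryFactory.setup();
        SVNRepositoryFactoryImpl.setup();

        SVNRepository svnRepository = null;
        try {
            svnRepository = SVNRepositoryFactory.create(
                    SVNURL.parseURIEncoded("file:///tmp/test"));

            SVNDeltaGenerator deltaGenerator = new SVNDeltaGenerator();

            ISVNEditor commitEditor;
            String checksum;
            long latestRevision;
            SVNCommitInfo commitInfo;

            // r1: create /trunk (with a property and a file), /branches and /tags
            commitEditor = svnRepository.getCommitEditor(
                    "My first commit message", null);
            commitEditor.targetRevision(-1);
            commitEditor.openRoot(-1);
            commitEditor.addDir("trunk", null, -1);
            // property of the currently open directory ("trunk")
            commitEditor.changeDirProperty("directoryPropertyName",
                    SVNPropertyValue.create("directoryPropertyValue"));
            commitEditor.addFile("trunk/file", null, -1);
            commitEditor.changeFileProperty("trunk/file",
                    "filePropertyName",
                    SVNPropertyValue.create("filePropertyValue"));
            // no base checksum: the file is new, so there is no base text
            commitEditor.applyTextDelta("trunk/file", null);

            final ByteArrayInputStream fileContentsStream = new ByteArrayInputStream(
                    "File contents".getBytes(StandardCharsets.UTF_8));
            try {
                // send the contents as a delta against an empty base and
                // get back the MD5 checksum of the resulting text
                checksum = deltaGenerator.sendDelta("trunk/file",
                        fileContentsStream, commitEditor, true);
            } finally {
                try {
                    fileContentsStream.close();
                } catch (IOException e) {
                    //ignore
                }
            }
            commitEditor.closeFile("trunk/file", checksum);
            commitEditor.closeDir(); // trunk
            commitEditor.addDir("branches", null, -1);
            commitEditor.closeDir(); // branches
            commitEditor.addDir("tags", null, -1);
            commitEditor.closeDir(); // tags
            commitEditor.closeDir(); // root
            // the new revision is created (or the commit is rejected) only here
            commitInfo = commitEditor.closeEdit();

            latestRevision = commitInfo.getNewRevision();
            System.out.println("Committed revision " + latestRevision);

            // r2: copy /trunk@1 to /branches/branch and delete /tags
            commitEditor = svnRepository.getCommitEditor(
                    "My second commit message", null);
            commitEditor.targetRevision(-1);
            commitEditor.openRoot(1);
            commitEditor.openDir("branches", 1);
            // "/trunk" starts with "/", so it is relative to the repository root
            commitEditor.addDir("branches/branch", "/trunk", 1);
            commitEditor.closeDir(); // branches/branch
            commitEditor.closeDir(); // branches
            commitEditor.deleteEntry("tags", 1);
            commitEditor.closeDir(); // root
            commitInfo = commitEditor.closeEdit();

            latestRevision = commitInfo.getNewRevision();
            System.out.println("Committed revision " + latestRevision);

        } catch (SVNException e) {
            e.printStackTrace();
        } finally {
            if (svnRepository != null) {
                svnRepository.closeSession();
            }
        }
    }
}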

What happens here is just the opposite of update-like calls. Here you get an editor and call its methods yourself, while in update/status/switch/diff you call some method (SVNRepository#update or SVNRepository#status) and provide your own editor, which is then called for you. This is the beauty of the Subversion API.
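When you drive the editor yourself, you are responsible for the nesting: every directory or file you open or add must be closed before its parent, and the root directory must be closed before closeEdit. The hypothetical checker below (not part of SVNKit) validates a recorded sequence of ISVNEditor method names; it would, for instance, flag a drive that forgets the final closeDir() for the root.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical helper (not an SVNKit class): checks that a recorded sequence of
// editor method names is properly nested, i.e. every openRoot/addDir/openDir/
// addFile/openFile is matched by closeDir/closeFile before closeEdit.
final class EditorDriveChecker {

    static boolean isBalanced(List<String> calls) {
        Deque<String> open = new ArrayDeque<>();
        boolean rootOpened = false;
        for (String call : calls) {
            switch (call) {
                case "openRoot":
                    if (rootOpened) return false; // the root is opened only once
                    rootOpened = true;
                    open.push("dir");
                    break;
                case "addDir":
                case "openDir":
                    if (open.isEmpty() || !open.peek().equals("dir")) return false;
                    open.push("dir");
                    break;
                case "addFile":
                case "openFile":
                    if (open.isEmpty() || !open.peek().equals("dir")) return false;
                    open.push("file");
                    break;
                case "closeDir":
                    if (open.isEmpty() || !open.pop().equals("dir")) return false;
                    break;
                case "closeFile":
                    if (open.isEmpty() || !open.pop().equals("file")) return false;
                    break;
                case "closeEdit":
                    // everything, including the root, must be closed by now
                    return rootOpened && open.isEmpty();
                default:
                    // targetRevision, deleteEntry, property and delta calls
                    // do not change the nesting
                    break;
            }
        }
        return false; // closeEdit was never called
    }
}
```

The second commit of the example (openRoot, openDir, addDir, closeDir, closeDir, deleteEntry, closeDir, closeEdit) passes this check.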

You just crawl the tree below the URL for which you created the SVNRepository object and describe your changes. The new revision is created only at ISVNEditor#closeEdit: at that moment the transaction is either committed or rejected. You never know whether someone else is committing to the same repository at the same time until you call ISVNEditor#closeEdit. Since you never know the latest repository state, you send deltas against specific revisions instead of against the latest revision. That's what r1 means in the following code:

commitEditor.openDir("branches", 1);
//the delta is sent against r1
commitEditor.closeDir();

If -1 is passed instead of a revision number, the changes are applied to the latest repository state.
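To make the role of the base revision concrete, here is a toy model of the out-of-date check (a deliberate simplification, not Subversion's actual code): the repository remembers the revision in which each path last changed, an edit made against base revision r is accepted only if the path has not changed after r, and -1 skips the check, as described above. OutOfDateModel is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not Subversion's implementation) of the check performed when a
// transaction is committed: an edit of `path` made against base revision `base`
// is rejected if `path` was changed by a later revision; -1 skips the check.
final class OutOfDateModel {
    private final Map<String, Long> lastChanged = new HashMap<>();
    private long youngest = 0;

    /** Records a successful commit touching the given paths; returns the new revision. */
    long commit(String... paths) {
        youngest++;
        for (String path : paths) {
            lastChanged.put(path, youngest);
        }
        return youngest;
    }

    /** Would an edit of `path` sent against revision `base` be accepted? */
    boolean accepts(String path, long base) {
        if (base == -1) {
            return true; // no base revision: apply to the latest state
        }
        return lastChanged.getOrDefault(path, 0L) <= base;
    }
}
```

If someone else commits a change to /branches as r2 while our edit still says openDir("branches", 1), the model rejects the edit; with -1 it would be accepted.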

As you can see, Subversion verifies the checksum of every file:

commitEditor.closeFile("trunk/file", checksum);

If a file is modified rather than added, the checksum of its base text should also be passed to ISVNEditor#applyTextDelta. So every file is checked twice: before and after the delta is applied. If either checksum is wrong, the commit is rejected.
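The two checks can be sketched in plain Java. The sketch assumes MD5 hex digests (the format Subversion's delta editor uses for these checksum arguments); applyChange is a hypothetical stand-in that simply replaces the base text instead of applying a real text delta.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the base/result checksum verification. applyChange is hypothetical:
// it replaces the base text with newContents instead of applying a real delta.
final class ChecksumCheck {

    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must support MD5", e);
        }
    }

    /** Checks the base text before and the result after the change; throws on mismatch. */
    static byte[] applyChange(byte[] base, String baseChecksum,
                              byte[] newContents, String resultChecksum) {
        // a null base checksum means "no base to verify" (e.g. an added file)
        if (baseChecksum != null && !baseChecksum.equals(md5Hex(base))) {
            throw new IllegalArgumentException("base checksum mismatch");
        }
        byte[] result = newContents.clone(); // a real delta would be applied to `base`
        if (!resultChecksum.equals(md5Hex(result))) {
            throw new IllegalArgumentException("result checksum mismatch");
        }
        return result;
    }
}
```

For example, md5Hex of an empty array is d41d8cd98f00b204e9800998ecf8427e, and passing that value as the base checksum of a non-empty base text makes applyChange throw.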

One more detail, which is not very obvious: all paths not starting with “/” are relative to the URL of the SVNRepository object (“file:///tmp/test” in my example), while paths starting with “/” are relative to the repository root, which may differ from the URL the connection was created for. In my example “file:///tmp/test” is the repository root, as you can check by calling SVNRepository#getRepositoryRoot.
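The path rule can be captured in a small helper. toRepositoryPath is hypothetical (SVNKit does not expose a method with this name), and the session URL file:///tmp/test/project used below is an invented example of a session opened below the repository root. Both URLs are assumed to be normalized, without a trailing "/".

```java
// Hypothetical helper illustrating the rule: editor paths without a leading "/"
// are relative to the session URL, paths with a leading "/" to the repository root.
final class EditorPaths {

    static String toRepositoryPath(String repositoryRoot, String sessionUrl, String path) {
        boolean inside = sessionUrl.equals(repositoryRoot)
                || sessionUrl.startsWith(repositoryRoot + "/");
        if (!inside) {
            throw new IllegalArgumentException("session URL is outside the repository");
        }
        if (path.startsWith("/")) {
            return path; // already relative to the repository root
        }
        // the part of the session URL below the root: "" or e.g. "/project"
        String sessionPath = sessionUrl.substring(repositoryRoot.length());
        if (path.isEmpty()) {
            return sessionPath.isEmpty() ? "/" : sessionPath;
        }
        return sessionPath + "/" + path;
    }
}
```

For a session opened at file:///tmp/test/project in a repository rooted at file:///tmp/test, "trunk/file" resolves to "/project/trunk/file", while a copy source such as "/trunk" stays "/trunk".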

The example code produces the following history when run against an empty repository:

------------------------------------------------------------------------
r2 | (no author) | 2012-07-20 03:14:30 +0200 (Fri, 20 Jul 2012) | 1 line
Changed paths:
   A /branches/branch (from /trunk:1)
   D /tags

My second commit message
------------------------------------------------------------------------
r1 | (no author) | 2012-07-20 03:14:30 +0200 (Fri, 20 Jul 2012) | 1 line
Changed paths:
   A /branches
   A /tags
   A /trunk
   A /trunk/file

My first commit message
------------------------------------------------------------------------

Subversion remote API: listing repository with “status” request

One of the strongest sides of Subversion is its nice API. It includes the working copy API, the remote API, the client API, and the repository API.

SVN API

  • The client API replaces the command-line client for programs. It consists of analogs of all command-line calls like “checkout”, “update”, “propset” and so on. Usually every function of the client API, depending on whether its arguments are URLs or paths, calls the working copy API and/or the remote API.
  • The working copy API consists of low-level working copy operations. This working copy abstraction allows Subversion to change the working copy format without touching other functionality.
  • The remote API consists of functions for working with a remote repository. It doesn't require a working copy and allows you to work with a remote SVN repository without any working copy at all.
  • The repository API is used on the server side and works with the different Subversion repository formats.

Maybe I've missed some other APIs, but I consider them less important. In my opinion, the remote API is the most interesting one, because it allows you to work with an SVN repository directly. It consists of different requests that are transferred over the network with the protocols SVN supports: DAV (http:// and https://), SVN (svn://) and the file protocol (file://). The requests are executed on the server, and the answer is returned in the form of callbacks.
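In SVNKit each of these protocols is served by one of the factories registered by the setup() calls at the start of the commit example above. The sketch below records that mapping; it returns class names only, so it runs without SVNKit on the classpath, and RaProtocols is a hypothetical name.

```java
import java.net.URI;

// Hypothetical helper: which SVNKit repository factory handles a URL's scheme.
// Only class names are returned, so no SVNKit classes are needed to run it.
final class RaProtocols {

    static String factoryFor(String url) {
        String scheme = URI.create(url).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URL has no scheme: " + url);
        }
        switch (scheme) {
            case "file":
                return "FSRepositoryFactory";      // direct access to a local repository
            case "svn":
            case "svn+ssh":
                return "SVNRepositoryFactoryImpl"; // svnserve protocol
            case "http":
            case "https":
                return "DAVRepositoryFactory";     // WebDAV (Apache mod_dav_svn)
            default:
                throw new IllegalArgumentException("unsupported scheme: " + scheme);
        }
    }
}
```

This is why the commit example registers all three factories: the same code then works no matter which kind of URL is passed to SVNRepositoryFactory.create.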

I would divide nearly all remote API requests into 3 groups:

  • editor-based: update, diff, status, …;
  • log-like: log, “get eligible mergeinfo”;
  • “cheap requests”: “get dir”, “get latest revision”, “info”.

Cheap requests usually get information about a single node (directory or file). Log-like requests usually return a sequence of “log entry” structures (typically containing revision, author, date, and changed paths), one per revision. Editor-based requests crawl the directories within one revision.

All editor-based requests have the following structure:

Editor-based requests

The reporter is the working copy replacement. It consists of three functions: set_path, delete_path, and link_path (plus finish_report and abort_report to end the report). They describe the working copy state you have locally (to describe that state, you don't actually need to have a working copy).

In return, the server describes what actions you should apply to that working copy in order to reach the state of some revision. The actions are given in the form of editor calls.

For example, suppose the history is the following:

------------------------------------------------------------------------
r2 | root | 2012-07-15 13:03:58 +0000 (Sun, 15 Jul 2012) | 1 line
Changed paths:
   A /trunk/file

Added a file.
------------------------------------------------------------------------
r1 | root | 2012-07-15 13:03:30 +0000 (Sun, 15 Jul 2012) | 1 line
Changed paths:
   A /branches
   A /tags
   A /trunk

Initial.
------------------------------------------------------------------------

Suppose we describe our working copy as follows (pseudo-code):

set_path("", 0)
set_path("trunk", 2)
delete_path("trunk/file")

(uninteresting parameters are omitted). This means that we tell the server: “My whole working copy is at revision 0, except trunk, which is at revision 2; but trunk/file is deleted and I don't have it locally”.
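One way to read such a report (our own model, not the svn_ra_reporter3_t implementation) is as a map from reported paths to revisions, where any other path inherits the state of its nearest reported ancestor and delete_path marks a path as missing. ReportedState is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a report: set_path/delete_path calls are stored per path,
// and an unreported path inherits the state of its nearest reported ancestor.
final class ReportedState {
    private static final long DELETED = -1;
    private final Map<String, Long> entries = new HashMap<>();

    void setPath(String path, long revision) {
        entries.put(path, revision);
    }

    void deletePath(String path) {
        entries.put(path, DELETED);
    }

    /** Revision the client claims to have for `path`, or -1 if it is missing locally. */
    long revisionOf(String path) {
        String current = path;
        while (true) {
            Long revision = entries.get(current);
            if (revision != null) {
                return revision;
            }
            if (current.isEmpty()) {
                throw new IllegalStateException("a report must start with set_path(\"\")");
            }
            int slash = current.lastIndexOf('/');
            current = slash < 0 ? "" : current.substring(0, slash); // go to the parent
        }
    }
}
```

With the report above, "branches" resolves to revision 0 (inherited from the root), "trunk" and any unreported path below it to revision 2, and "trunk/file" to -1 (missing).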

If we call “update” to revision 2, the server will send the commands (pseudo-code):

target_revision(2)
open_root(0)
add_directory("branches")
close_directory() //for branches
add_directory("tags")
close_directory() //for tags
open_directory("trunk", 2)
add_file("trunk/file")
//send file contents --- some calls, let's omit them
close_file() //for trunk/file
close_directory() //for trunk
close_directory() //for root
close_edit()

If we call “update” to revision 1, the server will send the commands (pseudo-code):

target_revision(1)
open_root(0)
add_directory("branches")
close_directory() //for branches
add_directory("tags")
close_directory() //for tags
//open_directory for trunk is called only if trunk's
//properties changed in r2; otherwise the reported state
//of trunk is already the desired state
close_directory() //for root
close_edit()

If we call “update” to revision 0, the server will send the commands (pseudo-code):

target_revision(0)
open_root(0)
delete_entry("trunk", 2)
close_directory() //for root
close_edit()

Usually the reporter crawls the working copy to generate the correct set_path/link_path/delete_path sequence, and the editor calls are usually applied to the working copy, used to generate a patch, or used to show the status. But both crawling the working copy and applying the changes are optional.
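The editor drives above can be reproduced by a simplified tree comparison. The sketch below is an illustration under strong assumptions, not the server's real algorithm: trees are maps from path to “is a directory”, revisions, properties and file contents are ignored, entries are visited in alphabetical order, and target_revision is omitted. Fed the report from the example (locally only trunk exists), it yields the same sequences of calls as the updates to r2, r1 and r0 above, without the revision arguments. TreeDelta is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Simplified model: turn a reported tree and a target tree into editor calls.
// A tree maps paths like "trunk/file" to true (directory) or false (file).
final class TreeDelta {

    static List<String> drive(Map<String, Boolean> reported, Map<String, Boolean> target) {
        List<String> calls = new ArrayList<>();
        calls.add("open_root");
        diffChildren("", reported, target, calls);
        calls.add("close_directory"); // root
        calls.add("close_edit");
        return calls;
    }

    private static void diffChildren(String dir, Map<String, Boolean> from,
                                     Map<String, Boolean> to, List<String> calls) {
        TreeSet<String> paths = new TreeSet<>(childrenOf(dir, from));
        paths.addAll(childrenOf(dir, to));
        for (String path : paths) {
            Boolean was = from.get(path);
            Boolean is = to.get(path);
            if (was != null && !was.equals(is)) {
                // gone in the target, or file <-> directory: delete first
                calls.add("delete_entry(" + path + ")");
                was = null;
            }
            if (is == null) {
                continue;
            }
            if (was == null) {
                add(path, to, calls);
            } else if (is && subtreeDiffers(path, from, to)) {
                // open a directory only if something below it changes
                calls.add("open_directory(" + path + ")");
                diffChildren(path, from, to, calls);
                calls.add("close_directory");
            }
        }
    }

    private static void add(String path, Map<String, Boolean> to, List<String> calls) {
        if (!to.get(path)) {
            calls.add("add_file(" + path + ")");
            calls.add("close_file");
            return;
        }
        calls.add("add_directory(" + path + ")");
        for (String child : new TreeSet<>(childrenOf(path, to))) {
            add(child, to, calls);
        }
        calls.add("close_directory");
    }

    private static boolean subtreeDiffers(String dir, Map<String, Boolean> from,
                                          Map<String, Boolean> to) {
        TreeSet<String> paths = new TreeSet<>(from.keySet());
        paths.addAll(to.keySet());
        for (String path : paths) {
            if (path.startsWith(dir + "/") && !Objects.equals(from.get(path), to.get(path))) {
                return true;
            }
        }
        return false;
    }

    private static List<String> childrenOf(String dir, Map<String, Boolean> tree) {
        List<String> children = new ArrayList<>();
        for (String path : tree.keySet()) {
            int slash = path.lastIndexOf('/');
            String parent = slash < 0 ? "" : path.substring(0, slash);
            if (parent.equals(dir)) {
                children.add(path);
            }
        }
        return children;
    }
}
```

An empty reported tree (the situation of set_path("", …, start_empty=TRUE) described next) turns into add_directory/add_file calls only, which is what a checkout receives.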

set_path actually has several parameters (not only path and revision). One of them is start_empty. If it is TRUE, the path is considered locally empty and the server should send all its contents; the revision parameter is then ignored. For example, “svn checkout” and “svn export” use a set_path(“”, ignored_revision, start_empty=TRUE) call to describe the working copy.

Another parameter is depth. It is used for sparse working copy operations. In the example above, the following report tells the server not to send the contents of “trunk” when updating to r2:

set_path("", 0)
set_path("trunk", 1, depth=empty)

Conversely, in this case all of the “trunk” contents will be sent (“update” to r2):

set_path("", 0)
set_path("trunk", 2, start_empty=TRUE)
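The depth values correspond to Subversion's svn_depth_t (empty, files, immediates, infinity). The sketch below shows which entries below a directory each value covers; DepthFilter and its tree representation (path mapped to “is a directory”) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: which entries below `dir` each depth value covers.
// The enum mirrors svn_depth_empty/files/immediates/infinity.
final class DepthFilter {

    enum Depth { EMPTY, FILES, IMMEDIATES, INFINITY }

    static List<String> covered(String dir, Depth depth, Map<String, Boolean> tree) {
        String prefix = dir.isEmpty() ? "" : dir + "/";
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : new TreeMap<>(tree).entrySet()) {
            String path = entry.getKey();
            if (path.equals(dir) || !path.startsWith(prefix)) {
                continue; // the directory itself, or something outside of it
            }
            boolean immediateChild = path.indexOf('/', prefix.length()) < 0;
            boolean isDirectory = entry.getValue();
            boolean include;
            switch (depth) {
                case EMPTY:
                    include = false; // only the directory itself
                    break;
                case FILES:
                    include = immediateChild && !isDirectory;
                    break;
                case IMMEDIATES:
                    include = immediateChild; // subdirectories come without contents
                    break;
                default: // INFINITY
                    include = true;
                    break;
            }
            if (include) {
                result.add(path);
            }
        }
        return result;
    }
}
```

For a trunk containing a file and a subdirectory with one more file inside, EMPTY covers nothing below trunk, FILES only the file, IMMEDIATES the file and the subdirectory entry, and INFINITY everything.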

The only difference between the “status” and “update” requests is that “status” doesn't request file contents. So it can be used simply to list repository paths and properties. Here's example code:

#include <stdio.h>

#include <apr_pools.h>
#include <apr_strings.h>

#include <svn_client.h>
#include <svn_auth.h>
#include <svn_delta.h>
#include <svn_ra.h>

/* Editor callbacks. Only set_target_revision, open_root, add_directory,
   close_directory, add_file and close_file are installed in the editor below;
   the others are stubs that show the rest of the editor interface
   (svn_delta_default_editor() already provides no-op versions of them).
   No batons are needed, so every callback that creates one sets it to NULL. */

static svn_error_t *
set_target_revision(void *edit_baton,
                    svn_revnum_t target_revision,
                    apr_pool_t *pool) {
    /* svn_revnum_t is a long */
    fprintf(stderr, "listing revision\t\t\t%ld\n", target_revision);
    return SVN_NO_ERROR;
}

static svn_error_t *
open_root(void *edit_baton,
          svn_revnum_t base_revision,
          apr_pool_t *pool,
          void **dir_baton) {
    fprintf(stderr, "entered root directory\n");
    *dir_baton = NULL;
    return SVN_NO_ERROR;
}

static svn_error_t *
delete_entry(const char *path,
             svn_revnum_t revision,
             void *parent_baton,
             apr_pool_t *pool) {
    return SVN_NO_ERROR;
}

static svn_error_t *
add_directory(const char *path,
              void *parent_baton,
              const char *copyfrom_path,
              svn_revnum_t copyfrom_revision,
              apr_pool_t *pool,
              void **child_baton) {
    fprintf(stderr, "entered directory\t\t\t%s\n", path);
    *child_baton = NULL;
    return SVN_NO_ERROR;
}

static svn_error_t *
open_directory(const char *path,
               void *parent_baton,
               svn_revnum_t base_revision,
               apr_pool_t *pool,
               void **child_baton) {
    *child_baton = NULL;
    return SVN_NO_ERROR;
}

static svn_error_t *
change_dir_prop(void *dir_baton,
                const char *name,
                const svn_string_t *value,
                apr_pool_t *pool) {
    return SVN_NO_ERROR;
}

static svn_error_t *
close_directory(void *dir_baton,
                apr_pool_t *pool) {
    fprintf(stderr, "left directory\n");
    return SVN_NO_ERROR;
}

static svn_error_t *
add_file(const char *path,
         void *parent_baton,
         const char *copyfrom_path,
         svn_revnum_t copyfrom_revision,
         apr_pool_t *pool,
         void **file_baton) {
    fprintf(stderr, "entered file     \t\t\t%s\n", path);
    *file_baton = NULL;
    return SVN_NO_ERROR;
}

static svn_error_t *
open_file(const char *path,
          void *parent_baton,
          svn_revnum_t base_revision,
          apr_pool_t *pool,
          void **file_baton) {
    *file_baton = NULL;
    return SVN_NO_ERROR;
}

static svn_error_t *
apply_textdelta(void *file_baton,
                const char *base_checksum,
                apr_pool_t *pool,
                svn_txdelta_window_handler_t *handler,
                void **handler_baton) {
    /* ignore any text delta windows */
    *handler = svn_delta_noop_window_handler;
    *handler_baton = NULL;
    return SVN_NO_ERROR;
}

static svn_error_t *
change_file_prop(void *file_baton,
                 const char *name,
                 const svn_string_t *value,
                 apr_pool_t *pool) {
    return SVN_NO_ERROR;
}

static svn_error_t *
close_file(void *file_baton,
           const char *text_checksum,
           apr_pool_t *pool) {
    /* text_checksum may be NULL, never pass NULL to %s */
    fprintf(stderr, "left file, md5sum = %s\n",
            text_checksum ? text_checksum : "(none)");
    return SVN_NO_ERROR;
}

static svn_error_t *
close_edit(void *edit_baton,
           apr_pool_t *pool) {
    fprintf(stderr, "listing finished\n");
    return SVN_NO_ERROR;
}

/* Username prompt callback; not needed for file:// URLs and not installed here. */
static svn_error_t *
auth_callback(svn_auth_cred_username_t **cred, void *baton, const char *realm,
              svn_boolean_t may_save, apr_pool_t *pool) {
    if (cred) {
        svn_auth_cred_username_t *ret = apr_pcalloc(pool, sizeof(*ret));
        ret->username = apr_pstrdup(pool, "username");
        *cred = ret;
    }
    return SVN_NO_ERROR;
}

static svn_error_t *
list_repository(const char *url, apr_pool_t *pool) {
    svn_ra_callbacks2_t *callbacks;
    svn_ra_session_t *session;
    const svn_ra_reporter3_t *status_reporter;
    void *reporter_baton;
    svn_delta_editor_t *editor;

    /* revision to list (SVN_INVALID_REVNUM means the HEAD revision) */
    svn_revnum_t revision = SVN_INVALID_REVNUM;

    /* initialize the remote access API and open a session */
    SVN_ERR(svn_ra_initialize(pool));
    SVN_ERR(svn_ra_create_callbacks(&callbacks, pool));
    SVN_ERR(svn_ra_open4(&session, NULL, url, NULL, callbacks, NULL, NULL, pool));

    /* set up our editor on top of the default (no-op) editor */
    editor = svn_delta_default_editor(pool);
    editor->set_target_revision = set_target_revision;
    editor->open_root = open_root;
    editor->add_directory = add_directory;
    editor->close_directory = close_directory;
    editor->add_file = add_file;
    editor->close_file = close_file;

    /* run the status request */
    SVN_ERR(svn_ra_do_status2(session, &status_reporter, &reporter_baton, "",
                              revision, svn_depth_infinity, editor, NULL, pool));

    /* report our virtual working copy as empty (start_empty=TRUE);
       finish_report makes the server drive our editor */
    SVN_ERR(status_reporter->set_path(reporter_baton, "", 0, svn_depth_infinity,
                                      TRUE, NULL, pool));
    SVN_ERR(status_reporter->finish_report(reporter_baton, pool));
    return SVN_NO_ERROR;
}

int main(int argc, char **argv) {
    apr_pool_t *pool;
    const char *url = "file:///path/to/svn/repository";
    svn_error_t *error;
    int exit_code = 0;

    apr_pool_initialize();
    apr_pool_create_ex(&pool, NULL, NULL, NULL);

    error = list_repository(url, pool);
    if (error) {
        /* print the error chain and free it */
        svn_handle_error2(error, stderr, FALSE, "crawl_repository: ");
        svn_error_clear(error);
        exit_code = 1;
    }

    apr_pool_destroy(pool);
    apr_pool_terminate();
    return exit_code;
}

To compile the code on Debian, we need the latest Subversion from trunk and the APR and SQLite libraries from apt:

$ sudo aptitude install libapr1-dev libaprutil1-dev libsqlite3-dev
$ gcc crawl_repository.c -I/usr/include/subversion-1 -I/usr/include/apr-1.0 -lsvn_ra-1 -lsvn_client-1 -lsvn_delta-1 -lsvn_subr-1 -lapr-1 -o crawl_repository

The code reports the working copy root as empty (start_empty=TRUE). As a result, the server sends add_directory and add_file editor calls, which we can use to list the repository contents.

$ ./crawl_repository
listing revision                        2
entered root directory
entered directory                       trunk
entered file                            trunk/file
left file, md5sum = d41d8cd98f00b204e9800998ecf8427e
left directory
entered directory                       branches
left directory
entered directory                       tags
left directory
left directory
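The same notifications can be turned into output in the style of “svn list --depth infinity”: one path per line, directories with a trailing “/”. The hypothetical ListingCollector below only consumes the add_directory/add_file notifications a binding would forward to it, so it does not depend on any Subversion library. It sorts the entries because, as the output above shows, the server does not send them in alphabetical order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical collector: records add_directory/add_file notifications and renders
// them like "svn list --depth infinity" (directories get a trailing "/").
final class ListingCollector {
    private final List<String> entries = new ArrayList<>();

    void addDirectory(String path) {
        entries.add(path + "/");
    }

    void addFile(String path) {
        entries.add(path);
    }

    /** Returns the collected paths in sorted order. */
    List<String> listing() {
        List<String> sorted = new ArrayList<>(entries);
        Collections.sort(sorted);
        return sorted;
    }
}
```

Fed with the calls from the run above (trunk, trunk/file, branches, tags), it returns branches/, tags/, trunk/ and trunk/file.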

I'll just note that this approach should be faster than the one used by “svn list --depth infinity”, because “svn list --depth infinity” makes a number of recursive “get dir” calls (resulting in a number of network requests). The approach based on the “status” request with start_empty=TRUE needs only one request.
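A back-of-the-envelope count makes the difference concrete. In the hypothetical crawl below, getDir is an in-memory stand-in for one “get dir” round trip. A recursive listing needs one such request per directory, so for the repository listed above (root, branches, tags, trunk) it makes four requests, compared with a single status request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Counts the round trips of a recursive "get dir" listing. getDir is a hypothetical
// in-memory stand-in for one request; the tree maps paths to true (directory) or false (file).
final class GetDirCrawl {
    private final Map<String, Boolean> tree;
    private int requests = 0;

    GetDirCrawl(Map<String, Boolean> tree) {
        this.tree = tree;
    }

    private List<String> getDir(String dir) {
        requests++; // one request per listed directory
        String prefix = dir.isEmpty() ? "" : dir + "/";
        List<String> children = new ArrayList<>();
        for (String path : tree.keySet()) {
            if (path.startsWith(prefix) && path.indexOf('/', prefix.length()) < 0) {
                children.add(path);
            }
        }
        return children;
    }

    /** Lists everything below `dir` recursively; returns the number of requests made so far. */
    int crawl(String dir) {
        for (String child : getDir(dir)) {
            if (tree.get(child)) {
                crawl(child); // descend into subdirectories
            }
        }
        return requests;
    }
}
```

The number of requests grows with the number of directories, while the status-based crawl stays at one request however deep the tree is.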