Subversion remote API: listing repository with “status” request

One of Subversion’s strongest sides is its nice API. It includes a working copy API, a remote API, a client API, and a repository API.

SVN API

  • Client API is the replacement for the CLI for programs. It consists of analogs of all command-line calls like “checkout”, “update”, “propset” and so on. Usually every client API function, depending on whether its arguments are URLs or paths, calls the working copy API and/or the remote API.
  • Working copy API consists of low-level working copy operations. This working copy abstraction allows Subversion to change the working copy format without touching other functionality.
  • Remote API consists of functions for working with a remote repository. This API doesn’t require a working copy to exist and allows you to work with a remote SVN repository without any working copy at all.
  • Repository API is used on the server side and works with the different Subversion repository formats.

Maybe I’ve missed some other APIs, but I consider them less important. In my opinion the remote API is the most interesting, because it allows you to work with an SVN repository without a working copy. It consists of different requests that are transferred over the network using the protocols SVN supports: DAV, SVN, and the file protocol. The requests are executed on the server and the answer is returned in the form of callbacks.

I would divide nearly all remote API requests into 3 groups:

  • editor-based: update, diff, status, …;
  • log-like: log, “get eligible mergeinfo”;
  • “cheap requests”: “get dir”, “get latest revision”, “info”.

Cheap requests usually get information about only one node (a directory or a file). Log-like requests usually return a sequence of “log entry” structures (usually containing the revision, author, date, and changed paths), one per revision. Editor-based requests crawl the directories within one revision.

All editor-like calls have the following structure:

Editor-based requests

The reporter is the working copy replacement. It consists of 3 functions: set_path, delete_path, and link_path. They describe the working copy state you have locally (you don’t actually need a working copy to describe such a state).

In return, the server describes the actions you should apply to that working copy in order to reach the state of some revision. The actions are given in the form of editor calls.

For example, suppose the history looks like this:

------------------------------------------------------------------------
r2 | root | 2012-07-15 13:03:58 +0000 (Sun, 15 Jul 2012) | 1 line
Changed paths:
   A /trunk/file

Added a file.
------------------------------------------------------------------------
r1 | root | 2012-07-15 13:03:30 +0000 (Sun, 15 Jul 2012) | 1 line
Changed paths:
   A /branches
   A /tags
   A /trunk

Initial.
------------------------------------------------------------------------

If we describe our working copy with (pseudo-code):

set_path("", 0),
set_path("trunk", 2)
delete_path("trunk/file")

Uninteresting parameters are omitted. This report tells the server: “My whole working copy is at the state of revision 0, except trunk, which is at the state of revision 2, but trunk/file is deleted and I don’t have it locally”.
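The way such a report is read can be modelled in a few lines of plain C. This is only a toy sketch, not Subversion code: toy_report_t, toy_set_path, toy_delete_path and toy_reported_revision are hypothetical names, and the rule "the most specific reported path wins, everything else inherits from its nearest reported ancestor" is my simplified reading of the report semantics described above.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16
#define TOY_MISSING (-1L)   /* marks a path reported with delete_path */

/* One reported fact about the (possibly virtual) working copy. */
typedef struct {
    const char *path;   /* relative to the report root, "" is the root */
    long revision;      /* TOY_MISSING for delete_path entries */
} toy_entry_t;

typedef struct {
    toy_entry_t entries[TOY_MAX_ENTRIES];
    int count;
} toy_report_t;

static void toy_add(toy_report_t *r, const char *path, long revision) {
    assert(r->count < TOY_MAX_ENTRIES);
    r->entries[r->count].path = path;
    r->entries[r->count].revision = revision;
    r->count++;
}

/* Toy counterparts of the reporter calls from the pseudo-code. */
void toy_set_path(toy_report_t *r, const char *path, long revision) {
    toy_add(r, path, revision);
}

void toy_delete_path(toy_report_t *r, const char *path) {
    toy_add(r, path, TOY_MISSING);
}

/* True if `prefix` is `path` itself or one of its ancestors. */
static int toy_covers(const char *prefix, const char *path) {
    size_t n = strlen(prefix);
    if (n == 0)
        return 1;                               /* the root covers everything */
    if (strncmp(prefix, path, n) != 0)
        return 0;
    return path[n] == '\0' || path[n] == '/';   /* match whole components only */
}

/* Revision the client claims to have for `path`:
   the longest (most specific) covering entry decides. */
long toy_reported_revision(const toy_report_t *r, const char *path) {
    long best = TOY_MISSING;
    size_t best_len = 0;
    int found = 0;
    for (int i = 0; i < r->count; i++) {
        size_t len = strlen(r->entries[i].path);
        if (toy_covers(r->entries[i].path, path) && (!found || len >= best_len)) {
            best = r->entries[i].revision;
            best_len = len;
            found = 1;
        }
    }
    return best;
}
```

With the three calls from the pseudo-code, "branches" resolves to revision 0 (inherited from the root), "trunk" and anything else below it to revision 2, and "trunk/file" to TOY_MISSING.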

If we call “update” to revision 2, the server will send the commands (pseudo-code):

target_revision(2)
open_root(0)
add_directory("branches")
close_directory() //for branches
add_directory("tags")
close_directory() //for tags
open_directory("trunk", 2)
add_file("trunk/file")
//send file contents --- some calls, let's omit them
close_file() //for trunk/file
close_directory() //for trunk
close_directory() //for root
close_edit()

If we call “update” to revision 1, the server will send the commands (pseudo-code):

target_revision(1)
open_root(0)
add_directory("branches")
close_directory() //for branches
add_directory("tags")
close_directory() //for tags
//open_directory for trunk will only be called
//if trunk has properties
//changed in r2, otherwise the trunk state described is already the desired state
close_directory() //for root
close_edit()

If we call “update” to revision 0, the server will send the commands (pseudo-code):

target_revision(0)
open_root(0)
delete_entry("trunk", 2)
close_directory() //for root
close_edit()
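Note that close_directory() carries no path in these sequences, so the receiver has to remember which directory is currently open. In the real API this is the job of the directory batons; the idea can be sketched with a plain stack. This is a toy model under my own assumptions: toy_dir_stack_t, toy_enter and toy_leave are hypothetical names and not part of Subversion.

```c
#include <assert.h>

#define TOY_MAX_DEPTH 32

/* Directories that were opened or added and are not closed yet, root first. */
typedef struct {
    const char *paths[TOY_MAX_DEPTH];
    int depth;
} toy_dir_stack_t;

/* Call on open_root (with ""), add_directory and open_directory. */
void toy_enter(toy_dir_stack_t *s, const char *path) {
    assert(s->depth < TOY_MAX_DEPTH);
    s->paths[s->depth++] = path;
}

/* Call on close_directory; returns the path of the directory being closed. */
const char *toy_leave(toy_dir_stack_t *s) {
    assert(s->depth > 0);   /* a close without a matching open is a broken drive */
    return s->paths[--s->depth];
}
```

Replaying the “update” to revision 2 through this stack yields "branches", "tags", "trunk" and finally "" for the four close_directory() calls, and the stack is empty again when close_edit() arrives.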

Usually the reporter crawls the working copy to generate the correct set_path/link_path/delete_path sequence, and the editor calls are usually applied to the working copy, used to generate a patch, or used to show the status. But both crawling the working copy and applying the changes are optional.

A set_path call actually has several parameters (not only path and revision). One of them is start_empty. If it is true, the path is considered locally empty and the server should send all its contents; the revision parameter is then ignored. For example, “svn checkout” and “svn export” use a set_path(“”, ignored_revision, start_empty=TRUE) call to describe the working copy.

Another parameter is depth. It is used for sparse working copy operations. For the example above, the following report tells the server not to send the “trunk” contents in the case of an “update” to r2:

set_path("", 0),
set_path("trunk", 1, depth=empty)

In contrast, in this case all the “trunk” contents will be sent (“update” to r2):

set_path("", 0),
set_path("trunk", 2, start_empty=TRUE)

The only difference between the “status” and “update” requests is that “status” doesn’t request the file contents. So it can be used just to list the repository paths and properties. Here’s an example of the code:

#include <stdio.h>

#include <svn_client.h>
#include <svn_auth.h>
#include <svn_ra.h>

static svn_error_t *
set_target_revision(void *edit_baton,
                    svn_revnum_t target_revision,
                    apr_pool_t *pool) {
    /* svn_revnum_t is a long, so print it with %ld */
    fprintf(stderr, "listing revision\t\t\t%ld\n", (long) target_revision);
    return SVN_NO_ERROR;
}

static svn_error_t *
open_root(void *edit_baton,
          svn_revnum_t base_revision,
          apr_pool_t *pool,
          void **dir_baton) {
    fprintf(stderr, "entered root directory\n");
    return SVN_NO_ERROR;
}

static svn_error_t *
delete_entry(const char *path,
             svn_revnum_t revision,
             void *parent_baton,
             apr_pool_t *pool) {
    return SVN_NO_ERROR;
}

static svn_error_t *
add_directory(const char *path,
              void *parent_baton,
              const char *copyfrom_path,
              svn_revnum_t copyfrom_revision,
              apr_pool_t *pool,
              void **child_baton) {
    fprintf(stderr, "entered directory\t\t\t%s\n", path);
    return SVN_NO_ERROR;
}

static svn_error_t *
open_directory(const char *path,
               void *parent_baton,
               svn_revnum_t base_revision,
               apr_pool_t *pool,
               void **child_baton) {
    return SVN_NO_ERROR;
}

static svn_error_t *
change_dir_prop(void *dir_baton,
                const char *name,
                const svn_string_t *value,
                apr_pool_t *pool) {
    return SVN_NO_ERROR;
}

static svn_error_t *
close_directory(void *dir_baton,
                apr_pool_t *pool) {
    fprintf(stderr, "left directory\n");
    return SVN_NO_ERROR;
}

static svn_error_t *
add_file(const char *path,
         void *parent_baton,
         const char *copyfrom_path,
         svn_revnum_t copyfrom_revision,
         apr_pool_t *pool,
         void **file_baton) {
    fprintf(stderr, "entered file     \t\t\t%s\n", path);
    return SVN_NO_ERROR;
}

static svn_error_t *
open_file(const char *path,
          void *parent_baton,
          svn_revnum_t base_revision,
          apr_pool_t *pool,
          void **file_baton) {
  return SVN_NO_ERROR;
}

static svn_error_t *
apply_textdelta(void *file_baton,
                const char *base_checksum,
                apr_pool_t *pool,
                svn_txdelta_window_handler_t *handler,
                void **handler_baton) {
  return SVN_NO_ERROR;
}

static svn_error_t *
change_file_prop(void *file_baton,
                 const char *name,
                 const svn_string_t *value,
                 apr_pool_t *pool) {
  return SVN_NO_ERROR;
}

static svn_error_t *
close_file(void *file_baton,
           const char *text_checksum,
           apr_pool_t *pool) {
  /* text_checksum may be NULL, so guard before printing it */
  fprintf(stderr, "left file, md5sum = %s\n", text_checksum ? text_checksum : "(none)");
  return SVN_NO_ERROR;
}

static svn_error_t *
close_edit(void *edit_baton,
           apr_pool_t *pool) {
  fprintf(stderr, "listing finished\n");
  return SVN_NO_ERROR;
}

static svn_error_t *
auth_callback(svn_auth_cred_username_t **cred,
              void *baton,
              const char *realm,
              svn_boolean_t may_save,
              apr_pool_t *pool) {
    /* not registered anywhere in this example; a file:// URL needs no credentials */
    if (cred) {
        svn_auth_cred_username_t *ret = apr_pcalloc (pool, sizeof (*ret));
        ret->username = apr_pstrdup(pool, "username");
        *cred = ret;
    }
    return SVN_NO_ERROR;
}

int main(int argc, char **argv) {
    apr_pool_t* pool;
    const char* url = "file:///path/to/svn/repository";

    apr_pool_initialize();
    apr_pool_create_ex(&pool, NULL, NULL, NULL);

    // initialize remote access API
    svn_ra_initialize(pool);

    svn_ra_callbacks2_t* callbacks;
    svn_ra_create_callbacks(&callbacks, pool);

    svn_ra_session_t* session;
    svn_error_t* error = svn_ra_open4(&session, NULL, url, NULL, callbacks, NULL, NULL, pool);

    if (!error) {
        const svn_ra_reporter3_t* status_reporter;
        void* reporter_baton;

        // revision to list (SVN_INVALID_REVNUM means HEAD revision)
        svn_revnum_t revision = SVN_INVALID_REVNUM;

        // setup our editor
        svn_delta_editor_t *editor = svn_delta_default_editor(pool);
        editor->set_target_revision = set_target_revision;
        editor->open_root = open_root;
        editor->add_directory = add_directory;
        editor->close_directory = close_directory;
        editor->add_file = add_file;
        editor->close_file = close_file;

        // run status call
        svn_ra_do_status2(session, &status_reporter, &reporter_baton, "", revision, svn_depth_infinity, editor, NULL, pool);

        // report our virtual working copy as empty (start_empty=TRUE)
        status_reporter->set_path(reporter_baton, "", 0, svn_depth_infinity, TRUE, NULL, pool);
        status_reporter->finish_report(reporter_baton, pool);
    } else {
        fprintf(stderr, "Unable to open connection to %s: %s\n", url, error->message);
    }

    apr_pool_destroy(pool);
    apr_pool_terminate();
    return 0;
}

To compile the code on Debian, we need the latest Subversion built from trunk, plus the APR and SQLite development libraries from apt:

$ sudo aptitude install libapr1-dev libaprutil1-dev libsqlite3-dev
$ gcc crawl_repository.c -I/usr/include/subversion-1 -I/usr/include/apr-1.0 -lsvn_ra-1 -lsvn_client-1 -o crawl_repository

The code reports the working copy root with start_empty=TRUE. As a result, the server sends add_directory and add_file editor calls that we can use to list the repository contents.

$ ./crawl_repository
listing revision                        2
entered root directory
entered directory                       trunk
entered file                            trunk/file
left file, md5sum = d41d8cd98f00b204e9800998ecf8427e
left directory
entered directory                       branches
left directory
entered directory                       tags
left directory
left directory
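The same flow can be imitated without a server, which also shows how a baton can collect the listing instead of printing it. This is a toy sketch under my own assumptions: toy_node_t stands in for one repository revision, toy_editor_t is a heavily reduced editor, and toy_drive plays the server for a root reported with start_empty=TRUE. None of these names exist in Subversion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one repository revision. */
typedef struct toy_node {
    const char *name;
    int is_dir;
    const struct toy_node *children;
    int child_count;
} toy_node_t;

/* Hypothetical reduced editor: only the callbacks a listing needs. */
typedef struct {
    void (*add_directory)(void *baton, const char *path);
    void (*add_file)(void *baton, const char *path);
    void (*close_directory)(void *baton);
} toy_editor_t;

/* Plays the server's role for an empty root: everything below `dir`
   is announced to the editor as an addition. */
void toy_drive(const toy_node_t *dir, const char *dir_path,
               const toy_editor_t *editor, void *baton) {
    char path[256];
    for (int i = 0; i < dir->child_count; i++) {
        const toy_node_t *child = &dir->children[i];
        /* paths are relative to the root, so the root adds no prefix */
        snprintf(path, sizeof path, "%s%s%s",
                 dir_path, *dir_path ? "/" : "", child->name);
        if (child->is_dir) {
            editor->add_directory(baton, path);
            toy_drive(child, path, editor, baton);
            editor->close_directory(baton);
        } else {
            editor->add_file(baton, path);
        }
    }
}

/* Baton that accumulates the listing as newline-separated paths. */
typedef struct {
    char text[512];
} toy_listing_t;

static void toy_append(toy_listing_t *l, const char *path, const char *suffix) {
    size_t used = strlen(l->text);
    assert(used < sizeof l->text);
    snprintf(l->text + used, sizeof l->text - used, "%s%s", path, suffix);
}

static void list_dir(void *baton, const char *path) { toy_append(baton, path, "/\n"); }
static void list_file(void *baton, const char *path) { toy_append(baton, path, "\n"); }
static void list_close(void *baton) { (void) baton; /* nothing to do when leaving */ }

const toy_editor_t toy_listing_editor = { list_dir, list_file, list_close };
```

Driving toy_listing_editor over a tree with branches, tags and trunk/file leaves one path per line in the baton, with a trailing slash on directories. The real program could do the same by passing a baton of its own as the status_baton argument of svn_ra_do_status2 (it passes NULL above).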

I’ll just note that this approach should be faster than the one used by “svn list --depth infinity”, because “svn list --depth infinity” uses a number of recursive “get dir” calls (which result in a number of network requests). The approach based on the “status” request and start_empty=TRUE needs only one request.
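To put a rough number on the difference, here is a toy count of the requests a recursive listing would need, assuming exactly one “get dir” request per directory as described above. toy_node_t and toy_count_get_dir_requests are hypothetical names for illustration, and the real client may behave differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for a repository tree. */
typedef struct toy_node {
    const char *name;
    int is_dir;
    const struct toy_node *children;
    int child_count;
} toy_node_t;

/* Requests needed by a recursive listing: one per directory visited. */
int toy_count_get_dir_requests(const toy_node_t *node) {
    assert(node != NULL);
    if (!node->is_dir)
        return 0;               /* files are listed by their parent's request */
    int requests = 1;           /* the request for this directory itself */
    for (int i = 0; i < node->child_count; i++)
        requests += toy_count_get_dir_requests(&node->children[i]);
    return requests;
}
```

For the example repository (root, branches, tags, trunk) this gives 4 requests against a single “status” request, and the gap grows with every additional directory.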
