EOLs in Git and SVN

This post explains how Subversion handles line endings, how Git copes with the same problem, and how to avoid losing those settings when translating between Git and SVN.

When team members use different OSes with different default EOLs, it’s important to let them work on the same files without creating an EOL mess or other problems. Everybody knows that Windows Notepad doesn’t like LFs. But not everybody knows about the problems CRLFs cause in shell scripts:

$ echo '#!/bin/sh' > test.sh
$ echo >> test.sh
$ echo 'echo Hello world!' >> test.sh

$ bash test.sh
Hello world!
$ dash test.sh
Hello world!

$ unix2dos test.sh
unix2dos: converting file test.sh to DOS format ...

$ bash test.sh
test.sh: line 2: $'\r': command not found
Hello world!

$ dash test.sh
: not found test.sh:
Hello world!

Rather strange behaviour for the XXI century. To avoid these problems, let’s take care of line endings.
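
A quick way to spot the problem before a script fails is to look for carriage returns directly. Below is a minimal sketch using only standard POSIX tools; the helper name has_crlf and the file names are made up for illustration.

```shell
# has_crlf FILE: succeed if FILE contains at least one carriage return.
# (has_crlf is a hypothetical helper name, not a standard command.)
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Create one script with LF endings and one with CRLF endings.
workdir=$(mktemp -d)
printf '#!/bin/sh\n\necho Hello world!\n' > "$workdir/unix.sh"
printf '#!/bin/sh\r\n\r\necho Hello world!\r\n' > "$workdir/dos.sh"

for f in "$workdir/unix.sh" "$workdir/dos.sh"; do
    if has_crlf "$f"; then
        echo "$(basename "$f"): contains CR"
    else
        echo "$(basename "$f"): LF only"
    fi
done

# Strip the CRs without needing dos2unix to be installed.
tr -d '\r' < "$workdir/dos.sh" > "$workdir/fixed.sh"
```

The tr call is only a stand-in for dos2unix; it removes every CR, so it is not suitable for files that use lone CRs as line endings.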

EOLs in Subversion

Subversion controls line endings using the svn:eol-style property. Its valid values are:

  • native: when the file is checked out, its EOLs are converted to the current system default EOL (CRLF on Windows, LF on Linux)
  • LF: when the file is checked out, its EOLs are converted to LFs
  • CRLF: the same, but EOLs are converted to CRLFs
  • CR: the same, but EOLs are converted to CRs
  • the property is not set: the file is treated as binary, with no EOL management

Note that a file in the SVN repository can have arbitrary EOLs (and pristine files (see my post about pristine files) contain the same EOLs as the repository does); the conversion is performed when the working copy file is created. Usually Subversion clients make sure that the repository contents correspond to svn:eol-style, i.e. a file with svn:eol-style=LF will have LF endings in the repository, and a file with svn:eol-style=CRLF will have CRLFs. In the special case of svn:eol-style=native the file will be stored with LFs. But the Subversion remote API doesn’t check for EOL inconsistencies. If the repository is rather old or a buggy client was used to work with it, there’s a chance of encountering other combinations.

There’s one more Subversion property that relates to line endings: svn:mime-type. When you set svn:eol-style on a file, you tell Subversion that the file is a text file. If you also set svn:mime-type on the same file, its value should start with “text/”; otherwise Subversion will report an inconsistency between svn:eol-style and svn:mime-type and refuse to work. One of the most common mistakes is to set svn:eol-style to some value and svn:mime-type to application/xml. Use text/xml instead; application/xml means “not human-readable XML”.

It is usually recommended to configure Subversion auto-props so that svn:eol-style is native by default and LF only for shell scripts.
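
For reference, auto-props live in the client-side Subversion configuration file (typically ~/.subversion/config on Unix-like systems). Here is a sketch of what such a setup might look like. The file patterns are only examples, and the snippet writes to a scratch file rather than touching a real configuration:

```shell
# Write an example auto-props fragment to a scratch file for inspection.
# The patterns below are illustrative; merge by hand into your real config.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[miscellany]
enable-auto-props = yes

[auto-props]
*.txt = svn:eol-style=native
*.java = svn:eol-style=native
*.xml = svn:eol-style=native;svn:mime-type=text/xml
*.sh = svn:eol-style=LF;svn:executable
*.bat = svn:eol-style=CRLF
EOF
cat "$conf"
```

Listing text extensions explicitly, instead of using a catch-all pattern, keeps svn:eol-style away from binary files.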

EOLs in Git

Line endings in Git for individual files are controlled by the Git attributes “text” and “eol”. The first one tells Git that the file is not binary. Possible “text” attribute values are:

  • auto: Git checks whether the file is binary (Git thinks that the file is binary if the first 8kb contains at least one zero byte)
  • the attribute is set: the file is treated as text
  • the attribute is unset: the file is treated as binary

The “eol” attribute is an analog of svn:eol-style. Its values are:

  • lf: working tree file line endings will be converted to LFs, the blob is assumed to contain LFs
  • crlf: the same but to CRLFs, the blob is assumed to contain LFs
  • the attribute value is unspecified (!eol) but the “text” attribute is set: the file line endings are determined by the core.eol config option (possible values are lf, crlf and native, which is the default)
  • the attribute value is unspecified and the “text” attribute is unset too: the file is treated as binary (note that if the “eol” attribute is set, the “text” value is ignored and assumed to be set)

So for most files “/file text !eol” is the best option (its meaning corresponds to svn:eol-style=native). For individual LF and CRLF files the settings are “/lf_file eol=lf” and “/crlf_file eol=crlf”.

One may even use a *-rule to create an analog of Subversion auto-props (the rule is applied to every newly created file): “* text=auto !eol”. The “text=auto” part takes care of binary files.
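
To see how such rules resolve for a given path, git check-attr can be asked directly. The sketch below builds a scratch repository with a .gitattributes file combining the rules discussed above; all file names are illustrative, and the paths do not need to exist for the query to work.

```shell
# Scratch repository; all file names below are illustrative.
repo=$(mktemp -d)
cd "$repo" && git init -q .

# Later lines override earlier ones, so the catch-all rule goes first.
cat > .gitattributes <<'EOF'
*            text=auto !eol
/native_file text !eol
/lf_file     eol=lf
/crlf_file   eol=crlf
/binary_file -text
EOF

# Print the "text" and "eol" attributes Git would apply to each path.
git check-attr text eol -- native_file lf_file crlf_file binary_file other_file
```

Each output line has the form "path: attribute: value", where the value is one of set, unset, unspecified, or the assigned string.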

But be careful: if the “eol” attribute is set, the blob should already contain LFs. Otherwise you’ll have a problem: to find out whether a file with the “eol” attribute has changed, Git converts it to LFs, calculates the SHA-1 of the result (assuming it’s a blob) and compares it to the corresponding hash in the database. If the blob in the object database contains CRLFs, these hashes won’t be equal:

$ git init repo
$ cd repo
$ echo "line 1" >> file
$ echo "line 2" >> file
$ unix2dos file
$ git add file
$ git commit -m "Added a file with CRLFs"

#now the database contains the file with CRLFs, let's change its eol attribute
$ echo "/file eol=crlf" > .gitattributes

$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#       .gitattributes

#let's tell Git it should reread the file contents
$ touch file

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#       modified:   file
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#       .gitattributes

#ok, we have some unexpected changes, let's try to discard them:
$ git reset --hard HEAD
HEAD is now at 14fd0ce Added a file with CRLFs

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#       modified:   file
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#       .gitattributes
no changes added to commit (use "git add" and/or "git commit -a")

#they can't be discarded!!

Looks like Git is a bit stupid here; it could be more tolerant of changes to EOL settings. To cope with this, one can unset the attributes for the file so that it is treated as binary again, change its EOLs to LF (dos2unix), commit to put it into the database, and then set “text” and/or “eol” again. But this is the only problem that appears with the attributes-based approach. Hope it will be fixed soon.
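
The workaround can be sketched as a script. This is only an illustration in a scratch repository: it uses tr instead of dos2unix so that it runs without extra tools, and the identity and commit messages are placeholders.

```shell
# Scratch repository reproducing the situation above (names are illustrative).
repo=$(mktemp -d)
cd "$repo" && git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
git config core.autocrlf false   # keep the demo independent of global settings

printf 'line 1\r\nline 2\r\n' > file
git add file && git commit -q -m "Added a file with CRLFs"

# 1. While no attributes apply, rewrite the file with LFs and commit it.
tr -d '\r' < file > file.tmp && mv file.tmp file
git add file && git commit -q -m "Convert file to LF in the repository"

# 2. Only now declare the desired working-tree line endings.
echo "/file eol=crlf" > .gitattributes
git add .gitattributes && git commit -q -m "Set eol=crlf for file"

# 3. Re-create the working-tree copy so the attribute takes effect.
rm file && git checkout -- file

# Should print nothing if the working tree is clean.
git status --short
```

After the last step the blob contains LFs while the working-tree copy has CRLFs, which is the combination the “eol” attribute expects. Git 2.16 and later also offer git add --renormalize, which re-applies the current conversion rules to tracked files.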

Why not use git-svn if you care about EOLs?

Git-svn is a Perl script that tries to convert Git contents to Subversion and vice versa. But it doesn’t perform EOL conversion at all: it completely ignores both the svn:eol-style property and Git attributes, and only converts Subversion repository contents to Git blobs without touching the corresponding properties and attributes.

As we know, Subversion stores all text files with LFs if svn:eol-style=native. So when cloning a Subversion repository, git-svn converts those text files to blobs with LFs, and on Windows Git won’t convert their EOLs to native on checkout (which is inconsistent with the semantics of the Subversion property).

Should one set core.autocrlf?

People usually recommend the core.autocrlf=true config setting (roughly equivalent to setting the “* eol=crlf” attribute), but it is a weapon of mass destruction: it converts to CRLFs not only files with svn:eol-style=native but also files with svn:eol-style=LF. Moreover, with core.autocrlf=true Git converts EOLs to LFs when adding any file to its database, and the blobs are sent to Subversion in this form. Even for files with svn:eol-style=CRLF, this results in inconsistencies between the Subversion file contents and the svn:eol-style property.

My answer is no. To convert svn:eol-style to Git attributes correctly, one should set the attributes carefully for each file. Git attributes have higher priority than core.autocrlf, so if the corresponding attributes are set, the core.autocrlf value is ignored.
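
The priority of attributes over core.autocrlf can be checked with a small experiment. The sketch below assumes a scratch repository with core.autocrlf=true set locally; the file names lf_file and plain_file are invented for the demo, and Git may print conversion warnings to stderr while adding files.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
git config core.autocrlf true

# Two identical LF files; only lf_file gets an explicit attribute.
printf 'line 1\nline 2\n' > lf_file
cp lf_file plain_file
echo "/lf_file eol=lf" > .gitattributes
git add . && git commit -q -m "Add sample files"

# Force Git to write the files again, applying its conversion rules.
rm lf_file plain_file && git checkout -- lf_file plain_file

# Report which line endings each working-tree file ended up with.
cr=$(printf '\r')
for f in lf_file plain_file; do
    if grep -q "$cr" "$f"; then echo "$f: CRLF"; else echo "$f: LF"; fi
done
```

With these settings lf_file should keep its LFs because of the eol=lf attribute, while plain_file, which has no attributes, should be rewritten with CRLFs by core.autocrlf.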

How can I do that automatically?

Fortunately there’s a Git-SVN bridge called SubGit. In short: you install it into the repository and it converts Subversion revisions to Git commits and vice versa. Among other features it performs svn:eol-style+svn:mime-type conversion to “text” and “eol” attributes:

$ svnadmin create repo
$ subgit install repo
$ git clone repo repo-git
$ cd repo-git
$ echo "line 1" >> native_file
$ echo "line 2" >> native_file
$ cp native_file lf_file
$ cp native_file crlf_file
$ cp native_file binary_file
$ unix2dos crlf_file
unix2dos: converting file crlf_file to DOS format ...
$ cp native_file auto_native_file
$ dd if=/dev/zero of=auto_binary_file count=1
$ nano .gitattributes #edit .gitattributes this way:

$ cat .gitattributes
/native_file text !eol
/crlf_file eol=crlf
/lf_file eol=lf
/binary_file -text
/auto_native_file text=auto !eol
/auto_binary_file text=auto !eol

$ git add .gitattributes
$ git add *_file
$ git commit -m "Added different files"
$ git push origin master
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 353 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (4/4), done.
To /tmp/repo
 * [new branch]      master -> master

$ cd ../repo
$ svn proplist -v --depth infinity file://`pwd`
Properties on 'file:///tmp/repo/trunk/lf_file':
Properties on 'file:///tmp/repo/trunk/crlf_file':
Properties on 'file:///tmp/repo/trunk/native_file':
Properties on 'file:///tmp/repo/trunk/auto_binary_file':
Properties on 'file:///tmp/repo/trunk/auto_native_file':
