Unison for File Syncing

Virtualization technologies such as Docker, Vagrant, and VirtualBox provide new opportunities for pre-built development environment images. But where do your project files go?

I personally prefer keeping my files on my local development machine (a laptop in my case). The IDE is generally faster working on local files, and I can blow away the virtualized environment at any time, knowing my master source code is safe.

But how then to get the files into the virtualized environment?

Unison

The technology I wanted to focus on for this post is Unison, a bi-directional file syncing application available on Linux, Windows, and Mac OS X. You can set it up so that any change on either file system is automatically replicated to the other side. For Magento, this means local file system edits are copied into the virtualized environment, and any code Magento generates under var/ is automatically copied back to the laptop for use by my IDE during debugging.

When Unison starts up, it walks the full directory tree and does any necessary file copying. By default, it decides whether a file has changed based on its size and timestamp. (You can have it compare file content checksums instead, but that runs slower.) After that phase, it watches the file system on both the client and server hosts for modification events, and triggers an incremental sync to push changes for modified files to the other end.

Unison uses ssh to log on to the remote server, which also makes it possible to use securely with remote cloud servers. It also supports plain sockets.
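As a rough sketch of the plain-socket mode, the functions below start a Unison server inside the environment with unison -socket, and point the laptop at it with a socket:// root instead of ssh://. The port number 5555, the function names, and the assumption that the port is reachable from the laptop as localhost (for example through a Docker port mapping) are all illustrative. The socket protocol is neither encrypted nor authenticated, so it is only appropriate on a trusted local network.

```shell
#!/bin/sh
# Sketch of Unison's plain-socket mode as an alternative to ssh.
# Assumptions (illustrative, not from the post): port 5555, and that the
# port is reachable from the laptop as localhost (e.g. a Docker port mapping).
# The socket protocol is not encrypted or authenticated: trusted networks only.

UNISON_PORT=5555

# Run inside the virtualized environment: listen for Unison clients.
start_socket_server() {
    unison -socket "$UNISON_PORT"
}

# Run on the laptop, from the project directory: sync with /magento2
# over the socket instead of ssh.
sync_over_socket() {
    unison . "socket://localhost:${UNISON_PORT}//magento2" -auto -batch
}
```

Source the file, call start_socket_server inside the environment, then call sync_over_socket on the laptop. The -repeat watch and -ignore options from the ssh command below would apply unchanged.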

Documentation for Unison can be found at http://www.cis.upenn.edu/~bcpierce/unison/docs.html. It supports “profiles” defined in configuration files on disk, but I am going old school here for the moment and just passing command line arguments (in a short shell script). For Magento 2, I am using the following command. Note that /magento2 is where the files reside inside the virtualized environment in this example.

$ unison . ssh://magento@localhost//magento2 \
     -auto -batch -repeat watch \
     -ignore "Path Dockerfile" \
     -ignore "Path Vagrantfile" \
     -ignore "Path .vagrant" \
     -ignore "Path .git" \
     -ignore "Path .gitignore" \
     -ignore "Path .gitattributes" \
     -ignore "Path var/cache" \
     -ignore "Path var/composer_home" \
     -ignore "Path var/log" \
     -ignore "Path var/page_cache" \
     -ignore "Path var/session" \
     -ignore "Path var/tmp" \
     -ignore "Path pub/media" \
     -ignore "Path pub/static" \
     -ignore "Path .idea" \
     -ignore "Path app/etc/env.php" \
     -ignore "Path .magento" \
     -ignore "Path template"

(Please let me know of any additions or removals to the above list.)
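For comparison, here is a sketch of the same options moved into a Unison profile, written out by a small shell function. It assumes Unison reads profiles from the directory named by the UNISON environment variable (falling back to ~/.unison) and that profile entries take the form name = value; the profile name magento2 and the function name are hypothetical. It also assumes a Linux or Mac OS X shell, since the local root defaults to the current directory.

```shell
#!/bin/sh
# Sketch: write the command-line options above into a Unison profile.
# Assumptions: profile name "magento2" and this function name are
# hypothetical; profiles live in $UNISON if set, otherwise ~/.unison.

write_magento2_profile() {
    project_dir=${1:-$(pwd)}          # local root (defaults to current dir)
    profile_dir=${UNISON:-$HOME/.unison}
    mkdir -p "$profile_dir"
    {
        echo "# Sync a Magento 2 project into the virtualized environment"
        echo "root = $project_dir"
        echo "root = ssh://magento@localhost//magento2"
        echo "auto = true"
        echo "batch = true"
        echo "repeat = watch"
        # One "ignore = Path ..." entry per -ignore option on the command line.
        for p in Dockerfile Vagrantfile .vagrant .git .gitignore .gitattributes \
                 var/cache var/composer_home var/log var/page_cache var/session \
                 var/tmp pub/media pub/static .idea app/etc/env.php .magento template; do
            echo "ignore = Path $p"
        done
    } > "$profile_dir/magento2.prf"
}
```

After running write_magento2_profile from the project directory, the sync would be started with just unison magento2, and the ignore list can be maintained in one file instead of a long command line.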

Note that the above command includes synchronizing ‘vendor’, which adds quite a bit of startup time (around 30 seconds on my laptop). I have also been experimenting with the following command, which syncs ‘vendor’ once and then exits.

$ unison vendor ssh://magento@localhost//magento2/vendor -batch

I then run that by hand whenever I make a change to the vendor directory. This solves the startup performance problem of the watch mode, because the watch command can then exclude the whole vendor directory, which is reasonably large.
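The two-step workflow could be wrapped in a script along these lines. This is a sketch: the function names are hypothetical, the remote root and options mirror the commands above, and only a few of the -ignore options are repeated here.

```shell
#!/bin/sh
# Sketch of the two-phase approach: sync vendor once, then watch everything
# else with vendor excluded. Function names are hypothetical.

REMOTE=ssh://magento@localhost//magento2

# One-shot sync of vendor; rerun by hand after changing vendor.
sync_vendor_once() {
    unison vendor "$REMOTE/vendor" -batch
}

# Continuous bi-directional sync of the project, skipping vendor.
# Add the remaining -ignore options from the full command above.
watch_project() {
    unison . "$REMOTE" -auto -batch -repeat watch \
        -ignore "Path vendor" \
        -ignore "Path .git" \
        -ignore "Path var/cache" \
        -ignore "Path pub/static"
}

# Only start watching if the vendor sync succeeded.
start_dev_sync() {
    sync_vendor_once && watch_project
}
```

Calling start_dev_sync then pays the vendor scan cost once per session, while watch mode starts up without walking vendor.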

Paths

The nice thing is that on Windows, the pre-compiled binaries I managed to find appear to be self-contained. That is, you don’t need to install Cygwin or similar to run the tool. There is nothing wrong with Cygwin, but I find I have Cygwin, Git Bash, PowerShell, and CMD prompts open at different times. Cygwin binaries expect paths starting with /cygdrive/c/, Git Bash uses /c/ as the prefix, and native Windows utilities use C:\. It gets confusing at times.

For Unison, I use paths relative to the project directory, with forward slashes. Following this strategy has not caused any problems to date.

Installation

Unison is an interesting project to me as it feels more like a “traditional” open source project: spurts of activity at times, no commercial backing, different community members providing binaries for different platforms, various out-of-date blog posts describing how to install it, etc.

For example, there are different versions of Unison, and you need to make sure the client and the server you are talking to run the same version, or else you may get problems. That is, it will start up without error, but after a while things go wrong.
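Since mismatched versions fail late rather than at startup, a quick check before starting a long-running sync may save some confusion. This sketch assumes unison -version prints a line like "unison version 2.48.3" and that the environment is reachable as magento@localhost over ssh, as in the command above; the function names are hypothetical. Matching version numbers may not be enough on their own, since the OCaml version a binary was compiled with can also matter.

```shell
#!/bin/sh
# Sketch: compare local and remote Unison versions before syncing.
# Assumes "unison -version" prints e.g. "unison version 2.48.3".

# Read "unison version X.Y.Z ..." on stdin and print just X.Y.Z.
parse_unison_version() {
    awk '/^unison version/ { print $3; exit }'
}

check_unison_versions() {
    local_version=$(unison -version | parse_unison_version)
    remote_version=$(ssh magento@localhost unison -version | parse_unison_version)
    if [ "$local_version" = "$remote_version" ]; then
        echo "OK: both sides run Unison $local_version"
    else
        echo "MISMATCH: local $local_version, remote $remote_version" >&2
        return 1
    fi
}
```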

So which version to use? The default that comes with Debian (easily installed via apt-get) is 2.40.102. Unfortunately, the Windows binary download page I found at https://www.irif.fr/~vouillon/unison/ does not have that version available. I tried 2.40.69 (from back in 2011), but had some problems getting it working (possibly user error, I don’t know). But I found a newer version that did work: 2.48.3. (Oh, that is 2.48.3 compiled with OCaml 4.01.0, not the binary compiled with OCaml 4.02.1, which the download page says is incompatible… ah, the joys of open source!)

This is the primary reason for this post. It’s October 2016 as I write this, and the combination of Linux and Windows binaries I found that appear to work together is… (drum roll please!):

Windows Installation

For Windows, there is a site with precompiled binaries. Go to https://www.irif.fr/~vouillon/unison/ and grab the ZIP from https://www.irif.fr/~vouillon/unison/unison%202.48.3.zip. This ZIP includes the Unison executable and a second executable, unison-fsmonitor.exe, which watches for file system events (needed for the “watch” mode to work). Put these two binaries in your path, and you are ready to go.

Linux Installation

I could not find a suitable version that I could get to work via apt-get (when I followed the instructions in the various blog posts, I got 404 errors), but there are binaries available online. So I used the following:

$ cd /usr/local/bin
$ curl -L -o unison https://github.com/TentativeConvert/Syndicator/raw/master/unison-binaries/unison-2.48.3
$ curl -L -o unison-fsmonitor https://github.com/TentativeConvert/Syndicator/raw/master/unison-binaries/unison-fsmonitor
$ chmod +x unison*

Not as nice as apt-get, but pretty straightforward regardless.
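To confirm the install, a check like the following (a sketch; the function name is hypothetical) verifies that both unison and unison-fsmonitor are on the PATH. The second binary matters because watch mode relies on it.

```shell
#!/bin/sh
# Sketch: check that both Unison binaries are installed and on the PATH.
# unison-fsmonitor is needed for "-repeat watch" to work.

check_unison_install() {
    missing=0
    for bin in unison unison-fsmonitor; do
        if command -v "$bin" >/dev/null 2>&1; then
            echo "found: $(command -v "$bin")"
        else
            echo "missing: $bin" >&2
            missing=1
        fi
    done
    return "$missing"
}
```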

Mac OS X Installation

I have not tried it myself yet, but the Windows download page also has a section for OS X. Just download the ZIP at http://unison-binaries.inria.fr/files/Unison-OS-X-2.48.3.zip.

Conclusions

So far so good. I want to experiment more, but Unison is looking quite promising. It allows me to keep the master files natively on my laptop, it supports bi-directional syncing, it has precompiled binaries, and more. I am writing this post in part to share the tool with those who did not know of it, and in part to ask about others’ experience using it.

Referring back to my previous post on the Magento 2 tool chain, Unison may be a good technology for the “file sharing” double-headed arrow for syncing files between the “project source files” and the “full-stack development environment”.

It is interesting also because it is not tied to any specific virtualization technology – it only relies on network access to the environment (ssh or raw sockets can both be used).

PS: Other Technologies

There is a range of alternative approaches that may also be of interest. I have mentioned many of these in previous posts, but I will briefly list them here as well in case you want to explore other options further.

  • You can run Samba inside the virtualized environment and then mount that from your development environment.
  • PHP Storm has a built-in “copy on write” feature: whenever it saves a file, it can also save a copy in the remote container.
  • WinSCP includes a “copy on write” (keep in sync) mode that will sftp a copy of any files in the local directory to the remote server. This is similar to PHP Storm, but works with any text editor as it just watches the file system for changes.
  • Docker 1.12 introduces improved native volume mounting, allowing Docker containers to mount the native file system directly. On Windows, this does not (yet) support inotify events, meaning tools like Grunt and Gulp inside the virtualized environment do not get “file changed” events. On Mac OS X, there are still some performance problems with very large numbers of files.
  • There is the docker-sync project that can use Unison or rsync to share files. It is kinda cool in demonstrating the flexibility of Docker – you spin up a separate container mounting the same file system and away you go. It will work with any mounted file system without modification to your other containers.
  • I was recently trying Vagrant rsync-auto, to pretty good effect. It watches the file system and then runs a sync command to copy local file system changes into the virtualized environment. If you are using Vagrant, it is worth looking into.
  • There is also rsync of course, but it is only a one-directional replication solution.

12 comments

  1. Hello Alan,
    interesting, VirtualBox shares are well known to be so slow. Why instead of using unison you do not use an NFS share? You will get an incredible performance boost if compared to VirtualBox shares.
    You can add NFS support to Windows too if your problem is there. I successfully tried haneWIN NFS some time ago (https://www.hanewin.net/nfs-e.htm).

    Hope it helps.

    1. There are multiple ways of doing it that work; I was not trying to be prescriptive. Personally, I found exporting an NFS mount via Samba too slow (PHP Storm was something like 10x slower when reindexing files). If I understand correctly, you are doing the opposite: mounting the NFS share in the VM. Do you know if inotify events are supported via haneWIN NFS? I thought Windows also includes NFS built in now, right? I cannot remember the exact results of trying that either, but I vaguely recall not being satisfied.

      1. Not a samba share, an NFS share. I noticed a huge performance difference.

        Do not know if windows is supporting NFS natively, last time I tried it was not and AFAIK inotify is not supported at all 😦 .

        Thank you for your work.

  2. We’re using http://docker-sync.io/ for developing on MacOS with Docker, as the file share performance is tragic on Docker for Mac; even NFS on Mac is poor.

    1. Probably worth mentioning that docker-sync uses rsync or unison under the hood.

  4. Hi Alan,

    I just discovered your post while searching for Unison. Could you perhaps help me out, as I have not been able to find an answer to my question on the web thus far?

    You write: “The default rule to check files is based on file size and timestamps. (You can use file contents checksum instead, but it runs slower.)”

    How can I enable Unison to check checksum instead of file size and time stamp? Which option do I need to enable? Thanks!

    1. It has been a while since I have fiddled with the options, but have a read up on “-fastcheck” in say the user guide: http://www.cis.upenn.edu/~bcpierce/unison/download/releases/stable/unison-manual.html

      1. Thank you very much for your immediate reply! I suppose I should disable “-fastcheck” then if I want to be sure the content of all files will be scanned. I will give that a try.

      2. I would read up on the option to see if the defaults are what you want or not. But ultimately experimenting I think will be necessary. (I did not understand all that I read.)

  5. I know this is older, but I’ve just started trying this out with vagrant. Seems very slow. Did you have that issue as well?

    1. I have not tried vagrant for a while sorry. I don’t see why it should be especially slow… but who knows how the file watching api was implemented.
