Daire Byrne is the Global Head of Systems at DNEG. He has a Bachelor of Science in Experimental Physics from University College Dublin and a Master of Science in Optoelectronics from Trinity College Dublin.
With over 18 years of experience in the industry, Daire started his career at Framestore in 2002 as Senior Systems Administrator before joining DNEG in 2009 as Senior Systems Architect. In his role as Global Head of Systems, which he has held since 2018, Daire leads a small team of senior systems engineers and architects who look after core IT infrastructures, scalable storage and farm rendering systems, as well as R&D on new IT infrastructure technologies.
Version 5.11 of the (mighty) Linux kernel was recently released with new features and patches that DNEG contributed to, making “NFS re-exporting” an officially supported mode of operation. Find out how Daire and the DNEG Systems team worked with the Linux NFS development community to support DNEG's cloud rendering requirements.
*** Warning: this is going to get technical! ***
Cloud Rendering at DNEG
Like all the major VFX houses, DNEG uses large compute “render farms” consisting of large numbers of computers, CPUs and storage, to produce our amazing final frame renders.
While DNEG has quite sizeable on-premises render farms in all our global locations, sometimes as a deadline approaches, we need to utilise an extra burst of rendering using cloud compute instances to get a project over the finish line. With all our storage located on-premises, using cloud compute presents some unique challenges in ensuring that all the data required for rendering is equally available to the on-premises and remote cloud compute instances.
Our digital content creation applications still rely heavily on POSIX filesystems on block storage, which is not traditionally an optimal way of interacting with cloud services. The latency of POSIX metadata operations can greatly slow down the transfer of data over the internet, and our VPN connections to the cloud are bandwidth limited. There are generally two main strategies for dealing with this cloud storage problem.
Exploring our options
We could transfer and replicate all the files that the renders will need ahead of time to some cloud storage servers (e.g. NFS servers), which will then serve all the client compute instances. Because each render process mostly reads much of the same data (geometry, textures, etc.), we can upload a large dataset once and have it used by multiple clients. But pre-processing renders to figure out all the files required can be difficult and time-consuming in a fast-moving pipeline. Transferring the entirety of a file might also not be the most efficient option if only a small part of that file is actually needed for a render (the parts can vary depending on where the scene’s camera is pointing). We also need to transfer back all the resulting files once the render has completed, as an extra step.
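To make the whole-file versus partial-read trade-off concrete, here is a minimal sketch in Python. The asset names, sizes and read amounts are purely hypothetical numbers chosen for illustration; they are not DNEG measurements.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """A file a render depends on (all values here are hypothetical)."""
    name: str
    size_bytes: int            # full size of the file on-premises
    bytes_read_by_render: int  # portion the render actually reads


def staged_bytes(assets):
    """Bytes uploaded if every file is copied to the cloud in full."""
    return sum(a.size_bytes for a in assets)


def on_demand_bytes(assets):
    """Bytes uploaded if only the parts actually read are transferred."""
    return sum(min(a.bytes_read_by_render, a.size_bytes) for a in assets)


# Hypothetical example: the camera only sees part of the scene.
assets = [
    Asset("geometry.abc", 8_000_000_000, 2_000_000_000),
    Asset("texture.tex", 4_000_000_000, 500_000_000),
]

print("staged in full:", staged_bytes(assets))
print("read on demand:", on_demand_bytes(assets))
```

With these made-up figures, staging whole files moves several times more data than the render ever touches, which is the inefficiency described above.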
The alternative method of serving data to the cloud instances is to directly read the data from the on-premises storage “on demand”. But if we had every compute instance doing this, they would mostly be reading the same files over again, and it would quickly saturate our network uplink to the cloud. What we need is a cloud storage system that sits in the middle and caches the data that is read from on-premises storage and then serves that cached data to all the cloud instances. So the first read comes directly from on-premises storage but all subsequent read requests for the same data from the cloud instances are then served from a local cloud storage server. All the writes are written through the cloud storage cache server directly back to the on-premises storage.
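The read-through, write-through behaviour described above can be sketched as a toy model in Python. This is only an illustration of the idea, not how the Linux NFS server or fscache is implemented; the class names and the choice to drop a cached copy on write are assumptions made for this example.

```python
class OnPremStorage:
    """Stand-in for on-premises storage; counts reads that cross the uplink."""

    def __init__(self, files):
        self.files = dict(files)
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class CloudCache:
    """Toy caching server sitting between cloud instances and on-premises."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def read(self, path):
        # Only the first read of a path goes back to on-premises storage.
        if path not in self.cache:
            self.cache[path] = self.origin.read(path)
        return self.cache[path]

    def write(self, path, data):
        # Writes pass straight through to on-premises storage.
        self.origin.write(path, data)
        # Assumption for this sketch: drop any stale cached copy.
        self.cache.pop(path, None)


origin = OnPremStorage({"/assets/texture.tex": b"pixels"})
cache = CloudCache(origin)

# 100 hypothetical cloud instances all read the same asset.
for _ in range(100):
    cache.read("/assets/texture.tex")

print("reads over the uplink:", origin.reads)
```

In this model, the number of reads that reach on-premises storage depends on how many distinct files are touched, not on how many cloud instances touch them.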
This kind of system is often referred to as “WAN acceleration” or “Filer caching”. As long as your workload is mostly reads of common asset data, this can be a very efficient way to mix cloud instances and on-premises rendering without having to do too much extra work.
Our primary on-premises storage is all Linux based and so our Linux render farms use the Linux NFS server and client implementations to move our data around. So when we learnt that v4.13 of the mainline Linux kernel had introduced the ability to “re-export” an NFS client mount (open-by-handle), we began investigating whether this could be used to re-export our on-premises storage to the cloud. The Linux kernel has had the ability to cache NFS client reads to disk using fscache/cachefiles for many years, so could we combine that with this new re-export capability?
Adapting the Linux kernel to our needs
With the help of the Linux NFS developer community we first had to spend a bit of time on fscache/cachefiles to get it stable on the mainline kernels again. Once that was done we could set up a cloud-based Linux NFS server and have it mount our on-premises storage using the “fsc” option to cache NFS reads to local storage. Then we could configure NFS exports of those mounts and have our cloud compute instances mount them. This worked pretty well for us and it helped out on many of our projects over the last couple of years.
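As a rough sketch of that setup, the Python below only builds the two pieces of configuration as strings; it does not run anything. The hostname, paths, client pattern and fsid value are hypothetical. The "fsc" mount option comes from the text above; the fsid= export option is included on the assumption, based on the kernel's NFS re-export documentation, that re-exported NFS filesystems need one set explicitly. A cache backend such as cachefilesd is also assumed to be running on the re-export server.

```python
def nfs_mount_command(server, export, mountpoint):
    """Command to mount on-premises storage with fscache enabled ("fsc")."""
    return ["mount", "-t", "nfs", "-o", "fsc", f"{server}:{export}", mountpoint]


def exports_line(mountpoint, clients, fsid):
    """An /etc/exports entry that re-exports the NFS client mount.

    Assumption: an explicit fsid= is needed because the re-exported
    filesystem is itself an NFS mount rather than a local filesystem.
    """
    return f"{mountpoint} {clients}(rw,no_subtree_check,fsid={fsid})"


# Hypothetical names, for illustration only.
mount_cmd = nfs_mount_command("onprem-nfs.example.com", "/jobs", "/srv/jobs")
export_entry = exports_line("/srv/jobs", "10.0.0.0/16", 1000)

print(" ".join(mount_cmd))
print(export_entry)
```

Cloud compute instances would then mount the exported path from the cloud-based server as they would from any other NFS server.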
But it still had various performance and stability issues that meant it was not suitable for general use. The idea of re-exporting a Linux kernel client mount using the Linux kernel server was still very much not a supported or recommended configuration after all. But we had managed to catalogue most of the issues and so we approached the Linux NFS developer community once again with our findings in the hope that it could be further refined and become a supported mode of operation that others could use too.
The release of Linux v5.11
The VFX industry owes a lot to the work of the Linux kernel developers and the NFS developers and maintainers in particular. Many of our distributed systems rely on the performance and stability of the Linux NFS client. When the developers showed an interest in our re-export use case and started providing patches to test, we provided feedback and soon the performance and stability of our setup improved as the patches were refined.
As of Linux v5.11, re-exporting an NFS client mount is a supported mode of operation, and performance should be good without any extra work. Combined with fscache/cachefiles, this makes for a capable “WAN accelerator” for NFS filesystems. If you try this out and discover bugs or performance issues, engage with the Linux NFS developer community and they will no doubt be more than happy to help!