        Mandatory File Locking For The Linux Operating System

                Andy Walker <andy@lysaker.kvaerner.no>

                           15 April 1996
                             (Updated September 2007)

0. Why you should avoid mandatory locking
-----------------------------------------

The Linux implementation is prey to a number of difficult-to-fix race
conditions which in practice make it not dependable:

        - The write system call checks for a mandatory lock only once
          at its start.  It is therefore possible for a lock request to
          be granted after this check but before the data is modified.
          A process may then see file data change even while a mandatory
          lock was held.
        - Similarly, an exclusive lock may be granted on a file after
          the kernel has decided to proceed with a read, but before the
          read has actually completed, and the reading process may see
          the file data in a state which should not have been visible
          to it.
        - Similar races make the claimed mutual exclusion between lock
          and mmap similarly unreliable.

1. What is mandatory locking?
-----------------------------

Mandatory locking is kernel enforced file locking, as opposed to the more usual
cooperative file locking used to guarantee sequential access to files among
processes. File locks are applied using the flock() and fcntl() system calls
(and the lockf() library routine which is a wrapper around fcntl().) It is
normally a process' responsibility to check for locks on a file it wishes to
update, before applying its own lock, updating the file and unlocking it again.
The most commonly used example of this (and in the case of sendmail, the most
troublesome) is access to a user's mailbox. The mail user agent and the mail
transfer agent must guard against updating the mailbox at the same time, and
prevent reading the mailbox while it is being updated.

In a perfect world all processes would use and honour a cooperative, or
"advisory" locking scheme.
However, the world isn't perfect, and there's
a lot of poorly written code out there.

In trying to address this problem, the designers of System V UNIX came up
with a "mandatory" locking scheme, whereby the operating system kernel would
block attempts by a process to write to a file that another process holds a
"read" -or- "shared" lock on, and block attempts to both read and write to a
file that a process holds a "write" -or- "exclusive" lock on.

The System V mandatory locking scheme was intended to have as little impact as
possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.

Note 1: In saying "file" in the paragraphs above I am actually not telling
the whole truth. System V locking is based on fcntl(). The granularity of
fcntl() is such that it allows the locking of byte ranges in files, in addition
to entire files, so the mandatory locking rules also have byte level
granularity.

Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V. The mandatory locking
scheme is defined by the System V Interface Definition (SVID) Version 3.

2. Marking a file for mandatory locking
---------------------------------------

A file is marked as a candidate for mandatory locking by setting the group-id
bit in its file mode but removing the group-execute bit. This is an otherwise
meaningless combination, and was chosen by the System V implementors so as not
to break existing user programs.

Note that the group-id bit is usually automatically cleared by the kernel when
a setgid file is written to. This is a security measure. The kernel has been
modified to recognize the special case of a mandatory lock candidate and to
refrain from clearing this bit.
Similarly the kernel has been modified not
to run mandatory lock candidates with setgid privileges.

3. Available implementations
----------------------------

I have considered the implementations of mandatory locking available with
SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.

Generally I have tried to make the most sense out of the behaviour exhibited
by these three reference systems. There are many anomalies.

All the reference systems reject all calls to open() for a file on which
another process has outstanding mandatory locks. This is in direct
contravention of SVID 3, which states that only calls to open() with the
O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
definition, which is the "Right Thing", since only calls with O_TRUNC can
modify the contents of the file.

HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
just mandatory locks. That would appear to contravene POSIX.1.

mmap() is another interesting case. All the operating systems mentioned
prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
also disallows advisory locks for such a file. SVID actually specifies the
paranoid HP-UX behaviour.

In my opinion only MAP_SHARED mappings should be immune from locking, and then
only from mandatory locks - that is what is currently implemented.

SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
mandatory locks, so reads and writes to locked files always block when they
should return EAGAIN.

I'm afraid that this is such an esoteric area that the semantics described
below are just as valid as any others, so long as the main points seem to
agree.

4. Semantics
------------

1. Mandatory locks can only be applied via the fcntl()/lockf() locking
   interface - in other words the System V/POSIX interface.
   BSD style
   locks using flock() never result in a mandatory lock.

2. If a process has locked a region of a file with a mandatory read lock, then
   other processes are permitted to read from that region. If any of these
   processes attempts to write to the region it will block until the lock is
   released, unless the process has opened the file with the O_NONBLOCK
   flag in which case the system call will return immediately with the error
   status EAGAIN.

3. If a process has locked a region of a file with a mandatory write lock, all
   attempts to read or write to that region block until the lock is released,
   unless a process has opened the file with the O_NONBLOCK flag in which case
   the system call will return immediately with the error status EAGAIN.

4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
   any mandatory locks owned by other processes will be rejected with the
   error status EAGAIN.

5. Attempts to apply a mandatory lock to a file that is memory mapped and
   shared (via mmap() with MAP_SHARED) will be rejected with the error status
   EAGAIN.

6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
   that has any mandatory locks in effect will be rejected with the error status
   EAGAIN.

5. Which system calls are affected?
-----------------------------------

Those which read or modify a file's contents, not just the inode. That gives
read(), write(), readv(), writev(), open(), creat(), mmap(), truncate() and
ftruncate(). truncate() and ftruncate() are considered to be "write" actions
for the purposes of mandatory locking.

The affected region is usually defined as stretching from the current position
for the total number of bytes read or written.
For the truncate calls it is
defined as the bytes of a file removed or added (we must also consider bytes
added, as a lock can specify just "the whole file", rather than a specific
range of bytes.)

Note 3: I may have overlooked some system calls that need mandatory lock
checking in my eagerness to get this code out the door. Please let me know, or
better still fix the system calls yourself and submit a patch to me or Linus.

6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes can wreak
havoc if they lock crucial files. The way around it is to change the file
permissions (remove the setgid bit) before trying to read or write to it.
Of course, that might be a bit tricky if the system is hung :-(

7. The "mand" mount option
--------------------------

Mandatory locking is disabled on all filesystems by default, and must be
administratively enabled by mounting with "-o mand". That mount option
is only allowed if the mounting task has the CAP_SYS_ADMIN capability.

Since kernel v4.5, it is possible to disable mandatory locking
altogether by setting CONFIG_MANDATORY_FILE_LOCKING to "n". A kernel
with this disabled will reject attempts to mount filesystems with the
"mand" mount option with the error status EPERM.
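
8. Example: marking and locking a file
--------------------------------------

The rules in sections 2 and 4 can be illustrated with a short C sketch. It is
not taken from the kernel sources, and the helper names (mand_candidate_mode(),
is_mand_candidate(), mark_mand_candidate() and lock_region()) are made up for
this example. It assumes the file lives on a filesystem mounted with "-o mand"
and that the kernel has mandatory locking enabled; without both, the lock
taken here is merely advisory.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return 'mode' with the group-id bit set and the
 * group-execute bit cleared, i.e. the combination described in section 2. */
mode_t mand_candidate_mode(mode_t mode)
{
    return (mode | S_ISGID) & ~(mode_t)S_IXGRP;
}

/* Hypothetical helper: nonzero if 'mode' marks a mandatory lock candidate
 * (setgid set, group-execute clear). */
int is_mand_candidate(mode_t mode)
{
    return (mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}

/* Hypothetical helper: mark an already open file as a mandatory lock
 * candidate. Returns 0 on success, -1 on error with errno set. */
int mark_mand_candidate(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    /* Pass only the permission bits to fchmod(), not the file type. */
    return fchmod(fd, mand_candidate_mode(st.st_mode) & 07777);
}

/* Hypothetical helper: request a write lock on 'len' bytes starting at
 * 'start' through the fcntl() interface (section 4, rule 1). A 'len' of 0
 * means "to the end of the file". F_SETLK does not block: it returns -1
 * at once if a conflicting lock is held. */
int lock_region(int fd, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}
```

A caller would typically open the file, call mark_mand_candidate(fd) once, and
then call lock_region(fd, 0, 0) to lock the whole file. Whether that lock is
then enforced against read() and write() in other processes depends on the
mount option and kernel configuration described in section 7.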