git.pld-linux.org Git - packages/kernel.git/blame - kernel-unionfs.patch
- fix oops on mount with quota when mount takes long time
1diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
2index 8c624a1..4aa288b 100644
3--- a/Documentation/filesystems/00-INDEX
4+++ b/Documentation/filesystems/00-INDEX
5@@ -110,6 +110,8 @@ udf.txt
6 - info and mount options for the UDF filesystem.
7 ufs.txt
8 - info on the ufs filesystem.
9+unionfs/
10+ - info on the unionfs filesystem
11 vfat.txt
12 - info on using the VFAT filesystem used in Windows NT and Windows 95
13 vfs.txt
14diff --git a/Documentation/filesystems/unionfs/00-INDEX b/Documentation/filesystems/unionfs/00-INDEX
15new file mode 100644
16index 0000000..96fdf67
17--- /dev/null
18+++ b/Documentation/filesystems/unionfs/00-INDEX
19@@ -0,0 +1,10 @@
20+00-INDEX
21+ - this file.
22+concepts.txt
23+ - A brief introduction of concepts.
24+issues.txt
25+ - A summary of known issues with unionfs.
26+rename.txt
27+ - Information regarding rename operations.
28+usage.txt
29+ - Usage information and examples.
30diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
31new file mode 100644
32index 0000000..b853788
33--- /dev/null
34+++ b/Documentation/filesystems/unionfs/concepts.txt
35@@ -0,0 +1,287 @@
36+Unionfs 2.x CONCEPTS:
37+=====================
38+
39+This file describes the concepts needed by a namespace unification file
40+system.
41+
42+
43+Branch Priority:
44+================
45+
46+Each branch is assigned a unique priority - starting from 0 (highest
47+priority). No two branches can have the same priority.
48+
49+
50+Branch Mode:
51+============
52+
53+Each branch is assigned a mode - read-write or read-only. This allows
54+directories on media mounted read-write to be used in a read-only manner.
55+
56+
57+Whiteouts:
58+==========
59+
60+A whiteout removes a file name from the namespace. Whiteouts are needed when
61+one attempts to remove a file on a read-only branch.
62+
63+Suppose we have a two-branch union, where branch 0 is read-write and branch
64+1 is read-only. And a file 'foo' on branch 1:
65+
66+./b0/
67+./b1/
68+./b1/foo
69+
70+The unified view would simply be:
71+
72+./union/
73+./union/foo
74+
75+Since 'foo' is stored on a read-only branch, it cannot be removed. A
76+whiteout is used to remove the name 'foo' from the unified namespace. Again,
77+since branch 1 is read-only, the whiteout cannot be created there. So, we
78+try on a higher priority (lower numerically) branch and create the whiteout
79+there.
80+
81+./b0/
82+./b0/.wh.foo
83+./b1/
84+./b1/foo
85+
86+Later, when Unionfs traverses branches (due to lookup or readdir), it
87+eliminates 'foo' from the namespace (as well as the whiteout itself).
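
The whiteout rule above can be modelled in a few lines of userspace Python. This is only an illustrative sketch, not the kernel implementation: the `lookup` function and the representation of a branch as a set of names are assumptions made for the example, and it assumes a whiteout found in a branch takes effect before a same-named file in that branch.

```python
WHITEOUT_PREFIX = ".wh."

def lookup(branches, name):
    """Return the index of the branch that supplies `name`, or None.

    `branches` is a list of sets of file names; index 0 is the highest
    priority branch (hypothetical model, not the kernel data structure).
    """
    for index, names in enumerate(branches):
        if WHITEOUT_PREFIX + name in names:
            # The whiteout masks the name in all lower-priority branches.
            return None
        if name in names:
            return index
    return None

b0 = {".wh.foo"}   # read-write branch holding the whiteout
b1 = {"foo"}       # read-only branch holding the real file
print(lookup([b0, b1], "foo"))     # None: 'foo' is hidden
print(lookup([set(), b1], "foo"))  # 1: without the whiteout, b1 supplies it
```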
88+
89+
90+Opaque Directories:
91+===================
92+
93+Assume we have a unionfs mount comprising two branches. Branch 0 is
94+empty; branch 1 has the directory /a and file /a/f. Let's say we mount a
95+union of branch 0 as read-write and branch 1 as read-only. Now, let's say
96+we try to perform the following operation in the union:
97+
98+ rm -fr a
99+
100+Because branch 1 is not writable, we cannot physically remove the file /a/f
101+or the directory /a. So instead, we will create a whiteout in branch 0
102+named /.wh.a, masking out the name "a" from branch 1. Next, let's say we
103+try to create a directory named "a" as follows:
104+
105+ mkdir a
106+
107+Because we have a whiteout for "a" already, Unionfs behaves as if "a"
108+doesn't exist, and thus will delete the whiteout and replace it with an
109+actual directory named "a".
110+
111+The problem now is that if you try to "ls" in the union, Unionfs will
112+perform its normal directory name unification, for *all* directories named
113+"a" in all branches. This will cause the file /a/f from branch 1 to
114+re-appear in the union's namespace, which violates Unix semantics.
115+
116+To avoid this problem, we have a different form of whiteouts for
117+directories, called "opaque directories" (same as BSD Union Mount does).
118+Whenever we replace a whiteout with a directory, that directory is marked as
119+opaque. In Unionfs 2.x, it means that we create a file named
120+/a/.wh.__dir_opaque in branch 0, after having created directory /a there.
121+When unionfs notices that a directory is opaque, it stops all namespace
122+operations (including merging readdir contents) at that opaque directory.
123+This prevents re-exposing names from masked out directories.
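
As a rough illustration of how an opaque marker cuts off directory merging, the sketch below models each branch as a dictionary from directory path to the set of names it contains. The `readdir` function and this data model are hypothetical; only the `.wh.` prefix and the `.wh.__dir_opaque` marker name come from the text above.

```python
WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh.__dir_opaque"

def readdir(branches, path):
    """Merge the entries of `path` across branches, highest priority first."""
    visible, hidden = [], set()
    for tree in branches:
        entries = tree.get(path)
        if entries is None:
            continue
        for name in sorted(entries):
            if name.startswith(WHITEOUT_PREFIX):
                continue                      # whiteouts are never listed
            if name not in hidden and name not in visible:
                visible.append(name)
        # Whiteouts in this branch hide names in lower-priority branches.
        hidden.update(n[len(WHITEOUT_PREFIX):] for n in entries
                      if n.startswith(WHITEOUT_PREFIX))
        if OPAQUE_MARKER in entries:
            break                             # stop merging at an opaque dir
    return visible

branch0 = {"/a": {OPAQUE_MARKER}}   # the new, opaque directory /a
branch1 = {"/a": {"f"}}             # the old /a/f on the read-only branch
print(readdir([branch0, branch1], "/a"))   # []: /a/f does not re-appear
```

Dropping the marker from `branch0` in this model makes `f` show up again, which is the problem the opaque directory is meant to prevent.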
124+
125+
126+Duplicate Elimination:
127+======================
128+
129+It is possible for files on different branches to have the same name.
130+Unionfs then has to select which instance of the file to show to the user.
131+Given the fact that each branch has a priority associated with it, the
132+simplest solution is to take the instance from the highest priority
133+(numerically lowest value) and "hide" the others.
134+
135+
136+Unlinking:
137+==========
138+
139+Unlink operation on non-directory instances is optimized to remove the
140+maximum possible objects in case multiple underlying branches have the same
141+file name. The unlink operation will first try to delete file instances
142+from highest priority branch and then move further to delete from remaining
143+branches in order of their decreasing priority. Consider a case (F..D..F),
144+where F is a file and D is a directory of the same name; here, some
145+intermediate branch could have an empty directory instance with the same
146+name, so this operation also tries to delete this directory instance and
147+proceed further to delete from next possible lower priority branch. The
148+unionfs unlink operation will smoothly delete the files with same name from
149+all possible underlying branches. If an error occurs, it creates a
150+whiteout in the highest priority branch that hides the file instances in the
151+remaining branches. An error could occur either if an unlink operation in
152+any of the underlying branches failed or if a branch has no write permission.
153+
154+This unlinking policy is known as "delete all" and it has the benefit of
155+overall reducing the number of inodes used by duplicate files, and further
156+reducing the total number of inodes consumed by whiteouts. The cost is of
157+extra processing, but testing shows this extra processing is well worth the
158+savings.
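
The "delete all" policy can be sketched as follows. This is a simplified model under stated assumptions: the `Branch` class and `unlink` function are hypothetical, the only error modelled is a read-only branch, the walk stops at the first error, the (F..D..F) directory case is left out, and branch 0 is assumed writable.

```python
class Branch:
    """Hypothetical stand-in for one branch: a set of names plus a mode."""
    def __init__(self, names, writable):
        self.names = set(names)
        self.writable = writable

def unlink(branches, name):
    """Delete `name` from every branch if possible, else mask it.

    Returns True if every instance was physically removed, False if a
    whiteout had to be created instead.
    """
    failed = False
    for branch in branches:            # highest priority first
        if name not in branch.names:
            continue
        if not branch.writable:
            failed = True              # cannot delete on a read-only branch
            break
        branch.names.remove(name)
    if failed:
        # Hide the remaining instances with a whiteout in the top branch.
        branches[0].names.add(".wh." + name)
    return not failed

union = [Branch({"foo"}, True), Branch({"foo"}, True), Branch({"foo"}, False)]
print(unlink(union, "foo"))            # False: the last branch is read-only
print([sorted(b.names) for b in union])
```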
159+
160+
161+Copyup:
162+=======
163+
164+When a change is made to the contents of a file's data or meta-data, they
165+have to be stored somewhere. The best way is to create a copy of the
166+original file on a branch that is writable, and then redirect the write
167+through to this copy. The copy must be made on a higher priority branch so
168+that lookup and readdir return this newer "version" of the file rather than
169+the original (see duplicate elimination).
170+
171+An entire unionfs mount can be read-only or read-write. If it's read-only,
172+then none of the branches will be written to, even if some of the branches
173+are physically writeable. If the unionfs mount is read-write, then the
174+leftmost (highest priority) branch must be writeable (for copyup to take
175+place); the remaining branches can be any mix of read-write and read-only.
176+
177+In a writeable mount, unionfs will create new files/dir in the leftmost
178+branch. If one tries to modify a file in a read-only branch/media, unionfs
179+will copyup the file to the leftmost branch and modify it there. If you try
180+to modify a file from a writeable branch which is not the leftmost branch,
181+then unionfs will modify it in that branch; this is useful if you, say,
182+unify different packages (e.g., apache, sendmail, ftpd, etc.) and you want
183+changes to specific package files to remain logically in the directory where
184+they came from.
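
The write rules in this section (create in the leftmost branch, modify in place on a writable branch, copy up from a read-only branch) can be modelled as below. The `Branch` class and `write_file` function are hypothetical names for this sketch, and the leftmost branch is assumed writable, as the text requires for a read-write mount.

```python
class Branch:
    """Hypothetical stand-in for one branch: file contents plus a mode."""
    def __init__(self, files, writable):
        self.files = dict(files)       # name -> contents
        self.writable = writable

def write_file(branches, name, data):
    """Write `data` to `name` and return the index of the branch modified."""
    for index, branch in enumerate(branches):   # highest priority first
        if name in branch.files:
            break
    else:
        index = 0                      # new files go to the leftmost branch
    if not branches[index].writable:
        # Copyup: duplicate the original into the leftmost branch first.
        branches[0].files[name] = branches[index].files[name]
        index = 0
    branches[index].files[name] = data
    return index

union = [Branch({}, True), Branch({"a": "x"}, True), Branch({"b": "y"}, False)]
print(write_file(union, "a", "x2"))   # 1: modified in its own writable branch
print(write_file(union, "b", "y2"))   # 0: copied up, read-only copy untouched
print(write_file(union, "c", "z"))    # 0: created in the leftmost branch
```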
185+
186+Cache Coherency:
187+================
188+
189+Unionfs users often want to be able to modify files and directories directly
190+on the lower branches, and have those changes be visible at the Unionfs
191+level. This means that data (e.g., pages) and meta-data (dentries, inodes,
192+open files, etc.) have to be synchronized between the upper and lower
193+layers. In other words, the newest changes from a layer below have to be
194+propagated to the Unionfs layer above. If the two layers are not in sync, a
195+cache incoherency ensues, which could lead to application failures and even
196+oopses. The Linux kernel, however, has a rather limited set of mechanisms
197+to ensure this inter-layer cache coherency---so Unionfs has to do most of
198+the hard work on its own.
199+
200+Maintaining Invariants:
201+
202+The way Unionfs ensures cache coherency is as follows. At each entry point
203+to a Unionfs file system method, we call a utility function to validate the
204+primary objects of this method. Generally, we call unionfs_file_revalidate
205+on open files, and __unionfs_d_revalidate_chain on dentries (which also
206+validates inodes). These utility functions check to see whether the upper
207+Unionfs object is in sync with any of the lower objects that it represents.
208+The checks we perform include whether the Unionfs superblock has a newer
209+generation number, or if any of the lower objects' mtimes or ctimes are
210+newer. (Note: generation numbers change when branch-management commands are
211+issued, so in a way, maintaining cache coherency is also very important for
212+branch-management.) If indeed we determine that any Unionfs object is no
213+longer in sync with its lower counterparts, then we rebuild that object
214+similarly to how we do so for branch-management.
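
The two checks described above (a newer superblock generation, or a newer lower mtime/ctime) amount to a simple predicate. The sketch below is an assumption-laden model, not the code of unionfs_file_revalidate or __unionfs_d_revalidate_chain; the function name and parameters are invented for illustration.

```python
def needs_rebuild(obj_generation, sb_generation,
                  upper_mtime, upper_ctime, lower_times):
    """Decide whether an upper object is out of sync with its lower objects.

    `lower_times` is a list of (mtime, ctime) pairs, one per lower object.
    """
    if sb_generation > obj_generation:
        return True                    # branch management or incgen happened
    return any(mtime > upper_mtime or ctime > upper_ctime
               for mtime, ctime in lower_times)

print(needs_rebuild(3, 4, 100, 100, [(100, 100)]))  # True: stale generation
print(needs_rebuild(4, 4, 100, 100, [(100, 105)]))  # True: newer lower ctime
print(needs_rebuild(4, 4, 100, 100, [(90, 100)]))   # False: still in sync
```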
215+
216+While rebuilding Unionfs's objects, we also purge any page mappings and
217+truncate inode pages (see fs/unionfs/dentry.c:purge_inode_data). This is to
218+ensure that Unionfs will re-get the newer data from the lower branches. We
219+perform this purging only if the Unionfs operation in question is a reading
220+operation; if Unionfs is performing a data writing operation (e.g., ->write,
221+->commit_write, etc.) then we do NOT flush the lower mappings/pages: this is
222+because (1) a self-deadlock could occur and (2) the upper Unionfs pages are
223+considered more authoritative anyway, as they are newer and will overwrite
224+any lower pages.
225+
226+Unionfs maintains the following important invariant regarding mtime's,
227+ctime's, and atime's: the upper inode object's times are the max() of all of
228+the lower ones. For non-directory objects, there's only one object below,
229+so the mapping is simple; for directory objects, there could be multiple
230+lower objects and we have to sync up with the newest one of all the lower
231+ones. This invariant is important to maintain, especially for directories
232+(besides, we need this to be POSIX compliant). A union could comprise
233+multiple writable branches, each of which could change. If we don't reflect
234+the newest possible mtime/ctime, some applications could fail. For example,
235+NFSv2/v3 exports check for newer directory mtimes on the server to determine
236+if the client-side attribute cache should be purged.
237+
238+To maintain these important invariants, of course, Unionfs carefully
239+synchronizes upper and lower times in various places. For example, if we
240+copy-up a file to a top-level branch, the parent directory where the file
241+was copied up to will now have a new mtime: so after a successful copy-up,
242+we sync up with the new top-level branch's parent directory mtime.
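
The time invariant stated above is a field-wise maximum over the lower objects. A minimal sketch, using plain dictionaries as a hypothetical stand-in for inode timestamps:

```python
def upper_times(lower_times):
    """Return the upper inode's times as the max() of all lower ones.

    `lower_times` is a non-empty list of dicts with 'atime', 'mtime' and
    'ctime' keys (seconds), one per lower object.
    """
    return {field: max(times[field] for times in lower_times)
            for field in ("atime", "mtime", "ctime")}

# A directory backed by two branches: the upper view takes the newest values.
lower = [
    {"atime": 10, "mtime": 50, "ctime": 50},
    {"atime": 30, "mtime": 20, "ctime": 60},
]
print(upper_times(lower))   # {'atime': 30, 'mtime': 50, 'ctime': 60}
```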
243+
244+Implementation:
245+
246+This cache-coherency implementation is efficient because it defers any
247+synchronizing between the upper and lower layers until absolutely needed.
248+Consider the common situation where users perform a lot of lower
249+changes, such as untarring a whole package. While these take place,
250+typically the user doesn't access the files via Unionfs; only after the
251+lower changes are done, does the user try to access the lower files. With
252+our cache-coherency implementation, the entirety of the changes to the lower
253+branches will not result in a single CPU cycle spent at the Unionfs level
254+until the user invokes a system call that goes through Unionfs.
255+
256+We have considered two alternate cache-coherency designs. (1) Using the
257+dentry/inode notify functionality to register interest in finding out about
258+any lower changes. This is a somewhat limited and also a heavy-handed
259+approach which could result in many notifications to the Unionfs layer upon
260+each small change at the lower layer (imagine a file being modified multiple
261+times in rapid succession). (2) Rewriting the VFS to support explicit
262+callbacks from lower objects to upper objects. We began exploring such an
263+implementation, but found it to be very complicated--it would have resulted
264+in massive VFS/MM changes which are unlikely to be accepted by the LKML
265+community. We therefore believe that our current cache-coherency design and
266+implementation represent the best approach at this time.
267+
268+Limitations:
269+
270+Our implementation works as long as a user process causes Unionfs to be
271+called, directly or indirectly, even if only to perform ->d_revalidate:
272+at that point we will have purged the current Unionfs data and the
273+process will see the new data. For example, a process that continually
274+re-reads the same file's data will see the NEW data as soon as the lower
275+file had changed, upon the next read(2) syscall (even if the file is still
276+open!) However, this doesn't work when the process re-reads the open file's
277+data via mmap(2) (unless the user unmaps/closes the file and remaps/reopens
278+it). Once we respond to ->readpage(s), then the kernel maps the page into
279+the process's address space and there doesn't appear to be a way to force
280+the kernel to invalidate those pages/mappings, and force the process to
281+re-issue ->readpage. If there's a way to invalidate active mappings and
282+force a ->readpage, let us know please (invalidate_inode_pages2 doesn't do
283+the trick).
284+
285+Our current Unionfs code has to perform many file-revalidation calls. It
286+would be really nice if the VFS would export an optional file system hook
287+->file_revalidate (similarly to dentry->d_revalidate) that will be called
288+before each VFS op that has a "struct file" in it.
289+
290+Certain file systems have micro-second granularity (or better) for inode
291+times, and asynchronous actions could cause those times to change with some
292+small delay. In such cases, Unionfs may see a changed inode time that only
293+differs by a tiny fraction of a second: such a change may be a false
294+positive indication that the lower object has changed, whereas if unionfs
295+waits a little longer, that false indication will not be seen. (These false
296+positives are harmless, because they would at most cause unionfs to
297+re-validate an object that may need no revalidation, and print a debugging
298+message that clutters the console/logs.) Therefore, to minimize the chances
299+of these situations, we delay the detection of changed times by a small
300+factor of a few seconds, called UNIONFS_MIN_CC_TIME (which defaults to 3
301+seconds, as does NFS). This means that we will detect the change, only a
302+couple of seconds later, if indeed the time change persists in the lower
303+file object. This delayed detection has an added performance benefit: we
304+reduce the number of times that unionfs has to revalidate objects, in case
305+there's a lot of concurrent activity on both the upper and lower objects,
306+for the same file(s). Lastly, this delayed time attribute detection is
307+similar to how NFS clients operate (e.g., acregmin).
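
One plausible reading of the delayed detection described above is: treat a newer lower timestamp as a real change only once that timestamp is at least UNIONFS_MIN_CC_TIME seconds old. The sketch below encodes that reading; it is an assumption about the behaviour, not a transcription of the kernel code, and the function name is hypothetical.

```python
UNIONFS_MIN_CC_TIME = 3   # seconds; the default named in the text

def lower_change_detected(upper_mtime, lower_mtime, now):
    """Report a lower change only after it has persisted for a while."""
    if lower_mtime <= upper_mtime:
        return False                   # nothing newer below
    # Ignore very recent changes; they will be picked up on a later check.
    return now - lower_mtime >= UNIONFS_MIN_CC_TIME

print(lower_change_detected(100, 105, now=106))  # False: too recent
print(lower_change_detected(100, 105, now=108))  # True: change has persisted
```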
308+
309+Finally, there is no way currently in Linux to prevent lower directories
310+from being moved around (i.e., topology changes); there's no way to prevent
311+modifications to directory sub-trees of whole file systems which are mounted
312+read-write. It is therefore possible for in-flight operations in unionfs to
313+take place, while a lower directory is being moved around. Therefore, if
314+you try to, say, create a new file in a directory through unionfs, while the
315+directory is being moved around directly, then the new file may get created
316+in the new location where that directory was moved to. This is somewhat
317+similar to the behaviour of NFS: an NFS client could be creating a new file while
318+the NFS server is moving the directory around; the file will get successfully
319+created in the new location. (The one exception in unionfs is that if the
320+branch is marked read-only by unionfs, then a copyup will take place.)
321+
322+For more information, see <http://unionfs.filesystems.org/>.
323diff --git a/Documentation/filesystems/unionfs/issues.txt b/Documentation/filesystems/unionfs/issues.txt
324new file mode 100644
325index 0000000..f4b7e7e
326--- /dev/null
327+++ b/Documentation/filesystems/unionfs/issues.txt
328@@ -0,0 +1,28 @@
329+KNOWN Unionfs 2.x ISSUES:
330+=========================
331+
332+1. Unionfs should not use lookup_one_len() on the underlying f/s as it
333+ confuses NFSv4. Currently, unionfs_lookup() passes lookup intents to the
334+ lower file-system; this eliminates part of the problem. The remaining
335+ calls to lookup_one_len may need to be changed to pass an intent. We are
336+ currently introducing VFS changes to fs/namei.c's do_path_lookup() to
337+ allow proper file lookup and opening in stackable file systems.
338+
339+2. Lockdep (a debugging feature) isn't aware of stacking, and so it
340+ incorrectly complains about locking problems. The problem boils down to
341+ this: Lockdep considers all objects of a certain type to be in the same
342+ class, for example, all inodes. Lockdep doesn't like to see a lock held
343+ on two inodes within the same task, and warns that it could lead to a
344+ deadlock. However, stackable file systems do precisely that: they lock
345+ an upper object, and then a lower object, in a strict order to avoid
346+ locking problems; in addition, Unionfs, as a fan-out file system, may
347+ have to lock several lower inodes. We are currently looking into Lockdep
348+ to see how to make it aware of stackable file systems. For now, we
349+ temporarily disable lockdep when calling vfs methods on lower objects,
350+ but only for those places where lockdep complained. While this solution
351+ may seem unclean, it is not without precedent: other places in the kernel
352+ also do similar temporary disabling, of course after carefully having
353+ checked that it is the right thing to do. Anyway, if you get any warnings
354+ from Lockdep, please report them to the Unionfs maintainers.
355+
356+For more information, see <http://unionfs.filesystems.org/>.
357diff --git a/Documentation/filesystems/unionfs/rename.txt b/Documentation/filesystems/unionfs/rename.txt
358new file mode 100644
359index 0000000..e20bb82
360--- /dev/null
361+++ b/Documentation/filesystems/unionfs/rename.txt
362@@ -0,0 +1,31 @@
363+Rename is a complex beast. The following table shows which rename(2) operations
364+should succeed and which should fail.
365+
366+o: success
367+E: error (either unionfs or vfs)
368+X: EXDEV
369+
370+none = file does not exist
371+file = file is a file
372+dir = file is an empty directory
373+child= file is a non-empty directory
374+wh = file is a directory containing only whiteouts; this makes it logically
375+ empty
376+
377+       none  file  dir  child  wh
378+file   o     o     E    E      E
379+dir    o     E     o    E      o
380+child  X     E     X    E      X
381+wh     o     E     o    E      o
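
For quick reference, the table can be encoded as a nested lookup. The text does not say which axis is the source and which is the destination; the sketch below assumes rows are the source and columns are the destination, and the names `RENAME_OUTCOME` and `rename_outcome` are made up for the example.

```python
# Outcome codes as in the table: "o" success, "E" error, "X" EXDEV.
# Assumption: outer key = source type (row), inner key = destination (column).
RENAME_OUTCOME = {
    "file":  {"none": "o", "file": "o", "dir": "E", "child": "E", "wh": "E"},
    "dir":   {"none": "o", "file": "E", "dir": "o", "child": "E", "wh": "o"},
    "child": {"none": "X", "file": "E", "dir": "X", "child": "E", "wh": "X"},
    "wh":    {"none": "o", "file": "E", "dir": "o", "child": "E", "wh": "o"},
}

def rename_outcome(source, destination):
    """Look up the expected result of rename(2) for the given object types."""
    return RENAME_OUTCOME[source][destination]

print(rename_outcome("file", "file"))    # o
print(rename_outcome("child", "none"))   # X
```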
382+
383+
384+Renaming directories:
385+=====================
386+
387+Whenever an empty (either physically or logically) directory is being renamed,
388+the following sequence of events should take place:
389+
390+1) Remove whiteouts from both source and destination directory
391+2) Rename source to destination
392+3) Make destination opaque to prevent anything under it from showing up
393+
394diff --git a/Documentation/filesystems/unionfs/usage.txt b/Documentation/filesystems/unionfs/usage.txt
395new file mode 100644
396index 0000000..1adde69
397--- /dev/null
398+++ b/Documentation/filesystems/unionfs/usage.txt
399@@ -0,0 +1,134 @@
400+Unionfs is a stackable unification file system, which can appear to merge
401+the contents of several directories (branches), while keeping their physical
402+content separate. Unionfs is useful for unified source tree management,
403+merged contents of split CD-ROM, merged separate software package
404+directories, data grids, and more. Unionfs allows any mix of read-only and
405+read-write branches, as well as insertion and deletion of branches anywhere
406+in the fan-out. To maintain Unix semantics, Unionfs handles elimination of
407+duplicates, partial-error conditions, and more.
408+
409+GENERAL SYNTAX
410+==============
411+
412+# mount -t unionfs -o <OPTIONS>,<BRANCH-OPTIONS> none MOUNTPOINT
413+
414+OPTIONS can be any legal combination of:
415+
416+- ro # mount file system read-only
417+- rw # mount file system read-write
418+- remount # remount the file system (see Branch Management below)
419+- incgen # increment generation no. (see Cache Consistency below)
420+
421+BRANCH-OPTIONS can be either (1) a list of branches given to the "dirs="
422+option, or (2) a list of individual branch manipulation commands, combined
423+with the "remount" option, and is further described in the "Branch
424+Management" section below.
425+
426+The syntax for the "dirs=" mount option is:
427+
428+ dirs=branch[=ro|=rw][:...]
429+
430+The "dirs=" option takes a colon-delimited list of directories to compose
431+the union, with an optional branch mode for each of those directories.
432+Directories that come earlier (specified first, on the left) in the list
433+have a higher precedence than those which come later. Additionally,
434+read-only or read-write permissions of the branch can be specified by
435+appending =ro or =rw (default) to each directory. See the Copyup section in
436+concepts.txt, for a description of Unionfs's behavior when mixing read-only
437+and read-write branches and mounts.
438+
439+Syntax:
440+
441+ dirs=/branch1[=ro|=rw]:/branch2[=ro|=rw]:...:/branchN[=ro|=rw]
442+
443+Example:
444+
445+ dirs=/writable_branch=rw:/read-only_branch=ro
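
To make the "dirs=" syntax concrete, here is a small userspace parser for it. This is a sketch for illustration only: `parse_dirs` is a hypothetical helper, not part of Unionfs or mount(8), and it assumes branch paths contain neither ':' nor '='.

```python
def parse_dirs(option):
    """Parse 'dirs=branch[=ro|=rw][:...]' into a list of (path, mode) pairs.

    The list is ordered left to right, i.e. highest priority first.
    """
    prefix = "dirs="
    if not option.startswith(prefix):
        raise ValueError("expected an option starting with 'dirs='")
    branches = []
    for item in option[len(prefix):].split(":"):
        path, sep, mode = item.partition("=")
        if not sep:
            mode = "rw"                # =rw is the documented default
        if not path or mode not in ("ro", "rw"):
            raise ValueError("bad branch specification: %r" % item)
        branches.append((path, mode))
    return branches

print(parse_dirs("dirs=/writable_branch=rw:/read-only_branch=ro"))
# [('/writable_branch', 'rw'), ('/read-only_branch', 'ro')]
```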
446+
447+
448+BRANCH MANAGEMENT
449+=================
450+
451+Once you mount your union for the first time, using the "dirs=" option, you
452+can then change the union's overall mode or reconfigure the branches, using
453+the remount option, as follows.
454+
455+To downgrade a union from read-write to read-only:
456+
457+# mount -t unionfs -o remount,ro none MOUNTPOINT
458+
459+To upgrade a union from read-only to read-write:
460+
461+# mount -t unionfs -o remount,rw none MOUNTPOINT
462+
463+To delete a branch /foo, regardless of where it is in the current union:
464+
465+# mount -t unionfs -o remount,del=/foo none MOUNTPOINT
466+
467+To insert (add) a branch /foo before /bar:
468+
469+# mount -t unionfs -o remount,add=/bar:/foo none MOUNTPOINT
470+
471+To insert (add) a branch /foo (with the "rw" mode flag) before /bar:
472+
473+# mount -t unionfs -o remount,add=/bar:/foo=rw none MOUNTPOINT
474+
475+To insert (add) a branch /foo (in "rw" mode) at the very beginning (i.e., a
476+new highest-priority branch), you can use the above syntax, or use a
477+shorthand version as follows:
478+
479+# mount -t unionfs -o remount,add=/foo none MOUNTPOINT
480+
481+To append a branch to the very end (new lowest-priority branch):
482+
483+# mount -t unionfs -o remount,add=:/foo none MOUNTPOINT
484+
485+To append a branch to the very end (new lowest-priority branch), in
486+read-only mode:
487+
488+# mount -t unionfs -o remount,add=:/foo=ro none MOUNTPOINT
489+
490+Finally, to change the mode of one existing branch, say /foo, from read-only
491+to read-write, and change /bar from read-write to read-only:
492+
493+# mount -t unionfs -o remount,mode=/foo=rw,mode=/bar=ro none MOUNTPOINT
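
The effect of the "add=" and "del=" commands on branch order can be modelled with an ordinary list, highest priority first. This is a hedged sketch of the semantics described above; `apply_add` and `apply_del` are hypothetical helpers, not Unionfs functions.

```python
def apply_add(branches, spec):
    """Apply the text after 'add=' to `branches`, a list of (path, mode)."""
    before, sep, new = spec.partition(":")
    if not sep:                        # add=/foo      -> new highest priority
        new, position = before, 0
    elif before == "":                 # add=:/foo     -> new lowest priority
        position = len(branches)
    else:                              # add=/bar:/foo -> insert before /bar
        position = [path for path, _ in branches].index(before)
    path, eq, mode = new.partition("=")
    mode = mode if eq else "rw"        # "rw" is the default mode
    if mode not in ("ro", "rw"):
        raise ValueError("bad branch mode: %r" % mode)
    branches.insert(position, (path, mode))

def apply_del(branches, path):
    """Apply 'del=path': drop the branch wherever it is in the union."""
    branches[:] = [branch for branch in branches if branch[0] != path]

union = [("/bar", "rw"), ("/baz", "ro")]
apply_add(union, "/bar:/foo")          # /foo goes in front of /bar
apply_add(union, ":/end=ro")           # appended, read-only
apply_add(union, "/top")               # new leftmost branch
apply_del(union, "/foo")
print(union)
# [('/top', 'rw'), ('/bar', 'rw'), ('/baz', 'ro'), ('/end', 'ro')]
```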
494+
495+Note: in Unionfs 2.x, you cannot set the leftmost branch to readonly because
496+then Unionfs won't have any writable place for copyups to take place.
497+Moreover, the VFS can get confused when it tries to modify something in a
498+file system mounted read-write, but isn't permitted to write to it.
499+Instead, you should set the whole union as readonly, as described above.
500+If, however, you must set the leftmost branch as readonly, perhaps so you
501+can get a snapshot of it at a point in time, then you should insert a new
502+writable top-level branch, and mark the one you want as readonly. This can
503+be accomplished as follows, assuming that /foo is your current leftmost
504+branch:
505+
506+# mount -t tmpfs -o size=NNN tmpfs /new
507+# mount -t unionfs -o remount,add=/new,mode=/foo=ro none MOUNTPOINT
508+<do what you want safely in /foo>
509+# mount -t unionfs -o remount,del=/new,mode=/foo=rw none MOUNTPOINT
510+<check if there's anything in /new you want to preserve>
511+# umount /new
512+
513+CACHE CONSISTENCY
514+=================
515+
516+If you modify any file on any of the lower branches directly, while there is
517+a Unionfs 2.x mounted above any of those branches, you should tell Unionfs
518+to purge its caches and re-get the objects. To do that, you have to
519+increment the generation number of the superblock using the following
520+command:
521+
522+# mount -t unionfs -o remount,incgen none MOUNTPOINT
523+
524+Note that the older way of incrementing the generation number using an
525+ioctl, is no longer supported in Unionfs 2.0 and newer. Ioctls in general
526+are not encouraged. Plus, an ioctl is a per-file concept, whereas the
527+generation number is a per-file-system concept. Worse, such an ioctl
528+requires an open file, which then has to be invalidated by the very nature
529+of the generation number increase (read: the old generation increase ioctl
530+was pretty racy).
531+
532+
533+For more information, see <http://unionfs.filesystems.org/>.
534diff --git a/MAINTAINERS b/MAINTAINERS
535index 560ecce..09e38d6 100644
536--- a/MAINTAINERS
537+++ b/MAINTAINERS
538@@ -6276,6 +6276,14 @@ F: Documentation/cdrom/
539 F: drivers/cdrom/cdrom.c
540 F: include/linux/cdrom.h
541
542+UNIONFS
543+P: Erez Zadok
544+M: ezk@cs.sunysb.edu
545+L: unionfs@filesystems.org
546+W: http://unionfs.filesystems.org/
547+T: git git.kernel.org/pub/scm/linux/kernel/git/ezk/unionfs.git
548+S: Maintained
549+
550 UNSORTED BLOCK IMAGES (UBI)
551 M: Artem Bityutskiy <dedekind1@gmail.com>
552 W: http://www.linux-mtd.infradead.org/
553diff --git a/fs/Kconfig b/fs/Kconfig
554index 3db9caa..3dc2dfd 100644
555--- a/fs/Kconfig
556+++ b/fs/Kconfig
557@@ -170,6 +170,7 @@ if MISC_FILESYSTEMS
558 source "fs/adfs/Kconfig"
559 source "fs/affs/Kconfig"
560 source "fs/ecryptfs/Kconfig"
561+source "fs/unionfs/Kconfig"
562 source "fs/hfs/Kconfig"
563 source "fs/hfsplus/Kconfig"
564 source "fs/befs/Kconfig"
565diff --git a/fs/Makefile b/fs/Makefile
566index a7f7cef..672664b 100644
567--- a/fs/Makefile
568+++ b/fs/Makefile
569@@ -81,6 +81,7 @@ obj-$(CONFIG_ISO9660_FS) += isofs/
570 obj-$(CONFIG_HFSPLUS_FS) += hfsplus/ # Before hfs to find wrapped HFS+
571 obj-$(CONFIG_HFS_FS) += hfs/
572 obj-$(CONFIG_ECRYPT_FS) += ecryptfs/
573+obj-$(CONFIG_UNION_FS) += unionfs/
574 obj-$(CONFIG_VXFS_FS) += freevxfs/
575 obj-$(CONFIG_NFS_FS) += nfs/
576 obj-$(CONFIG_EXPORTFS) += exportfs/
577diff --git a/fs/namei.c b/fs/namei.c
578index 0087cf9..d3118a7 100644
579--- a/fs/namei.c
580+++ b/fs/namei.c
581@@ -562,6 +562,7 @@ void release_open_intent(struct nameidata *nd)
582 fput(file);
583 }
584 }
585+EXPORT_SYMBOL_GPL(release_open_intent);
586
587 static inline int d_revalidate(struct dentry *dentry, struct nameidata *nd)
588 {
589diff --git a/fs/splice.c b/fs/splice.c
590index 50a5d97..a3af841 100644
591--- a/fs/splice.c
592+++ b/fs/splice.c
593@@ -1081,8 +1081,8 @@ EXPORT_SYMBOL(generic_splice_sendpage);
594 /*
595 * Attempt to initiate a splice from pipe to file.
596 */
597-static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
598- loff_t *ppos, size_t len, unsigned int flags)
599+long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
600+ loff_t *ppos, size_t len, unsigned int flags)
601 {
602 ssize_t (*splice_write)(struct pipe_inode_info *, struct file *,
603 loff_t *, size_t, unsigned int);
604@@ -1105,13 +1105,14 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
605
606 return splice_write(pipe, out, ppos, len, flags);
607 }
608+EXPORT_SYMBOL_GPL(vfs_splice_from);
609
610 /*
611 * Attempt to initiate a splice from a file to a pipe.
612 */
613-static long do_splice_to(struct file *in, loff_t *ppos,
614- struct pipe_inode_info *pipe, size_t len,
615- unsigned int flags)
616+long vfs_splice_to(struct file *in, loff_t *ppos,
617+ struct pipe_inode_info *pipe, size_t len,
618+ unsigned int flags)
619 {
620 ssize_t (*splice_read)(struct file *, loff_t *,
621 struct pipe_inode_info *, size_t, unsigned int);
622@@ -1131,6 +1132,7 @@ static long do_splice_to(struct file *in, loff_t *ppos,
623
624 return splice_read(in, ppos, pipe, len, flags);
625 }
626+EXPORT_SYMBOL_GPL(vfs_splice_to);
627
628 /**
629 * splice_direct_to_actor - splices data directly between two non-pipes
630@@ -1200,7 +1202,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
631 size_t read_len;
632 loff_t pos = sd->pos, prev_pos = pos;
633
634- ret = do_splice_to(in, &pos, pipe, len, flags);
635+ ret = vfs_splice_to(in, &pos, pipe, len, flags);
636 if (unlikely(ret <= 0))
637 goto out_release;
638
639@@ -1259,8 +1261,8 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
640 {
641 struct file *file = sd->u.file;
642
643- return do_splice_from(pipe, file, &file->f_pos, sd->total_len,
644- sd->flags);
645+ return vfs_splice_from(pipe, file, &file->f_pos, sd->total_len,
646+ sd->flags);
647 }
648
649 /**
650@@ -1345,7 +1347,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
651 } else
652 off = &out->f_pos;
653
654- ret = do_splice_from(ipipe, out, off, len, flags);
655+ ret = vfs_splice_from(ipipe, out, off, len, flags);
656
657 if (off_out && copy_to_user(off_out, off, sizeof(loff_t)))
658 ret = -EFAULT;
659@@ -1365,7 +1367,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
660 } else
661 off = &in->f_pos;
662
663- ret = do_splice_to(in, off, opipe, len, flags);
664+ ret = vfs_splice_to(in, off, opipe, len, flags);
665
666 if (off_in && copy_to_user(off_in, off, sizeof(loff_t)))
667 ret = -EFAULT;
668diff --git a/fs/stack.c b/fs/stack.c
669index 4a6f7f4..7eeef12 100644
670--- a/fs/stack.c
671+++ b/fs/stack.c
672@@ -1,8 +1,20 @@
673+/*
674+ * Copyright (c) 2006-2009 Erez Zadok
675+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
676+ * Copyright (c) 2006-2009 Stony Brook University
677+ * Copyright (c) 2006-2009 The Research Foundation of SUNY
678+ *
679+ * This program is free software; you can redistribute it and/or modify
680+ * it under the terms of the GNU General Public License version 2 as
681+ * published by the Free Software Foundation.
682+ */
683+
684 #include <linux/module.h>
685 #include <linux/fs.h>
686 #include <linux/fs_stack.h>
687
688-/* does _NOT_ require i_mutex to be held.
689+/*
690+ * does _NOT_ require i_mutex to be held.
691 *
692 * This function cannot be inlined since i_size_{read,write} is rather
693 * heavy-weight on 32-bit systems
0c5527e5
AM
694diff --git a/fs/unionfs/Kconfig b/fs/unionfs/Kconfig
695new file mode 100644
696index 0000000..f3c1ac4
697--- /dev/null
698+++ b/fs/unionfs/Kconfig
699@@ -0,0 +1,24 @@
700+config UNION_FS
701+ tristate "Union file system (EXPERIMENTAL)"
702+ depends on EXPERIMENTAL
703+ help
704+ Unionfs is a stackable unification file system, which appears to
705+ merge the contents of several directories (branches), while keeping
706+ their physical content separate.
707+
708+ See <http://unionfs.filesystems.org> for details
709+
710+config UNION_FS_XATTR
711+ bool "Unionfs extended attributes"
712+ depends on UNION_FS
713+ help
714+ Extended attributes are name:value pairs associated with inodes by
715+ the kernel or by users (see the attr(5) manual page).
716+
717+ If unsure, say N.
718+
719+config UNION_FS_DEBUG
720+ bool "Debug Unionfs"
721+ depends on UNION_FS
722+ help
723+ If you say Y here, you can turn on debugging output from Unionfs.
724diff --git a/fs/unionfs/Makefile b/fs/unionfs/Makefile
725new file mode 100644
82260373 726index 0000000..10a321a
0c5527e5
AM
727--- /dev/null
728+++ b/fs/unionfs/Makefile
729@@ -0,0 +1,17 @@
82260373 730+UNIONFS_VERSION="2.5.8 (for 2.6.38-rc7)"
0c5527e5
AM
731+
732+EXTRA_CFLAGS += -DUNIONFS_VERSION=\"$(UNIONFS_VERSION)\"
733+
734+obj-$(CONFIG_UNION_FS) += unionfs.o
735+
736+unionfs-y := subr.o dentry.o file.o inode.o main.o super.o \
737+ rdstate.o copyup.o dirhelper.o rename.o unlink.o \
738+ lookup.o commonfops.o dirfops.o sioq.o mmap.o whiteout.o
739+
740+unionfs-$(CONFIG_UNION_FS_XATTR) += xattr.o
741+
742+unionfs-$(CONFIG_UNION_FS_DEBUG) += debug.o
743+
744+ifeq ($(CONFIG_UNION_FS_DEBUG),y)
745+EXTRA_CFLAGS += -DDEBUG
746+endif
747diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
748new file mode 100644
749index 0000000..51ea65e
750--- /dev/null
751+++ b/fs/unionfs/commonfops.c
4ae1df7a 752@@ -0,0 +1,896 @@
2380c486 753+/*
7670a7fc 754+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
755+ * Copyright (c) 2003-2006 Charles P. Wright
756+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
757+ * Copyright (c) 2005-2006 Junjiro Okajima
758+ * Copyright (c) 2005 Arun M. Krishnakumar
759+ * Copyright (c) 2004-2006 David P. Quigley
760+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
761+ * Copyright (c) 2003 Puja Gupta
762+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
763+ * Copyright (c) 2003-2010 Stony Brook University
764+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
765+ *
766+ * This program is free software; you can redistribute it and/or modify
767+ * it under the terms of the GNU General Public License version 2 as
768+ * published by the Free Software Foundation.
769+ */
770+
771+#include "union.h"
772+
773+/*
774+ * 1) Copyup the file
775+ * 2) Rename the file to '.unionfs<original inode#><counter>' - obviously
776+ * stolen from NFS's silly rename
777+ */
778+static int copyup_deleted_file(struct file *file, struct dentry *dentry,
779+ struct dentry *parent, int bstart, int bindex)
780+{
781+ static unsigned int counter;
782+ const int i_inosize = sizeof(dentry->d_inode->i_ino) * 2;
783+ const int countersize = sizeof(counter) * 2;
784+ const int nlen = sizeof(".unionfs") + i_inosize + countersize - 1;
785+ char name[nlen + 1];
786+ int err;
787+ struct dentry *tmp_dentry = NULL;
788+ struct dentry *lower_dentry;
789+ struct dentry *lower_dir_dentry = NULL;
790+
791+ lower_dentry = unionfs_lower_dentry_idx(dentry, bstart);
792+
793+ sprintf(name, ".unionfs%*.*lx",
794+ i_inosize, i_inosize, lower_dentry->d_inode->i_ino);
795+
796+ /*
797+ * Loop, looking for an unused temp name to copyup to.
798+ *
799+ * It's somewhat silly that we look for a free temp name in the
800+ * source branch (bstart) instead of the dest branch (bindex), where
801+ * the final name will be created. We _will_ catch it if somehow
802+ * the name exists in the dest branch, but it'd be nice to catch it
803+ * sooner rather than later.
804+ */
805+retry:
806+ tmp_dentry = NULL;
807+ do {
808+ char *suffix = name + nlen - countersize;
809+
810+ dput(tmp_dentry);
811+ counter++;
812+ sprintf(suffix, "%*.*x", countersize, countersize, counter);
813+
814+ pr_debug("unionfs: trying to rename %s to %s\n",
815+ dentry->d_name.name, name);
816+
4ae1df7a 817+ tmp_dentry = lookup_lck_len(name, lower_dentry->d_parent,
2380c486
JR
818+ nlen);
819+ if (IS_ERR(tmp_dentry)) {
820+ err = PTR_ERR(tmp_dentry);
821+ goto out;
822+ }
823+ } while (tmp_dentry->d_inode != NULL); /* need negative dentry */
824+ dput(tmp_dentry);
825+
826+ err = copyup_named_file(parent->d_inode, file, name, bstart, bindex,
827+ i_size_read(file->f_path.dentry->d_inode));
828+ if (err) {
829+ if (unlikely(err == -EEXIST))
830+ goto retry;
831+ goto out;
832+ }
833+
834+ /* bring it to the same state as an unlinked file */
835+ lower_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry));
836+ if (!unionfs_lower_inode_idx(dentry->d_inode, bindex)) {
837+ atomic_inc(&lower_dentry->d_inode->i_count);
838+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
839+ lower_dentry->d_inode);
840+ }
841+ lower_dir_dentry = lock_parent(lower_dentry);
842+ err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
843+ unlock_dir(lower_dir_dentry);
844+
845+out:
846+ if (!err)
847+ unionfs_check_dentry(dentry);
848+ return err;
849+}
850+
851+/*
852+ * put all references held by upper struct file and free lower file pointer
853+ * array
854+ */
855+static void cleanup_file(struct file *file)
856+{
857+ int bindex, bstart, bend;
858+ struct file **lower_files;
859+ struct file *lower_file;
860+ struct super_block *sb = file->f_path.dentry->d_sb;
861+
862+ lower_files = UNIONFS_F(file)->lower_files;
863+ bstart = fbstart(file);
864+ bend = fbend(file);
865+
866+ for (bindex = bstart; bindex <= bend; bindex++) {
867+ int i; /* holds (possibly) updated branch index */
868+ int old_bid;
869+
870+ lower_file = unionfs_lower_file_idx(file, bindex);
871+ if (!lower_file)
872+ continue;
873+
874+ /*
875+ * Find new index of matching branch with an open
876+ * file, since branches could have been added or
877+ * deleted causing the one with open files to shift.
878+ */
879+ old_bid = UNIONFS_F(file)->saved_branch_ids[bindex];
880+ i = branch_id_to_idx(sb, old_bid);
881+ if (unlikely(i < 0)) {
882+ printk(KERN_ERR "unionfs: no superblock for "
883+ "file %p\n", file);
884+ continue;
885+ }
886+
887+ /* decrement count of open files */
888+ branchput(sb, i);
889+ /*
890+ * fput will perform an mntput for us on the correct branch.
891+ * Although we're using the file's old branch configuration,
892+ * bindex, which is the old index, correctly points to the
893+ * right branch in the file's branch list. In other words,
894+ * we're going to mntput the correct branch even if branches
895+ * have been added/removed.
896+ */
897+ fput(lower_file);
898+ UNIONFS_F(file)->lower_files[bindex] = NULL;
899+ UNIONFS_F(file)->saved_branch_ids[bindex] = -1;
900+ }
901+
902+ UNIONFS_F(file)->lower_files = NULL;
903+ kfree(lower_files);
904+ kfree(UNIONFS_F(file)->saved_branch_ids);
905+ /* set to NULL so the caller knows whether to kfree on error */
906+ UNIONFS_F(file)->saved_branch_ids = NULL;
907+}
908+
909+/* open all lower files for a given file */
910+static int open_all_files(struct file *file)
911+{
912+ int bindex, bstart, bend, err = 0;
913+ struct file *lower_file;
914+ struct dentry *lower_dentry;
915+ struct dentry *dentry = file->f_path.dentry;
916+ struct super_block *sb = dentry->d_sb;
917+
918+ bstart = dbstart(dentry);
919+ bend = dbend(dentry);
920+
921+ for (bindex = bstart; bindex <= bend; bindex++) {
922+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
923+ if (!lower_dentry)
924+ continue;
925+
926+ dget(lower_dentry);
927+ unionfs_mntget(dentry, bindex);
928+ branchget(sb, bindex);
929+
930+ lower_file =
931+ dentry_open(lower_dentry,
932+ unionfs_lower_mnt_idx(dentry, bindex),
933+ file->f_flags, current_cred());
934+ if (IS_ERR(lower_file)) {
935+ branchput(sb, bindex);
936+ err = PTR_ERR(lower_file);
937+ goto out;
938+ } else {
939+ unionfs_set_lower_file_idx(file, bindex, lower_file);
940+ }
941+ }
942+out:
943+ return err;
944+}
945+
946+/* open the highest priority file for a given upper file */
947+static int open_highest_file(struct file *file, bool willwrite)
948+{
949+ int bindex, bstart, bend, err = 0;
950+ struct file *lower_file;
951+ struct dentry *lower_dentry;
952+ struct dentry *dentry = file->f_path.dentry;
953+ struct dentry *parent = dget_parent(dentry);
954+ struct inode *parent_inode = parent->d_inode;
955+ struct super_block *sb = dentry->d_sb;
956+
957+ bstart = dbstart(dentry);
958+ bend = dbend(dentry);
959+
960+ lower_dentry = unionfs_lower_dentry(dentry);
961+ if (willwrite && IS_WRITE_FLAG(file->f_flags) && is_robranch(dentry)) {
962+ for (bindex = bstart - 1; bindex >= 0; bindex--) {
963+ err = copyup_file(parent_inode, file, bstart, bindex,
964+ i_size_read(dentry->d_inode));
965+ if (!err)
966+ break;
967+ }
968+ atomic_set(&UNIONFS_F(file)->generation,
969+ atomic_read(&UNIONFS_I(dentry->d_inode)->
970+ generation));
971+ goto out;
972+ }
973+
974+ dget(lower_dentry);
975+ unionfs_mntget(dentry, bstart);
976+ lower_file = dentry_open(lower_dentry,
977+ unionfs_lower_mnt_idx(dentry, bstart),
978+ file->f_flags, current_cred());
979+ if (IS_ERR(lower_file)) {
980+ err = PTR_ERR(lower_file);
981+ goto out;
982+ }
983+ branchget(sb, bstart);
984+ unionfs_set_lower_file(file, lower_file);
985+ /* Fix up the position. */
986+ lower_file->f_pos = file->f_pos;
987+
988+ memcpy(&lower_file->f_ra, &file->f_ra, sizeof(struct file_ra_state));
989+out:
990+ dput(parent);
991+ return err;
992+}
993+
994+/* perform a delayed copyup of a read-write file on a read-only branch */
995+static int do_delayed_copyup(struct file *file, struct dentry *parent)
996+{
997+ int bindex, bstart, bend, err = 0;
998+ struct dentry *dentry = file->f_path.dentry;
999+ struct inode *parent_inode = parent->d_inode;
1000+
1001+ bstart = fbstart(file);
1002+ bend = fbend(file);
1003+
1004+ BUG_ON(!S_ISREG(dentry->d_inode->i_mode));
1005+
1006+ unionfs_check_file(file);
1007+ for (bindex = bstart - 1; bindex >= 0; bindex--) {
1008+ if (!d_deleted(dentry))
1009+ err = copyup_file(parent_inode, file, bstart,
1010+ bindex,
1011+ i_size_read(dentry->d_inode));
1012+ else
1013+ err = copyup_deleted_file(file, dentry, parent,
1014+ bstart, bindex);
1015+ /* if succeeded, set lower open-file flags and break */
1016+ if (!err) {
1017+ struct file *lower_file;
1018+ lower_file = unionfs_lower_file_idx(file, bindex);
1019+ lower_file->f_flags = file->f_flags;
1020+ break;
1021+ }
1022+ }
1023+ if (err || (bstart <= fbstart(file)))
1024+ goto out;
1025+ bend = fbend(file);
1026+ for (bindex = bstart; bindex <= bend; bindex++) {
1027+ if (unionfs_lower_file_idx(file, bindex)) {
1028+ branchput(dentry->d_sb, bindex);
1029+ fput(unionfs_lower_file_idx(file, bindex));
1030+ unionfs_set_lower_file_idx(file, bindex, NULL);
1031+ }
1032+ }
1033+ path_put_lowers(dentry, bstart, bend, false);
1034+ iput_lowers(dentry->d_inode, bstart, bend, false);
1035+ /* for reg file, we only open it "once" */
1036+ fbend(file) = fbstart(file);
1037+ dbend(dentry) = dbstart(dentry);
1038+ ibend(dentry->d_inode) = ibstart(dentry->d_inode);
1039+
1040+out:
1041+ unionfs_check_file(file);
1042+ return err;
1043+}
1044+
1045+/*
1046+ * Helper function for unionfs_file_revalidate/locked.
1047+ * Expects dentry/parent to be locked already, and revalidated.
1048+ */
1049+static int __unionfs_file_revalidate(struct file *file, struct dentry *dentry,
1050+ struct dentry *parent,
1051+ struct super_block *sb, int sbgen,
1052+ int dgen, bool willwrite)
1053+{
1054+ int fgen;
1055+ int bstart, bend, orig_brid;
1056+ int size;
1057+ int err = 0;
1058+
1059+ fgen = atomic_read(&UNIONFS_F(file)->generation);
1060+
1061+ /*
1062+ * There are two cases we are interested in. The first is if the
1063+ * file's generation is lower than the super-block's. The second is
1064+ * if someone has copied up this file from underneath us. In both
1065+ * cases we need to refresh things.
1066+ */
1067+ if (d_deleted(dentry) ||
1068+ (sbgen <= fgen &&
1069+ dbstart(dentry) == fbstart(file) &&
1070+ unionfs_lower_file(file)))
1071+ goto out_may_copyup;
1072+
1073+ /* save orig branch ID */
1074+ orig_brid = UNIONFS_F(file)->saved_branch_ids[fbstart(file)];
1075+
1076+ /* First we throw out the existing files. */
1077+ cleanup_file(file);
1078+
1079+ /* Now we reopen the file(s) as in unionfs_open. */
1080+ bstart = fbstart(file) = dbstart(dentry);
1081+ bend = fbend(file) = dbend(dentry);
1082+
1083+ size = sizeof(struct file *) * sbmax(sb);
1084+ UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1085+ if (unlikely(!UNIONFS_F(file)->lower_files)) {
1086+ err = -ENOMEM;
1087+ goto out;
1088+ }
1089+ size = sizeof(int) * sbmax(sb);
1090+ UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1091+ if (unlikely(!UNIONFS_F(file)->saved_branch_ids)) {
1092+ err = -ENOMEM;
1093+ goto out;
1094+ }
1095+
1096+ if (S_ISDIR(dentry->d_inode->i_mode)) {
1097+ /* We need to open all the files. */
1098+ err = open_all_files(file);
1099+ if (err)
1100+ goto out;
1101+ } else {
1102+ int new_brid;
1103+ /* We only open the highest priority branch. */
1104+ err = open_highest_file(file, willwrite);
1105+ if (err)
1106+ goto out;
1107+ new_brid = UNIONFS_F(file)->saved_branch_ids[fbstart(file)];
1108+ if (unlikely(new_brid != orig_brid && sbgen > fgen)) {
1109+ /*
1110+ * If we re-opened the file on a different branch
1111+ * than the original one, and this was due to a new
1112+ * branch inserted, then update the mnt counts of
1113+ * the old and new branches accordingly.
1114+ */
1115+ unionfs_mntget(dentry, bstart);
1116+ unionfs_mntput(sb->s_root,
1117+ branch_id_to_idx(sb, orig_brid));
1118+ }
1119+ /* regular files have only one open lower file */
1120+ fbend(file) = fbstart(file);
1121+ }
1122+ atomic_set(&UNIONFS_F(file)->generation,
1123+ atomic_read(&UNIONFS_I(dentry->d_inode)->generation));
1124+
1125+out_may_copyup:
1126+ /* Copyup on the first write to a file on a readonly branch. */
1127+ if (willwrite && IS_WRITE_FLAG(file->f_flags) &&
1128+ !IS_WRITE_FLAG(unionfs_lower_file(file)->f_flags) &&
1129+ is_robranch(dentry)) {
1130+ pr_debug("unionfs: do delay copyup of \"%s\"\n",
1131+ dentry->d_name.name);
1132+ err = do_delayed_copyup(file, parent);
1133+ /* regular files have only one open lower file */
1134+ if (!err && !S_ISDIR(dentry->d_inode->i_mode))
1135+ fbend(file) = fbstart(file);
1136+ }
1137+
1138+out:
1139+ if (err) {
1140+ kfree(UNIONFS_F(file)->lower_files);
1141+ kfree(UNIONFS_F(file)->saved_branch_ids);
1142+ }
1143+ return err;
1144+}
1145+
1146+/*
1147+ * Revalidate the struct file
1148+ * @file: file to revalidate
1149+ * @parent: parent dentry (locked by caller)
1150+ * @willwrite: true if caller may cause changes to the file; false otherwise.
1151+ * Caller must lock/unlock dentry's branch configuration.
1152+ */
1153+int unionfs_file_revalidate(struct file *file, struct dentry *parent,
1154+ bool willwrite)
1155+{
1156+ struct super_block *sb;
1157+ struct dentry *dentry;
1158+ int sbgen, dgen;
1159+ int err = 0;
1160+
1161+ dentry = file->f_path.dentry;
1162+ sb = dentry->d_sb;
1163+ verify_locked(dentry);
1164+ verify_locked(parent);
1165+
1166+ /*
1167+ * First revalidate the dentry inside struct file,
1168+ * but not unhashed dentries.
1169+ */
1170+ if (!d_deleted(dentry) &&
1171+ !__unionfs_d_revalidate(dentry, parent, willwrite)) {
1172+ err = -ESTALE;
1173+ goto out;
1174+ }
1175+
1176+ sbgen = atomic_read(&UNIONFS_SB(sb)->generation);
1177+ dgen = atomic_read(&UNIONFS_D(dentry)->generation);
1178+
1179+ if (unlikely(sbgen > dgen)) { /* XXX: should never happen */
1180+ pr_debug("unionfs: failed to revalidate dentry (%s)\n",
1181+ dentry->d_name.name);
1182+ err = -ESTALE;
1183+ goto out;
1184+ }
1185+
1186+ err = __unionfs_file_revalidate(file, dentry, parent, sb,
1187+ sbgen, dgen, willwrite);
1188+out:
1189+ return err;
1190+}
1191+
1192+/* unionfs_open helper function: open a directory */
1193+static int __open_dir(struct inode *inode, struct file *file)
1194+{
1195+ struct dentry *lower_dentry;
1196+ struct file *lower_file;
1197+ int bindex, bstart, bend;
1198+ struct vfsmount *mnt;
1199+
1200+ bstart = fbstart(file) = dbstart(file->f_path.dentry);
1201+ bend = fbend(file) = dbend(file->f_path.dentry);
1202+
1203+ for (bindex = bstart; bindex <= bend; bindex++) {
1204+ lower_dentry =
1205+ unionfs_lower_dentry_idx(file->f_path.dentry, bindex);
1206+ if (!lower_dentry)
1207+ continue;
1208+
1209+ dget(lower_dentry);
1210+ unionfs_mntget(file->f_path.dentry, bindex);
1211+ mnt = unionfs_lower_mnt_idx(file->f_path.dentry, bindex);
1212+ lower_file = dentry_open(lower_dentry, mnt, file->f_flags,
1213+ current_cred());
1214+ if (IS_ERR(lower_file))
1215+ return PTR_ERR(lower_file);
1216+
1217+ unionfs_set_lower_file_idx(file, bindex, lower_file);
1218+
1219+ /*
1220+ * The branchget goes after the open, because otherwise
1221+ * we would miss the reference on release.
1222+ */
1223+ branchget(inode->i_sb, bindex);
1224+ }
1225+
1226+ return 0;
1227+}
1228+
1229+/* unionfs_open helper function: open a file */
1230+static int __open_file(struct inode *inode, struct file *file,
1231+ struct dentry *parent)
1232+{
1233+ struct dentry *lower_dentry;
1234+ struct file *lower_file;
1235+ int lower_flags;
1236+ int bindex, bstart, bend;
1237+
1238+ lower_dentry = unionfs_lower_dentry(file->f_path.dentry);
1239+ lower_flags = file->f_flags;
1240+
1241+ bstart = fbstart(file) = dbstart(file->f_path.dentry);
1242+ bend = fbend(file) = dbend(file->f_path.dentry);
1243+
1244+ /*
1245+ * check for the permission for lower file. If the error is
1246+ * COPYUP_ERR, copyup the file.
1247+ */
1248+ if (lower_dentry->d_inode && is_robranch(file->f_path.dentry)) {
1249+ /*
1250+ * if the open will change the file, copy it up; otherwise
1251+ * defer the copyup.
1252+ */
1253+ if (lower_flags & O_TRUNC) {
1254+ int size = 0;
1255+ int err = -EROFS;
1256+
1257+ /* copyup the file */
1258+ for (bindex = bstart - 1; bindex >= 0; bindex--) {
1259+ err = copyup_file(parent->d_inode, file,
1260+ bstart, bindex, size);
1261+ if (!err)
1262+ break;
1263+ }
1264+ return err;
1265+ } else {
1266+ /*
1267+ * turn off writeable flags, to force delayed copyup
1268+ * by caller.
1269+ */
1270+ lower_flags &= ~(OPEN_WRITE_FLAGS);
1271+ }
1272+ }
1273+
1274+ dget(lower_dentry);
1275+
1276+ /*
1277+ * dentry_open will decrement mnt refcnt if err.
1278+ * otherwise fput() will do an mntput() for us upon file close.
1279+ */
1280+ unionfs_mntget(file->f_path.dentry, bstart);
1281+ lower_file =
1282+ dentry_open(lower_dentry,
1283+ unionfs_lower_mnt_idx(file->f_path.dentry, bstart),
1284+ lower_flags, current_cred());
1285+ if (IS_ERR(lower_file))
1286+ return PTR_ERR(lower_file);
1287+
1288+ unionfs_set_lower_file(file, lower_file);
1289+ branchget(inode->i_sb, bstart);
1290+
1291+ return 0;
1292+}
1293+
1294+int unionfs_open(struct inode *inode, struct file *file)
1295+{
1296+ int err = 0;
1297+ struct file *lower_file = NULL;
1298+ struct dentry *dentry = file->f_path.dentry;
1299+ struct dentry *parent;
1300+ int bindex = 0, bstart = 0, bend = 0;
1301+ int size;
1302+ int valid = 0;
1303+
1304+ unionfs_read_lock(inode->i_sb, UNIONFS_SMUTEX_PARENT);
1305+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1306+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1307+
1308+ /* don't open unhashed/deleted files */
1309+ if (d_deleted(dentry)) {
1310+ err = -ENOENT;
1311+ goto out_nofree;
1312+ }
1313+
1314+ /* XXX: should I change 'false' below to the 'willwrite' flag? */
1315+ valid = __unionfs_d_revalidate(dentry, parent, false);
1316+ if (unlikely(!valid)) {
1317+ err = -ESTALE;
1318+ goto out_nofree;
1319+ }
1320+
1321+ file->private_data =
1322+ kzalloc(sizeof(struct unionfs_file_info), GFP_KERNEL);
1323+ if (unlikely(!UNIONFS_F(file))) {
1324+ err = -ENOMEM;
1325+ goto out_nofree;
1326+ }
1327+ fbstart(file) = -1;
1328+ fbend(file) = -1;
1329+ atomic_set(&UNIONFS_F(file)->generation,
1330+ atomic_read(&UNIONFS_I(inode)->generation));
1331+
1332+ size = sizeof(struct file *) * sbmax(inode->i_sb);
1333+ UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1334+ if (unlikely(!UNIONFS_F(file)->lower_files)) {
1335+ err = -ENOMEM;
1336+ goto out;
1337+ }
1338+ size = sizeof(int) * sbmax(inode->i_sb);
1339+ UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1340+ if (unlikely(!UNIONFS_F(file)->saved_branch_ids)) {
1341+ err = -ENOMEM;
1342+ goto out;
1343+ }
1344+
1345+ bstart = fbstart(file) = dbstart(dentry);
1346+ bend = fbend(file) = dbend(dentry);
1347+
1348+ /*
1349+ * open all directories and make the unionfs file struct point to
1350+ * these lower file structs
1351+ */
1352+ if (S_ISDIR(inode->i_mode))
1353+ err = __open_dir(inode, file); /* open a dir */
1354+ else
1355+ err = __open_file(inode, file, parent); /* open a file */
1356+
1357+ /* freeing the allocated resources, and fput the opened files */
1358+ if (err) {
1359+ for (bindex = bstart; bindex <= bend; bindex++) {
1360+ lower_file = unionfs_lower_file_idx(file, bindex);
1361+ if (!lower_file)
1362+ continue;
1363+
1364+ branchput(dentry->d_sb, bindex);
1365+ /* fput calls dput for lower_dentry */
1366+ fput(lower_file);
1367+ }
1368+ }
1369+
1370+out:
1371+ if (err) {
1372+ kfree(UNIONFS_F(file)->lower_files);
1373+ kfree(UNIONFS_F(file)->saved_branch_ids);
1374+ kfree(UNIONFS_F(file));
1375+ }
1376+out_nofree:
1377+ if (!err) {
1378+ unionfs_postcopyup_setmnt(dentry);
1379+ unionfs_copy_attr_times(inode);
1380+ unionfs_check_file(file);
1381+ unionfs_check_inode(inode);
1382+ }
1383+ unionfs_unlock_dentry(dentry);
1384+ unionfs_unlock_parent(dentry, parent);
1385+ unionfs_read_unlock(inode->i_sb);
1386+ return err;
1387+}
1388+
1389+/*
1390+ * release all lower object references & free the file info structure
1391+ *
1392+ * No need to grab sb info's rwsem.
1393+ */
1394+int unionfs_file_release(struct inode *inode, struct file *file)
1395+{
1396+ struct file *lower_file = NULL;
1397+ struct unionfs_file_info *fileinfo;
1398+ struct unionfs_inode_info *inodeinfo;
1399+ struct super_block *sb = inode->i_sb;
1400+ struct dentry *dentry = file->f_path.dentry;
1401+ struct dentry *parent;
1402+ int bindex, bstart, bend;
1403+ int fgen, err = 0;
1404+
4ae1df7a
JR
1405+ /*
1406+ * Since mm/memory.c:might_fault() (under PROVE_LOCKING) was
1407+ * modified in 2.6.29-rc1 to call might_lock_read on mmap_sem, this
1408+ * has been causing false positives in file system stacking layers.
1409+ * In particular, our ->mmap is called after sys_mmap2 already holds
1410+ * mmap_sem, then we lock our own mutexes; but earlier, it's
1411+ * possible for lockdep to have locked our mutexes first, and then
1412+ * we call a lower ->readdir which could call might_fault. The
1413+ * different ordering of the locks is what lockdep complains about
1414+ * -- unnecessarily. Therefore, we have no choice but to tell
1415+ * lockdep to temporarily turn off lockdep here. Note: the comments
1416+ * inside might_sleep also suggest that it would have been
1417+ * nicer to only annotate paths that need that might_lock_read.
1418+ */
1419+ lockdep_off();
2380c486
JR
1420+ unionfs_read_lock(sb, UNIONFS_SMUTEX_PARENT);
1421+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1422+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1423+
1424+ /*
1425+ * We try to revalidate, but the VFS ignores return values
1426+ * from file->release, so we must always try to succeed here,
1427+ * including to do the kfree and dput below. So if revalidation
1428+ * failed, all we can do is print some message and keep going.
1429+ */
1430+ err = unionfs_file_revalidate(file, parent,
1431+ UNIONFS_F(file)->wrote_to_file);
1432+ if (!err)
1433+ unionfs_check_file(file);
1434+ fileinfo = UNIONFS_F(file);
1435+ BUG_ON(file->f_path.dentry->d_inode != inode);
1436+ inodeinfo = UNIONFS_I(inode);
1437+
1438+ /* fput all the lower files */
1439+ fgen = atomic_read(&fileinfo->generation);
1440+ bstart = fbstart(file);
1441+ bend = fbend(file);
1442+
1443+ for (bindex = bstart; bindex <= bend; bindex++) {
1444+ lower_file = unionfs_lower_file_idx(file, bindex);
1445+
1446+ if (lower_file) {
1447+ unionfs_set_lower_file_idx(file, bindex, NULL);
1448+ fput(lower_file);
1449+ branchput(sb, bindex);
1450+ }
1451+
1452+ /* if there are no more refs to the dentry, dput it */
1453+ if (d_deleted(dentry)) {
1454+ dput(unionfs_lower_dentry_idx(dentry, bindex));
1455+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1456+ }
1457+ }
1458+
1459+ kfree(fileinfo->lower_files);
1460+ kfree(fileinfo->saved_branch_ids);
1461+
1462+ if (fileinfo->rdstate) {
1463+ fileinfo->rdstate->access = jiffies;
1464+ spin_lock(&inodeinfo->rdlock);
1465+ inodeinfo->rdcount++;
1466+ list_add_tail(&fileinfo->rdstate->cache,
1467+ &inodeinfo->readdircache);
1468+ mark_inode_dirty(inode);
1469+ spin_unlock(&inodeinfo->rdlock);
1470+ fileinfo->rdstate = NULL;
1471+ }
1472+ kfree(fileinfo);
1473+
1474+ unionfs_unlock_dentry(dentry);
1475+ unionfs_unlock_parent(dentry, parent);
1476+ unionfs_read_unlock(sb);
4ae1df7a 1477+ lockdep_on();
2380c486
JR
1478+ return err;
1479+}
1480+
1481+/* pass the ioctl to the lower fs */
1482+static long do_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1483+{
1484+ struct file *lower_file;
1485+ int err;
1486+
1487+ lower_file = unionfs_lower_file(file);
1488+
1489+ err = -ENOTTY;
1490+ if (!lower_file || !lower_file->f_op)
1491+ goto out;
1492+ if (lower_file->f_op->unlocked_ioctl) {
1493+ err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg);
0c5527e5 1494+#ifdef CONFIG_COMPAT
82260373 1495+ } else if (lower_file->f_op->compat_ioctl) {
0c5527e5 1496+ err = lower_file->f_op->compat_ioctl(
82260373 1497+ lower_file,
2380c486 1498+ cmd, arg);
0c5527e5 1499+#endif
2380c486
JR
1500+ }
1501+
1502+out:
1503+ return err;
1504+}
1505+
1506+/*
1507+ * return to user-space the branch indices containing the file in question
1508+ *
1509+ * We use fd_set, and therefore the number of branches is limited
1510+ * to FD_SETSIZE, which is currently 1024 - plenty for most people
1511+ */
1512+static int unionfs_ioctl_queryfile(struct file *file, struct dentry *parent,
1513+ unsigned int cmd, unsigned long arg)
1514+{
1515+ int err = 0;
1516+ fd_set branchlist;
1517+ int bstart = 0, bend = 0, bindex = 0;
1518+ int orig_bstart, orig_bend;
1519+ struct dentry *dentry, *lower_dentry;
1520+ struct vfsmount *mnt;
1521+
1522+ dentry = file->f_path.dentry;
1523+ orig_bstart = dbstart(dentry);
1524+ orig_bend = dbend(dentry);
1525+ err = unionfs_partial_lookup(dentry, parent);
1526+ if (err)
1527+ goto out;
1528+ bstart = dbstart(dentry);
1529+ bend = dbend(dentry);
1530+
1531+ FD_ZERO(&branchlist);
1532+
1533+ for (bindex = bstart; bindex <= bend; bindex++) {
1534+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
1535+ if (!lower_dentry)
1536+ continue;
1537+ if (likely(lower_dentry->d_inode))
1538+ FD_SET(bindex, &branchlist);
1539+ /* purge any lower objects after partial_lookup */
1540+ if (bindex < orig_bstart || bindex > orig_bend) {
1541+ dput(lower_dentry);
1542+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1543+ iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
1544+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
1545+ NULL);
1546+ mnt = unionfs_lower_mnt_idx(dentry, bindex);
1547+ if (!mnt)
1548+ continue;
1549+ unionfs_mntput(dentry, bindex);
1550+ unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
1551+ }
1552+ }
1553+ /* restore original dentry's offsets */
1554+ dbstart(dentry) = orig_bstart;
1555+ dbend(dentry) = orig_bend;
1556+ ibstart(dentry->d_inode) = orig_bstart;
1557+ ibend(dentry->d_inode) = orig_bend;
1558+
1559+ err = copy_to_user((void __user *)arg, &branchlist, sizeof(fd_set));
1560+ if (unlikely(err))
1561+ err = -EFAULT;
1562+
1563+out:
1564+ return err < 0 ? err : bend;
1565+}
1566+
1567+long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1568+{
1569+ long err;
1570+ struct dentry *dentry = file->f_path.dentry;
1571+ struct dentry *parent;
1572+
1573+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
1574+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1575+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1576+
1577+ err = unionfs_file_revalidate(file, parent, true);
1578+ if (unlikely(err))
1579+ goto out;
1580+
1581+ /* check if asked for local commands */
1582+ switch (cmd) {
1583+ case UNIONFS_IOCTL_INCGEN:
1584+ /* Increment the superblock generation count */
1585+ pr_info("unionfs: incgen ioctl deprecated; "
1586+ "use \"-o remount,incgen\"\n");
1587+ err = -ENOSYS;
1588+ break;
1589+
1590+ case UNIONFS_IOCTL_QUERYFILE:
1591+ /* Return list of branches containing the given file */
1592+ err = unionfs_ioctl_queryfile(file, parent, cmd, arg);
1593+ break;
1594+
1595+ default:
1596+ /* pass the ioctl down */
1597+ err = do_ioctl(file, cmd, arg);
1598+ break;
1599+ }
1600+
1601+out:
1602+ unionfs_check_file(file);
1603+ unionfs_unlock_dentry(dentry);
1604+ unionfs_unlock_parent(dentry, parent);
1605+ unionfs_read_unlock(dentry->d_sb);
1606+ return err;
1607+}
1608+
1609+int unionfs_flush(struct file *file, fl_owner_t id)
1610+{
1611+ int err = 0;
1612+ struct file *lower_file = NULL;
1613+ struct dentry *dentry = file->f_path.dentry;
1614+ struct dentry *parent;
1615+ int bindex, bstart, bend;
1616+
1617+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
1618+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1619+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1620+
1621+ err = unionfs_file_revalidate(file, parent,
1622+ UNIONFS_F(file)->wrote_to_file);
1623+ if (unlikely(err))
1624+ goto out;
1625+ unionfs_check_file(file);
1626+
1627+ bstart = fbstart(file);
1628+ bend = fbend(file);
1629+ for (bindex = bstart; bindex <= bend; bindex++) {
1630+ lower_file = unionfs_lower_file_idx(file, bindex);
1631+
1632+ if (lower_file && lower_file->f_op &&
1633+ lower_file->f_op->flush) {
1634+ err = lower_file->f_op->flush(lower_file, id);
1635+ if (err)
1636+ goto out;
1637+ }
1638+
1639+ }
1640+
1641+out:
1642+ if (!err)
1643+ unionfs_check_file(file);
1644+ unionfs_unlock_dentry(dentry);
1645+ unionfs_unlock_parent(dentry, parent);
1646+ unionfs_read_unlock(dentry->d_sb);
1647+ return err;
1648+}
0c5527e5
AM
1649diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
1650new file mode 100644
1651index 0000000..bba3a75
1652--- /dev/null
1653+++ b/fs/unionfs/copyup.c
1654@@ -0,0 +1,896 @@
2380c486 1655+/*
7670a7fc 1656+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
1657+ * Copyright (c) 2003-2006 Charles P. Wright
1658+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
1659+ * Copyright (c) 2005-2006 Junjiro Okajima
1660+ * Copyright (c) 2005 Arun M. Krishnakumar
1661+ * Copyright (c) 2004-2006 David P. Quigley
1662+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
1663+ * Copyright (c) 2003 Puja Gupta
1664+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
1665+ * Copyright (c) 2003-2010 Stony Brook University
1666+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
1667+ *
1668+ * This program is free software; you can redistribute it and/or modify
1669+ * it under the terms of the GNU General Public License version 2 as
1670+ * published by the Free Software Foundation.
1671+ */
1672+
1673+#include "union.h"
1674+
1675+/*
1676+ * For detailed explanation of copyup see:
1677+ * Documentation/filesystems/unionfs/concepts.txt
1678+ */
1679+
1680+#ifdef CONFIG_UNION_FS_XATTR
1681+/* copyup all extended attrs for a given dentry */
1682+static int copyup_xattrs(struct dentry *old_lower_dentry,
1683+ struct dentry *new_lower_dentry)
1684+{
1685+ int err = 0;
1686+ ssize_t list_size = -1;
1687+ char *name_list = NULL;
1688+ char *attr_value = NULL;
1689+ char *name_list_buf = NULL;
1690+
1691+ /* query the actual size of the xattr list */
1692+ list_size = vfs_listxattr(old_lower_dentry, NULL, 0);
1693+ if (list_size <= 0) {
1694+ err = list_size;
1695+ goto out;
1696+ }
1697+
1698+ /* allocate space for the actual list */
1699+ name_list = unionfs_xattr_alloc(list_size + 1, XATTR_LIST_MAX);
1700+ if (unlikely(!name_list || IS_ERR(name_list))) {
1701+ err = PTR_ERR(name_list);
1702+ goto out;
1703+ }
1704+
1705+ name_list_buf = name_list; /* save for kfree at end */
1706+
1707+ /* now get the actual xattr list of the source file */
1708+ list_size = vfs_listxattr(old_lower_dentry, name_list, list_size);
1709+ if (list_size <= 0) {
1710+ err = list_size;
1711+ goto out;
1712+ }
1713+
1714+ /* allocate space to hold each xattr's value */
1715+ attr_value = unionfs_xattr_alloc(XATTR_SIZE_MAX, XATTR_SIZE_MAX);
1716+ if (unlikely(!attr_value || IS_ERR(attr_value))) {
1717+ err = PTR_ERR(attr_value);
1718+ goto out;
1719+ }
1720+
1721+ /* in a loop, get and set each xattr from src to dst file */
1722+ while (*name_list) {
1723+ ssize_t size;
1724+
1725+ /* Lock here since vfs_getxattr doesn't lock for us */
1726+ mutex_lock(&old_lower_dentry->d_inode->i_mutex);
1727+ size = vfs_getxattr(old_lower_dentry, name_list,
1728+ attr_value, XATTR_SIZE_MAX);
1729+ mutex_unlock(&old_lower_dentry->d_inode->i_mutex);
1730+ if (size < 0) {
1731+ err = size;
1732+ goto out;
1733+ }
1734+ if (size > XATTR_SIZE_MAX) {
1735+ err = -E2BIG;
1736+ goto out;
1737+ }
1738+ /* Don't lock here since vfs_setxattr does it for us. */
1739+ err = vfs_setxattr(new_lower_dentry, name_list, attr_value,
1740+ size, 0);
1741+ /*
1742+ * SELinux depends on "security.*" xattrs, so to maintain
1743+ * the security of copied-up files, if SELinux is active,
1744+ * then we must copy these xattrs as well. So we need to
1745+ * temporarily get FOWNER privileges.
1746+ * XXX: move entire copyup code to SIOQ.
1747+ */
1748+ if (err == -EPERM && !capable(CAP_FOWNER)) {
1749+ const struct cred *old_creds;
1750+ struct cred *new_creds;
1751+
1752+ new_creds = prepare_creds();
1753+ if (unlikely(!new_creds)) {
1754+ err = -ENOMEM;
1755+ goto out;
1756+ }
1757+ cap_raise(new_creds->cap_effective, CAP_FOWNER);
1758+ old_creds = override_creds(new_creds);
1759+ err = vfs_setxattr(new_lower_dentry, name_list,
1760+ attr_value, size, 0);
1761+ revert_creds(old_creds);
1762+ }
1763+ if (err < 0)
1764+ goto out;
1765+ name_list += strlen(name_list) + 1;
1766+ }
1767+out:
1768+ unionfs_xattr_kfree(name_list_buf);
1769+ unionfs_xattr_kfree(attr_value);
1770+ /* Ignore if xattr isn't supported */
1771+ if (err == -ENOTSUPP || err == -EOPNOTSUPP)
1772+ err = 0;
1773+ return err;
1774+}
1775+#endif /* CONFIG_UNION_FS_XATTR */
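The loop in copyup_xattrs() above walks a buffer of NUL-terminated attribute names packed back to back, advancing by strlen(name) + 1 each time. A minimal userspace sketch of that walk follows. The helper name count_packed_names() is hypothetical and not part of unionfs, and unlike the kernel loop it takes an explicit length bound instead of relying on an empty string to end the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of unionfs): count the names in a packed
 * list of NUL-terminated strings. 'len' is the number of valid bytes.
 */
static size_t count_packed_names(const char *list, size_t len)
{
	const char *p = list;
	const char *end = list + len;
	size_t n = 0;

	assert(list != NULL);
	while (p < end && *p != '\0') {
		const char *nul = memchr(p, '\0', (size_t)(end - p));

		if (nul == NULL)
			break;		/* unterminated tail: do not count it */
		n++;
		p = nul + 1;		/* step over the name and its NUL */
	}
	return n;
}
```

Bounding the walk by length is a design choice of this sketch: it stays inside the buffer even if the final name lacks a terminator.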
1776+
1777+/*
1778+ * Determine the mode based on the copyup flags, and the existing dentry.
1779+ *
1780+ * Handle file systems which may not support certain options. For example
1781+ * jffs2 doesn't allow one to chmod a symlink. So we ignore such harmless
1782+ * errors, rather than propagating them up, which results in copyup errors
1783+ * and errors returned back to users.
1784+ */
1785+static int copyup_permissions(struct super_block *sb,
1786+ struct dentry *old_lower_dentry,
1787+ struct dentry *new_lower_dentry)
1788+{
1789+ struct inode *i = old_lower_dentry->d_inode;
1790+ struct iattr newattrs;
1791+ int err;
1792+
1793+ newattrs.ia_atime = i->i_atime;
1794+ newattrs.ia_mtime = i->i_mtime;
1795+ newattrs.ia_ctime = i->i_ctime;
1796+ newattrs.ia_gid = i->i_gid;
1797+ newattrs.ia_uid = i->i_uid;
1798+ newattrs.ia_valid = ATTR_CTIME | ATTR_ATIME | ATTR_MTIME |
1799+ ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_FORCE |
1800+ ATTR_GID | ATTR_UID;
1801+ mutex_lock(&new_lower_dentry->d_inode->i_mutex);
1802+ err = notify_change(new_lower_dentry, &newattrs);
1803+ if (err)
1804+ goto out;
1805+
1806+ /* now try to change the mode and ignore EOPNOTSUPP on symlinks */
1807+ newattrs.ia_mode = i->i_mode;
1808+ newattrs.ia_valid = ATTR_MODE | ATTR_FORCE;
1809+ err = notify_change(new_lower_dentry, &newattrs);
1810+ if (err == -EOPNOTSUPP &&
1811+ S_ISLNK(new_lower_dentry->d_inode->i_mode)) {
1812+ printk(KERN_WARNING
1813+ "unionfs: changing \"%s\" symlink mode unsupported\n",
1814+ new_lower_dentry->d_name.name);
1815+ err = 0;
1816+ }
1817+
1818+out:
1819+ mutex_unlock(&new_lower_dentry->d_inode->i_mutex);
1820+ return err;
1821+}
1822+
1823+/*
1824+ * create the new device/file/directory - use copyup_permissions to copy up
1825+ * times and mode
1826+ *
1827+ * if the object being copied up is a regular file, the file is only created,
1828+ * the contents have to be copied up separately
1829+ */
1830+static int __copyup_ndentry(struct dentry *old_lower_dentry,
1831+ struct dentry *new_lower_dentry,
1832+ struct dentry *new_lower_parent_dentry,
1833+ char *symbuf)
1834+{
1835+ int err = 0;
1836+ umode_t old_mode = old_lower_dentry->d_inode->i_mode;
1837+ struct sioq_args args;
1838+
1839+ if (S_ISDIR(old_mode)) {
1840+ args.mkdir.parent = new_lower_parent_dentry->d_inode;
1841+ args.mkdir.dentry = new_lower_dentry;
1842+ args.mkdir.mode = old_mode;
1843+
1844+ run_sioq(__unionfs_mkdir, &args);
1845+ err = args.err;
1846+ } else if (S_ISLNK(old_mode)) {
1847+ args.symlink.parent = new_lower_parent_dentry->d_inode;
1848+ args.symlink.dentry = new_lower_dentry;
1849+ args.symlink.symbuf = symbuf;
1850+
1851+ run_sioq(__unionfs_symlink, &args);
1852+ err = args.err;
1853+ } else if (S_ISBLK(old_mode) || S_ISCHR(old_mode) ||
1854+ S_ISFIFO(old_mode) || S_ISSOCK(old_mode)) {
1855+ args.mknod.parent = new_lower_parent_dentry->d_inode;
1856+ args.mknod.dentry = new_lower_dentry;
1857+ args.mknod.mode = old_mode;
1858+ args.mknod.dev = old_lower_dentry->d_inode->i_rdev;
1859+
1860+ run_sioq(__unionfs_mknod, &args);
1861+ err = args.err;
1862+ } else if (S_ISREG(old_mode)) {
1863+ struct nameidata nd;
1864+ err = init_lower_nd(&nd, LOOKUP_CREATE);
1865+ if (unlikely(err < 0))
1866+ goto out;
1867+ args.create.nd = &nd;
1868+ args.create.parent = new_lower_parent_dentry->d_inode;
1869+ args.create.dentry = new_lower_dentry;
1870+ args.create.mode = old_mode;
1871+
1872+ run_sioq(__unionfs_create, &args);
1873+ err = args.err;
1874+ release_lower_nd(&nd, err);
1875+ } else {
1876+ printk(KERN_CRIT "unionfs: unknown inode type %d\n",
1877+ old_mode);
1878+ BUG();
1879+ }
1880+
1881+out:
1882+ return err;
1883+}
1884+
1885+static int __copyup_reg_data(struct dentry *dentry,
1886+ struct dentry *new_lower_dentry, int new_bindex,
1887+ struct dentry *old_lower_dentry, int old_bindex,
1888+ struct file **copyup_file, loff_t len)
1889+{
1890+ struct super_block *sb = dentry->d_sb;
1891+ struct file *input_file;
1892+ struct file *output_file;
1893+ struct vfsmount *output_mnt;
1894+ mm_segment_t old_fs;
1895+ char *buf = NULL;
1896+ ssize_t read_bytes, write_bytes;
1897+ loff_t size;
1898+ int err = 0;
1899+
1900+ /* open old file */
1901+ unionfs_mntget(dentry, old_bindex);
1902+ branchget(sb, old_bindex);
1903+ /* dentry_open calls dput and mntput if it returns an error */
1904+ input_file = dentry_open(old_lower_dentry,
1905+ unionfs_lower_mnt_idx(dentry, old_bindex),
1906+ O_RDONLY | O_LARGEFILE, current_cred());
1907+ if (IS_ERR(input_file)) {
1908+ dput(old_lower_dentry);
1909+ err = PTR_ERR(input_file);
1910+ goto out;
1911+ }
1912+ if (unlikely(!input_file->f_op || !input_file->f_op->read)) {
1913+ err = -EINVAL;
1914+ goto out_close_in;
1915+ }
1916+
1917+ /* open new file */
1918+ dget(new_lower_dentry);
1919+ output_mnt = unionfs_mntget(sb->s_root, new_bindex);
1920+ branchget(sb, new_bindex);
1921+ output_file = dentry_open(new_lower_dentry, output_mnt,
1922+ O_RDWR | O_LARGEFILE, current_cred());
1923+ if (IS_ERR(output_file)) {
1924+ err = PTR_ERR(output_file);
1925+ goto out_close_in2;
1926+ }
1927+ if (unlikely(!output_file->f_op || !output_file->f_op->write)) {
1928+ err = -EINVAL;
1929+ goto out_close_out;
1930+ }
1931+
1932+ /* allocating a buffer */
1933+ buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
1934+ if (unlikely(!buf)) {
1935+ err = -ENOMEM;
1936+ goto out_close_out;
1937+ }
1938+
1939+ input_file->f_pos = 0;
1940+ output_file->f_pos = 0;
1941+
1942+ old_fs = get_fs();
1943+ set_fs(KERNEL_DS);
1944+
1945+ size = len;
1946+ err = 0;
1947+ do {
1948+ if (len >= PAGE_SIZE)
1949+ size = PAGE_SIZE;
1950+ else if ((len < PAGE_SIZE) && (len > 0))
1951+ size = len;
1952+
1953+ len -= PAGE_SIZE;
1954+
1955+ read_bytes =
1956+ input_file->f_op->read(input_file,
1957+ (char __user *)buf, size,
1958+ &input_file->f_pos);
1959+ if (read_bytes <= 0) {
1960+ err = read_bytes;
1961+ break;
1962+ }
1963+
1964+ /* see Documentation/filesystems/unionfs/issues.txt */
1965+ lockdep_off();
1966+ write_bytes =
1967+ output_file->f_op->write(output_file,
1968+ (char __user *)buf,
1969+ read_bytes,
1970+ &output_file->f_pos);
1971+ lockdep_on();
1972+ if ((write_bytes < 0) || (write_bytes < read_bytes)) {
1973+ err = write_bytes;
1974+ break;
1975+ }
1976+ } while ((read_bytes > 0) && (len > 0));
1977+
1978+ set_fs(old_fs);
1979+
1980+ kfree(buf);
1981+
1982+ if (!err)
1983+ err = output_file->f_op->fsync(output_file, 0);
1984+
1985+ if (err)
1986+ goto out_close_out;
1987+
1988+ if (copyup_file) {
1989+ *copyup_file = output_file;
1990+ goto out_close_in;
1991+ }
1992+
1993+out_close_out:
1994+ fput(output_file);
1995+
1996+out_close_in2:
1997+ branchput(sb, new_bindex);
1998+
1999+out_close_in:
2000+ fput(input_file);
2001+
2002+out:
2003+ branchput(sb, old_bindex);
2004+
2005+ return err;
2006+}
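The data loop in __copyup_reg_data() above moves at most 'len' bytes, one page-sized chunk at a time, and stops early when the source runs out. The following is a rough userspace sketch of that pattern over in-memory buffers. CHUNK, copy_bounded() and demo_copy() are hypothetical names introduced here for illustration. Unlike the kernel loop, which subtracts PAGE_SIZE on every pass, this sketch subtracts the number of bytes actually moved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4096	/* stands in for PAGE_SIZE; an assumption of this sketch */

/*
 * Hypothetical helper: copy at most 'len' bytes from 'src' (which holds
 * 'src_len' bytes) into 'dst', one CHUNK at a time. Returns bytes copied.
 */
static size_t copy_bounded(char *dst, const char *src, size_t src_len,
			   size_t len)
{
	size_t copied = 0;

	assert(dst != NULL && src != NULL);
	while (len > 0) {
		size_t want = len < CHUNK ? len : CHUNK;
		size_t avail = src_len - copied;
		size_t got = want < avail ? want : avail; /* a "short read" */

		if (got == 0)
			break;	/* source exhausted before 'len' was reached */
		memcpy(dst + copied, src + copied, got);
		copied += got;
		len -= got;	/* subtract what was really moved */
	}
	return copied;
}

/* Hypothetical driver used only to exercise copy_bounded(). */
static size_t demo_copy(size_t src_len, size_t len)
{
	static char src[3 * CHUNK], dst[3 * CHUNK];
	size_t n;

	assert(src_len <= sizeof(src));
	memset(src, 'x', src_len);
	memset(dst, 0, sizeof(dst));
	n = copy_bounded(dst, src, src_len, len);
	assert(memcmp(dst, src, n) == 0);
	return n;
}
```

For example, demo_copy(100, 5000) asks for 5000 bytes from a 100-byte source and gets 100 back, the same early exit the kernel loop takes when read() returns 0.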
2007+
2008+/*
2009+ * dput the lower references for old and new dentry & clear a lower dentry
2010+ * pointer
2011+ */
2012+static void __clear(struct dentry *dentry, struct dentry *old_lower_dentry,
2013+ int old_bstart, int old_bend,
2014+ struct dentry *new_lower_dentry, int new_bindex)
2015+{
2016+ /* get rid of the lower dentry and all its traces */
2017+ unionfs_set_lower_dentry_idx(dentry, new_bindex, NULL);
2018+ dbstart(dentry) = old_bstart;
2019+ dbend(dentry) = old_bend;
2020+
2021+ dput(new_lower_dentry);
2022+ dput(old_lower_dentry);
2023+}
2024+
2025+/*
2026+ * Copy up a dentry to a file of specified name.
2027+ *
2028+ * @dir: used to pull the ->i_sb to access other branches
2029+ * @dentry: the non-negative dentry whose lower_inode we should copy
2030+ * @bstart: the branch of the lower_inode to copy from
2031+ * @new_bindex: the branch to create the new file in
2032+ * @name: the name of the file to create
2033+ * @namelen: length of @name
2034+ * @copyup_file: the "struct file" to return (optional)
2035+ * @len: how many bytes to copy-up?
2036+ */
2037+int copyup_dentry(struct inode *dir, struct dentry *dentry, int bstart,
2038+ int new_bindex, const char *name, int namelen,
2039+ struct file **copyup_file, loff_t len)
2040+{
2041+ struct dentry *new_lower_dentry;
2042+ struct dentry *old_lower_dentry = NULL;
2043+ struct super_block *sb;
2044+ int err = 0;
2045+ int old_bindex;
2046+ int old_bstart;
2047+ int old_bend;
2048+ struct dentry *new_lower_parent_dentry = NULL;
2049+ mm_segment_t oldfs;
2050+ char *symbuf = NULL;
2051+
2052+ verify_locked(dentry);
2053+
2054+ old_bindex = bstart;
2055+ old_bstart = dbstart(dentry);
2056+ old_bend = dbend(dentry);
2057+
2058+ BUG_ON(new_bindex < 0);
2059+ BUG_ON(new_bindex >= old_bindex);
2060+
2061+ sb = dir->i_sb;
2062+
2063+ err = is_robranch_super(sb, new_bindex);
2064+ if (err)
2065+ goto out;
2066+
2067+ /* Create the directory structure above this dentry. */
2068+ new_lower_dentry = create_parents(dir, dentry, name, new_bindex);
2069+ if (IS_ERR(new_lower_dentry)) {
2070+ err = PTR_ERR(new_lower_dentry);
2071+ goto out;
2072+ }
2073+
2074+ old_lower_dentry = unionfs_lower_dentry_idx(dentry, old_bindex);
2075+ /* we conditionally dput this old_lower_dentry at end of function */
2076+ dget(old_lower_dentry);
2077+
2078+ /* For symlinks, we must read the link before we lock the directory. */
2079+ if (S_ISLNK(old_lower_dentry->d_inode->i_mode)) {
2080+
2081+ symbuf = kmalloc(PATH_MAX, GFP_KERNEL);
2082+ if (unlikely(!symbuf)) {
2083+ __clear(dentry, old_lower_dentry,
2084+ old_bstart, old_bend,
2085+ new_lower_dentry, new_bindex);
2086+ err = -ENOMEM;
2087+ goto out_free;
2088+ }
2089+
2090+ oldfs = get_fs();
2091+ set_fs(KERNEL_DS);
2092+ err = old_lower_dentry->d_inode->i_op->readlink(
2093+ old_lower_dentry,
2094+ (char __user *)symbuf,
2095+ PATH_MAX);
2096+ set_fs(oldfs);
2097+ if (err < 0) {
2098+ __clear(dentry, old_lower_dentry,
2099+ old_bstart, old_bend,
2100+ new_lower_dentry, new_bindex);
2101+ goto out_free;
2102+ }
2103+ symbuf[err] = '\0';
2104+ }
2105+
2106+ /* Now we lock the parent, and create the object in the new branch. */
2107+ new_lower_parent_dentry = lock_parent(new_lower_dentry);
2108+
2109+ /* create the new inode */
2110+ err = __copyup_ndentry(old_lower_dentry, new_lower_dentry,
2111+ new_lower_parent_dentry, symbuf);
2112+
2113+ if (err) {
2114+ __clear(dentry, old_lower_dentry,
2115+ old_bstart, old_bend,
2116+ new_lower_dentry, new_bindex);
2117+ goto out_unlock;
2118+ }
2119+
2120+ /* We actually copyup the file here. */
2121+ if (S_ISREG(old_lower_dentry->d_inode->i_mode))
2122+ err = __copyup_reg_data(dentry, new_lower_dentry, new_bindex,
2123+ old_lower_dentry, old_bindex,
2124+ copyup_file, len);
2125+ if (err)
2126+ goto out_unlink;
2127+
2128+ /* Set permissions. */
2129+ err = copyup_permissions(sb, old_lower_dentry, new_lower_dentry);
2130+ if (err)
2131+ goto out_unlink;
2132+
2133+#ifdef CONFIG_UNION_FS_XATTR
2134+ /* SELinux uses extended attributes for permissions. */
2135+ err = copyup_xattrs(old_lower_dentry, new_lower_dentry);
2136+ if (err)
2137+ goto out_unlink;
2138+#endif /* CONFIG_UNION_FS_XATTR */
2139+
2140+ /* do not allow files getting deleted to be re-interposed */
2141+ if (!d_deleted(dentry))
2142+ unionfs_reinterpose(dentry);
2143+
2144+ goto out_unlock;
2145+
2146+out_unlink:
2147+ /*
2148+ * copyup failed, because we possibly ran out of space or
2149+ * quota, or something else happened so let's unlink; we don't
2150+ * really care about the return value of vfs_unlink
2151+ */
2152+ vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry);
2153+
2154+ if (copyup_file) {
2155+ /* need to close the file */
2156+
2157+ fput(*copyup_file);
2158+ branchput(sb, new_bindex);
2159+ }
2160+
2161+ /*
2162+ * TODO: should we reset the error to something like -EIO?
2163+ *
2164+ * If we don't reset, the user may get some nonsensical errors, but
2165+ * on the other hand, if we reset to EIO, we guarantee that the user
2166+ * will get a "confusing" error message.
2167+ */
2168+
2169+out_unlock:
2170+ unlock_dir(new_lower_parent_dentry);
2171+
2172+out_free:
2173+ /*
2174+ * If old_lower_dentry was not a file, then we need to dput it. If
2175+ * it was a file, then it was already dput indirectly by other
2176+ * functions we call above which operate on regular files.
2177+ */
2178+ if (old_lower_dentry && old_lower_dentry->d_inode &&
2179+ !S_ISREG(old_lower_dentry->d_inode->i_mode))
2180+ dput(old_lower_dentry);
2181+ kfree(symbuf);
2182+
2183+ if (err) {
2184+ /*
2185+ * if directory creation succeeded, but inode copyup failed,
2186+ * then purge new dentries.
2187+ */
2188+ if (dbstart(dentry) < old_bstart &&
2189+ ibstart(dentry->d_inode) > dbstart(dentry))
2190+ __clear(dentry, NULL, old_bstart, old_bend,
2191+ unionfs_lower_dentry(dentry), dbstart(dentry));
2192+ goto out;
2193+ }
2194+ if (!S_ISDIR(dentry->d_inode->i_mode)) {
2195+ unionfs_postcopyup_release(dentry);
2196+ if (!unionfs_lower_inode(dentry->d_inode)) {
2197+ /*
2198+ * If we got here, then we copied up to an
2199+ * unlinked-open file, whose name is .unionfsXXXXX.
2200+ */
2201+ struct inode *inode = new_lower_dentry->d_inode;
2202+ atomic_inc(&inode->i_count);
2203+ unionfs_set_lower_inode_idx(dentry->d_inode,
2204+ ibstart(dentry->d_inode),
2205+ inode);
2206+ }
2207+ }
2208+ unionfs_postcopyup_setmnt(dentry);
2209+ /* sync inode times from copied-up inode to our inode */
2210+ unionfs_copy_attr_times(dentry->d_inode);
2211+ unionfs_check_inode(dir);
2212+ unionfs_check_dentry(dentry);
2213+out:
2214+ return err;
2215+}
2216+
2217+/*
2218+ * This function creates a copy of a file represented by 'file' which
2219+ * currently resides in branch 'bstart' to branch 'new_bindex'. The copy
2220+ * will be named "name".
2221+ */
2222+int copyup_named_file(struct inode *dir, struct file *file, char *name,
2223+ int bstart, int new_bindex, loff_t len)
2224+{
2225+ int err = 0;
2226+ struct file *output_file = NULL;
2227+
2228+ err = copyup_dentry(dir, file->f_path.dentry, bstart, new_bindex,
2229+ name, strlen(name), &output_file, len);
2230+ if (!err) {
2231+ fbstart(file) = new_bindex;
2232+ unionfs_set_lower_file_idx(file, new_bindex, output_file);
2233+ }
2234+
2235+ return err;
2236+}
2237+
2238+/*
2239+ * This function creates a copy of a file represented by 'file' which
2240+ * currently resides in branch 'bstart' to branch 'new_bindex'.
2241+ */
2242+int copyup_file(struct inode *dir, struct file *file, int bstart,
2243+ int new_bindex, loff_t len)
2244+{
2245+ int err = 0;
2246+ struct file *output_file = NULL;
2247+ struct dentry *dentry = file->f_path.dentry;
2248+
2249+ err = copyup_dentry(dir, dentry, bstart, new_bindex,
2250+ dentry->d_name.name, dentry->d_name.len,
2251+ &output_file, len);
2252+ if (!err) {
2253+ fbstart(file) = new_bindex;
2254+ unionfs_set_lower_file_idx(file, new_bindex, output_file);
2255+ }
2256+
2257+ return err;
2258+}
2259+
2260+/* purge a dentry's lower-branch states (dput/mntput, etc.) */
2261+static void __cleanup_dentry(struct dentry *dentry, int bindex,
2262+ int old_bstart, int old_bend)
2263+{
2264+ int loop_start;
2265+ int loop_end;
2266+ int new_bstart = -1;
2267+ int new_bend = -1;
2268+ int i;
2269+
2270+ loop_start = min(old_bstart, bindex);
2271+ loop_end = max(old_bend, bindex);
2272+
2273+ /*
2274+ * This loop sets the bstart and bend for the new dentry by
2275+ * traversing from left to right. It also dputs all negative
2276+ * dentries except bindex
2277+ */
2278+ for (i = loop_start; i <= loop_end; i++) {
2279+ if (!unionfs_lower_dentry_idx(dentry, i))
2280+ continue;
2281+
2282+ if (i == bindex) {
2283+ new_bend = i;
2284+ if (new_bstart < 0)
2285+ new_bstart = i;
2286+ continue;
2287+ }
2288+
2289+ if (!unionfs_lower_dentry_idx(dentry, i)->d_inode) {
2290+ dput(unionfs_lower_dentry_idx(dentry, i));
2291+ unionfs_set_lower_dentry_idx(dentry, i, NULL);
2292+
2293+ unionfs_mntput(dentry, i);
2294+ unionfs_set_lower_mnt_idx(dentry, i, NULL);
2295+ } else {
2296+ if (new_bstart < 0)
2297+ new_bstart = i;
2298+ new_bend = i;
2299+ }
2300+ }
2301+
2302+ if (new_bstart < 0)
2303+ new_bstart = bindex;
2304+ if (new_bend < 0)
2305+ new_bend = bindex;
2306+ dbstart(dentry) = new_bstart;
2307+ dbend(dentry) = new_bend;
2308+
2309+}
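The range recomputation in __cleanup_dentry() above can be modelled without any kernel types. The sketch below is a simplification, not the unionfs code: each branch slot is empty, negative or positive; negative slots other than 'bindex' are released; and the new start/end are the first and last slots that survive. The enum, the struct and recompute_range() are hypothetical names, and the sketch scans the whole array rather than only the min/max window the kernel uses.

```c
#include <assert.h>

/* Hypothetical model of one lower-dentry slot per branch. */
enum slot { SLOT_EMPTY, SLOT_NEGATIVE, SLOT_POSITIVE };

struct range {
	int start;
	int end;
};

/* Drop negative slots except 'bindex' and return the surviving range. */
static struct range recompute_range(int *slots, int nslots, int bindex)
{
	struct range r = { -1, -1 };
	int i;

	assert(slots && bindex >= 0 && bindex < nslots);
	for (i = 0; i < nslots; i++) {
		if (slots[i] == SLOT_EMPTY)
			continue;
		if (i != bindex && slots[i] == SLOT_NEGATIVE) {
			slots[i] = SLOT_EMPTY;	/* stands in for dput()/mntput() */
			continue;
		}
		if (r.start < 0)
			r.start = i;
		r.end = i;
	}
	/* nothing survived: collapse the range onto bindex */
	if (r.start < 0)
		r.start = bindex;
	if (r.end < 0)
		r.end = bindex;
	return r;
}
```

With slots {negative, negative, positive} and bindex 1, slot 0 is released and the resulting range is 1..2.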
2310+
2311+/* set lower inode ptr and update bstart & bend if necessary */
2312+static void __set_inode(struct dentry *upper, struct dentry *lower,
2313+ int bindex)
2314+{
2315+ unionfs_set_lower_inode_idx(upper->d_inode, bindex,
2316+ igrab(lower->d_inode));
2317+ if (likely(ibstart(upper->d_inode) > bindex))
2318+ ibstart(upper->d_inode) = bindex;
2319+ if (likely(ibend(upper->d_inode) < bindex))
2320+ ibend(upper->d_inode) = bindex;
2321+
2322+}
2323+
2324+/* set lower dentry ptr and update bstart & bend if necessary */
2325+static void __set_dentry(struct dentry *upper, struct dentry *lower,
2326+ int bindex)
2327+{
2328+ unionfs_set_lower_dentry_idx(upper, bindex, lower);
2329+ if (likely(dbstart(upper) > bindex))
2330+ dbstart(upper) = bindex;
2331+ if (likely(dbend(upper) < bindex))
2332+ dbend(upper) = bindex;
2333+}
2334+
2335+/*
2336+ * This function replicates the directory structure up-to given dentry
2337+ * in the bindex branch.
2338+ */
2339+struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
2340+ const char *name, int bindex)
2341+{
2342+ int err;
2343+ struct dentry *child_dentry;
2344+ struct dentry *parent_dentry;
2345+ struct dentry *lower_parent_dentry = NULL;
2346+ struct dentry *lower_dentry = NULL;
2347+ const char *childname;
2348+ unsigned int childnamelen;
2349+ int nr_dentry;
2350+ int count = 0;
2351+ int old_bstart;
2352+ int old_bend;
2353+ struct dentry **path = NULL;
2354+ struct super_block *sb;
2355+
2356+ verify_locked(dentry);
2357+
2358+ err = is_robranch_super(dir->i_sb, bindex);
2359+ if (err) {
2360+ lower_dentry = ERR_PTR(err);
2361+ goto out;
2362+ }
2363+
2364+ old_bstart = dbstart(dentry);
2365+ old_bend = dbend(dentry);
2366+
2367+ lower_dentry = ERR_PTR(-ENOMEM);
2368+
2369+ /* There is no sense allocating any less than the minimum. */
2370+ nr_dentry = 1;
2371+ path = kmalloc(nr_dentry * sizeof(struct dentry *), GFP_KERNEL);
2372+ if (unlikely(!path))
2373+ goto out;
2374+
2375+ /* assume the negative dentry of unionfs as the parent dentry */
2376+ parent_dentry = dentry;
2377+
2378+ /*
2379+ * This loop finds the first parent that exists in the given branch.
2380+ * We start building the directory structure from there. At the end
2381+ * of the loop, the following should hold:
2382+ * - child_dentry is the first nonexistent child
2383+ * - parent_dentry is the first existent parent
2384+ * - path[0] is the deepest child
2385+ * - path[count] is the first child to create
2386+ */
2387+ do {
2388+ child_dentry = parent_dentry;
2389+
2390+ /* find the parent directory dentry in unionfs */
2391+ parent_dentry = dget_parent(child_dentry);
2392+
2393+ /* find out the lower_parent_dentry in the given branch */
2394+ lower_parent_dentry =
2395+ unionfs_lower_dentry_idx(parent_dentry, bindex);
2396+
2397+ /* grow path table */
2398+ if (count == nr_dentry) {
2399+ void *p;
2400+
2401+ nr_dentry *= 2;
2402+ p = krealloc(path, nr_dentry * sizeof(struct dentry *),
2403+ GFP_KERNEL);
2404+ if (unlikely(!p)) {
2405+ lower_dentry = ERR_PTR(-ENOMEM);
2406+ goto out;
2407+ }
2408+ path = p;
2409+ }
2410+
2411+ /* store the child dentry */
2412+ path[count++] = child_dentry;
2413+ } while (!lower_parent_dentry);
2414+ count--;
2415+
2416+ sb = dentry->d_sb;
2417+
2418+ /*
2419+ * This code goes between the begin/end labels and basically
2420+ * emulates a while(child_dentry != dentry), only cleaner and
2421+ * shorter than what would be a much longer while loop.
2422+ */
2423+begin:
2424+ /* get lower parent dir in the current branch */
2425+ lower_parent_dentry = unionfs_lower_dentry_idx(parent_dentry, bindex);
2426+ dput(parent_dentry);
2427+
2428+ /* init the values to lookup */
2429+ childname = child_dentry->d_name.name;
2430+ childnamelen = child_dentry->d_name.len;
2431+
2432+ if (child_dentry != dentry) {
2433+ /* lookup child in the underlying file system */
2434+ lower_dentry = lookup_lck_len(childname, lower_parent_dentry,
2435+ childnamelen);
2436+ if (IS_ERR(lower_dentry))
2437+ goto out;
2438+ } else {
2439+ /*
2440+ * Is the name a whiteout of the child name ? lookup the
2441+ * whiteout child in the underlying file system
2442+ */
2443+ lower_dentry = lookup_lck_len(name, lower_parent_dentry,
2444+ strlen(name));
2445+ if (IS_ERR(lower_dentry))
2446+ goto out;
2447+
2448+ /* Replace the current dentry (if any) with the new one */
2449+ dput(unionfs_lower_dentry_idx(dentry, bindex));
2450+ unionfs_set_lower_dentry_idx(dentry, bindex,
2451+ lower_dentry);
2452+
2453+ __cleanup_dentry(dentry, bindex, old_bstart, old_bend);
2454+ goto out;
2455+ }
2456+
2457+ if (lower_dentry->d_inode) {
2458+ /*
2459+ * since this already exists we dput to avoid
2460+ * multiple references on the same dentry
2461+ */
2462+ dput(lower_dentry);
2463+ } else {
2464+ struct sioq_args args;
2465+
2466+ /* it's a negative dentry, create a new dir */
2467+ lower_parent_dentry = lock_parent(lower_dentry);
2468+
2469+ args.mkdir.parent = lower_parent_dentry->d_inode;
2470+ args.mkdir.dentry = lower_dentry;
2471+ args.mkdir.mode = child_dentry->d_inode->i_mode;
2472+
2473+ run_sioq(__unionfs_mkdir, &args);
2474+ err = args.err;
2475+
2476+ if (!err)
2477+ err = copyup_permissions(dir->i_sb, child_dentry,
2478+ lower_dentry);
2479+ unlock_dir(lower_parent_dentry);
2480+ if (err) {
2481+ dput(lower_dentry);
2482+ lower_dentry = ERR_PTR(err);
2483+ goto out;
2484+ }
2485+
2486+ }
2487+
2488+ __set_inode(child_dentry, lower_dentry, bindex);
2489+ __set_dentry(child_dentry, lower_dentry, bindex);
2490+ /*
2491+ * update times of this dentry, but also the parent, because if
2492+ * we changed, the parent may have changed too.
2493+ */
2494+ fsstack_copy_attr_times(parent_dentry->d_inode,
2495+ lower_parent_dentry->d_inode);
2496+ unionfs_copy_attr_times(child_dentry->d_inode);
2497+
2498+ parent_dentry = child_dentry;
2499+ child_dentry = path[--count];
2500+ goto begin;
2501+out:
2502+ /* drop any leftover dentry references from the do/while loop above */
2503+ if (IS_ERR(lower_dentry))
2504+ while (count)
2505+ dput(path[count--]);
2506+ kfree(path);
2507+ return lower_dentry;
2508+}
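create_parents() above starts its path table at one slot and doubles it with krealloc() whenever it fills, assigning the result to a temporary so the old table is not lost on failure. A small userspace sketch of the same growth pattern follows, using realloc() in place of krealloc(). The function push_n() is a hypothetical demo, not part of unionfs. Unlike the kernel code, which leaves the free to its common exit path, the sketch frees the table right at the point of failure.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical demo: push 'n' integers onto a table that starts with one
 * slot and doubles when full. Returns the final capacity, or 0 on failure.
 */
static size_t push_n(size_t n)
{
	size_t cap = 1, count = 0, i;
	int *tab = malloc(cap * sizeof(*tab));

	if (tab == NULL)
		return 0;
	for (i = 0; i < n; i++) {
		if (count == cap) {
			/* keep the old pointer until realloc succeeds */
			int *p = realloc(tab, 2 * cap * sizeof(*tab));

			if (p == NULL) {
				free(tab);
				return 0;
			}
			tab = p;
			cap *= 2;
		}
		tab[count++] = (int)i;
	}
	for (i = 0; i < count; i++)
		assert(tab[i] == (int)i); /* contents survive every regrow */
	free(tab);
	return cap;
}
```

Pushing five entries grows the table 1, 2, 4, 8, so push_n(5) reports a capacity of 8.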
2509+
2510+/*
2511+ * Post-copyup helper to ensure we have valid mnts: set lower mnt of
2512+ * dentry+parents to the first parent node that has an mnt.
2513+ */
2514+void unionfs_postcopyup_setmnt(struct dentry *dentry)
2515+{
2516+ struct dentry *parent, *hasone;
2517+ int bindex = dbstart(dentry);
2518+
2519+ if (unionfs_lower_mnt_idx(dentry, bindex))
2520+ return;
2521+ hasone = dentry->d_parent;
2522+ /* this loop should stop at root dentry */
2523+ while (!unionfs_lower_mnt_idx(hasone, bindex))
2524+ hasone = hasone->d_parent;
2525+ parent = dentry;
2526+ while (!unionfs_lower_mnt_idx(parent, bindex)) {
2527+ unionfs_set_lower_mnt_idx(parent, bindex,
2528+ unionfs_mntget(hasone, bindex));
2529+ parent = parent->d_parent;
2530+ }
2531+}
2532+
2533+/*
2534+ * Post-copyup helper to release all non-directory source objects of a
2535+ * copied-up file. Regular files should have only one lower object.
2536+ */
2537+void unionfs_postcopyup_release(struct dentry *dentry)
2538+{
2539+ int bstart, bend;
2540+
2541+ BUG_ON(S_ISDIR(dentry->d_inode->i_mode));
2542+ bstart = dbstart(dentry);
2543+ bend = dbend(dentry);
2544+
2545+ path_put_lowers(dentry, bstart + 1, bend, false);
2546+ iput_lowers(dentry->d_inode, bstart + 1, bend, false);
2547+
2548+ dbend(dentry) = bstart;
2549+ ibend(dentry->d_inode) = ibstart(dentry->d_inode) = bstart;
2550+}
2551diff --git a/fs/unionfs/debug.c b/fs/unionfs/debug.c
2552new file mode 100644
2553index 0000000..a76f92a
2554--- /dev/null
2555+++ b/fs/unionfs/debug.c
2556@@ -0,0 +1,548 @@
2557+/*
2558+ * Copyright (c) 2003-2010 Erez Zadok
2559+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
2560+ * Copyright (c) 2003-2010 Stony Brook University
2561+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2562+ *
2563+ * This program is free software; you can redistribute it and/or modify
2564+ * it under the terms of the GNU General Public License version 2 as
2565+ * published by the Free Software Foundation.
2566+ */
2567+
2568+#include "union.h"
2569+
2570+/*
2571+ * Helper debugging functions for maintainers (and for users to report
2572+ * useful information back to maintainers)
2573+ */
2574+
2575+/* it's always useful to know what part of the code called us */
2576+#define PRINT_CALLER(fname, fxn, line) \
2577+ do { \
2578+ if (!printed_caller) { \
2579+ pr_debug("PC:%s:%s:%d\n", (fname), (fxn), (line)); \
2580+ printed_caller = 1; \
2581+ } \
2582+ } while (0)
2583+
2584+/*
2585+ * __unionfs_check_{inode,dentry,file} perform exhaustive sanity checking on
2586+ * the fan-out of various Unionfs objects. We check that no lower objects
2587+ * exist outside the start/end branch range; that all objects within are
2588+ * non-NULL (with some allowed exceptions); that for every lower file
2589+ * there's a lower dentry+inode; that the start/end ranges match for all
2590+ * corresponding lower objects; that open files/symlinks have only one lower
2591+ * object, but directories can have several; and more.
2592+ */
2593+void __unionfs_check_inode(const struct inode *inode,
2594+ const char *fname, const char *fxn, int line)
2595+{
2596+ int bindex;
2597+ int istart, iend;
2598+ struct inode *lower_inode;
2599+ struct super_block *sb;
2600+ int printed_caller = 0;
2601+ void *poison_ptr;
2602+
2603+ /* for inodes now */
2604+ BUG_ON(!inode);
2605+ sb = inode->i_sb;
2606+ istart = ibstart(inode);
2607+ iend = ibend(inode);
2608+ /* don't check inode if no lower branches */
2609+ if (istart < 0 && iend < 0)
2610+ return;
2611+ if (unlikely(istart > iend)) {
2612+ PRINT_CALLER(fname, fxn, line);
2613+ pr_debug(" Ci0: inode=%p istart/end=%d:%d\n",
2614+ inode, istart, iend);
2615+ }
2616+ if (unlikely((istart == -1 && iend != -1) ||
2617+ (istart != -1 && iend == -1))) {
2618+ PRINT_CALLER(fname, fxn, line);
2619+ pr_debug(" Ci1: inode=%p istart/end=%d:%d\n",
2620+ inode, istart, iend);
2621+ }
2622+ if (!S_ISDIR(inode->i_mode)) {
2623+ if (unlikely(iend != istart)) {
2624+ PRINT_CALLER(fname, fxn, line);
2625+ pr_debug(" Ci2: inode=%p istart=%d iend=%d\n",
2626+ inode, istart, iend);
2627+ }
2628+ }
2629+
2630+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2631+ if (unlikely(!UNIONFS_I(inode))) {
2632+ PRINT_CALLER(fname, fxn, line);
2633+ pr_debug(" Ci3: no inode_info %p\n", inode);
2634+ return;
2635+ }
2636+ if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
2637+ PRINT_CALLER(fname, fxn, line);
2638+ pr_debug(" Ci4: no lower_inodes %p\n", inode);
2639+ return;
2640+ }
2641+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2642+ if (lower_inode) {
2643+ memset(&poison_ptr, POISON_INUSE, sizeof(void *));
2644+ if (unlikely(bindex < istart || bindex > iend)) {
2645+ PRINT_CALLER(fname, fxn, line);
2646+ pr_debug(" Ci5: inode/linode=%p:%p bindex=%d "
2647+ "istart/end=%d:%d\n", inode,
2648+ lower_inode, bindex, istart, iend);
2649+ } else if (unlikely(lower_inode == poison_ptr)) {
2650+ /* freed inode! */
2651+ PRINT_CALLER(fname, fxn, line);
2652+ pr_debug(" Ci6: inode/linode=%p:%p bindex=%d "
2653+ "istart/end=%d:%d\n", inode,
2654+ lower_inode, bindex, istart, iend);
2655+ }
2656+ continue;
2657+ }
2658+ /* if we get here, then lower_inode == NULL */
2659+ if (bindex < istart || bindex > iend)
2660+ continue;
2661+ /*
2662+ * directories can have NULL lower inodes in b/t start/end,
2663+ * but NOT if at the start/end range.
2664+ */
2665+ if (unlikely(S_ISDIR(inode->i_mode) &&
2666+ bindex > istart && bindex < iend))
2667+ continue;
2668+ PRINT_CALLER(fname, fxn, line);
2669+ pr_debug(" Ci7: inode/linode=%p:%p "
2670+ "bindex=%d istart/end=%d:%d\n",
2671+ inode, lower_inode, bindex, istart, iend);
2672+ }
2673+}
2674+
2675+void __unionfs_check_dentry(const struct dentry *dentry,
2676+ const char *fname, const char *fxn, int line)
2677+{
2678+ int bindex;
2679+ int dstart, dend, istart, iend;
2680+ struct dentry *lower_dentry;
2681+ struct inode *inode, *lower_inode;
2682+ struct super_block *sb;
2683+ struct vfsmount *lower_mnt;
2684+ int printed_caller = 0;
2685+ void *poison_ptr;
2686+
2687+ BUG_ON(!dentry);
2688+ sb = dentry->d_sb;
2689+ inode = dentry->d_inode;
2690+ dstart = dbstart(dentry);
2691+ dend = dbend(dentry);
2692+ /* don't check dentry/mnt if no lower branches */
2693+ if (dstart < 0 && dend < 0)
2694+ goto check_inode;
2695+ BUG_ON(dstart > dend);
2696+
2697+ if (unlikely((dstart == -1 && dend != -1) ||
2698+ (dstart != -1 && dend == -1))) {
2699+ PRINT_CALLER(fname, fxn, line);
2700+ pr_debug(" CD0: dentry=%p dstart/end=%d:%d\n",
2701+ dentry, dstart, dend);
2702+ }
2703+ /*
2704+ * check for NULL dentries inside the start/end range, or
2705+ * non-NULL dentries outside the start/end range.
2706+ */
2707+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2708+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
2709+ if (lower_dentry) {
2710+ if (unlikely(bindex < dstart || bindex > dend)) {
2711+ PRINT_CALLER(fname, fxn, line);
2712+ pr_debug(" CD1: dentry/lower=%p:%p(%p) "
2713+ "bindex=%d dstart/end=%d:%d\n",
2714+ dentry, lower_dentry,
2715+ (lower_dentry ? lower_dentry->d_inode :
2716+ (void *) -1L),
2717+ bindex, dstart, dend);
2718+ }
2719+ } else { /* lower_dentry == NULL */
2720+ if (bindex < dstart || bindex > dend)
2721+ continue;
2722+ /*
2723+ * Directories can have NULL lower dentries in b/t
2724+ * start/end, but NOT if at the start/end range.
2725+ * Ignore this rule, however, if this is a NULL
2726+ * dentry or a deleted dentry.
2727+ */
2728+ if (unlikely(!d_deleted((struct dentry *) dentry) &&
2729+ inode &&
2730+ !(inode && S_ISDIR(inode->i_mode) &&
2731+ bindex > dstart && bindex < dend))) {
2732+ PRINT_CALLER(fname, fxn, line);
2733+ pr_debug(" CD2: dentry/lower=%p:%p(%p) "
2734+ "bindex=%d dstart/end=%d:%d\n",
2735+ dentry, lower_dentry,
2736+ (lower_dentry ?
2737+ lower_dentry->d_inode :
2738+ (void *) -1L),
2739+ bindex, dstart, dend);
2740+ }
2741+ }
2742+ }
2743+
2744+ /* check for vfsmounts same as for dentries */
2745+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2746+ lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2747+ if (lower_mnt) {
2748+ if (unlikely(bindex < dstart || bindex > dend)) {
2749+ PRINT_CALLER(fname, fxn, line);
2750+ pr_debug(" CM0: dentry/lmnt=%p:%p bindex=%d "
2751+ "dstart/end=%d:%d\n", dentry,
2752+ lower_mnt, bindex, dstart, dend);
2753+ }
2754+ } else { /* lower_mnt == NULL */
2755+ if (bindex < dstart || bindex > dend)
2756+ continue;
2757+ /*
2758+ * Directories can have NULL lower mounts in b/t
2759+ * start/end, but NOT if at the start/end range.
2760+ * Ignore this rule, however, if this is a NULL
2761+ * dentry.
2762+ */
2763+ if (unlikely(inode &&
2764+ !(inode && S_ISDIR(inode->i_mode) &&
2765+ bindex > dstart && bindex < dend))) {
2766+ PRINT_CALLER(fname, fxn, line);
2767+ pr_debug(" CM1: dentry/lmnt=%p:%p "
2768+ "bindex=%d dstart/end=%d:%d\n",
2769+ dentry, lower_mnt, bindex,
2770+ dstart, dend);
2771+ }
2772+ }
2773+ }
2774+
2775+check_inode:
2776+ /* for inodes now */
2777+ if (!inode)
2778+ return;
2779+ istart = ibstart(inode);
2780+ iend = ibend(inode);
2781+ /* don't check inode if no lower branches */
2782+ if (istart < 0 && iend < 0)
2783+ return;
2784+ BUG_ON(istart > iend);
2785+ if (unlikely((istart == -1 && iend != -1) ||
2786+ (istart != -1 && iend == -1))) {
2787+ PRINT_CALLER(fname, fxn, line);
2788+ pr_debug(" CI0: dentry/inode=%p:%p istart/end=%d:%d\n",
2789+ dentry, inode, istart, iend);
2790+ }
2791+ if (unlikely(istart != dstart)) {
2792+ PRINT_CALLER(fname, fxn, line);
2793+ pr_debug(" CI1: dentry/inode=%p:%p istart=%d dstart=%d\n",
2794+ dentry, inode, istart, dstart);
2795+ }
2796+ if (unlikely(iend != dend)) {
2797+ PRINT_CALLER(fname, fxn, line);
2798+ pr_debug(" CI2: dentry/inode=%p:%p iend=%d dend=%d\n",
2799+ dentry, inode, iend, dend);
2800+ }
2801+
2802+ if (!S_ISDIR(inode->i_mode)) {
2803+ if (unlikely(dend != dstart)) {
2804+ PRINT_CALLER(fname, fxn, line);
2805+ pr_debug(" CI3: dentry/inode=%p:%p dstart=%d dend=%d\n",
2806+ dentry, inode, dstart, dend);
2807+ }
2808+ if (unlikely(iend != istart)) {
2809+ PRINT_CALLER(fname, fxn, line);
2810+ pr_debug(" CI4: dentry/inode=%p:%p istart=%d iend=%d\n",
2811+ dentry, inode, istart, iend);
2812+ }
2813+ }
2814+
2815+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2816+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2817+ if (lower_inode) {
2818+ memset(&poison_ptr, POISON_INUSE, sizeof(void *));
2819+ if (unlikely(bindex < istart || bindex > iend)) {
2820+ PRINT_CALLER(fname, fxn, line);
2821+ pr_debug(" CI5: dentry/linode=%p:%p bindex=%d "
2822+ "istart/end=%d:%d\n", dentry,
2823+ lower_inode, bindex, istart, iend);
2824+ } else if (unlikely(lower_inode == poison_ptr)) {
2825+ /* freed inode! */
2826+ PRINT_CALLER(fname, fxn, line);
2827+ pr_debug(" CI6: dentry/linode=%p:%p bindex=%d "
2828+ "istart/end=%d:%d\n", dentry,
2829+ lower_inode, bindex, istart, iend);
2830+ }
2831+ continue;
2832+ }
2833+ /* if we get here, then lower_inode == NULL */
2834+ if (bindex < istart || bindex > iend)
2835+ continue;
2836+ /*
2837+ * directories can have NULL lower inodes in b/t start/end,
2838+ * but NOT if at the start/end range.
2839+ */
2840+ if (unlikely(S_ISDIR(inode->i_mode) &&
2841+ bindex > istart && bindex < iend))
2842+ continue;
2843+ PRINT_CALLER(fname, fxn, line);
2844+ pr_debug(" CI7: dentry/linode=%p:%p "
2845+ "bindex=%d istart/end=%d:%d\n",
2846+ dentry, lower_inode, bindex, istart, iend);
2847+ }
2848+
2849+ /*
2850+ * If it's a directory, then intermediate objects b/t start/end can
2851+ * be NULL. But, check that all three are NULL: lower dentry, mnt,
2852+ * and inode.
2853+ */
2854+ if (dstart >= 0 && dend >= 0 && S_ISDIR(inode->i_mode))
2855+ for (bindex = dstart+1; bindex < dend; bindex++) {
2856+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2857+ lower_dentry = unionfs_lower_dentry_idx(dentry,
2858+ bindex);
2859+ lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2860+ if (unlikely(!((lower_inode && lower_dentry &&
2861+ lower_mnt) ||
2862+ (!lower_inode &&
2863+ !lower_dentry && !lower_mnt)))) {
2864+ PRINT_CALLER(fname, fxn, line);
2865+ pr_debug(" Cx: lmnt/ldentry/linode=%p:%p:%p "
2866+ "bindex=%d dstart/end=%d:%d\n",
2867+ lower_mnt, lower_dentry, lower_inode,
2868+ bindex, dstart, dend);
2869+ }
2870+ }
2871+ /* check if lower inode is newer than upper one (it shouldn't be) */
2872+ if (unlikely(is_newer_lower(dentry) && !is_negative_lower(dentry))) {
2873+ PRINT_CALLER(fname, fxn, line);
2874+ for (bindex = ibstart(inode); bindex <= ibend(inode);
2875+ bindex++) {
2876+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2877+ if (unlikely(!lower_inode))
2878+ continue;
2879+ pr_debug(" CI8: bindex=%d mtime/lmtime=%lu.%lu/%lu.%lu "
2880+ "ctime/lctime=%lu.%lu/%lu.%lu\n",
2881+ bindex,
2882+ inode->i_mtime.tv_sec,
2883+ inode->i_mtime.tv_nsec,
2884+ lower_inode->i_mtime.tv_sec,
2885+ lower_inode->i_mtime.tv_nsec,
2886+ inode->i_ctime.tv_sec,
2887+ inode->i_ctime.tv_nsec,
2888+ lower_inode->i_ctime.tv_sec,
2889+ lower_inode->i_ctime.tv_nsec);
2890+ }
2891+ }
2892+}
2893+
2894+void __unionfs_check_file(const struct file *file,
2895+ const char *fname, const char *fxn, int line)
2896+{
2897+ int bindex;
2898+ int dstart, dend, fstart, fend;
2899+ struct dentry *dentry;
2900+ struct file *lower_file;
2901+ struct inode *inode;
2902+ struct super_block *sb;
2903+ int printed_caller = 0;
2904+
2905+ BUG_ON(!file);
2906+ dentry = file->f_path.dentry;
2907+ sb = dentry->d_sb;
2908+ dstart = dbstart(dentry);
2909+ dend = dbend(dentry);
2910+ BUG_ON(dstart > dend);
2911+ fstart = fbstart(file);
2912+ fend = fbend(file);
2913+ BUG_ON(fstart > fend);
2914+
2915+ if (unlikely((fstart == -1 && fend != -1) ||
2916+ (fstart != -1 && fend == -1))) {
2917+ PRINT_CALLER(fname, fxn, line);
2918+ pr_debug(" CF0: file/dentry=%p:%p fstart/end=%d:%d\n",
2919+ file, dentry, fstart, fend);
2920+ }
2921+ if (unlikely(fstart != dstart)) {
2922+ PRINT_CALLER(fname, fxn, line);
2923+ pr_debug(" CF1: file/dentry=%p:%p fstart=%d dstart=%d\n",
2924+ file, dentry, fstart, dstart);
2925+ }
2926+ if (unlikely(fend != dend)) {
2927+ PRINT_CALLER(fname, fxn, line);
2928+ pr_debug(" CF2: file/dentry=%p:%p fend=%d dend=%d\n",
2929+ file, dentry, fend, dend);
2930+ }
2931+ inode = dentry->d_inode;
2932+ if (!S_ISDIR(inode->i_mode)) {
2933+ if (unlikely(fend != fstart)) {
2934+ PRINT_CALLER(fname, fxn, line);
2935+ pr_debug(" CF3: file/inode=%p:%p fstart=%d fend=%d\n",
2936+ file, inode, fstart, fend);
2937+ }
2938+ if (unlikely(dend != dstart)) {
2939+ PRINT_CALLER(fname, fxn, line);
2940+ pr_debug(" CF4: file/dentry=%p:%p dstart=%d dend=%d\n",
2941+ file, dentry, dstart, dend);
2942+ }
2943+ }
2944+
2945+ /*
2946+ * check for NULL lower files inside the start/end range, or
2947+ * non-NULL lower files outside the start/end range.
2948+ */
2949+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2950+ lower_file = unionfs_lower_file_idx(file, bindex);
2951+ if (lower_file) {
2952+ if (unlikely(bindex < fstart || bindex > fend)) {
2953+ PRINT_CALLER(fname, fxn, line);
2954+ pr_debug(" CF5: file/lower=%p:%p bindex=%d "
2955+ "fstart/end=%d:%d\n", file,
2956+ lower_file, bindex, fstart, fend);
2957+ }
2958+ } else { /* lower_file == NULL */
2959+ if (bindex >= fstart && bindex <= fend) {
2960+ /*
2961+ * directories can have NULL lower files in
2962+ * b/t start/end, but NOT if at the
2963+ * start/end range.
2964+ */
2965+ if (unlikely(!(S_ISDIR(inode->i_mode) &&
2966+ bindex > fstart &&
2967+ bindex < fend))) {
2968+ PRINT_CALLER(fname, fxn, line);
2969+ pr_debug(" CF6: file/lower=%p:%p "
2970+ "bindex=%d fstart/end=%d:%d\n",
2971+ file, lower_file, bindex,
2972+ fstart, fend);
2973+ }
2974+ }
2975+ }
2976+ }
2977+
2978+ __unionfs_check_dentry(dentry, fname, fxn, line);
2979+}
2980+
2981+void __unionfs_check_nd(const struct nameidata *nd,
2982+ const char *fname, const char *fxn, int line)
2983+{
2984+ struct file *file;
2985+ int printed_caller = 0;
2986+
2987+ if (unlikely(!nd))
2988+ return;
2989+ if (nd->flags & LOOKUP_OPEN) {
2990+ file = nd->intent.open.file;
2991+ if (unlikely(file->f_path.dentry &&
2992+ strcmp(file->f_path.dentry->d_sb->s_type->name,
2993+ UNIONFS_NAME))) {
2994+ PRINT_CALLER(fname, fxn, line);
2995+ pr_debug(" CND1: lower_file of type %s\n",
2996+ file->f_path.dentry->d_sb->s_type->name);
2380c486
JR
2997+ }
2998+ }
2999+}
3000+
82260373
AM
3001+static unsigned int __mnt_get_count(struct vfsmount *mnt)
3002+{
3003+#ifdef CONFIG_SMP
3004+ unsigned int count = 0;
3005+ int cpu;
3006+
3007+ for_each_possible_cpu(cpu) {
3008+ count += per_cpu_ptr(mnt->mnt_pcp, cpu)->mnt_count;
3009+ }
3010+
3011+ return count;
3012+#else
3013+ return mnt->mnt_count;
3014+#endif
3015+}
3016+
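The per-CPU summation in `__mnt_get_count` can be illustrated outside the kernel. The sketch below is an assumption-laden model: `struct pcp_slot` and `pcp_total` are hypothetical names, and a plain array stands in for the per-CPU data. The point it shows is that an individual slot may be negative and only the sum is meaningful.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU's private counter slot. */
struct pcp_slot {
	int count;	/* may go negative if this CPU saw more puts than gets */
};

/* Sum every slot; no single slot is a valid reference count on its own. */
static int pcp_total(const struct pcp_slot *slots, size_t nslots)
{
	int total = 0;
	size_t i;

	assert(slots != NULL || nslots == 0);
	for (i = 0; i < nslots; i++)
		total += slots[i].count;
	return total;
}
```

With slots holding 2, -1 and 3 the total is 4, even though one CPU's slot is below zero.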
2380c486
JR
3017+/* useful to track vfsmount leaks that could cause EBUSY on unmount */
3018+void __show_branch_counts(const struct super_block *sb,
3019+ const char *file, const char *fxn, int line)
3020+{
3021+ int i;
3022+ struct vfsmount *mnt;
3023+
3024+ pr_debug("BC:");
3025+ for (i = 0; i < sbmax(sb); i++) {
3026+ if (likely(sb->s_root))
3027+ mnt = UNIONFS_D(sb->s_root)->lower_paths[i].mnt;
3028+ else
3029+ mnt = NULL;
3030+ printk(KERN_CONT "%d:",
82260373 3031+ (mnt ? __mnt_get_count(mnt) : -99));
2380c486
JR
3032+ }
3033+ printk(KERN_CONT "%s:%s:%d\n", file, fxn, line);
3034+}
3035+
3036+void __show_inode_times(const struct inode *inode,
3037+ const char *file, const char *fxn, int line)
3038+{
3039+ struct inode *lower_inode;
3040+ int bindex;
3041+
3042+ for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3043+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
3044+ if (unlikely(!lower_inode))
3045+ continue;
3046+ pr_debug("IT(%lu:%d): %s:%s:%d "
3047+ "um=%lu/%lu lm=%lu/%lu uc=%lu/%lu lc=%lu/%lu\n",
3048+ inode->i_ino, bindex,
3049+ file, fxn, line,
3050+ inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
3051+ lower_inode->i_mtime.tv_sec,
3052+ lower_inode->i_mtime.tv_nsec,
3053+ inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
3054+ lower_inode->i_ctime.tv_sec,
3055+ lower_inode->i_ctime.tv_nsec);
3056+ }
3057+}
3058+
3059+void __show_dinode_times(const struct dentry *dentry,
3060+ const char *file, const char *fxn, int line)
3061+{
3062+ struct inode *inode = dentry->d_inode;
3063+ struct inode *lower_inode;
3064+ int bindex;
3065+
3066+ for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3067+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
3068+ if (!lower_inode)
3069+ continue;
3070+ pr_debug("DT(%s:%lu:%d): %s:%s:%d "
3071+ "um=%lu/%lu lm=%lu/%lu uc=%lu/%lu lc=%lu/%lu\n",
3072+ dentry->d_name.name, inode->i_ino, bindex,
3073+ file, fxn, line,
3074+ inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
3075+ lower_inode->i_mtime.tv_sec,
3076+ lower_inode->i_mtime.tv_nsec,
3077+ inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
3078+ lower_inode->i_ctime.tv_sec,
3079+ lower_inode->i_ctime.tv_nsec);
3080+ }
3081+}
3082+
3083+void __show_inode_counts(const struct inode *inode,
3084+ const char *file, const char *fxn, int line)
3085+{
3086+ struct inode *lower_inode;
3087+ int bindex;
3088+
3089+ if (unlikely(!inode)) {
3090+ pr_debug("SiC: Null inode\n");
3091+ return;
3092+ }
3093+ for (bindex = sbstart(inode->i_sb); bindex <= sbend(inode->i_sb);
3094+ bindex++) {
3095+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
3096+ if (unlikely(!lower_inode))
3097+ continue;
3098+ pr_debug("SIC(%lu:%d:%d): lc=%d %s:%s:%d\n",
3099+ inode->i_ino, bindex,
3100+ atomic_read(&(inode)->i_count),
3101+ atomic_read(&(lower_inode)->i_count),
3102+ file, fxn, line);
3103+ }
3104+}
0c5527e5
AM
3105diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
3106new file mode 100644
3107index 0000000..a0c3bba
3108--- /dev/null
3109+++ b/fs/unionfs/dentry.c
2380c486
JR
3110@@ -0,0 +1,397 @@
3111+/*
7670a7fc 3112+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
3113+ * Copyright (c) 2003-2006 Charles P. Wright
3114+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3115+ * Copyright (c) 2005-2006 Junjiro Okajima
3116+ * Copyright (c) 2005 Arun M. Krishnakumar
3117+ * Copyright (c) 2004-2006 David P. Quigley
3118+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3119+ * Copyright (c) 2003 Puja Gupta
3120+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
3121+ * Copyright (c) 2003-2010 Stony Brook University
3122+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
3123+ *
3124+ * This program is free software; you can redistribute it and/or modify
3125+ * it under the terms of the GNU General Public License version 2 as
3126+ * published by the Free Software Foundation.
3127+ */
3128+
3129+#include "union.h"
3130+
3131+bool is_negative_lower(const struct dentry *dentry)
3132+{
3133+ int bindex;
3134+ struct dentry *lower_dentry;
3135+
3136+ BUG_ON(!dentry);
3137+ /* cache coherency: check if file was deleted on lower branch */
3138+ if (dbstart(dentry) < 0)
3139+ return true;
3140+ for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
3141+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3142+ /* unhashed (i.e., unlinked) lower dentries don't count */
3143+ if (lower_dentry && lower_dentry->d_inode &&
3144+ !d_deleted(lower_dentry) &&
3145+ !(lower_dentry->d_flags & DCACHE_NFSFS_RENAMED))
3146+ return false;
3147+ }
3148+ return true;
3149+}
3150+
3151+static inline void __dput_lowers(struct dentry *dentry, int start, int end)
3152+{
3153+ struct dentry *lower_dentry;
3154+ int bindex;
3155+
3156+ if (start < 0)
3157+ return;
3158+ for (bindex = start; bindex <= end; bindex++) {
3159+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3160+ if (!lower_dentry)
3161+ continue;
3162+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
3163+ dput(lower_dentry);
3164+ }
3165+}
3166+
3167+/*
3168+ * Purge and invalidate as many data pages of a unionfs inode as possible.
3169+ * This is called when the lower inode has changed, and we want to force
3170+ * processes to re-get the new data.
3171+ */
3172+static inline void purge_inode_data(struct inode *inode)
3173+{
3174+ /* remove all non-private mappings */
3175+ unmap_mapping_range(inode->i_mapping, 0, 0, 0);
3176+ /* invalidate as many pages as possible */
3177+ invalidate_mapping_pages(inode->i_mapping, 0, -1);
3178+ /*
3179+ * Don't try to truncate_inode_pages here, because this could lead
3180+ * to a deadlock between some of address_space ops and dentry
3181+ * revalidation: the address space op is invoked with a lock on our
3182+ * own page, and truncate_inode_pages will block on locked pages.
3183+ */
3184+}
3185+
3186+/*
3187+ * Revalidate a single file/symlink/special dentry. Assume that info nodes
3188+ * of the @dentry and its @parent are locked. Assume parent is valid,
3189+ * otherwise return false (and let's hope the VFS will try to re-lookup this
3190+ * dentry). Returns true if valid, false otherwise.
3191+ */
3192+bool __unionfs_d_revalidate(struct dentry *dentry, struct dentry *parent,
3193+ bool willwrite)
3194+{
3195+ bool valid = true; /* default is valid */
3196+ struct dentry *lower_dentry;
3197+ struct dentry *result;
3198+ int bindex, bstart, bend;
3199+ int sbgen, dgen, pdgen;
3200+ int positive = 0;
3201+ int interpose_flag;
3202+
3203+ verify_locked(dentry);
3204+ verify_locked(parent);
3205+
3206+ /* if the dentry is unhashed, do NOT revalidate */
3207+ if (d_deleted(dentry))
3208+ goto out;
3209+
3210+ dgen = atomic_read(&UNIONFS_D(dentry)->generation);
3211+
3212+ if (is_newer_lower(dentry)) {
3213+ /* root dentry is always valid */
3214+ if (IS_ROOT(dentry)) {
3215+ unionfs_copy_attr_times(dentry->d_inode);
3216+ } else {
3217+ /*
3218+ * reset generation number to zero, guaranteed to be
3219+ * "old"
3220+ */
3221+ dgen = 0;
3222+ atomic_set(&UNIONFS_D(dentry)->generation, dgen);
3223+ }
3224+ if (!willwrite)
3225+ purge_inode_data(dentry->d_inode);
3226+ }
3227+
3228+ sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation);
3229+
3230+ BUG_ON(dbstart(dentry) == -1);
3231+ if (dentry->d_inode)
3232+ positive = 1;
3233+
3234+ /* if our dentry is valid, then validate all lower ones */
3235+ if (sbgen == dgen)
3236+ goto validate_lowers;
3237+
3238+ /* The root entry should always be valid */
3239+ BUG_ON(IS_ROOT(dentry));
3240+
3241+ /* We can't work correctly if our parent isn't valid. */
3242+ pdgen = atomic_read(&UNIONFS_D(parent)->generation);
3243+
3244+ /* Free the pointers for our inodes and this dentry. */
3245+ path_put_lowers_all(dentry, false);
3246+
3247+ interpose_flag = INTERPOSE_REVAL_NEG;
3248+ if (positive) {
3249+ interpose_flag = INTERPOSE_REVAL;
3250+ iput_lowers_all(dentry->d_inode, true);
3251+ }
3252+
3253+ if (realloc_dentry_private_data(dentry) != 0) {
3254+ valid = false;
3255+ goto out;
3256+ }
3257+
3258+ result = unionfs_lookup_full(dentry, parent, interpose_flag);
3259+ if (result) {
3260+ if (IS_ERR(result)) {
3261+ valid = false;
3262+ goto out;
3263+ }
3264+ /*
3265+ * current unionfs_lookup_backend() doesn't return
3266+ * a valid dentry
3267+ */
3268+ dput(dentry);
3269+ dentry = result;
3270+ }
3271+
3272+ if (unlikely(positive && is_negative_lower(dentry))) {
3273+ /* call make_bad_inode here ? */
3274+ d_drop(dentry);
3275+ valid = false;
3276+ goto out;
3277+ }
3278+
3279+ /*
3280+ * if we got here then we have revalidated our dentry and all lower
3281+ * ones, so we can return safely.
3282+ */
3283+ if (!valid) /* lower dentry revalidation failed */
3284+ goto out;
3285+
3286+ /*
3287+ * If the parent's gen no. matches the superblock's gen no., then
3288+ * we can update our dentry's gen no. If they didn't match, then it
3289+ * was OK to revalidate this dentry with a stale parent, but we'll
3290+ * purposely not update our dentry's gen no. (so it can be redone);
3291+ * and, we'll mark our parent dentry as invalid so it'll force it
3292+ * (and our dentry) to be revalidated.
3293+ */
3294+ if (pdgen == sbgen)
3295+ atomic_set(&UNIONFS_D(dentry)->generation, sbgen);
3296+ goto out;
3297+
3298+validate_lowers:
3299+
3300+ /* The revalidation must occur across all branches */
3301+ bstart = dbstart(dentry);
3302+ bend = dbend(dentry);
3303+ BUG_ON(bstart == -1);
3304+ for (bindex = bstart; bindex <= bend; bindex++) {
3305+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3306+ if (!lower_dentry || !lower_dentry->d_op
3307+ || !lower_dentry->d_op->d_revalidate)
3308+ continue;
3309+ /*
3310+ * Don't pass nameidata to lower file system, because we
3311+ * don't want an arbitrary lower file being opened or
3312+ * returned to us: it may be useless to us because of the
3313+ * fanout nature of unionfs (cf. file/directory open-file
3314+ * invariants). We will open lower files as and when needed
3315+ * later on.
3316+ */
3317+ if (!lower_dentry->d_op->d_revalidate(lower_dentry, NULL))
3318+ valid = false;
3319+ }
3320+
3321+ if (!dentry->d_inode ||
3322+ ibstart(dentry->d_inode) < 0 ||
3323+ ibend(dentry->d_inode) < 0) {
3324+ valid = false;
3325+ goto out;
3326+ }
3327+
3328+ if (valid) {
3329+ /*
3330+ * If we get here, and we copy the meta-data from the lower
3331+ * inode to our inode, then it is vital that we have already
3332+ * purged all unionfs-level file data. We do that earlier in
3333+ * this function (__unionfs_d_revalidate) by calling
3334+ * purge_inode_data.
3335+ */
3336+ unionfs_copy_attr_all(dentry->d_inode,
3337+ unionfs_lower_inode(dentry->d_inode));
3338+ fsstack_copy_inode_size(dentry->d_inode,
3339+ unionfs_lower_inode(dentry->d_inode));
3340+ }
3341+
3342+out:
3343+ return valid;
3344+}
3345+
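The generation-number logic in `__unionfs_d_revalidate` reduces to two small decisions, sketched here in user space. The names `pick_action` and `next_dentry_gen` are hypothetical, and the assertion that the superblock generation is positive is an assumption drawn from the comment that zero is "guaranteed to be old".

```c
#include <assert.h>

enum reval_action {
	REVAL_LOWERS_ONLY,	/* generations match: just revalidate lower dentries */
	REVAL_FULL_LOOKUP	/* stale dentry: drop lower objects and look up again */
};

/* Decide which path to take from the superblock and dentry generations. */
static enum reval_action pick_action(int sbgen, int dgen)
{
	assert(sbgen > 0);	/* assumption: 0 is reserved for "old" */
	return (sbgen == dgen) ? REVAL_LOWERS_ONLY : REVAL_FULL_LOOKUP;
}

/*
 * After a successful full lookup, adopt the superblock generation only
 * when the parent is itself current; otherwise keep the old generation
 * so the dentry is revalidated again later.
 */
static int next_dentry_gen(int sbgen, int dgen, int pdgen)
{
	return (pdgen == sbgen) ? sbgen : dgen;
}
```

A dentry whose generation was reset to 0 always takes the full-lookup path, and it stays at 0 for as long as its parent lags behind the superblock.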
3346+/*
3347+ * Determine if the lower inode objects have changed from below the unionfs
3348+ * inode. Return true if changed, false otherwise.
3349+ *
3350+ * We check if the mtime or ctime have changed. However, the inode times
3351+ * can be changed by anyone without much protection, including
3352+ * asynchronously. This can sometimes cause unionfs to find that the lower
3353+ * file system doesn't change its inode times quickly enough, resulting in a
3354+ * false positive indication (which is harmless, it just makes unionfs do
3355+ * extra work in re-validating the objects). To minimize the chances of
3356+ * these situations, we still consider such small time changes valid, but we
3357+ * don't print debugging messages unless the time changes are greater than
3358+ * UNIONFS_MIN_CC_TIME (which defaults to 3 seconds, as with NFS's acregmin)
3359+ * because significant changes are more likely due to users manually
3360+ * touching lower files.
3361+ */
3362+bool is_newer_lower(const struct dentry *dentry)
3363+{
3364+ int bindex;
3365+ struct inode *inode;
3366+ struct inode *lower_inode;
3367+
3368+ /* ignore if we're called on semi-initialized dentries/inodes */
3369+ if (!dentry || !UNIONFS_D(dentry))
3370+ return false;
3371+ inode = dentry->d_inode;
3372+ if (!inode || !UNIONFS_I(inode)->lower_inodes ||
3373+ ibstart(inode) < 0 || ibend(inode) < 0)
3374+ return false;
3375+
3376+ for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3377+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
3378+ if (!lower_inode)
3379+ continue;
3380+
3381+ /* check if mtime/ctime have changed */
3382+ if (unlikely(timespec_compare(&inode->i_mtime,
3383+ &lower_inode->i_mtime) < 0)) {
3384+ if ((lower_inode->i_mtime.tv_sec -
3385+ inode->i_mtime.tv_sec) > UNIONFS_MIN_CC_TIME) {
3386+ pr_info("unionfs: new lower inode mtime "
3387+ "(bindex=%d, name=%s)\n", bindex,
3388+ dentry->d_name.name);
3389+ show_dinode_times(dentry);
3390+ }
3391+ return true;
3392+ }
3393+ if (unlikely(timespec_compare(&inode->i_ctime,
3394+ &lower_inode->i_ctime) < 0)) {
3395+ if ((lower_inode->i_ctime.tv_sec -
3396+ inode->i_ctime.tv_sec) > UNIONFS_MIN_CC_TIME) {
3397+ pr_info("unionfs: new lower inode ctime "
3398+ "(bindex=%d, name=%s)\n", bindex,
3399+ dentry->d_name.name);
3400+ show_dinode_times(dentry);
3401+ }
3402+ return true;
3403+ }
3404+ }
3405+
3406+ /*
3407+ * Last check: if this is a positive dentry, but somehow all lower
3408+ * dentries are negative or unhashed, then this dentry needs to be
3409+ * revalidated, because someone probably deleted the objects from
3410+ * the lower branches directly.
3411+ */
3412+ if (is_negative_lower(dentry))
3413+ return true;
3414+
3415+ return false; /* default: lower is not newer */
3416+}
3417+
3418+static int unionfs_d_revalidate(struct dentry *dentry,
3419+ struct nameidata *nd_unused)
3420+{
3421+ bool valid = true;
3422+ int err = 1; /* 1 means valid for the VFS */
3423+ struct dentry *parent;
3424+
3425+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3426+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3427+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3428+
3429+ valid = __unionfs_d_revalidate(dentry, parent, false);
3430+ if (valid) {
3431+ unionfs_postcopyup_setmnt(dentry);
3432+ unionfs_check_dentry(dentry);
3433+ } else {
3434+ d_drop(dentry);
3435+ err = valid;
3436+ }
3437+ unionfs_unlock_dentry(dentry);
3438+ unionfs_unlock_parent(dentry, parent);
3439+ unionfs_read_unlock(dentry->d_sb);
3440+
3441+ return err;
3442+}
3443+
3444+static void unionfs_d_release(struct dentry *dentry)
3445+{
3446+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3447+ if (unlikely(!UNIONFS_D(dentry)))
3448+ goto out; /* skip if no lower branches */
3449+ /* must lock our branch configuration here */
3450+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3451+
3452+ unionfs_check_dentry(dentry);
3453+ /* this could be a negative dentry, so check first */
3454+ if (dbstart(dentry) < 0) {
3455+ unionfs_unlock_dentry(dentry);
3456+ goto out; /* due to a (normal) failed lookup */
3457+ }
3458+
3459+ /* Release all the lower dentries */
3460+ path_put_lowers_all(dentry, true);
3461+
3462+ unionfs_unlock_dentry(dentry);
3463+
3464+out:
3465+ free_dentry_private_data(dentry);
3466+ unionfs_read_unlock(dentry->d_sb);
3467+ return;
3468+}
3469+
3470+/*
3471+ * Called when we're removing the last reference to our dentry. So we
3472+ * should drop all lower references too.
3473+ */
3474+static void unionfs_d_iput(struct dentry *dentry, struct inode *inode)
3475+{
3476+ int rc;
3477+
3478+ BUG_ON(!dentry);
3479+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3480+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3481+
3482+ if (!UNIONFS_D(dentry) || dbstart(dentry) < 0)
3483+ goto drop_lower_inodes;
3484+ path_put_lowers_all(dentry, false);
3485+
3486+drop_lower_inodes:
3487+ rc = atomic_read(&inode->i_count);
3488+ if (rc == 1 && inode->i_nlink == 1 && ibstart(inode) >= 0) {
3489+ /* see Documentation/filesystems/unionfs/issues.txt */
3490+ lockdep_off();
3491+ iput(unionfs_lower_inode(inode));
3492+ lockdep_on();
3493+ unionfs_set_lower_inode(inode, NULL);
3494+ /* XXX: may need to set start/end to -1? */
3495+ }
3496+
3497+ iput(inode);
3498+
3499+ unionfs_unlock_dentry(dentry);
3500+ unionfs_read_unlock(dentry->d_sb);
3501+}
3502+
3503+struct dentry_operations unionfs_dops = {
3504+ .d_revalidate = unionfs_d_revalidate,
3505+ .d_release = unionfs_d_release,
3506+ .d_iput = unionfs_d_iput,
3507+};
0c5527e5
AM
3508diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c
3509new file mode 100644
3510index 0000000..7da0ff0
3511--- /dev/null
3512+++ b/fs/unionfs/dirfops.c
2380c486
JR
3513@@ -0,0 +1,302 @@
3514+/*
7670a7fc 3515+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
3516+ * Copyright (c) 2003-2006 Charles P. Wright
3517+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3518+ * Copyright (c) 2005-2006 Junjiro Okajima
3519+ * Copyright (c) 2005 Arun M. Krishnakumar
3520+ * Copyright (c) 2004-2006 David P. Quigley
3521+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3522+ * Copyright (c) 2003 Puja Gupta
3523+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
3524+ * Copyright (c) 2003-2010 Stony Brook University
3525+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
3526+ *
3527+ * This program is free software; you can redistribute it and/or modify
3528+ * it under the terms of the GNU General Public License version 2 as
3529+ * published by the Free Software Foundation.
3530+ */
3531+
3532+#include "union.h"
3533+
3534+/* Make sure our rdstate is playing by the rules. */
3535+static void verify_rdstate_offset(struct unionfs_dir_state *rdstate)
3536+{
3537+ BUG_ON(rdstate->offset >= DIREOF);
3538+ BUG_ON(rdstate->cookie >= MAXRDCOOKIE);
3539+}
3540+
3541+struct unionfs_getdents_callback {
3542+ struct unionfs_dir_state *rdstate;
3543+ void *dirent;
3544+ int entries_written;
3545+ int filldir_called;
3546+ int filldir_error;
3547+ filldir_t filldir;
3548+ struct super_block *sb;
3549+};
3550+
3551+/* based on generic filldir in fs/readdir.c */
3552+static int unionfs_filldir(void *dirent, const char *oname, int namelen,
3553+ loff_t offset, u64 ino, unsigned int d_type)
3554+{
3555+ struct unionfs_getdents_callback *buf = dirent;
3556+ struct filldir_node *found = NULL;
3557+ int err = 0;
3558+ int is_whiteout;
3559+ char *name = (char *) oname;
3560+
3561+ buf->filldir_called++;
3562+
3563+ is_whiteout = is_whiteout_name(&name, &namelen);
3564+
3565+ found = find_filldir_node(buf->rdstate, name, namelen, is_whiteout);
3566+
3567+ if (found) {
3568+ /*
3569+ * If we had a non-whiteout entry in the dir cache, then mark it
3570+ * as a whiteout but leave it in the dir cache.
3571+ */
3572+ if (is_whiteout && !found->whiteout)
3573+ found->whiteout = is_whiteout;
3574+ goto out;
3575+ }
3576+
3577+ /* if 'name' isn't a whiteout, filldir it. */
3578+ if (!is_whiteout) {
3579+ off_t pos = rdstate2offset(buf->rdstate);
3580+ u64 unionfs_ino = ino;
3581+
3582+ err = buf->filldir(buf->dirent, name, namelen, pos,
3583+ unionfs_ino, d_type);
3584+ buf->rdstate->offset++;
3585+ verify_rdstate_offset(buf->rdstate);
3586+ }
3587+ /*
3588+ * If we did fill it, stuff it in our hash, otherwise return an
3589+ * error.
3590+ */
3591+ if (err) {
3592+ buf->filldir_error = err;
3593+ goto out;
3594+ }
3595+ buf->entries_written++;
3596+ err = add_filldir_node(buf->rdstate, name, namelen,
3597+ buf->rdstate->bindex, is_whiteout);
3598+ if (err)
3599+ buf->filldir_error = err;
3600+
3601+out:
3602+ return err;
3603+}
3604+
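The duplicate and whiteout handling in `unionfs_filldir` can be modelled with a small user-space sketch. Everything here is hypothetical: `merge_name`, `struct merge_state`, the fixed-size linear table (the real code uses a hash keyed by name), and the assumed `.wh.` whiteout prefix. Names are fed in branch priority order, highest first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define WH_PREFIX ".wh."	/* assumed whiteout prefix */
#define MAX_SEEN 64		/* arbitrary bound for this sketch */

struct seen_entry {
	const char *name;	/* borrowed pointer; caller keeps the string alive */
	bool whiteout;
};

struct merge_state {
	struct seen_entry seen[MAX_SEEN];
	int nseen;
};

/* If *name carries the whiteout prefix, skip past it and return true. */
static bool strip_whiteout(const char **name)
{
	size_t n = strlen(WH_PREFIX);

	if (strncmp(*name, WH_PREFIX, n) != 0)
		return false;
	*name += n;
	return true;
}

/* Return true if the name should be passed on to the caller's filldir. */
static bool merge_name(struct merge_state *st, const char *name)
{
	bool wh = strip_whiteout(&name);
	int i;

	for (i = 0; i < st->nseen; i++) {
		if (strcmp(st->seen[i].name, name) == 0) {
			/* already seen in a higher branch: never emit twice */
			if (wh)
				st->seen[i].whiteout = true;
			return false;
		}
	}
	assert(st->nseen < MAX_SEEN);
	st->seen[st->nseen].name = name;
	st->seen[st->nseen].whiteout = wh;
	st->nseen++;
	return !wh;	/* whiteouts are remembered but not emitted */
}
```

Feeding `a`, `.wh.b`, `b`, `a` emits only the first `a`: the whiteout hides `b` from the lower branch, and the second `a` is a duplicate.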
3605+static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
3606+{
3607+ int err = 0;
3608+ struct file *lower_file = NULL;
3609+ struct dentry *dentry = file->f_path.dentry;
3610+ struct dentry *parent;
3611+ struct inode *inode = NULL;
3612+ struct unionfs_getdents_callback buf;
3613+ struct unionfs_dir_state *uds;
3614+ int bend;
3615+ loff_t offset;
3616+
3617+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
3618+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3619+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3620+
3621+ err = unionfs_file_revalidate(file, parent, false);
3622+ if (unlikely(err))
3623+ goto out;
3624+
3625+ inode = dentry->d_inode;
3626+
3627+ uds = UNIONFS_F(file)->rdstate;
3628+ if (!uds) {
3629+ if (file->f_pos == DIREOF) {
3630+ goto out;
3631+ } else if (file->f_pos > 0) {
3632+ uds = find_rdstate(inode, file->f_pos);
3633+ if (unlikely(!uds)) {
3634+ err = -ESTALE;
3635+ goto out;
3636+ }
3637+ UNIONFS_F(file)->rdstate = uds;
3638+ } else {
3639+ init_rdstate(file);
3640+ uds = UNIONFS_F(file)->rdstate;
3641+ }
3642+ }
3643+ bend = fbend(file);
3644+
3645+ while (uds->bindex <= bend) {
3646+ lower_file = unionfs_lower_file_idx(file, uds->bindex);
3647+ if (!lower_file) {
3648+ uds->bindex++;
3649+ uds->dirpos = 0;
3650+ continue;
3651+ }
3652+
3653+ /* prepare callback buffer */
3654+ buf.filldir_called = 0;
3655+ buf.filldir_error = 0;
3656+ buf.entries_written = 0;
3657+ buf.dirent = dirent;
3658+ buf.filldir = filldir;
3659+ buf.rdstate = uds;
3660+ buf.sb = inode->i_sb;
3661+
3662+ /* Read starting from where we last left off. */
3663+ offset = vfs_llseek(lower_file, uds->dirpos, SEEK_SET);
3664+ if (offset < 0) {
3665+ err = offset;
3666+ goto out;
3667+ }
3668+ err = vfs_readdir(lower_file, unionfs_filldir, &buf);
3669+
3670+ /* Save the position for when we continue. */
3671+ offset = vfs_llseek(lower_file, 0, SEEK_CUR);
3672+ if (offset < 0) {
3673+ err = offset;
3674+ goto out;
3675+ }
3676+ uds->dirpos = offset;
3677+
3678+ /* Copy the atime. */
3679+ fsstack_copy_attr_atime(inode,
3680+ lower_file->f_path.dentry->d_inode);
3681+
3682+ if (err < 0)
3683+ goto out;
3684+
3685+ if (buf.filldir_error)
3686+ break;
3687+
3688+ if (!buf.entries_written) {
3689+ uds->bindex++;
3690+ uds->dirpos = 0;
3691+ }
3692+ }
3693+
3694+ if (!buf.filldir_error && uds->bindex >= bend) {
3695+ /* Save the number of hash entries for next time. */
3696+ UNIONFS_I(inode)->hashsize = uds->hashentries;
3697+ free_rdstate(uds);
3698+ UNIONFS_F(file)->rdstate = NULL;
3699+ file->f_pos = DIREOF;
3700+ } else {
3701+ file->f_pos = rdstate2offset(uds);
3702+ }
3703+
3704+out:
3705+ if (!err)
3706+ unionfs_check_file(file);
3707+ unionfs_unlock_dentry(dentry);
3708+ unionfs_unlock_parent(dentry, parent);
3709+ unionfs_read_unlock(dentry->d_sb);
3710+ return err;
3711+}
3712+
3713+/*
3714+ * This is not meant to be a generic repositioning function. If you do
3715+ * things that aren't supported, then we return EINVAL.
3716+ *
3717+ * What is allowed:
3718+ * (1) seeking to the same position that you are currently at
3719+ * This really has no effect, but returns where you are.
3720+ * (2) seeking to the beginning of the file
3721+ * This throws out all state, and lets you begin again.
3722+ */
3723+static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
3724+{
3725+ struct unionfs_dir_state *rdstate;
3726+ struct dentry *dentry = file->f_path.dentry;
3727+ struct dentry *parent;
3728+ loff_t err;
3729+
3730+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
3731+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3732+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3733+
3734+ err = unionfs_file_revalidate(file, parent, false);
3735+ if (unlikely(err))
3736+ goto out;
3737+
3738+ rdstate = UNIONFS_F(file)->rdstate;
3739+
3740+ /*
3741+ * we let users seek to their current position, but not anywhere
3742+ * else.
3743+ */
3744+ if (!offset) {
3745+ switch (origin) {
3746+ case SEEK_SET:
3747+ if (rdstate) {
3748+ free_rdstate(rdstate);
3749+ UNIONFS_F(file)->rdstate = NULL;
3750+ }
3751+ init_rdstate(file);
3752+ err = 0;
3753+ break;
3754+ case SEEK_CUR:
3755+ err = file->f_pos;
3756+ break;
3757+ case SEEK_END:
3758+ /* Unsupported, because we would break everything. */
3759+ err = -EINVAL;
3760+ break;
3761+ }
3762+ } else {
3763+ switch (origin) {
3764+ case SEEK_SET:
3765+ if (rdstate) {
3766+ if (offset == rdstate2offset(rdstate))
3767+ err = offset;
3768+ else if (file->f_pos == DIREOF)
3769+ err = DIREOF;
3770+ else
3771+ err = -EINVAL;
3772+ } else {
3773+ struct inode *inode;
3774+ inode = dentry->d_inode;
3775+ rdstate = find_rdstate(inode, offset);
3776+ if (rdstate) {
3777+ UNIONFS_F(file)->rdstate = rdstate;
3778+ err = rdstate->offset;
3779+ } else {
3780+ err = -EINVAL;
3781+ }
3782+ }
3783+ break;
3784+ case SEEK_CUR:
3785+ case SEEK_END:
3786+ /* Unsupported, because we would break everything. */
3787+ err = -EINVAL;
3788+ break;
3789+ }
3790+ }
3791+
3792+out:
3793+ if (!err)
3794+ unionfs_check_file(file);
3795+ unionfs_unlock_dentry(dentry);
3796+ unionfs_unlock_parent(dentry, parent);
3797+ unionfs_read_unlock(dentry->d_sb);
3798+ return err;
3799+}
3800+
3801+/*
3802+ * Trimmed directory operations; we shouldn't pass everything down since
3803+ * we don't want to operate on partial directories.
3804+ */
3805+struct file_operations unionfs_dir_fops = {
3806+ .llseek = unionfs_dir_llseek,
3807+ .read = generic_read_dir,
3808+ .readdir = unionfs_readdir,
3809+ .unlocked_ioctl = unionfs_ioctl,
3810+ .open = unionfs_open,
3811+ .release = unionfs_file_release,
3812+ .flush = unionfs_flush,
3813+ .fsync = unionfs_fsync,
3814+ .fasync = unionfs_fasync,
3815+};
0c5527e5
AM
3816diff --git a/fs/unionfs/dirhelper.c b/fs/unionfs/dirhelper.c
3817new file mode 100644
3818index 0000000..033343b
3819--- /dev/null
3820+++ b/fs/unionfs/dirhelper.c
2380c486
JR
3821@@ -0,0 +1,158 @@
3822+/*
7670a7fc 3823+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
3824+ * Copyright (c) 2003-2006 Charles P. Wright
3825+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3826+ * Copyright (c) 2005-2006 Junjiro Okajima
3827+ * Copyright (c) 2005 Arun M. Krishnakumar
3828+ * Copyright (c) 2004-2006 David P. Quigley
3829+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3830+ * Copyright (c) 2003 Puja Gupta
3831+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
3832+ * Copyright (c) 2003-2010 Stony Brook University
3833+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
3834+ *
3835+ * This program is free software; you can redistribute it and/or modify
3836+ * it under the terms of the GNU General Public License version 2 as
3837+ * published by the Free Software Foundation.
3838+ */
3839+
3840+#include "union.h"
3841+
3842+#define RD_NONE 0
3843+#define RD_CHECK_EMPTY 1
3844+/* The callback structure for check_empty. */
3845+struct unionfs_rdutil_callback {
3846+ int err;
3847+ int filldir_called;
3848+ struct unionfs_dir_state *rdstate;
3849+ int mode;
3850+};
3851+
3852+/* This filldir function makes sure only whiteouts exist within a directory. */
3853+static int readdir_util_callback(void *dirent, const char *oname, int namelen,
3854+ loff_t offset, u64 ino, unsigned int d_type)
3855+{
3856+ int err = 0;
3857+ struct unionfs_rdutil_callback *buf = dirent;
3858+ int is_whiteout;
3859+ struct filldir_node *found;
3860+ char *name = (char *) oname;
3861+
3862+ buf->filldir_called = 1;
3863+
3864+ if (name[0] == '.' && (namelen == 1 ||
3865+ (name[1] == '.' && namelen == 2)))
3866+ goto out;
3867+
3868+ is_whiteout = is_whiteout_name(&name, &namelen);
3869+
3870+ found = find_filldir_node(buf->rdstate, name, namelen, is_whiteout);
3871+ /* If it was found in the table there was a previous whiteout. */
3872+ if (found)
3873+ goto out;
3874+
3875+ /*
3876+ * if it wasn't found and isn't a whiteout, the directory isn't
3877+ * empty.
3878+ */
3879+ err = -ENOTEMPTY;
3880+ if ((buf->mode == RD_CHECK_EMPTY) && !is_whiteout)
3881+ goto out;
3882+
3883+ err = add_filldir_node(buf->rdstate, name, namelen,
3884+ buf->rdstate->bindex, is_whiteout);
3885+
3886+out:
3887+ buf->err = err;
3888+ return err;
3889+}
3890+
3891+/* Is a directory logically empty? */
3892+int check_empty(struct dentry *dentry, struct dentry *parent,
3893+ struct unionfs_dir_state **namelist)
3894+{
3895+ int err = 0;
3896+ struct dentry *lower_dentry = NULL;
3897+ struct vfsmount *mnt;
3898+ struct super_block *sb;
3899+ struct file *lower_file;
3900+ struct unionfs_rdutil_callback *buf = NULL;
3901+ int bindex, bstart, bend, bopaque;
3902+
3903+ sb = dentry->d_sb;
3904+
3905+
3906+ BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
3907+
3908+ err = unionfs_partial_lookup(dentry, parent);
3909+ if (err)
3910+ goto out;
3911+
3912+ bstart = dbstart(dentry);
3913+ bend = dbend(dentry);
3914+ bopaque = dbopaque(dentry);
3915+ if (0 <= bopaque && bopaque < bend)
3916+ bend = bopaque;
3917+
3918+ buf = kmalloc(sizeof(struct unionfs_rdutil_callback), GFP_KERNEL);
3919+ if (unlikely(!buf)) {
3920+ err = -ENOMEM;
3921+ goto out;
3922+ }
3923+ buf->err = 0;
3924+ buf->mode = RD_CHECK_EMPTY;
3925+ buf->rdstate = alloc_rdstate(dentry->d_inode, bstart);
3926+ if (unlikely(!buf->rdstate)) {
3927+ err = -ENOMEM;
3928+ goto out;
3929+ }
3930+
3931+ /* Process the lower directories with rdutil_callback as a filldir. */
3932+ for (bindex = bstart; bindex <= bend; bindex++) {
3933+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3934+ if (!lower_dentry)
3935+ continue;
3936+ if (!lower_dentry->d_inode)
3937+ continue;
3938+ if (!S_ISDIR(lower_dentry->d_inode->i_mode))
3939+ continue;
3940+
3941+ dget(lower_dentry);
3942+ mnt = unionfs_mntget(dentry, bindex);
3943+ branchget(sb, bindex);
3944+ lower_file = dentry_open(lower_dentry, mnt, O_RDONLY, current_cred());
3945+ if (IS_ERR(lower_file)) {
3946+ err = PTR_ERR(lower_file);
3947+ branchput(sb, bindex);
3948+ goto out;
3949+ }
3950+
3951+ do {
3952+ buf->filldir_called = 0;
3953+ buf->rdstate->bindex = bindex;
3954+ err = vfs_readdir(lower_file,
3955+ readdir_util_callback, buf);
3956+ if (buf->err)
3957+ err = buf->err;
3958+ } while ((err >= 0) && buf->filldir_called);
3959+
3960+ /* fput calls dput for lower_dentry */
3961+ fput(lower_file);
3962+ branchput(sb, bindex);
3963+
3964+ if (err < 0)
3965+ goto out;
3966+ }
3967+
3968+out:
3969+ if (buf) {
3970+ if (namelist && !err)
3971+ *namelist = buf->rdstate;
3972+ else if (buf->rdstate)
3973+ free_rdstate(buf->rdstate);
3974+ kfree(buf);
3975+ }
3976+
3977+
3978+ return err;
3979+}
0c5527e5
AM
3980diff --git a/fs/unionfs/fanout.h b/fs/unionfs/fanout.h
3981new file mode 100644
3982index 0000000..5b77eac
3983--- /dev/null
3984+++ b/fs/unionfs/fanout.h
2380c486
JR
3985@@ -0,0 +1,407 @@
3986+/*
7670a7fc 3987+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
3988+ * Copyright (c) 2003-2006 Charles P. Wright
3989+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3990+ * Copyright (c) 2005 Arun M. Krishnakumar
3991+ * Copyright (c) 2004-2006 David P. Quigley
3992+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3993+ * Copyright (c) 2003 Puja Gupta
3994+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
3995+ * Copyright (c) 2003-2010 Stony Brook University
3996+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
3997+ *
3998+ * This program is free software; you can redistribute it and/or modify
3999+ * it under the terms of the GNU General Public License version 2 as
4000+ * published by the Free Software Foundation.
4001+ */
4002+
4003+#ifndef _FANOUT_H_
4004+#define _FANOUT_H_
4005+
4006+/*
4007+ * Inode to private data
4008+ *
4009+ * Since we use containers and the struct inode is _inside_ the
4010+ * unionfs_inode_info structure, UNIONFS_I will always (given a non-NULL
4011+ * inode pointer) return a valid non-NULL pointer.
4012+ */
4013+static inline struct unionfs_inode_info *UNIONFS_I(const struct inode *inode)
4014+{
4015+ return container_of(inode, struct unionfs_inode_info, vfs_inode);
4016+}
4017+
4018+#define ibstart(ino) (UNIONFS_I(ino)->bstart)
4019+#define ibend(ino) (UNIONFS_I(ino)->bend)
4020+
4021+/* Dentry to private data */
4022+#define UNIONFS_D(dent) ((struct unionfs_dentry_info *)(dent)->d_fsdata)
4023+#define dbstart(dent) (UNIONFS_D(dent)->bstart)
4024+#define dbend(dent) (UNIONFS_D(dent)->bend)
4025+#define dbopaque(dent) (UNIONFS_D(dent)->bopaque)
4026+
4027+/* Superblock to private data */
4028+#define UNIONFS_SB(super) ((struct unionfs_sb_info *)(super)->s_fs_info)
4029+#define sbstart(sb) 0
4030+#define sbend(sb) (UNIONFS_SB(sb)->bend)
4031+#define sbmax(sb) (UNIONFS_SB(sb)->bend + 1)
4032+#define sbhbid(sb) (UNIONFS_SB(sb)->high_branch_id)
4033+
4034+/* File to private Data */
4035+#define UNIONFS_F(file) ((struct unionfs_file_info *)((file)->private_data))
4036+#define fbstart(file) (UNIONFS_F(file)->bstart)
4037+#define fbend(file) (UNIONFS_F(file)->bend)
4038+
4039+/* macros to manipulate branch IDs stored in our superblock */
4040+static inline int branch_id(struct super_block *sb, int index)
4041+{
4042+ BUG_ON(!sb || index < 0);
4043+ return UNIONFS_SB(sb)->data[index].branch_id;
4044+}
4045+
4046+static inline void set_branch_id(struct super_block *sb, int index, int val)
4047+{
4048+ BUG_ON(!sb || index < 0);
4049+ UNIONFS_SB(sb)->data[index].branch_id = val;
4050+}
4051+
4052+static inline void new_branch_id(struct super_block *sb, int index)
4053+{
4054+ BUG_ON(!sb || index < 0);
4055+ set_branch_id(sb, index, ++UNIONFS_SB(sb)->high_branch_id);
4056+}
4057+
4058+/*
4059+ * Find new index of matching branch with an existing superblock of a known
4060+ * (possibly old) id. This is needed because branches could have been
4061+ * added/deleted causing the branches of any open files to shift.
4062+ *
4063+ * @sb: the new superblock which may have new/different branch IDs
4064+ * @id: the old/existing id we're looking for
4065+ * Returns index of newly found branch (0 or greater), -1 otherwise.
4066+ */
4067+static inline int branch_id_to_idx(struct super_block *sb, int id)
4068+{
4069+ int i;
4070+ for (i = 0; i < sbmax(sb); i++) {
4071+ if (branch_id(sb, i) == id)
4072+ return i;
4073+ }
4074+ /* in the non-ODF code, this should really never happen */
4075+ printk(KERN_WARNING "unionfs: cannot find branch with id %d\n", id);
4076+ return -1;
4077+}
4078+
4079+/* File to lower file. */
4080+static inline struct file *unionfs_lower_file(const struct file *f)
4081+{
4082+ BUG_ON(!f);
4083+ return UNIONFS_F(f)->lower_files[fbstart(f)];
4084+}
4085+
4086+static inline struct file *unionfs_lower_file_idx(const struct file *f,
4087+ int index)
4088+{
4089+ BUG_ON(!f || index < 0);
4090+ return UNIONFS_F(f)->lower_files[index];
4091+}
4092+
4093+static inline void unionfs_set_lower_file_idx(struct file *f, int index,
4094+ struct file *val)
4095+{
4096+ BUG_ON(!f || index < 0);
4097+ UNIONFS_F(f)->lower_files[index] = val;
4098+ /* save branch ID (may be redundant?) */
4099+ UNIONFS_F(f)->saved_branch_ids[index] =
4100+ branch_id((f)->f_path.dentry->d_sb, index);
4101+}
4102+
4103+static inline void unionfs_set_lower_file(struct file *f, struct file *val)
4104+{
4105+ BUG_ON(!f);
4106+ unionfs_set_lower_file_idx((f), fbstart(f), (val));
4107+}
4108+
4109+/* Inode to lower inode. */
4110+static inline struct inode *unionfs_lower_inode(const struct inode *i)
4111+{
4112+ BUG_ON(!i);
4113+ return UNIONFS_I(i)->lower_inodes[ibstart(i)];
4114+}
4115+
4116+static inline struct inode *unionfs_lower_inode_idx(const struct inode *i,
4117+ int index)
4118+{
4119+ BUG_ON(!i || index < 0);
4120+ return UNIONFS_I(i)->lower_inodes[index];
4121+}
4122+
4123+static inline void unionfs_set_lower_inode_idx(struct inode *i, int index,
4124+ struct inode *val)
4125+{
4126+ BUG_ON(!i || index < 0);
4127+ UNIONFS_I(i)->lower_inodes[index] = val;
4128+}
4129+
4130+static inline void unionfs_set_lower_inode(struct inode *i, struct inode *val)
4131+{
4132+ BUG_ON(!i);
4133+ UNIONFS_I(i)->lower_inodes[ibstart(i)] = val;
4134+}
4135+
4136+/* Superblock to lower superblock. */
4137+static inline struct super_block *unionfs_lower_super(
4138+ const struct super_block *sb)
4139+{
4140+ BUG_ON(!sb);
4141+ return UNIONFS_SB(sb)->data[sbstart(sb)].sb;
4142+}
4143+
4144+static inline struct super_block *unionfs_lower_super_idx(
4145+ const struct super_block *sb,
4146+ int index)
4147+{
4148+ BUG_ON(!sb || index < 0);
4149+ return UNIONFS_SB(sb)->data[index].sb;
4150+}
4151+
4152+static inline void unionfs_set_lower_super_idx(struct super_block *sb,
4153+ int index,
4154+ struct super_block *val)
4155+{
4156+ BUG_ON(!sb || index < 0);
4157+ UNIONFS_SB(sb)->data[index].sb = val;
4158+}
4159+
4160+static inline void unionfs_set_lower_super(struct super_block *sb,
4161+ struct super_block *val)
4162+{
4163+ BUG_ON(!sb);
4164+ UNIONFS_SB(sb)->data[sbstart(sb)].sb = val;
4165+}
4166+
4167+/* Branch count macros. */
4168+static inline int branch_count(const struct super_block *sb, int index)
4169+{
4170+ BUG_ON(!sb || index < 0);
4171+ return atomic_read(&UNIONFS_SB(sb)->data[index].open_files);
4172+}
4173+
4174+static inline void set_branch_count(struct super_block *sb, int index, int val)
4175+{
4176+ BUG_ON(!sb || index < 0);
4177+ atomic_set(&UNIONFS_SB(sb)->data[index].open_files, val);
4178+}
4179+
4180+static inline void branchget(struct super_block *sb, int index)
4181+{
4182+ BUG_ON(!sb || index < 0);
4183+ atomic_inc(&UNIONFS_SB(sb)->data[index].open_files);
4184+}
4185+
4186+static inline void branchput(struct super_block *sb, int index)
4187+{
4188+ BUG_ON(!sb || index < 0);
4189+ atomic_dec(&UNIONFS_SB(sb)->data[index].open_files);
4190+}
4191+
4192+/* Dentry macros */
4193+static inline void unionfs_set_lower_dentry_idx(struct dentry *dent, int index,
4194+ struct dentry *val)
4195+{
4196+ BUG_ON(!dent || index < 0);
4197+ UNIONFS_D(dent)->lower_paths[index].dentry = val;
4198+}
4199+
4200+static inline struct dentry *unionfs_lower_dentry_idx(
4201+ const struct dentry *dent,
4202+ int index)
4203+{
4204+ BUG_ON(!dent || index < 0);
4205+ return UNIONFS_D(dent)->lower_paths[index].dentry;
4206+}
4207+
4208+static inline struct dentry *unionfs_lower_dentry(const struct dentry *dent)
4209+{
4210+ BUG_ON(!dent);
4211+ return unionfs_lower_dentry_idx(dent, dbstart(dent));
4212+}
4213+
4214+static inline void unionfs_set_lower_mnt_idx(struct dentry *dent, int index,
4215+ struct vfsmount *mnt)
4216+{
4217+ BUG_ON(!dent || index < 0);
4218+ UNIONFS_D(dent)->lower_paths[index].mnt = mnt;
4219+}
4220+
4221+static inline struct vfsmount *unionfs_lower_mnt_idx(
4222+ const struct dentry *dent,
4223+ int index)
4224+{
4225+ BUG_ON(!dent || index < 0);
4226+ return UNIONFS_D(dent)->lower_paths[index].mnt;
4227+}
4228+
4229+static inline struct vfsmount *unionfs_lower_mnt(const struct dentry *dent)
4230+{
4231+ BUG_ON(!dent);
4232+ return unionfs_lower_mnt_idx(dent, dbstart(dent));
4233+}
4234+
4235+/* Macros for locking a dentry. */
4236+enum unionfs_dentry_lock_class {
4237+ UNIONFS_DMUTEX_NORMAL,
4238+ UNIONFS_DMUTEX_ROOT,
4239+ UNIONFS_DMUTEX_PARENT,
4240+ UNIONFS_DMUTEX_CHILD,
4241+ UNIONFS_DMUTEX_WHITEOUT,
4242+ UNIONFS_DMUTEX_REVAL_PARENT, /* for file/dentry revalidate */
4243+ UNIONFS_DMUTEX_REVAL_CHILD, /* for file/dentry revalidate */
4244+};
4245+
4246+static inline void unionfs_lock_dentry(struct dentry *d,
4247+ unsigned int subclass)
4248+{
4249+ BUG_ON(!d);
4250+ mutex_lock_nested(&UNIONFS_D(d)->lock, subclass);
4251+}
4252+
4253+static inline void unionfs_unlock_dentry(struct dentry *d)
4254+{
4255+ BUG_ON(!d);
4256+ mutex_unlock(&UNIONFS_D(d)->lock);
4257+}
4258+
4259+static inline struct dentry *unionfs_lock_parent(struct dentry *d,
4260+ unsigned int subclass)
4261+{
4262+ struct dentry *p;
4263+
4264+ BUG_ON(!d);
4265+ p = dget_parent(d);
4266+ if (p != d)
4267+ mutex_lock_nested(&UNIONFS_D(p)->lock, subclass);
4268+ return p;
4269+}
4270+
4271+static inline void unionfs_unlock_parent(struct dentry *d, struct dentry *p)
4272+{
4273+ BUG_ON(!d);
4274+ BUG_ON(!p);
4275+ if (p != d) {
4276+ BUG_ON(!mutex_is_locked(&UNIONFS_D(p)->lock));
4277+ mutex_unlock(&UNIONFS_D(p)->lock);
4278+ }
4279+ dput(p);
4280+}
4281+
4282+static inline void verify_locked(struct dentry *d)
4283+{
4284+ BUG_ON(!d);
4285+ BUG_ON(!mutex_is_locked(&UNIONFS_D(d)->lock));
4286+}
4287+
4288+/* macros to put lower objects */
4289+
4290+/*
4291+ * iput lower inodes of a unionfs inode, from bstart to bend. If
4292+ * @free_lower is true, then also kfree the memory used to hold the lower
4293+ * object pointers.
4294+ */
4295+static inline void iput_lowers(struct inode *inode,
4296+ int bstart, int bend, bool free_lower)
4297+{
4298+ struct inode *lower_inode;
4299+ int bindex;
4300+
4301+ BUG_ON(!inode);
4302+ BUG_ON(!UNIONFS_I(inode));
4303+ BUG_ON(bstart < 0);
4304+
4305+ for (bindex = bstart; bindex <= bend; bindex++) {
4306+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
4307+ if (lower_inode) {
4308+ unionfs_set_lower_inode_idx(inode, bindex, NULL);
4309+ /* see Documentation/filesystems/unionfs/issues.txt */
4310+ lockdep_off();
4311+ iput(lower_inode);
4312+ lockdep_on();
4313+ }
4314+ }
4315+
4316+ if (free_lower) {
4317+ kfree(UNIONFS_I(inode)->lower_inodes);
4318+ UNIONFS_I(inode)->lower_inodes = NULL;
4319+ }
4320+}
4321+
4322+/* iput all lower inodes, and reset start/end branch indices to -1 */
4323+static inline void iput_lowers_all(struct inode *inode, bool free_lower)
4324+{
4325+ int bstart, bend;
4326+
4327+ BUG_ON(!inode);
4328+ BUG_ON(!UNIONFS_I(inode));
4329+ bstart = ibstart(inode);
4330+ bend = ibend(inode);
4331+ BUG_ON(bstart < 0);
4332+
4333+ iput_lowers(inode, bstart, bend, free_lower);
4334+ ibstart(inode) = ibend(inode) = -1;
4335+}
4336+
4337+/*
4338+ * dput/mntput all lower dentries and vfsmounts of a unionfs dentry, from
4339+ * bstart to bend. If @free_lower is true, then also kfree the memory used
4340+ * to hold the lower object pointers.
4341+ *
4342+ * XXX: implement using path_put VFS macros
4343+ */
4344+static inline void path_put_lowers(struct dentry *dentry,
4345+ int bstart, int bend, bool free_lower)
4346+{
4347+ struct dentry *lower_dentry;
4348+ struct vfsmount *lower_mnt;
4349+ int bindex;
4350+
4351+ BUG_ON(!dentry);
4352+ BUG_ON(!UNIONFS_D(dentry));
4353+ BUG_ON(bstart < 0);
4354+
4355+ for (bindex = bstart; bindex <= bend; bindex++) {
4356+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4357+ if (lower_dentry) {
4358+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
4359+ dput(lower_dentry);
4360+ }
4361+ lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
4362+ if (lower_mnt) {
4363+ unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
4364+ mntput(lower_mnt);
4365+ }
4366+ }
4367+
4368+ if (free_lower) {
4369+ kfree(UNIONFS_D(dentry)->lower_paths);
4370+ UNIONFS_D(dentry)->lower_paths = NULL;
4371+ }
4372+}
4373+
4374+/*
4375+ * dput/mntput all lower dentries and vfsmounts, and reset start/end branch
4376+ * indices to -1.
4377+ */
4378+static inline void path_put_lowers_all(struct dentry *dentry, bool free_lower)
4379+{
4380+ int bstart, bend;
4381+
4382+ BUG_ON(!dentry);
4383+ BUG_ON(!UNIONFS_D(dentry));
4384+ bstart = dbstart(dentry);
4385+ bend = dbend(dentry);
4386+ BUG_ON(bstart < 0);
4387+
4388+ path_put_lowers(dentry, bstart, bend, free_lower);
4389+ dbstart(dentry) = dbend(dentry) = -1;
4390+}
4391+
4392+#endif /* not _FANOUT_H_ */
0c5527e5
AM
4393diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
4394new file mode 100644
4395index 0000000..1c694c3
4396--- /dev/null
4397+++ b/fs/unionfs/file.c
4398@@ -0,0 +1,382 @@
2380c486 4399+/*
7670a7fc 4400+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
4401+ * Copyright (c) 2003-2006 Charles P. Wright
4402+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4403+ * Copyright (c) 2005-2006 Junjiro Okajima
4404+ * Copyright (c) 2005 Arun M. Krishnakumar
4405+ * Copyright (c) 2004-2006 David P. Quigley
4406+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4407+ * Copyright (c) 2003 Puja Gupta
4408+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
4409+ * Copyright (c) 2003-2010 Stony Brook University
4410+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
4411+ *
4412+ * This program is free software; you can redistribute it and/or modify
4413+ * it under the terms of the GNU General Public License version 2 as
4414+ * published by the Free Software Foundation.
4415+ */
4416+
4417+#include "union.h"
4418+
4419+static ssize_t unionfs_read(struct file *file, char __user *buf,
4420+ size_t count, loff_t *ppos)
4421+{
4422+ int err;
4423+ struct file *lower_file;
4424+ struct dentry *dentry = file->f_path.dentry;
4425+ struct dentry *parent;
4426+
4427+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4428+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4429+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4430+
4431+ err = unionfs_file_revalidate(file, parent, false);
4432+ if (unlikely(err))
4433+ goto out;
4434+
4435+ lower_file = unionfs_lower_file(file);
4436+ err = vfs_read(lower_file, buf, count, ppos);
4437+ /* update our inode atime upon a successful lower read */
4438+ if (err >= 0) {
4439+ fsstack_copy_attr_atime(dentry->d_inode,
4440+ lower_file->f_path.dentry->d_inode);
4441+ unionfs_check_file(file);
4442+ }
4443+
4444+out:
4445+ unionfs_unlock_dentry(dentry);
4446+ unionfs_unlock_parent(dentry, parent);
4447+ unionfs_read_unlock(dentry->d_sb);
4448+ return err;
4449+}
4450+
4451+static ssize_t unionfs_write(struct file *file, const char __user *buf,
4452+ size_t count, loff_t *ppos)
4453+{
4454+ int err = 0;
4455+ struct file *lower_file;
4456+ struct dentry *dentry = file->f_path.dentry;
4457+ struct dentry *parent;
4458+
4459+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4460+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4461+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4462+
4463+ err = unionfs_file_revalidate(file, parent, true);
4464+ if (unlikely(err))
4465+ goto out;
4466+
4467+ lower_file = unionfs_lower_file(file);
4468+ err = vfs_write(lower_file, buf, count, ppos);
4469+ /* update our inode times+sizes upon a successful lower write */
4470+ if (err >= 0) {
4471+ fsstack_copy_inode_size(dentry->d_inode,
4472+ lower_file->f_path.dentry->d_inode);
4473+ fsstack_copy_attr_times(dentry->d_inode,
4474+ lower_file->f_path.dentry->d_inode);
4475+ UNIONFS_F(file)->wrote_to_file = true; /* for delayed copyup */
4476+ unionfs_check_file(file);
4477+ }
4478+
4479+out:
4480+ unionfs_unlock_dentry(dentry);
4481+ unionfs_unlock_parent(dentry, parent);
4482+ unionfs_read_unlock(dentry->d_sb);
4483+ return err;
4484+}
4485+
4486+static int unionfs_file_readdir(struct file *file, void *dirent,
4487+ filldir_t filldir)
4488+{
4489+ return -ENOTDIR;
4490+}
4491+
4492+static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
4493+{
4494+ int err = 0;
4495+ bool willwrite;
4496+ struct file *lower_file;
4497+ struct dentry *dentry = file->f_path.dentry;
4498+ struct dentry *parent;
7670a7fc 4499+ const struct vm_operations_struct *saved_vm_ops = NULL;
2380c486
JR
4500+
4501+ /*
4502+ * Since mm/memory.c:might_fault() (under PROVE_LOCKING) was
4503+ * modified in 2.6.29-rc1 to call might_lock_read on mmap_sem, this
4504+ * has been causing false positives in file system stacking layers.
4505+ * In particular, our ->mmap is called after sys_mmap2 already holds
4506+ * mmap_sem, then we lock our own mutexes; but earlier, it's
4507+ * possible for lockdep to have locked our mutexes first, and then
4508+ * we call a lower ->readdir which could call might_fault. The
4509+ * different ordering of the locks is what lockdep complains about
4510+ * -- unnecessarily. Therefore, we have no choice but to
4511+ * temporarily turn off lockdep here. Note: the comments
4512+ * inside might_sleep also suggest that it would have been
4513+ * nicer to only annotate paths that need that might_lock_read.
4514+ */
4515+ lockdep_off();
4516+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4517+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4518+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4519+
4520+ /* This might be deferred to mmap's writepage */
4521+ willwrite = ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags);
4522+ err = unionfs_file_revalidate(file, parent, willwrite);
4523+ if (unlikely(err))
4524+ goto out;
4525+ unionfs_check_file(file);
4526+
4527+ /*
4528+ * File systems which do not implement ->writepage may use
4529+ * generic_file_readonly_mmap as their ->mmap op. If you call
4530+ * generic_file_readonly_mmap with VM_WRITE, you'd get an -EINVAL.
4531+ * But we cannot call the lower ->mmap op, so we can't tell that
4532+ * writeable mappings won't work. Therefore, our only choice is to
4533+ * check if the lower file system supports the ->writepage, and if
4534+ * not, return EINVAL (the same error that
4535+ * generic_file_readonly_mmap returns in that case).
4536+ */
4537+ lower_file = unionfs_lower_file(file);
4538+ if (willwrite && !lower_file->f_mapping->a_ops->writepage) {
4539+ err = -EINVAL;
4540+ printk(KERN_ERR "unionfs: branch %d file system does not "
4541+ "support writeable mmap\n", fbstart(file));
4542+ goto out;
4543+ }
4544+
4545+ /*
4546+ * find and save lower vm_ops.
4547+ *
4548+ * XXX: the VFS should have a cleaner way of finding the lower vm_ops
4549+ */
4550+ if (!UNIONFS_F(file)->lower_vm_ops) {
4551+ err = lower_file->f_op->mmap(lower_file, vma);
4552+ if (err) {
4553+ printk(KERN_ERR "unionfs: lower mmap failed %d\n", err);
4554+ goto out;
4555+ }
4556+ saved_vm_ops = vma->vm_ops;
4557+ err = do_munmap(current->mm, vma->vm_start,
4558+ vma->vm_end - vma->vm_start);
4559+ if (err) {
4560+ printk(KERN_ERR "unionfs: do_munmap failed %d\n", err);
4561+ goto out;
4562+ }
4563+ }
4564+
4565+ file->f_mapping->a_ops = &unionfs_dummy_aops;
4566+ err = generic_file_mmap(file, vma);
4567+ file->f_mapping->a_ops = &unionfs_aops;
4568+ if (err) {
4569+ printk(KERN_ERR "unionfs: generic_file_mmap failed %d\n", err);
4570+ goto out;
4571+ }
4572+ vma->vm_ops = &unionfs_vm_ops;
4573+ if (!UNIONFS_F(file)->lower_vm_ops)
4574+ UNIONFS_F(file)->lower_vm_ops = saved_vm_ops;
4575+
4576+out:
4577+ if (!err) {
4578+ /* copyup could cause parent dir times to change */
4579+ unionfs_copy_attr_times(parent->d_inode);
4580+ unionfs_check_file(file);
4581+ }
4582+ unionfs_unlock_dentry(dentry);
4583+ unionfs_unlock_parent(dentry, parent);
4584+ unionfs_read_unlock(dentry->d_sb);
4585+ lockdep_on();
4586+ return err;
4587+}
4588+
0c5527e5 4589+int unionfs_fsync(struct file *file, int datasync)
2380c486
JR
4590+{
4591+ int bindex, bstart, bend;
4592+ struct file *lower_file;
0c5527e5 4593+ struct dentry *dentry = file->f_path.dentry;
2380c486
JR
4594+ struct dentry *lower_dentry;
4595+ struct dentry *parent;
4596+ struct inode *lower_inode, *inode;
4597+ int err = -EINVAL;
4598+
4599+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4600+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4601+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4602+
4603+ err = unionfs_file_revalidate(file, parent, true);
4604+ if (unlikely(err))
4605+ goto out;
4606+ unionfs_check_file(file);
4607+
4608+ bstart = fbstart(file);
4609+ bend = fbend(file);
4610+ if (bstart < 0 || bend < 0)
4611+ goto out;
4612+
4613+ inode = dentry->d_inode;
4614+ if (unlikely(!inode)) {
4615+ printk(KERN_ERR
4616+ "unionfs: null inode in unionfs_fsync\n");
4617+ goto out;
4618+ }
4619+ for (bindex = bstart; bindex <= bend; bindex++) {
4620+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
4621+ if (!lower_inode || !lower_inode->i_fop->fsync)
4622+ continue;
4623+ lower_file = unionfs_lower_file_idx(file, bindex);
4624+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4625+ mutex_lock(&lower_inode->i_mutex);
0c5527e5 4626+ err = lower_inode->i_fop->fsync(lower_file, datasync);
2380c486
JR
4627+ if (!err && bindex == bstart)
4628+ fsstack_copy_attr_times(inode, lower_inode);
4629+ mutex_unlock(&lower_inode->i_mutex);
4630+ if (err)
4631+ goto out;
4632+ }
4633+
4634+out:
4635+ if (!err)
4636+ unionfs_check_file(file);
4637+ unionfs_unlock_dentry(dentry);
4638+ unionfs_unlock_parent(dentry, parent);
4639+ unionfs_read_unlock(dentry->d_sb);
4640+ return err;
4641+}
4642+
4643+int unionfs_fasync(int fd, struct file *file, int flag)
4644+{
4645+ int bindex, bstart, bend;
4646+ struct file *lower_file;
4647+ struct dentry *dentry = file->f_path.dentry;
4648+ struct dentry *parent;
4649+ struct inode *lower_inode, *inode;
4650+ int err = 0;
4651+
4652+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4653+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4654+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4655+
4656+ err = unionfs_file_revalidate(file, parent, true);
4657+ if (unlikely(err))
4658+ goto out;
4659+ unionfs_check_file(file);
4660+
4661+ bstart = fbstart(file);
4662+ bend = fbend(file);
4663+ if (bstart < 0 || bend < 0)
4664+ goto out;
4665+
4666+ inode = dentry->d_inode;
4667+ if (unlikely(!inode)) {
4668+ printk(KERN_ERR
4669+ "unionfs: null inode in unionfs_fasync\n");
4670+ goto out;
4671+ }
4672+ for (bindex = bstart; bindex <= bend; bindex++) {
4673+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
4674+ if (!lower_inode || !lower_inode->i_fop->fasync)
4675+ continue;
4676+ lower_file = unionfs_lower_file_idx(file, bindex);
4677+ mutex_lock(&lower_inode->i_mutex);
4678+ err = lower_inode->i_fop->fasync(fd, lower_file, flag);
4679+ if (!err && bindex == bstart)
4680+ fsstack_copy_attr_times(inode, lower_inode);
4681+ mutex_unlock(&lower_inode->i_mutex);
4682+ if (err)
4683+ goto out;
4684+ }
4685+
4686+out:
4687+ if (!err)
4688+ unionfs_check_file(file);
4689+ unionfs_unlock_dentry(dentry);
4690+ unionfs_unlock_parent(dentry, parent);
4691+ unionfs_read_unlock(dentry->d_sb);
4692+ return err;
4693+}
4694+
4695+static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
4696+ struct pipe_inode_info *pipe, size_t len,
4697+ unsigned int flags)
4698+{
4699+ ssize_t err;
4700+ struct file *lower_file;
4701+ struct dentry *dentry = file->f_path.dentry;
4702+ struct dentry *parent;
4703+
4704+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4705+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4706+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4707+
4708+ err = unionfs_file_revalidate(file, parent, false);
4709+ if (unlikely(err))
4710+ goto out;
4711+
4712+ lower_file = unionfs_lower_file(file);
4713+ err = vfs_splice_to(lower_file, ppos, pipe, len, flags);
4714+ /* update our inode atime upon a successful lower splice-read */
4715+ if (err >= 0) {
4716+ fsstack_copy_attr_atime(dentry->d_inode,
4717+ lower_file->f_path.dentry->d_inode);
4718+ unionfs_check_file(file);
4719+ }
4720+
4721+out:
4722+ unionfs_unlock_dentry(dentry);
4723+ unionfs_unlock_parent(dentry, parent);
4724+ unionfs_read_unlock(dentry->d_sb);
4725+ return err;
4726+}
4727+
4728+static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
4729+ struct file *file, loff_t *ppos,
4730+ size_t len, unsigned int flags)
4731+{
4732+ ssize_t err = 0;
4733+ struct file *lower_file;
4734+ struct dentry *dentry = file->f_path.dentry;
4735+ struct dentry *parent;
4736+
4737+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4738+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4739+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4740+
4741+ err = unionfs_file_revalidate(file, parent, true);
4742+ if (unlikely(err))
4743+ goto out;
4744+
4745+ lower_file = unionfs_lower_file(file);
4746+ err = vfs_splice_from(pipe, lower_file, ppos, len, flags);
4747+ /* update our inode times+sizes upon a successful lower write */
4748+ if (err >= 0) {
4749+ fsstack_copy_inode_size(dentry->d_inode,
4750+ lower_file->f_path.dentry->d_inode);
4751+ fsstack_copy_attr_times(dentry->d_inode,
4752+ lower_file->f_path.dentry->d_inode);
4753+ unionfs_check_file(file);
4754+ }
4755+
4756+out:
4757+ unionfs_unlock_dentry(dentry);
4758+ unionfs_unlock_parent(dentry, parent);
4759+ unionfs_read_unlock(dentry->d_sb);
4760+ return err;
4761+}
4762+
4763+struct file_operations unionfs_main_fops = {
4764+ .llseek = generic_file_llseek,
4765+ .read = unionfs_read,
4766+ .write = unionfs_write,
4767+ .readdir = unionfs_file_readdir,
4768+ .unlocked_ioctl = unionfs_ioctl,
0c5527e5
AM
4769+#ifdef CONFIG_COMPAT
4770+ .compat_ioctl = unionfs_ioctl,
4771+#endif
2380c486
JR
4772+ .mmap = unionfs_mmap,
4773+ .open = unionfs_open,
4774+ .flush = unionfs_flush,
4775+ .release = unionfs_file_release,
4776+ .fsync = unionfs_fsync,
4777+ .fasync = unionfs_fasync,
4778+ .splice_read = unionfs_splice_read,
4779+ .splice_write = unionfs_splice_write,
4780+};
0c5527e5
AM
4781diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
4782new file mode 100644
82260373 4783index 0000000..0066238
0c5527e5
AM
4784--- /dev/null
4785+++ b/fs/unionfs/inode.c
82260373 4786@@ -0,0 +1,1077 @@
2380c486 4787+/*
7670a7fc 4788+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
4789+ * Copyright (c) 2003-2006 Charles P. Wright
4790+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4791+ * Copyright (c) 2005-2006 Junjiro Okajima
4792+ * Copyright (c) 2005 Arun M. Krishnakumar
4793+ * Copyright (c) 2004-2006 David P. Quigley
4794+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4795+ * Copyright (c) 2003 Puja Gupta
4796+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
4797+ * Copyright (c) 2003-2010 Stony Brook University
4798+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
4799+ *
4800+ * This program is free software; you can redistribute it and/or modify
4801+ * it under the terms of the GNU General Public License version 2 as
4802+ * published by the Free Software Foundation.
4803+ */
4804+
4805+#include "union.h"
4806+
4807+/*
4808+ * Find a writeable branch to create a new object in. Checks all writeable
4809+ * branches of the parent inode, in istart to iend order; if none are
4810+ * suitable, also tries branch 0 (which may require a copyup).
4811+ *
4812+ * Return a lower_dentry we can use to create the object in, or an ERR_PTR.
4813+ */
4814+static struct dentry *find_writeable_branch(struct inode *parent,
4815+ struct dentry *dentry)
4816+{
4817+ int err = -EINVAL;
4818+ int bindex, istart, iend;
4819+ struct dentry *lower_dentry = NULL;
4820+
4821+ istart = ibstart(parent);
4822+ iend = ibend(parent);
4823+ if (istart < 0)
4824+ goto out;
4825+
4826+begin:
4827+ for (bindex = istart; bindex <= iend; bindex++) {
4828+ /* skip non-writeable branches */
4829+ err = is_robranch_super(dentry->d_sb, bindex);
4830+ if (err) {
4831+ err = -EROFS;
4832+ continue;
4833+ }
4834+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4835+ if (!lower_dentry)
4836+ continue;
4837+ /*
4838+ * check for whiteouts in writeable branch, and remove them
4839+ * if necessary.
4840+ */
4841+ err = check_unlink_whiteout(dentry, lower_dentry, bindex);
4842+ if (err > 0) /* ignore if whiteout found and removed */
4843+ err = 0;
4844+ if (err)
4845+ continue;
4846+ /* if we get here, we can write to the branch */
4847+ break;
4848+ }
4849+ /*
4850+ * If istart wasn't already branch 0, and we got any error, then try
4851+ * branch 0 (which may require copyup)
4852+ */
4853+ if (err && istart > 0) {
4854+ istart = iend = 0;
4855+ goto begin;
4856+ }
4857+
4858+ /*
4859+ * If we tried even branch 0, and still got an error, abort. But if
4860+ * the error was an EROFS, then we should try to copyup.
4861+ */
4862+ if (err && err != -EROFS)
4863+ goto out;
4864+
4865+ /*
4866+ * If we get here, then check if copyup needed. If lower_dentry is
4867+ * NULL, create the entire dentry directory structure in branch 0.
4868+ */
4869+ if (!lower_dentry) {
4870+ bindex = 0;
4871+ lower_dentry = create_parents(parent, dentry,
4872+ dentry->d_name.name, bindex);
4873+ if (IS_ERR(lower_dentry)) {
4874+ err = PTR_ERR(lower_dentry);
4875+ goto out;
4876+ }
4877+ }
4878+ err = 0; /* all's well */
4879+out:
4880+ if (err)
4881+ return ERR_PTR(err);
4882+ return lower_dentry;
4883+}
4884+
4885+static int unionfs_create(struct inode *dir, struct dentry *dentry,
4886+ int mode, struct nameidata *nd_unused)
4887+{
4888+ int err = 0;
4889+ struct dentry *lower_dentry = NULL;
4890+ struct dentry *lower_parent_dentry = NULL;
4891+ struct dentry *parent;
4892+ int valid = 0;
4893+ struct nameidata lower_nd;
4894+
4895+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
4896+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4897+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4898+
4899+ valid = __unionfs_d_revalidate(dentry, parent, false);
4900+ if (unlikely(!valid)) {
4901+ err = -ESTALE; /* same as what real_lookup does */
4902+ goto out;
4903+ }
4904+
4905+ lower_dentry = find_writeable_branch(dir, dentry);
4906+ if (IS_ERR(lower_dentry)) {
4907+ err = PTR_ERR(lower_dentry);
4908+ goto out;
4909+ }
4910+
4911+ lower_parent_dentry = lock_parent(lower_dentry);
4912+ if (IS_ERR(lower_parent_dentry)) {
4913+ err = PTR_ERR(lower_parent_dentry);
7670a7fc 4914+ goto out_unlock;
2380c486
JR
4915+ }
4916+
4917+ err = init_lower_nd(&lower_nd, LOOKUP_CREATE);
4918+ if (unlikely(err < 0))
7670a7fc 4919+ goto out_unlock;
2380c486
JR
4920+ err = vfs_create(lower_parent_dentry->d_inode, lower_dentry, mode,
4921+ &lower_nd);
4922+ release_lower_nd(&lower_nd, err);
4923+
4924+ if (!err) {
4925+ err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
4926+ if (!err) {
4927+ unionfs_copy_attr_times(dir);
4928+ fsstack_copy_inode_size(dir,
4929+ lower_parent_dentry->d_inode);
4930+ /* update no. of links on parent directory */
4931+ dir->i_nlink = unionfs_get_nlinks(dir);
4932+ }
4933+ }
4934+
7670a7fc 4935+out_unlock:
2380c486 4936+ unlock_dir(lower_parent_dentry);
2380c486
JR
4937+out:
4938+ if (!err) {
4939+ unionfs_postcopyup_setmnt(dentry);
4940+ unionfs_check_inode(dir);
4941+ unionfs_check_dentry(dentry);
4942+ }
4943+ unionfs_unlock_dentry(dentry);
4944+ unionfs_unlock_parent(dentry, parent);
4945+ unionfs_read_unlock(dentry->d_sb);
4946+ return err;
4947+}
4948+
4949+/*
4950+ * unionfs_lookup is the only special function which takes a dentry, yet we
4951+ * do NOT want to call __unionfs_d_revalidate_chain because by definition,
4952+ * we don't have a valid dentry here yet.
4953+ */
4954+static struct dentry *unionfs_lookup(struct inode *dir,
4955+ struct dentry *dentry,
4956+ struct nameidata *nd_unused)
4957+{
4958+ struct dentry *ret, *parent;
4959+ int err = 0;
4960+
4961+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
4962+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4963+
4964+ /*
4965+ * As long as we lock/dget the parent, we can skip validating the
4966+ * parent now; we may have to rebuild this dentry on the next
4967+ * ->d_revalidate, however.
4968+ */
4969+
4970+ /* allocate dentry private data. We free it in ->d_release */
4971+ err = new_dentry_private_data(dentry, UNIONFS_DMUTEX_CHILD);
4972+ if (unlikely(err)) {
4973+ ret = ERR_PTR(err);
4974+ goto out;
4975+ }
4976+
4977+ ret = unionfs_lookup_full(dentry, parent, INTERPOSE_LOOKUP);
4978+
4979+ if (!IS_ERR(ret)) {
4980+ if (ret)
4981+ dentry = ret;
4982+ /* lookup_full can return multiple positive dentries */
4983+ if (dentry->d_inode && !S_ISDIR(dentry->d_inode->i_mode)) {
4984+ BUG_ON(dbstart(dentry) < 0);
4985+ unionfs_postcopyup_release(dentry);
4986+ }
4987+ unionfs_copy_attr_times(dentry->d_inode);
4988+ }
4989+
4990+ unionfs_check_inode(dir);
4991+ if (!IS_ERR(ret))
4992+ unionfs_check_dentry(dentry);
4993+ unionfs_check_dentry(parent);
4994+ unionfs_unlock_dentry(dentry); /* locked in new_dentry_private_data */
4995+
4996+out:
4997+ unionfs_unlock_parent(dentry, parent);
4998+ unionfs_read_unlock(dentry->d_sb);
4999+
5000+ return ret;
5001+}
5002+
5003+static int unionfs_link(struct dentry *old_dentry, struct inode *dir,
5004+ struct dentry *new_dentry)
5005+{
5006+ int err = 0;
5007+ struct dentry *lower_old_dentry = NULL;
5008+ struct dentry *lower_new_dentry = NULL;
5009+ struct dentry *lower_dir_dentry = NULL;
5010+ struct dentry *old_parent, *new_parent;
5011+ char *name = NULL;
5012+ bool valid;
5013+
5014+ unionfs_read_lock(old_dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5015+ old_parent = dget_parent(old_dentry);
5016+ new_parent = dget_parent(new_dentry);
5017+ unionfs_double_lock_parents(old_parent, new_parent);
5018+ unionfs_double_lock_dentry(old_dentry, new_dentry);
5019+
5020+ valid = __unionfs_d_revalidate(old_dentry, old_parent, false);
5021+ if (unlikely(!valid)) {
5022+ err = -ESTALE;
5023+ goto out;
5024+ }
5025+ if (new_dentry->d_inode) {
5026+ valid = __unionfs_d_revalidate(new_dentry, new_parent, false);
5027+ if (unlikely(!valid)) {
5028+ err = -ESTALE;
5029+ goto out;
5030+ }
5031+ }
5032+
5033+ lower_new_dentry = unionfs_lower_dentry(new_dentry);
5034+
5035+ /* check for a whiteout in new dentry branch, and delete it */
5036+ err = check_unlink_whiteout(new_dentry, lower_new_dentry,
5037+ dbstart(new_dentry));
5038+ if (err > 0) { /* whiteout found and removed successfully */
5039+ lower_dir_dentry = dget_parent(lower_new_dentry);
5040+ fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
5041+ dput(lower_dir_dentry);
5042+ dir->i_nlink = unionfs_get_nlinks(dir);
5043+ err = 0;
5044+ }
5045+ if (err)
5046+ goto out;
5047+
5048+ /* check if parent hierarchy is needed, then link in same branch */
5049+ if (dbstart(old_dentry) != dbstart(new_dentry)) {
5050+ lower_new_dentry = create_parents(dir, new_dentry,
5051+ new_dentry->d_name.name,
5052+ dbstart(old_dentry));
5053+ err = PTR_ERR(lower_new_dentry);
5054+ if (IS_COPYUP_ERR(err))
5055+ goto docopyup;
5056+ if (!lower_new_dentry || IS_ERR(lower_new_dentry))
5057+ goto out;
5058+ }
5059+ lower_new_dentry = unionfs_lower_dentry(new_dentry);
5060+ lower_old_dentry = unionfs_lower_dentry(old_dentry);
5061+
5062+ BUG_ON(dbstart(old_dentry) != dbstart(new_dentry));
5063+ lower_dir_dentry = lock_parent(lower_new_dentry);
5064+ err = is_robranch(old_dentry);
5065+ if (!err) {
5066+ /* see Documentation/filesystems/unionfs/issues.txt */
5067+ lockdep_off();
5068+ err = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
5069+ lower_new_dentry);
5070+ lockdep_on();
5071+ }
5072+ unlock_dir(lower_dir_dentry);
5073+
5074+docopyup:
5075+ if (IS_COPYUP_ERR(err)) {
5076+ int old_bstart = dbstart(old_dentry);
5077+ int bindex;
5078+
5079+ for (bindex = old_bstart - 1; bindex >= 0; bindex--) {
5080+ err = copyup_dentry(old_parent->d_inode,
5081+ old_dentry, old_bstart,
5082+ bindex, old_dentry->d_name.name,
5083+ old_dentry->d_name.len, NULL,
5084+ i_size_read(old_dentry->d_inode));
5085+ if (err)
5086+ continue;
5087+ lower_new_dentry =
5088+ create_parents(dir, new_dentry,
5089+ new_dentry->d_name.name,
5090+ bindex);
5091+ lower_old_dentry = unionfs_lower_dentry(old_dentry);
5092+ lower_dir_dentry = lock_parent(lower_new_dentry);
5093+ /* see Documentation/filesystems/unionfs/issues.txt */
5094+ lockdep_off();
5095+ /* do vfs_link */
5096+ err = vfs_link(lower_old_dentry,
5097+ lower_dir_dentry->d_inode,
5098+ lower_new_dentry);
5099+ lockdep_on();
5100+ unlock_dir(lower_dir_dentry);
5101+ goto check_link;
5102+ }
5103+ goto out;
5104+ }
5105+
5106+check_link:
5107+ if (err || !lower_new_dentry->d_inode)
5108+ goto out;
5109+
5110+ /* It's a hard link, so use the same inode */
5111+ new_dentry->d_inode = igrab(old_dentry->d_inode);
5112+ d_add(new_dentry, new_dentry->d_inode);
5113+ unionfs_copy_attr_all(dir, lower_new_dentry->d_parent->d_inode);
5114+ fsstack_copy_inode_size(dir, lower_new_dentry->d_parent->d_inode);
5115+
5116+ /* propagate number of hard-links */
5117+ old_dentry->d_inode->i_nlink = unionfs_get_nlinks(old_dentry->d_inode);
5118+ /* new dentry's ctime may have changed due to hard-link counts */
5119+ unionfs_copy_attr_times(new_dentry->d_inode);
5120+
5121+out:
5122+ if (!new_dentry->d_inode)
5123+ d_drop(new_dentry);
5124+
5125+ kfree(name);
5126+ if (!err)
5127+ unionfs_postcopyup_setmnt(new_dentry);
5128+
5129+ unionfs_check_inode(dir);
5130+ unionfs_check_dentry(new_dentry);
5131+ unionfs_check_dentry(old_dentry);
5132+
5133+ unionfs_double_unlock_dentry(old_dentry, new_dentry);
5134+ unionfs_double_unlock_parents(old_parent, new_parent);
5135+ dput(new_parent);
5136+ dput(old_parent);
5137+ unionfs_read_unlock(old_dentry->d_sb);
5138+
5139+ return err;
5140+}
5141+
5142+static int unionfs_symlink(struct inode *dir, struct dentry *dentry,
5143+ const char *symname)
5144+{
5145+ int err = 0;
5146+ struct dentry *lower_dentry = NULL;
5147+ struct dentry *wh_dentry = NULL;
5148+ struct dentry *lower_parent_dentry = NULL;
5149+ struct dentry *parent;
5150+ char *name = NULL;
5151+ int valid = 0;
5152+ umode_t mode;
5153+
5154+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5155+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5156+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5157+
5158+ valid = __unionfs_d_revalidate(dentry, parent, false);
5159+ if (unlikely(!valid)) {
5160+ err = -ESTALE;
5161+ goto out;
5162+ }
5163+
5164+ /*
5165+ * It's only a bug if this dentry was not negative and couldn't be
5166+ * revalidated (shouldn't happen).
5167+ */
5168+ BUG_ON(!valid && dentry->d_inode);
5169+
5170+ lower_dentry = find_writeable_branch(dir, dentry);
5171+ if (IS_ERR(lower_dentry)) {
5172+ err = PTR_ERR(lower_dentry);
5173+ goto out;
5174+ }
5175+
5176+ lower_parent_dentry = lock_parent(lower_dentry);
5177+ if (IS_ERR(lower_parent_dentry)) {
5178+ err = PTR_ERR(lower_parent_dentry);
7670a7fc 5179+ goto out_unlock;
2380c486
JR
5180+ }
5181+
5182+ mode = S_IALLUGO;
5183+ err = vfs_symlink(lower_parent_dentry->d_inode, lower_dentry, symname);
5184+ if (!err) {
5185+ err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5186+ if (!err) {
5187+ unionfs_copy_attr_times(dir);
5188+ fsstack_copy_inode_size(dir,
5189+ lower_parent_dentry->d_inode);
5190+ /* update no. of links on parent directory */
5191+ dir->i_nlink = unionfs_get_nlinks(dir);
5192+ }
5193+ }
5194+
7670a7fc 5195+out_unlock:
2380c486 5196+ unlock_dir(lower_parent_dentry);
2380c486
JR
5197+out:
5198+ dput(wh_dentry);
5199+ kfree(name);
5200+
5201+ if (!err) {
5202+ unionfs_postcopyup_setmnt(dentry);
5203+ unionfs_check_inode(dir);
5204+ unionfs_check_dentry(dentry);
5205+ }
5206+ unionfs_unlock_dentry(dentry);
5207+ unionfs_unlock_parent(dentry, parent);
5208+ unionfs_read_unlock(dentry->d_sb);
5209+ return err;
5210+}
5211+
5212+static int unionfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
5213+{
5214+ int err = 0;
5215+ struct dentry *lower_dentry = NULL;
5216+ struct dentry *lower_parent_dentry = NULL;
5217+ struct dentry *parent;
5218+ int bindex = 0, bstart;
5219+ char *name = NULL;
5220+ int valid;
5221+
5222+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5223+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5224+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5225+
5226+ valid = __unionfs_d_revalidate(dentry, parent, false);
5227+ if (unlikely(!valid)) {
5228+ err = -ESTALE; /* same as what real_lookup does */
5229+ goto out;
5230+ }
5231+
5232+ bstart = dbstart(dentry);
5233+
5234+ lower_dentry = unionfs_lower_dentry(dentry);
5235+
5236+ /* check for a whiteout in new dentry branch, and delete it */
5237+ err = check_unlink_whiteout(dentry, lower_dentry, bstart);
5238+ if (err > 0) /* whiteout found and removed successfully */
5239+ err = 0;
5240+ if (err) {
5241+ /* exit if the error returned was NOT -EROFS */
5242+ if (!IS_COPYUP_ERR(err))
5243+ goto out;
5244+ bstart--;
5245+ }
5246+
5247+ /* check if copyup's needed, and mkdir */
5248+ for (bindex = bstart; bindex >= 0; bindex--) {
5249+ int i;
5250+ int bend = dbend(dentry);
5251+
5252+ if (is_robranch_super(dentry->d_sb, bindex))
5253+ continue;
5254+
5255+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5256+ if (!lower_dentry) {
5257+ lower_dentry = create_parents(dir, dentry,
5258+ dentry->d_name.name,
5259+ bindex);
5260+ if (!lower_dentry || IS_ERR(lower_dentry)) {
5261+ printk(KERN_ERR "unionfs: lower dentry "
5262+ "NULL for bindex = %d\n", bindex);
5263+ continue;
5264+ }
5265+ }
5266+
5267+ lower_parent_dentry = lock_parent(lower_dentry);
5268+
5269+ if (IS_ERR(lower_parent_dentry)) {
5270+ err = PTR_ERR(lower_parent_dentry);
5271+ goto out;
5272+ }
5273+
5274+ err = vfs_mkdir(lower_parent_dentry->d_inode, lower_dentry,
5275+ mode);
5276+
5277+ unlock_dir(lower_parent_dentry);
5278+
5279+ /* did the mkdir succeed? */
5280+ if (err)
5281+ break;
5282+
5283+ for (i = bindex + 1; i <= bend; i++) {
5284+ /* XXX: use path_put_lowers? */
5285+ if (unionfs_lower_dentry_idx(dentry, i)) {
5286+ dput(unionfs_lower_dentry_idx(dentry, i));
5287+ unionfs_set_lower_dentry_idx(dentry, i, NULL);
5288+ }
5289+ }
5290+ dbend(dentry) = bindex;
5291+
5292+ /*
5293+ * Only INTERPOSE_LOOKUP can return a value other than 0 on
5294+ * err.
5295+ */
5296+ err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5297+ if (!err) {
5298+ unionfs_copy_attr_times(dir);
5299+ fsstack_copy_inode_size(dir,
5300+ lower_parent_dentry->d_inode);
5301+
5302+ /* update number of links on parent directory */
5303+ dir->i_nlink = unionfs_get_nlinks(dir);
5304+ }
5305+
5306+ err = make_dir_opaque(dentry, dbstart(dentry));
5307+ if (err) {
5308+ printk(KERN_ERR "unionfs: mkdir: error creating "
5309+ ".wh.__dir_opaque: %d\n", err);
5310+ goto out;
5311+ }
5312+
5313+ /* we are done! */
5314+ break;
5315+ }
5316+
5317+out:
5318+ if (!dentry->d_inode)
5319+ d_drop(dentry);
5320+
5321+ kfree(name);
5322+
5323+ if (!err) {
5324+ unionfs_copy_attr_times(dentry->d_inode);
5325+ unionfs_postcopyup_setmnt(dentry);
5326+ }
5327+ unionfs_check_inode(dir);
5328+ unionfs_check_dentry(dentry);
5329+ unionfs_unlock_dentry(dentry);
5330+ unionfs_unlock_parent(dentry, parent);
5331+ unionfs_read_unlock(dentry->d_sb);
5332+
5333+ return err;
5334+}
5335+
5336+static int unionfs_mknod(struct inode *dir, struct dentry *dentry, int mode,
5337+ dev_t dev)
5338+{
5339+ int err = 0;
5340+ struct dentry *lower_dentry = NULL;
5341+ struct dentry *wh_dentry = NULL;
5342+ struct dentry *lower_parent_dentry = NULL;
5343+ struct dentry *parent;
5344+ char *name = NULL;
5345+ int valid = 0;
5346+
5347+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5348+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5349+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5350+
5351+ valid = __unionfs_d_revalidate(dentry, parent, false);
5352+ if (unlikely(!valid)) {
5353+ err = -ESTALE;
5354+ goto out;
5355+ }
5356+
5357+ /*
5358+ * It's only a bug if this dentry was not negative and couldn't be
5359+ * revalidated (shouldn't happen).
5360+ */
5361+ BUG_ON(!valid && dentry->d_inode);
5362+
5363+ lower_dentry = find_writeable_branch(dir, dentry);
5364+ if (IS_ERR(lower_dentry)) {
5365+ err = PTR_ERR(lower_dentry);
5366+ goto out;
5367+ }
5368+
5369+ lower_parent_dentry = lock_parent(lower_dentry);
5370+ if (IS_ERR(lower_parent_dentry)) {
5371+ err = PTR_ERR(lower_parent_dentry);
7670a7fc 5372+ goto out_unlock;
2380c486
JR
5373+ }
5374+
5375+ err = vfs_mknod(lower_parent_dentry->d_inode, lower_dentry, mode, dev);
5376+ if (!err) {
5377+ err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5378+ if (!err) {
5379+ unionfs_copy_attr_times(dir);
5380+ fsstack_copy_inode_size(dir,
5381+ lower_parent_dentry->d_inode);
5382+ /* update no. of links on parent directory */
5383+ dir->i_nlink = unionfs_get_nlinks(dir);
5384+ }
5385+ }
5386+
7670a7fc 5387+out_unlock:
2380c486 5388+ unlock_dir(lower_parent_dentry);
2380c486
JR
5389+out:
5390+ dput(wh_dentry);
5391+ kfree(name);
5392+
5393+ if (!err) {
5394+ unionfs_postcopyup_setmnt(dentry);
5395+ unionfs_check_inode(dir);
5396+ unionfs_check_dentry(dentry);
5397+ }
5398+ unionfs_unlock_dentry(dentry);
5399+ unionfs_unlock_parent(dentry, parent);
5400+ unionfs_read_unlock(dentry->d_sb);
5401+ return err;
5402+}
5403+
5404+/* requires sb, dentry, and parent to already be locked */
5405+static int __unionfs_readlink(struct dentry *dentry, char __user *buf,
5406+ int bufsiz)
5407+{
5408+ int err;
5409+ struct dentry *lower_dentry;
5410+
5411+ lower_dentry = unionfs_lower_dentry(dentry);
5412+
5413+ if (!lower_dentry->d_inode->i_op ||
5414+ !lower_dentry->d_inode->i_op->readlink) {
5415+ err = -EINVAL;
5416+ goto out;
5417+ }
5418+
5419+ err = lower_dentry->d_inode->i_op->readlink(lower_dentry,
5420+ buf, bufsiz);
5421+ if (err >= 0)
5422+ fsstack_copy_attr_atime(dentry->d_inode,
5423+ lower_dentry->d_inode);
5424+
5425+out:
5426+ return err;
5427+}
5428+
5429+static int unionfs_readlink(struct dentry *dentry, char __user *buf,
5430+ int bufsiz)
5431+{
5432+ int err;
5433+ struct dentry *parent;
5434+
5435+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5436+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5437+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5438+
5439+ if (unlikely(!__unionfs_d_revalidate(dentry, parent, false))) {
5440+ err = -ESTALE;
5441+ goto out;
5442+ }
5443+
5444+ err = __unionfs_readlink(dentry, buf, bufsiz);
5445+
5446+out:
5447+ unionfs_check_dentry(dentry);
5448+ unionfs_unlock_dentry(dentry);
5449+ unionfs_unlock_parent(dentry, parent);
5450+ unionfs_read_unlock(dentry->d_sb);
5451+
5452+ return err;
5453+}
5454+
5455+static void *unionfs_follow_link(struct dentry *dentry, struct nameidata *nd)
5456+{
5457+ char *buf;
5458+ int len = PAGE_SIZE, err;
5459+ mm_segment_t old_fs;
5460+ struct dentry *parent;
5461+
5462+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5463+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5464+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5465+
5466+ /* This is freed by the put_link method assuming a successful call. */
5467+ buf = kmalloc(len, GFP_KERNEL);
5468+ if (unlikely(!buf)) {
5469+ err = -ENOMEM;
5470+ goto out;
5471+ }
5472+
5473+ /* read the symlink, and then we will follow it */
5474+ old_fs = get_fs();
5475+ set_fs(KERNEL_DS);
5476+ err = __unionfs_readlink(dentry, buf, len);
5477+ set_fs(old_fs);
5478+ if (err < 0) {
5479+ kfree(buf);
5480+ buf = NULL;
5481+ goto out;
5482+ }
5483+ buf[err] = 0;
5484+ nd_set_link(nd, buf);
5485+ err = 0;
5486+
5487+out:
5488+ if (err >= 0) {
5489+ unionfs_check_nd(nd);
5490+ unionfs_check_dentry(dentry);
5491+ }
5492+
5493+ unionfs_unlock_dentry(dentry);
5494+ unionfs_unlock_parent(dentry, parent);
5495+ unionfs_read_unlock(dentry->d_sb);
5496+
5497+ return ERR_PTR(err);
5498+}
5499+
5500+/* this @nd *IS* still used */
5501+static void unionfs_put_link(struct dentry *dentry, struct nameidata *nd,
5502+ void *cookie)
5503+{
5504+ struct dentry *parent;
0c5527e5 5505+ char *buf;
2380c486
JR
5506+
5507+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5508+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5509+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5510+
5511+ if (unlikely(!__unionfs_d_revalidate(dentry, parent, false)))
5512+ printk(KERN_ERR
5513+ "unionfs: put_link failed to revalidate dentry\n");
5514+
5515+ unionfs_check_dentry(dentry);
0c5527e5
AM
5516+#if 0
5517+ /* XXX: can't run this check because this function can receive a poisoned 'nd' pointer */
2380c486 5518+ unionfs_check_nd(nd);
0c5527e5
AM
5519+#endif
5520+ buf = nd_get_link(nd);
5521+ if (!IS_ERR(buf))
5522+ kfree(buf);
2380c486
JR
5523+ unionfs_unlock_dentry(dentry);
5524+ unionfs_unlock_parent(dentry, parent);
5525+ unionfs_read_unlock(dentry->d_sb);
5526+}
5527+
5528+/*
5529+ * This is a variant of fs/namei.c:permission() or inode_permission() which
5530+ * skips over EROFS tests (because we perform copyup on EROFS).
5531+ */
82260373 5532+static int __inode_permission(struct inode *inode, int mask, unsigned int flags)
2380c486
JR
5533+{
5534+ int retval;
5535+
5536+ /* nobody gets write access to an immutable file */
5537+ if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
5538+ return -EACCES;
5539+
5540+ /* Ordinary permission routines do not understand MAY_APPEND. */
5541+ if (inode->i_op && inode->i_op->permission) {
82260373 5542+ retval = inode->i_op->permission(inode, mask, flags);
2380c486
JR
5543+ if (!retval) {
5544+ /*
5545+ * Exec permission on a regular file is denied if none
5546+ * of the execute bits are set.
5547+ *
5548+ * This check should be done by the ->permission()
5549+ * method.
5550+ */
5551+ if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode) &&
5552+ !(inode->i_mode & S_IXUGO))
5553+ return -EACCES;
5554+ }
5555+ } else {
82260373 5556+ retval = generic_permission(inode, mask, flags, NULL);
2380c486
JR
5557+ }
5558+ if (retval)
5559+ return retval;
5560+
5561+ return security_inode_permission(inode,
5562+ mask & (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND));
5563+}
5564+
5565+/*
5566+ * Don't grab the superblock read-lock in unionfs_permission, which prevents
5567+ * a deadlock with the branch-management "add branch" code (which grabbed
5568+ * the write lock). It is safe to not grab the read lock here, because even
5569+ * with branch management taking place, there is no chance that
5570+ * unionfs_permission, or anything it calls, will use stale branch
5571+ * information.
5572+ */
82260373 5573+static int unionfs_permission(struct inode *inode, int mask, unsigned int flags)
2380c486
JR
5574+{
5575+ struct inode *lower_inode = NULL;
5576+ int err = 0;
5577+ int bindex, bstart, bend;
5578+ const int is_file = !S_ISDIR(inode->i_mode);
5579+ const int write_mask = (mask & MAY_WRITE) && !(mask & MAY_READ);
5580+ struct inode *inode_grabbed = igrab(inode);
5581+ struct dentry *dentry = d_find_alias(inode);
5582+
82260373
AM
5583+ if (flags & IPERM_FLAG_RCU)
5584+ return -ECHILD;
5585+
2380c486
JR
5586+ if (dentry)
5587+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5588+
5589+ if (!UNIONFS_I(inode)->lower_inodes) {
5590+ if (is_file) /* dirs can be unlinked but chdir'ed to */
5591+ err = -ESTALE; /* force revalidate */
5592+ goto out;
5593+ }
5594+ bstart = ibstart(inode);
5595+ bend = ibend(inode);
5596+ if (unlikely(bstart < 0 || bend < 0)) {
5597+ /*
5598+ * With branch-management, we can get a stale inode here.
5599+ * If so, we return ESTALE back to link_path_walk, which
5600+ * would discard the dcache entry and re-lookup the
5601+ * dentry+inode. This should be equivalent to issuing
5602+ * __unionfs_d_revalidate_chain on nd.dentry here.
5603+ */
5604+ if (is_file) /* dirs can be unlinked but chdir'ed to */
5605+ err = -ESTALE; /* force revalidate */
5606+ goto out;
5607+ }
5608+
5609+ for (bindex = bstart; bindex <= bend; bindex++) {
5610+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
5611+ if (!lower_inode)
5612+ continue;
5613+
5614+ /*
5615+ * check the condition for D-F-D underlying files/directories,
5616+ * we don't have to check for files, if we are checking for
5617+ * directories.
5618+ */
5619+ if (!is_file && !S_ISDIR(lower_inode->i_mode))
5620+ continue;
5621+
5622+ /*
5623+ * We check basic permissions, but we ignore any conditions
5624+ * such as readonly file systems or branches marked as
5625+ * readonly, because those conditions should lead to a
5626+ * copyup taking place later on. However, if user never had
5627+ * access to the file, then no copyup could ever take place.
5628+ */
82260373 5629+ err = __inode_permission(lower_inode, mask, flags);
2380c486
JR
5630+ if (err && err != -EACCES && err != -EPERM && bindex > 0) {
5631+ umode_t mode = lower_inode->i_mode;
5632+ if ((is_robranch_super(inode->i_sb, bindex) ||
4ae1df7a 5633+ __is_rdonly(lower_inode)) &&
2380c486
JR
5634+ (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
5635+ err = 0;
5636+ if (IS_COPYUP_ERR(err))
5637+ err = 0;
5638+ }
5639+
5640+ /*
4ae1df7a
JR
5641+ * NFS HACK: NFSv2/3 return EACCES on readonly-exported,
5642+ * locally readonly-mounted file systems, instead of EROFS
5643+ * like other file systems do. So we have no choice here
5644+ * but to intercept this and ignore it for NFS branches
5645+ * marked readonly. Specifically, we avoid using NFS's own
5646+ * "broken" ->permission method, and rely on
5647+ * generic_permission() to do basic checking for us.
5648+ */
5649+ if (err == -EACCES &&
5650+ is_robranch_super(inode->i_sb, bindex) &&
5651+ lower_inode->i_sb->s_magic == NFS_SUPER_MAGIC)
82260373 5652+ err = generic_permission(lower_inode, mask, flags, NULL);
4ae1df7a
JR
5653+
5654+ /*
2380c486
JR
5655+ * The permissions are an intersection of the overall directory
5656+ * permissions, so we fail if one fails.
5657+ */
5658+ if (err)
5659+ goto out;
5660+
5661+ /* only the leftmost file matters. */
5662+ if (is_file || write_mask) {
5663+ if (is_file && write_mask) {
5664+ err = get_write_access(lower_inode);
5665+ if (!err)
5666+ put_write_access(lower_inode);
5667+ }
5668+ break;
5669+ }
5670+ }
5671+ /* sync times which may have changed (asynchronously) below */
5672+ unionfs_copy_attr_times(inode);
5673+
5674+out:
5675+ unionfs_check_inode(inode);
5676+ if (dentry) {
5677+ unionfs_unlock_dentry(dentry);
5678+ dput(dentry);
5679+ }
5680+ iput(inode_grabbed);
5681+ return err;
5682+}
5683+
5684+static int unionfs_setattr(struct dentry *dentry, struct iattr *ia)
5685+{
5686+ int err = 0;
5687+ struct dentry *lower_dentry;
5688+ struct dentry *parent;
5689+ struct inode *inode;
5690+ struct inode *lower_inode;
5691+ int bstart, bend, bindex;
5692+ loff_t size;
82260373
AM
5693+ struct iattr lower_ia;
5694+
5695+ /* check if user has permission to change inode */
5696+ err = inode_change_ok(dentry->d_inode, ia);
5697+ if (err)
5698+ goto out_err;
2380c486
JR
5699+
5700+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5701+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5702+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5703+
5704+ if (unlikely(!__unionfs_d_revalidate(dentry, parent, false))) {
5705+ err = -ESTALE;
5706+ goto out;
5707+ }
5708+
5709+ bstart = dbstart(dentry);
5710+ bend = dbend(dentry);
5711+ inode = dentry->d_inode;
5712+
5713+ /*
5714+ * mode change is for clearing setuid/setgid. Allow lower filesystem
5715+ * to reinterpret it in its own way.
5716+ */
5717+ if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
5718+ ia->ia_valid &= ~ATTR_MODE;
5719+
5720+ lower_dentry = unionfs_lower_dentry(dentry);
5721+ if (!lower_dentry) { /* should never happen after above revalidate */
5722+ err = -EINVAL;
5723+ goto out;
5724+ }
5725+ lower_inode = unionfs_lower_inode(inode);
5726+
5727+ /* check if user has permission to change lower inode */
5728+ err = inode_change_ok(lower_inode, ia);
5729+ if (err)
5730+ goto out;
5731+
5732+ /* copyup if the file is on a read only branch */
5733+ if (is_robranch_super(dentry->d_sb, bstart)
4ae1df7a 5734+ || __is_rdonly(lower_inode)) {
2380c486
JR
5735+ /* check if we have a branch to copy up to */
5736+ if (bstart <= 0) {
5737+ err = -EACCES;
5738+ goto out;
5739+ }
5740+
5741+ if (ia->ia_valid & ATTR_SIZE)
5742+ size = ia->ia_size;
5743+ else
5744+ size = i_size_read(inode);
5745+ /* copyup to next available branch */
5746+ for (bindex = bstart - 1; bindex >= 0; bindex--) {
5747+ err = copyup_dentry(parent->d_inode,
5748+ dentry, bstart, bindex,
5749+ dentry->d_name.name,
5750+ dentry->d_name.len,
5751+ NULL, size);
5752+ if (!err)
5753+ break;
5754+ }
5755+ if (err)
5756+ goto out;
5757+ /* get updated lower_dentry/inode after copyup */
5758+ lower_dentry = unionfs_lower_dentry(dentry);
5759+ lower_inode = unionfs_lower_inode(inode);
5760+ }
5761+
5762+ /*
5763+ * If shrinking, first truncate upper level to cancel writing dirty
5764+ * pages beyond the new eof; and also if its maxbytes is more
5765+ * limiting (fail with -EFBIG before making any change to the lower
5766+ * level). There is no need to vmtruncate the upper level
5767+ * afterwards in the other cases: we fsstack_copy_inode_size from
5768+ * the lower level.
5769+ */
5770+ if (ia->ia_valid & ATTR_SIZE) {
5771+ size = i_size_read(inode);
5772+ if (ia->ia_size < size || (ia->ia_size > size &&
5773+ inode->i_sb->s_maxbytes < lower_inode->i_sb->s_maxbytes)) {
5774+ err = vmtruncate(inode, ia->ia_size);
5775+ if (err)
5776+ goto out;
5777+ }
5778+ }
5779+
5780+ /* notify the (possibly copied-up) lower inode */
4ae1df7a
JR
5781+ /*
5782+ * Note: we use lower_dentry->d_inode, because lower_inode may be
5783+ * unlinked (no inode->i_sb and i_ino==0). This happens if someone
5784+ * tries to open(), unlink(), then ftruncate() a file.
5785+ */
82260373
AM
5786+ /* prepare our own lower struct iattr (with our own lower file) */
5787+ memcpy(&lower_ia, ia, sizeof(lower_ia));
5788+ if (ia->ia_valid & ATTR_FILE) {
5789+ lower_ia.ia_file = unionfs_lower_file(ia->ia_file);
5790+ BUG_ON(!lower_ia.ia_file); // XXX?
5791+ }
5792+
4ae1df7a 5793+ mutex_lock(&lower_dentry->d_inode->i_mutex);
82260373 5794+ err = notify_change(lower_dentry, &lower_ia);
4ae1df7a 5795+ mutex_unlock(&lower_dentry->d_inode->i_mutex);
2380c486
JR
5796+ if (err)
5797+ goto out;
5798+
5799+ /* get attributes from the first lower inode */
4ae1df7a
JR
5800+ if (ibstart(inode) >= 0)
5801+ unionfs_copy_attr_all(inode, lower_inode);
2380c486
JR
5802+ /*
5803+ * unionfs_copy_attr_all will copy the lower times to our inode if
5804+ * the lower ones are newer (useful for cache coherency). However,
5805+ * ->setattr is the only place in which we may have to copy the
5806+ * lower inode times absolutely, to support utimes(2).
5807+ */
5808+ if (ia->ia_valid & ATTR_MTIME_SET)
5809+ inode->i_mtime = lower_inode->i_mtime;
5810+ if (ia->ia_valid & ATTR_CTIME)
5811+ inode->i_ctime = lower_inode->i_ctime;
5812+ if (ia->ia_valid & ATTR_ATIME_SET)
5813+ inode->i_atime = lower_inode->i_atime;
5814+ fsstack_copy_inode_size(inode, lower_inode);
5815+
5816+out:
5817+ if (!err)
5818+ unionfs_check_dentry(dentry);
5819+ unionfs_unlock_dentry(dentry);
5820+ unionfs_unlock_parent(dentry, parent);
5821+ unionfs_read_unlock(dentry->d_sb);
82260373 5822+out_err:
2380c486
JR
5823+ return err;
5824+}
5825+
5826+struct inode_operations unionfs_symlink_iops = {
5827+ .readlink = unionfs_readlink,
5828+ .permission = unionfs_permission,
5829+ .follow_link = unionfs_follow_link,
5830+ .setattr = unionfs_setattr,
5831+ .put_link = unionfs_put_link,
5832+};
5833+
5834+struct inode_operations unionfs_dir_iops = {
5835+ .create = unionfs_create,
5836+ .lookup = unionfs_lookup,
5837+ .link = unionfs_link,
5838+ .unlink = unionfs_unlink,
5839+ .symlink = unionfs_symlink,
5840+ .mkdir = unionfs_mkdir,
5841+ .rmdir = unionfs_rmdir,
5842+ .mknod = unionfs_mknod,
5843+ .rename = unionfs_rename,
5844+ .permission = unionfs_permission,
5845+ .setattr = unionfs_setattr,
5846+#ifdef CONFIG_UNION_FS_XATTR
5847+ .setxattr = unionfs_setxattr,
5848+ .getxattr = unionfs_getxattr,
5849+ .removexattr = unionfs_removexattr,
5850+ .listxattr = unionfs_listxattr,
5851+#endif /* CONFIG_UNION_FS_XATTR */
5852+};
5853+
5854+struct inode_operations unionfs_main_iops = {
5855+ .permission = unionfs_permission,
5856+ .setattr = unionfs_setattr,
5857+#ifdef CONFIG_UNION_FS_XATTR
5858+ .setxattr = unionfs_setxattr,
5859+ .getxattr = unionfs_getxattr,
5860+ .removexattr = unionfs_removexattr,
5861+ .listxattr = unionfs_listxattr,
5862+#endif /* CONFIG_UNION_FS_XATTR */
5863+};
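A note on the copy-up loop in unionfs_setattr() above: when the file lives on a read-only branch, the code walks from bstart - 1 down to branch 0 and stops at the first branch where copyup_dentry() succeeds. The following is a minimal userspace sketch of that search only. The names try_copyup() and pick_copyup_branch(), the `writable` array, and the use of -EROFS as the failure code are assumptions made for illustration; they are not part of the patch, and the real copyup_dentry() can fail for other reasons.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for copyup_dentry(): in this sketch a copy-up
 * succeeds only when the target branch is writable.
 */
static int try_copyup(const bool *writable, int bindex)
{
	return writable[bindex] ? 0 : -EROFS;
}

/*
 * Mirror of the loop in unionfs_setattr(): try each higher-priority
 * branch to the left of bstart, nearest first. Returns the index of the
 * branch that accepted the copy, or a negative errno.
 */
static int pick_copyup_branch(const bool *writable, int bstart)
{
	int err = -EACCES;
	int bindex;

	assert(writable != NULL);

	/* no branch to the left of bstart: nothing to copy up to */
	if (bstart <= 0)
		return -EACCES;

	for (bindex = bstart - 1; bindex >= 0; bindex--) {
		err = try_copyup(writable, bindex);
		if (!err)
			return bindex;
	}
	return err;	/* error from the last branch tried */
}
```

With branches { rw, ro, ro } and a file at bstart = 2, the sketch skips branch 1 and settles on branch 0; with bstart = 0 it returns -EACCES, matching the early exit in the patch.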
0c5527e5
AM
5864diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
5865new file mode 100644
5866index 0000000..b63c17e
5867--- /dev/null
5868+++ b/fs/unionfs/lookup.c
2380c486
JR
5869@@ -0,0 +1,569 @@
5870+/*
7670a7fc 5871+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
5872+ * Copyright (c) 2003-2006 Charles P. Wright
5873+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
5874+ * Copyright (c) 2005-2006 Junjiro Okajima
5875+ * Copyright (c) 2005 Arun M. Krishnakumar
5876+ * Copyright (c) 2004-2006 David P. Quigley
5877+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
5878+ * Copyright (c) 2003 Puja Gupta
5879+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
5880+ * Copyright (c) 2003-2010 Stony Brook University
5881+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
5882+ *
5883+ * This program is free software; you can redistribute it and/or modify
5884+ * it under the terms of the GNU General Public License version 2 as
5885+ * published by the Free Software Foundation.
5886+ */
5887+
5888+#include "union.h"
5889+
5890+/*
5891+ * Lookup one path component @name relative to a <base,mnt> path pair.
5892+ * Behaves nearly the same as lookup_one_len (i.e., return negative dentry
5893+ * on ENOENT), but uses the @mnt passed, so it can cross bind mounts and
5894+ * other lower mounts properly. If @new_mnt is non-null, will fill in the
5895+ * new mnt there. Caller is responsible to dput/mntput/path_put returned
5896+ * @dentry and @new_mnt.
5897+ */
5898+struct dentry *__lookup_one(struct dentry *base, struct vfsmount *mnt,
5899+ const char *name, struct vfsmount **new_mnt)
5900+{
5901+ struct dentry *dentry = NULL;
5902+ struct nameidata lower_nd;
5903+ int err;
5904+
5905+ /* we use flags=0 to get basic lookup */
5906+ err = vfs_path_lookup(base, mnt, name, 0, &lower_nd);
5907+
5908+ switch (err) {
5909+ case 0: /* no error */
5910+ dentry = lower_nd.path.dentry;
5911+ if (new_mnt)
5912+ *new_mnt = lower_nd.path.mnt; /* rc already inc'ed */
5913+ break;
5914+ case -ENOENT:
5915+ /*
5916+ * We don't consider ENOENT an error, and we want to return
5917+ * a negative dentry (ala lookup_one_len). As we know
5918+ * there was no inode for this name before (-ENOENT), then
5919+ * it's safe to call lookup_one_len (which doesn't take a
5920+ * vfsmount).
5921+ */
4ae1df7a 5922+ dentry = lookup_lck_len(name, base, strlen(name));
2380c486
JR
5923+ if (new_mnt)
5924+ *new_mnt = mntget(lower_nd.path.mnt);
5925+ break;
5926+ default: /* all other real errors */
5927+ dentry = ERR_PTR(err);
5928+ break;
5929+ }
5930+
5931+ return dentry;
5932+}
5933+
5934+/*
5935+ * This is a utility function that fills in a unionfs dentry.
5936+ * Caller must lock this dentry with unionfs_lock_dentry.
5937+ *
5938+ * Returns: 0 (ok), or -ERRNO if an error occurred.
5939+ * XXX: get rid of _partial_lookup and make callers call _lookup_full directly
5940+ */
5941+int unionfs_partial_lookup(struct dentry *dentry, struct dentry *parent)
5942+{
5943+ struct dentry *tmp;
5944+ int err = -ENOSYS;
5945+
5946+ tmp = unionfs_lookup_full(dentry, parent, INTERPOSE_PARTIAL);
5947+
5948+ if (!tmp) {
5949+ err = 0;
5950+ goto out;
5951+ }
5952+ if (IS_ERR(tmp)) {
5953+ err = PTR_ERR(tmp);
5954+ goto out;
5955+ }
5956+ /* XXX: need to change the interface */
5957+ BUG_ON(tmp != dentry);
5958+out:
5959+ return err;
5960+}
5961+
5962+/* The dentry cache is just so we have properly sized dentries. */
5963+static struct kmem_cache *unionfs_dentry_cachep;
5964+int unionfs_init_dentry_cache(void)
5965+{
5966+ unionfs_dentry_cachep =
5967+ kmem_cache_create("unionfs_dentry",
5968+ sizeof(struct unionfs_dentry_info),
5969+ 0, SLAB_RECLAIM_ACCOUNT, NULL);
5970+
5971+ return (unionfs_dentry_cachep ? 0 : -ENOMEM);
5972+}
5973+
5974+void unionfs_destroy_dentry_cache(void)
5975+{
5976+ if (unionfs_dentry_cachep)
5977+ kmem_cache_destroy(unionfs_dentry_cachep);
5978+}
5979+
5980+void free_dentry_private_data(struct dentry *dentry)
5981+{
5982+ if (!dentry || !dentry->d_fsdata)
5983+ return;
5984+ kfree(UNIONFS_D(dentry)->lower_paths);
5985+ UNIONFS_D(dentry)->lower_paths = NULL;
5986+ kmem_cache_free(unionfs_dentry_cachep, dentry->d_fsdata);
5987+ dentry->d_fsdata = NULL;
5988+}
5989+
5990+static inline int __realloc_dentry_private_data(struct dentry *dentry)
5991+{
5992+ struct unionfs_dentry_info *info = UNIONFS_D(dentry);
5993+ void *p;
5994+ int size;
5995+
5996+ BUG_ON(!info);
5997+
5998+ size = sizeof(struct path) * sbmax(dentry->d_sb);
5999+ p = krealloc(info->lower_paths, size, GFP_ATOMIC);
6000+ if (unlikely(!p))
6001+ return -ENOMEM;
6002+
6003+ info->lower_paths = p;
6004+
6005+ info->bstart = -1;
6006+ info->bend = -1;
6007+ info->bopaque = -1;
6008+ info->bcount = sbmax(dentry->d_sb);
6009+ atomic_set(&info->generation,
6010+ atomic_read(&UNIONFS_SB(dentry->d_sb)->generation));
6011+
6012+ memset(info->lower_paths, 0, size);
6013+
6014+ return 0;
6015+}
6016+
6017+/* UNIONFS_D(dentry)->lock must be locked */
6018+int realloc_dentry_private_data(struct dentry *dentry)
6019+{
6020+ if (!__realloc_dentry_private_data(dentry))
6021+ return 0;
6022+
6023+ kfree(UNIONFS_D(dentry)->lower_paths);
6024+ free_dentry_private_data(dentry);
6025+ return -ENOMEM;
6026+}
6027+
6028+/* allocate new dentry private data */
6029+int new_dentry_private_data(struct dentry *dentry, int subclass)
6030+{
6031+ struct unionfs_dentry_info *info = UNIONFS_D(dentry);
6032+
6033+ BUG_ON(info);
6034+
6035+ info = kmem_cache_alloc(unionfs_dentry_cachep, GFP_ATOMIC);
6036+ if (unlikely(!info))
6037+ return -ENOMEM;
6038+
6039+ mutex_init(&info->lock);
6040+ mutex_lock_nested(&info->lock, subclass);
6041+
6042+ info->lower_paths = NULL;
6043+
6044+ dentry->d_fsdata = info;
6045+
6046+ if (!__realloc_dentry_private_data(dentry))
6047+ return 0;
6048+
6049+ mutex_unlock(&info->lock);
6050+ free_dentry_private_data(dentry);
6051+ return -ENOMEM;
6052+}
6053+
6054+/*
6055+ * scan through the lower dentry objects, and set bstart to reflect the
6056+ * starting branch
6057+ */
6058+void update_bstart(struct dentry *dentry)
6059+{
6060+ int bindex;
6061+ int bstart = dbstart(dentry);
6062+ int bend = dbend(dentry);
6063+ struct dentry *lower_dentry;
6064+
6065+ for (bindex = bstart; bindex <= bend; bindex++) {
6066+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6067+ if (!lower_dentry)
6068+ continue;
6069+ if (lower_dentry->d_inode) {
6070+ dbstart(dentry) = bindex;
6071+ break;
6072+ }
6073+ dput(lower_dentry);
6074+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
6075+ }
6076+}
6077+
6078+
6079+/*
6080+ * Initialize a nameidata structure (the intent part) we can pass to a lower
6081+ * file system. Returns 0 on success or -error (only -ENOMEM possible).
6082+ * Inside that nd structure, this function may also return an allocated
6083+ * struct file (for open intents). The caller, when done with this nd, must
6084+ * kfree the intent file (using release_lower_nd).
6085+ *
6086+ * XXX: this code, and the callers of this code, should be redone using
6087+ * vfs_path_lookup() when (1) the nameidata structure is refactored into a
6088+ * separate intent-structure, and (2) open_namei() is broken into a VFS-only
6089+ * function and a method that other file systems can call.
6090+ */
6091+int init_lower_nd(struct nameidata *nd, unsigned int flags)
6092+{
6093+ int err = 0;
6094+#ifdef ALLOC_LOWER_ND_FILE
6095+ /*
6096+ * XXX: one day we may need to have the lower return an open file
6097+ * for us. It is not needed in 2.6.23-rc1 for nfs2/nfs3, but may
6098+ * very well be needed for nfs4.
6099+ */
6100+ struct file *file;
6101+#endif /* ALLOC_LOWER_ND_FILE */
6102+
6103+ memset(nd, 0, sizeof(struct nameidata));
6104+ if (!flags)
6105+ return err;
6106+
6107+ switch (flags) {
6108+ case LOOKUP_CREATE:
6109+ nd->intent.open.flags |= O_CREAT;
6110+ /* fall through: shared code for create/open cases */
6111+ case LOOKUP_OPEN:
6112+ nd->flags = flags;
6113+ nd->intent.open.flags |= (FMODE_READ | FMODE_WRITE);
6114+#ifdef ALLOC_LOWER_ND_FILE
6115+ file = kzalloc(sizeof(struct file), GFP_KERNEL);
6116+ if (unlikely(!file)) {
6117+ err = -ENOMEM;
6118+ break; /* exit switch statement and thus return */
6119+ }
6120+ nd->intent.open.file = file;
6121+#endif /* ALLOC_LOWER_ND_FILE */
6122+ break;
6123+ default:
6124+ /*
6125+ * We should never get here, for now.
6126+ * We can add new cases here later on.
6127+ */
6128+ pr_debug("unionfs: unknown nameidata flag 0x%x\n", flags);
6129+ BUG();
6130+ break;
6131+ }
6132+
6133+ return err;
6134+}
6135+
6136+void release_lower_nd(struct nameidata *nd, int err)
6137+{
6138+ if (!nd->intent.open.file)
6139+ return;
6140+ else if (!err)
6141+ release_open_intent(nd);
6142+#ifdef ALLOC_LOWER_ND_FILE
6143+ kfree(nd->intent.open.file);
6144+#endif /* ALLOC_LOWER_ND_FILE */
6145+}
6146+
6147+/*
6148+ * Main (and complex) driver function for Unionfs's lookup
6149+ *
6150+ * Returns: NULL (ok), ERR_PTR if an error occurred, or a non-null non-error
6151+ * PTR if d_splice returned a different dentry.
6152+ *
6153+ * If lookupmode is INTERPOSE_PARTIAL/REVAL/REVAL_NEG, the passed dentry's
6154+ * inode info must be locked. If lookupmode is INTERPOSE_LOOKUP (i.e., a
6155+ * newly looked-up dentry), then unionfs_lookup_full will return a locked
6156+ * dentry's info, which the caller must unlock.
6157+ */
6158+struct dentry *unionfs_lookup_full(struct dentry *dentry,
6159+ struct dentry *parent, int lookupmode)
6160+{
6161+ int err = 0;
6162+ struct dentry *lower_dentry = NULL;
6163+ struct vfsmount *lower_mnt;
6164+ struct vfsmount *lower_dir_mnt;
6165+ struct dentry *wh_lower_dentry = NULL;
6166+ struct dentry *lower_dir_dentry = NULL;
6167+ struct dentry *d_interposed = NULL;
6168+ int bindex, bstart, bend, bopaque;
6169+ int opaque, num_positive = 0;
6170+ const char *name;
6171+ int namelen;
6172+ int pos_start, pos_end;
6173+
6174+ /*
6175+ * We should already have a lock on this dentry in the case of a
6176+ * partial lookup, or a revalidation. Otherwise it is returned from
6177+ * new_dentry_private_data already locked.
6178+ */
6179+ verify_locked(dentry);
6180+ verify_locked(parent);
6181+
6182+ /* must initialize dentry operations */
6183+ dentry->d_op = &unionfs_dops;
6184+
6185+ /* We never partial lookup the root directory. */
6186+ if (IS_ROOT(dentry))
6187+ goto out;
6188+
6189+ name = dentry->d_name.name;
6190+ namelen = dentry->d_name.len;
6191+
6192+ /* No dentries should get created for possible whiteout names. */
6193+ if (!is_validname(name)) {
6194+ err = -EPERM;
6195+ goto out_free;
6196+ }
6197+
6198+ /* Now start the actual lookup procedure. */
6199+ bstart = dbstart(parent);
6200+ bend = dbend(parent);
6201+ bopaque = dbopaque(parent);
6202+ BUG_ON(bstart < 0);
6203+
6204+ /* adjust bend to bopaque if needed */
6205+ if ((bopaque >= 0) && (bopaque < bend))
6206+ bend = bopaque;
6207+
6208+ /* lookup all possible dentries */
6209+ for (bindex = bstart; bindex <= bend; bindex++) {
6210+
6211+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6212+ lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
6213+
6214+ /* skip if we already have a positive lower dentry */
6215+ if (lower_dentry) {
6216+ if (dbstart(dentry) < 0)
6217+ dbstart(dentry) = bindex;
6218+ if (bindex > dbend(dentry))
6219+ dbend(dentry) = bindex;
6220+ if (lower_dentry->d_inode)
6221+ num_positive++;
6222+ continue;
6223+ }
6224+
6225+ lower_dir_dentry =
6226+ unionfs_lower_dentry_idx(parent, bindex);
6227+ /* if the lower dentry's parent does not exist, skip this */
6228+ if (!lower_dir_dentry || !lower_dir_dentry->d_inode)
6229+ continue;
6230+
6231+ /* also skip it if the parent isn't a directory. */
6232+ if (!S_ISDIR(lower_dir_dentry->d_inode->i_mode))
6233+ continue; /* XXX: should be BUG_ON */
6234+
6235+ /* check for whiteouts: stop lookup if found */
6236+ wh_lower_dentry = lookup_whiteout(name, lower_dir_dentry);
6237+ if (IS_ERR(wh_lower_dentry)) {
6238+ err = PTR_ERR(wh_lower_dentry);
6239+ goto out_free;
6240+ }
6241+ if (wh_lower_dentry->d_inode) {
6242+ dbend(dentry) = dbopaque(dentry) = bindex;
6243+ if (dbstart(dentry) < 0)
6244+ dbstart(dentry) = bindex;
6245+ dput(wh_lower_dentry);
6246+ break;
6247+ }
6248+ dput(wh_lower_dentry);
6249+
6250+ /* Now do regular lookup; lookup @name */
6251+ lower_dir_mnt = unionfs_lower_mnt_idx(parent, bindex);
6252+ lower_mnt = NULL; /* XXX: needed? */
6253+
6254+ lower_dentry = __lookup_one(lower_dir_dentry, lower_dir_mnt,
6255+ name, &lower_mnt);
6256+
6257+ if (IS_ERR(lower_dentry)) {
6258+ err = PTR_ERR(lower_dentry);
6259+ goto out_free;
6260+ }
6261+ unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
6262+ if (!lower_mnt)
6263+ lower_mnt = unionfs_mntget(dentry->d_sb->s_root,
6264+ bindex);
6265+ unionfs_set_lower_mnt_idx(dentry, bindex, lower_mnt);
6266+
6267+ /* adjust dbstart/end */
6268+ if (dbstart(dentry) < 0)
6269+ dbstart(dentry) = bindex;
6270+ if (bindex > dbend(dentry))
6271+ dbend(dentry) = bindex;
6272+ /*
6273+ * We always store the lower dentries above, and update
6274+ * dbstart/dbend, even if the whole unionfs dentry is
6275+ * negative (i.e., no lower inodes).
6276+ */
6277+ if (!lower_dentry->d_inode)
6278+ continue;
6279+ num_positive++;
6280+
6281+ /*
6282+ * check if we just found an opaque directory, if so, stop
6283+ * lookups here.
6284+ */
6285+ if (!S_ISDIR(lower_dentry->d_inode->i_mode))
6286+ continue;
6287+ opaque = is_opaque_dir(dentry, bindex);
6288+ if (opaque < 0) {
6289+ err = opaque;
6290+ goto out_free;
6291+ } else if (opaque) {
6292+ dbend(dentry) = dbopaque(dentry) = bindex;
6293+ break;
6294+ }
6295+ dbend(dentry) = bindex;
6296+
6297+ /* update parent directory's atime with the bindex */
6298+ fsstack_copy_attr_atime(parent->d_inode,
6299+ lower_dir_dentry->d_inode);
6300+ }
6301+
6302+ /* sanity checks, then decide whether to process a negative dentry */
6303+ BUG_ON(dbstart(dentry) < 0 && dbend(dentry) >= 0);
6304+ BUG_ON(dbstart(dentry) >= 0 && dbend(dentry) < 0);
6305+
6306+ if (num_positive > 0)
6307+ goto out_positive;
6308+
6309+ /*** handle NEGATIVE dentries ***/
6310+
6311+ /*
6312+ * If negative, keep only first lower negative dentry, to save on
6313+ * memory.
6314+ */
6315+ if (dbstart(dentry) < dbend(dentry)) {
6316+ path_put_lowers(dentry, dbstart(dentry) + 1,
6317+ dbend(dentry), false);
6318+ dbend(dentry) = dbstart(dentry);
6319+ }
6320+ if (lookupmode == INTERPOSE_PARTIAL)
6321+ goto out;
6322+ if (lookupmode == INTERPOSE_LOOKUP) {
6323+ /*
6324+ * If all we found was a whiteout in the first available
6325+ * branch, then create a negative dentry for a possibly new
6326+ * file to be created.
6327+ */
6328+ if (dbopaque(dentry) < 0)
6329+ goto out;
6330+ /* XXX: need to get mnt here */
6331+ bindex = dbstart(dentry);
6332+ if (unionfs_lower_dentry_idx(dentry, bindex))
6333+ goto out;
6334+ lower_dir_dentry =
6335+ unionfs_lower_dentry_idx(parent, bindex);
6336+ if (!lower_dir_dentry || !lower_dir_dentry->d_inode)
6337+ goto out;
6338+ if (!S_ISDIR(lower_dir_dentry->d_inode->i_mode))
6339+ goto out; /* XXX: should be BUG_ON */
6340+ /* XXX: do we need to cross bind mounts here? */
4ae1df7a 6341+ lower_dentry = lookup_lck_len(name, lower_dir_dentry, namelen);
2380c486
JR
6342+ if (IS_ERR(lower_dentry)) {
6343+ err = PTR_ERR(lower_dentry);
6344+ goto out;
6345+ }
6346+ /* XXX: need to mntget/mntput as needed too! */
6347+ unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
6348+ /* XXX: wrong mnt for crossing bind mounts! */
6349+ lower_mnt = unionfs_mntget(dentry->d_sb->s_root, bindex);
6350+ unionfs_set_lower_mnt_idx(dentry, bindex, lower_mnt);
6351+
6352+ goto out;
6353+ }
6354+
6355+ /* if we're revalidating a positive dentry, don't make it negative */
6356+ if (lookupmode != INTERPOSE_REVAL)
6357+ d_add(dentry, NULL);
6358+
6359+ goto out;
6360+
6361+out_positive:
6362+ /*** handle POSITIVE dentries ***/
6363+
6364+ /*
6365+ * This unionfs dentry is positive (at least one lower inode
6366+ * exists), so scan entire dentry from beginning to end, and remove
6367+ * any negative lower dentries, if any. Then, update dbstart/dbend
6368+ * to reflect the start/end of positive dentries.
6369+ */
6370+ pos_start = pos_end = -1;
6371+ for (bindex = bstart; bindex <= bend; bindex++) {
6372+ lower_dentry = unionfs_lower_dentry_idx(dentry,
6373+ bindex);
6374+ if (lower_dentry && lower_dentry->d_inode) {
6375+ if (pos_start < 0)
6376+ pos_start = bindex;
6377+ if (bindex > pos_end)
6378+ pos_end = bindex;
6379+ continue;
6380+ }
6381+ path_put_lowers(dentry, bindex, bindex, false);
6382+ }
6383+ if (pos_start >= 0)
6384+ dbstart(dentry) = pos_start;
6385+ if (pos_end >= 0)
6386+ dbend(dentry) = pos_end;
6387+
6388+ /* Partial lookups need to re-interpose, or throw away older negs. */
6389+ if (lookupmode == INTERPOSE_PARTIAL) {
6390+ if (dentry->d_inode) {
6391+ unionfs_reinterpose(dentry);
6392+ goto out;
6393+ }
6394+
6395+ /*
6396+ * This dentry was positive, so it is as if we had a
6397+ * negative revalidation.
6398+ */
6399+ lookupmode = INTERPOSE_REVAL_NEG;
6400+ update_bstart(dentry);
6401+ }
6402+
6403+ /*
6404+ * Interpose can return a dentry if d_splice returned a different
6405+ * dentry.
6406+ */
6407+ d_interposed = unionfs_interpose(dentry, dentry->d_sb, lookupmode);
6408+ if (IS_ERR(d_interposed))
6409+ err = PTR_ERR(d_interposed);
6410+ else if (d_interposed)
6411+ dentry = d_interposed;
6412+
6413+ if (!err)
6414+ goto out;
6415+ d_drop(dentry);
6416+
6417+out_free:
6418+ /* should dput/mntput all the underlying dentries on error condition */
6419+ if (dbstart(dentry) >= 0)
6420+ path_put_lowers_all(dentry, false);
6421+ /* free lower_paths unconditionally */
6422+ kfree(UNIONFS_D(dentry)->lower_paths);
6423+ UNIONFS_D(dentry)->lower_paths = NULL;
6424+
6425+out:
6426+ if (dentry && UNIONFS_D(dentry)) {
6427+ BUG_ON(dbstart(dentry) < 0 && dbend(dentry) >= 0);
6428+ BUG_ON(dbstart(dentry) >= 0 && dbend(dentry) < 0);
6429+ }
6430+ if (d_interposed && UNIONFS_D(d_interposed)) {
6431+ BUG_ON(dbstart(d_interposed) < 0 && dbend(d_interposed) >= 0);
6432+ BUG_ON(dbstart(d_interposed) >= 0 && dbend(d_interposed) < 0);
6433+ }
6434+
6435+ if (!err && d_interposed)
6436+ return d_interposed;
6437+ return ERR_PTR(err);
6438+}
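A note on the branch scan in unionfs_lookup_full() above: the loop records a lower dentry for every branch it visits, stops at a whiteout or an opaque directory, and then trims dbstart/dbend depending on whether any positive lower dentry was found. The following userspace sketch models only that index bookkeeping. The `enum entry` states, `struct scan_result`, and scan_branches() are hypothetical names for illustration; the sketch treats each branch as holding a single state and ignores locking, mounts, already-cached lower dentries, and error paths.

```c
#include <assert.h>

/* Hypothetical per-branch state for one name (a simplification). */
enum entry { ENT_NONE, ENT_FILE, ENT_DIR, ENT_OPAQUE_DIR, ENT_WHITEOUT };

struct scan_result {
	int bstart;		/* first branch kept, or -1 */
	int bend;		/* last branch kept, or -1 */
	int bopaque;		/* branch that stopped the scan, or -1 */
	int num_positive;	/* branches where the name exists */
};

static struct scan_result scan_branches(const enum entry *b, int n)
{
	struct scan_result r = { -1, -1, -1, 0 };
	int i, pos_start = -1, pos_end = -1;

	assert(n >= 0);

	for (i = 0; i < n; i++) {
		/* a whiteout hides everything in lower-priority branches */
		if (b[i] == ENT_WHITEOUT) {
			r.bend = r.bopaque = i;
			if (r.bstart < 0)
				r.bstart = i;
			break;
		}
		/* a lower dentry is recorded even when it is negative */
		if (r.bstart < 0)
			r.bstart = i;
		if (i > r.bend)
			r.bend = i;
		if (b[i] == ENT_NONE)
			continue;
		r.num_positive++;
		/* an opaque directory also ends the scan */
		if (b[i] == ENT_OPAQUE_DIR) {
			r.bopaque = i;
			break;
		}
	}

	/* negative result: keep only the first lower negative dentry */
	if (r.num_positive == 0) {
		if (r.bstart < r.bend)
			r.bend = r.bstart;
		return r;
	}

	/* positive result: shrink the range to the positive entries */
	for (i = r.bstart; i <= r.bend; i++) {
		if (b[i] == ENT_NONE || b[i] == ENT_WHITEOUT)
			continue;
		if (pos_start < 0)
			pos_start = i;
		pos_end = i;
	}
	r.bstart = pos_start;
	r.bend = pos_end;
	return r;
}
```

For example, a name that is absent in branch 0 and whited out in branch 1 yields a negative result with bstart = bend = 0 and bopaque = 1, so a file in branch 2 stays hidden.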
0c5527e5
AM
6439diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
6440new file mode 100644
82260373 6441index 0000000..9ee58eb
0c5527e5
AM
6442--- /dev/null
6443+++ b/fs/unionfs/main.c
82260373 6444@@ -0,0 +1,762 @@
2380c486 6445+/*
7670a7fc 6446+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
6447+ * Copyright (c) 2003-2006 Charles P. Wright
6448+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
6449+ * Copyright (c) 2005-2006 Junjiro Okajima
6450+ * Copyright (c) 2005 Arun M. Krishnakumar
6451+ * Copyright (c) 2004-2006 David P. Quigley
6452+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
6453+ * Copyright (c) 2003 Puja Gupta
6454+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
6455+ * Copyright (c) 2003-2010 Stony Brook University
6456+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
6457+ *
6458+ * This program is free software; you can redistribute it and/or modify
6459+ * it under the terms of the GNU General Public License version 2 as
6460+ * published by the Free Software Foundation.
6461+ */
6462+
6463+#include "union.h"
6464+#include <linux/module.h>
6465+#include <linux/moduleparam.h>
6466+
6467+static void unionfs_fill_inode(struct dentry *dentry,
6468+ struct inode *inode)
6469+{
6470+ struct inode *lower_inode;
6471+ struct dentry *lower_dentry;
6472+ int bindex, bstart, bend;
6473+
6474+ bstart = dbstart(dentry);
6475+ bend = dbend(dentry);
6476+
6477+ for (bindex = bstart; bindex <= bend; bindex++) {
6478+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6479+ if (!lower_dentry) {
6480+ unionfs_set_lower_inode_idx(inode, bindex, NULL);
6481+ continue;
6482+ }
6483+
6484+ /* Initialize the lower inode to the new lower inode. */
6485+ if (!lower_dentry->d_inode)
6486+ continue;
6487+
6488+ unionfs_set_lower_inode_idx(inode, bindex,
6489+ igrab(lower_dentry->d_inode));
6490+ }
6491+
6492+ ibstart(inode) = dbstart(dentry);
6493+ ibend(inode) = dbend(dentry);
6494+
6495+ /* Use attributes from the first branch. */
6496+ lower_inode = unionfs_lower_inode(inode);
6497+
6498+ /* Use different set of inode ops for symlinks & directories */
6499+ if (S_ISLNK(lower_inode->i_mode))
6500+ inode->i_op = &unionfs_symlink_iops;
6501+ else if (S_ISDIR(lower_inode->i_mode))
6502+ inode->i_op = &unionfs_dir_iops;
6503+
6504+ /* Use different set of file ops for directories */
6505+ if (S_ISDIR(lower_inode->i_mode))
6506+ inode->i_fop = &unionfs_dir_fops;
6507+
6508+ /* properly initialize special inodes */
6509+ if (S_ISBLK(lower_inode->i_mode) || S_ISCHR(lower_inode->i_mode) ||
6510+ S_ISFIFO(lower_inode->i_mode) || S_ISSOCK(lower_inode->i_mode))
6511+ init_special_inode(inode, lower_inode->i_mode,
6512+ lower_inode->i_rdev);
6513+
6514+ /* all well, copy inode attributes */
6515+ unionfs_copy_attr_all(inode, lower_inode);
6516+ fsstack_copy_inode_size(inode, lower_inode);
6517+}
6518+
6519+/*
6520+ * Connect a unionfs dentry/inode with several lower ones. This is
6521+ * the classic stackable file system "vnode interposition" action.
6522+ *
6523+ * @sb: unionfs's super_block
6524+ */
6525+struct dentry *unionfs_interpose(struct dentry *dentry, struct super_block *sb,
6526+ int flag)
6527+{
6528+ int err = 0;
6529+ struct inode *inode;
6530+ int need_fill_inode = 1;
6531+ struct dentry *spliced = NULL;
6532+
6533+ verify_locked(dentry);
6534+
6535+ /*
6536+ * We allocate our new inode below by calling unionfs_iget,
6537+ * which will initialize some of the new inode's fields
6538+ */
6539+
6540+ /*
6541+ * On revalidate we've already got our own inode and just need
6542+ * to fix it up.
6543+ */
6544+ if (flag == INTERPOSE_REVAL) {
6545+ inode = dentry->d_inode;
6546+ UNIONFS_I(inode)->bstart = -1;
6547+ UNIONFS_I(inode)->bend = -1;
6548+ atomic_set(&UNIONFS_I(inode)->generation,
6549+ atomic_read(&UNIONFS_SB(sb)->generation));
6550+
6551+ UNIONFS_I(inode)->lower_inodes =
6552+ kcalloc(sbmax(sb), sizeof(struct inode *), GFP_KERNEL);
6553+ if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
6554+ err = -ENOMEM;
6555+ goto out;
6556+ }
6557+ } else {
6558+ /* get unique inode number for unionfs */
6559+ inode = unionfs_iget(sb, iunique(sb, UNIONFS_ROOT_INO));
6560+ if (IS_ERR(inode)) {
6561+ err = PTR_ERR(inode);
6562+ goto out;
6563+ }
6564+ if (atomic_read(&inode->i_count) > 1)
6565+ goto skip;
6566+ }
6567+
6568+ need_fill_inode = 0;
6569+ unionfs_fill_inode(dentry, inode);
6570+
6571+skip:
6572+ /* only (our) lookup wants to do a d_add */
6573+ switch (flag) {
6574+ case INTERPOSE_DEFAULT:
6575+ /* for operations which create new inodes */
6576+ d_add(dentry, inode);
6577+ break;
6578+ case INTERPOSE_REVAL_NEG:
6579+ d_instantiate(dentry, inode);
6580+ break;
6581+ case INTERPOSE_LOOKUP:
6582+ spliced = d_splice_alias(inode, dentry);
6583+ if (spliced && spliced != dentry) {
6584+ /*
6585+ * d_splice can return a dentry if it was
6586+ * disconnected and had to be moved. We must ensure
6587+ * that the private data of the new dentry is
6588+ * correct and that the inode info was filled
6589+ * properly. Finally we must return this new
6590+ * dentry.
6591+ */
6592+ spliced->d_op = &unionfs_dops;
6593+ spliced->d_fsdata = dentry->d_fsdata;
6594+ dentry->d_fsdata = NULL;
6595+ dentry = spliced;
6596+ if (need_fill_inode) {
6597+ need_fill_inode = 0;
6598+ unionfs_fill_inode(dentry, inode);
6599+ }
6600+ goto out_spliced;
6601+ } else if (!spliced) {
6602+ if (need_fill_inode) {
6603+ need_fill_inode = 0;
6604+ unionfs_fill_inode(dentry, inode);
6605+ goto out_spliced;
6606+ }
6607+ }
6608+ break;
6609+ case INTERPOSE_REVAL:
6610+ /* Do nothing. */
6611+ break;
6612+ default:
6613+ printk(KERN_CRIT "unionfs: invalid interpose flag passed!\n");
6614+ BUG();
6615+ }
6616+ goto out;
6617+
6618+out_spliced:
6619+ if (!err)
6620+ return spliced;
6621+out:
6622+ return ERR_PTR(err);
6623+}
6624+
6625+/* like interpose above, but for an already existing dentry */
6626+void unionfs_reinterpose(struct dentry *dentry)
6627+{
6628+ struct dentry *lower_dentry;
6629+ struct inode *inode;
6630+ int bindex, bstart, bend;
6631+
6632+ verify_locked(dentry);
6633+
6634+ /* This is a pre-allocated inode */
6635+ inode = dentry->d_inode;
6636+
6637+ bstart = dbstart(dentry);
6638+ bend = dbend(dentry);
6639+ for (bindex = bstart; bindex <= bend; bindex++) {
6640+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6641+ if (!lower_dentry)
6642+ continue;
6643+
6644+ if (!lower_dentry->d_inode)
6645+ continue;
6646+ if (unionfs_lower_inode_idx(inode, bindex))
6647+ continue;
6648+ unionfs_set_lower_inode_idx(inode, bindex,
6649+ igrab(lower_dentry->d_inode));
6650+ }
6651+ ibstart(inode) = dbstart(dentry);
6652+ ibend(inode) = dbend(dentry);
6653+}
6654+
6655+/*
6656+ * make sure the branch we just looked up (nd) makes sense:
6657+ *
6658+ * 1) we're not trying to stack unionfs on top of unionfs
6659+ * 2) it exists
6660+ * 3) is a directory
6661+ */
6662+int check_branch(struct nameidata *nd)
6663+{
6664+ /* XXX: remove in ODF code -- stacking unions allowed there */
6665+ if (!strcmp(nd->path.dentry->d_sb->s_type->name, UNIONFS_NAME))
6666+ return -EINVAL;
6667+ if (!nd->path.dentry->d_inode)
6668+ return -ENOENT;
6669+ if (!S_ISDIR(nd->path.dentry->d_inode->i_mode))
6670+ return -ENOTDIR;
6671+ return 0;
6672+}
6673+
6674+/* checks if two lower_dentries have overlapping branches */
6675+static int is_branch_overlap(struct dentry *dent1, struct dentry *dent2)
6676+{
6677+ struct dentry *dent = NULL;
6678+
6679+ dent = dent1;
6680+ while ((dent != dent2) && (dent->d_parent != dent))
6681+ dent = dent->d_parent;
6682+
6683+ if (dent == dent2)
6684+ return 1;
6685+
6686+ dent = dent2;
6687+ while ((dent != dent1) && (dent->d_parent != dent))
6688+ dent = dent->d_parent;
6689+
6690+ return (dent == dent1);
6691+}
6692+
6693+/*
6694+ * Parse "ro" or "rw" options, but default to "rw" if no mode option was
6695+ * specified. Fill the mode bits in @perms. If an unknown string is
6696+ * encountered, return -EINVAL. Otherwise return 0.
6697+ */
6698+int parse_branch_mode(const char *name, int *perms)
6699+{
6700+ if (!name || !strcmp(name, "rw")) {
6701+ *perms = MAY_READ | MAY_WRITE;
6702+ return 0;
6703+ }
6704+ if (!strcmp(name, "ro")) {
6705+ *perms = MAY_READ;
6706+ return 0;
6707+ }
6708+ return -EINVAL;
6709+}
6710+
6711+/*
6712+ * parse the dirs= mount argument
6713+ *
6714+ * We don't need to lock the superblock private data's rwsem, as we get
6715+ * called only by unionfs_read_super - it is still a long time before anyone
6716+ * can even get a reference to us.
6717+ */
6718+static int parse_dirs_option(struct super_block *sb, struct unionfs_dentry_info
6719+ *lower_root_info, char *options)
6720+{
6721+ struct nameidata nd;
6722+ char *name;
6723+ int err = 0;
6724+ int branches = 1;
6725+ int bindex = 0;
6726+ int i = 0;
6727+ int j = 0;
6728+ struct dentry *dent1;
6729+ struct dentry *dent2;
6730+
6731+ if (options[0] == '\0') {
6732+ printk(KERN_ERR "unionfs: no branches specified\n");
6733+ err = -EINVAL;
6734+ goto out_return;
6735+ }
6736+
6737+ /*
6738+ * Each colon is a separator. This count is only a rough guess, since
6739+ * strsep will handle empty fields for us.
6740+ */
6741+ for (i = 0; options[i]; i++)
6742+ if (options[i] == ':')
6743+ branches++;
6744+
6745+ /* allocate space for underlying pointers to lower dentry */
6746+ UNIONFS_SB(sb)->data =
6747+ kcalloc(branches, sizeof(struct unionfs_data), GFP_KERNEL);
6748+ if (unlikely(!UNIONFS_SB(sb)->data)) {
6749+ err = -ENOMEM;
6750+ goto out_return;
6751+ }
6752+
6753+ lower_root_info->lower_paths =
6754+ kcalloc(branches, sizeof(struct path), GFP_KERNEL);
6755+ if (unlikely(!lower_root_info->lower_paths)) {
6756+ err = -ENOMEM;
6757+ /* free the underlying pointer array */
6758+ kfree(UNIONFS_SB(sb)->data);
6759+ UNIONFS_SB(sb)->data = NULL;
6760+ goto out_return;
6761+ }
6762+
6763+ /* now parsing a string such as "b1:b2=rw:b3=ro:b4" */
6764+ branches = 0;
6765+ while ((name = strsep(&options, ":")) != NULL) {
6766+ int perms;
6767+ char *mode = strchr(name, '=');
6768+
6769+ if (!name)
6770+ continue;
6771+ if (!*name) { /* bad use of ':' (extra colons) */
6772+ err = -EINVAL;
6773+ goto out;
6774+ }
6775+
6776+ branches++;
6777+
6778+ /* strip off '=' if any */
6779+ if (mode)
6780+ *mode++ = '\0';
6781+
6782+ err = parse_branch_mode(mode, &perms);
6783+ if (err) {
6784+ printk(KERN_ERR "unionfs: invalid mode \"%s\" for "
6785+ "branch %d\n", mode, bindex);
6786+ goto out;
6787+ }
6788+ /* ensure that leftmost branch is writeable */
6789+ if (!bindex && !(perms & MAY_WRITE)) {
6790+ printk(KERN_ERR "unionfs: leftmost branch cannot be "
6791+ "read-only (use \"-o ro\" to create a "
6792+ "read-only union)\n");
6793+ err = -EINVAL;
6794+ goto out;
6795+ }
6796+
6797+ err = path_lookup(name, LOOKUP_FOLLOW, &nd);
6798+ if (err) {
6799+ printk(KERN_ERR "unionfs: error accessing "
6800+ "lower directory '%s' (error %d)\n",
6801+ name, err);
6802+ goto out;
6803+ }
6804+
6805+ err = check_branch(&nd);
6806+ if (err) {
6807+ printk(KERN_ERR "unionfs: lower directory "
6808+ "'%s' is not a valid branch\n", name);
6809+ path_put(&nd.path);
6810+ goto out;
6811+ }
6812+
6813+ lower_root_info->lower_paths[bindex].dentry = nd.path.dentry;
6814+ lower_root_info->lower_paths[bindex].mnt = nd.path.mnt;
6815+
6816+ set_branchperms(sb, bindex, perms);
6817+ set_branch_count(sb, bindex, 0);
6818+ new_branch_id(sb, bindex);
6819+
6820+ if (lower_root_info->bstart < 0)
6821+ lower_root_info->bstart = bindex;
6822+ lower_root_info->bend = bindex;
6823+ bindex++;
6824+ }
6825+
6826+ if (branches == 0) {
6827+ printk(KERN_ERR "unionfs: no branches specified\n");
6828+ err = -EINVAL;
6829+ goto out;
6830+ }
6831+
6832+ BUG_ON(branches != (lower_root_info->bend + 1));
6833+
6834+ /*
6835+ * Ensure that no overlaps exist in the branches.
6836+ *
6837+ * This test is required because the Linux kernel has no support
6838+ * currently for ensuring coherency between stackable layers and
6839+ * branches. If we were to allow overlapping branches, it would be
6840+ * possible, for example, to delete a file via one branch, which
6841+ * would not be reflected in another branch. Such incoherency could
6842+ * lead to inconsistencies and even kernel oopses. Rather than
6843+ * implement hacks to work around some of these cache-coherency
6844+ * problems, we prevent branch overlapping, for now. A complete
6845+ * solution will involve proper kernel/VFS support for cache
6846+ * coherency, at which time we could safely remove this
6847+ * branch-overlapping test.
6848+ */
6849+ for (i = 0; i < branches; i++) {
6850+ dent1 = lower_root_info->lower_paths[i].dentry;
6851+ for (j = i + 1; j < branches; j++) {
6852+ dent2 = lower_root_info->lower_paths[j].dentry;
6853+ if (is_branch_overlap(dent1, dent2)) {
6854+ printk(KERN_ERR "unionfs: branches %d and "
6855+ "%d overlap\n", i, j);
6856+ err = -EINVAL;
6857+ goto out;
6858+ }
6859+ }
6860+ }
6861+
6862+out:
6863+ if (err) {
6864+ for (i = 0; i < branches; i++)
6865+ path_put(&lower_root_info->lower_paths[i]);
6866+
6867+ kfree(lower_root_info->lower_paths);
6868+ kfree(UNIONFS_SB(sb)->data);
6869+
6870+ /*
6871+ * MUST clear the pointers to prevent potential double free if
6872+ * the caller dies later on
6873+ */
6874+ lower_root_info->lower_paths = NULL;
6875+ UNIONFS_SB(sb)->data = NULL;
6876+ }
6877+out_return:
6878+ return err;
6879+}
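The string handling in parse_dirs_option can be tried in user space. The sketch below covers only the splitting and mode checks (empty fields rejected, unknown modes rejected, leftmost branch must be writable); it performs no path lookup and no overlap test. `struct sketch_branch`, `sketch_parse_dirs` and `SKETCH_MAX_BRANCHES` are hypothetical names, and the fixed branch limit is an assumption of the sketch, not of unionfs. It uses `strchr` rather than `strsep` so that it depends only on standard C string functions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_BRANCHES 8   /* sketch-only limit */

struct sketch_branch {
    const char *path;   /* points into the caller's buffer */
    int writable;       /* 1 for "rw" or no mode, 0 for "ro" */
};

/*
 * Split a string such as "b1:b2=rw:b3=ro:b4" in place.
 * Returns the number of branches parsed, or a negative errno value.
 */
static int sketch_parse_dirs(char *options, struct sketch_branch *out)
{
    int n = 0;
    char *field = options;

    assert(options != NULL && out != NULL);
    while (field) {
        char *next = strchr(field, ':');
        char *mode;

        if (next)
            *next++ = '\0';
        if (!*field)
            return -EINVAL;     /* empty field: stray or extra ':' */
        if (n == SKETCH_MAX_BRANCHES)
            return -E2BIG;

        mode = strchr(field, '=');
        if (mode)
            *mode++ = '\0';     /* strip off "=mode" if present */
        if (!mode || !strcmp(mode, "rw"))
            out[n].writable = 1;
        else if (!strcmp(mode, "ro"))
            out[n].writable = 0;
        else
            return -EINVAL;     /* unknown mode */
        if (n == 0 && !out[n].writable)
            return -EINVAL;     /* leftmost branch must be writable */

        out[n].path = field;
        n++;
        field = next;
    }
    return n;
}
```

The input buffer is modified in place, as with `strsep`, so it must be writable memory rather than a string literal.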
6880+
6881+/*
6882+ * Parse mount options. See the manual page for usage instructions.
6883+ *
6884+ * Returns the unionfs_dentry_info describing the lower-level directories,
6885+ * or an ERR_PTR() on failure. We mount our stackable file system on top of them.
6886+ */
6887+static struct unionfs_dentry_info *unionfs_parse_options(
6888+ struct super_block *sb,
6889+ char *options)
6890+{
6891+ struct unionfs_dentry_info *lower_root_info;
6892+ char *optname;
6893+ int err = 0;
6894+ int bindex;
6895+ int dirsfound = 0;
6896+
6897+ /* allocate private data area */
6898+ err = -ENOMEM;
6899+ lower_root_info =
6900+ kzalloc(sizeof(struct unionfs_dentry_info), GFP_KERNEL);
6901+ if (unlikely(!lower_root_info))
6902+ return ERR_PTR(err); /* out_error would dereference NULL */
6903+ lower_root_info->bstart = -1;
6904+ lower_root_info->bend = -1;
6905+ lower_root_info->bopaque = -1;
6906+
6907+ while ((optname = strsep(&options, ",")) != NULL) {
6908+ char *optarg;
6909+
6910+ if (!optname || !*optname)
6911+ continue;
6912+
6913+ optarg = strchr(optname, '=');
6914+ if (optarg)
6915+ *optarg++ = '\0';
6916+
6917+ /*
6918+ * All of our options take an argument now. Insert ones that
6919+ * don't, above this check.
6920+ */
6921+ if (!optarg) {
6922+ printk(KERN_ERR "unionfs: %s requires an argument\n",
6923+ optname);
6924+ err = -EINVAL;
6925+ goto out_error;
6926+ }
6927+
6928+ if (!strcmp("dirs", optname)) {
6929+ if (++dirsfound > 1) {
6930+ printk(KERN_ERR
6931+ "unionfs: multiple dirs specified\n");
6932+ err = -EINVAL;
6933+ goto out_error;
6934+ }
6935+ err = parse_dirs_option(sb, lower_root_info, optarg);
6936+ if (err)
6937+ goto out_error;
6938+ continue;
6939+ }
6940+
6941+ err = -EINVAL;
6942+ printk(KERN_ERR
6943+ "unionfs: unrecognized option '%s'\n", optname);
6944+ goto out_error;
6945+ }
6946+ if (dirsfound != 1) {
6947+ printk(KERN_ERR "unionfs: dirs option required\n");
6948+ err = -EINVAL;
6949+ goto out_error;
6950+ }
6951+ goto out;
6952+
6953+out_error:
6954+ if (lower_root_info && lower_root_info->lower_paths) {
6955+ for (bindex = lower_root_info->bstart;
6956+ bindex >= 0 && bindex <= lower_root_info->bend;
6957+ bindex++)
6958+ path_put(&lower_root_info->lower_paths[bindex]);
6959+ }
6960+
6961+ kfree(lower_root_info->lower_paths);
6962+ kfree(lower_root_info);
6963+
6964+ kfree(UNIONFS_SB(sb)->data);
6965+ UNIONFS_SB(sb)->data = NULL;
6966+
6967+ lower_root_info = ERR_PTR(err);
6968+out:
6969+ return lower_root_info;
6970+}
6971+
6972+/*
6973+ * our custom d_alloc_root work-alike
6974+ *
6975+ * we can't use d_alloc_root if we want to use our own interpose function
6976+ * unchanged, so we simply call our own "fake" d_alloc_root
6977+ */
6978+static struct dentry *unionfs_d_alloc_root(struct super_block *sb)
6979+{
6980+ struct dentry *ret = NULL;
6981+
6982+ if (sb) {
6983+ static const struct qstr name = {
6984+ .name = "/",
6985+ .len = 1
6986+ };
6987+
6988+ ret = d_alloc(NULL, &name);
6989+ if (likely(ret)) {
6990+ ret->d_op = &unionfs_dops;
6991+ ret->d_sb = sb;
6992+ ret->d_parent = ret;
6993+ }
6994+ }
6995+ return ret;
6996+}
6997+
6998+/*
6999+ * There is no need to lock the unionfs_super_info's rwsem as there is no
7000+ * way anyone can have a reference to the superblock at this point in time.
7001+ */
7002+static int unionfs_read_super(struct super_block *sb, void *raw_data,
7003+ int silent)
7004+{
7005+ int err = 0;
7006+ struct unionfs_dentry_info *lower_root_info = NULL;
7007+ int bindex, bstart, bend;
7008+
7009+ if (!raw_data) {
7010+ printk(KERN_ERR
7011+ "unionfs: read_super: missing data argument\n");
7012+ err = -EINVAL;
7013+ goto out;
7014+ }
7015+
7016+ /* Allocate superblock private data */
7017+ sb->s_fs_info = kzalloc(sizeof(struct unionfs_sb_info), GFP_KERNEL);
7018+ if (unlikely(!UNIONFS_SB(sb))) {
7019+ printk(KERN_CRIT "unionfs: read_super: out of memory\n");
7020+ err = -ENOMEM;
7021+ goto out;
7022+ }
7023+
7024+ UNIONFS_SB(sb)->bend = -1;
7025+ atomic_set(&UNIONFS_SB(sb)->generation, 1);
7026+ init_rwsem(&UNIONFS_SB(sb)->rwsem);
7027+ UNIONFS_SB(sb)->high_branch_id = -1; /* -1 == invalid branch ID */
7028+
7029+ lower_root_info = unionfs_parse_options(sb, raw_data);
7030+ if (IS_ERR(lower_root_info)) {
7031+ printk(KERN_ERR
7032+ "unionfs: read_super: error while parsing options "
7033+ "(err = %ld)\n", PTR_ERR(lower_root_info));
7034+ err = PTR_ERR(lower_root_info);
7035+ lower_root_info = NULL;
7036+ goto out_free;
7037+ }
7038+ if (lower_root_info->bstart == -1) {
7039+ err = -ENOENT;
7040+ goto out_free;
7041+ }
7042+
7043+ /* set the lower superblock field of upper superblock */
7044+ bstart = lower_root_info->bstart;
7045+ BUG_ON(bstart != 0);
7046+ sbend(sb) = bend = lower_root_info->bend;
7047+ for (bindex = bstart; bindex <= bend; bindex++) {
7048+ struct dentry *d = lower_root_info->lower_paths[bindex].dentry;
7049+ atomic_inc(&d->d_sb->s_active);
7050+ unionfs_set_lower_super_idx(sb, bindex, d->d_sb);
7051+ }
7052+
7053+ /* s_maxbytes is inherited from the highest-priority branch */
7054+ sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
7055+
7056+ /*
7057+ * Our c/m/atime granularity is 1 ns because we may stack on file
7058+ * systems whose granularity is as good. This is important for our
7059+ * time-based cache coherency.
7060+ */
7061+ sb->s_time_gran = 1;
7062+
7063+ sb->s_op = &unionfs_sops;
7064+
7065+ /* See comment next to the definition of unionfs_d_alloc_root */
7066+ sb->s_root = unionfs_d_alloc_root(sb);
7067+ if (unlikely(!sb->s_root)) {
7068+ err = -ENOMEM;
7069+ goto out_dput;
7070+ }
7071+
7072+ /* link the upper and lower dentries */
7073+ sb->s_root->d_fsdata = NULL;
7074+ err = new_dentry_private_data(sb->s_root, UNIONFS_DMUTEX_ROOT);
7075+ if (unlikely(err))
7076+ goto out_freedpd;
7077+
7078+ /* Set the lower dentries for s_root */
7079+ for (bindex = bstart; bindex <= bend; bindex++) {
7080+ struct dentry *d;
7081+ struct vfsmount *m;
7082+
7083+ d = lower_root_info->lower_paths[bindex].dentry;
7084+ m = lower_root_info->lower_paths[bindex].mnt;
7085+
7086+ unionfs_set_lower_dentry_idx(sb->s_root, bindex, d);
7087+ unionfs_set_lower_mnt_idx(sb->s_root, bindex, m);
7088+ }
7089+ dbstart(sb->s_root) = bstart;
7090+ dbend(sb->s_root) = bend;
7091+
7092+ /* Set the generation number to one, since this is for the mount. */
7093+ atomic_set(&UNIONFS_D(sb->s_root)->generation, 1);
7094+
7095+ /*
7096+ * Call interpose to create the upper level inode. Only
7097+ * INTERPOSE_LOOKUP can return a value other than 0 on err.
7098+ */
7099+ err = PTR_ERR(unionfs_interpose(sb->s_root, sb, 0));
7100+ unionfs_unlock_dentry(sb->s_root);
7101+ if (!err)
7102+ goto out;
7103+ /* else fall through */
7104+
7105+out_freedpd:
7106+ if (UNIONFS_D(sb->s_root)) {
7107+ kfree(UNIONFS_D(sb->s_root)->lower_paths);
7108+ free_dentry_private_data(sb->s_root);
7109+ }
7110+ dput(sb->s_root);
7111+
7112+out_dput:
7113+ if (lower_root_info && !IS_ERR(lower_root_info)) {
7114+ for (bindex = lower_root_info->bstart;
7115+ bindex <= lower_root_info->bend; bindex++) {
7116+ struct dentry *d;
7117+ d = lower_root_info->lower_paths[bindex].dentry;
7118+ /* drop refs we took earlier */
7119+ atomic_dec(&d->d_sb->s_active);
7120+ path_put(&lower_root_info->lower_paths[bindex]);
7121+ }
7122+ kfree(lower_root_info->lower_paths);
7123+ kfree(lower_root_info);
7124+ lower_root_info = NULL;
7125+ }
7126+
7127+out_free:
7128+ kfree(UNIONFS_SB(sb)->data);
7129+ kfree(UNIONFS_SB(sb));
7130+ sb->s_fs_info = NULL;
7131+
7132+out:
7133+ if (lower_root_info && !IS_ERR(lower_root_info)) {
7134+ kfree(lower_root_info->lower_paths);
7135+ kfree(lower_root_info);
7136+ }
7137+ return err;
7138+}
7139+
7140+static int unionfs_get_sb(struct file_system_type *fs_type,
7141+ int flags, const char *dev_name,
7142+ void *raw_data, struct vfsmount *mnt)
7143+{
7144+ int err;
7145+ err = get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
7146+ if (!err)
7147+ UNIONFS_SB(mnt->mnt_sb)->dev_name =
7148+ kstrdup(dev_name, GFP_KERNEL);
7149+ return err;
7150+}
7151+
7152+static struct file_system_type unionfs_fs_type = {
7153+ .owner = THIS_MODULE,
7154+ .name = UNIONFS_NAME,
7155+ .get_sb = unionfs_get_sb,
7156+ .kill_sb = generic_shutdown_super,
7157+ .fs_flags = FS_REVAL_DOT,
7158+};
7159+
7160+static int __init init_unionfs_fs(void)
7161+{
7162+ int err;
7163+
7164+ pr_info("Registering unionfs " UNIONFS_VERSION "\n");
7165+
7166+ err = unionfs_init_filldir_cache();
7167+ if (unlikely(err))
7168+ goto out;
7169+ err = unionfs_init_inode_cache();
7170+ if (unlikely(err))
7171+ goto out;
7172+ err = unionfs_init_dentry_cache();
7173+ if (unlikely(err))
7174+ goto out;
7175+ err = init_sioq();
7176+ if (unlikely(err))
7177+ goto out;
7178+ err = register_filesystem(&unionfs_fs_type);
7179+out:
7180+ if (unlikely(err)) {
7181+ stop_sioq();
7182+ unionfs_destroy_filldir_cache();
7183+ unionfs_destroy_inode_cache();
7184+ unionfs_destroy_dentry_cache();
7185+ }
7186+ return err;
7187+}
7188+
7189+static void __exit exit_unionfs_fs(void)
7190+{
7191+ stop_sioq();
7192+ unionfs_destroy_filldir_cache();
7193+ unionfs_destroy_inode_cache();
7194+ unionfs_destroy_dentry_cache();
7195+ unregister_filesystem(&unionfs_fs_type);
7196+ pr_info("Completed unionfs module unload\n");
7197+}
7198+
7199+MODULE_AUTHOR("Erez Zadok, Filesystems and Storage Lab, Stony Brook University"
7200+ " (http://www.fsl.cs.sunysb.edu)");
7201+MODULE_DESCRIPTION("Unionfs " UNIONFS_VERSION
7202+ " (http://unionfs.filesystems.org)");
7203+MODULE_LICENSE("GPL");
7204+
7205+module_init(init_unionfs_fs);
7206+module_exit(exit_unionfs_fs);
7207diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
7208new file mode 100644
7209index 0000000..1f70535
7210--- /dev/null
7211+++ b/fs/unionfs/mmap.c
7212@@ -0,0 +1,89 @@
7213+/*
7214+ * Copyright (c) 2003-2010 Erez Zadok
7215+ * Copyright (c) 2003-2006 Charles P. Wright
7216+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7217+ * Copyright (c) 2005-2006 Junjiro Okajima
7218+ * Copyright (c) 2006 Shaya Potter
7219+ * Copyright (c) 2005 Arun M. Krishnakumar
7220+ * Copyright (c) 2004-2006 David P. Quigley
7221+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7222+ * Copyright (c) 2003 Puja Gupta
7223+ * Copyright (c) 2003 Harikesavan Krishnan
7224+ * Copyright (c) 2003-2010 Stony Brook University
7225+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
7226+ *
7227+ * This program is free software; you can redistribute it and/or modify
7228+ * it under the terms of the GNU General Public License version 2 as
7229+ * published by the Free Software Foundation.
7230+ */
7231+
7232+#include "union.h"
7233+
7234+
7235+/*
7236+ * XXX: we need a dummy readpage handler because generic_file_mmap (which we
7237+ * use in unionfs_mmap) checks for the existence of
7238+ * mapping->a_ops->readpage, else it returns -ENOEXEC. The VFS will need to
7239+ * be fixed to allow a file system to define vm_ops->fault without any
7240+ * address_space_ops whatsoever.
7241+ *
7242+ * Otherwise, we don't want to use our readpage method at all.
7243+ */
7244+static int unionfs_readpage(struct file *file, struct page *page)
7245+{
7246+ BUG();
7247+ return -EINVAL;
7248+}
7249+
7250+static int unionfs_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
7251+{
7252+ int err;
7253+ struct file *file, *lower_file;
7254+ const struct vm_operations_struct *lower_vm_ops;
7255+ struct vm_area_struct lower_vma;
7256+
7257+ BUG_ON(!vma);
7258+ memcpy(&lower_vma, vma, sizeof(struct vm_area_struct));
7259+ file = lower_vma.vm_file;
7260+ lower_vm_ops = UNIONFS_F(file)->lower_vm_ops;
7261+ BUG_ON(!lower_vm_ops);
7262+
7263+ lower_file = unionfs_lower_file(file);
7264+ BUG_ON(!lower_file);
7265+ /*
7266+ * XXX: vm_ops->fault may be called in parallel. Because we have to
7267+ * resort to temporarily changing the vma->vm_file to point to the
7268+ * lower file, a concurrent invocation of unionfs_fault could see a
7269+ * different value. In this workaround, we keep a different copy of
7270+ * the vma structure in our stack, so we never expose a different
7271+ * value of the vma->vm_file called to us, even temporarily. A
7272+ * better fix would be to change the calling semantics of ->fault to
7273+ * take an explicit file pointer.
7274+ */
7275+ lower_vma.vm_file = lower_file;
7276+ err = lower_vm_ops->fault(&lower_vma, vmf);
7277+ return err;
7278+}
7279+
7280+/*
7281+ * XXX: the default address_space_ops for unionfs is empty. We cannot set
7282+ * our inode->i_mapping->a_ops to NULL because too many code paths expect
7283+ * the a_ops vector to be non-NULL.
7284+ */
7285+struct address_space_operations unionfs_aops = {
7286+ /* empty on purpose */
7287+};
7288+
7289+/*
7290+ * XXX: we need a second, dummy address_space_ops vector, to be used
7291+ * temporarily during unionfs_mmap, because the latter calls
7292+ * generic_file_mmap, which checks if ->readpage exists, else returns
7293+ * -ENOEXEC.
7294+ */
7295+struct address_space_operations unionfs_dummy_aops = {
7296+ .readpage = unionfs_readpage,
7297+};
7298+
7299+struct vm_operations_struct unionfs_vm_ops = {
7300+ .fault = unionfs_fault,
7301+};
7302diff --git a/fs/unionfs/rdstate.c b/fs/unionfs/rdstate.c
7303new file mode 100644
7304index 0000000..d57f1f8
7305--- /dev/null
7306+++ b/fs/unionfs/rdstate.c
7307@@ -0,0 +1,285 @@
7308+/*
7309+ * Copyright (c) 2003-2010 Erez Zadok
7310+ * Copyright (c) 2003-2006 Charles P. Wright
7311+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7312+ * Copyright (c) 2005-2006 Junjiro Okajima
7313+ * Copyright (c) 2005 Arun M. Krishnakumar
7314+ * Copyright (c) 2004-2006 David P. Quigley
7315+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7316+ * Copyright (c) 2003 Puja Gupta
7317+ * Copyright (c) 2003 Harikesavan Krishnan
7318+ * Copyright (c) 2003-2010 Stony Brook University
7319+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
7320+ *
7321+ * This program is free software; you can redistribute it and/or modify
7322+ * it under the terms of the GNU General Public License version 2 as
7323+ * published by the Free Software Foundation.
7324+ */
7325+
7326+#include "union.h"
7327+
7328+/* This file contains the routines for maintaining readdir state. */
7329+
7330+/*
7331+ * There are two structures here: rdstate, which is a hash table, and
7332+ * filldir_node, which is the entry type stored in that table.
7333+ */
7334+
7335+/*
7336+ * This is a struct kmem_cache for filldir nodes, because we allocate a lot
7337+ * of them and they shouldn't waste memory. If the node has a small name
7338+ * (as defined by the dentry structure), then we use an inline name to
7339+ * preserve kmalloc space.
7340+ */
7341+static struct kmem_cache *unionfs_filldir_cachep;
7342+
7343+int unionfs_init_filldir_cache(void)
7344+{
7345+ unionfs_filldir_cachep =
7346+ kmem_cache_create("unionfs_filldir",
7347+ sizeof(struct filldir_node), 0,
7348+ SLAB_RECLAIM_ACCOUNT, NULL);
7349+
7350+ return (unionfs_filldir_cachep ? 0 : -ENOMEM);
7351+}
7352+
7353+void unionfs_destroy_filldir_cache(void)
7354+{
7355+ if (unionfs_filldir_cachep)
7356+ kmem_cache_destroy(unionfs_filldir_cachep);
7357+}
7358+
7359+/*
7360+ * This is a tuning parameter that tells us roughly how big to make the
7361+ * hash table in directory entries per page. This isn't perfect, but
7362+ * at least we get a hash table size that shouldn't be too overloaded.
7363+ * The following averages are based on my home directory.
7364+ * 14.44693 Overall
7365+ * 12.29 Single Page Directories
7366+ * 117.93 Multi-page directories
7367+ */
7368+#define DENTPAGE 4096
7369+#define DENTPERONEPAGE 12
7370+#define DENTPERPAGE 118
7371+#define MINHASHSIZE 1
7372+static int guesstimate_hash_size(struct inode *inode)
7373+{
7374+ struct inode *lower_inode;
7375+ int bindex;
7376+ int hashsize = MINHASHSIZE;
7377+
7378+ if (UNIONFS_I(inode)->hashsize > 0)
7379+ return UNIONFS_I(inode)->hashsize;
7380+
7381+ for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
7382+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
7383+ if (!lower_inode)
7384+ continue;
7385+
7386+ if (i_size_read(lower_inode) == DENTPAGE)
7387+ hashsize += DENTPERONEPAGE;
7388+ else
7389+ hashsize += (i_size_read(lower_inode) / DENTPAGE) *
7390+ DENTPERPAGE;
7391+ }
7392+
7393+ return hashsize;
7394+}
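The arithmetic in guesstimate_hash_size is easy to check by hand. The sketch below repeats it in user space over a plain array of directory sizes instead of lower inodes; `sketch_guess_hash_size` is a hypothetical name, and the four constants are copied from the definitions above.

```c
#include <assert.h>
#include <stddef.h>

#define DENTPAGE        4096
#define DENTPERONEPAGE  12
#define DENTPERPAGE     118
#define MINHASHSIZE     1

/*
 * Hypothetical user-space version: dir_sizes[] holds the byte size of the
 * directory in each branch (what i_size_read() would return in the kernel).
 */
static int sketch_guess_hash_size(const long long *dir_sizes, size_t n)
{
    int hashsize = MINHASHSIZE;
    size_t i;

    assert(dir_sizes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (dir_sizes[i] == DENTPAGE)
            hashsize += DENTPERONEPAGE;     /* exactly one page */
        else
            hashsize += (int)(dir_sizes[i] / DENTPAGE) * DENTPERPAGE;
    }
    return hashsize;
}
```

For three branches whose directories are 4096, 8192 and 100 bytes, the estimate is 1 + 12 + 2 * 118 + 0 = 249. A directory smaller than one page contributes nothing because of the integer division.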
7395+
7396+int init_rdstate(struct file *file)
7397+{
7398+ BUG_ON(sizeof(loff_t) !=
7399+ (sizeof(unsigned int) + sizeof(unsigned int)));
7400+ BUG_ON(UNIONFS_F(file)->rdstate != NULL);
7401+
7402+ UNIONFS_F(file)->rdstate = alloc_rdstate(file->f_path.dentry->d_inode,
7403+ fbstart(file));
7404+
7405+ return (UNIONFS_F(file)->rdstate ? 0 : -ENOMEM);
7406+}
7407+
7408+struct unionfs_dir_state *find_rdstate(struct inode *inode, loff_t fpos)
7409+{
7410+ struct unionfs_dir_state *rdstate = NULL;
7411+ struct list_head *pos;
7412+
7413+ spin_lock(&UNIONFS_I(inode)->rdlock);
7414+ list_for_each(pos, &UNIONFS_I(inode)->readdircache) {
7415+ struct unionfs_dir_state *r =
7416+ list_entry(pos, struct unionfs_dir_state, cache);
7417+ if (fpos == rdstate2offset(r)) {
7418+ UNIONFS_I(inode)->rdcount--;
7419+ list_del(&r->cache);
7420+ rdstate = r;
7421+ break;
7422+ }
7423+ }
7424+ spin_unlock(&UNIONFS_I(inode)->rdlock);
7425+ return rdstate;
7426+}
7427+
7428+struct unionfs_dir_state *alloc_rdstate(struct inode *inode, int bindex)
7429+{
7430+ int i = 0;
7431+ int hashsize;
7432+ unsigned long mallocsize = sizeof(struct unionfs_dir_state);
7433+ struct unionfs_dir_state *rdstate;
7434+
7435+ hashsize = guesstimate_hash_size(inode);
7436+ mallocsize += hashsize * sizeof(struct list_head);
7437+ mallocsize = __roundup_pow_of_two(mallocsize);
7438+
7439+ /* This should give us about 500 entries anyway. */
7440+ if (mallocsize > PAGE_SIZE)
7441+ mallocsize = PAGE_SIZE;
7442+
7443+ hashsize = (mallocsize - sizeof(struct unionfs_dir_state)) /
7444+ sizeof(struct list_head);
7445+
7446+ rdstate = kmalloc(mallocsize, GFP_KERNEL);
7447+ if (unlikely(!rdstate))
7448+ return NULL;
7449+
7450+ spin_lock(&UNIONFS_I(inode)->rdlock);
7451+ if (UNIONFS_I(inode)->cookie >= (MAXRDCOOKIE - 1))
7452+ UNIONFS_I(inode)->cookie = 1;
7453+ else
7454+ UNIONFS_I(inode)->cookie++;
7455+
7456+ rdstate->cookie = UNIONFS_I(inode)->cookie;
7457+ spin_unlock(&UNIONFS_I(inode)->rdlock);
7458+ rdstate->offset = 1;
7459+ rdstate->access = jiffies;
7460+ rdstate->bindex = bindex;
7461+ rdstate->dirpos = 0;
7462+ rdstate->hashentries = 0;
7463+ rdstate->size = hashsize;
7464+ for (i = 0; i < rdstate->size; i++)
7465+ INIT_LIST_HEAD(&rdstate->list[i]);
7466+
7467+ return rdstate;
7468+}
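alloc_rdstate sizes its allocation by adding the bucket array to the header, rounding up to a power of two, capping at one page, and then recomputing how many buckets actually fit. The sketch below reproduces only that sizing step. The function names are hypothetical, and the header and bucket sizes are passed in as parameters because the real `sizeof` values depend on the architecture.

```c
#include <assert.h>
#include <limits.h>

/* Smallest power of two >= n (simple loop; not the kernel helper). */
static unsigned long sketch_roundup_pow_of_two(unsigned long n)
{
    unsigned long p = 1;

    assert(n > 0 && n <= (ULONG_MAX >> 1) + 1);
    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Number of hash buckets that fit once the allocation has been rounded up
 * to a power of two and capped at page_size.
 */
static unsigned long sketch_bucket_count(unsigned long header_size,
                                         unsigned long bucket_size,
                                         unsigned long wanted,
                                         unsigned long page_size)
{
    unsigned long bytes = header_size + wanted * bucket_size;

    assert(bucket_size > 0 && header_size < page_size);
    bytes = sketch_roundup_pow_of_two(bytes);
    if (bytes > page_size)
        bytes = page_size;
    return (bytes - header_size) / bucket_size;
}
```

With an assumed 64-byte header and 16-byte buckets, asking for 13 buckets yields 28 (the allocation rounds up to 512 bytes), and asking for 1000 buckets is capped at 252 on a 4096-byte page.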
7469+
7470+static void free_filldir_node(struct filldir_node *node)
7471+{
7472+ if (node->namelen >= DNAME_INLINE_LEN)
7473+ kfree(node->name);
7474+ kmem_cache_free(unionfs_filldir_cachep, node);
7475+}
7476+
7477+void free_rdstate(struct unionfs_dir_state *state)
7478+{
7479+ struct filldir_node *tmp;
7480+ int i;
7481+
7482+ for (i = 0; i < state->size; i++) {
7483+ struct list_head *head = &(state->list[i]);
7484+ struct list_head *pos, *n;
7485+
7486+ /* traverse the list and deallocate space */
7487+ list_for_each_safe(pos, n, head) {
7488+ tmp = list_entry(pos, struct filldir_node, file_list);
7489+ list_del(&tmp->file_list);
7490+ free_filldir_node(tmp);
7491+ }
7492+ }
7493+
7494+ kfree(state);
7495+}
7496+
7497+struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
7498+ const char *name, int namelen,
7499+ int is_whiteout)
7500+{
7501+ int index;
7502+ unsigned int hash;
7503+ struct list_head *head;
7504+ struct list_head *pos;
7505+ struct filldir_node *cursor = NULL;
7506+ int found = 0;
7507+
7508+ BUG_ON(namelen <= 0);
7509+
7510+ hash = full_name_hash(name, namelen);
7511+ index = hash % rdstate->size;
7512+
7513+ head = &(rdstate->list[index]);
7514+ list_for_each(pos, head) {
7515+ cursor = list_entry(pos, struct filldir_node, file_list);
7516+
7517+ if (cursor->namelen == namelen && cursor->hash == hash &&
7518+ !strncmp(cursor->name, name, namelen)) {
7519+ /*
7520+ * a duplicate exists, so there is no need to add
7521+ * another entry to the list
7522+ */
7523+ found = 1;
7524+
7525+ /*
7526+ * if a duplicate is found in this branch, and is
7527+ * not due to the caller looking for an entry to
7528+ * whiteout, then the file system may be corrupted.
7529+ */
7530+ if (unlikely(!is_whiteout &&
7531+ cursor->bindex == rdstate->bindex))
7532+ printk(KERN_ERR "unionfs: filldir: possible "
7533+ "I/O error: a file is duplicated "
7534+ "in the same branch %d: %s\n",
7535+ rdstate->bindex, cursor->name);
7536+ break;
7537+ }
7538+ }
7539+
7540+ if (!found)
7541+ cursor = NULL;
7542+
7543+ return cursor;
7544+}
7545+
7546+int add_filldir_node(struct unionfs_dir_state *rdstate, const char *name,
7547+ int namelen, int bindex, int whiteout)
7548+{
7549+ struct filldir_node *new;
7550+ unsigned int hash;
7551+ int index;
7552+ int err = 0;
7553+ struct list_head *head;
7554+
7555+ BUG_ON(namelen <= 0);
7556+
7557+ hash = full_name_hash(name, namelen);
7558+ index = hash % rdstate->size;
7559+ head = &(rdstate->list[index]);
7560+
7561+ new = kmem_cache_alloc(unionfs_filldir_cachep, GFP_KERNEL);
7562+ if (unlikely(!new)) {
7563+ err = -ENOMEM;
7564+ goto out;
7565+ }
7566+
7567+ INIT_LIST_HEAD(&new->file_list);
7568+ new->namelen = namelen;
7569+ new->hash = hash;
7570+ new->bindex = bindex;
7571+ new->whiteout = whiteout;
7572+
7573+ if (namelen < DNAME_INLINE_LEN) {
7574+ new->name = new->iname;
7575+ } else {
7576+ new->name = kmalloc(namelen + 1, GFP_KERNEL);
7577+ if (unlikely(!new->name)) {
7578+ kmem_cache_free(unionfs_filldir_cachep, new);
7579+ err = -ENOMEM;
7580+ goto out;
7581+ }
7582+ }
7583+
7584+ memcpy(new->name, name, namelen);
7585+ new->name[namelen] = '\0';
7586+
7587+ rdstate->hashentries++;
7588+
7589+ list_add(&(new->file_list), head);
7590+out:
7591+ return err;
7592+}
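find_filldir_node and add_filldir_node together form a chained hash table keyed by file name, which lets readdir skip a name it has already seen in a higher-priority branch. The user-space sketch below shows that lookup-then-insert pattern under simplifying assumptions: a fixed bucket count, a fixed-size inline name, a singly linked chain instead of the kernel's list_head, and a djb2-style hash in place of full_name_hash(). All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BUCKETS 8

struct sketch_node {
    struct sketch_node *next;
    unsigned int hash;
    int bindex;             /* branch in which the name was first seen */
    char name[32];
};

struct sketch_table {
    struct sketch_node *bucket[SKETCH_BUCKETS];
};

/* djb2-style string hash; a stand-in, not the kernel's hash function. */
static unsigned int sketch_hash(const char *name, size_t len)
{
    unsigned int h = 5381;
    size_t i;

    for (i = 0; i < len; i++)
        h = h * 33 + (unsigned char)name[i];
    return h;
}

/* Return the node for 'name', or NULL if it has not been added yet. */
static struct sketch_node *sketch_find(struct sketch_table *t,
                                       const char *name)
{
    size_t len = strlen(name);
    unsigned int h = sketch_hash(name, len);
    struct sketch_node *n;

    for (n = t->bucket[h % SKETCH_BUCKETS]; n; n = n->next)
        if (n->hash == h && strlen(n->name) == len &&
            !memcmp(n->name, name, len))
            return n;
    return NULL;
}

/* Returns 0 on success, -1 if the name is too long or malloc fails. */
static int sketch_add(struct sketch_table *t, const char *name, int bindex)
{
    size_t len = strlen(name);
    struct sketch_node *n;

    if (len >= sizeof(n->name))
        return -1;
    n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->hash = sketch_hash(name, len);
    n->bindex = bindex;
    memcpy(n->name, name, len + 1);
    n->next = t->bucket[n->hash % SKETCH_BUCKETS];
    t->bucket[n->hash % SKETCH_BUCKETS] = n;
    return 0;
}

/* Free every node and leave the table empty. */
static void sketch_free(struct sketch_table *t)
{
    int i;

    for (i = 0; i < SKETCH_BUCKETS; i++) {
        while (t->bucket[i]) {
            struct sketch_node *n = t->bucket[i];
            t->bucket[i] = n->next;
            free(n);
        }
    }
}
```

A caller merging several branches would call `sketch_find` for each name and call `sketch_add` only when it returns NULL, so the first branch to supply a name determines its `bindex`.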
7593diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
7594new file mode 100644
7595index 0000000..936700e
7596--- /dev/null
7597+++ b/fs/unionfs/rename.c
7598@@ -0,0 +1,517 @@
7599+/*
7600+ * Copyright (c) 2003-2010 Erez Zadok
7601+ * Copyright (c) 2003-2006 Charles P. Wright
7602+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7603+ * Copyright (c) 2005-2006 Junjiro Okajima
7604+ * Copyright (c) 2005 Arun M. Krishnakumar
7605+ * Copyright (c) 2004-2006 David P. Quigley
7606+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7607+ * Copyright (c) 2003 Puja Gupta
7608+ * Copyright (c) 2003 Harikesavan Krishnan
7609+ * Copyright (c) 2003-2010 Stony Brook University
7610+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
7611+ *
7612+ * This program is free software; you can redistribute it and/or modify
7613+ * it under the terms of the GNU General Public License version 2 as
7614+ * published by the Free Software Foundation.
7615+ */
7616+
7617+#include "union.h"
7618+
7619+/*
7620+ * This is a helper function for rename, used when rename ends up with hosed
7621+ * over dentries and we need to revert.
7622+ */
7623+static int unionfs_refresh_lower_dentry(struct dentry *dentry,
7624+ struct dentry *parent, int bindex)
7625+{
7626+ struct dentry *lower_dentry;
7627+ struct dentry *lower_parent;
7628+ int err = 0;
7629+
7630+ verify_locked(dentry);
7631+
7632+ lower_parent = unionfs_lower_dentry_idx(parent, bindex);
7633+
7634+ BUG_ON(!S_ISDIR(lower_parent->d_inode->i_mode));
7635+
7636+ lower_dentry = lookup_one_len(dentry->d_name.name, lower_parent,
7637+ dentry->d_name.len);
7638+ if (IS_ERR(lower_dentry)) {
7639+ err = PTR_ERR(lower_dentry);
7640+ goto out;
7641+ }
7642+
7643+ dput(unionfs_lower_dentry_idx(dentry, bindex));
7644+ iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
7645+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex, NULL);
7646+
7647+ if (!lower_dentry->d_inode) {
7648+ dput(lower_dentry);
7649+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
7650+ } else {
7651+ unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
7652+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
7653+ igrab(lower_dentry->d_inode));
7654+ }
7655+
7656+out:
7657+ return err;
7658+}
7659+
7660+static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
7661+ struct dentry *old_parent,
7662+ struct inode *new_dir, struct dentry *new_dentry,
7663+ struct dentry *new_parent,
7664+ int bindex)
7665+{
7666+ int err = 0;
7667+ struct dentry *lower_old_dentry;
7668+ struct dentry *lower_new_dentry;
7669+ struct dentry *lower_old_dir_dentry;
7670+ struct dentry *lower_new_dir_dentry;
7671+ struct dentry *trap;
7672+
7673+ lower_new_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
7674+ lower_old_dentry = unionfs_lower_dentry_idx(old_dentry, bindex);
7675+
7676+ if (!lower_new_dentry) {
7677+ lower_new_dentry =
7678+ create_parents(new_parent->d_inode,
7679+ new_dentry, new_dentry->d_name.name,
7680+ bindex);
7681+ if (IS_ERR(lower_new_dentry)) {
7682+ err = PTR_ERR(lower_new_dentry);
7683+ if (IS_COPYUP_ERR(err))
7684+ goto out;
7685+ printk(KERN_ERR "unionfs: error creating directory "
7686+ "tree for rename, bindex=%d err=%d\n",
7687+ bindex, err);
7688+ goto out;
7689+ }
7690+ }
7691+
7692+ /* check for and remove whiteout, if any */
7693+ err = check_unlink_whiteout(new_dentry, lower_new_dentry, bindex);
7694+ if (err > 0) /* ignore if whiteout found and successfully removed */
7695+ err = 0;
7696+ if (err)
7697+ goto out;
7698+
7699+ /* check if old_dentry branch is writable */
7700+ err = is_robranch_super(old_dentry->d_sb, bindex);
7701+ if (err)
7702+ goto out;
7703+
7704+ dget(lower_old_dentry);
7705+ dget(lower_new_dentry);
7706+ lower_old_dir_dentry = dget_parent(lower_old_dentry);
7707+ lower_new_dir_dentry = dget_parent(lower_new_dentry);
7708+
7709+ trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7710+ /* source should not be ancestor of target */
7711+ if (trap == lower_old_dentry) {
7712+ err = -EINVAL;
7713+ goto out_err_unlock;
7714+ }
7715+ /* target should not be ancestor of source */
7716+ if (trap == lower_new_dentry) {
7717+ err = -ENOTEMPTY;
7718+ goto out_err_unlock;
7719+ }
7720+ err = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
7721+ lower_new_dir_dentry->d_inode, lower_new_dentry);
7722+out_err_unlock:
7723+ if (!err) {
7724+ /* update parent dir times */
7725+ fsstack_copy_attr_times(old_dir, lower_old_dir_dentry->d_inode);
7726+ fsstack_copy_attr_times(new_dir, lower_new_dir_dentry->d_inode);
7727+ }
7728+ unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7729+
7730+ dput(lower_old_dir_dentry);
7731+ dput(lower_new_dir_dentry);
7732+ dput(lower_old_dentry);
7733+ dput(lower_new_dentry);
7734+
7735+out:
7736+ if (!err) {
7737+ /* Fixup the new_dentry. */
7738+ if (bindex < dbstart(new_dentry))
7739+ dbstart(new_dentry) = bindex;
7740+ else if (bindex > dbend(new_dentry))
7741+ dbend(new_dentry) = bindex;
7742+ }
7743+
7744+ return err;
7745+}
7746+
7747+/*
7748+ * Main rename code. This is sufficiently complex that it's documented in
7749+ * Documentation/filesystems/unionfs/rename.txt. This routine calls
7750+ * __unionfs_rename() above to perform some of the work.
7751+ */
7752+static int do_unionfs_rename(struct inode *old_dir,
7753+ struct dentry *old_dentry,
7754+ struct dentry *old_parent,
7755+ struct inode *new_dir,
7756+ struct dentry *new_dentry,
7757+ struct dentry *new_parent)
7758+{
7759+ int err = 0;
7760+ int bindex;
7761+ int old_bstart, old_bend;
7762+ int new_bstart, new_bend;
7763+ int do_copyup = -1;
7764+ int local_err = 0;
7765+ int eio = 0;
7766+ int revert = 0;
7767+
7768+ old_bstart = dbstart(old_dentry);
7769+ old_bend = dbend(old_dentry);
7770+
7771+ new_bstart = dbstart(new_dentry);
7772+ new_bend = dbend(new_dentry);
7773+
7774+ /* Rename source to destination. */
7775+ err = __unionfs_rename(old_dir, old_dentry, old_parent,
7776+ new_dir, new_dentry, new_parent,
7777+ old_bstart);
7778+ if (err) {
7779+ if (!IS_COPYUP_ERR(err))
7780+ goto out;
7781+ do_copyup = old_bstart - 1;
7782+ } else {
7783+ revert = 1;
7784+ }
7785+
7786+ /*
7787+ * Unlink all instances of destination that exist to the left of
7788+ * bstart of source. On error, revert back, goto out.
7789+ */
7790+ for (bindex = old_bstart - 1; bindex >= new_bstart; bindex--) {
7791+ struct dentry *unlink_dentry;
7792+ struct dentry *unlink_dir_dentry;
7793+
7794+ BUG_ON(bindex < 0);
7795+ unlink_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
7796+ if (!unlink_dentry)
7797+ continue;
7798+
7799+ unlink_dir_dentry = lock_parent(unlink_dentry);
7800+ err = is_robranch_super(old_dir->i_sb, bindex);
7801+ if (!err)
7802+ err = vfs_unlink(unlink_dir_dentry->d_inode,
7803+ unlink_dentry);
7804+
7805+ fsstack_copy_attr_times(new_parent->d_inode,
7806+ unlink_dir_dentry->d_inode);
7807+ /* propagate number of hard-links */
7808+ new_parent->d_inode->i_nlink =
7809+ unionfs_get_nlinks(new_parent->d_inode);
7810+
7811+ unlock_dir(unlink_dir_dentry);
7812+ if (!err) {
7813+ if (bindex != new_bstart) {
7814+ dput(unlink_dentry);
7815+ unionfs_set_lower_dentry_idx(new_dentry,
7816+ bindex, NULL);
7817+ }
7818+ } else if (IS_COPYUP_ERR(err)) {
7819+ do_copyup = bindex - 1;
7820+ } else if (revert) {
7821+ goto revert;
7822+ }
7823+ }
7824+
7825+ if (do_copyup != -1) {
7826+ for (bindex = do_copyup; bindex >= 0; bindex--) {
7827+ /*
7828+ * copyup the file into some left directory, so that
7829+ * you can rename it
7830+ */
7831+ err = copyup_dentry(old_parent->d_inode,
7832+ old_dentry, old_bstart, bindex,
7833+ old_dentry->d_name.name,
7834+ old_dentry->d_name.len, NULL,
7835+ i_size_read(old_dentry->d_inode));
7836+ /* if copyup failed, try next branch to the left */
7837+ if (err)
7838+ continue;
7839+ /*
7840+ * create whiteout before calling __unionfs_rename
7841+ * because the latter will change the old_dentry's
7842+ * lower name and parent dir, resulting in the
7843+ * whiteout getting created in the wrong dir.
7844+ */
7845+ err = create_whiteout(old_dentry, bindex);
7846+ if (err) {
7847+ printk(KERN_ERR "unionfs: can't create a "
7848+ "whiteout for %s in rename (err=%d)\n",
7849+ old_dentry->d_name.name, err);
7850+ continue;
7851+ }
7852+ err = __unionfs_rename(old_dir, old_dentry, old_parent,
7853+ new_dir, new_dentry, new_parent,
7854+ bindex);
7855+ break;
7856+ }
7857+ }
7858+
7859+ /* make it opaque */
7860+ if (S_ISDIR(old_dentry->d_inode->i_mode)) {
7861+ err = make_dir_opaque(old_dentry, dbstart(old_dentry));
7862+ if (err)
7863+ goto revert;
7864+ }
7865+
7866+ /*
7867+ * Create whiteout for source, only if:
7868+ * (1) There is more than one underlying instance of source.
7869+ * (The case where we did a copyup is taken care of above.)
7870+ */
7871+ if ((old_bstart != old_bend) && (do_copyup == -1)) {
7872+ err = create_whiteout(old_dentry, old_bstart);
7873+ if (err) {
7874+ /* can't fix anything now, so we exit with -EIO */
7875+ printk(KERN_ERR "unionfs: can't create a whiteout for "
7876+ "%s in rename!\n", old_dentry->d_name.name);
7877+ err = -EIO;
7878+ }
7879+ }
7880+
7881+out:
7882+ return err;
7883+
7884+revert:
7885+ /* Do revert here. */
7886+ local_err = unionfs_refresh_lower_dentry(new_dentry, new_parent,
7887+ old_bstart);
7888+ if (local_err) {
7889+ printk(KERN_ERR "unionfs: revert failed in rename: "
7890+ "the new refresh failed\n");
7891+ eio = -EIO;
7892+ }
7893+
7894+ local_err = unionfs_refresh_lower_dentry(old_dentry, old_parent,
7895+ old_bstart);
7896+ if (local_err) {
7897+ printk(KERN_ERR "unionfs: revert failed in rename: "
7898+ "the old refresh failed\n");
7899+ eio = -EIO;
7900+ goto revert_out;
7901+ }
7902+
7903+ if (!unionfs_lower_dentry_idx(new_dentry, bindex) ||
7904+ !unionfs_lower_dentry_idx(new_dentry, bindex)->d_inode) {
7905+ printk(KERN_ERR "unionfs: revert failed in rename: "
7906+ "the object disappeared from under us!\n");
7907+ eio = -EIO;
7908+ goto revert_out;
7909+ }
7910+
7911+ if (unionfs_lower_dentry_idx(old_dentry, bindex) &&
7912+ unionfs_lower_dentry_idx(old_dentry, bindex)->d_inode) {
7913+ printk(KERN_ERR "unionfs: revert failed in rename: "
7914+ "the object was created underneath us!\n");
7915+ eio = -EIO;
7916+ goto revert_out;
7917+ }
7918+
7919+ local_err = __unionfs_rename(new_dir, new_dentry, new_parent,
7920+ old_dir, old_dentry, old_parent,
7921+ old_bstart);
7922+
7923+ /* If we can't fix it, then we cop-out with -EIO. */
7924+ if (local_err) {
7925+ printk(KERN_ERR "unionfs: revert failed in rename!\n");
7926+ eio = -EIO;
7927+ }
7928+
7929+ local_err = unionfs_refresh_lower_dentry(new_dentry, new_parent,
7930+ bindex);
7931+ if (local_err)
7932+ eio = -EIO;
7933+ local_err = unionfs_refresh_lower_dentry(old_dentry, old_parent,
7934+ bindex);
7935+ if (local_err)
7936+ eio = -EIO;
7937+
7938+revert_out:
7939+ if (eio)
7940+ err = eio;
7941+ return err;
7942+}
7943+
7944+/*
7945+ * We can't copyup a directory, because it may involve huge numbers of
7946+ * children, etc. Doing that in the kernel would be bad, so instead we
7947+ * return EXDEV to the user-space utility that caused this, and let the
7948+ * user-space recurse and ask us to copy up each file separately.
7949+ */
7950+static int may_rename_dir(struct dentry *dentry, struct dentry *parent)
7951+{
7952+ int err, bstart;
7953+
7954+ err = check_empty(dentry, parent, NULL);
7955+ if (err == -ENOTEMPTY) {
7956+ if (is_robranch(dentry))
7957+ return -EXDEV;
7958+ } else if (err) {
7959+ return err;
7960+ }
7961+
7962+ bstart = dbstart(dentry);
7963+ if (dbend(dentry) == bstart || dbopaque(dentry) == bstart)
7964+ return 0;
7965+
7966+ dbstart(dentry) = bstart + 1;
7967+ err = check_empty(dentry, parent, NULL);
7968+ dbstart(dentry) = bstart;
7969+ if (err == -ENOTEMPTY)
7970+ err = -EXDEV;
7971+ return err;
7972+}
7973+
7974+/*
7975+ * The locking rules in unionfs_rename are complex. We could use a simpler
7976+ * superblock-level name-space lock for renames and copy-ups.
7977+ */
7978+int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
7979+ struct inode *new_dir, struct dentry *new_dentry)
7980+{
7981+ int err = 0;
7982+ struct dentry *wh_dentry;
7983+ struct dentry *old_parent, *new_parent;
7984+ int valid = true;
7985+
7986+ unionfs_read_lock(old_dentry->d_sb, UNIONFS_SMUTEX_CHILD);
7987+ old_parent = dget_parent(old_dentry);
7988+ new_parent = dget_parent(new_dentry);
7989+ /* un/lock parent dentries only if they differ from old/new_dentry */
7990+ if (old_parent != old_dentry &&
7991+ old_parent != new_dentry)
7992+ unionfs_lock_dentry(old_parent, UNIONFS_DMUTEX_REVAL_PARENT);
7993+ if (new_parent != old_dentry &&
7994+ new_parent != new_dentry &&
7995+ new_parent != old_parent)
7996+ unionfs_lock_dentry(new_parent, UNIONFS_DMUTEX_REVAL_CHILD);
7997+ unionfs_double_lock_dentry(old_dentry, new_dentry);
7998+
7999+ valid = __unionfs_d_revalidate(old_dentry, old_parent, false);
8000+ if (!valid) {
8001+ err = -ESTALE;
8002+ goto out;
8003+ }
8004+ if (!d_deleted(new_dentry) && new_dentry->d_inode) {
8005+ valid = __unionfs_d_revalidate(new_dentry, new_parent, false);
8006+ if (!valid) {
8007+ err = -ESTALE;
8008+ goto out;
8009+ }
8010+ }
8011+
8012+ if (!S_ISDIR(old_dentry->d_inode->i_mode))
8013+ err = unionfs_partial_lookup(old_dentry, old_parent);
8014+ else
8015+ err = may_rename_dir(old_dentry, old_parent);
8016+
8017+ if (err)
8018+ goto out;
8019+
8020+ err = unionfs_partial_lookup(new_dentry, new_parent);
8021+ if (err)
8022+ goto out;
8023+
8024+ /*
8025+ * if new_dentry is already lower because of whiteout,
8026+ * simply override it even if the whited-out dir is not empty.
8027+ */
8028+ wh_dentry = find_first_whiteout(new_dentry);
8029+ if (!IS_ERR(wh_dentry)) {
8030+ dput(wh_dentry);
8031+ } else if (new_dentry->d_inode) {
8032+ if (S_ISDIR(old_dentry->d_inode->i_mode) !=
8033+ S_ISDIR(new_dentry->d_inode->i_mode)) {
8034+ err = S_ISDIR(old_dentry->d_inode->i_mode) ?
8035+ -ENOTDIR : -EISDIR;
8036+ goto out;
8037+ }
8038+
8039+ if (S_ISDIR(new_dentry->d_inode->i_mode)) {
8040+ struct unionfs_dir_state *namelist = NULL;
8041+ /* check if this unionfs directory is empty or not */
8042+ err = check_empty(new_dentry, new_parent, &namelist);
8043+ if (err)
8044+ goto out;
8045+
8046+ if (!is_robranch(new_dentry))
8047+ err = delete_whiteouts(new_dentry,
8048+ dbstart(new_dentry),
8049+ namelist);
8050+
8051+ free_rdstate(namelist);
8052+
8053+ if (err)
8054+ goto out;
8055+ }
8056+ }
8057+
8058+ err = do_unionfs_rename(old_dir, old_dentry, old_parent,
8059+ new_dir, new_dentry, new_parent);
8060+ if (err)
8061+ goto out;
8062+
8063+ /*
8064+ * force re-lookup since the dir on ro branch is not renamed, and
8065+ * lower dentries still indicate the un-renamed ones.
8066+ */
8067+ if (S_ISDIR(old_dentry->d_inode->i_mode))
8068+ atomic_dec(&UNIONFS_D(old_dentry)->generation);
8069+ else
8070+ unionfs_postcopyup_release(old_dentry);
8071+ if (new_dentry->d_inode && !S_ISDIR(new_dentry->d_inode->i_mode)) {
8072+ unionfs_postcopyup_release(new_dentry);
8073+ unionfs_postcopyup_setmnt(new_dentry);
8074+ if (!unionfs_lower_inode(new_dentry->d_inode)) {
8075+ /*
8076+ * If we get here, it means that no copyup was
8077+ * needed, and that a file by the old name already
8078+ * existing on the destination branch; that file got
8079+ * renamed earlier in this function, so all we need
8080+ * to do here is set the lower inode.
8081+ */
8082+ struct inode *inode;
8083+ inode = unionfs_lower_inode(old_dentry->d_inode);
8084+ igrab(inode);
8085+ unionfs_set_lower_inode_idx(new_dentry->d_inode,
8086+ dbstart(new_dentry),
8087+ inode);
8088+ }
8089+ }
8090+ /* if all of this renaming succeeded, update our times */
8091+ unionfs_copy_attr_times(old_dentry->d_inode);
8092+ unionfs_copy_attr_times(new_dentry->d_inode);
8093+ unionfs_check_inode(old_dir);
8094+ unionfs_check_inode(new_dir);
8095+ unionfs_check_dentry(old_dentry);
8096+ unionfs_check_dentry(new_dentry);
8097+
8098+out:
8099+ if (err) /* clear the new_dentry stuff created */
8100+ d_drop(new_dentry);
8101+
8102+ unionfs_double_unlock_dentry(old_dentry, new_dentry);
8103+ if (new_parent != old_dentry &&
8104+ new_parent != new_dentry &&
8105+ new_parent != old_parent)
8106+ unionfs_unlock_dentry(new_parent);
8107+ if (old_parent != old_dentry &&
8108+ old_parent != new_dentry)
8109+ unionfs_unlock_dentry(old_parent);
8110+ dput(new_parent);
8111+ dput(old_parent);
8112+ unionfs_read_unlock(old_dentry->d_sb);
8113+
8114+ return err;
8115+}
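
The copy-up fallback in do_unionfs_rename() above walks branches from right to left until one accepts the copy. A minimal userspace sketch of that search order follows. It assumes a plain array of read-write flags in place of real branches, and pick_copyup_branch() is a hypothetical name that does not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not unionfs code: writable[i] != 0
 * means branch i is read-write. Branch 0 has the highest priority.
 */
static int pick_copyup_branch(const int *writable, size_t nbranches, int start)
{
	int bindex;

	assert(writable != NULL);
	if (start >= (int)nbranches)
		start = (int)nbranches - 1;

	/* Walk leftwards (towards higher priority), like the do_copyup loop. */
	for (bindex = start; bindex >= 0; bindex--)
		if (writable[bindex])
			return bindex;

	return -1; /* no branch to the left could take the copy */
}
```

In the real code the loop does not consult a flag array; it attempts copyup_dentry() on each branch and moves one branch to the left when that call fails.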
0c5527e5
AM
8116diff --git a/fs/unionfs/sioq.c b/fs/unionfs/sioq.c
8117new file mode 100644
8118index 0000000..760c580
8119--- /dev/null
8120+++ b/fs/unionfs/sioq.c
2380c486
JR
8121@@ -0,0 +1,101 @@
8122+/*
7670a7fc 8123+ * Copyright (c) 2006-2010 Erez Zadok
2380c486
JR
8124+ * Copyright (c) 2006 Charles P. Wright
8125+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8126+ * Copyright (c) 2006 Junjiro Okajima
8127+ * Copyright (c) 2006 David P. Quigley
7670a7fc
AM
8128+ * Copyright (c) 2006-2010 Stony Brook University
8129+ * Copyright (c) 2006-2010 The Research Foundation of SUNY
2380c486
JR
8130+ *
8131+ * This program is free software; you can redistribute it and/or modify
8132+ * it under the terms of the GNU General Public License version 2 as
8133+ * published by the Free Software Foundation.
8134+ */
8135+
8136+#include "union.h"
8137+
8138+/*
8139+ * Super-user IO work Queue - sometimes we need to perform actions which
8140+ * would fail due to the unix permissions on the parent directory (e.g.,
8141+ * rmdir a directory which appears empty, but in reality contains
8142+ * whiteouts).
8143+ */
8144+
8145+static struct workqueue_struct *superio_workqueue;
8146+
8147+int __init init_sioq(void)
8148+{
8149+ int err;
8150+
8151+ superio_workqueue = create_workqueue("unionfs_siod");
8152+ if (superio_workqueue)
8153+ return 0;
8154+
8155+ err = -ENOMEM;
8156+ printk(KERN_ERR "unionfs: create_workqueue failed %d\n", err);
8157+ superio_workqueue = NULL;
8158+ return err;
8159+}
8160+
8161+void stop_sioq(void)
8162+{
8163+ if (superio_workqueue)
8164+ destroy_workqueue(superio_workqueue);
8165+}
8166+
8167+void run_sioq(work_func_t func, struct sioq_args *args)
8168+{
8169+ INIT_WORK(&args->work, func);
8170+
8171+ init_completion(&args->comp);
8172+ while (!queue_work(superio_workqueue, &args->work)) {
8173+ /* TODO: do accounting if needed */
8174+ schedule();
8175+ }
8176+ wait_for_completion(&args->comp);
8177+}
8178+
8179+void __unionfs_create(struct work_struct *work)
8180+{
8181+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8182+ struct create_args *c = &args->create;
8183+
8184+ args->err = vfs_create(c->parent, c->dentry, c->mode, c->nd);
8185+ complete(&args->comp);
8186+}
8187+
8188+void __unionfs_mkdir(struct work_struct *work)
8189+{
8190+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8191+ struct mkdir_args *m = &args->mkdir;
8192+
8193+ args->err = vfs_mkdir(m->parent, m->dentry, m->mode);
8194+ complete(&args->comp);
8195+}
8196+
8197+void __unionfs_mknod(struct work_struct *work)
8198+{
8199+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8200+ struct mknod_args *m = &args->mknod;
8201+
8202+ args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev);
8203+ complete(&args->comp);
8204+}
8205+
8206+void __unionfs_symlink(struct work_struct *work)
8207+{
8208+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8209+ struct symlink_args *s = &args->symlink;
8210+
8211+ args->err = vfs_symlink(s->parent, s->dentry, s->symbuf);
8212+ complete(&args->comp);
8213+}
8214+
8215+void __unionfs_unlink(struct work_struct *work)
8216+{
8217+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8218+ struct unlink_args *u = &args->unlink;
8219+
8220+ args->err = vfs_unlink(u->parent, u->dentry);
8221+ complete(&args->comp);
8222+}
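
Each helper in sioq.c above recovers its struct sioq_args from the embedded work member with container_of(). The sketch below shows that pointer arithmetic in userspace, under loud assumptions: struct work_item and struct demo_args are hypothetical stand-ins for struct work_struct and struct sioq_args, and a direct function call replaces queue_work() plus wait_for_completion().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct work_item {
	void (*func)(struct work_item *work);
};

/* Hypothetical stand-in for struct sioq_args. */
struct demo_args {
	int err;                /* result slot, like sioq_args.err */
	struct work_item work;  /* embedded, not a pointer */
	int input;
};

/* The handler only receives the embedded member, as in __unionfs_mkdir(). */
static void demo_handler(struct work_item *work)
{
	struct demo_args *args = container_of(work, struct demo_args, work);

	args->err = args->input < 0 ? -22 : 0; /* -22 mirrors -EINVAL */
}

static int run_demo(struct demo_args *args)
{
	args->work.func = demo_handler;
	args->work.func(&args->work); /* synchronous call; no real queue here */
	return args->err;
}
```

Because the work member is embedded rather than pointed to, the handler needs no extra argument to find its parameters and its result slot.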
0c5527e5
AM
8223diff --git a/fs/unionfs/sioq.h b/fs/unionfs/sioq.h
8224new file mode 100644
8225index 0000000..b26d248
8226--- /dev/null
8227+++ b/fs/unionfs/sioq.h
2380c486
JR
8228@@ -0,0 +1,91 @@
8229+/*
7670a7fc 8230+ * Copyright (c) 2006-2010 Erez Zadok
2380c486
JR
8231+ * Copyright (c) 2006 Charles P. Wright
8232+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8233+ * Copyright (c) 2006 Junjiro Okajima
8234+ * Copyright (c) 2006 David P. Quigley
7670a7fc
AM
8235+ * Copyright (c) 2006-2010 Stony Brook University
8236+ * Copyright (c) 2006-2010 The Research Foundation of SUNY
2380c486
JR
8237+ *
8238+ * This program is free software; you can redistribute it and/or modify
8239+ * it under the terms of the GNU General Public License version 2 as
8240+ * published by the Free Software Foundation.
8241+ */
8242+
8243+#ifndef _SIOQ_H
8244+#define _SIOQ_H
8245+
8246+struct deletewh_args {
8247+ struct unionfs_dir_state *namelist;
8248+ struct dentry *dentry;
8249+ int bindex;
8250+};
8251+
8252+struct is_opaque_args {
8253+ struct dentry *dentry;
8254+};
8255+
8256+struct create_args {
8257+ struct inode *parent;
8258+ struct dentry *dentry;
8259+ umode_t mode;
8260+ struct nameidata *nd;
8261+};
8262+
8263+struct mkdir_args {
8264+ struct inode *parent;
8265+ struct dentry *dentry;
8266+ umode_t mode;
8267+};
8268+
8269+struct mknod_args {
8270+ struct inode *parent;
8271+ struct dentry *dentry;
8272+ umode_t mode;
8273+ dev_t dev;
8274+};
8275+
8276+struct symlink_args {
8277+ struct inode *parent;
8278+ struct dentry *dentry;
8279+ char *symbuf;
8280+};
8281+
8282+struct unlink_args {
8283+ struct inode *parent;
8284+ struct dentry *dentry;
8285+};
8286+
8287+
8288+struct sioq_args {
8289+ struct completion comp;
8290+ struct work_struct work;
8291+ int err;
8292+ void *ret;
8293+
8294+ union {
8295+ struct deletewh_args deletewh;
8296+ struct is_opaque_args is_opaque;
8297+ struct create_args create;
8298+ struct mkdir_args mkdir;
8299+ struct mknod_args mknod;
8300+ struct symlink_args symlink;
8301+ struct unlink_args unlink;
8302+ };
8303+};
8304+
8305+/* Extern definitions for SIOQ functions */
8306+extern int __init init_sioq(void);
8307+extern void stop_sioq(void);
8308+extern void run_sioq(work_func_t func, struct sioq_args *args);
8309+
8310+/* Extern definitions for our privilege escalation helpers */
8311+extern void __unionfs_create(struct work_struct *work);
8312+extern void __unionfs_mkdir(struct work_struct *work);
8313+extern void __unionfs_mknod(struct work_struct *work);
8314+extern void __unionfs_symlink(struct work_struct *work);
8315+extern void __unionfs_unlink(struct work_struct *work);
8316+extern void __delete_whiteouts(struct work_struct *work);
8317+extern void __is_opaque_dir(struct work_struct *work);
8318+
8319+#endif /* not _SIOQ_H */
0c5527e5
AM
8320diff --git a/fs/unionfs/subr.c b/fs/unionfs/subr.c
8321new file mode 100644
8322index 0000000..570a344
8323--- /dev/null
8324+++ b/fs/unionfs/subr.c
2380c486
JR
8325@@ -0,0 +1,95 @@
8326+/*
7670a7fc 8327+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
8328+ * Copyright (c) 2003-2006 Charles P. Wright
8329+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8330+ * Copyright (c) 2005-2006 Junjiro Okajima
8331+ * Copyright (c) 2005 Arun M. Krishnakumar
8332+ * Copyright (c) 2004-2006 David P. Quigley
8333+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8334+ * Copyright (c) 2003 Puja Gupta
8335+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
8336+ * Copyright (c) 2003-2010 Stony Brook University
8337+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
8338+ *
8339+ * This program is free software; you can redistribute it and/or modify
8340+ * it under the terms of the GNU General Public License version 2 as
8341+ * published by the Free Software Foundation.
8342+ */
8343+
8344+#include "union.h"
8345+
8346+/*
8347+ * returns the right n_link value based on the inode type
8348+ */
8349+int unionfs_get_nlinks(const struct inode *inode)
8350+{
8351+ /* don't bother to do all the work since we're unlinked */
8352+ if (inode->i_nlink == 0)
8353+ return 0;
8354+
8355+ if (!S_ISDIR(inode->i_mode))
8356+ return unionfs_lower_inode(inode)->i_nlink;
8357+
8358+ /*
8359+ * For directories, we return 1. The only place that could care
8360+ * about links is readdir, and there's d_type there so even that
8361+ * doesn't matter.
8362+ */
8363+ return 1;
8364+}
8365+
8366+/* copy a/m/ctime from the lower branch with the newest times */
8367+void unionfs_copy_attr_times(struct inode *upper)
8368+{
8369+ int bindex;
8370+ struct inode *lower;
8371+
8372+ if (!upper)
8373+ return;
8374+ if (ibstart(upper) < 0) {
8375+#ifdef CONFIG_UNION_FS_DEBUG
8376+ WARN_ON(ibstart(upper) < 0);
8377+#endif /* CONFIG_UNION_FS_DEBUG */
8378+ return;
8379+ }
8380+ for (bindex = ibstart(upper); bindex <= ibend(upper); bindex++) {
8381+ lower = unionfs_lower_inode_idx(upper, bindex);
8382+ if (!lower)
8383+ continue; /* not all lower dir objects may exist */
8384+ if (unlikely(timespec_compare(&upper->i_mtime,
8385+ &lower->i_mtime) < 0))
8386+ upper->i_mtime = lower->i_mtime;
8387+ if (unlikely(timespec_compare(&upper->i_ctime,
8388+ &lower->i_ctime) < 0))
8389+ upper->i_ctime = lower->i_ctime;
8390+ if (unlikely(timespec_compare(&upper->i_atime,
8391+ &lower->i_atime) < 0))
8392+ upper->i_atime = lower->i_atime;
8393+ }
8394+}
8395+
8396+/*
8397+ * A unionfs/fanout version of fsstack_copy_attr_all. Uses
8398+ * unionfs_get_nlinks to properly calculate the number of links to a file.
8399+ * Also, copies the max() of all a/m/ctimes for all lower inodes (which is
8400+ * important if the lower inode is a directory type)
8401+ */
8402+void unionfs_copy_attr_all(struct inode *dest,
8403+ const struct inode *src)
8404+{
8405+ dest->i_mode = src->i_mode;
8406+ dest->i_uid = src->i_uid;
8407+ dest->i_gid = src->i_gid;
8408+ dest->i_rdev = src->i_rdev;
8409+
8410+ unionfs_copy_attr_times(dest);
8411+
8412+ dest->i_blkbits = src->i_blkbits;
8413+ dest->i_flags = src->i_flags;
8414+
8415+ /*
8416+ * Update the nlinks AFTER updating the above fields, because the
8417+ * get_links callback may depend on them.
8418+ */
8419+ dest->i_nlink = unionfs_get_nlinks(dest);
8420+}
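
unionfs_copy_attr_times() above keeps, for each timestamp, the newest value found in any lower inode and skips branches where no lower object exists. A minimal userspace sketch of that rule for a single timestamp is shown here. struct demo_time, demo_time_compare() and copy_newest_time() are hypothetical names standing in for struct timespec, timespec_compare() and the kernel loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct timespec. */
struct demo_time {
	long sec;
	long nsec;
};

/* Returns <0, 0 or >0, in the same spirit as timespec_compare(). */
static int demo_time_compare(const struct demo_time *a,
			     const struct demo_time *b)
{
	if (a->sec != b->sec)
		return a->sec < b->sec ? -1 : 1;
	if (a->nsec != b->nsec)
		return a->nsec < b->nsec ? -1 : 1;
	return 0;
}

/* Raise *upper to the newest time found in lower[0..n-1]. */
static void copy_newest_time(struct demo_time *upper,
			     const struct demo_time *const *lower, size_t n)
{
	size_t bindex;

	assert(upper != NULL);
	for (bindex = 0; bindex < n; bindex++) {
		if (!lower[bindex])
			continue; /* not every branch has this object */
		if (demo_time_compare(upper, lower[bindex]) < 0)
			*upper = *lower[bindex];
	}
}
```

This only ever moves the upper time forward, which is why unionfs_iget() zeroes the three inode times before the first copy.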
0c5527e5
AM
8421diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
8422new file mode 100644
8423index 0000000..45bb9bf
8424--- /dev/null
8425+++ b/fs/unionfs/super.c
8426@@ -0,0 +1,1029 @@
2380c486 8427+/*
7670a7fc 8428+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
8429+ * Copyright (c) 2003-2006 Charles P. Wright
8430+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8431+ * Copyright (c) 2005-2006 Junjiro Okajima
8432+ * Copyright (c) 2005 Arun M. Krishnakumar
8433+ * Copyright (c) 2004-2006 David P. Quigley
8434+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8435+ * Copyright (c) 2003 Puja Gupta
8436+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
8437+ * Copyright (c) 2003-2010 Stony Brook University
8438+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
8439+ *
8440+ * This program is free software; you can redistribute it and/or modify
8441+ * it under the terms of the GNU General Public License version 2 as
8442+ * published by the Free Software Foundation.
8443+ */
8444+
8445+#include "union.h"
8446+
8447+/*
8448+ * The inode cache is used with alloc_inode for both our inode info and the
8449+ * vfs inode.
8450+ */
8451+static struct kmem_cache *unionfs_inode_cachep;
8452+
8453+struct inode *unionfs_iget(struct super_block *sb, unsigned long ino)
8454+{
8455+ int size;
8456+ struct unionfs_inode_info *info;
8457+ struct inode *inode;
8458+
8459+ inode = iget_locked(sb, ino);
8460+ if (!inode)
8461+ return ERR_PTR(-ENOMEM);
8462+ if (!(inode->i_state & I_NEW))
8463+ return inode;
8464+
8465+ info = UNIONFS_I(inode);
8466+ memset(info, 0, offsetof(struct unionfs_inode_info, vfs_inode));
8467+ info->bstart = -1;
8468+ info->bend = -1;
8469+ atomic_set(&info->generation,
8470+ atomic_read(&UNIONFS_SB(inode->i_sb)->generation));
8471+ spin_lock_init(&info->rdlock);
8472+ info->rdcount = 1;
8473+ info->hashsize = -1;
8474+ INIT_LIST_HEAD(&info->readdircache);
8475+
8476+ size = sbmax(inode->i_sb) * sizeof(struct inode *);
8477+ info->lower_inodes = kzalloc(size, GFP_KERNEL);
8478+ if (unlikely(!info->lower_inodes)) {
8479+ printk(KERN_CRIT "unionfs: no kernel memory when allocating "
8480+ "lower-pointer array!\n");
8481+ iget_failed(inode);
8482+ return ERR_PTR(-ENOMEM);
8483+ }
8484+
8485+ inode->i_version++;
8486+ inode->i_op = &unionfs_main_iops;
8487+ inode->i_fop = &unionfs_main_fops;
8488+
8489+ inode->i_mapping->a_ops = &unionfs_aops;
8490+
8491+ /*
8492+ * reset times so unionfs_copy_attr_all can keep our time invariants
8493+ * right (upper inode time being the max of all lower ones).
8494+ */
8495+ inode->i_atime.tv_sec = inode->i_atime.tv_nsec = 0;
8496+ inode->i_mtime.tv_sec = inode->i_mtime.tv_nsec = 0;
8497+ inode->i_ctime.tv_sec = inode->i_ctime.tv_nsec = 0;
8498+ unlock_new_inode(inode);
8499+ return inode;
8500+}
8501+
8502+/*
2380c486
JR
8503+ * final actions when unmounting a file system
8504+ *
8505+ * No need to lock rwsem.
8506+ */
8507+static void unionfs_put_super(struct super_block *sb)
8508+{
8509+ int bindex, bstart, bend;
8510+ struct unionfs_sb_info *spd;
8511+ int leaks = 0;
8512+
8513+ spd = UNIONFS_SB(sb);
8514+ if (!spd)
8515+ return;
8516+
8517+ bstart = sbstart(sb);
8518+ bend = sbend(sb);
8519+
8520+ /* Make sure we have no leaks of branchget/branchput. */
8521+ for (bindex = bstart; bindex <= bend; bindex++)
8522+ if (unlikely(branch_count(sb, bindex) != 0)) {
8523+ printk(KERN_CRIT
8524+ "unionfs: branch %d has %d references left!\n",
8525+ bindex, branch_count(sb, bindex));
8526+ leaks = 1;
8527+ }
8528+ WARN_ON(leaks != 0);
8529+
8530+ /* decrement lower super references */
8531+ for (bindex = bstart; bindex <= bend; bindex++) {
8532+ struct super_block *s;
8533+ s = unionfs_lower_super_idx(sb, bindex);
8534+ unionfs_set_lower_super_idx(sb, bindex, NULL);
8535+ atomic_dec(&s->s_active);
8536+ }
8537+
8538+ kfree(spd->dev_name);
8539+ kfree(spd->data);
8540+ kfree(spd);
8541+ sb->s_fs_info = NULL;
8542+}
8543+
8544+/*
8545+ * Since people use this to answer the "How big of a file can I write?"
8546+ * question, we report the size of the highest priority branch as the size of
8547+ * the union.
8548+ */
8549+static int unionfs_statfs(struct dentry *dentry, struct kstatfs *buf)
8550+{
8551+ int err = 0;
8552+ struct super_block *sb;
8553+ struct dentry *lower_dentry;
8554+ struct dentry *parent;
0c5527e5 8555+ struct path lower_path;
2380c486
JR
8556+ bool valid;
8557+
8558+ sb = dentry->d_sb;
8559+
8560+ unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
8561+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
8562+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
8563+
8564+ valid = __unionfs_d_revalidate(dentry, parent, false);
8565+ if (unlikely(!valid)) {
8566+ err = -ESTALE;
8567+ goto out;
8568+ }
8569+ unionfs_check_dentry(dentry);
8570+
8571+ lower_dentry = unionfs_lower_dentry(sb->s_root);
0c5527e5
AM
8572+ lower_path.dentry = lower_dentry;
8573+ lower_path.mnt = unionfs_mntget(sb->s_root, 0);
8574+ err = vfs_statfs(&lower_path, buf);
8575+ mntput(lower_path.mnt);
2380c486
JR
8576+
8577+ /* set return buf to our f/s to avoid confusing user-level utils */
8578+ buf->f_type = UNIONFS_SUPER_MAGIC;
8579+ /*
8580+ * Our maximum file name length is shorter by a few bytes because every
8581+ * file name could potentially be whited-out.
8582+ *
8583+ * XXX: this restriction goes away with ODF.
8584+ */
8585+ unionfs_set_max_namelen(&buf->f_namelen);
8586+
8587+ /*
8588+ * reset two fields to avoid confusing user-land.
8589+ * XXX: is this still necessary?
8590+ */
8591+ memset(&buf->f_fsid, 0, sizeof(__kernel_fsid_t));
8592+ memset(&buf->f_spare, 0, sizeof(buf->f_spare));
8593+
8594+out:
8595+ unionfs_check_dentry(dentry);
8596+ unionfs_unlock_dentry(dentry);
8597+ unionfs_unlock_parent(dentry, parent);
8598+ unionfs_read_unlock(sb);
8599+ return err;
8600+}
8601+
8602+/* handle mode changing during remount */
8603+static noinline_for_stack int do_remount_mode_option(
8604+ char *optarg,
8605+ int cur_branches,
8606+ struct unionfs_data *new_data,
8607+ struct path *new_lower_paths)
8608+{
8609+ int err = -EINVAL;
8610+ int perms, idx;
8611+ char *modename = strchr(optarg, '=');
8612+ struct nameidata nd;
8613+
8614+ /* by now, optarg contains the branch name */
8615+ if (!*optarg) {
8616+ printk(KERN_ERR
8617+ "unionfs: no branch specified for mode change\n");
8618+ goto out;
8619+ }
8620+ if (!modename) {
8621+ printk(KERN_ERR "unionfs: branch \"%s\" requires a mode\n",
8622+ optarg);
8623+ goto out;
8624+ }
8625+ *modename++ = '\0';
8626+ err = parse_branch_mode(modename, &perms);
8627+ if (err) {
8628+ printk(KERN_ERR "unionfs: invalid mode \"%s\" for \"%s\"\n",
8629+ modename, optarg);
8630+ goto out;
8631+ }
8632+
8633+ /*
8634+ * Find matching branch index. For now, this assumes that nothing
8635+ * has been mounted on top of this Unionfs stack. Once we have /odf
8636+ * and cache-coherency resolved, we'll address the branch-path
8637+ * uniqueness.
8638+ */
8639+ err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8640+ if (err) {
8641+ printk(KERN_ERR "unionfs: error accessing "
8642+ "lower directory \"%s\" (error %d)\n",
8643+ optarg, err);
8644+ goto out;
8645+ }
8646+ for (idx = 0; idx < cur_branches; idx++)
8647+ if (nd.path.mnt == new_lower_paths[idx].mnt &&
8648+ nd.path.dentry == new_lower_paths[idx].dentry)
8649+ break;
8650+ path_put(&nd.path); /* no longer needed */
8651+ if (idx == cur_branches) {
8652+ err = -ENOENT; /* err may have been reset above */
8653+ printk(KERN_ERR "unionfs: branch \"%s\" "
8654+ "not found\n", optarg);
8655+ goto out;
8656+ }
8657+ /* check/change mode for existing branch */
8658+ /* we don't warn if perms==branchperms */
8659+ new_data[idx].branchperms = perms;
8660+ err = 0;
8661+out:
8662+ return err;
8663+}
8664+
8665+/* handle branch deletion during remount */
8666+static noinline_for_stack int do_remount_del_option(
8667+ char *optarg, int cur_branches,
8668+ struct unionfs_data *new_data,
8669+ struct path *new_lower_paths)
8670+{
8671+ int err = -EINVAL;
8672+ int idx;
8673+ struct nameidata nd;
8674+
8675+ /* optarg contains the branch name to delete */
8676+
8677+ /*
8678+ * Find matching branch index. For now, this assumes that nothing
8679+ * has been mounted on top of this Unionfs stack. Once we have /odf
8680+ * and cache-coherency resolved, we'll address the branch-path
8681+ * uniqueness.
8682+ */
8683+ err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8684+ if (err) {
8685+ printk(KERN_ERR "unionfs: error accessing "
8686+ "lower directory \"%s\" (error %d)\n",
8687+ optarg, err);
8688+ goto out;
8689+ }
8690+ for (idx = 0; idx < cur_branches; idx++)
8691+ if (nd.path.mnt == new_lower_paths[idx].mnt &&
8692+ nd.path.dentry == new_lower_paths[idx].dentry)
8693+ break;
8694+ path_put(&nd.path); /* no longer needed */
8695+ if (idx == cur_branches) {
8696+ printk(KERN_ERR "unionfs: branch \"%s\" "
8697+ "not found\n", optarg);
8698+ err = -ENOENT;
8699+ goto out;
8700+ }
8701+ /* check if there are any open files on the branch to be deleted */
8702+ if (atomic_read(&new_data[idx].open_files) > 0) {
8703+ err = -EBUSY;
8704+ goto out;
8705+ }
8706+
8707+ /*
8708+ * Now we have to delete the branch. First, release any handles it
8709+ * has. Then, move the remaining array indexes past "idx" in
8710+ * new_data and new_lower_paths one to the left. Finally, adjust
8711+ * cur_branches.
8712+ */
8713+ path_put(&new_lower_paths[idx]);
8714+
8715+ if (idx < cur_branches - 1) {
8716+ /* if idx==cur_branches-1, we delete last branch: easy */
8717+ memmove(&new_data[idx], &new_data[idx+1],
8718+ (cur_branches - 1 - idx) *
8719+ sizeof(struct unionfs_data));
8720+ memmove(&new_lower_paths[idx], &new_lower_paths[idx+1],
8721+ (cur_branches - 1 - idx) * sizeof(struct path));
8722+ }
8723+
8724+ err = 0;
8725+out:
8726+ return err;
8727+}
8728+
8729+/* handle branch insertion during remount */
8730+static noinline_for_stack int do_remount_add_option(
8731+ char *optarg, int cur_branches,
8732+ struct unionfs_data *new_data,
8733+ struct path *new_lower_paths,
8734+ int *high_branch_id)
8735+{
8736+ int err = -EINVAL;
8737+ int perms;
8738+ int idx = 0; /* default: insert at beginning */
8739+ char *new_branch , *modename = NULL;
8740+ struct nameidata nd;
8741+
8742+ /*
8743+ * optarg can be of several forms:
8744+ *
8745+ * /bar:/foo insert /foo before /bar
8746+ * /bar:/foo=ro insert /foo in ro mode before /bar
8747+ * /foo insert /foo in the beginning (prepend)
8748+ * :/foo insert /foo at the end (append)
8749+ */
8750+ if (*optarg == ':') { /* append? */
8751+ new_branch = optarg + 1; /* skip ':' */
8752+ idx = cur_branches;
8753+ goto found_insertion_point;
8754+ }
8755+ new_branch = strchr(optarg, ':');
8756+ if (!new_branch) { /* prepend? */
8757+ new_branch = optarg;
8758+ goto found_insertion_point;
8759+ }
8760+ *new_branch++ = '\0'; /* holds path+mode of new branch */
8761+
8762+ /*
8763+ * Find matching branch index. For now, this assumes that nothing
8764+ * has been mounted on top of this Unionfs stack. Once we have /odf
8765+ * and cache-coherency resolved, we'll address the branch-path
8766+ * uniqueness.
8767+ */
8768+ err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8769+ if (err) {
8770+ printk(KERN_ERR "unionfs: error accessing "
8771+ "lower directory \"%s\" (error %d)\n",
8772+ optarg, err);
8773+ goto out;
8774+ }
8775+ for (idx = 0; idx < cur_branches; idx++)
8776+ if (nd.path.mnt == new_lower_paths[idx].mnt &&
8777+ nd.path.dentry == new_lower_paths[idx].dentry)
8778+ break;
8779+ path_put(&nd.path); /* no longer needed */
8780+ if (idx == cur_branches) {
8781+ printk(KERN_ERR "unionfs: branch \"%s\" "
8782+ "not found\n", optarg);
8783+ err = -ENOENT;
8784+ goto out;
8785+ }
8786+
8787+ /*
8788+ * At this point idx will hold the index where the new branch should
8789+ * be inserted before.
8790+ */
8791+found_insertion_point:
8792+ /* find the mode for the new branch */
8793+ if (new_branch)
8794+ modename = strchr(new_branch, '=');
8795+ if (modename)
8796+ *modename++ = '\0';
8797+ if (!new_branch || !*new_branch) {
8798+ printk(KERN_ERR "unionfs: null new branch\n");
8799+ err = -EINVAL;
8800+ goto out;
8801+ }
8802+ err = parse_branch_mode(modename, &perms);
8803+ if (err) {
8804+ printk(KERN_ERR "unionfs: invalid mode \"%s\" for "
8805+ "branch \"%s\"\n", modename, new_branch);
8806+ goto out;
8807+ }
8808+ err = path_lookup(new_branch, LOOKUP_FOLLOW, &nd);
8809+ if (err) {
8810+ printk(KERN_ERR "unionfs: error accessing "
8811+ "lower directory \"%s\" (error %d)\n",
8812+ new_branch, err);
8813+ goto out;
8814+ }
8815+ /*
8816+ * It's probably safe to run check_branch on the new branch to insert.
8817+ * Note: we don't allow inserting branches which are themselves
8818+ * unionfs mounts (check_branch returns EINVAL in that case). This is
8819+ * because this code base doesn't support stacking unionfs: the ODF
8820+ * code base supports that correctly.
8821+ */
8822+ err = check_branch(&nd);
8823+ if (err) {
8824+ printk(KERN_ERR "unionfs: lower directory "
8825+ "\"%s\" is not a valid branch\n", new_branch);
8826+ path_put(&nd.path);
8827+ goto out;
8828+ }
8829+
8830+ /*
8831+ * Now we have to insert the new branch. But first, shift the array
8832+ * entries to make space for the new branch, if needed. The caller
8833+ * then adjusts its branch count.
8834+ * We don't release nd here; it's kept until umount/remount.
8835+ */
8836+ if (idx < cur_branches) {
8837+ /* if idx==cur_branches, we append: easy */
8838+ memmove(&new_data[idx+1], &new_data[idx],
8839+ (cur_branches - idx) * sizeof(struct unionfs_data));
8840+ memmove(&new_lower_paths[idx+1], &new_lower_paths[idx],
8841+ (cur_branches - idx) * sizeof(struct path));
8842+ }
8843+ new_lower_paths[idx].dentry = nd.path.dentry;
8844+ new_lower_paths[idx].mnt = nd.path.mnt;
8845+
8846+ new_data[idx].sb = nd.path.dentry->d_sb;
8847+ atomic_set(&new_data[idx].open_files, 0);
8848+ new_data[idx].branchperms = perms;
8849+ new_data[idx].branch_id = ++*high_branch_id; /* assign new branch ID */
8850+
8851+ err = 0;
8852+out:
8853+ return err;
8854+}
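
The four optarg forms listed at the top of do_remount_add_option ("/bar:/foo", "/bar:/foo=ro", "/foo", ":/foo") can be split with two strchr() calls. The following user-space sketch shows only that string handling. struct add_arg and parse_add_arg() are hypothetical names introduced for illustration; the patch keeps the same pieces in local variables and goes on to do path lookups that are not modeled here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; not part of the patch. */
struct add_arg {
	const char *anchor;	/* existing branch to insert before, or NULL */
	const char *branch;	/* path of the new branch */
	const char *mode;	/* text after '=', or NULL if none was given */
	int append;		/* nonzero for the ":/foo" (append) form */
};

/*
 * Split arg in place into anchor, branch and mode.
 * Returns 0 on success, -1 if the new branch path is empty.
 */
static int parse_add_arg(char *arg, struct add_arg *out)
{
	char *branch, *mode;

	out->anchor = NULL;
	out->append = 0;

	if (*arg == ':') {		/* ":/foo": append */
		branch = arg + 1;
		out->append = 1;
	} else {
		branch = strchr(arg, ':');
		if (!branch) {		/* "/foo": prepend */
			branch = arg;
		} else {		/* "/bar:/foo": insert before /bar */
			*branch++ = '\0';
			out->anchor = arg;
		}
	}

	mode = strchr(branch, '=');	/* optional "=ro" or "=rw" suffix */
	if (mode)
		*mode++ = '\0';
	if (!*branch)
		return -1;

	out->branch = branch;
	out->mode = mode;
	return 0;
}
```

Like the kernel code, the sketch modifies the string in place, so the caller must pass a writable buffer rather than a string literal.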
8855+
8856+
8857+/*
8858+ * Support branch management options on remount.
8859+ *
8860+ * See Documentation/filesystems/unionfs/ for details.
8861+ *
8862+ * @flags: numeric mount options
8863+ * @options: mount options string
8864+ *
8865+ * This function can rearrange a mounted union dynamically, adding and
8866+ * removing branches, including changing branch modes. Clearly this has to
8867+ * be done safely and atomically. Luckily, the VFS already calls this
8868+ * function with lock_super(sb) and lock_kernel() held, preventing
8869+ * concurrent mixing of new mounts, remounts, and unmounts. Moreover,
8870+ * do_remount_sb(), our caller function, already called shrink_dcache_sb(sb)
8871+ * to purge dentries/inodes from our superblock, and also called
8872+ * fsync_super(sb) to purge any dirty pages. So we're good.
8873+ *
8874+ * XXX: however, our remount code may also need to invalidate mapped pages
8875+ * so as to force them to be re-gotten from the (newly reconfigured) lower
8876+ * branches. This has to wait for proper mmap and cache coherency support
8877+ * in the VFS.
8878+ *
8879+ */
8880+static int unionfs_remount_fs(struct super_block *sb, int *flags,
8881+ char *options)
8882+{
8883+ int err = 0;
8884+ int i;
8885+ char *optionstmp, *tmp_to_free; /* kstrdup'ed copy of "options" */
8886+ char *optname;
8887+ int cur_branches = 0; /* no. of current branches */
8888+ int new_branches = 0; /* no. of branches actually left in the end */
8889+ int add_branches; /* est. no. of branches to add */
8890+ int del_branches; /* est. no. of branches to del */
8891+ int max_branches; /* max possible no. of branches */
8892+ struct unionfs_data *new_data = NULL, *tmp_data = NULL;
8893+ struct path *new_lower_paths = NULL, *tmp_lower_paths = NULL;
8894+ struct inode **new_lower_inodes = NULL;
8895+ int new_high_branch_id; /* new high branch ID */
8896+ int size; /* memory allocation size, temp var */
8897+ int old_ibstart, old_ibend;
8898+
8899+ unionfs_write_lock(sb);
8900+
8901+ /*
8902+ * The VFS will take care of "ro" and "rw" flags, and we can safely
8903+ * ignore MS_SILENT, but anything else left over is an error. So we
8904+ * need to check if any other flags may have been passed (none are
8905+ * allowed/supported as of now).
8906+ */
8907+ if ((*flags & ~(MS_RDONLY | MS_SILENT)) != 0) {
8908+ printk(KERN_ERR
8909+ "unionfs: remount flags 0x%x unsupported\n", *flags);
8910+ err = -EINVAL;
8911+ goto out_error;
8912+ }
8913+
8914+ /*
8915+ * If 'options' is NULL, it's probably because the user just changed
8916+ * the union to a "ro" or "rw" and the VFS took care of it. So
8917+ * nothing to do and we're done.
8918+ */
8919+ if (!options || options[0] == '\0')
8920+ goto out_error;
8921+
8922+ /*
8923+ * Find out how many branches we will have in the end, counting
8924+ * "add" and "del" commands. Copy the "options" string because
8925+ * strsep modifies the string and we need it later.
8926+ */
8927+ tmp_to_free = kstrdup(options, GFP_KERNEL);
8928+ optionstmp = tmp_to_free;
8929+ if (unlikely(!optionstmp)) {
8930+ err = -ENOMEM;
8931+ goto out_free;
8932+ }
8933+ cur_branches = sbmax(sb); /* current no. branches */
8934+ new_branches = sbmax(sb);
8935+ del_branches = 0;
8936+ add_branches = 0;
8937+ new_high_branch_id = sbhbid(sb); /* save current high_branch_id */
8938+ while ((optname = strsep(&optionstmp, ",")) != NULL) {
8939+ char *optarg;
8940+
8941+ if (!optname || !*optname)
8942+ continue;
8943+
8944+ optarg = strchr(optname, '=');
8945+ if (optarg)
8946+ *optarg++ = '\0';
8947+
8948+ if (!strcmp("add", optname))
8949+ add_branches++;
8950+ else if (!strcmp("del", optname))
8951+ del_branches++;
8952+ }
8953+ kfree(tmp_to_free);
8954+ /* after all changes, will we have at least one branch left? */
8955+ if ((new_branches + add_branches - del_branches) < 1) {
8956+ printk(KERN_ERR
8957+ "unionfs: no branches left after remount\n");
8958+ err = -EINVAL;
8959+ goto out_free;
8960+ }
8961+
8962+ /*
8963+ * Since we haven't actually parsed all the add/del options, nor
8964+ * have we checked them for errors, we don't know for sure how many
8965+ * branches we will have after all changes have taken place. In
8966+ * fact, the total number of branches left could be less than what
8967+ * we have now. So we need to allocate space for a temporary
8968+ * placeholder that is at least as large as the maximum number of
8969+ * branches we *could* have, which is the current number plus all
8970+ * the additions. Once we're done with these temp placeholders, we
8971+ * may have to re-allocate the final size, copy over from the temp,
8972+ * and then free the temps (done near the end of this function).
8973+ */
8974+ max_branches = cur_branches + add_branches;
8975+ /* allocate space for the new per-branch data */
8976+ tmp_data = kcalloc(max_branches,
8977+ sizeof(struct unionfs_data), GFP_KERNEL);
8978+ if (unlikely(!tmp_data)) {
8979+ err = -ENOMEM;
8980+ goto out_free;
8981+ }
8982+ /* allocate space for new pointers to lower paths */
8983+ tmp_lower_paths = kcalloc(max_branches,
8984+ sizeof(struct path), GFP_KERNEL);
8985+ if (unlikely(!tmp_lower_paths)) {
8986+ err = -ENOMEM;
8987+ goto out_free;
8988+ }
8989+ /* copy current info into new placeholders, incrementing refcnts */
8990+ memcpy(tmp_data, UNIONFS_SB(sb)->data,
8991+ cur_branches * sizeof(struct unionfs_data));
8992+ memcpy(tmp_lower_paths, UNIONFS_D(sb->s_root)->lower_paths,
8993+ cur_branches * sizeof(struct path));
8994+ for (i = 0; i < cur_branches; i++)
8995+ path_get(&tmp_lower_paths[i]); /* drop refs at end of fxn */
8996+
8997+ /*******************************************************************
8998+ * For each branch command, do path_lookup on the requested branch,
8999+ * and apply the change to a temp branch list. To handle errors, we
9000+ * already dup'ed the old arrays (above), and increased the refcnts
9001+ * on various f/s objects. So now we can do all the path_lookups
9002+ * and branch-management commands on the new arrays. If this fails
9003+ * midway, we free the tmp arrays and *put all objects. If we succeed,
9004+ * then we free old arrays and *put their objects, and then replace
9005+ * the arrays with the new tmp list (we may have to re-allocate the
9006+ * memory because the temp lists could have been larger than what we
9007+ * actually needed).
9008+ *******************************************************************/
9009+
9010+ while ((optname = strsep(&options, ",")) != NULL) {
9011+ char *optarg;
9012+
9013+ if (!optname || !*optname)
9014+ continue;
9015+ /*
9016+ * At this stage optname holds a comma-delimited option, but
9017+ * without the commas. Next, we need to break the string on
9018+ * the '=' symbol to separate CMD=ARG, where ARG itself can
9019+ * be KEY=VAL. For example, in mode=/foo=rw, CMD is "mode",
9020+ * KEY is "/foo", and VAL is "rw".
9021+ */
9022+ optarg = strchr(optname, '=');
9023+ if (optarg)
9024+ *optarg++ = '\0';
9025+ /* incgen remount option (instead of old ioctl) */
9026+ if (!strcmp("incgen", optname)) {
9027+ err = 0;
9028+ goto out_no_change;
9029+ }
9030+
9031+ /*
9032+ * All of our options take an argument now. (Insert ones
9033+ * that don't above this check.) So at this stage optname
9034+ * contains the CMD part and optarg contains the ARG part.
9035+ */
9036+ if (!optarg || !*optarg) {
9037+ printk(KERN_ERR "unionfs: all remount options require "
9038+ "an argument (%s)\n", optname);
9039+ err = -EINVAL;
9040+ goto out_release;
9041+ }
9042+
9043+ if (!strcmp("add", optname)) {
9044+ err = do_remount_add_option(optarg, new_branches,
9045+ tmp_data,
9046+ tmp_lower_paths,
9047+ &new_high_branch_id);
9048+ if (err)
9049+ goto out_release;
9050+ new_branches++;
9051+ if (new_branches > UNIONFS_MAX_BRANCHES) {
9052+ printk(KERN_ERR "unionfs: command exceeds "
9053+ "%d branches\n", UNIONFS_MAX_BRANCHES);
9054+ err = -E2BIG;
9055+ goto out_release;
9056+ }
9057+ continue;
9058+ }
9059+ if (!strcmp("del", optname)) {
9060+ err = do_remount_del_option(optarg, new_branches,
9061+ tmp_data,
9062+ tmp_lower_paths);
9063+ if (err)
9064+ goto out_release;
9065+ new_branches--;
9066+ continue;
9067+ }
9068+ if (!strcmp("mode", optname)) {
9069+ err = do_remount_mode_option(optarg, new_branches,
9070+ tmp_data,
9071+ tmp_lower_paths);
9072+ if (err)
9073+ goto out_release;
9074+ continue;
9075+ }
9076+
9077+ /*
9078+ * When you use "mount -o remount,ro", mount(8) will
9079+ * reportedly pass the original dirs= string from
9080+ * /proc/mounts. So for now, we have to ignore dirs= and
9081+ * not consider it an error, unless we want to allow users
9082+ * to pass dirs= in remount. Note that to allow the VFS to
9083+ * actually process the ro/rw remount options, we have to
9084+ * return 0 from this function.
9085+ */
9086+ if (!strcmp("dirs", optname)) {
9087+ printk(KERN_WARNING
9088+ "unionfs: remount ignoring option \"%s\"\n",
9089+ optname);
9090+ continue;
9091+ }
9092+
9093+ err = -EINVAL;
9094+ printk(KERN_ERR
9095+ "unionfs: unrecognized option \"%s\"\n", optname);
9096+ goto out_release;
9097+ }
9098+
9099+out_no_change:
9100+
9101+ /******************************************************************
9102+ * WE'RE ALMOST DONE: check if leftmost branch might be read-only,
9103+ * see if we need to allocate a small-sized new vector, copy the
9104+ * vectors to their correct place, release the refcnt of the older
9105+ * ones, and return. Also handle invalidating any pages that will
9106+ * have to be re-read.
9107+ *******************************************************************/
9108+
9109+ if (!(tmp_data[0].branchperms & MAY_WRITE)) {
9110+ printk(KERN_ERR "unionfs: leftmost branch cannot be read-only "
9111+ "(use \"remount,ro\" to create a read-only union)\n");
9112+ err = -EINVAL;
9113+ goto out_release;
9114+ }
9115+
9116+ /* (re)allocate space for the new per-branch data */
9117+ size = new_branches * sizeof(struct unionfs_data);
9118+ new_data = krealloc(tmp_data, size, GFP_KERNEL);
9119+ if (unlikely(!new_data)) {
9120+ err = -ENOMEM;
9121+ goto out_release;
9122+ }
9123+
9124+ /* allocate space for new pointers to lower paths */
9125+ size = new_branches * sizeof(struct path);
9126+ new_lower_paths = krealloc(tmp_lower_paths, size, GFP_KERNEL);
9127+ if (unlikely(!new_lower_paths)) {
9128+ err = -ENOMEM;
9129+ goto out_release;
9130+ }
9131+
9132+ /* allocate space for new pointers to lower inodes */
9133+ new_lower_inodes = kcalloc(new_branches,
9134+ sizeof(struct inode *), GFP_KERNEL);
9135+ if (unlikely(!new_lower_inodes)) {
9136+ err = -ENOMEM;
9137+ goto out_release;
9138+ }
9139+
9140+ /*
9141+ * OK, just before we actually put the new set of branches in place,
9142+ * we need to ensure that our own f/s has no dirty objects left.
9143+ * Luckily, do_remount_sb() already calls shrink_dcache_sb(sb) and
9144+ * fsync_super(sb), taking care of dentries, inodes, and dirty
9145+ * pages. So all that's left is for us to invalidate any leftover
9146+ * (non-dirty) pages to ensure that they will be re-read from the
9147+ * new lower branches (and to support mmap).
9148+ */
9149+
9150+ /*
9151+ * Once we finish the remounting successfully, our superblock
9152+ * generation number will have increased. This will be detected by
9153+ * our dentry-revalidation code upon subsequent f/s operations
9154+ * through unionfs. The revalidation code will rebuild the union of
9155+ * lower inodes for a given unionfs inode and invalidate any pages
9156+ * of such "stale" inodes (by calling our purge_inode_data
9157+ * function). This revalidation will happen lazily and
9158+ * incrementally, as users perform operations on cached inodes. We
9159+ * would like to encourage this revalidation to happen sooner if
9160+ * possible, so we like to try to invalidate as many other pages in
9161+ * our superblock as we can. We used to call drop_pagecache_sb() or
9162+ * a variant thereof, but either method was racy (drop_caches alone
9163+ * is known to be racy). So now we let the revalidation happen on a
9164+ * per file basis in ->d_revalidate.
9165+ */
9166+
9167+ /* grab new lower super references; release old ones */
9168+ for (i = 0; i < new_branches; i++)
9169+ atomic_inc(&new_data[i].sb->s_active);
9170+ for (i = 0; i < sbmax(sb); i++)
9171+ atomic_dec(&UNIONFS_SB(sb)->data[i].sb->s_active);
9172+
9173+ /* copy new vectors into their correct place */
9174+ tmp_data = UNIONFS_SB(sb)->data;
9175+ UNIONFS_SB(sb)->data = new_data;
9176+ new_data = NULL; /* so don't free good pointers below */
9177+ tmp_lower_paths = UNIONFS_D(sb->s_root)->lower_paths;
9178+ UNIONFS_D(sb->s_root)->lower_paths = new_lower_paths;
9179+ new_lower_paths = NULL; /* so don't free good pointers below */
9180+
9181+ /* update our unionfs_sb_info and root dentry index of last branch */
9182+ i = sbmax(sb); /* save no. of branches to release at end */
9183+ sbend(sb) = new_branches - 1;
9184+ dbend(sb->s_root) = new_branches - 1;
9185+ old_ibstart = ibstart(sb->s_root->d_inode);
9186+ old_ibend = ibend(sb->s_root->d_inode);
9187+ ibend(sb->s_root->d_inode) = new_branches - 1;
9188+ UNIONFS_D(sb->s_root)->bcount = new_branches;
9189+ new_branches = i; /* no. of branches to release below */
9190+
9191+ /*
9192+ * Update lower inodes: 3 steps
9193+ * 1. grab ref on all new lower inodes
9194+ */
9195+ for (i = dbstart(sb->s_root); i <= dbend(sb->s_root); i++) {
9196+ struct dentry *lower_dentry =
9197+ unionfs_lower_dentry_idx(sb->s_root, i);
9198+ igrab(lower_dentry->d_inode);
9199+ new_lower_inodes[i] = lower_dentry->d_inode;
9200+ }
9201+ /* 2. release reference on all older lower inodes */
9202+ iput_lowers(sb->s_root->d_inode, old_ibstart, old_ibend, true);
9203+ /* 3. update root dentry's inode to new lower_inodes array */
9204+ UNIONFS_I(sb->s_root->d_inode)->lower_inodes = new_lower_inodes;
9205+ new_lower_inodes = NULL;
9206+
9207+ /* maxbytes may have changed */
9208+ sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
9209+ /* update high branch ID */
9210+ sbhbid(sb) = new_high_branch_id;
9211+
9212+ /* update our sb->generation for revalidating objects */
9213+ i = atomic_inc_return(&UNIONFS_SB(sb)->generation);
9214+ atomic_set(&UNIONFS_D(sb->s_root)->generation, i);
9215+ atomic_set(&UNIONFS_I(sb->s_root->d_inode)->generation, i);
9216+ if (!(*flags & MS_SILENT))
9217+ pr_info("unionfs: %s: new generation number %d\n",
9218+ UNIONFS_SB(sb)->dev_name, i);
9219+ /* finally, update the root dentry's times */
9220+ unionfs_copy_attr_times(sb->s_root->d_inode);
9221+ err = 0; /* reset to success */
9222+
9223+ /*
9224+ * The code above falls through to the next label, and releases the
9225+ * refcnts of the older ones (stored in tmp_*): if we fell through
9226+ * here, it means success. However, if we jump directly to this
9227+ * label from any error above, then an error occurred after we
9228+ * grabbed various refcnts, and so we have to release the
9229+ * temporarily constructed structures.
9230+ */
9231+out_release:
9232+ /* no need to cleanup/release anything in tmp_data */
9233+ if (tmp_lower_paths)
9234+ for (i = 0; i < new_branches; i++)
9235+ path_put(&tmp_lower_paths[i]);
9236+out_free:
9237+ kfree(tmp_lower_paths);
9238+ kfree(tmp_data);
9239+ kfree(new_lower_paths);
9240+ kfree(new_data);
9241+ kfree(new_lower_inodes);
9242+out_error:
9243+ unionfs_check_dentry(sb->s_root);
9244+ unionfs_write_unlock(sb);
9245+ return err;
9246+}
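
The first pass of unionfs_remount_fs, which counts "add" and "del" commands on a scratch copy so that the original option string survives for the second pass, can be sketched in user space as follows. The sketch is an approximation: count_branch_cmds() is a hypothetical name, it uses malloc() and strchr() where the kernel code uses kstrdup() and strsep(), and it performs none of the later validation.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Count "add" and "del" commands in a comma-separated option string
 * without modifying the caller's string. Returns 0 on success, or -1
 * if the scratch copy cannot be allocated.
 */
static int count_branch_cmds(const char *options, int *adds, int *dels)
{
	size_t len = strlen(options) + 1;
	char *copy = malloc(len);
	char *cur, *next, *eq;

	if (!copy)
		return -1;
	memcpy(copy, options, len);

	*adds = 0;
	*dels = 0;
	for (cur = copy; cur; cur = next) {
		next = strchr(cur, ',');	/* end of this option */
		if (next)
			*next++ = '\0';
		if (!*cur)			/* skip empty options (",,") */
			continue;
		eq = strchr(cur, '=');		/* split CMD from ARG */
		if (eq)
			*eq = '\0';
		if (!strcmp(cur, "add"))
			(*adds)++;
		else if (!strcmp(cur, "del"))
			(*dels)++;
	}
	free(copy);
	return 0;
}
```

With the two counts in hand, a caller can apply the same sizing rule as the patch: the upper bound on branches is the current count plus the number of "add" commands, and the request is rejected when current + adds - dels drops below one.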
9247+
9248+/*
9249+ * Called by iput() when the inode reference count reached zero
9250+ * and the inode is not hashed anywhere. Used to clear anything
9251+ * that needs to be, before the inode is completely destroyed and put
9252+ * on the inode free list.
9253+ *
9254+ * No need to lock sb info's rwsem.
9255+ */
0c5527e5 9256+static void unionfs_evict_inode(struct inode *inode)
2380c486
JR
9257+{
9258+ int bindex, bstart, bend;
9259+ struct inode *lower_inode;
9260+ struct list_head *pos, *n;
9261+ struct unionfs_dir_state *rdstate;
9262+
0c5527e5
AM
9263+ truncate_inode_pages(&inode->i_data, 0);
9264+ end_writeback(inode);
9265+
2380c486
JR
9266+ list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9267+ rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9268+ list_del(&rdstate->cache);
9269+ free_rdstate(rdstate);
9270+ }
9271+
9272+ /*
9273+ * Decrement a reference to a lower_inode, which was incremented
9274+ * by our read_inode when it was created initially.
9275+ */
9276+ bstart = ibstart(inode);
9277+ bend = ibend(inode);
9278+ if (bstart >= 0) {
9279+ for (bindex = bstart; bindex <= bend; bindex++) {
9280+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
9281+ if (!lower_inode)
9282+ continue;
9283+ unionfs_set_lower_inode_idx(inode, bindex, NULL);
9284+ /* see Documentation/filesystems/unionfs/issues.txt */
9285+ lockdep_off();
9286+ iput(lower_inode);
9287+ lockdep_on();
9288+ }
9289+ }
9290+
9291+ kfree(UNIONFS_I(inode)->lower_inodes);
9292+ UNIONFS_I(inode)->lower_inodes = NULL;
9293+}
9294+
9295+static struct inode *unionfs_alloc_inode(struct super_block *sb)
9296+{
9297+ struct unionfs_inode_info *i;
9298+
9299+ i = kmem_cache_alloc(unionfs_inode_cachep, GFP_KERNEL);
9300+ if (unlikely(!i))
9301+ return NULL;
9302+
9303+ /* memset everything up to the inode to 0 */
9304+ memset(i, 0, offsetof(struct unionfs_inode_info, vfs_inode));
9305+
9306+ i->vfs_inode.i_version = 1;
9307+ return &i->vfs_inode;
9308+}
9309+
9310+static void unionfs_destroy_inode(struct inode *inode)
9311+{
9312+ kmem_cache_free(unionfs_inode_cachep, UNIONFS_I(inode));
9313+}
9314+
9315+/* unionfs inode cache constructor */
9316+static void init_once(void *obj)
9317+{
9318+ struct unionfs_inode_info *i = obj;
9319+
9320+ inode_init_once(&i->vfs_inode);
9321+}
9322+
9323+int unionfs_init_inode_cache(void)
9324+{
9325+ int err = 0;
9326+
9327+ unionfs_inode_cachep =
9328+ kmem_cache_create("unionfs_inode_cache",
9329+ sizeof(struct unionfs_inode_info), 0,
9330+ SLAB_RECLAIM_ACCOUNT, init_once);
9331+ if (unlikely(!unionfs_inode_cachep))
9332+ err = -ENOMEM;
9333+ return err;
9334+}
9335+
9336+/* unionfs inode cache destructor */
9337+void unionfs_destroy_inode_cache(void)
9338+{
9339+ if (unionfs_inode_cachep)
9340+ kmem_cache_destroy(unionfs_inode_cachep);
9341+}
9342+
9343+/*
9344+ * Called when we have a dirty inode. Here we only throw out
9345+ * parts of our readdir list that are too old.
9346+ *
9347+ * No need to grab sb info's rwsem.
9348+ */
0c5527e5
AM
9349+static int unionfs_write_inode(struct inode *inode,
9350+ struct writeback_control *wbc)
2380c486
JR
9351+{
9352+ struct list_head *pos, *n;
9353+ struct unionfs_dir_state *rdstate;
9354+
9355+ spin_lock(&UNIONFS_I(inode)->rdlock);
9356+ list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9357+ rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9358+ /* We keep this list in LRU order. */
9359+ if ((rdstate->access + RDCACHE_JIFFIES) > jiffies)
9360+ break;
9361+ UNIONFS_I(inode)->rdcount--;
9362+ list_del(&rdstate->cache);
9363+ free_rdstate(rdstate);
9364+ }
9365+ spin_unlock(&UNIONFS_I(inode)->rdlock);
9366+
9367+ return 0;
9368+}
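
The expiry test in unionfs_write_inode compares tick counts directly. A minimal user-space sketch of a wraparound-tolerant variant, in the spirit of the kernel's time_after() helper, is shown below. tick_after(), entry_is_fresh() and ENTRY_LIFETIME are hypothetical names and values chosen for illustration; they are not taken from the patch.

```c
#include <limits.h>

/*
 * True if tick value a is later than b, even when the unsigned counter
 * has wrapped in between. Valid while the two values are less than
 * half the counter range apart. Relies on the usual two's-complement
 * conversion from unsigned long to long.
 */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical lifetime in ticks, standing in for RDCACHE_JIFFIES. */
#define ENTRY_LIFETIME 5000UL

/* An entry is still fresh while its expiry time lies after "now". */
static int entry_is_fresh(unsigned long last_access, unsigned long now)
{
	return tick_after(last_access + ENTRY_LIFETIME, now);
}
```

The difference from a plain ">" comparison only shows up near the point where the counter wraps: an entry touched shortly before the wrap would otherwise be judged expired immediately, or kept far longer than intended.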
9369+
9370+/*
9371+ * Used only in nfs, to kill any pending RPC tasks, so that subsequent
9372+ * code can actually succeed and won't leave tasks that need handling.
9373+ */
9374+static void unionfs_umount_begin(struct super_block *sb)
9375+{
9376+ struct super_block *lower_sb;
9377+ int bindex, bstart, bend;
9378+
9379+ unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
9380+
9381+ bstart = sbstart(sb);
9382+ bend = sbend(sb);
9383+ for (bindex = bstart; bindex <= bend; bindex++) {
9384+ lower_sb = unionfs_lower_super_idx(sb, bindex);
9385+
9386+ if (lower_sb && lower_sb->s_op &&
9387+ lower_sb->s_op->umount_begin)
9388+ lower_sb->s_op->umount_begin(lower_sb);
9389+ }
9390+
9391+ unionfs_read_unlock(sb);
9392+}
9393+
9394+static int unionfs_show_options(struct seq_file *m, struct vfsmount *mnt)
9395+{
9396+ struct super_block *sb = mnt->mnt_sb;
9397+ int ret = 0;
9398+ char *tmp_page;
9399+ char *path;
9400+ int bindex, bstart, bend;
9401+ int perms;
9402+
9403+ unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
9404+
9405+ unionfs_lock_dentry(sb->s_root, UNIONFS_DMUTEX_CHILD);
9406+
9407+ tmp_page = (char *) __get_free_page(GFP_KERNEL);
9408+ if (unlikely(!tmp_page)) {
9409+ ret = -ENOMEM;
9410+ goto out;
9411+ }
9412+
9413+ bstart = sbstart(sb);
9414+ bend = sbend(sb);
9415+
9416+ seq_printf(m, ",dirs=");
9417+ for (bindex = bstart; bindex <= bend; bindex++) {
9418+ struct path p;
9419+ p.dentry = unionfs_lower_dentry_idx(sb->s_root, bindex);
9420+ p.mnt = unionfs_lower_mnt_idx(sb->s_root, bindex);
9421+ path = d_path(&p, tmp_page, PAGE_SIZE);
9422+ if (IS_ERR(path)) {
9423+ ret = PTR_ERR(path);
9424+ goto out;
9425+ }
9426+
9427+ perms = branchperms(sb, bindex);
9428+
9429+ seq_printf(m, "%s=%s", path,
9430+ perms & MAY_WRITE ? "rw" : "ro");
9431+ if (bindex != bend)
9432+ seq_printf(m, ":");
9433+ }
9434+
9435+out:
9436+ free_page((unsigned long) tmp_page);
9437+
9438+ unionfs_unlock_dentry(sb->s_root);
9439+
9440+ unionfs_read_unlock(sb);
9441+
9442+ return ret;
9443+}
9444+
9445+struct super_operations unionfs_sops = {
2380c486
JR
9446+ .put_super = unionfs_put_super,
9447+ .statfs = unionfs_statfs,
9448+ .remount_fs = unionfs_remount_fs,
0c5527e5 9449+ .evict_inode = unionfs_evict_inode,
2380c486
JR
9450+ .umount_begin = unionfs_umount_begin,
9451+ .show_options = unionfs_show_options,
9452+ .write_inode = unionfs_write_inode,
9453+ .alloc_inode = unionfs_alloc_inode,
9454+ .destroy_inode = unionfs_destroy_inode,
9455+};
0c5527e5
AM
9456diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
9457new file mode 100644
82260373 9458index 0000000..6c7b9aa
0c5527e5
AM
9459--- /dev/null
9460+++ b/fs/unionfs/union.h
9461@@ -0,0 +1,669 @@
2380c486 9462+/*
7670a7fc 9463+ * Copyright (c) 2003-2010 Erez Zadok
2380c486
JR
9464+ * Copyright (c) 2003-2006 Charles P. Wright
9465+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
9466+ * Copyright (c) 2005 Arun M. Krishnakumar
9467+ * Copyright (c) 2004-2006 David P. Quigley
9468+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
9469+ * Copyright (c) 2003 Puja Gupta
9470+ * Copyright (c) 2003 Harikesavan Krishnan
7670a7fc
AM
9471+ * Copyright (c) 2003-2010 Stony Brook University
9472+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
2380c486
JR
9473+ *
9474+ * This program is free software; you can redistribute it and/or modify
9475+ * it under the terms of the GNU General Public License version 2 as
9476+ * published by the Free Software Foundation.
9477+ */
9478+
9479+#ifndef _UNION_H_
9480+#define _UNION_H_
9481+
9482+#include <linux/dcache.h>
9483+#include <linux/file.h>
9484+#include <linux/list.h>
9485+#include <linux/fs.h>
9486+#include <linux/mm.h>
9487+#include <linux/module.h>
9488+#include <linux/mount.h>
9489+#include <linux/namei.h>
9490+#include <linux/page-flags.h>
9491+#include <linux/pagemap.h>
9492+#include <linux/poll.h>
9493+#include <linux/security.h>
9494+#include <linux/seq_file.h>
9495+#include <linux/slab.h>
9496+#include <linux/spinlock.h>
9497+#include <linux/smp_lock.h>
9498+#include <linux/statfs.h>
9499+#include <linux/string.h>
9500+#include <linux/vmalloc.h>
9501+#include <linux/writeback.h>
9502+#include <linux/buffer_head.h>
9503+#include <linux/xattr.h>
9504+#include <linux/fs_stack.h>
9505+#include <linux/magic.h>
9506+#include <linux/log2.h>
9507+#include <linux/poison.h>
9508+#include <linux/mman.h>
9509+#include <linux/backing-dev.h>
9510+#include <linux/splice.h>
9511+
9512+#include <asm/system.h>
9513+
9514+#include <linux/union_fs.h>
9515+
9516+/* the file system name */
9517+#define UNIONFS_NAME "unionfs"
9518+
9519+/* unionfs root inode number */
9520+#define UNIONFS_ROOT_INO 1
9521+
9522+/* number of times we try to get a unique temporary file name */
9523+#define GET_TMPNAM_MAX_RETRY 5
9524+
9525+/* maximum number of branches we support, to avoid memory blowup */
9526+#define UNIONFS_MAX_BRANCHES 128
9527+
9528+/* minimum time (seconds) required for time-based cache-coherency */
9529+#define UNIONFS_MIN_CC_TIME 3
9530+
9531+/* Operations vectors defined in specific files. */
9532+extern struct file_operations unionfs_main_fops;
9533+extern struct file_operations unionfs_dir_fops;
9534+extern struct inode_operations unionfs_main_iops;
9535+extern struct inode_operations unionfs_dir_iops;
9536+extern struct inode_operations unionfs_symlink_iops;
9537+extern struct super_operations unionfs_sops;
9538+extern struct dentry_operations unionfs_dops;
9539+extern struct address_space_operations unionfs_aops, unionfs_dummy_aops;
9540+extern struct vm_operations_struct unionfs_vm_ops;
9541+
9542+/* How long should an entry be allowed to persist */
9543+#define RDCACHE_JIFFIES (5*HZ)
9544+
9545+/* compatibility with Real-Time patches */
9546+#ifdef CONFIG_PREEMPT_RT
9547+# define unionfs_rw_semaphore compat_rw_semaphore
9548+#else /* not CONFIG_PREEMPT_RT */
9549+# define unionfs_rw_semaphore rw_semaphore
9550+#endif /* not CONFIG_PREEMPT_RT */
9551+
9552+/* file private data. */
9553+struct unionfs_file_info {
9554+ int bstart;
9555+ int bend;
9556+ atomic_t generation;
9557+
9558+ struct unionfs_dir_state *rdstate;
9559+ struct file **lower_files;
9560+ int *saved_branch_ids; /* IDs of branches when file was opened */
7670a7fc 9561+ const struct vm_operations_struct *lower_vm_ops;
2380c486
JR
9562+ bool wrote_to_file; /* for delayed copyup */
9563+};
9564+
9565+/* unionfs inode data in memory */
9566+struct unionfs_inode_info {
9567+ int bstart;
9568+ int bend;
9569+ atomic_t generation;
9570+ /* Stuff for readdir over NFS. */
9571+ spinlock_t rdlock;
9572+ struct list_head readdircache;
9573+ int rdcount;
9574+ int hashsize;
9575+ int cookie;
9576+
9577+ /* The lower inodes */
9578+ struct inode **lower_inodes;
9579+
9580+ struct inode vfs_inode;
9581+};
9582+
9583+/* unionfs dentry data in memory */
9584+struct unionfs_dentry_info {
9585+ /*
9586+ * This mutex is used to lock the dentry as soon as we get into a
9587+ * unionfs function from the VFS. Our lock ordering is that children
9588+ * go before their parents.
9589+ */
9590+ struct mutex lock;
9591+ int bstart;
9592+ int bend;
9593+ int bopaque;
9594+ int bcount;
9595+ atomic_t generation;
9596+ struct path *lower_paths;
9597+};
9598+
9599+/* These are the pointers to our various objects. */
9600+struct unionfs_data {
9601+ struct super_block *sb; /* lower super_block */
9602+ atomic_t open_files; /* number of open files on branch */
9603+ int branchperms;
9604+ int branch_id; /* unique branch ID at re/mount time */
9605+};
9606+
9607+/* unionfs super-block data in memory */
9608+struct unionfs_sb_info {
9609+ int bend;
9610+
9611+ atomic_t generation;
9612+
9613+ /*
9614+ * This rwsem is used to make sure that a branch management
9615+ * operation...
9616+ * 1) will not begin before all currently in-flight operations
9617+ * complete.
9618+ * 2) any new operations do not execute until the currently
9619+ * running branch management operation completes.
9620+ *
9621+ * The write_lock_owner records the PID of the task which grabbed
9622+ * the rw_sem for writing. If the same task also tries to grab the
9623+ * read lock, we allow it. This prevents a self-deadlock when
9624+ * branch-management is used on a pivot_root'ed union, because we
9625+ * have to ->lookup paths which belong to the same union.
9626+ */
9627+ struct unionfs_rw_semaphore rwsem;
9628+ pid_t write_lock_owner; /* PID of rw_sem owner (write lock) */
9629+ int high_branch_id; /* last unique branch ID given */
9630+ char *dev_name; /* to identify different unions in pr_debug */
9631+ struct unionfs_data *data;
9632+};
9633+
9634+/*
9635+ * Node in the linked list of entries that readdir collects from the left
9636+ * branch, to compare with entries on the right branch.
9637+ */
9638+struct filldir_node {
9639+ struct list_head file_list; /* list for directory entries */
9640+ char *name; /* name entry */
9641+ int hash; /* name hash */
9642+ int namelen; /* name len since name is not 0 terminated */
9643+
9644+ /*
9645+ * we can check for duplicate whiteouts and files in the same branch
9646+ * in order to return -EIO.
9647+ */
9648+ int bindex;
9649+
9650+ /* is this a whiteout entry? */
9651+ int whiteout;
9652+
9653+ /* Inline name, so we don't need to separately kmalloc small ones */
82260373 9654+ char iname[DNAME_INLINE_LEN];
2380c486
JR
9655+};
9656+
9657+/* Directory hash table. */
9658+struct unionfs_dir_state {
9659+ unsigned int cookie; /* the cookie, based off of rdversion */
9660+ unsigned int offset; /* The entry we have returned. */
9661+ int bindex;
9662+ loff_t dirpos; /* offset within the lower level directory */
9663+ int size; /* How big is the hash table? */
9664+ int hashentries; /* How many entries have been inserted? */
9665+ unsigned long access;
9666+
9667+ /* This cache list is used when the inode keeps us around. */
9668+ struct list_head cache;
9669+ struct list_head list[0];
9670+};
9671+
9672+/* externs needed for fanout.h or sioq.h */
9673+extern int unionfs_get_nlinks(const struct inode *inode);
9674+extern void unionfs_copy_attr_times(struct inode *upper);
9675+extern void unionfs_copy_attr_all(struct inode *dest, const struct inode *src);
9676+
9677+/* include miscellaneous macros */
9678+#include "fanout.h"
9679+#include "sioq.h"
9680+
9681+/* externs for cache creation/deletion routines */
9682+extern void unionfs_destroy_filldir_cache(void);
9683+extern int unionfs_init_filldir_cache(void);
9684+extern int unionfs_init_inode_cache(void);
9685+extern void unionfs_destroy_inode_cache(void);
9686+extern int unionfs_init_dentry_cache(void);
9687+extern void unionfs_destroy_dentry_cache(void);
9688+
9689+/* Initialize and free readdir-specific state. */
9690+extern int init_rdstate(struct file *file);
9691+extern struct unionfs_dir_state *alloc_rdstate(struct inode *inode,
9692+ int bindex);
9693+extern struct unionfs_dir_state *find_rdstate(struct inode *inode,
9694+ loff_t fpos);
9695+extern void free_rdstate(struct unionfs_dir_state *state);
9696+extern int add_filldir_node(struct unionfs_dir_state *rdstate,
9697+ const char *name, int namelen, int bindex,
9698+ int whiteout);
9699+extern struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
9700+ const char *name, int namelen,
9701+ int is_whiteout);
9702+
9703+extern struct dentry **alloc_new_dentries(int objs);
9704+extern struct unionfs_data *alloc_new_data(int objs);
9705+
9706+/* We can only use 32-bits of offset for rdstate --- blech! */
9707+#define DIREOF (0xfffff)
9708+#define RDOFFBITS 20 /* This is the number of bits in DIREOF. */
9709+#define MAXRDCOOKIE (0xfff)
9710+/* Turn an rdstate into an offset. */
9711+static inline off_t rdstate2offset(struct unionfs_dir_state *buf)
9712+{
9713+ off_t tmp;
9714+
9715+ tmp = ((buf->cookie & MAXRDCOOKIE) << RDOFFBITS)
9716+ | (buf->offset & DIREOF);
9717+ return tmp;
9718+}
9719+
9720+/* Macros for locking a super_block. */
9721+enum unionfs_super_lock_class {
9722+ UNIONFS_SMUTEX_NORMAL,
9723+ UNIONFS_SMUTEX_PARENT, /* when locking on behalf of file */
9724+ UNIONFS_SMUTEX_CHILD, /* when locking on behalf of dentry */
9725+};
9726+static inline void unionfs_read_lock(struct super_block *sb, int subclass)
9727+{
9728+ if (UNIONFS_SB(sb)->write_lock_owner &&
9729+ UNIONFS_SB(sb)->write_lock_owner == current->pid)
9730+ return;
9731+ down_read_nested(&UNIONFS_SB(sb)->rwsem, subclass);
9732+}
9733+static inline void unionfs_read_unlock(struct super_block *sb)
9734+{
9735+ if (UNIONFS_SB(sb)->write_lock_owner &&
9736+ UNIONFS_SB(sb)->write_lock_owner == current->pid)
9737+ return;
9738+ up_read(&UNIONFS_SB(sb)->rwsem);
9739+}
9740+static inline void unionfs_write_lock(struct super_block *sb)
9741+{
9742+ down_write(&UNIONFS_SB(sb)->rwsem);
9743+ UNIONFS_SB(sb)->write_lock_owner = current->pid;
9744+}
9745+static inline void unionfs_write_unlock(struct super_block *sb)
9746+{
9747+ up_write(&UNIONFS_SB(sb)->rwsem);
9748+ UNIONFS_SB(sb)->write_lock_owner = 0;
9749+}
9750+
9751+static inline void unionfs_double_lock_dentry(struct dentry *d1,
9752+ struct dentry *d2)
9753+{
9754+ BUG_ON(d1 == d2);
9755+ if (d1 < d2) {
9756+ unionfs_lock_dentry(d1, UNIONFS_DMUTEX_PARENT);
9757+ unionfs_lock_dentry(d2, UNIONFS_DMUTEX_CHILD);
9758+ } else {
9759+ unionfs_lock_dentry(d2, UNIONFS_DMUTEX_PARENT);
9760+ unionfs_lock_dentry(d1, UNIONFS_DMUTEX_CHILD);
9761+ }
9762+}
9763+
9764+static inline void unionfs_double_unlock_dentry(struct dentry *d1,
9765+ struct dentry *d2)
9766+{
9767+ BUG_ON(d1 == d2);
9768+ if (d1 < d2) { /* unlock in reverse order of double_lock_dentry */
9769+ unionfs_unlock_dentry(d1);
9770+ unionfs_unlock_dentry(d2);
9771+ } else {
9772+ unionfs_unlock_dentry(d2);
9773+ unionfs_unlock_dentry(d1);
9774+ }
9775+}
9776+
9777+static inline void unionfs_double_lock_parents(struct dentry *p1,
9778+ struct dentry *p2)
9779+{
9780+ if (p1 == p2) {
9781+ unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_PARENT);
9782+ return;
9783+ }
9784+ if (p1 < p2) {
9785+ unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_PARENT);
9786+ unionfs_lock_dentry(p2, UNIONFS_DMUTEX_REVAL_CHILD);
9787+ } else {
9788+ unionfs_lock_dentry(p2, UNIONFS_DMUTEX_REVAL_PARENT);
9789+ unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_CHILD);
9790+ }
9791+}
9792+
9793+static inline void unionfs_double_unlock_parents(struct dentry *p1,
9794+ struct dentry *p2)
9795+{
9796+ if (p1 == p2) {
9797+ unionfs_unlock_dentry(p1);
9798+ return;
9799+ }
9800+ if (p1 < p2) { /* unlock in reverse order of double_lock_parents */
9801+ unionfs_unlock_dentry(p1);
9802+ unionfs_unlock_dentry(p2);
9803+ } else {
9804+ unionfs_unlock_dentry(p2);
9805+ unionfs_unlock_dentry(p1);
9806+ }
9807+}
9808+
9809+extern int new_dentry_private_data(struct dentry *dentry, int subclass);
9810+extern int realloc_dentry_private_data(struct dentry *dentry);
9811+extern void free_dentry_private_data(struct dentry *dentry);
9812+extern void update_bstart(struct dentry *dentry);
9813+extern int init_lower_nd(struct nameidata *nd, unsigned int flags);
9814+extern void release_lower_nd(struct nameidata *nd, int err);
9815+
9816+/*
9817+ * EXTERNALS:
9818+ */
9819+
9820+/* replicates the directory structure up to given dentry in given branch */
9821+extern struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
9822+ const char *name, int bindex);
9823+
9824+/* partial lookup */
9825+extern int unionfs_partial_lookup(struct dentry *dentry,
9826+ struct dentry *parent);
9827+extern struct dentry *unionfs_lookup_full(struct dentry *dentry,
9828+ struct dentry *parent,
9829+ int lookupmode);
9830+
9831+/* copies a file from dbstart to newbindex branch */
9832+extern int copyup_file(struct inode *dir, struct file *file, int bstart,
9833+ int newbindex, loff_t size);
9834+extern int copyup_named_file(struct inode *dir, struct file *file,
9835+ char *name, int bstart, int new_bindex,
9836+ loff_t len);
9837+/* copies a dentry from dbstart to newbindex branch */
9838+extern int copyup_dentry(struct inode *dir, struct dentry *dentry,
9839+ int bstart, int new_bindex, const char *name,
9840+ int namelen, struct file **copyup_file, loff_t len);
9841+/* helper functions for post-copyup actions */
9842+extern void unionfs_postcopyup_setmnt(struct dentry *dentry);
9843+extern void unionfs_postcopyup_release(struct dentry *dentry);
9844+
9845+/* Is this directory empty: 0 if it is empty, -ENOTEMPTY if not. */
9846+extern int check_empty(struct dentry *dentry, struct dentry *parent,
9847+ struct unionfs_dir_state **namelist);
9848+/* whiteout and opaque directory helpers */
9849+extern char *alloc_whname(const char *name, int len);
9850+extern bool is_whiteout_name(char **namep, int *namelenp);
9851+extern bool is_validname(const char *name);
9852+extern struct dentry *lookup_whiteout(const char *name,
9853+ struct dentry *lower_parent);
9854+extern struct dentry *find_first_whiteout(struct dentry *dentry);
9855+extern int unlink_whiteout(struct dentry *wh_dentry);
9856+extern int check_unlink_whiteout(struct dentry *dentry,
9857+ struct dentry *lower_dentry, int bindex);
9858+extern int create_whiteout(struct dentry *dentry, int start);
9859+extern int delete_whiteouts(struct dentry *dentry, int bindex,
9860+ struct unionfs_dir_state *namelist);
9861+extern int is_opaque_dir(struct dentry *dentry, int bindex);
9862+extern int make_dir_opaque(struct dentry *dir, int bindex);
9863+extern void unionfs_set_max_namelen(long *namelen);
9864+
9865+extern void unionfs_reinterpose(struct dentry *this_dentry);
9866+extern struct super_block *unionfs_duplicate_super(struct super_block *sb);
9867+
9868+/* Locking functions. */
9869+extern int unionfs_setlk(struct file *file, int cmd, struct file_lock *fl);
9870+extern int unionfs_getlk(struct file *file, struct file_lock *fl);
9871+
9872+/* Common file operations. */
9873+extern int unionfs_file_revalidate(struct file *file, struct dentry *parent,
9874+ bool willwrite);
9875+extern int unionfs_open(struct inode *inode, struct file *file);
9876+extern int unionfs_file_release(struct inode *inode, struct file *file);
9877+extern int unionfs_flush(struct file *file, fl_owner_t id);
9878+extern long unionfs_ioctl(struct file *file, unsigned int cmd,
9879+ unsigned long arg);
9880+extern int unionfs_fsync(struct file *file, int datasync);
9881+extern int unionfs_fasync(int fd, struct file *file, int flag);
9882+
9883+/* Inode operations */
9884+extern struct inode *unionfs_iget(struct super_block *sb, unsigned long ino);
9885+extern int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
9886+ struct inode *new_dir, struct dentry *new_dentry);
9887+extern int unionfs_unlink(struct inode *dir, struct dentry *dentry);
9888+extern int unionfs_rmdir(struct inode *dir, struct dentry *dentry);
9889+
9890+extern bool __unionfs_d_revalidate(struct dentry *dentry,
9891+ struct dentry *parent, bool willwrite);
9892+extern bool is_negative_lower(const struct dentry *dentry);
9893+extern bool is_newer_lower(const struct dentry *dentry);
9894+extern void purge_sb_data(struct super_block *sb);
9895+
9896+/* The values for unionfs_interpose's flag. */
9897+#define INTERPOSE_DEFAULT 0
9898+#define INTERPOSE_LOOKUP 1
9899+#define INTERPOSE_REVAL 2
9900+#define INTERPOSE_REVAL_NEG 3
9901+#define INTERPOSE_PARTIAL 4
9902+
9903+extern struct dentry *unionfs_interpose(struct dentry *this_dentry,
9904+ struct super_block *sb, int flag);
9905+
9906+#ifdef CONFIG_UNION_FS_XATTR
9907+/* Extended attribute functions. */
9908+extern void *unionfs_xattr_alloc(size_t size, size_t limit);
9909+static inline void unionfs_xattr_kfree(const void *p)
9910+{
9911+ kfree(p);
9912+}
9913+extern ssize_t unionfs_getxattr(struct dentry *dentry, const char *name,
9914+ void *value, size_t size);
9915+extern int unionfs_removexattr(struct dentry *dentry, const char *name);
9916+extern ssize_t unionfs_listxattr(struct dentry *dentry, char *list,
9917+ size_t size);
9918+extern int unionfs_setxattr(struct dentry *dentry, const char *name,
9919+ const void *value, size_t size, int flags);
9920+#endif /* CONFIG_UNION_FS_XATTR */
9921+
9922+/* The root directory is unhashed, but isn't deleted. */
9923+static inline int d_deleted(struct dentry *d)
9924+{
9925+ return d_unhashed(d) && (d != d->d_sb->s_root);
9926+}
9927+
9928+/* unionfs_permission, check if we should bypass error to facilitate copyup */
9929+#define IS_COPYUP_ERR(err) ((err) == -EROFS)
9930+
9931+/* unionfs_open, check if we need to copyup the file */
9932+#define OPEN_WRITE_FLAGS (O_WRONLY | O_RDWR | O_APPEND)
9933+#define IS_WRITE_FLAG(flag) ((flag) & OPEN_WRITE_FLAGS)
9934+
9935+static inline int branchperms(const struct super_block *sb, int index)
9936+{
9937+ BUG_ON(index < 0);
9938+ return UNIONFS_SB(sb)->data[index].branchperms;
9939+}
9940+
9941+static inline int set_branchperms(struct super_block *sb, int index, int perms)
9942+{
9943+ BUG_ON(index < 0);
9944+ UNIONFS_SB(sb)->data[index].branchperms = perms;
9945+ return perms;
9946+}
9947+
9948+/* check if readonly lower inode, but possibly unlinked (no inode->i_sb) */
9949+static inline int __is_rdonly(const struct inode *inode)
9950+{
9951+ /* if unlinked, can't be readonly (?) */
9952+ if (!inode->i_sb)
9953+ return 0;
9954+ return IS_RDONLY(inode);
9955+
9956+}
9957+/* Is this file on a read-only branch? */
9958+static inline int is_robranch_super(const struct super_block *sb, int index)
9959+{
9960+ int ret;
9961+
9962+ ret = (!(branchperms(sb, index) & MAY_WRITE)) ? -EROFS : 0;
9963+ return ret;
9964+}
9965+
9966+/* Is this file on a read-only branch? */
9967+static inline int is_robranch_idx(const struct dentry *dentry, int index)
9968+{
9969+ struct super_block *lower_sb;
9970+
9971+ BUG_ON(index < 0);
9972+
9973+ if (!(branchperms(dentry->d_sb, index) & MAY_WRITE))
9974+ return -EROFS;
9975+
9976+ lower_sb = unionfs_lower_super_idx(dentry->d_sb, index);
9977+ BUG_ON(lower_sb == NULL);
9978+ /*
9979+ * test sb flags directly, not IS_RDONLY(lower_inode) because the
9980+ * lower_dentry could be a negative dentry.
9981+ */
9982+ if (lower_sb->s_flags & MS_RDONLY)
9983+ return -EROFS;
9984+
9985+ return 0;
9986+}
9987+
9988+static inline int is_robranch(const struct dentry *dentry)
9989+{
9990+ int index;
9991+
9992+ index = UNIONFS_D(dentry)->bstart;
9993+ BUG_ON(index < 0);
9994+
9995+ return is_robranch_idx(dentry, index);
9996+}
9997+
9998+/*
9999+ * EXTERNALS:
10000+ */
10001+extern int check_branch(struct nameidata *nd);
10002+extern int parse_branch_mode(const char *name, int *perms);
10003+
10004+/* locking helpers */
10005+static inline struct dentry *lock_parent(struct dentry *dentry)
10006+{
10007+ struct dentry *dir = dget_parent(dentry);
10008+ mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
10009+ return dir;
10010+}
10011+static inline struct dentry *lock_parent_wh(struct dentry *dentry)
10012+{
10013+ struct dentry *dir = dget_parent(dentry);
10014+
10015+ mutex_lock_nested(&dir->d_inode->i_mutex, UNIONFS_DMUTEX_WHITEOUT);
10016+ return dir;
10017+}
10018+
10019+static inline void unlock_dir(struct dentry *dir)
10020+{
10021+ mutex_unlock(&dir->d_inode->i_mutex);
10022+ dput(dir);
10023+}
10024+
10025+/* lock base inode mutex before calling lookup_one_len */
10026+static inline struct dentry *lookup_lck_len(const char *name,
10027+ struct dentry *base, int len)
10028+{
10029+ struct dentry *d;
10030+ mutex_lock(&base->d_inode->i_mutex);
10031+ d = lookup_one_len(name, base, len);
10032+ mutex_unlock(&base->d_inode->i_mutex);
10033+ return d;
10034+}
10035+
10036+static inline struct vfsmount *unionfs_mntget(struct dentry *dentry,
10037+ int bindex)
10038+{
10039+ struct vfsmount *mnt;
10040+
10041+ BUG_ON(!dentry || bindex < 0);
10042+
10043+ mnt = mntget(unionfs_lower_mnt_idx(dentry, bindex));
10044+#ifdef CONFIG_UNION_FS_DEBUG
10045+ if (!mnt)
10046+ pr_debug("unionfs: mntget: mnt=%p bindex=%d\n",
10047+ mnt, bindex);
10048+#endif /* CONFIG_UNION_FS_DEBUG */
10049+
10050+ return mnt;
10051+}
10052+
10053+static inline void unionfs_mntput(struct dentry *dentry, int bindex)
10054+{
10055+ struct vfsmount *mnt;
10056+
10057+ if (!dentry && bindex < 0)
10058+ return;
10059+ BUG_ON(!dentry || bindex < 0);
10060+
10061+ mnt = unionfs_lower_mnt_idx(dentry, bindex);
10062+#ifdef CONFIG_UNION_FS_DEBUG
10063+ /*
10064+ * Directories can have NULL lower objects in between start/end, but
10065+ * NOT if at the start/end range. We cannot verify that this dentry
10066+ * is a type=DIR, because it may already be a negative dentry. But
10067+ * if dbstart is greater than dbend, we know that this couldn't have
10068+ * been a regular file: it had to have been a directory.
10069+ */
10070+ if (!mnt && !(bindex > dbstart(dentry) && bindex < dbend(dentry)))
10071+ pr_debug("unionfs: mntput: mnt=%p bindex=%d\n", mnt, bindex);
10072+#endif /* CONFIG_UNION_FS_DEBUG */
10073+ mntput(mnt);
10074+}
10075+
10076+#ifdef CONFIG_UNION_FS_DEBUG
10077+
10078+/* useful for tracking code reachability */
10079+#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__)
10080+
10081+#define unionfs_check_inode(i) __unionfs_check_inode((i), \
10082+ __FILE__, __func__, __LINE__)
10083+#define unionfs_check_dentry(d) __unionfs_check_dentry((d), \
10084+ __FILE__, __func__, __LINE__)
10085+#define unionfs_check_file(f) __unionfs_check_file((f), \
10086+ __FILE__, __func__, __LINE__)
10087+#define unionfs_check_nd(n) __unionfs_check_nd((n), \
10088+ __FILE__, __func__, __LINE__)
10089+#define show_branch_counts(sb) __show_branch_counts((sb), \
10090+ __FILE__, __func__, __LINE__)
10091+#define show_inode_times(i) __show_inode_times((i), \
10092+ __FILE__, __func__, __LINE__)
10093+#define show_dinode_times(d) __show_dinode_times((d), \
10094+ __FILE__, __func__, __LINE__)
10095+#define show_inode_counts(i) __show_inode_counts((i), \
10096+ __FILE__, __func__, __LINE__)
10097+
10098+extern void __unionfs_check_inode(const struct inode *inode, const char *fname,
10099+ const char *fxn, int line);
10100+extern void __unionfs_check_dentry(const struct dentry *dentry,
10101+ const char *fname, const char *fxn,
10102+ int line);
10103+extern void __unionfs_check_file(const struct file *file,
10104+ const char *fname, const char *fxn, int line);
10105+extern void __unionfs_check_nd(const struct nameidata *nd,
10106+ const char *fname, const char *fxn, int line);
10107+extern void __show_branch_counts(const struct super_block *sb,
10108+ const char *file, const char *fxn, int line);
10109+extern void __show_inode_times(const struct inode *inode,
10110+ const char *file, const char *fxn, int line);
10111+extern void __show_dinode_times(const struct dentry *dentry,
10112+ const char *file, const char *fxn, int line);
10113+extern void __show_inode_counts(const struct inode *inode,
10114+ const char *file, const char *fxn, int line);
10115+
10116+#else /* not CONFIG_UNION_FS_DEBUG */
10117+
10118+/* we leave useful hooks for these check functions throughout the code */
10119+#define unionfs_check_inode(i) do { } while (0)
10120+#define unionfs_check_dentry(d) do { } while (0)
10121+#define unionfs_check_file(f) do { } while (0)
10122+#define unionfs_check_nd(n) do { } while (0)
10123+#define show_branch_counts(sb) do { } while (0)
10124+#define show_inode_times(i) do { } while (0)
10125+#define show_dinode_times(d) do { } while (0)
10126+#define show_inode_counts(i) do { } while (0)
10127+
10128+#endif /* not CONFIG_UNION_FS_DEBUG */
10129+
10130+#endif /* not _UNION_H_ */
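The rdstate2offset() helper in union.h above packs a 12-bit cookie (MAXRDCOOKIE) and a 20-bit entry offset (DIREOF) into a single directory offset. A minimal userspace sketch of that packing follows. It uses plain integers instead of struct unionfs_dir_state, and the names pack_rdstate(), unpack_cookie() and unpack_offset() are hypothetical helpers for illustration only; they are not part of the patch.

```c
#include <stdint.h>

/* Constants copied from union.h. */
#define DIREOF      (0xfffff)	/* 20-bit mask for the entry offset */
#define RDOFFBITS   20		/* number of bits in DIREOF */
#define MAXRDCOOKIE (0xfff)	/* 12-bit mask for the cookie */

/* Same arithmetic as rdstate2offset(), applied to plain integers. */
static uint32_t pack_rdstate(unsigned int cookie, unsigned int offset)
{
	return ((uint32_t)(cookie & MAXRDCOOKIE) << RDOFFBITS)
		| (offset & DIREOF);
}

/* Hypothetical inverse helpers: recover the two fields from a position. */
static unsigned int unpack_cookie(uint32_t pos)
{
	return (pos >> RDOFFBITS) & MAXRDCOOKIE;
}

static unsigned int unpack_offset(uint32_t pos)
{
	return pos & DIREOF;
}
```

Cookie bits above the low 12 are masked off before shifting, so two cookies that differ only in those upper bits map to the same packed value.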
10131diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
10132new file mode 100644
10133index 0000000..542c513
10134--- /dev/null
10135+++ b/fs/unionfs/unlink.c
10136@@ -0,0 +1,278 @@
10137+/*
10138+ * Copyright (c) 2003-2010 Erez Zadok
10139+ * Copyright (c) 2003-2006 Charles P. Wright
10140+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10141+ * Copyright (c) 2005-2006 Junjiro Okajima
10142+ * Copyright (c) 2005 Arun M. Krishnakumar
10143+ * Copyright (c) 2004-2006 David P. Quigley
10144+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10145+ * Copyright (c) 2003 Puja Gupta
10146+ * Copyright (c) 2003 Harikesavan Krishnan
10147+ * Copyright (c) 2003-2010 Stony Brook University
10148+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
10149+ *
10150+ * This program is free software; you can redistribute it and/or modify
10151+ * it under the terms of the GNU General Public License version 2 as
10152+ * published by the Free Software Foundation.
10153+ */
10154+
10155+#include "union.h"
10156+
10157+/*
10158+ * Helper function for Unionfs's unlink operation.
10159+ *
10160+ * The main goal of this function is to optimize the unlinking of non-dir
10161+ * objects in unionfs by deleting all possible lower inode objects from the
10162+ * underlying branches having same dentry name as the non-dir dentry on
10163+ * which this unlink operation is called. This way we delete as many lower
10164+ * inodes as possible, and save space. Whiteouts need to be created in
10165+ * branch0 only if unlinking fails on any lower branch other than
10166+ * branch0, or if a lower branch is marked read-only.
10167+ *
10168+ * Also, while unlinking a file, if we encounter any dir type entry in any
10169+ * intermediate branch, then we remove the directory by calling vfs_rmdir.
10170+ * The following special cases are also handled:
10171+ *
10172+ * (1) If an error occurs in branch0 during vfs_unlink, then we return
10173+ * the appropriate error.
10174+ *
10175+ * (2) If we get an error during unlink in any lower branch other than
10176+ * branch0, then we create a whiteout in branch0.
10177+ *
10178+ * (3) If a whiteout already exists in any intermediate branch, we delete
10179+ * all possible inodes only up to that branch (this is "opaqueness",
10180+ * as per Documentation/filesystems/unionfs/concepts.txt).
10181+ *
10182+ */
10183+static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry,
10184+ struct dentry *parent)
10185+{
10186+ struct dentry *lower_dentry;
10187+ struct dentry *lower_dir_dentry;
10188+ int bindex;
10189+ int err = 0;
10190+
10191+ err = unionfs_partial_lookup(dentry, parent);
10192+ if (err)
10193+ goto out;
10194+
10195+ /* trying to unlink all possible valid instances */
10196+ for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
10197+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10198+ if (!lower_dentry || !lower_dentry->d_inode)
10199+ continue;
10200+
10201+ lower_dir_dentry = lock_parent(lower_dentry);
10202+
10203+ /* avoid destroying the lower inode if the object is in use */
10204+ dget(lower_dentry);
10205+ err = is_robranch_super(dentry->d_sb, bindex);
10206+ if (!err) {
10207+ /* see Documentation/filesystems/unionfs/issues.txt */
10208+ lockdep_off();
10209+ if (!S_ISDIR(lower_dentry->d_inode->i_mode))
10210+ err = vfs_unlink(lower_dir_dentry->d_inode,
10211+ lower_dentry);
10212+ else
10213+ err = vfs_rmdir(lower_dir_dentry->d_inode,
10214+ lower_dentry);
10215+ lockdep_on();
10216+ }
10217+
10218+ /* if lower object deletion succeeds, update inode's times */
10219+ if (!err)
10220+ unionfs_copy_attr_times(dentry->d_inode);
10221+ dput(lower_dentry);
10222+ fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10223+ unlock_dir(lower_dir_dentry);
10224+
10225+ if (err)
10226+ break;
10227+ }
10228+
10229+ /*
10230+ * Create the whiteout in branch 0 (highest priority) only if (a)
10231+ * there was an error in any intermediate branch other than branch 0
10232+ * due to failure of vfs_unlink/vfs_rmdir or (b) a branch is marked or
10233+ * mounted read-only.
10234+ */
10235+ if (err) {
10236+ if ((bindex == 0) ||
10237+ ((bindex == dbstart(dentry)) &&
10238+ (!IS_COPYUP_ERR(err))))
10239+ goto out;
10240+ else {
10241+ if (!IS_COPYUP_ERR(err))
10242+ pr_debug("unionfs: lower object deletion "
10243+ "failed in branch:%d\n", bindex);
10244+ err = create_whiteout(dentry, sbstart(dentry->d_sb));
10245+ }
10246+ }
10247+
10248+out:
10249+ if (!err)
10250+ inode_dec_link_count(dentry->d_inode);
10251+
10252+ /* We don't want to leave negative leftover dentries for revalidate. */
10253+ if (!err && (dbopaque(dentry) != -1))
10254+ update_bstart(dentry);
10255+
10256+ return err;
10257+}
10258+
10259+int unionfs_unlink(struct inode *dir, struct dentry *dentry)
10260+{
10261+ int err = 0;
10262+ struct inode *inode = dentry->d_inode;
10263+ struct dentry *parent;
10264+ int valid;
10265+
10266+ BUG_ON(S_ISDIR(inode->i_mode));
10267+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
10268+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
10269+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
10270+
10271+ valid = __unionfs_d_revalidate(dentry, parent, false);
10272+ if (unlikely(!valid)) {
10273+ err = -ESTALE;
10274+ goto out;
10275+ }
10276+ unionfs_check_dentry(dentry);
10277+
10278+ err = unionfs_unlink_whiteout(dir, dentry, parent);
10279+ /* call d_drop so the system "forgets" about us */
10280+ if (!err) {
10281+ unionfs_postcopyup_release(dentry);
10282+ unionfs_postcopyup_setmnt(parent);
10283+ if (inode->i_nlink == 0) /* drop lower inodes */
10284+ iput_lowers_all(inode, false);
10285+ d_drop(dentry);
10286+ /*
10287+ * if unlink/whiteout succeeded, parent dir mtime has
10288+ * changed
10289+ */
10290+ unionfs_copy_attr_times(dir);
10291+ }
10292+
10293+out:
10294+ if (!err) {
10295+ unionfs_check_dentry(dentry);
10296+ unionfs_check_inode(dir);
10297+ }
10298+ unionfs_unlock_dentry(dentry);
10299+ unionfs_unlock_parent(dentry, parent);
10300+ unionfs_read_unlock(dentry->d_sb);
10301+ return err;
10302+}
10303+
10304+static int unionfs_rmdir_first(struct inode *dir, struct dentry *dentry,
10305+ struct unionfs_dir_state *namelist)
10306+{
10307+ int err;
10308+ struct dentry *lower_dentry;
10309+ struct dentry *lower_dir_dentry = NULL;
10310+
10311+ /* Here we need to remove whiteout entries. */
10312+ err = delete_whiteouts(dentry, dbstart(dentry), namelist);
10313+ if (err)
10314+ goto out;
10315+
10316+ lower_dentry = unionfs_lower_dentry(dentry);
10317+
10318+ lower_dir_dentry = lock_parent(lower_dentry);
10319+
10320+ /* avoid destroying the lower inode if the file is in use */
10321+ dget(lower_dentry);
10322+ err = is_robranch(dentry);
10323+ if (!err)
10324+ err = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry);
10325+ dput(lower_dentry);
10326+
10327+ fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10328+ /* propagate number of hard-links */
10329+ dentry->d_inode->i_nlink = unionfs_get_nlinks(dentry->d_inode);
10330+
10331+out:
10332+ if (lower_dir_dentry)
10333+ unlock_dir(lower_dir_dentry);
10334+ return err;
10335+}
10336+
10337+int unionfs_rmdir(struct inode *dir, struct dentry *dentry)
10338+{
10339+ int err = 0;
10340+ struct unionfs_dir_state *namelist = NULL;
10341+ struct dentry *parent;
10342+ int dstart, dend;
10343+ bool valid;
10344+
10345+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
10346+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
10347+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
10348+
10349+ valid = __unionfs_d_revalidate(dentry, parent, false);
10350+ if (unlikely(!valid)) {
10351+ err = -ESTALE;
10352+ goto out;
10353+ }
10354+ unionfs_check_dentry(dentry);
10355+
10356+ /* check if this unionfs directory is empty or not */
10357+ err = check_empty(dentry, parent, &namelist);
10358+ if (err)
10359+ goto out;
10360+
10361+ err = unionfs_rmdir_first(dir, dentry, namelist);
10362+ dstart = dbstart(dentry);
10363+ dend = dbend(dentry);
10364+ /*
10365+ * We create a whiteout for the directory if there was an error
10366+ * removing the first directory entry in the union. Otherwise, we
10367+ * create a whiteout only if there is a chance that a lower
10368+ * priority branch might also have the same-named directory. IOW,
10369+ * if there is not another same-named directory at a lower priority
10370+ * branch, then we don't need to create a whiteout for it.
10371+ */
10372+ if (!err) {
10373+ if (dstart < dend)
10374+ err = create_whiteout(dentry, dstart);
10375+ } else {
10376+ int new_err;
10377+
10378+ if (dstart == 0)
10379+ goto out;
10380+
10381+ /* exit if the error returned was NOT -EROFS */
10382+ if (!IS_COPYUP_ERR(err))
10383+ goto out;
10384+
10385+ new_err = create_whiteout(dentry, dstart - 1);
10386+ if (new_err != -EEXIST)
10387+ err = new_err;
10388+ }
10389+
10390+out:
10391+ /*
10392+ * Drop references to lower dentry/inode so storage space for them
10393+ * can be reclaimed. Then, call d_drop so the system "forgets"
10394+ * about us.
10395+ */
10396+ if (!err) {
10397+ iput_lowers_all(dentry->d_inode, false);
10398+ dput(unionfs_lower_dentry_idx(dentry, dstart));
10399+ unionfs_set_lower_dentry_idx(dentry, dstart, NULL);
10400+ d_drop(dentry);
10401+ /* update our lower vfsmnts, in case a copyup took place */
10402+ unionfs_postcopyup_setmnt(dentry);
10403+ unionfs_check_dentry(dentry);
10404+ unionfs_check_inode(dir);
10405+ }
10406+
10407+ if (namelist)
10408+ free_rdstate(namelist);
10409+
10410+ unionfs_unlock_dentry(dentry);
10411+ unionfs_unlock_parent(dentry, parent);
10412+ unionfs_read_unlock(dentry->d_sb);
10413+ return err;
10414+}
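The error handling at the end of unionfs_unlink_whiteout() in unlink.c above decides whether a failed lower-branch unlink should be papered over with a whiteout. A minimal userspace sketch of that decision follows, under the assumption that only the three inputs shown matter. The function name unlink_needs_whiteout() is hypothetical and not part of the patch; IS_COPYUP_ERR() is copied from union.h.

```c
#include <errno.h>
#include <stdbool.h>

/* Copied from union.h: -EROFS is the only "copyup" error. */
#define IS_COPYUP_ERR(err) ((err) == -EROFS)

/*
 * Hypothetical helper: given the error from the unlink loop, the branch
 * index where the loop stopped, and the dentry's starting branch, report
 * whether the caller would go on to call create_whiteout().
 */
static bool unlink_needs_whiteout(int err, int bindex, int dstart)
{
	if (!err)
		return false;	/* every lower object was removed */
	if (bindex == 0)
		return false;	/* failure in branch 0: return the error */
	if (bindex == dstart && !IS_COPYUP_ERR(err))
		return false;	/* non-EROFS failure in the first branch */
	return true;		/* mask the remaining objects with a whiteout */
}
```

In the patch itself the whiteout is created at sbstart(dentry->d_sb), and the result of create_whiteout() replaces the original error.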
10415diff --git a/fs/unionfs/whiteout.c b/fs/unionfs/whiteout.c
10416new file mode 100644
10417index 0000000..405073a
10418--- /dev/null
10419+++ b/fs/unionfs/whiteout.c
10420@@ -0,0 +1,584 @@
10421+/*
10422+ * Copyright (c) 2003-2010 Erez Zadok
10423+ * Copyright (c) 2003-2006 Charles P. Wright
10424+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10425+ * Copyright (c) 2005-2006 Junjiro Okajima
10426+ * Copyright (c) 2005 Arun M. Krishnakumar
10427+ * Copyright (c) 2004-2006 David P. Quigley
10428+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10429+ * Copyright (c) 2003 Puja Gupta
10430+ * Copyright (c) 2003 Harikesavan Krishnan
10431+ * Copyright (c) 2003-2010 Stony Brook University
10432+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
10433+ *
10434+ * This program is free software; you can redistribute it and/or modify
10435+ * it under the terms of the GNU General Public License version 2 as
10436+ * published by the Free Software Foundation.
10437+ */
10438+
10439+#include "union.h"
10440+
10441+/*
10442+ * whiteout and opaque directory helpers
10443+ */
10444+
10445+/* What do we use for whiteouts. */
10446+#define UNIONFS_WHPFX ".wh."
10447+#define UNIONFS_WHLEN 4
10448+/*
10449+ * If a directory contains this file, then it is opaque. We start with the
10450+ * .wh. flag so that it is blocked by lookup.
10451+ */
10452+#define UNIONFS_DIR_OPAQUE_NAME "__dir_opaque"
10453+#define UNIONFS_DIR_OPAQUE UNIONFS_WHPFX UNIONFS_DIR_OPAQUE_NAME
10454+
10455+/* construct whiteout filename */
10456+char *alloc_whname(const char *name, int len)
10457+{
10458+ char *buf;
10459+
10460+ buf = kmalloc(len + UNIONFS_WHLEN + 1, GFP_KERNEL);
10461+ if (unlikely(!buf))
10462+ return ERR_PTR(-ENOMEM);
10463+
10464+ strcpy(buf, UNIONFS_WHPFX);
10465+ strlcat(buf, name, len + UNIONFS_WHLEN + 1);
10466+
10467+ return buf;
10468+}
10469+
10470+/*
10471+ * XXX: this can be inline or CPP macro, but is here to keep all whiteout
10472+ * code in one place.
10473+ */
10474+void unionfs_set_max_namelen(long *namelen)
10475+{
10476+ *namelen -= UNIONFS_WHLEN;
10477+}
10478+
10479+/* check if @namep is a whiteout, update @namep and @namelenp accordingly */
10480+bool is_whiteout_name(char **namep, int *namelenp)
10481+{
10482+ if (*namelenp > UNIONFS_WHLEN &&
10483+ !strncmp(*namep, UNIONFS_WHPFX, UNIONFS_WHLEN)) {
10484+ *namep += UNIONFS_WHLEN;
10485+ *namelenp -= UNIONFS_WHLEN;
10486+ return true;
10487+ }
10488+ return false;
10489+}
10490+
10491+/* is the filename valid == !(whiteout for a file or opaque dir marker) */
10492+bool is_validname(const char *name)
10493+{
10494+ if (!strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN))
10495+ return false;
10496+ if (!strncmp(name, UNIONFS_DIR_OPAQUE_NAME,
10497+ sizeof(UNIONFS_DIR_OPAQUE_NAME) - 1))
10498+ return false;
10499+ return true;
10500+}
10501+
10502+/*
10503+ * Look for a whiteout @name in @lower_parent directory. If error, return
10504+ * ERR_PTR. Caller must dput() the returned dentry if not an error.
10505+ *
10506+ * XXX: some callers can reuse the whname allocated buffer to avoid repeated
10507+ * free then re-malloc calls. Need to provide a different API for those
10508+ * callers.
10509+ */
10510+struct dentry *lookup_whiteout(const char *name, struct dentry *lower_parent)
10511+{
10512+ char *whname = NULL;
10513+ int err = 0, namelen;
10514+ struct dentry *wh_dentry = NULL;
10515+
10516+ namelen = strlen(name);
10517+ whname = alloc_whname(name, namelen);
10518+ if (unlikely(IS_ERR(whname))) {
10519+ err = PTR_ERR(whname);
10520+ goto out;
10521+ }
10522+
10523+ /* check if whiteout exists in this branch: lookup .wh.foo */
10524+ wh_dentry = lookup_lck_len(whname, lower_parent, strlen(whname));
10525+ if (IS_ERR(wh_dentry)) {
10526+ err = PTR_ERR(wh_dentry);
10527+ goto out;
10528+ }
10529+
10530+ /* check if negative dentry (ENOENT) */
10531+ if (!wh_dentry->d_inode)
10532+ goto out;
10533+
10534+ /* whiteout found: check if valid type */
10535+ if (!S_ISREG(wh_dentry->d_inode->i_mode)) {
10536+ printk(KERN_ERR "unionfs: invalid whiteout %s entry type %d\n",
10537+ whname, wh_dentry->d_inode->i_mode);
10538+ dput(wh_dentry);
10539+ err = -EIO;
10540+ goto out;
10541+ }
10542+
10543+out:
10544+ kfree(whname);
10545+ if (err)
10546+ wh_dentry = ERR_PTR(err);
10547+ return wh_dentry;
10548+}
10549+
10550+/* find and return first whiteout in parent directory, else ENOENT */
10551+struct dentry *find_first_whiteout(struct dentry *dentry)
10552+{
10553+ int bindex, bstart, bend;
10554+ struct dentry *parent, *lower_parent, *wh_dentry;
10555+
10556+ parent = dget_parent(dentry);
10557+
10558+ bstart = dbstart(parent);
10559+ bend = dbend(parent);
10560+ wh_dentry = ERR_PTR(-ENOENT);
10561+
10562+ for (bindex = bstart; bindex <= bend; bindex++) {
10563+ lower_parent = unionfs_lower_dentry_idx(parent, bindex);
10564+ if (!lower_parent)
10565+ continue;
10566+ wh_dentry = lookup_whiteout(dentry->d_name.name, lower_parent);
10567+ if (IS_ERR(wh_dentry))
10568+ continue;
10569+ if (wh_dentry->d_inode)
10570+ break;
10571+ dput(wh_dentry);
10572+ wh_dentry = ERR_PTR(-ENOENT);
10573+ }
10574+
10575+ dput(parent);
10576+
10577+ return wh_dentry;
10578+}
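
The branch scan in find_first_whiteout() can be modelled in user space. This is a sketch under stated assumptions: the enum and first_whiteout_branch are hypothetical stand-ins for the per-branch dentry lookups, and the model always reports -ENOENT when nothing matches. In the kernel code above, a lookup error that is not followed by a later successful lookup is what the caller receives instead.

```c
#include <errno.h>

/* Hypothetical per-branch state standing in for the lower dentry lookups. */
enum branch_state {
	BRANCH_ABSENT,	/* no lower parent dentry in this branch */
	BRANCH_NO_WH,	/* lookup succeeded, whiteout dentry is negative */
	BRANCH_HAS_WH	/* lookup succeeded, whiteout exists */
};

/*
 * Scan branches bstart..bend from left to right and return the index of the
 * first branch that holds a whiteout, or -ENOENT if none does.
 */
static int first_whiteout_branch(const enum branch_state *branches,
				 int bstart, int bend)
{
	int bindex;

	for (bindex = bstart; bindex <= bend; bindex++) {
		if (branches[bindex] == BRANCH_HAS_WH)
			return bindex;
	}
	return -ENOENT;
}
```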
10579+
10580+/*
10581+ * Unlink a whiteout dentry. Returns 0 or -errno. Caller must hold and
10582+ * release dentry reference.
10583+ */
10584+int unlink_whiteout(struct dentry *wh_dentry)
10585+{
10586+ int err;
10587+ struct dentry *lower_dir_dentry;
10588+
10589+ /* dget and lock parent dentry */
10590+ lower_dir_dentry = lock_parent_wh(wh_dentry);
10591+
10592+ /* see Documentation/filesystems/unionfs/issues.txt */
10593+ lockdep_off();
10594+ err = vfs_unlink(lower_dir_dentry->d_inode, wh_dentry);
10595+ lockdep_on();
10596+ unlock_dir(lower_dir_dentry);
10597+
10598+ /*
10599+ * Whiteouts are special files and should be deleted no matter what
10600+ * (as if they never existed), in order to allow this create
10601+ * operation to succeed. This is especially important in sticky
10602+ * directories: a whiteout may have been created by one user, but
10603+ * the newly created file may be created by another user.
10604+ * Therefore, in order to maintain Unix semantics, if the vfs_unlink
10605+ * above failed, then we have to try to directly unlink the
10606+ * whiteout. Note: in the ODF version of unionfs, whiteouts are
10607+ * handled much more cleanly.
10608+ */
10609+ if (err == -EPERM) {
10610+ struct inode *inode = lower_dir_dentry->d_inode;
10611+ err = inode->i_op->unlink(inode, wh_dentry);
10612+ }
10613+ if (err)
10614+ printk(KERN_ERR "unionfs: could not unlink whiteout %s, "
10615+ "err = %d\n", wh_dentry->d_name.name, err);
10616+
10617+ return err;
10618+
10619+}
10620+
10621+/*
10622+ * Helper function when creating new objects (create, symlink, mknod, etc.).
10623+ * Checks to see if there's a whiteout in @lower_dentry's parent directory,
10624+ * whose name is taken from @dentry. Then tries to remove that whiteout, if
10625+ * found. If <dentry,bindex> is a branch marked readonly, return -EROFS.
10626+ * If it finds both a regular file and a whiteout, return -EIO (this should
10627+ * never happen).
10628+ *
10629+ * Return 0 if no whiteout was found. Return 1 if one was found and
10630+ * successfully removed. Therefore a value >= 0 tells the caller that
10631+ * @lower_dentry belongs to a good branch to create the new object in.
10632+ * Return -ERRNO if an error occurred during whiteout lookup or in trying to
10633+ * unlink the whiteout.
10634+ */
10635+int check_unlink_whiteout(struct dentry *dentry, struct dentry *lower_dentry,
10636+ int bindex)
10637+{
10638+ int err;
10639+ struct dentry *wh_dentry = NULL;
10640+ struct dentry *lower_dir_dentry = NULL;
10641+
10642+ /* look for whiteout dentry first */
10643+ lower_dir_dentry = dget_parent(lower_dentry);
10644+ wh_dentry = lookup_whiteout(dentry->d_name.name, lower_dir_dentry);
10645+ dput(lower_dir_dentry);
10646+ if (IS_ERR(wh_dentry)) {
10647+ err = PTR_ERR(wh_dentry);
10648+ goto out;
10649+ }
10650+
10651+ if (!wh_dentry->d_inode) { /* no whiteout exists */
10652+ err = 0;
10653+ goto out_dput;
10654+ }
10655+
10656+ /* check if regular file and whiteout were both found */
10657+ if (unlikely(lower_dentry->d_inode)) {
10658+ err = -EIO;
10659+ printk(KERN_ERR "unionfs: found both whiteout and regular "
10660+ "file in directory %s (branch %d)\n",
10661+ lower_dir_dentry->d_name.name, bindex);
10662+ goto out_dput;
10663+ }
10664+
10665+ /* check if branch is writeable */
10666+ err = is_robranch_super(dentry->d_sb, bindex);
10667+ if (err)
10668+ goto out_dput;
10669+
10670+ /* .wh.foo has been found, so let's unlink it */
10671+ err = unlink_whiteout(wh_dentry);
10672+ if (!err)
10673+ err = 1; /* a whiteout was found and successfully removed */
10674+out_dput:
10675+ dput(wh_dentry);
10676+out:
10677+ return err;
10678+}
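
The return convention of check_unlink_whiteout() follows from the order of its checks. The sketch below models only that order: the three flags are hypothetical stand-ins for the dentry lookups and the read-only branch test, and the unlink step is assumed to succeed.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the decision order only. Returns 0 if there is no
 * whiteout, 1 if one was found and (by assumption) removed, or a negative
 * errno value.
 */
static int whiteout_check_model(bool wh_exists, bool file_exists,
				bool branch_readonly)
{
	if (!wh_exists)
		return 0;	/* nothing to remove */
	if (file_exists)
		return -EIO;	/* whiteout and file together: inconsistent */
	if (branch_readonly)
		return -EROFS;	/* the whiteout cannot be removed here */
	return 1;		/* whiteout found and removed */
}
```

A caller that only needs to know whether the branch is usable for the new object can test the result with `>= 0`, as the comment above the function describes.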
10679+
10680+/*
10681+ * Pass a unionfs dentry and a starting branch index (@start). Try to create
10682+ * a whiteout for the filename in @dentry in branch @start. On a recoverable
10683+ * error, move one branch to the left (lower index, higher priority).
10684+ */
10685+int create_whiteout(struct dentry *dentry, int start)
10686+{
10687+ int bstart, bend, bindex;
10688+ struct dentry *lower_dir_dentry;
10689+ struct dentry *lower_dentry;
10690+ struct dentry *lower_wh_dentry;
10691+ struct nameidata nd;
10692+ char *name = NULL;
10693+ int err = -EINVAL;
10694+
10695+ verify_locked(dentry);
10696+
10697+ bstart = dbstart(dentry);
10698+ bend = dbend(dentry);
10699+
10700+ /* create dentry's whiteout equivalent */
10701+ name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
10702+ if (unlikely(IS_ERR(name))) {
10703+ err = PTR_ERR(name);
10704+ goto out;
10705+ }
10706+
10707+ for (bindex = start; bindex >= 0; bindex--) {
10708+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10709+
10710+ if (!lower_dentry) {
10711+ /*
10712+ * if lower dentry is not present, create the
10713+ * entire lower dentry directory structure and go
10714+ * ahead. Since we want to just create whiteout, we
10715+ * only want the parent dentry, and hence get rid of
10716+ * this dentry.
10717+ */
10718+ lower_dentry = create_parents(dentry->d_inode,
10719+ dentry,
10720+ dentry->d_name.name,
10721+ bindex);
10722+ if (!lower_dentry || IS_ERR(lower_dentry)) {
10723+ int ret = PTR_ERR(lower_dentry);
10724+ if (!IS_COPYUP_ERR(ret))
10725+ printk(KERN_ERR
10726+ "unionfs: create_parents for "
10727+ "whiteout failed: bindex=%d "
10728+ "err=%d\n", bindex, ret);
10729+ continue;
10730+ }
10731+ }
10732+
10733+ lower_wh_dentry =
10734+ lookup_lck_len(name, lower_dentry->d_parent,
10735+ dentry->d_name.len + UNIONFS_WHLEN);
10736+ if (IS_ERR(lower_wh_dentry))
10737+ continue;
10738+
10739+ /*
10740+ * The whiteout already exists. This used to be impossible,
10741+ * but now is possible because of opaqueness.
10742+ */
10743+ if (lower_wh_dentry->d_inode) {
10744+ dput(lower_wh_dentry);
10745+ err = 0;
10746+ goto out;
10747+ }
10748+
10749+ err = init_lower_nd(&nd, LOOKUP_CREATE);
10750+ if (unlikely(err < 0))
10751+ goto out;
10752+ lower_dir_dentry = lock_parent_wh(lower_wh_dentry);
10753+ err = is_robranch_super(dentry->d_sb, bindex);
10754+ if (!err)
10755+ err = vfs_create(lower_dir_dentry->d_inode,
10756+ lower_wh_dentry,
10757+ current_umask() & S_IRUGO,
10758+ &nd);
10759+ unlock_dir(lower_dir_dentry);
10760+ dput(lower_wh_dentry);
10761+ release_lower_nd(&nd, err);
10762+
10763+ if (!err || !IS_COPYUP_ERR(err))
10764+ break;
10765+ }
10766+
10767+ /* set dbopaque so that lookup will not proceed after this branch */
10768+ if (!err)
10769+ dbopaque(dentry) = bindex;
10770+
10771+out:
10772+ kfree(name);
10773+ return err;
10774+}
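
The leftward retry in create_whiteout() can be sketched as a plain loop over precomputed outcomes. This is a simplified model with two assumptions: only -EROFS counts as a retryable ("copyup") error, since IS_COPYUP_ERR is defined outside this section, and the lookup failures that the kernel loop also skips over are left out. pick_whiteout_branch and is_retryable are hypothetical names.

```c
#include <errno.h>

/* Assumption: only -EROFS moves the attempt to the next branch to the left. */
static int is_retryable(int err)
{
	return err == -EROFS;
}

/*
 * results[i] is the hypothetical outcome of creating the whiteout in branch
 * i: 0 on success, a negative errno on failure. Walk from @start down to
 * branch 0. Return the final error and store the branch that succeeded in
 * *chosen, or -1 if none did.
 */
static int pick_whiteout_branch(const int *results, int start, int *chosen)
{
	int err = -EINVAL;
	int bindex;

	*chosen = -1;
	for (bindex = start; bindex >= 0; bindex--) {
		err = results[bindex];
		if (!err) {
			*chosen = bindex;
			break;
		}
		if (!is_retryable(err))
			break;	/* hard failure: stop searching */
	}
	return err;
}
```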
10775+
10776+/*
10777+ * Delete all of the whiteouts in a given directory for rmdir.
10778+ *
10779+ * lower directory inode should be locked
10780+ */
10781+static int do_delete_whiteouts(struct dentry *dentry, int bindex,
10782+ struct unionfs_dir_state *namelist)
10783+{
10784+ int err = 0;
10785+ struct dentry *lower_dir_dentry = NULL;
10786+ struct dentry *lower_dentry;
10787+ char *name = NULL, *p;
10788+ struct inode *lower_dir;
10789+ int i;
10790+ struct list_head *pos;
10791+ struct filldir_node *cursor;
10792+
10793+ /* Find out lower parent dentry */
10794+ lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10795+ BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
10796+ lower_dir = lower_dir_dentry->d_inode;
10797+ BUG_ON(!S_ISDIR(lower_dir->i_mode));
10798+
10799+ err = -ENOMEM;
10800+ name = __getname();
10801+ if (unlikely(!name))
10802+ goto out;
10803+ strcpy(name, UNIONFS_WHPFX);
10804+ p = name + UNIONFS_WHLEN;
10805+
10806+ err = 0;
10807+ for (i = 0; !err && i < namelist->size; i++) {
10808+ list_for_each(pos, &namelist->list[i]) {
10809+ cursor =
10810+ list_entry(pos, struct filldir_node,
10811+ file_list);
10812+ /* Only operate on whiteouts in this branch. */
10813+ if (cursor->bindex != bindex)
10814+ continue;
10815+ if (!cursor->whiteout)
10816+ continue;
10817+
10818+ strlcpy(p, cursor->name, PATH_MAX - UNIONFS_WHLEN);
10819+ lower_dentry =
10820+ lookup_lck_len(name, lower_dir_dentry,
10821+ cursor->namelen +
10822+ UNIONFS_WHLEN);
10823+ if (IS_ERR(lower_dentry)) {
10824+ err = PTR_ERR(lower_dentry);
10825+ break;
10826+ }
10827+ if (lower_dentry->d_inode)
10828+ err = vfs_unlink(lower_dir, lower_dentry);
10829+ dput(lower_dentry);
10830+ if (err)
10831+ break;
10832+ }
10833+ }
10834+
10835+ __putname(name);
10836+
10837+ /* After all of the removals, we should copy the attributes once. */
10838+ fsstack_copy_attr_times(dentry->d_inode, lower_dir_dentry->d_inode);
10839+
10840+out:
10841+ return err;
10842+}
10843+
10844+
10845+void __delete_whiteouts(struct work_struct *work)
10846+{
10847+ struct sioq_args *args = container_of(work, struct sioq_args, work);
10848+ struct deletewh_args *d = &args->deletewh;
10849+
10850+ args->err = do_delete_whiteouts(d->dentry, d->bindex, d->namelist);
10851+ complete(&args->comp);
10852+}
10853+
10854+/* delete whiteouts in a dir (for rmdir operation) using sioq if necessary */
10855+int delete_whiteouts(struct dentry *dentry, int bindex,
10856+ struct unionfs_dir_state *namelist)
10857+{
10858+ int err;
10859+ struct super_block *sb;
10860+ struct dentry *lower_dir_dentry;
10861+ struct inode *lower_dir;
10862+ struct sioq_args args;
10863+
10864+ sb = dentry->d_sb;
10865+
10866+ BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
10867+ BUG_ON(bindex < dbstart(dentry));
10868+ BUG_ON(bindex > dbend(dentry));
10869+ err = is_robranch_super(sb, bindex);
10870+ if (err)
10871+ goto out;
10872+
10873+ lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10874+ BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
10875+ lower_dir = lower_dir_dentry->d_inode;
10876+ BUG_ON(!S_ISDIR(lower_dir->i_mode));
10877+
10878+ if (!inode_permission(lower_dir, MAY_WRITE | MAY_EXEC)) {
10879+ err = do_delete_whiteouts(dentry, bindex, namelist);
10880+ } else {
10881+ args.deletewh.namelist = namelist;
10882+ args.deletewh.dentry = dentry;
10883+ args.deletewh.bindex = bindex;
10884+ run_sioq(__delete_whiteouts, &args);
10885+ err = args.err;
10886+ }
10887+
10888+out:
10889+ return err;
10890+}
10891+
10892+/****************************************************************************
10893+ * Opaque directory helpers *
10894+ ****************************************************************************/
10895+
10896+/*
10897+ * is_opaque_dir: returns 0 if it is NOT an opaque dir, 1 if it is, and
10898+ * -errno if an error occurred trying to figure this out.
10899+ */
10900+int is_opaque_dir(struct dentry *dentry, int bindex)
10901+{
10902+ int err = 0;
10903+ struct dentry *lower_dentry;
10904+ struct dentry *wh_lower_dentry;
10905+ struct inode *lower_inode;
10906+ struct sioq_args args;
10907+
10908+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10909+ lower_inode = lower_dentry->d_inode;
10910+
10911+ BUG_ON(!S_ISDIR(lower_inode->i_mode));
10912+
10913+ mutex_lock(&lower_inode->i_mutex);
10914+
10915+ if (!inode_permission(lower_inode, MAY_EXEC)) {
10916+ wh_lower_dentry =
10917+ lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
10918+ sizeof(UNIONFS_DIR_OPAQUE) - 1);
10919+ } else {
10920+ args.is_opaque.dentry = lower_dentry;
10921+ run_sioq(__is_opaque_dir, &args);
10922+ wh_lower_dentry = args.ret;
10923+ }
10924+
10925+ mutex_unlock(&lower_inode->i_mutex);
10926+
10927+ if (IS_ERR(wh_lower_dentry)) {
10928+ err = PTR_ERR(wh_lower_dentry);
10929+ goto out;
10930+ }
10931+
10932+ /* This is an opaque dir iff wh_lower_dentry is positive */
10933+ err = !!wh_lower_dentry->d_inode;
10934+
10935+ dput(wh_lower_dentry);
10936+out:
10937+ return err;
10938+}
10939+
10940+void __is_opaque_dir(struct work_struct *work)
10941+{
10942+ struct sioq_args *args = container_of(work, struct sioq_args, work);
10943+
10944+ args->ret = lookup_one_len(UNIONFS_DIR_OPAQUE, args->is_opaque.dentry,
10945+ sizeof(UNIONFS_DIR_OPAQUE) - 1);
10946+ complete(&args->comp);
10947+}
10948+
10949+int make_dir_opaque(struct dentry *dentry, int bindex)
10950+{
10951+ int err = 0;
10952+ struct dentry *lower_dentry, *diropq;
10953+ struct inode *lower_dir;
10954+ struct nameidata nd;
10955+ const struct cred *old_creds;
10956+ struct cred *new_creds;
10957+
10958+ /*
10959+ * Opaque directory whiteout markers are special files (like regular
10960+ * whiteouts), and should appear to the users as if they don't
10961+ * exist. They should be created/deleted regardless of directory
10962+ * search/create permissions, but only for the duration of this
10963+ * creation of the .wh.__dir_opaque file. Note: this does not
10964+ * circumvent the normal ->permission check.
10965+ */
10966+ new_creds = prepare_creds();
10967+ if (unlikely(!new_creds)) {
10968+ err = -ENOMEM;
10969+ goto out_err;
10970+ }
10971+ cap_raise(new_creds->cap_effective, CAP_DAC_READ_SEARCH);
10972+ cap_raise(new_creds->cap_effective, CAP_DAC_OVERRIDE);
10973+ old_creds = override_creds(new_creds);
10974+
10975+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10976+ lower_dir = lower_dentry->d_inode;
10977+ BUG_ON(!S_ISDIR(dentry->d_inode->i_mode) ||
10978+ !S_ISDIR(lower_dir->i_mode));
10979+
10980+ mutex_lock(&lower_dir->i_mutex);
10981+ diropq = lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
10982+ sizeof(UNIONFS_DIR_OPAQUE) - 1);
10983+ if (IS_ERR(diropq)) {
10984+ err = PTR_ERR(diropq);
10985+ goto out;
10986+ }
10987+
10988+ err = init_lower_nd(&nd, LOOKUP_CREATE);
10989+ if (unlikely(err < 0))
10990+ goto out;
10991+ if (!diropq->d_inode)
10992+ err = vfs_create(lower_dir, diropq, S_IRUGO, &nd);
10993+ if (!err)
10994+ dbopaque(dentry) = bindex;
10995+ release_lower_nd(&nd, err);
10996+
10997+ dput(diropq);
10998+
10999+out:
11000+ mutex_unlock(&lower_dir->i_mutex);
11001+ revert_creds(old_creds);
11002+out_err:
11003+ return err;
11004+}
11005diff --git a/fs/unionfs/xattr.c b/fs/unionfs/xattr.c
11006new file mode 100644
11007index 0000000..9002e06
11008--- /dev/null
11009+++ b/fs/unionfs/xattr.c
11010@@ -0,0 +1,173 @@
11011+/*
11012+ * Copyright (c) 2003-2010 Erez Zadok
11013+ * Copyright (c) 2003-2006 Charles P. Wright
11014+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
11015+ * Copyright (c) 2005-2006 Junjiro Okajima
11016+ * Copyright (c) 2005 Arun M. Krishnakumar
11017+ * Copyright (c) 2004-2006 David P. Quigley
11018+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
11019+ * Copyright (c) 2003 Puja Gupta
11020+ * Copyright (c) 2003 Harikesavan Krishnan
11021+ * Copyright (c) 2003-2010 Stony Brook University
11022+ * Copyright (c) 2003-2010 The Research Foundation of SUNY
11023+ *
11024+ * This program is free software; you can redistribute it and/or modify
11025+ * it under the terms of the GNU General Public License version 2 as
11026+ * published by the Free Software Foundation.
11027+ */
11028+
11029+#include "union.h"
11030+
11031+/* This is lifted from fs/xattr.c */
11032+void *unionfs_xattr_alloc(size_t size, size_t limit)
11033+{
11034+ void *ptr;
11035+
11036+ if (size > limit)
11037+ return ERR_PTR(-E2BIG);
11038+
11039+ if (!size) /* size query only; no buffer is needed */
11040+ return NULL;
11041+
11042+ ptr = kmalloc(size, GFP_KERNEL);
11043+ if (unlikely(!ptr))
11044+ return ERR_PTR(-ENOMEM);
11045+ return ptr;
11046+}
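
unionfs_xattr_alloc() has three distinct outcomes: an encoded error pointer, NULL for a pure size query, or a real buffer. A user-space sketch of that convention follows. The err_ptr/ptr_err/is_err helpers are simplified stand-ins for the kernel's ERR_PTR/PTR_ERR/IS_ERR written for this example, and malloc replaces kmalloc.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified user-space stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Same three-way result as the function above, with malloc for kmalloc. */
static void *xattr_alloc_model(size_t size, size_t limit)
{
	void *ptr;

	if (size > limit)
		return err_ptr(-E2BIG);
	if (!size)		/* size query only: no buffer is needed */
		return NULL;
	ptr = malloc(size);
	if (!ptr)
		return err_ptr(-ENOMEM);
	return ptr;
}
```

Callers therefore have to test for an error pointer before testing for NULL, because NULL is a valid, non-error result here.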
11047+
11048+/*
11049+ * BKL held by caller.
11050+ * dentry->d_inode->i_mutex locked
11051+ */
11052+ssize_t unionfs_getxattr(struct dentry *dentry, const char *name, void *value,
11053+ size_t size)
11054+{
11055+ struct dentry *lower_dentry = NULL;
11056+ struct dentry *parent;
11057+ int err = -EOPNOTSUPP;
11058+ bool valid;
11059+
11060+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11061+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11062+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11063+
11064+ valid = __unionfs_d_revalidate(dentry, parent, false);
11065+ if (unlikely(!valid)) {
11066+ err = -ESTALE;
11067+ goto out;
11068+ }
11069+
11070+ lower_dentry = unionfs_lower_dentry(dentry);
11071+
11072+ err = vfs_getxattr(lower_dentry, (char *) name, value, size);
11073+
11074+out:
11075+ unionfs_check_dentry(dentry);
11076+ unionfs_unlock_dentry(dentry);
11077+ unionfs_unlock_parent(dentry, parent);
11078+ unionfs_read_unlock(dentry->d_sb);
11079+ return err;
11080+}
11081+
11082+/*
11083+ * BKL held by caller.
11084+ * dentry->d_inode->i_mutex locked
11085+ */
11086+int unionfs_setxattr(struct dentry *dentry, const char *name,
11087+ const void *value, size_t size, int flags)
11088+{
11089+ struct dentry *lower_dentry = NULL;
11090+ struct dentry *parent;
11091+ int err = -EOPNOTSUPP;
11092+ bool valid;
11093+
11094+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11095+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11096+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11097+
11098+ valid = __unionfs_d_revalidate(dentry, parent, false);
11099+ if (unlikely(!valid)) {
11100+ err = -ESTALE;
11101+ goto out;
11102+ }
11103+
11104+ lower_dentry = unionfs_lower_dentry(dentry);
11105+
11106+ err = vfs_setxattr(lower_dentry, (char *) name, (void *) value,
11107+ size, flags);
11108+
11109+out:
11110+ unionfs_check_dentry(dentry);
11111+ unionfs_unlock_dentry(dentry);
11112+ unionfs_unlock_parent(dentry, parent);
11113+ unionfs_read_unlock(dentry->d_sb);
11114+ return err;
11115+}
11116+
11117+/*
11118+ * BKL held by caller.
11119+ * dentry->d_inode->i_mutex locked
11120+ */
11121+int unionfs_removexattr(struct dentry *dentry, const char *name)
11122+{
11123+ struct dentry *lower_dentry = NULL;
11124+ struct dentry *parent;
11125+ int err = -EOPNOTSUPP;
11126+ bool valid;
11127+
11128+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11129+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11130+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11131+
11132+ valid = __unionfs_d_revalidate(dentry, parent, false);
11133+ if (unlikely(!valid)) {
11134+ err = -ESTALE;
11135+ goto out;
11136+ }
11137+
11138+ lower_dentry = unionfs_lower_dentry(dentry);
11139+
11140+ err = vfs_removexattr(lower_dentry, (char *) name);
11141+
11142+out:
11143+ unionfs_check_dentry(dentry);
11144+ unionfs_unlock_dentry(dentry);
11145+ unionfs_unlock_parent(dentry, parent);
11146+ unionfs_read_unlock(dentry->d_sb);
11147+ return err;
11148+}
11149+
11150+/*
11151+ * BKL held by caller.
11152+ * dentry->d_inode->i_mutex locked
11153+ */
11154+ssize_t unionfs_listxattr(struct dentry *dentry, char *list, size_t size)
11155+{
11156+ struct dentry *lower_dentry = NULL;
11157+ struct dentry *parent;
11158+ int err = -EOPNOTSUPP;
11159+ char *encoded_list = NULL;
11160+ bool valid;
11161+
11162+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11163+ parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11164+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11165+
11166+ valid = __unionfs_d_revalidate(dentry, parent, false);
11167+ if (unlikely(!valid)) {
11168+ err = -ESTALE;
11169+ goto out;
11170+ }
11171+
11172+ lower_dentry = unionfs_lower_dentry(dentry);
11173+
11174+ encoded_list = list;
11175+ err = vfs_listxattr(lower_dentry, encoded_list, size);
11176+
11177+out:
11178+ unionfs_check_dentry(dentry);
11179+ unionfs_unlock_dentry(dentry);
11180+ unionfs_unlock_parent(dentry, parent);
11181+ unionfs_read_unlock(dentry->d_sb);
11182+ return err;
11183+}
11184diff --git a/include/linux/fs_stack.h b/include/linux/fs_stack.h
11185index da317c7..64f1ced 100644
11186--- a/include/linux/fs_stack.h
11187+++ b/include/linux/fs_stack.h
11188@@ -1,7 +1,19 @@
11189+/*
11190+ * Copyright (c) 2006-2009 Erez Zadok
11191+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
11192+ * Copyright (c) 2006-2009 Stony Brook University
11193+ * Copyright (c) 2006-2009 The Research Foundation of SUNY
11194+ *
11195+ * This program is free software; you can redistribute it and/or modify
11196+ * it under the terms of the GNU General Public License version 2 as
11197+ * published by the Free Software Foundation.
11198+ */
11199+
11200 #ifndef _LINUX_FS_STACK_H
11201 #define _LINUX_FS_STACK_H
11202
11203-/* This file defines generic functions used primarily by stackable
11204+/*
11205+ * This file defines generic functions used primarily by stackable
11206 * filesystems; none of these functions require i_mutex to be held.
11207 */
11208
11209diff --git a/include/linux/magic.h b/include/linux/magic.h
11210index 62730ea..bd9832b 100644
11211--- a/include/linux/magic.h
11212+++ b/include/linux/magic.h
11213@@ -48,6 +48,8 @@
11214 #define REISER2FS_SUPER_MAGIC_STRING "ReIsEr2Fs"
11215 #define REISER2FS_JR_SUPER_MAGIC_STRING "ReIsEr3Fs"
11216
11217+#define UNIONFS_SUPER_MAGIC 0xf15f083d
11218+
11219 #define SMB_SUPER_MAGIC 0x517B
11220 #define USBDEVICE_SUPER_MAGIC 0x9fa2
11221 #define CGROUP_SUPER_MAGIC 0x27e0eb
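
A user-space program could use the new magic number to detect a unionfs mount. The sketch below is Linux-only and rests on an assumption not shown in this section: that unionfs reports UNIONFS_SUPER_MAGIC as the filesystem type through statfs(2). path_is_on_unionfs is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/vfs.h>

/* Value copied from the hunk above. */
#define UNIONFS_SUPER_MAGIC 0xf15f083d

/* Return true if @path is on a filesystem reporting the unionfs magic. */
static bool path_is_on_unionfs(const char *path)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return false;	/* treat lookup failures as "not unionfs" */
	/* f_type width and signedness vary by architecture: compare 32 bits. */
	return (uint32_t)st.f_type == (uint32_t)UNIONFS_SUPER_MAGIC;
}
```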
11222diff --git a/include/linux/namei.h b/include/linux/namei.h
11223index f276d4f..cf4ec6c 100644
11224--- a/include/linux/namei.h
11225+++ b/include/linux/namei.h
11226@@ -78,6 +78,7 @@ extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
11227
11228 extern struct file *lookup_instantiate_filp(struct nameidata *nd, struct dentry *dentry,
11229 int (*open)(struct inode *, struct file *));
11230+extern void release_open_intent(struct nameidata *);
11231
11232 extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
11233
11234diff --git a/include/linux/splice.h b/include/linux/splice.h
11235index 997c3b4..54f5501 100644
11236--- a/include/linux/splice.h
11237+++ b/include/linux/splice.h
11238@@ -81,6 +81,11 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
11239 struct splice_pipe_desc *);
11240 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
11241 splice_direct_actor *);
11242+extern long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
11243+ loff_t *ppos, size_t len, unsigned int flags);
11244+extern long vfs_splice_to(struct file *in, loff_t *ppos,
11245+ struct pipe_inode_info *pipe, size_t len,
11246+ unsigned int flags);
11247
11248 /*
11249 * for dynamic pipe sizing
11250diff --git a/include/linux/union_fs.h b/include/linux/union_fs.h
11251new file mode 100644
11252index 0000000..c84d97e
11253--- /dev/null
11254+++ b/include/linux/union_fs.h
11255@@ -0,0 +1,22 @@
11256+/*
11257+ * Copyright (c) 2003-2009 Erez Zadok
11258+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
11259+ * Copyright (c) 2003-2009 Stony Brook University
11260+ * Copyright (c) 2003-2009 The Research Foundation of SUNY
11261+ *
11262+ * This program is free software; you can redistribute it and/or modify
11263+ * it under the terms of the GNU General Public License version 2 as
11264+ * published by the Free Software Foundation.
11265+ */
11266+
11267+#ifndef _LINUX_UNION_FS_H
11268+#define _LINUX_UNION_FS_H
11269+
11270+/*
11271+ * DEFINITIONS FOR USER AND KERNEL CODE:
11272+ */
11273+# define UNIONFS_IOCTL_INCGEN _IOR(0x15, 11, int)
11274+# define UNIONFS_IOCTL_QUERYFILE _IOR(0x15, 15, int)
11275+
11276+#endif /* _LINUX_UNION_FS_H */
11277+
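
The two ioctl numbers in union_fs.h are built with _IOR, which packs a direction, a type byte, a command number and an argument size into one value. As a hedged illustration (Linux user space only, decode_ioctl and struct ioctl_fields are names made up for this example), the fields can be unpacked again with the standard _IOC_* macros:

```c
#include <linux/ioctl.h>

/* Copied from the header added by this patch. */
#define UNIONFS_IOCTL_INCGEN	_IOR(0x15, 11, int)
#define UNIONFS_IOCTL_QUERYFILE	_IOR(0x15, 15, int)

/* Hypothetical container for the decoded parts of an ioctl number. */
struct ioctl_fields {
	unsigned int type;	/* "magic" byte, 0x15 here */
	unsigned int nr;	/* command number within that type */
	unsigned int size;	/* size of the argument type */
	int readable;		/* non-zero if user space reads data back */
};

static struct ioctl_fields decode_ioctl(unsigned int cmd)
{
	struct ioctl_fields f;

	f.type = _IOC_TYPE(cmd);
	f.nr = _IOC_NR(cmd);
	f.size = _IOC_SIZE(cmd);
	f.readable = (_IOC_DIR(cmd) & _IOC_READ) != 0;
	return f;
}
```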
11278diff --git a/security/security.c b/security/security.c
11279index 7b7308a..abdb5a5 100644
11280--- a/security/security.c
11281+++ b/security/security.c
11282@@ -511,6 +511,7 @@ int security_inode_permission(struct inode *inode, int mask)
11283 return 0;
11284 return security_ops->inode_permission(inode, mask);
11285 }
11286+EXPORT_SYMBOL(security_inode_permission);
11287
11288 int security_inode_exec_permission(struct inode *inode, unsigned int flags)
11289 {
11290diff -purN orig/fs/unionfs/commonfops.c linux-2.6.36/fs/unionfs/commonfops.c
11291--- orig/fs/unionfs/commonfops.c 2010-10-21 16:29:51.033693283 -0400
11292+++ linux-2.6.36/fs/unionfs/commonfops.c 2010-10-27 10:15:30.337131546 -0400
11293@@ -740,10 +740,8 @@ static long do_ioctl(struct file *file,
11294 if (lower_file->f_op->unlocked_ioctl) {
11295 err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg);
11296 #ifdef CONFIG_COMPAT
11297- } else if (lower_file->f_op->ioctl) {
11298- err = lower_file->f_op->compat_ioctl(
11299- lower_file->f_path.dentry->d_inode,
11300- lower_file, cmd, arg);
11301+ } else if (lower_file->f_op->compat_ioctl) {
11302+ err = lower_file->f_op->compat_ioctl(lower_file, cmd, arg);
11303 #endif
11304 }
11305