git.pld-linux.org Git - packages/kernel.git/blame - linux-2.6-unionfs-2.1.1.patch
Commit Line Data
7f651772 1diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
2index 5717858..2ef035e 100644
3--- a/Documentation/filesystems/00-INDEX
4+++ b/Documentation/filesystems/00-INDEX
5@@ -84,6 +84,8 @@ udf.txt
6 - info and mount options for the UDF filesystem.
7 ufs.txt
8 - info on the ufs filesystem.
9+unionfs/
10+ - info on the unionfs filesystem
11 vfat.txt
12 - info on using the VFAT filesystem used in Windows NT and Windows 95
13 vfs.txt
14diff --git a/Documentation/filesystems/unionfs/00-INDEX b/Documentation/filesystems/unionfs/00-INDEX
15new file mode 100644
16index 0000000..96fdf67
17--- /dev/null
18+++ b/Documentation/filesystems/unionfs/00-INDEX
19@@ -0,0 +1,10 @@
20+00-INDEX
21+ - this file.
22+concepts.txt
23+ - A brief introduction of concepts.
24+issues.txt
25+ - A summary of known issues with unionfs.
26+rename.txt
27+ - Information regarding rename operations.
28+usage.txt
29+ - Usage information and examples.
30diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
31new file mode 100644
32index 0000000..37a62d8
33--- /dev/null
34+++ b/Documentation/filesystems/unionfs/concepts.txt
35@@ -0,0 +1,181 @@
36+Unionfs 2.1 CONCEPTS:
37+=====================
38+
39+This file describes the concepts needed by a namespace unification file
40+system.
41+
42+
43+Branch Priority:
44+================
45+
46+Each branch is assigned a unique priority - starting from 0 (highest
47+priority). No two branches can have the same priority.
48+
49+
50+Branch Mode:
51+============
52+
53+Each branch is assigned a mode - read-write or read-only. This allows
54+directories on media mounted read-write to be used in a read-only manner.
55+
56+
57+Whiteouts:
58+==========
59+
60+A whiteout removes a file name from the namespace. Whiteouts are needed when
61+one attempts to remove a file on a read-only branch.
62+
63+Suppose we have a two-branch union, where branch 0 is read-write and branch
64+1 is read-only, and a file 'foo' exists on branch 1:
65+
66+./b0/
67+./b1/
68+./b1/foo
69+
70+The unified view would simply be:
71+
72+./union/
73+./union/foo
74+
75+Since 'foo' is stored on a read-only branch, it cannot be removed. A
76+whiteout is used to remove the name 'foo' from the unified namespace. Again,
77+since branch 1 is read-only, the whiteout cannot be created there. So, we
78+try on a higher priority (lower numerically) branch and create the whiteout
79+there.
80+
81+./b0/
82+./b0/.wh.foo
83+./b1/
84+./b1/foo
85+
86+Later, when Unionfs traverses branches (due to lookup or readdir), it
87+eliminates 'foo' from the namespace (as well as the whiteout itself).
88+
89+
90+Duplicate Elimination:
91+======================
92+
93+It is possible for files on different branches to have the same name.
94+Unionfs then has to select which instance of the file to show to the user.
95+Given the fact that each branch has a priority associated with it, the
96+simplest solution is to take the instance from the highest priority
97+(numerically lowest value) and "hide" the others.
98+
99+
100+Copyup:
101+=======
102+
103+When a change is made to a file's data or meta-data, the change has to be
104+stored somewhere. The best way is to create a copy of the original file on
105+a branch that is writable, and then redirect the write through to this
106+copy. The copy must be made on a higher priority branch so
107+that lookup and readdir return this newer "version" of the file rather than
108+the original (see duplicate elimination).
109+
110+
111+Cache Coherency:
112+================
113+
114+Unionfs users often want to be able to modify files and directories directly
115+on the lower branches, and have those changes be visible at the Unionfs
116+level. This means that data (e.g., pages) and meta-data (dentries, inodes,
117+open files, etc.) have to be synchronized between the upper and lower
118+layers. In other words, the newest changes from a layer below have to be
119+propagated to the Unionfs layer above. If the two layers are not in sync, a
120+cache incoherency ensues, which could lead to application failures and even
121+oopses. The Linux kernel, however, has a rather limited set of mechanisms
122+to ensure this inter-layer cache coherency---so Unionfs has to do most of
123+the hard work on its own.
124+
125+Maintaining Invariants:
126+
127+The way Unionfs ensures cache coherency is as follows. At each entry point
128+to a Unionfs file system method, we call a utility function to validate the
129+primary objects of this method. Generally, we call unionfs_file_revalidate
130+on open files, and __unionfs_d_revalidate_chain on dentries (which also
131+validates inodes). These utility functions check to see whether the upper
132+Unionfs object is in sync with any of the lower objects that it represents.
133+The checks we perform include whether the Unionfs superblock has a newer
134+generation number, or if any of the lower objects' mtimes or ctimes are
135+newer. (Note: generation numbers change when branch-management commands are
136+issued, so in a way, maintaining cache coherency is also very important for
137+branch-management.) If indeed we determine that any Unionfs object is no
138+longer in sync with its lower counterparts, then we rebuild that object
139+similarly to how we do so for branch-management.
140+
141+While rebuilding Unionfs's objects, we also purge any page mappings and
142+truncate inode pages (see fs/unionfs/dentry.c:purge_inode_data). This is to
143+ensure that Unionfs will re-get the newer data from the lower branches. We
144+perform this purging only if the Unionfs operation in question is a reading
145+operation; if Unionfs is performing a data writing operation (e.g., ->write,
146+->commit_write, etc.) then we do NOT flush the lower mappings/pages: this is
147+because (1) a self-deadlock could occur and (2) the upper Unionfs pages are
148+considered more authoritative anyway, as they are newer and will overwrite
149+any lower pages.
150+
151+Unionfs maintains the following important invariant regarding mtime's,
152+ctime's, and atime's: the upper inode object's times are the max() of all of
153+the lower ones. For non-directory objects, there's only one object below,
154+so the mapping is simple; for directory objects, there could be multiple
155+lower objects and we have to sync up with the newest one of all the lower
156+ones. This invariant is important to maintain, especially for directories
157+(besides, we need this to be POSIX compliant). A union could comprise
158+multiple writable branches, each of which could change. If we don't reflect
159+the newest possible mtime/ctime, some applications could fail. For example,
160+NFSv2/v3 exports check for newer directory mtimes on the server to determine
161+if the client-side attribute cache should be purged.
162+
163+To maintain these important invariants, of course, Unionfs carefully
164+synchronizes upper and lower times in various places. For example, if we
165+copy-up a file to a top-level branch, the parent directory where the file
166+was copied up to will now have a new mtime: so after a successful copy-up,
167+we sync up with the new top-level branch's parent directory mtime.
168+
169+Implementation:
170+
171+This cache-coherency implementation is efficient because it defers any
172+synchronizing between the upper and lower layers until absolutely needed.
173+Consider, for example, a common situation where users perform a lot of lower
174+changes, such as untarring a whole package. While these take place,
175+typically the user doesn't access the files via Unionfs; only after the
176+lower changes are done, does the user try to access the lower files. With
177+our cache-coherency implementation, the entirety of the changes to the lower
178+branches will not result in a single CPU cycle spent at the Unionfs level
179+until the user invokes a system call that goes through Unionfs.
180+
181+We have considered two alternate cache-coherency designs. (1) Using the
182+dentry/inode notify functionality to register interest in finding out about
183+any lower changes. This is a somewhat limited and also a heavy-handed
184+approach which could result in many notifications to the Unionfs layer upon
185+each small change at the lower layer (imagine a file being modified multiple
186+times in rapid succession). (2) Rewriting the VFS to support explicit
187+callbacks from lower objects to upper objects. We began exploring such an
188+implementation, but found it to be very complicated--it would have resulted
189+in massive VFS/MM changes which are unlikely to be accepted by the LKML
190+community. We therefore believe that our current cache-coherency design and
191+implementation represent the best approach at this time.
192+
193+Limitations:
194+
195+Our implementation works as long as a user process has caused Unionfs to
196+be called, directly or indirectly, even if only to do ->d_revalidate:
197+at that point we will have purged the current Unionfs data and the
198+process will see the new data. For example, a process that continually
199+re-reads the same file's data will see the NEW data as soon as the lower
200+file has changed, upon the next read(2) syscall (even if the file is still
201+open!). However, this doesn't work when the process re-reads the open file's
202+data via mmap(2) (unless the user unmaps/closes the file and remaps/reopens
203+it). Once we respond to ->readpage(s), then the kernel maps the page into
204+the process's address space and there doesn't appear to be a way to force
205+the kernel to invalidate those pages/mappings, and force the process to
206+re-issue ->readpage. If there's a way to invalidate active mappings and
207+force a ->readpage, let us know please (invalidate_inode_pages2 doesn't do
208+the trick).
209+
210+Our current Unionfs code has to perform many file-revalidation calls. It
211+would be really nice if the VFS would export an optional file system hook
212+->file_revalidate (similarly to dentry->d_revalidate) that will be called
213+before each VFS op that has a "struct file" in it.
214+
215+
216+For more information, see <http://unionfs.filesystems.org/>.
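The whiteout and duplicate-elimination rules described in concepts.txt above can be illustrated with a small userspace sketch. This is not Unionfs code: the struct branch model, the union_lookup() name, and the choice to test for a whiteout before testing for the file within each branch are assumptions made purely for illustration. Only the ".wh." prefix and the "branch 0 is highest priority" rule come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WH_PREFIX ".wh."

/* Hypothetical model of one branch: a NULL-terminated list of entry names. */
struct branch {
	const char *const *names;
};

/* Return 1 if the branch contains an entry called `name`, else 0. */
static int branch_has(const struct branch *b, const char *name)
{
	for (size_t i = 0; b->names[i] != NULL; i++)
		if (strcmp(b->names[i], name) == 0)
			return 1;
	return 0;
}

/* Return 1 if the branch contains the whiteout ".wh.<name>", else 0. */
static int branch_has_whiteout(const struct branch *b, const char *name)
{
	size_t plen = strlen(WH_PREFIX);

	for (size_t i = 0; b->names[i] != NULL; i++) {
		const char *entry = b->names[i];

		if (strncmp(entry, WH_PREFIX, plen) == 0 &&
		    strcmp(entry + plen, name) == 0)
			return 1;
	}
	return 0;
}

/*
 * Walk the branches from index 0 (highest priority) to the last one.
 * Returns the index of the branch whose copy of `name` is visible in
 * the union, or -1 if the name is absent, whited out, or is itself a
 * whiteout entry (whiteouts never appear in the unified namespace).
 */
int union_lookup(const struct branch *branches, size_t nbranches,
		 const char *name)
{
	assert(branches != NULL && name != NULL);

	if (strncmp(name, WH_PREFIX, strlen(WH_PREFIX)) == 0)
		return -1;

	for (size_t i = 0; i < nbranches; i++) {
		/* A whiteout hides this name on all lower-priority branches. */
		if (branch_has_whiteout(&branches[i], name))
			return -1;
		/* First hit wins: later duplicates are hidden. */
		if (branch_has(&branches[i], name))
			return (int)i;
	}
	return -1;
}
```

With b0 holding ".wh.foo" and b1 holding "foo", looking up "foo" in this model reports the name as absent, which mirrors the ./b0/.wh.foo example in the text.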
217diff --git a/Documentation/filesystems/unionfs/issues.txt b/Documentation/filesystems/unionfs/issues.txt
218new file mode 100644
219index 0000000..9db1d70
220--- /dev/null
221+++ b/Documentation/filesystems/unionfs/issues.txt
222@@ -0,0 +1,24 @@
223+KNOWN Unionfs 2.1 ISSUES:
224+=========================
225+
226+1. Unionfs should not use lookup_one_len() on the underlying f/s as it
227+ confuses NFSv4. Currently, unionfs_lookup() passes lookup intents to the
228+ lower file-system; this eliminates part of the problem. The remaining
229+ calls to lookup_one_len may need to be changed to pass an intent. We are
230+ currently introducing VFS changes to fs/namei.c's do_path_lookup() to
231+ allow proper file lookup and opening in stackable file systems.
232+
233+2. Lockdep (a debugging feature) isn't aware of stacking, and so it
234+ incorrectly complains about locking problems. The problem boils down to
235+ this: Lockdep considers all objects of a certain type to be in the same
236+ class, for example, all inodes. Lockdep doesn't like to see a lock held
237+ on two inodes within the same task, and warns that it could lead to a
238+ deadlock. However, stackable file systems do precisely that: they lock
239+ an upper object, and then a lower object, in a strict order to avoid
240+ locking problems; in addition, Unionfs, as a fan-out file system, may
241+ have to lock several lower inodes. We are currently looking into Lockdep
242+ to see how to make it aware of stackable file systems. In the meantime,
243+ if you get any warnings from Lockdep, you can safely ignore them (or feel
244+ free to report them to the Unionfs maintainers, just to be sure).
245+
246+For more information, see <http://unionfs.filesystems.org/>.
247diff --git a/Documentation/filesystems/unionfs/rename.txt b/Documentation/filesystems/unionfs/rename.txt
248new file mode 100644
249index 0000000..e20bb82
250--- /dev/null
251+++ b/Documentation/filesystems/unionfs/rename.txt
252@@ -0,0 +1,31 @@
253+Rename is a complex beast. The following table shows which rename(2) operations
254+should succeed and which should fail.
255+
256+o: success
257+E: error (either unionfs or vfs)
258+X: EXDEV
259+
260+none = file does not exist
261+file = file is a file
262+dir = file is an empty directory
263+child= file is a non-empty directory
264+wh = file is a directory containing only whiteouts; this makes it logically
265+ empty
266+
267+ none file dir child wh
268+file o o E E E
269+dir o E o E o
270+child X E X E X
271+wh o E o E o
272+
273+
274+Renaming directories:
275+=====================
276+
277+Whenever an empty (either physically or logically) directory is being renamed,
278+the following sequence of events should take place:
279+
280+1) Remove whiteouts from both source and destination directory
281+2) Rename source to destination
282+3) Make destination opaque to prevent anything under it from showing up
283+
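The rename table in rename.txt above can be encoded as a small lookup table. The sketch below is an illustration only, not the Unionfs implementation. It assumes that the table's rows are the source of the rename and its columns are the destination (the text does not say so explicitly, but only the columns include "none"). The enum names and rename_outcome() are hypothetical.

```c
#include <assert.h>

/* Destination kinds, in the column order used by the table. */
enum dst_kind { DST_NONE, DST_FILE, DST_DIR, DST_CHILD, DST_WH, DST_KINDS };

/* Source kinds, in the row order used by the table. */
enum src_kind { SRC_FILE, SRC_DIR, SRC_CHILD, SRC_WH, SRC_KINDS };

/* 'o' = success, 'E' = error (unionfs or vfs), 'X' = EXDEV. */
static const char rename_table[SRC_KINDS][DST_KINDS] = {
	/*                none file dir  child wh  */
	[SRC_FILE]  = { 'o', 'o', 'E', 'E', 'E' },
	[SRC_DIR]   = { 'o', 'E', 'o', 'E', 'o' },
	[SRC_CHILD] = { 'X', 'E', 'X', 'E', 'X' },
	[SRC_WH]    = { 'o', 'E', 'o', 'E', 'o' },
};

/* Return the documented outcome code for renaming `src` onto `dst`. */
char rename_outcome(enum src_kind src, enum dst_kind dst)
{
	assert((unsigned)src < (unsigned)SRC_KINDS);
	assert((unsigned)dst < (unsigned)DST_KINDS);
	return rename_table[src][dst];
}
```

Under this reading, a non-empty directory ("child") can never be renamed in place: every destination yields either EXDEV or an error.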
284diff --git a/Documentation/filesystems/unionfs/usage.txt b/Documentation/filesystems/unionfs/usage.txt
285new file mode 100644
286index 0000000..d8c15de
287--- /dev/null
288+++ b/Documentation/filesystems/unionfs/usage.txt
289@@ -0,0 +1,98 @@
290+Unionfs is a stackable unification file system, which can appear to merge
291+the contents of several directories (branches), while keeping their physical
292+content separate. Unionfs is useful for unified source tree management,
293+merged contents of split CD-ROM, merged separate software package
294+directories, data grids, and more. Unionfs allows any mix of read-only and
295+read-write branches, as well as insertion and deletion of branches anywhere
296+in the fan-out. To maintain Unix semantics, Unionfs handles elimination of
297+duplicates, partial-error conditions, and more.
298+
299+# mount -t unionfs -o branch-option[,union-options[,...]] none MOUNTPOINT
300+
301+The available branch-option for the mount command is:
302+
303+ dirs=branch[=ro|=rw][:...]
304+
305+specifies a colon-separated list of the directories that compose the union.
306+Directories that come earlier in the list have a higher precedence than
307+those which come later. Additionally, read-only or read-write permissions of
308+the branch can be specified by appending =ro or =rw (default) to each
309+directory.
310+
311+Syntax:
312+
313+ dirs=/branch1[=ro|=rw]:/branch2[=ro|=rw]:...:/branchN[=ro|=rw]
314+
315+Example:
316+
317+ dirs=/writable_branch=rw:/read-only_branch=ro
318+
319+
320+DYNAMIC BRANCH MANAGEMENT AND REMOUNTS
321+======================================
322+
323+You can remount a union and change its overall mode, or reconfigure the
324+branches, as follows.
325+
326+To downgrade a union from read-write to read-only:
327+
328+# mount -t unionfs -o remount,ro none MOUNTPOINT
329+
330+To upgrade a union from read-only to read-write:
331+
332+# mount -t unionfs -o remount,rw none MOUNTPOINT
333+
334+To delete a branch /foo, regardless of where it is in the current union:
335+
336+# mount -t unionfs -o remount,del=/foo none MOUNTPOINT
337+
338+To insert (add) a branch /foo before /bar:
339+
340+# mount -t unionfs -o remount,add=/bar:/foo none MOUNTPOINT
341+
342+To insert (add) a branch /foo (with the "rw" mode flag) before /bar:
343+
344+# mount -t unionfs -o remount,add=/bar:/foo=rw none MOUNTPOINT
345+
346+To insert (add) a branch /foo (in "rw" mode) at the very beginning (i.e., a
347+new highest-priority branch), you can use the above syntax, or use a
348+shorthand version as follows:
349+
350+# mount -t unionfs -o remount,add=/foo none MOUNTPOINT
351+
352+To append a branch to the very end (new lowest-priority branch):
353+
354+# mount -t unionfs -o remount,add=:/foo none MOUNTPOINT
355+
356+To append a branch to the very end (new lowest-priority branch), in
357+read-only mode:
358+
359+# mount -t unionfs -o remount,add=:/foo=ro none MOUNTPOINT
360+
361+Finally, to change the mode of one existing branch, say /foo, from read-only
362+to read-write, and change /bar from read-write to read-only:
363+
364+# mount -t unionfs -o remount,mode=/foo=rw,mode=/bar=ro none MOUNTPOINT
365+
366+
367+CACHE CONSISTENCY
368+=================
369+
370+If you modify any file on any of the lower branches directly, while there is
371+a Unionfs 2.1 mounted above any of those branches, you should tell Unionfs
372+to purge its caches and re-get the objects. To do that, you have to
373+increment the generation number of the superblock using the following
374+command:
375+
376+# mount -t unionfs -o remount,incgen none MOUNTPOINT
377+
378+Note that the older way of incrementing the generation number using an
379+ioctl is no longer supported in Unionfs 2.0 and newer. Ioctls in general
380+are not encouraged. Plus, an ioctl is a per-file concept, whereas the
381+generation number is a per-file-system concept. Worse, such an ioctl
382+requires an open file, which then has to be invalidated by the very nature
383+of the generation number increase (read: the old generation increase ioctl
384+was pretty racy).
385+
386+
387+For more information, see <http://unionfs.filesystems.org/>.
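The dirs= syntax documented in usage.txt above (colon-separated branches, each optionally followed by =ro or =rw, with rw as the default) can be parsed along the following lines. This is a userspace sketch written for illustration, not the option parser that Unionfs itself uses. struct branch_spec, parse_dirs(), and MAX_PATH_LEN are hypothetical names, and the sketch assumes that branch paths contain neither ':' nor '='.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATH_LEN 256

struct branch_spec {
	char path[MAX_PATH_LEN];
	int read_only;		/* 0 = rw (documented default), 1 = ro */
};

/*
 * Parse "path[=ro|=rw][:path[=ro|=rw]...]" into out[0..max-1].
 * Earlier entries have higher priority, as in the mount option.
 * Returns the number of branches parsed, or -1 on malformed input
 * (empty path, unknown mode, over-long path, or too many branches).
 */
int parse_dirs(const char *dirs, struct branch_spec *out, size_t max)
{
	const char *p = dirs;
	size_t n = 0;

	assert(dirs != NULL && out != NULL);

	while (*p != '\0') {
		const char *end = strchr(p, ':');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		const char *eq = memchr(p, '=', len);
		size_t plen = eq ? (size_t)(eq - p) : len;
		int ro = 0;

		if (plen == 0 || plen >= MAX_PATH_LEN || n == max)
			return -1;

		if (eq) {
			size_t mlen = len - plen - 1;

			if (mlen == 2 && memcmp(eq + 1, "ro", 2) == 0)
				ro = 1;
			else if (!(mlen == 2 && memcmp(eq + 1, "rw", 2) == 0))
				return -1;
		}

		memcpy(out[n].path, p, plen);
		out[n].path[plen] = '\0';
		out[n].read_only = ro;
		n++;

		if (!end)
			break;
		p = end + 1;
	}
	return (int)n;
}
```

Applied to the documented example, dirs=/writable_branch=rw:/read-only_branch=ro, the sketch yields two entries, with the first (highest-priority) branch writable and the second read-only.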
388diff --git a/MAINTAINERS b/MAINTAINERS
389index df40a4e..161652b 100644
390--- a/MAINTAINERS
391+++ b/MAINTAINERS
392@@ -3593,6 +3593,15 @@ L: linux-kernel@vger.kernel.org
393 W: http://www.kernel.dk
394 S: Maintained
395
396+UNIONFS
397+P: Erez Zadok
398+M: ezk@cs.sunysb.edu
399+P: Josef "Jeff" Sipek
400+M: jsipek@cs.sunysb.edu
401+L: unionfs@filesystems.org
402+W: http://unionfs.filesystems.org
403+S: Maintained
404+
405 USB ACM DRIVER
406 P: Oliver Neukum
407 M: oliver@neukum.name
408diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
409index aca3319..e28f0fa 100644
410--- a/drivers/mtd/mtdsuper.c
411+++ b/drivers/mtd/mtdsuper.c
412@@ -230,3 +230,5 @@ void kill_mtd_super(struct super_block *sb)
413 }
414
415 EXPORT_SYMBOL_GPL(kill_mtd_super);
416+
417+MODULE_LICENSE("GPL");
418diff --git a/fs/Kconfig b/fs/Kconfig
419index 0fa0c11..e9380c7 100644
420--- a/fs/Kconfig
421+++ b/fs/Kconfig
422@@ -1030,6 +1030,47 @@ config CONFIGFS_FS
423
424 endmenu
425
426+menu "Layered filesystems"
427+
428+config ECRYPT_FS
429+ tristate "eCrypt filesystem layer support (EXPERIMENTAL)"
430+ depends on EXPERIMENTAL && KEYS && CRYPTO && NET
431+ help
432+ Encrypted filesystem that operates on the VFS layer. See
433+ <file:Documentation/ecryptfs.txt> to learn more about
434+ eCryptfs. Userspace components are required and can be
435+ obtained from <http://ecryptfs.sf.net>.
436+
437+ To compile this file system support as a module, choose M here: the
438+ module will be called ecryptfs.
439+
440+config UNION_FS
441+ tristate "Union file system (EXPERIMENTAL)"
442+ depends on EXPERIMENTAL
443+ help
444+ Unionfs is a stackable unification file system, which appears to
445+ merge the contents of several directories (branches), while keeping
446+ their physical content separate.
447+
448+ See <http://unionfs.filesystems.org> for details
449+
450+config UNION_FS_XATTR
451+ bool "Unionfs extended attributes"
452+ depends on UNION_FS
453+ help
454+ Extended attributes are name:value pairs associated with inodes by
455+ the kernel or by users (see the attr(5) manual page).
456+
457+ If unsure, say N.
458+
459+config UNION_FS_DEBUG
460+ bool "Debug Unionfs"
461+ depends on UNION_FS
462+ help
463+ If you say Y here, you can turn on debugging output from Unionfs.
464+
465+endmenu
466+
467 menu "Miscellaneous filesystems"
468
469 config ADFS_FS
470@@ -1082,18 +1123,6 @@ config AFFS_FS
471 To compile this file system support as a module, choose M here: the
472 module will be called affs. If unsure, say N.
473
474-config ECRYPT_FS
475- tristate "eCrypt filesystem layer support (EXPERIMENTAL)"
476- depends on EXPERIMENTAL && KEYS && CRYPTO && NET
477- help
478- Encrypted filesystem that operates on the VFS layer. See
479- <file:Documentation/ecryptfs.txt> to learn more about
480- eCryptfs. Userspace components are required and can be
481- obtained from <http://ecryptfs.sf.net>.
482-
483- To compile this file system support as a module, choose M here: the
484- module will be called ecryptfs.
485-
486 config HFS_FS
487 tristate "Apple Macintosh file system support (EXPERIMENTAL)"
488 depends on BLOCK && EXPERIMENTAL
489diff --git a/fs/Makefile b/fs/Makefile
490index 720c29d..951f411 100644
491--- a/fs/Makefile
492+++ b/fs/Makefile
493@@ -118,3 +118,4 @@ obj-$(CONFIG_HPPFS) += hppfs/
494 obj-$(CONFIG_DEBUG_FS) += debugfs/
495 obj-$(CONFIG_OCFS2_FS) += ocfs2/
496 obj-$(CONFIG_GFS2_FS) += gfs2/
497+obj-$(CONFIG_UNION_FS) += unionfs/
498diff --git a/fs/drop_caches.c b/fs/drop_caches.c
499index 03ea769..6a7aa05 100644
500--- a/fs/drop_caches.c
501+++ b/fs/drop_caches.c
502@@ -3,6 +3,7 @@
503 */
504
505 #include <linux/kernel.h>
506+#include <linux/module.h>
507 #include <linux/mm.h>
508 #include <linux/fs.h>
509 #include <linux/writeback.h>
510@@ -12,7 +13,7 @@
511 /* A global variable is a bit ugly, but it keeps the code simple */
512 int sysctl_drop_caches;
513
514-static void drop_pagecache_sb(struct super_block *sb)
515+void drop_pagecache_sb(struct super_block *sb)
516 {
517 struct inode *inode;
518
519@@ -24,6 +25,7 @@ static void drop_pagecache_sb(struct super_block *sb)
520 }
521 spin_unlock(&inode_lock);
522 }
523+EXPORT_SYMBOL(drop_pagecache_sb);
524
525 void drop_pagecache(void)
526 {
527diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
528index cb20b96..a8c1686 100644
529--- a/fs/ecryptfs/dentry.c
530+++ b/fs/ecryptfs/dentry.c
531@@ -62,7 +62,7 @@ static int ecryptfs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
532 struct inode *lower_inode =
533 ecryptfs_inode_to_lower(dentry->d_inode);
534
535- fsstack_copy_attr_all(dentry->d_inode, lower_inode, NULL);
536+ fsstack_copy_attr_all(dentry->d_inode, lower_inode);
537 }
538 out:
539 return rc;
540diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
541index 9c6877c..fed495d 100644
542--- a/fs/ecryptfs/inode.c
543+++ b/fs/ecryptfs/inode.c
544@@ -280,7 +280,9 @@ static struct dentry *ecryptfs_lookup(struct inode *dir, struct dentry *dentry,
545 int rc = 0;
546 struct dentry *lower_dir_dentry;
547 struct dentry *lower_dentry;
548+ struct dentry *dentry_save;
549 struct vfsmount *lower_mnt;
550+ struct vfsmount *mnt_save;
551 char *encoded_name;
552 unsigned int encoded_namelen;
553 struct ecryptfs_crypt_stat *crypt_stat = NULL;
554@@ -308,9 +310,13 @@ static struct dentry *ecryptfs_lookup(struct inode *dir, struct dentry *dentry,
555 }
556 ecryptfs_printk(KERN_DEBUG, "encoded_name = [%s]; encoded_namelen "
557 "= [%d]\n", encoded_name, encoded_namelen);
558- lower_dentry = lookup_one_len(encoded_name, lower_dir_dentry,
559- encoded_namelen - 1);
560+ dentry_save = nd->dentry;
561+ mnt_save = nd->mnt;
562+ lower_dentry = lookup_one_len_nd(encoded_name, lower_dir_dentry,
563+ (encoded_namelen - 1), nd);
564 kfree(encoded_name);
565+ nd->mnt = mnt_save;
566+ nd->dentry = dentry_save;
567 if (IS_ERR(lower_dentry)) {
568 ecryptfs_printk(KERN_ERR, "ERR from lower_dentry\n");
569 rc = PTR_ERR(lower_dentry);
570@@ -597,9 +603,9 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry,
571 lower_new_dir_dentry->d_inode, lower_new_dentry);
572 if (rc)
573 goto out_lock;
574- fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode, NULL);
575+ fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode);
576 if (new_dir != old_dir)
577- fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode, NULL);
578+ fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode);
579 out_lock:
580 unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
581 dput(lower_new_dentry->d_parent);
582@@ -957,7 +963,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
583 }
584 rc = notify_change(lower_dentry, ia);
585 out:
586- fsstack_copy_attr_all(inode, lower_inode, NULL);
587+ fsstack_copy_attr_all(inode, lower_inode);
588 return rc;
589 }
590
591diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
592index 606128f..5f99404 100644
593--- a/fs/ecryptfs/main.c
594+++ b/fs/ecryptfs/main.c
595@@ -151,7 +151,7 @@ int ecryptfs_interpose(struct dentry *lower_dentry, struct dentry *dentry,
596 d_add(dentry, inode);
597 else
598 d_instantiate(dentry, inode);
599- fsstack_copy_attr_all(inode, lower_inode, NULL);
600+ fsstack_copy_attr_all(inode, lower_inode);
601 /* This size will be overwritten for real files w/ headers and
602 * other metadata */
603 fsstack_copy_inode_size(inode, lower_inode);
604diff --git a/fs/namei.c b/fs/namei.c
605index 5e2d98d..90d2a3a 100644
606--- a/fs/namei.c
607+++ b/fs/namei.c
608@@ -1124,6 +1124,10 @@ static int fastcall do_path_lookup(int dfd, const char *name,
609 nd->mnt = mntget(fs->rootmnt);
610 nd->dentry = dget(fs->root);
611 read_unlock(&fs->lock);
612+ } else if (flags & LOOKUP_ONE) {
613+ /* nd->mnt and nd->dentry already set, just grab references */
614+ mntget(nd->mnt);
615+ dget(nd->dentry);
616 } else if (dfd == AT_FDCWD) {
617 read_lock(&fs->lock);
618 nd->mnt = mntget(fs->pwdmnt);
619@@ -1325,7 +1329,8 @@ static inline int __lookup_one_len(const char *name, struct qstr *this, struct d
620 return 0;
621 }
622
623-struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
624+struct dentry *lookup_one_len_nd(const char *name, struct dentry *base,
625+ int len, struct nameidata *nd)
626 {
627 int err;
628 struct qstr this;
629@@ -1333,7 +1338,7 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
630 err = __lookup_one_len(name, &this, base, len);
631 if (err)
632 return ERR_PTR(err);
633- return __lookup_hash(&this, base, NULL);
634+ return __lookup_hash(&this, base, nd);
635 }
636
637 struct dentry *lookup_one_len_kern(const char *name, struct dentry *base, int len)
638@@ -2766,7 +2771,7 @@ EXPORT_SYMBOL(follow_up);
639 EXPORT_SYMBOL(get_write_access); /* binfmt_aout */
640 EXPORT_SYMBOL(getname);
641 EXPORT_SYMBOL(lock_rename);
642-EXPORT_SYMBOL(lookup_one_len);
643+EXPORT_SYMBOL(lookup_one_len_nd);
644 EXPORT_SYMBOL(page_follow_link_light);
645 EXPORT_SYMBOL(page_put_link);
646 EXPORT_SYMBOL(page_readlink);
647diff --git a/fs/stack.c b/fs/stack.c
648index 67716f6..a548aac 100644
649--- a/fs/stack.c
650+++ b/fs/stack.c
651@@ -1,8 +1,20 @@
652+/*
653+ * Copyright (c) 2006-2007 Erez Zadok
654+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
655+ * Copyright (c) 2006-2007 Stony Brook University
656+ * Copyright (c) 2006-2007 The Research Foundation of SUNY
657+ *
658+ * This program is free software; you can redistribute it and/or modify
659+ * it under the terms of the GNU General Public License version 2 as
660+ * published by the Free Software Foundation.
661+ */
662+
663 #include <linux/module.h>
664 #include <linux/fs.h>
665 #include <linux/fs_stack.h>
666
667-/* does _NOT_ require i_mutex to be held.
668+/*
669+ * does _NOT_ require i_mutex to be held.
670 *
671 * This function cannot be inlined since i_size_{read,write} is rather
672 * heavy-weight on 32-bit systems
673@@ -14,11 +26,11 @@ void fsstack_copy_inode_size(struct inode *dst, const struct inode *src)
674 }
675 EXPORT_SYMBOL_GPL(fsstack_copy_inode_size);
676
677-/* copy all attributes; get_nlinks is optional way to override the i_nlink
678+/*
679+ * copy all attributes; get_nlinks is optional way to override the i_nlink
680 * copying
681 */
682-void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
683- int (*get_nlinks)(struct inode *))
684+void fsstack_copy_attr_all(struct inode *dest, const struct inode *src)
685 {
686 dest->i_mode = src->i_mode;
687 dest->i_uid = src->i_uid;
688@@ -29,14 +41,6 @@ void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
689 dest->i_ctime = src->i_ctime;
690 dest->i_blkbits = src->i_blkbits;
691 dest->i_flags = src->i_flags;
692-
693- /*
694- * Update the nlinks AFTER updating the above fields, because the
695- * get_links callback may depend on them.
696- */
697- if (!get_nlinks)
698- dest->i_nlink = src->i_nlink;
699- else
700- dest->i_nlink = (*get_nlinks)(dest);
701+ dest->i_nlink = src->i_nlink;
702 }
703 EXPORT_SYMBOL_GPL(fsstack_copy_attr_all);
704diff --git a/fs/unionfs/Makefile b/fs/unionfs/Makefile
705new file mode 100644
706index 0000000..73a6bea
707--- /dev/null
708+++ b/fs/unionfs/Makefile
709@@ -0,0 +1,13 @@
710+UNIONFS_VERSION="2.1.4 (for 2.6.22.6)"
711+
712+EXTRA_CFLAGS += -DUNIONFS_VERSION=\"$(UNIONFS_VERSION)\"
713+
714+obj-$(CONFIG_UNION_FS) += unionfs.o
715+
716+unionfs-y := subr.o dentry.o file.o inode.o main.o super.o \
717+ rdstate.o copyup.o dirhelper.o rename.o unlink.o \
718+ lookup.o commonfops.o dirfops.o sioq.o mmap.o
719+
720+unionfs-$(CONFIG_UNION_FS_XATTR) += xattr.o
721+
722+unionfs-$(CONFIG_UNION_FS_DEBUG) += debug.o
723diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
724new file mode 100644
725index 0000000..e69ccf6
726--- /dev/null
727+++ b/fs/unionfs/commonfops.c
728@@ -0,0 +1,835 @@
729+/*
730+ * Copyright (c) 2003-2007 Erez Zadok
731+ * Copyright (c) 2003-2006 Charles P. Wright
732+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
733+ * Copyright (c) 2005-2006 Junjiro Okajima
734+ * Copyright (c) 2005 Arun M. Krishnakumar
735+ * Copyright (c) 2004-2006 David P. Quigley
736+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
737+ * Copyright (c) 2003 Puja Gupta
738+ * Copyright (c) 2003 Harikesavan Krishnan
739+ * Copyright (c) 2003-2007 Stony Brook University
740+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
741+ *
742+ * This program is free software; you can redistribute it and/or modify
743+ * it under the terms of the GNU General Public License version 2 as
744+ * published by the Free Software Foundation.
745+ */
746+
747+#include "union.h"
748+
749+/*
750+ * 1) Copyup the file
751+ * 2) Rename the file to '.unionfs<original inode#><counter>' - obviously
752+ * stolen from NFS's silly rename
753+ */
754+static int copyup_deleted_file(struct file *file, struct dentry *dentry,
755+ int bstart, int bindex)
756+{
757+ static unsigned int counter;
758+ const int i_inosize = sizeof(dentry->d_inode->i_ino) * 2;
759+ const int countersize = sizeof(counter) * 2;
760+ const int nlen = sizeof(".unionfs") + i_inosize + countersize - 1;
761+ char name[nlen + 1];
762+ int err;
763+ struct dentry *tmp_dentry = NULL;
764+ struct dentry *lower_dentry;
765+ struct dentry *lower_dir_dentry = NULL;
766+
767+ lower_dentry = unionfs_lower_dentry_idx(dentry, bstart);
768+
769+ sprintf(name, ".unionfs%*.*lx",
770+ i_inosize, i_inosize, lower_dentry->d_inode->i_ino);
771+
772+ /*
773+ * Loop, looking for an unused temp name to copyup to.
774+ *
775+ * It's somewhat silly that we look for a free temp name in the
776+ * source branch (bstart) instead of the dest branch (bindex), where
777+ * the final name will be created. We _will_ catch it if somehow
778+ * the name exists in the dest branch, but it'd be nice to catch it
779+ * sooner than later.
780+ */
781+retry:
782+ tmp_dentry = NULL;
783+ do {
784+ char *suffix = name + nlen - countersize;
785+
786+ dput(tmp_dentry);
787+ counter++;
788+ sprintf(suffix, "%*.*x", countersize, countersize, counter);
789+
790+ printk(KERN_DEBUG "unionfs: trying to rename %s to %s\n",
791+ dentry->d_name.name, name);
792+
793+ tmp_dentry = lookup_one_len(name, lower_dentry->d_parent,
794+ nlen);
795+ if (IS_ERR(tmp_dentry)) {
796+ err = PTR_ERR(tmp_dentry);
797+ goto out;
798+ }
799+ } while (tmp_dentry->d_inode != NULL); /* need negative dentry */
800+ dput(tmp_dentry);
801+
802+ err = copyup_named_file(dentry->d_parent->d_inode, file, name, bstart,
803+ bindex, file->f_path.dentry->d_inode->i_size);
804+ if (err) {
805+ if (err == -EEXIST)
806+ goto retry;
807+ goto out;
808+ }
809+
810+ /* bring it to the same state as an unlinked file */
811+ lower_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry));
812+ if (!unionfs_lower_inode_idx(dentry->d_inode, bindex)) {
813+ atomic_inc(&lower_dentry->d_inode->i_count);
814+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
815+ lower_dentry->d_inode);
816+ }
817+ lower_dir_dentry = lock_parent(lower_dentry);
818+ err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
819+ unlock_dir(lower_dir_dentry);
820+
821+out:
822+ if (!err)
823+ unionfs_check_dentry(dentry);
824+ return err;
825+}
826+
827+/*
828+ * put all references held by upper struct file and free lower file pointer
829+ * array
830+ */
831+static void cleanup_file(struct file *file)
832+{
833+ int bindex, bstart, bend;
834+ struct file **lower_files;
835+ struct file *lower_file;
836+ struct super_block *sb = file->f_path.dentry->d_sb;
837+
838+ lower_files = UNIONFS_F(file)->lower_files;
839+ bstart = fbstart(file);
840+ bend = fbend(file);
841+
842+ for (bindex = bstart; bindex <= bend; bindex++) {
843+ int i; /* holds (possibly) updated branch index */
844+ int old_bid;
845+
846+ lower_file = unionfs_lower_file_idx(file, bindex);
847+ if (!lower_file)
848+ continue;
849+
850+ /*
851+ * Find new index of matching branch with an open
852+ * file, since branches could have been added or
853+ * deleted causing the one with open files to shift.
854+ */
855+ old_bid = UNIONFS_F(file)->saved_branch_ids[bindex];
856+ i = branch_id_to_idx(sb, old_bid);
857+ if (i < 0) {
858+ printk(KERN_ERR "unionfs: no superblock for "
859+ "file %p\n", file);
860+ continue;
861+ }
862+
863+ /* decrement count of open files */
864+ branchput(sb, i);
865+ /*
866+ * fput will perform an mntput for us on the correct branch.
867+ * Although we're using the file's old branch configuration,
868+ * bindex, which is the old index, correctly points to the
869+ * right branch in the file's branch list. In other words,
870+ * we're going to mntput the correct branch even if branches
871+ * have been added/removed.
872+ */
873+ fput(lower_file);
874+ UNIONFS_F(file)->lower_files[bindex] = NULL;
875+ UNIONFS_F(file)->saved_branch_ids[bindex] = -1;
876+ }
877+
878+ UNIONFS_F(file)->lower_files = NULL;
879+ kfree(lower_files);
880+ kfree(UNIONFS_F(file)->saved_branch_ids);
881+ /* set to NULL because caller needs to know whether to kfree on error */
882+ UNIONFS_F(file)->saved_branch_ids = NULL;
883+}
884+
885+/* open all lower files for a given file */
886+static int open_all_files(struct file *file)
887+{
888+ int bindex, bstart, bend, err = 0;
889+ struct file *lower_file;
890+ struct dentry *lower_dentry;
891+ struct dentry *dentry = file->f_path.dentry;
892+ struct super_block *sb = dentry->d_sb;
893+
894+ bstart = dbstart(dentry);
895+ bend = dbend(dentry);
896+
897+ for (bindex = bstart; bindex <= bend; bindex++) {
898+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
899+ if (!lower_dentry)
900+ continue;
901+
902+ dget(lower_dentry);
903+ unionfs_mntget(dentry, bindex);
904+ branchget(sb, bindex);
905+
906+ lower_file =
907+ dentry_open(lower_dentry,
908+ unionfs_lower_mnt_idx(dentry, bindex),
909+ file->f_flags);
910+ if (IS_ERR(lower_file)) {
911+ err = PTR_ERR(lower_file);
912+ goto out;
913+ } else
914+ unionfs_set_lower_file_idx(file, bindex, lower_file);
915+ }
916+out:
917+ return err;
918+}
919+
920+/* open the highest priority file for a given upper file */
921+static int open_highest_file(struct file *file, bool willwrite)
922+{
923+ int bindex, bstart, bend, err = 0;
924+ struct file *lower_file;
925+ struct dentry *lower_dentry;
926+ struct dentry *dentry = file->f_path.dentry;
927+ struct inode *parent_inode = dentry->d_parent->d_inode;
928+ struct super_block *sb = dentry->d_sb;
929+ size_t inode_size = dentry->d_inode->i_size;
930+
931+ bstart = dbstart(dentry);
932+ bend = dbend(dentry);
933+
934+ lower_dentry = unionfs_lower_dentry(dentry);
935+ if (willwrite && IS_WRITE_FLAG(file->f_flags) && is_robranch(dentry)) {
936+ for (bindex = bstart - 1; bindex >= 0; bindex--) {
937+ err = copyup_file(parent_inode, file, bstart, bindex,
938+ inode_size);
939+ if (!err)
940+ break;
941+ }
942+ atomic_set(&UNIONFS_F(file)->generation,
943+ atomic_read(&UNIONFS_I(dentry->d_inode)->
944+ generation));
945+ goto out;
946+ }
947+
948+ dget(lower_dentry);
949+ unionfs_mntget(dentry, bstart);
950+ lower_file = dentry_open(lower_dentry,
951+ unionfs_lower_mnt_idx(dentry, bstart),
952+ file->f_flags);
953+ if (IS_ERR(lower_file)) {
954+ err = PTR_ERR(lower_file);
955+ goto out;
956+ }
957+ branchget(sb, bstart);
958+ unionfs_set_lower_file(file, lower_file);
959+ /* Fix up the position. */
960+ lower_file->f_pos = file->f_pos;
961+
962+ memcpy(&lower_file->f_ra, &file->f_ra, sizeof(struct file_ra_state));
963+out:
964+ return err;
965+}
966+
967+/* perform a delayed copyup of a read-write file on a read-only branch */
968+static int do_delayed_copyup(struct file *file)
969+{
970+ int bindex, bstart, bend, err = 0;
971+ struct dentry *dentry = file->f_path.dentry;
972+ struct inode *parent_inode = dentry->d_parent->d_inode;
973+ loff_t inode_size = dentry->d_inode->i_size;
974+
975+ bstart = fbstart(file);
976+ bend = fbend(file);
977+
978+ BUG_ON(!S_ISREG(dentry->d_inode->i_mode));
979+
980+ unionfs_check_file(file);
981+ unionfs_check_dentry(dentry);
982+ for (bindex = bstart - 1; bindex >= 0; bindex--) {
983+ if (!d_deleted(dentry))
984+ err = copyup_file(parent_inode, file, bstart,
985+ bindex, inode_size);
986+ else
987+ err = copyup_deleted_file(file, dentry, bstart,
988+ bindex);
989+
990+ if (!err)
991+ break;
992+ }
993+ if (err || (bstart <= fbstart(file)))
994+ goto out;
995+ bend = fbend(file);
996+ for (bindex = bstart; bindex <= bend; bindex++) {
997+ if (unionfs_lower_file_idx(file, bindex)) {
998+ branchput(dentry->d_sb, bindex);
999+ fput(unionfs_lower_file_idx(file, bindex));
1000+ unionfs_set_lower_file_idx(file, bindex, NULL);
1001+ }
1002+ if (unionfs_lower_mnt_idx(dentry, bindex)) {
1003+ unionfs_mntput(dentry, bindex);
1004+ unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
1005+ }
1006+ if (unionfs_lower_dentry_idx(dentry, bindex)) {
1007+ BUG_ON(!dentry->d_inode);
1008+ iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
1009+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
1010+ NULL);
1011+ dput(unionfs_lower_dentry_idx(dentry, bindex));
1012+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1013+ }
1014+ }
1015+ /* for reg file, we only open it "once" */
1016+ fbend(file) = fbstart(file);
1017+ set_dbend(dentry, dbstart(dentry));
1018+ ibend(dentry->d_inode) = ibstart(dentry->d_inode);
1019+
1020+out:
1021+ unionfs_check_file(file);
1022+ unionfs_check_dentry(dentry);
1023+ return err;
1024+}
1025+
1026+/*
1027+ * Revalidate the struct file
1028+ * @file: file to revalidate
1029+ * @willwrite: true if caller may cause changes to the file; false otherwise.
1030+ */
1031+int unionfs_file_revalidate(struct file *file, bool willwrite)
1032+{
1033+ struct super_block *sb;
1034+ struct dentry *dentry;
1035+ int sbgen, fgen, dgen;
1036+ int bstart, bend;
1037+ int size;
1038+ int err = 0;
1039+
1040+ dentry = file->f_path.dentry;
1041+ unionfs_lock_dentry(dentry);
1042+ sb = dentry->d_sb;
1043+
1044+ /*
1045+ * First revalidate the dentry inside struct file,
1046+ * but not unhashed dentries.
1047+ */
1048+ if (!d_deleted(dentry) &&
1049+ !__unionfs_d_revalidate_chain(dentry, NULL, willwrite)) {
1050+ err = -ESTALE;
1051+ goto out_nofree;
1052+ }
1053+
1054+ sbgen = atomic_read(&UNIONFS_SB(sb)->generation);
1055+ dgen = atomic_read(&UNIONFS_D(dentry)->generation);
1056+ fgen = atomic_read(&UNIONFS_F(file)->generation);
1057+
1058+ BUG_ON(sbgen > dgen);
1059+
1060+ /*
1061+ * There are two cases we are interested in. The first is if the
1062+ * file's generation is lower than the superblock's. The second is
1063+ * if someone has copied up this file from underneath us. In both
1064+ * cases we need to refresh things.
1065+ */
1066+ if (!d_deleted(dentry) &&
1067+ (sbgen > fgen || dbstart(dentry) != fbstart(file))) {
1068+ /* save orig branch ID */
1069+ int orig_brid = UNIONFS_F(file)->saved_branch_ids[fbstart(file)];
1070+
1071+ /* First we throw out the existing files. */
1072+ cleanup_file(file);
1073+
1074+ /* Now we reopen the file(s) as in unionfs_open. */
1075+ bstart = fbstart(file) = dbstart(dentry);
1076+ bend = fbend(file) = dbend(dentry);
1077+
1078+ size = sizeof(struct file *) * sbmax(sb);
1079+ UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1080+ if (!UNIONFS_F(file)->lower_files) {
1081+ err = -ENOMEM;
1082+ goto out;
1083+ }
1084+ size = sizeof(int) * sbmax(sb);
1085+ UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1086+ if (!UNIONFS_F(file)->saved_branch_ids) {
1087+ err = -ENOMEM;
1088+ goto out;
1089+ }
1090+
1091+ if (S_ISDIR(dentry->d_inode->i_mode)) {
1092+ /* We need to open all the files. */
1093+ err = open_all_files(file);
1094+ if (err)
1095+ goto out;
1096+ } else {
1097+ int new_brid;
1098+ /* We only open the highest priority branch. */
1099+ err = open_highest_file(file, willwrite);
1100+ if (err)
1101+ goto out;
1102+ new_brid = UNIONFS_F(file)->
1103+ saved_branch_ids[fbstart(file)];
1104+ if (new_brid != orig_brid && sbgen > fgen) {
1105+ /*
1106+ * If we re-opened the file on a different
1107+ * branch than the original one, and this
1108+ * was due to a new branch inserted, then
1109+ * update the mnt counts of the old and new
1110+ * branches accordingly.
1111+ */
1112+ unionfs_mntget(dentry, bstart);
1113+ unionfs_mntput(sb->s_root,
1114+ branch_id_to_idx(sb, orig_brid));
1115+ }
1116+ }
1117+ atomic_set(&UNIONFS_F(file)->generation,
1118+ atomic_read(&UNIONFS_I(dentry->d_inode)->generation));
1119+ }
1120+
1121+ /* Copyup on the first write to a file on a readonly branch. */
1122+ if (willwrite && IS_WRITE_FLAG(file->f_flags) &&
1123+ !IS_WRITE_FLAG(unionfs_lower_file(file)->f_flags) &&
1124+ is_robranch(dentry)) {
1125+ dprintk(KERN_DEBUG "unionfs: do delay copyup of \"%s\"\n",
1126+ dentry->d_name.name);
1127+ err = do_delayed_copyup(file);
1128+ }
1129+
1130+out:
1131+ if (err) {
1132+ kfree(UNIONFS_F(file)->lower_files);
1133+ kfree(UNIONFS_F(file)->saved_branch_ids);
1134+ }
1135+out_nofree:
1136+ if (!err)
1137+ unionfs_check_file(file);
1138+ unionfs_unlock_dentry(dentry);
1139+ return err;
1140+}
1141+
1142+/* unionfs_open helper function: open a directory */
1143+static int __open_dir(struct inode *inode, struct file *file)
1144+{
1145+ struct dentry *lower_dentry;
1146+ struct file *lower_file;
1147+ int bindex, bstart, bend;
1148+
1149+ bstart = fbstart(file) = dbstart(file->f_path.dentry);
1150+ bend = fbend(file) = dbend(file->f_path.dentry);
1151+
1152+ for (bindex = bstart; bindex <= bend; bindex++) {
1153+ lower_dentry =
1154+ unionfs_lower_dentry_idx(file->f_path.dentry, bindex);
1155+ if (!lower_dentry)
1156+ continue;
1157+
1158+ dget(lower_dentry);
1159+ unionfs_mntget(file->f_path.dentry, bindex);
1160+ lower_file = dentry_open(lower_dentry,
1161+ unionfs_lower_mnt_idx(file->f_path.dentry,
1162+ bindex),
1163+ file->f_flags);
1164+ if (IS_ERR(lower_file))
1165+ return PTR_ERR(lower_file);
1166+
1167+ unionfs_set_lower_file_idx(file, bindex, lower_file);
1168+
1169+ /*
1170+ * The branchget goes after the open, because otherwise
1171+ * we would miss the reference on release.
1172+ */
1173+ branchget(inode->i_sb, bindex);
1174+ }
1175+
1176+ return 0;
1177+}
1178+
1179+/* unionfs_open helper function: open a file */
1180+static int __open_file(struct inode *inode, struct file *file)
1181+{
1182+ struct dentry *lower_dentry;
1183+ struct file *lower_file;
1184+ int lower_flags;
1185+ int bindex, bstart, bend;
1186+
1187+ lower_dentry = unionfs_lower_dentry(file->f_path.dentry);
1188+ lower_flags = file->f_flags;
1189+
1190+ bstart = fbstart(file) = dbstart(file->f_path.dentry);
1191+ bend = fbend(file) = dbend(file->f_path.dentry);
1192+
1193+ /*
1194+ * check for the permission for lower file. If the error is
1195+ * COPYUP_ERR, copyup the file.
1196+ */
1197+ if (lower_dentry->d_inode && is_robranch(file->f_path.dentry)) {
1198+ /*
1199+ * if the open will change the file, copy it up; otherwise
1200+ * defer it.
1201+ */
1202+ if (lower_flags & O_TRUNC) {
1203+ int size = 0;
1204+ int err = -EROFS;
1205+
1206+ /* copyup the file */
1207+ for (bindex = bstart - 1; bindex >= 0; bindex--) {
1208+ err = copyup_file(
1209+ file->f_path.dentry->d_parent->d_inode,
1210+ file, bstart, bindex, size);
1211+ if (!err)
1212+ break;
1213+ }
1214+ return err;
1215+ } else
1216+ lower_flags &= ~(OPEN_WRITE_FLAGS);
1217+ }
1218+
1219+ dget(lower_dentry);
1220+
1221+ /*
1222+ * dentry_open will decrement the mnt refcnt on error;
1223+ * otherwise fput() will do an mntput() for us upon file close.
1224+ */
1225+ unionfs_mntget(file->f_path.dentry, bstart);
1226+ lower_file =
1227+ dentry_open(lower_dentry,
1228+ unionfs_lower_mnt_idx(file->f_path.dentry, bstart),
1229+ lower_flags);
1230+ if (IS_ERR(lower_file))
1231+ return PTR_ERR(lower_file);
1232+
1233+ unionfs_set_lower_file(file, lower_file);
1234+ branchget(inode->i_sb, bstart);
1235+
1236+ return 0;
1237+}
1238+
1239+int unionfs_open(struct inode *inode, struct file *file)
1240+{
1241+ int err = 0;
1242+ struct file *lower_file = NULL;
1243+ struct dentry *dentry = NULL;
1244+ int bindex = 0, bstart = 0, bend = 0;
1245+ int size;
1246+
1247+ unionfs_read_lock(inode->i_sb);
1248+
1249+ file->private_data =
1250+ kzalloc(sizeof(struct unionfs_file_info), GFP_KERNEL);
1251+ if (!UNIONFS_F(file)) {
1252+ err = -ENOMEM;
1253+ goto out_nofree;
1254+ }
1255+ fbstart(file) = -1;
1256+ fbend(file) = -1;
1257+ atomic_set(&UNIONFS_F(file)->generation,
1258+ atomic_read(&UNIONFS_I(inode)->generation));
1259+
1260+ size = sizeof(struct file *) * sbmax(inode->i_sb);
1261+ UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1262+ if (!UNIONFS_F(file)->lower_files) {
1263+ err = -ENOMEM;
1264+ goto out;
1265+ }
1266+ size = sizeof(int) * sbmax(inode->i_sb);
1267+ UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1268+ if (!UNIONFS_F(file)->saved_branch_ids) {
1269+ err = -ENOMEM;
1270+ goto out;
1271+ }
1272+
1273+ dentry = file->f_path.dentry;
1274+ unionfs_lock_dentry(dentry);
1275+
1276+ bstart = fbstart(file) = dbstart(dentry);
1277+ bend = fbend(file) = dbend(dentry);
1278+
1279+ /* increment, so that we can flush appropriately */
1280+ atomic_inc(&UNIONFS_I(dentry->d_inode)->totalopens);
1281+
1282+ /*
1283+ * open all directories and make the unionfs file struct point to
1284+ * these lower file structs
1285+ */
1286+ if (S_ISDIR(inode->i_mode))
1287+ err = __open_dir(inode, file); /* open a dir */
1288+ else
1289+ err = __open_file(inode, file); /* open a file */
1290+
1291+ /* freeing the allocated resources, and fput the opened files */
1292+ if (err) {
1293+ atomic_dec(&UNIONFS_I(dentry->d_inode)->totalopens);
1294+ for (bindex = bstart; bindex <= bend; bindex++) {
1295+ lower_file = unionfs_lower_file_idx(file, bindex);
1296+ if (!lower_file)
1297+ continue;
1298+
1299+ branchput(file->f_path.dentry->d_sb, bindex);
1300+ /* fput calls dput for lower_dentry */
1301+ fput(lower_file);
1302+ }
1303+ }
1304+
1305+ unionfs_unlock_dentry(dentry);
1306+
1307+out:
1308+ if (err) {
1309+ kfree(UNIONFS_F(file)->lower_files);
1310+ kfree(UNIONFS_F(file)->saved_branch_ids);
1311+ kfree(UNIONFS_F(file));
1312+ }
1313+out_nofree:
1314+ unionfs_read_unlock(inode->i_sb);
1315+ unionfs_check_inode(inode);
1316+ if (!err) {
1317+ unionfs_check_file(file);
1318+ unionfs_check_dentry(file->f_path.dentry->d_parent);
1319+ }
1320+ return err;
1321+}
1322+
1323+/*
1324+ * release all lower object references & free the file info structure
1325+ *
1326+ * No need to grab sb info's rwsem.
1327+ */
1328+int unionfs_file_release(struct inode *inode, struct file *file)
1329+{
1330+ struct file *lower_file = NULL;
1331+ struct unionfs_file_info *fileinfo;
1332+ struct unionfs_inode_info *inodeinfo;
1333+ struct super_block *sb = inode->i_sb;
1334+ int bindex, bstart, bend;
1335+ int fgen, err = 0;
1336+
1337+ unionfs_read_lock(sb);
1338+ /*
1339+ * Yes, we have to revalidate this file even if it's being released.
1340+ * This is important for open-but-unlinked files, as well as mmap
1341+ * support.
1342+ */
1343+ if ((err = unionfs_file_revalidate(file, true)))
1344+ goto out;
1345+ unionfs_check_file(file);
1346+ fileinfo = UNIONFS_F(file);
1347+ BUG_ON(file->f_path.dentry->d_inode != inode);
1348+ inodeinfo = UNIONFS_I(inode);
1349+
1350+ /* fput all the lower files */
1351+ fgen = atomic_read(&fileinfo->generation);
1352+ bstart = fbstart(file);
1353+ bend = fbend(file);
1354+
1355+ for (bindex = bstart; bindex <= bend; bindex++) {
1356+ lower_file = unionfs_lower_file_idx(file, bindex);
1357+
1358+ if (lower_file) {
1359+ fput(lower_file);
1360+ branchput(sb, bindex);
1361+ }
1362+ }
1363+ kfree(fileinfo->lower_files);
1364+ kfree(fileinfo->saved_branch_ids);
1365+
1366+ if (fileinfo->rdstate) {
1367+ fileinfo->rdstate->access = jiffies;
1368+ printk(KERN_DEBUG "unionfs: saving rdstate with cookie "
1369+ "%u [%d.%lld]\n",
1370+ fileinfo->rdstate->cookie,
1371+ fileinfo->rdstate->bindex,
1372+ (long long)fileinfo->rdstate->dirpos);
1373+ spin_lock(&inodeinfo->rdlock);
1374+ inodeinfo->rdcount++;
1375+ list_add_tail(&fileinfo->rdstate->cache,
1376+ &inodeinfo->readdircache);
1377+ mark_inode_dirty(inode);
1378+ spin_unlock(&inodeinfo->rdlock);
1379+ fileinfo->rdstate = NULL;
1380+ }
1381+ kfree(fileinfo);
1382+
1383+out:
1384+ unionfs_read_unlock(sb);
1385+ return err;
1386+}
1387+
1388+/* pass the ioctl to the lower fs */
1389+static long do_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1390+{
1391+ struct file *lower_file;
1392+ int err;
1393+
1394+ lower_file = unionfs_lower_file(file);
1395+
1396+ err = security_file_ioctl(lower_file, cmd, arg);
1397+ if (err)
1398+ goto out;
1399+
1400+ err = -ENOTTY;
1401+ if (!lower_file || !lower_file->f_op)
1402+ goto out;
1403+ if (lower_file->f_op->unlocked_ioctl) {
1404+ err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg);
1405+ } else if (lower_file->f_op->ioctl) {
1406+ lock_kernel();
1407+ err = lower_file->f_op->ioctl(lower_file->f_path.dentry->d_inode,
1408+ lower_file, cmd, arg);
1409+ unlock_kernel();
1410+ }
1411+
1412+out:
1413+ return err;
1414+}
1415+
1416+/*
1417+ * return to user-space the branch indices containing the file in question
1418+ *
1419+ * We use fd_set, and therefore the number of branches is limited to
1420+ * FD_SETSIZE, which is currently 1024 - plenty for most people
1421+ */
1422+static int unionfs_ioctl_queryfile(struct file *file, unsigned int cmd,
1423+ unsigned long arg)
1424+{
1425+ int err = 0;
1426+ fd_set branchlist;
1427+ int bstart = 0, bend = 0, bindex = 0;
1428+ int orig_bstart, orig_bend;
1429+ struct dentry *dentry, *lower_dentry;
1430+ struct vfsmount *mnt;
1431+
1432+ dentry = file->f_path.dentry;
1433+ unionfs_lock_dentry(dentry);
1434+ orig_bstart = dbstart(dentry);
1435+ orig_bend = dbend(dentry);
1436+ if ((err = unionfs_partial_lookup(dentry)))
1437+ goto out;
1438+ bstart = dbstart(dentry);
1439+ bend = dbend(dentry);
1440+
1441+ FD_ZERO(&branchlist);
1442+
1443+ for (bindex = bstart; bindex <= bend; bindex++) {
1444+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
1445+ if (!lower_dentry)
1446+ continue;
1447+ if (lower_dentry->d_inode)
1448+ FD_SET(bindex, &branchlist);
1449+ /* purge any lower objects after partial_lookup */
1450+ if (bindex < orig_bstart || bindex > orig_bend) {
1451+ dput(lower_dentry);
1452+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1453+ iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
1454+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
1455+ NULL);
1456+ mnt = unionfs_lower_mnt_idx(dentry, bindex);
1457+ if (!mnt)
1458+ continue;
1459+ unionfs_mntput(dentry, bindex);
1460+ unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
1461+ }
1462+ }
1463+ /* restore original dentry's offsets */
1464+ set_dbstart(dentry, orig_bstart);
1465+ set_dbend(dentry, orig_bend);
1466+ ibstart(dentry->d_inode) = orig_bstart;
1467+ ibend(dentry->d_inode) = orig_bend;
1468+
1469+ err = copy_to_user((void __user *)arg, &branchlist, sizeof(fd_set));
1470+ if (err)
1471+ err = -EFAULT;
1472+
1473+out:
1474+ unionfs_unlock_dentry(dentry);
1475+ return err < 0 ? err : bend;
1476+}
1477+
1478+long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1479+{
1480+ long err;
1481+
1482+ unionfs_read_lock(file->f_path.dentry->d_sb);
1483+
1484+ if ((err = unionfs_file_revalidate(file, true)))
1485+ goto out;
1486+
1487+ /* check if asked for local commands */
1488+ switch (cmd) {
1489+ case UNIONFS_IOCTL_INCGEN:
1490+ /* Increment the superblock generation count */
1491+ printk(KERN_WARNING "unionfs: incgen ioctl deprecated; "
1492+ "use \"-o remount,incgen\"\n");
1493+ err = -ENOSYS;
1494+ break;
1495+
1496+ case UNIONFS_IOCTL_QUERYFILE:
1497+ /* Return list of branches containing the given file */
1498+ err = unionfs_ioctl_queryfile(file, cmd, arg);
1499+ break;
1500+
1501+ default:
1502+ /* pass the ioctl down */
1503+ err = do_ioctl(file, cmd, arg);
1504+ break;
1505+ }
1506+
1507+out:
1508+ unionfs_read_unlock(file->f_path.dentry->d_sb);
1509+ unionfs_check_file(file);
1510+ return err;
1511+}
1512+
1513+int unionfs_flush(struct file *file, fl_owner_t id)
1514+{
1515+ int err = 0;
1516+ struct file *lower_file = NULL;
1517+ struct dentry *dentry = file->f_path.dentry;
1518+ int bindex, bstart, bend;
1519+
1520+ unionfs_read_lock(dentry->d_sb);
1521+
1522+ if ((err = unionfs_file_revalidate(file, true)))
1523+ goto out;
1524+ unionfs_check_file(file);
1525+
1526+ if (!atomic_dec_and_test(&UNIONFS_I(dentry->d_inode)->totalopens))
1527+ goto out;
1528+
1529+ unionfs_lock_dentry(dentry);
1530+
1531+ bstart = fbstart(file);
1532+ bend = fbend(file);
1533+ for (bindex = bstart; bindex <= bend; bindex++) {
1534+ lower_file = unionfs_lower_file_idx(file, bindex);
1535+
1536+ if (lower_file && lower_file->f_op &&
1537+ lower_file->f_op->flush) {
1538+ err = lower_file->f_op->flush(lower_file, id);
1539+ if (err)
1540+ goto out_lock;
1541+
1542+ /* if there are no more refs to the dentry, dput it */
1543+ if (d_deleted(dentry)) {
1544+ dput(unionfs_lower_dentry_idx(dentry, bindex));
1545+ unionfs_set_lower_dentry_idx(dentry, bindex,
1546+ NULL);
1547+ }
1548+ }
1549+
1550+ }
1551+
1552+ /* on success, update our times */
1553+ unionfs_copy_attr_times(dentry->d_inode);
1554+ /* parent time could have changed too (async) */
1555+ unionfs_copy_attr_times(dentry->d_parent->d_inode);
1556+
1557+out_lock:
1558+ unionfs_unlock_dentry(dentry);
1559+out:
1560+ unionfs_read_unlock(dentry->d_sb);
1561+ unionfs_check_file(file);
1562+ return err;
1563+}
1564diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
1565new file mode 100644
1566index 0000000..04bedb1
1567--- /dev/null
1568+++ b/fs/unionfs/copyup.c
1569@@ -0,0 +1,888 @@
1570+/*
1571+ * Copyright (c) 2003-2007 Erez Zadok
1572+ * Copyright (c) 2003-2006 Charles P. Wright
1573+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
1574+ * Copyright (c) 2005-2006 Junjiro Okajima
1575+ * Copyright (c) 2005 Arun M. Krishnakumar
1576+ * Copyright (c) 2004-2006 David P. Quigley
1577+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
1578+ * Copyright (c) 2003 Puja Gupta
1579+ * Copyright (c) 2003 Harikesavan Krishnan
1580+ * Copyright (c) 2003-2007 Stony Brook University
1581+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
1582+ *
1583+ * This program is free software; you can redistribute it and/or modify
1584+ * it under the terms of the GNU General Public License version 2 as
1585+ * published by the Free Software Foundation.
1586+ */
1587+
1588+#include "union.h"
1589+
1590+/*
1591+ * For detailed explanation of copyup see:
1592+ * Documentation/filesystems/unionfs/concepts.txt
1593+ */
1594+
1595+#ifdef CONFIG_UNION_FS_XATTR
1596+/* copyup all extended attrs for a given dentry */
1597+static int copyup_xattrs(struct dentry *old_lower_dentry,
1598+ struct dentry *new_lower_dentry)
1599+{
1600+ int err = 0;
1601+ ssize_t list_size = -1;
1602+ char *name_list = NULL;
1603+ char *attr_value = NULL;
1604+ char *name_list_buf = NULL;
1605+
1606+ /* query the actual size of the xattr list */
1607+ list_size = vfs_listxattr(old_lower_dentry, NULL, 0);
1608+ if (list_size <= 0) {
1609+ err = list_size;
1610+ goto out;
1611+ }
1612+
1613+ /* allocate space for the actual list */
1614+ name_list = unionfs_xattr_alloc(list_size + 1, XATTR_LIST_MAX);
1615+ if (!name_list || IS_ERR(name_list)) {
1616+ err = PTR_ERR(name_list);
1617+ goto out;
1618+ }
1619+
1620+ name_list_buf = name_list; /* save for kfree at end */
1621+
1622+ /* now get the actual xattr list of the source file */
1623+ list_size = vfs_listxattr(old_lower_dentry, name_list, list_size);
1624+ if (list_size <= 0) {
1625+ err = list_size;
1626+ goto out;
1627+ }
1628+
1629+ /* allocate space to hold each xattr's value */
1630+ attr_value = unionfs_xattr_alloc(XATTR_SIZE_MAX, XATTR_SIZE_MAX);
1631+ if (!attr_value || IS_ERR(attr_value)) {
1632+ err = PTR_ERR(attr_value);
1633+ goto out;
1634+ }
1635+
1636+ /* in a loop, get and set each xattr from src to dst file */
1637+ while (*name_list) {
1638+ ssize_t size;
1639+
1640+ /* Lock here since vfs_getxattr doesn't lock for us */
1641+ mutex_lock(&old_lower_dentry->d_inode->i_mutex);
1642+ size = vfs_getxattr(old_lower_dentry, name_list,
1643+ attr_value, XATTR_SIZE_MAX);
1644+ mutex_unlock(&old_lower_dentry->d_inode->i_mutex);
1645+ if (size < 0) {
1646+ err = size;
1647+ goto out;
1648+ }
1649+ if (size > XATTR_SIZE_MAX) {
1650+ err = -E2BIG;
1651+ goto out;
1652+ }
1653+ /* Don't lock here since vfs_setxattr does it for us. */
1654+ err = vfs_setxattr(new_lower_dentry, name_list, attr_value,
1655+ size, 0);
1656+ /*
1657+ * SELinux depends on "security.*" xattrs, so to maintain
1658+ * the security of copied-up files, if SELinux is active,
1659+ * then we must copy these xattrs as well. So we need to
1660+ * temporarily get FOWNER privileges.
1661+ * XXX: move entire copyup code to SIOQ.
1662+ */
1663+ if (err == -EPERM && !capable(CAP_FOWNER)) {
1664+ cap_raise(current->cap_effective, CAP_FOWNER);
1665+ err = vfs_setxattr(new_lower_dentry, name_list,
1666+ attr_value, size, 0);
1667+ cap_lower(current->cap_effective, CAP_FOWNER);
1668+ }
1669+ if (err < 0)
1670+ goto out;
1671+ name_list += strlen(name_list) + 1;
1672+ }
1673+out:
1674+ unionfs_xattr_kfree(name_list_buf);
1675+ unionfs_xattr_kfree(attr_value);
1676+ /* Ignore if xattr isn't supported */
1677+ if (err == -ENOTSUPP || err == -EOPNOTSUPP)
1678+ err = 0;
1679+ return err;
1680+}
1681+#endif /* CONFIG_UNION_FS_XATTR */
1682+
1683+/*
1684+ * Determine the mode based on the copyup flags, and the existing dentry.
1685+ *
1686+ * Handle file systems which may not support certain options. For example
1687+ * jffs2 doesn't allow one to chmod a symlink. So we ignore such harmless
1688+ * errors, rather than propagating them up, which results in copyup errors
1689+ * and errors returned back to users.
1690+ */
1691+static int copyup_permissions(struct super_block *sb,
1692+ struct dentry *old_lower_dentry,
1693+ struct dentry *new_lower_dentry)
1694+{
1695+ struct inode *i = old_lower_dentry->d_inode;
1696+ struct iattr newattrs;
1697+ int err;
1698+
1699+ newattrs.ia_atime = i->i_atime;
1700+ newattrs.ia_mtime = i->i_mtime;
1701+ newattrs.ia_ctime = i->i_ctime;
1702+ newattrs.ia_gid = i->i_gid;
1703+ newattrs.ia_uid = i->i_uid;
1704+ newattrs.ia_valid = ATTR_CTIME | ATTR_ATIME | ATTR_MTIME |
1705+ ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_FORCE |
1706+ ATTR_GID | ATTR_UID;
1707+ err = notify_change(new_lower_dentry, &newattrs);
1708+ if (err)
1709+ goto out;
1710+
1711+ /* now try to change the mode and ignore EOPNOTSUPP on symlinks */
1712+ newattrs.ia_mode = i->i_mode;
1713+ newattrs.ia_valid = ATTR_MODE | ATTR_FORCE;
1714+ err = notify_change(new_lower_dentry, &newattrs);
1715+ if (err == -EOPNOTSUPP &&
1716+ S_ISLNK(new_lower_dentry->d_inode->i_mode)) {
1717+ printk(KERN_WARNING
1718+ "unionfs: changing \"%s\" symlink mode unsupported\n",
1719+ new_lower_dentry->d_name.name);
1720+ err = 0;
1721+ }
1722+
1723+out:
1724+ return err;
1725+}
1726+
1727+/*
1728+ * create the new device/file/directory - use copyup_permissions to copy up
1729+ * times and mode
1730+ *
1731+ * if the object being copied up is a regular file, the file is only created,
1732+ * the contents have to be copied up separately
1733+ */
1734+static int __copyup_ndentry(struct dentry *old_lower_dentry,
1735+ struct dentry *new_lower_dentry,
1736+ struct dentry *new_lower_parent_dentry,
1737+ char *symbuf)
1738+{
1739+ int err = 0;
1740+ umode_t old_mode = old_lower_dentry->d_inode->i_mode;
1741+ struct sioq_args args;
1742+
1743+ if (S_ISDIR(old_mode)) {
1744+ args.mkdir.parent = new_lower_parent_dentry->d_inode;
1745+ args.mkdir.dentry = new_lower_dentry;
1746+ args.mkdir.mode = old_mode;
1747+
1748+ run_sioq(__unionfs_mkdir, &args);
1749+ err = args.err;
1750+ } else if (S_ISLNK(old_mode)) {
1751+ args.symlink.parent = new_lower_parent_dentry->d_inode;
1752+ args.symlink.dentry = new_lower_dentry;
1753+ args.symlink.symbuf = symbuf;
1754+ args.symlink.mode = old_mode;
1755+
1756+ run_sioq(__unionfs_symlink, &args);
1757+ err = args.err;
1758+ } else if (S_ISBLK(old_mode) || S_ISCHR(old_mode) ||
1759+ S_ISFIFO(old_mode) || S_ISSOCK(old_mode)) {
1760+ args.mknod.parent = new_lower_parent_dentry->d_inode;
1761+ args.mknod.dentry = new_lower_dentry;
1762+ args.mknod.mode = old_mode;
1763+ args.mknod.dev = old_lower_dentry->d_inode->i_rdev;
1764+
1765+ run_sioq(__unionfs_mknod, &args);
1766+ err = args.err;
1767+ } else if (S_ISREG(old_mode)) {
1768+ args.create.parent = new_lower_parent_dentry->d_inode;
1769+ args.create.dentry = new_lower_dentry;
1770+ args.create.mode = old_mode;
1771+ args.create.nd = NULL;
1772+
1773+ run_sioq(__unionfs_create, &args);
1774+ err = args.err;
1775+ } else {
1776+ printk(KERN_ERR "unionfs: unknown inode type %d\n",
1777+ old_mode);
1778+ BUG();
1779+ }
1780+
1781+ return err;
1782+}
1783+
1784+static int __copyup_reg_data(struct dentry *dentry,
1785+ struct dentry *new_lower_dentry, int new_bindex,
1786+ struct dentry *old_lower_dentry, int old_bindex,
1787+ struct file **copyup_file, loff_t len)
1788+{
1789+ struct super_block *sb = dentry->d_sb;
1790+ struct file *input_file;
1791+ struct file *output_file;
1792+ struct vfsmount *output_mnt;
1793+ mm_segment_t old_fs;
1794+ char *buf = NULL;
1795+ ssize_t read_bytes, write_bytes;
1796+ loff_t size;
1797+ int err = 0;
1798+
1799+ /* open old file */
1800+ unionfs_mntget(dentry, old_bindex);
1801+ branchget(sb, old_bindex);
1802+ /* dentry_open calls dput and mntput if it returns an error */
1803+ input_file = dentry_open(old_lower_dentry,
1804+ unionfs_lower_mnt_idx(dentry, old_bindex),
1805+ O_RDONLY | O_LARGEFILE);
1806+ if (IS_ERR(input_file)) {
1807+ dput(old_lower_dentry);
1808+ err = PTR_ERR(input_file);
1809+ goto out;
1810+ }
1811+ if (!input_file->f_op || !input_file->f_op->read) {
1812+ err = -EINVAL;
1813+ goto out_close_in;
1814+ }
1815+
1816+ /* open new file */
1817+ dget(new_lower_dentry);
1818+ output_mnt = unionfs_mntget(sb->s_root, new_bindex);
1819+ branchget(sb, new_bindex);
1820+ output_file = dentry_open(new_lower_dentry, output_mnt,
1821+ O_RDWR | O_LARGEFILE);
1822+ if (IS_ERR(output_file)) {
1823+ err = PTR_ERR(output_file);
1824+ goto out_close_in2;
1825+ }
1826+ if (!output_file->f_op || !output_file->f_op->write) {
1827+ err = -EINVAL;
1828+ goto out_close_out;
1829+ }
1830+
1831+ /* allocating a buffer */
1832+ buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
1833+ if (!buf) {
1834+ err = -ENOMEM;
1835+ goto out_close_out;
1836+ }
1837+
1838+ input_file->f_pos = 0;
1839+ output_file->f_pos = 0;
1840+
1841+ old_fs = get_fs();
1842+ set_fs(KERNEL_DS);
1843+
1844+ size = len;
1845+ err = 0;
1846+ do {
1847+ if (len >= PAGE_SIZE)
1848+ size = PAGE_SIZE;
1849+ else if ((len < PAGE_SIZE) && (len > 0))
1850+ size = len;
1851+
1852+ len -= PAGE_SIZE;
1853+
1854+ read_bytes =
1855+ input_file->f_op->read(input_file,
1856+ (char __user *)buf, size,
1857+ &input_file->f_pos);
1858+ if (read_bytes <= 0) {
1859+ err = read_bytes;
1860+ break;
1861+ }
1862+
1863+ write_bytes =
1864+ output_file->f_op->write(output_file,
1865+ (char __user *)buf,
1866+ read_bytes,
1867+ &output_file->f_pos);
1868+ if ((write_bytes < 0) || (write_bytes < read_bytes)) {
1869+ err = (write_bytes < 0) ? write_bytes : -EIO;
1870+ break;
1871+ }
1872+ } while ((read_bytes > 0) && (len > 0));
1873+
1874+ set_fs(old_fs);
1875+
1876+ kfree(buf);
1877+
1878+ if (!err)
1879+ err = output_file->f_op->fsync(output_file,
1880+ new_lower_dentry, 0);
1881+
1882+ if (err)
1883+ goto out_close_out;
1884+
1885+ if (copyup_file) {
1886+ *copyup_file = output_file;
1887+ goto out_close_in;
1888+ }
1889+
1890+out_close_out:
1891+ fput(output_file);
1892+
1893+out_close_in2:
1894+ branchput(sb, new_bindex);
1895+
1896+out_close_in:
1897+ fput(input_file);
1898+
1899+out:
1900+ branchput(sb, old_bindex);
1901+
1902+ return err;
1903+}
1904+
1905+/*
1906+ * dput the lower references for old and new dentry & clear a lower dentry
1907+ * pointer
1908+ */
1909+static void __clear(struct dentry *dentry, struct dentry *old_lower_dentry,
1910+ int old_bstart, int old_bend,
1911+ struct dentry *new_lower_dentry, int new_bindex)
1912+{
1913+ /* get rid of the lower dentry and all its traces */
1914+ unionfs_set_lower_dentry_idx(dentry, new_bindex, NULL);
1915+ set_dbstart(dentry, old_bstart);
1916+ set_dbend(dentry, old_bend);
1917+
1918+ dput(new_lower_dentry);
1919+ dput(old_lower_dentry);
1920+}
1921+
1922+/*
1923+ * Copy up a dentry to a file of specified name.
1924+ *
1925+ * @dir: used to pull the ->i_sb to access other branches
1926+ * @dentry: the non-negative dentry whose lower_inode we should copy
1927+ * @bstart: the branch of the lower_inode to copy from
1928+ * @new_bindex: the branch to create the new file in
1929+ * @name: the name of the file to create
1930+ * @namelen: length of @name
1931+ * @copyup_file: the "struct file" to return (optional)
1932+ * @len: how many bytes to copy-up?
1933+ */
1934+int copyup_dentry(struct inode *dir, struct dentry *dentry, int bstart,
1935+ int new_bindex, const char *name, int namelen,
1936+ struct file **copyup_file, loff_t len)
1937+{
1938+ struct dentry *new_lower_dentry;
1939+ struct dentry *old_lower_dentry = NULL;
1940+ struct super_block *sb;
1941+ int err = 0;
1942+ int old_bindex;
1943+ int old_bstart;
1944+ int old_bend;
1945+ struct dentry *new_lower_parent_dentry = NULL;
1946+ mm_segment_t oldfs;
1947+ char *symbuf = NULL;
1948+
1949+ verify_locked(dentry);
1950+
1951+ old_bindex = bstart;
1952+ old_bstart = dbstart(dentry);
1953+ old_bend = dbend(dentry);
1954+
1955+ BUG_ON(new_bindex < 0);
1956+ BUG_ON(new_bindex >= old_bindex);
1957+
1958+ sb = dir->i_sb;
1959+
1960+ if ((err = is_robranch_super(sb, new_bindex)))
1961+ goto out;
1962+
1963+ /* Create the directory structure above this dentry. */
1964+ new_lower_dentry = create_parents(dir, dentry, name, new_bindex);
1965+ if (IS_ERR(new_lower_dentry)) {
1966+ err = PTR_ERR(new_lower_dentry);
1967+ goto out;
1968+ }
1969+
1970+ old_lower_dentry = unionfs_lower_dentry_idx(dentry, old_bindex);
1971+ /* we conditionally dput this old_lower_dentry at end of function */
1972+ dget(old_lower_dentry);
1973+
1974+ /* For symlinks, we must read the link before we lock the directory. */
1975+ if (S_ISLNK(old_lower_dentry->d_inode->i_mode)) {
1976+
1977+ symbuf = kmalloc(PATH_MAX, GFP_KERNEL);
1978+ if (!symbuf) {
1979+ __clear(dentry, old_lower_dentry,
1980+ old_bstart, old_bend,
1981+ new_lower_dentry, new_bindex);
1982+ err = -ENOMEM;
1983+ goto out_free;
1984+ }
1985+
1986+ oldfs = get_fs();
1987+ set_fs(KERNEL_DS);
1988+ err = old_lower_dentry->d_inode->i_op->readlink(
1989+ old_lower_dentry,
1990+ (char __user *)symbuf,
1991+ PATH_MAX);
1992+ set_fs(oldfs);
1993+ if (err < 0) {
1994+ __clear(dentry, old_lower_dentry,
1995+ old_bstart, old_bend,
1996+ new_lower_dentry, new_bindex);
1997+ goto out_free;
1998+ }
1999+ symbuf[err] = '\0';
2000+ }
2001+
2002+ /* Now we lock the parent, and create the object in the new branch. */
2003+ new_lower_parent_dentry = lock_parent(new_lower_dentry);
2004+
2005+ /* create the new inode */
2006+ err = __copyup_ndentry(old_lower_dentry, new_lower_dentry,
2007+ new_lower_parent_dentry, symbuf);
2008+
2009+ if (err) {
2010+ __clear(dentry, old_lower_dentry,
2011+ old_bstart, old_bend,
2012+ new_lower_dentry, new_bindex);
2013+ goto out_unlock;
2014+ }
2015+
2016+ /* We actually copyup the file here. */
2017+ if (S_ISREG(old_lower_dentry->d_inode->i_mode))
2018+ err = __copyup_reg_data(dentry, new_lower_dentry, new_bindex,
2019+ old_lower_dentry, old_bindex,
2020+ copyup_file, len);
2021+ if (err)
2022+ goto out_unlink;
2023+
2024+ /* Set permissions. */
2025+ if ((err = copyup_permissions(sb, old_lower_dentry,
2026+ new_lower_dentry)))
2027+ goto out_unlink;
2028+
2029+#ifdef CONFIG_UNION_FS_XATTR
2030+ /* SELinux uses extended attributes for permissions. */
2031+ if ((err = copyup_xattrs(old_lower_dentry, new_lower_dentry)))
2032+ goto out_unlink;
2033+#endif /* CONFIG_UNION_FS_XATTR */
2034+
2035+ /* do not allow files getting deleted to be re-interposed */
2036+ if (!d_deleted(dentry))
2037+ unionfs_reinterpose(dentry);
2038+
2039+ goto out_unlock;
2040+
2041+out_unlink:
2042+ /*
2043+ * copyup failed, because we possibly ran out of space or
2044+ * quota, or something else happened so let's unlink; we don't
2045+ * really care about the return value of vfs_unlink
2046+ */
2047+ vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry);
2048+
2049+ if (copyup_file) {
2050+ /* need to close the file */
2051+
2052+ fput(*copyup_file);
2053+ branchput(sb, new_bindex);
2054+ }
2055+
2056+ /*
2057+ * TODO: should we reset the error to something like -EIO?
2058+ *
2059+ * If we don't reset, the user may get some nonsensical errors, but
2060+ * on the other hand, if we reset to EIO, we guarantee that the user
2061+ * will get a "confusing" error message.
2062+ */
2063+
2064+out_unlock:
2065+ unlock_dir(new_lower_parent_dentry);
2066+
2067+out_free:
2068+ /*
2069+ * If old_lower_dentry was a directory, we need to dput it. If it
2070+ * was a file, then it was already dput indirectly by other
2071+ * functions we call above which operate on regular files.
2072+ */
2073+ if (old_lower_dentry && old_lower_dentry->d_inode &&
2074+ (S_ISDIR(old_lower_dentry->d_inode->i_mode) ||
2075+ S_ISLNK(old_lower_dentry->d_inode->i_mode)))
2076+ dput(old_lower_dentry);
2077+ kfree(symbuf);
2078+
2079+ if (err)
2080+ goto out;
2081+ if (!S_ISDIR(dentry->d_inode->i_mode)) {
2082+ unionfs_postcopyup_release(dentry);
2083+ if (!unionfs_lower_inode(dentry->d_inode)) {
2084+ /*
2085+ * If we got here, then we copied up to an
2086+ * unlinked-open file, whose name is .unionfsXXXXX.
2087+ */
2088+ struct inode *inode = new_lower_dentry->d_inode;
2089+ atomic_inc(&inode->i_count);
2090+ unionfs_set_lower_inode_idx(dentry->d_inode,
2091+ ibstart(dentry->d_inode),
2092+ inode);
2093+ }
2094+ }
2095+ unionfs_postcopyup_setmnt(dentry);
2096+ /* sync inode times from copied-up inode to our inode */
2097+ unionfs_copy_attr_times(dentry->d_inode);
2098+ unionfs_check_inode(dir);
2099+ unionfs_check_dentry(dentry);
2100+out:
2101+ return err;
2102+}
2103+
2104+/*
2105+ * This function creates a copy of a file represented by 'file' which
2106+ * currently resides in branch 'bstart' to branch 'new_bindex'. The copy
2107+ * will be named "name".
2108+ */
2109+int copyup_named_file(struct inode *dir, struct file *file, char *name,
2110+ int bstart, int new_bindex, loff_t len)
2111+{
2112+ int err = 0;
2113+ struct file *output_file = NULL;
2114+
2115+ err = copyup_dentry(dir, file->f_path.dentry, bstart, new_bindex,
2116+ name, strlen(name), &output_file, len);
2117+ if (!err) {
2118+ fbstart(file) = new_bindex;
2119+ unionfs_set_lower_file_idx(file, new_bindex, output_file);
2120+ }
2121+
2122+ return err;
2123+}
2124+
2125+/*
2126+ * This function creates a copy of a file represented by 'file' which
2127+ * currently resides in branch 'bstart' to branch 'new_bindex'.
2128+ */
2129+int copyup_file(struct inode *dir, struct file *file, int bstart,
2130+ int new_bindex, loff_t len)
2131+{
2132+ int err = 0;
2133+ struct file *output_file = NULL;
2134+ struct dentry *dentry = file->f_path.dentry;
2135+
2136+ err = copyup_dentry(dir, dentry, bstart, new_bindex,
2137+ dentry->d_name.name, dentry->d_name.len,
2138+ &output_file, len);
2139+ if (!err) {
2140+ fbstart(file) = new_bindex;
2141+ unionfs_set_lower_file_idx(file, new_bindex, output_file);
2142+ }
2143+
2144+ return err;
2145+}
2146+
2147+/* purge a dentry's lower-branch states (dput/mntput, etc.) */
2148+static void __cleanup_dentry(struct dentry *dentry, int bindex,
2149+ int old_bstart, int old_bend)
2150+{
2151+ int loop_start;
2152+ int loop_end;
2153+ int new_bstart = -1;
2154+ int new_bend = -1;
2155+ int i;
2156+
2157+ loop_start = min(old_bstart, bindex);
2158+ loop_end = max(old_bend, bindex);
2159+
2160+ /*
2161+ * This loop sets the bstart and bend for the new dentry by
2162+ * traversing from left to right. It also dputs all negative
2163+ * dentries except bindex
2164+ */
2165+ for (i = loop_start; i <= loop_end; i++) {
2166+ if (!unionfs_lower_dentry_idx(dentry, i))
2167+ continue;
2168+
2169+ if (i == bindex) {
2170+ new_bend = i;
2171+ if (new_bstart < 0)
2172+ new_bstart = i;
2173+ continue;
2174+ }
2175+
2176+ if (!unionfs_lower_dentry_idx(dentry, i)->d_inode) {
2177+ dput(unionfs_lower_dentry_idx(dentry, i));
2178+ unionfs_set_lower_dentry_idx(dentry, i, NULL);
2179+
2180+ unionfs_mntput(dentry, i);
2181+ unionfs_set_lower_mnt_idx(dentry, i, NULL);
2182+ } else {
2183+ if (new_bstart < 0)
2184+ new_bstart = i;
2185+ new_bend = i;
2186+ }
2187+ }
2188+
2189+ if (new_bstart < 0)
2190+ new_bstart = bindex;
2191+ if (new_bend < 0)
2192+ new_bend = bindex;
2193+ set_dbstart(dentry, new_bstart);
2194+ set_dbend(dentry, new_bend);
2195+
2196+}
2197+
2198+/* set lower inode ptr and update bstart & bend if necessary */
2199+static void __set_inode(struct dentry *upper, struct dentry *lower,
2200+ int bindex)
2201+{
2202+ unionfs_set_lower_inode_idx(upper->d_inode, bindex,
2203+ igrab(lower->d_inode));
2204+ if (likely(ibstart(upper->d_inode) > bindex))
2205+ ibstart(upper->d_inode) = bindex;
2206+ if (likely(ibend(upper->d_inode) < bindex))
2207+ ibend(upper->d_inode) = bindex;
2208+
2209+}
2210+
2211+/* set lower dentry ptr and update bstart & bend if necessary */
2212+static void __set_dentry(struct dentry *upper, struct dentry *lower,
2213+ int bindex)
2214+{
2215+ unionfs_set_lower_dentry_idx(upper, bindex, lower);
2216+ if (likely(dbstart(upper) > bindex))
2217+ set_dbstart(upper, bindex);
2218+ if (likely(dbend(upper) < bindex))
2219+ set_dbend(upper, bindex);
2220+}
2221+
2222+/*
2223+ * This function replicates the directory structure up-to given dentry
2224+ * in the bindex branch.
2225+ */
2226+struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
2227+ const char *name, int bindex)
2228+{
2229+ int err;
2230+ struct dentry *child_dentry;
2231+ struct dentry *parent_dentry;
2232+ struct dentry *lower_parent_dentry = NULL;
2233+ struct dentry *lower_dentry = NULL;
2234+ const char *childname;
2235+ unsigned int childnamelen;
2236+ int nr_dentry;
2237+ int count = 0;
2238+ int old_bstart;
2239+ int old_bend;
2240+ struct dentry **path = NULL;
2241+ struct super_block *sb;
2242+
2243+ verify_locked(dentry);
2244+
2245+ if ((err = is_robranch_super(dir->i_sb, bindex))) {
2246+ lower_dentry = ERR_PTR(err);
2247+ goto out;
2248+ }
2249+
2250+ old_bstart = dbstart(dentry);
2251+ old_bend = dbend(dentry);
2252+
2253+ lower_dentry = ERR_PTR(-ENOMEM);
2254+
2255+ /* There is no sense allocating any less than the minimum. */
2256+ nr_dentry = 1;
2257+ path = kmalloc(nr_dentry * sizeof(struct dentry *), GFP_KERNEL);
2258+ if (!path)
2259+ goto out;
2260+
2261+ /* assume the negative dentry of unionfs as the parent dentry */
2262+ parent_dentry = dentry;
2263+
2264+ /*
2265+ * This loop finds the first parent that exists in the given branch.
2266+ * We start building the directory structure from there. At the end
2267+ * of the loop, the following should hold:
2268+ * - child_dentry is the first nonexistent child
2269+ * - parent_dentry is the first existent parent
2270+ * - path[0] is the deepest child
2271+ * - path[count] is the first child to create
2272+ */
2273+ do {
2274+ child_dentry = parent_dentry;
2275+
2276+ /* find the parent directory dentry in unionfs */
2277+ parent_dentry = child_dentry->d_parent;
2278+ unionfs_lock_dentry(parent_dentry);
2279+
2280+ /* find out the lower_parent_dentry in the given branch */
2281+ lower_parent_dentry =
2282+ unionfs_lower_dentry_idx(parent_dentry, bindex);
2283+
2284+ /* grow path table */
2285+ if (count == nr_dentry) {
2286+ void *p;
2287+
2288+ nr_dentry *= 2;
2289+ p = krealloc(path, nr_dentry * sizeof(struct dentry *),
2290+ GFP_KERNEL);
2291+ if (!p) {
2292+ lower_dentry = ERR_PTR(-ENOMEM);
2293+ goto out;
2294+ }
2295+ path = p;
2296+ }
2297+
2298+ /* store the child dentry */
2299+ path[count++] = child_dentry;
2300+ } while (!lower_parent_dentry);
2301+ count--;
2302+
2303+ sb = dentry->d_sb;
2304+
2305+ /*
2306+ * This code goes between the begin/end labels and basically
2307+ * emulates a while(child_dentry != dentry), only cleaner and
2308+ * shorter than what would be a much longer while loop.
2309+ */
2310+begin:
2311+ /* get lower parent dir in the current branch */
2312+ lower_parent_dentry = unionfs_lower_dentry_idx(parent_dentry, bindex);
2313+ unionfs_unlock_dentry(parent_dentry);
2314+
2315+ /* init the values to lookup */
2316+ childname = child_dentry->d_name.name;
2317+ childnamelen = child_dentry->d_name.len;
2318+
2319+ if (child_dentry != dentry) {
2320+ /* lookup child in the underlying file system */
2321+ lower_dentry = lookup_one_len(childname, lower_parent_dentry,
2322+ childnamelen);
2323+ if (IS_ERR(lower_dentry))
2324+ goto out;
2325+ } else {
2326+ /*
2327+ * Is the name a whiteout of the child name ? lookup the
2328+ * whiteout child in the underlying file system
2329+ */
2330+ lower_dentry = lookup_one_len(name, lower_parent_dentry,
2331+ strlen(name));
2332+ if (IS_ERR(lower_dentry))
2333+ goto out;
2334+
2335+ /* Replace the current dentry (if any) with the new one */
2336+ dput(unionfs_lower_dentry_idx(dentry, bindex));
2337+ unionfs_set_lower_dentry_idx(dentry, bindex,
2338+ lower_dentry);
2339+
2340+ __cleanup_dentry(dentry, bindex, old_bstart, old_bend);
2341+ goto out;
2342+ }
2343+
2344+ if (lower_dentry->d_inode) {
2345+ /*
2346+ * since this already exists we dput to avoid
2347+ * multiple references on the same dentry
2348+ */
2349+ dput(lower_dentry);
2350+ } else {
2351+ struct sioq_args args;
2352+
2353+ /* it's a negative dentry, create a new dir */
2354+ lower_parent_dentry = lock_parent(lower_dentry);
2355+
2356+ args.mkdir.parent = lower_parent_dentry->d_inode;
2357+ args.mkdir.dentry = lower_dentry;
2358+ args.mkdir.mode = child_dentry->d_inode->i_mode;
2359+
2360+ run_sioq(__unionfs_mkdir, &args);
2361+ err = args.err;
2362+
2363+ if (!err)
2364+ err = copyup_permissions(dir->i_sb, child_dentry,
2365+ lower_dentry);
2366+ unlock_dir(lower_parent_dentry);
2367+ if (err) {
2368+ struct inode *inode = lower_dentry->d_inode;
2369+ /*
2370+ * If we get here, it means that we created a new
2371+ * dentry+inode, but copying permissions failed.
2372+ * Therefore, we should delete this inode and dput
2373+ * the dentry so as not to leave cruft behind.
2374+ */
2375+ if (lower_dentry->d_op && lower_dentry->d_op->d_iput)
2376+ lower_dentry->d_op->d_iput(lower_dentry,
2377+ inode);
2378+ else
2379+ iput(inode);
2380+ lower_dentry->d_inode = NULL;
2381+ dput(lower_dentry);
2382+ lower_dentry = ERR_PTR(err);
2383+ goto out;
2384+ }
2385+
2386+ }
2387+
2388+ __set_inode(child_dentry, lower_dentry, bindex);
2389+ __set_dentry(child_dentry, lower_dentry, bindex);
2390+ /*
2391+ * update times of this dentry, but also the parent, because if
2392+ * we changed, the parent may have changed too.
2393+ */
2394+ unionfs_copy_attr_times(parent_dentry->d_inode);
2395+ unionfs_copy_attr_times(child_dentry->d_inode);
2396+
2397+ parent_dentry = child_dentry;
2398+ child_dentry = path[--count];
2399+ goto begin;
2400+out:
2401+ /* cleanup any leftover locks from the do/while loop above */
2402+ if (IS_ERR(lower_dentry))
2403+ while (count)
2404+ unionfs_unlock_dentry(path[count--]);
2405+ kfree(path);
2406+ return lower_dentry;
2407+}
2408+
2409+/*
2410+ * Post-copyup helper to ensure we have valid mnts: set lower mnt of
2411+ * dentry+parents to the first parent node that has an mnt.
2412+ */
2413+void unionfs_postcopyup_setmnt(struct dentry *dentry)
2414+{
2415+ struct dentry *parent, *hasone;
2416+ int bindex = dbstart(dentry);
2417+
2418+ if (unionfs_lower_mnt_idx(dentry, bindex))
2419+ return;
2420+ hasone = dentry->d_parent;
2421+ /* this loop should stop at root dentry */
2422+ while (!unionfs_lower_mnt_idx(hasone, bindex))
2423+ hasone = hasone->d_parent;
2424+ parent = dentry;
2425+ while (!unionfs_lower_mnt_idx(parent, bindex)) {
2426+ unionfs_set_lower_mnt_idx(parent, bindex,
2427+ unionfs_mntget(hasone, bindex));
2428+ parent = parent->d_parent;
2429+ }
2430+}
2431+
2432+/*
2433+ * Post-copyup helper to release all non-directory source objects of a
2434+ * copied-up file. Regular files should have only one lower object.
2435+ */
2436+void unionfs_postcopyup_release(struct dentry *dentry)
2437+{
2438+ int bindex;
2439+
2440+ BUG_ON(S_ISDIR(dentry->d_inode->i_mode));
2441+ for (bindex=dbstart(dentry)+1; bindex<=dbend(dentry); bindex++) {
2442+ if (unionfs_lower_mnt_idx(dentry, bindex)) {
2443+ unionfs_mntput(dentry, bindex);
2444+ unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
2445+ }
2446+ if (unionfs_lower_dentry_idx(dentry, bindex)) {
2447+ dput(unionfs_lower_dentry_idx(dentry, bindex));
2448+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
2449+ iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
2450+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
2451+ NULL);
2452+ }
2453+ }
2454+ bindex = dbstart(dentry);
2455+ set_dbend(dentry, bindex);
2456+ ibend(dentry->d_inode) = ibstart(dentry->d_inode) = bindex;
2457+}
2458diff --git a/fs/unionfs/debug.c b/fs/unionfs/debug.c
2459new file mode 100644
2460index 0000000..f678534
2461--- /dev/null
2462+++ b/fs/unionfs/debug.c
2463@@ -0,0 +1,502 @@
2464+/*
2465+ * Copyright (c) 2003-2007 Erez Zadok
2466+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
2467+ * Copyright (c) 2003-2007 Stony Brook University
2468+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
2469+ *
2470+ * This program is free software; you can redistribute it and/or modify
2471+ * it under the terms of the GNU General Public License version 2 as
2472+ * published by the Free Software Foundation.
2473+ */
2474+
2475+#include "union.h"
2476+
2477+/*
2478+ * Helper debugging functions for maintainers (and for users to report
2479+ * useful information back to maintainers)
2480+ */
2481+
2482+/* it's always useful to know what part of the code called us */
2483+#define PRINT_CALLER(fname, fxn, line) \
2484+ do { \
2485+ if (!printed_caller) { \
2486+ printk("PC:%s:%s:%d\n",(fname),(fxn),(line)); \
2487+ printed_caller = 1; \
2488+ } \
2489+ } while (0)
2490+
2491+#if BITS_PER_LONG == 32
2492+#define POISONED_PTR ((void*) 0x5a5a5a5a)
2493+#elif BITS_PER_LONG == 64
2494+#define POISONED_PTR ((void*) 0x5a5a5a5a5a5a5a5a)
2495+#else
2496+#error Unknown BITS_PER_LONG value
2497+#endif /* BITS_PER_LONG != known */
2498+
2499+/*
2500+ * __unionfs_check_{inode,dentry,file} perform exhaustive sanity checking on
2501+ * the fan-out of various Unionfs objects. We check that no lower objects
2502+ * exist outside the start/end branch range; that all objects within are
2503+ * non-NULL (with some allowed exceptions); that for every lower file
2504+ * there's a lower dentry+inode; that the start/end ranges match for all
2505+ * corresponding lower objects; that open files/symlinks have only one lower
2506+ * object, but directories can have several; and more.
2507+ */
2508+void __unionfs_check_inode(const struct inode *inode,
2509+ const char *fname, const char *fxn, int line)
2510+{
2511+ int bindex;
2512+ int istart, iend;
2513+ struct inode *lower_inode;
2514+ struct super_block *sb;
2515+ int printed_caller = 0;
2516+
2517+ /* for inodes now */
2518+ BUG_ON(!inode);
2519+ sb = inode->i_sb;
2520+ istart = ibstart(inode);
2521+ iend = ibend(inode);
2522+ if (istart > iend) {
2523+ PRINT_CALLER(fname, fxn, line);
2524+ printk(" Ci0: inode=%p istart/end=%d:%d\n",
2525+ inode, istart, iend);
2526+ }
2527+ if ((istart == -1 && iend != -1) ||
2528+ (istart != -1 && iend == -1)) {
2529+ PRINT_CALLER(fname, fxn, line);
2530+ printk(" Ci1: inode=%p istart/end=%d:%d\n",
2531+ inode, istart, iend);
2532+ }
2533+ if (!S_ISDIR(inode->i_mode)) {
2534+ if (iend != istart) {
2535+ PRINT_CALLER(fname, fxn, line);
2536+ printk(" Ci2: inode=%p istart=%d iend=%d\n",
2537+ inode, istart, iend);
2538+ }
2539+ }
2540+
2541+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2542+ if (!UNIONFS_I(inode)) {
2543+ PRINT_CALLER(fname, fxn, line);
2544+ printk(" Ci3: no inode_info %p\n", inode);
2545+ return;
2546+ }
2547+ if (!UNIONFS_I(inode)->lower_inodes) {
2548+ PRINT_CALLER(fname, fxn, line);
2549+ printk(" Ci4: no lower_inodes %p\n", inode);
2550+ return;
2551+ }
2552+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2553+ if (lower_inode) {
2554+ if (bindex < istart || bindex > iend) {
2555+ PRINT_CALLER(fname, fxn, line);
2556+ printk(" Ci5: inode/linode=%p:%p bindex=%d "
2557+ "istart/end=%d:%d\n", inode,
2558+ lower_inode, bindex, istart, iend);
2559+ } else if (lower_inode == POISONED_PTR) {
2560+ /* freed inode! */
2561+ PRINT_CALLER(fname, fxn, line);
2562+ printk(" Ci6: inode/linode=%p:%p bindex=%d "
2563+ "istart/end=%d:%d\n", inode,
2564+ lower_inode, bindex, istart, iend);
2565+ }
2566+ } else { /* lower_inode == NULL */
2567+ if (bindex >= istart && bindex <= iend) {
2568+ /*
2569+ * directories can have NULL lower inodes in
2570+ * b/t start/end, but NOT if at the
2571+ * start/end range.
2572+ */
2573+ if (!(S_ISDIR(inode->i_mode) &&
2574+ bindex > istart && bindex < iend)) {
2575+ PRINT_CALLER(fname, fxn, line);
2576+ printk(" Ci7: inode/linode=%p:%p "
2577+ "bindex=%d istart/end=%d:%d\n",
2578+ inode, lower_inode, bindex,
2579+ istart, iend);
2580+ }
2581+ }
2582+ }
2583+ }
2584+}
2585+
2586+void __unionfs_check_dentry(const struct dentry *dentry,
2587+ const char *fname, const char *fxn, int line)
2588+{
2589+ int bindex;
2590+ int dstart, dend, istart, iend;
2591+ struct dentry *lower_dentry;
2592+ struct inode *inode, *lower_inode;
2593+ struct super_block *sb;
2594+ struct vfsmount *lower_mnt;
2595+ int printed_caller = 0;
2596+
2597+ BUG_ON(!dentry);
2598+ sb = dentry->d_sb;
2599+ inode = dentry->d_inode;
2600+ dstart = dbstart(dentry);
2601+ dend = dbend(dentry);
2602+ BUG_ON(dstart > dend);
2603+
2604+ if ((dstart == -1 && dend != -1) ||
2605+ (dstart != -1 && dend == -1)) {
2606+ PRINT_CALLER(fname, fxn, line);
2607+ printk(" CD0: dentry=%p dstart/end=%d:%d\n",
2608+ dentry, dstart, dend);
2609+ }
2610+ /*
2611+ * check for NULL dentries inside the start/end range, or
2612+ * non-NULL dentries outside the start/end range.
2613+ */
2614+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2615+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
2616+ if (lower_dentry) {
2617+ if (bindex < dstart || bindex > dend) {
2618+ PRINT_CALLER(fname, fxn, line);
2619+ printk(" CD1: dentry/lower=%p:%p(%p) "
2620+ "bindex=%d dstart/end=%d:%d\n",
2621+ dentry, lower_dentry,
2622+ (lower_dentry ? lower_dentry->d_inode :
2623+ (void *) -1L),
2624+ bindex, dstart, dend);
2625+ }
2626+ } else { /* lower_dentry == NULL */
2627+ if (bindex >= dstart && bindex <= dend) {
2628+ /*
2629+ * Directories can have NULL lower dentries in
2630+ * b/t start/end, but NOT if at the
2631+ * start/end range. Ignore this rule,
2632+ * however, if this is a NULL dentry or a
2633+ * deleted dentry.
2634+ */
2635+ if (!d_deleted((struct dentry *) dentry) &&
2636+ inode &&
2637+ !(inode && S_ISDIR(inode->i_mode) &&
2638+ bindex > dstart && bindex < dend)) {
2639+ PRINT_CALLER(fname, fxn, line);
2640+ printk(" CD2: dentry/lower=%p:%p(%p) "
2641+ "bindex=%d dstart/end=%d:%d\n",
2642+ dentry, lower_dentry,
2643+ (lower_dentry ?
2644+ lower_dentry->d_inode :
2645+ (void *) -1L),
2646+ bindex, dstart, dend);
2647+ }
2648+ }
2649+ }
2650+ }
2651+
2652+ /* check for vfsmounts same as for dentries */
2653+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2654+ lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2655+ if (lower_mnt) {
2656+ if (bindex < dstart || bindex > dend) {
2657+ PRINT_CALLER(fname, fxn, line);
2658+ printk(" CM0: dentry/lmnt=%p:%p bindex=%d "
2659+ "dstart/end=%d:%d\n", dentry,
2660+ lower_mnt, bindex, dstart, dend);
2661+ }
2662+ } else { /* lower_mnt == NULL */
2663+ if (bindex >= dstart && bindex <= dend) {
2664+ /*
2665+ * Directories can have NULL lower mnts in
2666+ * b/t start/end, but NOT if at the
2667+ * start/end range. Ignore this rule,
2668+ * however, if this is a NULL dentry.
2669+ */
2670+ if (inode &&
2671+ !(inode && S_ISDIR(inode->i_mode) &&
2672+ bindex > dstart && bindex < dend)) {
2673+ PRINT_CALLER(fname, fxn, line);
2674+ printk(" CM1: dentry/lmnt=%p:%p "
2675+ "bindex=%d dstart/end=%d:%d\n",
2676+ dentry, lower_mnt, bindex,
2677+ dstart, dend);
2678+ }
2679+ }
2680+ }
2681+ }
2682+
2683+ /* for inodes now */
2684+ if (!inode)
2685+ return;
2686+ istart = ibstart(inode);
2687+ iend = ibend(inode);
2688+ BUG_ON(istart > iend);
2689+ if ((istart == -1 && iend != -1) ||
2690+ (istart != -1 && iend == -1)) {
2691+ PRINT_CALLER(fname, fxn, line);
2692+ printk(" CI0: dentry/inode=%p:%p istart/end=%d:%d\n",
2693+ dentry, inode, istart, iend);
2694+ }
2695+ if (istart != dstart) {
2696+ PRINT_CALLER(fname, fxn, line);
2697+ printk(" CI1: dentry/inode=%p:%p istart=%d dstart=%d\n",
2698+ dentry, inode, istart, dstart);
2699+ }
2700+ if (iend != dend) {
2701+ PRINT_CALLER(fname, fxn, line);
2702+ printk(" CI2: dentry/inode=%p:%p iend=%d dend=%d\n",
2703+ dentry, inode, iend, dend);
2704+ }
2705+
2706+ if (!S_ISDIR(inode->i_mode)) {
2707+ if (dend != dstart) {
2708+ PRINT_CALLER(fname, fxn, line);
2709+ printk(" CI3: dentry/inode=%p:%p dstart=%d dend=%d\n",
2710+ dentry, inode, dstart, dend);
2711+ }
2712+ if (iend != istart) {
2713+ PRINT_CALLER(fname, fxn, line);
2714+ printk(" CI4: dentry/inode=%p:%p istart=%d iend=%d\n",
2715+ dentry, inode, istart, iend);
2716+ }
2717+ }
2718+
2719+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2720+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2721+ if (lower_inode) {
2722+ if (bindex < istart || bindex > iend) {
2723+ PRINT_CALLER(fname, fxn, line);
2724+ printk(" CI5: dentry/linode=%p:%p bindex=%d "
2725+ "istart/end=%d:%d\n", dentry,
2726+ lower_inode, bindex, istart, iend);
2727+ } else if (lower_inode == POISONED_PTR) {
2728+ /* freed inode! */
2729+ PRINT_CALLER(fname, fxn, line);
2730+ printk(" CI6: dentry/linode=%p:%p bindex=%d "
2731+ "istart/end=%d:%d\n", dentry,
2732+ lower_inode, bindex, istart, iend);
2733+ }
2734+ } else { /* lower_inode == NULL */
2735+ if (bindex >= istart && bindex <= iend) {
2736+ /*
2737+ * directories can have NULL lower inodes in
2738+ * b/t start/end, but NOT if at the
2739+ * start/end range.
2740+ */
2741+ if (!(S_ISDIR(inode->i_mode) &&
2742+ bindex > istart && bindex < iend)) {
2743+ PRINT_CALLER(fname, fxn, line);
2744+ printk(" CI7: dentry/linode=%p:%p "
2745+ "bindex=%d istart/end=%d:%d\n",
2746+ dentry, lower_inode, bindex,
2747+ istart, iend);
2748+ }
2749+ }
2750+ }
2751+ }
2752+
2753+ /*
2754+ * If it's a directory, then intermediate objects b/t start/end can
2755+ * be NULL. But, check that all three are NULL: lower dentry, mnt,
2756+ * and inode.
2757+ */
2758+ if (S_ISDIR(inode->i_mode))
2759+ for (bindex = dstart+1; bindex < dend; bindex++) {
2760+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2761+ lower_dentry = unionfs_lower_dentry_idx(dentry,
2762+ bindex);
2763+ lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2764+ if (!((lower_inode && lower_dentry && lower_mnt) ||
2765+ (!lower_inode && !lower_dentry && !lower_mnt))) {
2766+ PRINT_CALLER(fname, fxn, line);
2767+ printk(" Cx: lmnt/ldentry/linode=%p:%p:%p "
2768+ "bindex=%d dstart/end=%d:%d\n",
2769+ lower_mnt, lower_dentry, lower_inode,
2770+ bindex, dstart, dend);
2771+ }
2772+ }
2773+ /* check if lower inode is newer than upper one (it shouldn't be) */
2774+ if (is_newer_lower(dentry)) {
2775+ PRINT_CALLER(fname, fxn, line);
2776+ for (bindex=ibstart(inode); bindex <= ibend(inode); bindex++) {
2777+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2778+ if (!lower_inode)
2779+ continue;
2780+ printk(" CI8: bindex=%d mtime/lmtime=%lu.%lu/%lu.%lu "
2781+ "ctime/lctime=%lu.%lu/%lu.%lu\n",
2782+ bindex,
2783+ inode->i_mtime.tv_sec,
2784+ inode->i_mtime.tv_nsec,
2785+ lower_inode->i_mtime.tv_sec,
2786+ lower_inode->i_mtime.tv_nsec,
2787+ inode->i_ctime.tv_sec,
2788+ inode->i_ctime.tv_nsec,
2789+ lower_inode->i_ctime.tv_sec,
2790+ lower_inode->i_ctime.tv_nsec);
2791+ }
2792+ }
2793+}
2794+
2795+void __unionfs_check_file(const struct file *file,
2796+ const char *fname, const char *fxn, int line)
2797+{
2798+ int bindex;
2799+ int dstart, dend, fstart, fend;
2800+ struct dentry *dentry;
2801+ struct file *lower_file;
2802+ struct inode *inode;
2803+ struct super_block *sb;
2804+ int printed_caller = 0;
2805+
2806+ BUG_ON(!file);
2807+ dentry = file->f_path.dentry;
2808+ sb = dentry->d_sb;
2809+ dstart = dbstart(dentry);
2810+ dend = dbend(dentry);
2811+ BUG_ON(dstart > dend);
2812+ fstart = fbstart(file);
2813+ fend = fbend(file);
2814+ BUG_ON(fstart > fend);
2815+
2816+ if ((fstart == -1 && fend != -1) ||
2817+ (fstart != -1 && fend == -1)) {
2818+ PRINT_CALLER(fname, fxn, line);
2819+ printk(" CF0: file/dentry=%p:%p fstart/end=%d:%d\n",
2820+ file, dentry, fstart, fend);
2821+ }
2822+ if (fstart != dstart) {
2823+ PRINT_CALLER(fname, fxn, line);
2824+ printk(" CF1: file/dentry=%p:%p fstart=%d dstart=%d\n",
2825+ file, dentry, fstart, dstart);
2826+ }
2827+ if (fend != dend) {
2828+ PRINT_CALLER(fname, fxn, line);
2829+ printk(" CF2: file/dentry=%p:%p fend=%d dend=%d\n",
2830+ file, dentry, fend, dend);
2831+ }
2832+ inode = dentry->d_inode;
2833+ if (!S_ISDIR(inode->i_mode)) {
2834+ if (fend != fstart) {
2835+ PRINT_CALLER(fname, fxn, line);
2836+ printk(" CF3: file/inode=%p:%p fstart=%d fend=%d\n",
2837+ file, inode, fstart, fend);
2838+ }
2839+ if (dend != dstart) {
2840+ PRINT_CALLER(fname, fxn, line);
2841+ printk(" CF4: file/dentry=%p:%p dstart=%d dend=%d\n",
2842+ file, dentry, dstart, dend);
2843+ }
2844+ }
2845+
2846+ /*
2847+ * check for NULL lower files inside the start/end range, or
2848+ * non-NULL lower files outside the start/end range.
2849+ */
2850+ for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2851+ lower_file = unionfs_lower_file_idx(file, bindex);
2852+ if (lower_file) {
2853+ if (bindex < fstart || bindex > fend) {
2854+ PRINT_CALLER(fname, fxn, line);
2855+ printk(" CF5: file/lower=%p:%p bindex=%d "
2856+ "fstart/end=%d:%d\n",
2857+ file, lower_file, bindex, fstart, fend);
2858+ }
2859+ } else { /* lower_file == NULL */
2860+ if (bindex >= fstart && bindex <= fend) {
2861+ /*
2862+ * directories can have NULL lower files in
2863+ * between start/end, but NOT at the start or
2864+ * end index itself.
2865+ */
2866+ if (!(S_ISDIR(inode->i_mode) &&
2867+ bindex > fstart && bindex < fend)) {
2868+ PRINT_CALLER(fname, fxn, line);
2869+ printk(" CF6: file/lower=%p:%p "
2870+ "bindex=%d fstart/end=%d:%d\n",
2871+ file, lower_file, bindex,
2872+ fstart, fend);
2873+ }
2874+ }
2875+ }
2876+ }
2877+
2878+ __unionfs_check_dentry(dentry,fname,fxn,line);
2879+}
2880+
2881+/* useful to track vfsmount leaks that could cause EBUSY on unmount */
2882+void __show_branch_counts(const struct super_block *sb,
2883+ const char *file, const char *fxn, int line)
2884+{
2885+ int i;
2886+ struct vfsmount *mnt;
2887+
2888+ printk("BC:");
2889+ for (i=0; i<sbmax(sb); i++) {
2890+ if (sb->s_root)
2891+ mnt = UNIONFS_D(sb->s_root)->lower_paths[i].mnt;
2892+ else
2893+ mnt = NULL;
2894+ printk("%d:", (mnt ? atomic_read(&mnt->mnt_count) : -99));
2895+ }
2896+ printk("%s:%s:%d\n",file,fxn,line);
2897+}
2898+
2899+void __show_inode_times(const struct inode *inode,
2900+ const char *file, const char *fxn, int line)
2901+{
2902+ struct inode *lower_inode;
2903+ int bindex;
2904+
2905+ for (bindex=ibstart(inode); bindex <= ibend(inode); bindex++) {
2906+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2907+ if (!lower_inode)
2908+ continue;
2909+ printk("IT(%lu:%d): ", inode->i_ino, bindex);
2910+ printk("%s:%s:%d ",file,fxn,line);
2911+ printk("um=%lu/%lu lm=%lu/%lu ",
2912+ inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
2913+ lower_inode->i_mtime.tv_sec,
2914+ lower_inode->i_mtime.tv_nsec);
2915+ printk("uc=%lu/%lu lc=%lu/%lu\n",
2916+ inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
2917+ lower_inode->i_ctime.tv_sec,
2918+ lower_inode->i_ctime.tv_nsec);
2919+ }
2920+}
2921+
2922+void __show_dinode_times(const struct dentry *dentry,
2923+ const char *file, const char *fxn, int line)
2924+{
2925+ struct inode *inode = dentry->d_inode;
2926+ struct inode *lower_inode;
2927+ int bindex;
2928+
2929+ for (bindex=ibstart(inode); bindex <= ibend(inode); bindex++) {
2930+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2931+ if (!lower_inode)
2932+ continue;
2933+ printk("DT(%s:%lu:%d): ", dentry->d_name.name, inode->i_ino, bindex);
2934+ printk("%s:%s:%d ",file,fxn,line);
2935+ printk("um=%lu/%lu lm=%lu/%lu ",
2936+ inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
2937+ lower_inode->i_mtime.tv_sec,
2938+ lower_inode->i_mtime.tv_nsec);
2939+ printk("uc=%lu/%lu lc=%lu/%lu\n",
2940+ inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
2941+ lower_inode->i_ctime.tv_sec,
2942+ lower_inode->i_ctime.tv_nsec);
2943+ }
2944+}
2945+
2946+void __show_inode_counts(const struct inode *inode,
2947+ const char *file, const char *fxn, int line)
2948+{
2949+ struct inode *lower_inode;
2950+ int bindex;
2951+
2952+ if (!inode) {
2953+ printk("SiC: Null inode\n");
2954+ return;
2955+ }
2956+ for (bindex=sbstart(inode->i_sb); bindex <= sbend(inode->i_sb); bindex++) {
2957+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
2958+ if (!lower_inode)
2959+ continue;
2960+ printk("SIC(%lu:%d:%d): ", inode->i_ino, bindex,
2961+ atomic_read(&(inode)->i_count));
2962+ printk("lc=%d ", atomic_read(&(lower_inode)->i_count));
2963+ printk("%s:%s:%d\n",file,fxn,line);
2964+ }
2965+}
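The range tests in the file checker that ends above (the CF5 and CF6 messages) amount to a small invariant over an array of per-branch pointers. The following is a userspace sketch only and is not part of the patch: count_range_violations(), its parameters, and the plain pointer array are hypothetical stand-ins for unionfs_lower_file_idx(), fbstart()/fbend() and sbmax().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the CF5/CF6 tests: lower[] holds one
 * pointer per branch and [start, end] is the range expected to be populated.
 * Returns the number of slots that violate the invariant.
 */
static int count_range_violations(void *const lower[], int nbranches,
                                  int start, int end, bool is_dir)
{
    int bindex, bad = 0;

    assert(start <= end);
    for (bindex = 0; bindex < nbranches; bindex++) {
        bool inside = bindex >= start && bindex <= end;
        bool interior = bindex > start && bindex < end;

        if (lower[bindex]) {
            /* CF5: a non-NULL lower object outside the range */
            if (!inside)
                bad++;
        } else if (inside && !(is_dir && interior)) {
            /*
             * CF6: a NULL slot inside the range; only directories may
             * have holes, and only strictly between the two ends.
             */
            bad++;
        }
    }
    return bad;
}
```

A regular file is expected to have start == end, so any NULL slot in its range counts as a violation; a directory may skip branches in the middle of its range.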
2966diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
2967new file mode 100644
2968index 0000000..08b5722
2969--- /dev/null
2970+++ b/fs/unionfs/dentry.c
2971@@ -0,0 +1,481 @@
2972+/*
2973+ * Copyright (c) 2003-2007 Erez Zadok
2974+ * Copyright (c) 2003-2006 Charles P. Wright
2975+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
2976+ * Copyright (c) 2005-2006 Junjiro Okajima
2977+ * Copyright (c) 2005 Arun M. Krishnakumar
2978+ * Copyright (c) 2004-2006 David P. Quigley
2979+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
2980+ * Copyright (c) 2003 Puja Gupta
2981+ * Copyright (c) 2003 Harikesavan Krishnan
2982+ * Copyright (c) 2003-2007 Stony Brook University
2983+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
2984+ *
2985+ * This program is free software; you can redistribute it and/or modify
2986+ * it under the terms of the GNU General Public License version 2 as
2987+ * published by the Free Software Foundation.
2988+ */
2989+
2990+#include "union.h"
2991+
2992+/*
2993+ * Revalidate a single dentry.
2994+ * Assume that dentry's info node is locked.
2995+ * Assume that parent(s) are all valid already, but
2996+ * the child may not yet be valid.
2997+ * Returns true if valid, false otherwise.
2998+ */
2999+static bool __unionfs_d_revalidate_one(struct dentry *dentry,
3000+ struct nameidata *nd)
3001+{
3002+ bool valid = true; /* default is valid */
3003+ struct dentry *lower_dentry;
3004+ int bindex, bstart, bend;
3005+ int sbgen, dgen;
3006+ int positive = 0;
3007+ int locked = 0;
3008+ int interpose_flag;
3009+ struct nameidata lowernd; /* TODO: be gentler to the stack */
3010+
3011+ if (nd)
3012+ memcpy(&lowernd, nd, sizeof(struct nameidata));
3013+ else
3014+ memset(&lowernd, 0, sizeof(struct nameidata));
3015+
3016+ verify_locked(dentry);
3017+
3018+ /* if the dentry is unhashed, do NOT revalidate */
3019+ if (d_deleted(dentry)) {
3020+ dprintk(KERN_DEBUG "unionfs: unhashed dentry being "
3021+ "revalidated: %.*s\n",
3022+ dentry->d_name.len, dentry->d_name.name);
3023+ goto out;
3024+ }
3025+
3026+ BUG_ON(dbstart(dentry) == -1);
3027+ if (dentry->d_inode)
3028+ positive = 1;
3029+ dgen = atomic_read(&UNIONFS_D(dentry)->generation);
3030+ sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation);
3031+ /*
3032+ * If we are working on an unconnected dentry, then there is no
3033+ * revalidation to be done, because this file does not exist within
3034+ * the namespace, and Unionfs operates on the namespace, not data.
3035+ */
3036+ if (sbgen != dgen) {
3037+ struct dentry *result;
3038+ int pdgen;
3039+
3040+ /* The root entry should always be valid */
3041+ BUG_ON(IS_ROOT(dentry));
3042+
3043+ /* We can't work correctly if our parent isn't valid. */
3044+ pdgen = atomic_read(&UNIONFS_D(dentry->d_parent)->generation);
3045+ BUG_ON(pdgen != sbgen); /* should never happen here */
3046+
3047+ /* Free the pointers for our inodes and this dentry. */
3048+ bstart = dbstart(dentry);
3049+ bend = dbend(dentry);
3050+ if (bstart >= 0) {
3051+ struct dentry *lower_dentry;
3052+ for (bindex = bstart; bindex <= bend; bindex++) {
3053+ lower_dentry =
3054+ unionfs_lower_dentry_idx(dentry,
3055+ bindex);
3056+ dput(lower_dentry);
3057+ }
3058+ }
3059+ set_dbstart(dentry, -1);
3060+ set_dbend(dentry, -1);
3061+
3062+ interpose_flag = INTERPOSE_REVAL_NEG;
3063+ if (positive) {
3064+ interpose_flag = INTERPOSE_REVAL;
3065+ /*
3066+ * During BRM, the VFS could already hold a lock on
3067+ * a file being read, so don't lock it again
3068+ * (deadlock), but if you lock it in this function,
3069+ * then release it here too.
3070+ */
3071+ if (!mutex_is_locked(&dentry->d_inode->i_mutex)) {
3072+ mutex_lock(&dentry->d_inode->i_mutex);
3073+ locked = 1;
3074+ }
3075+
3076+ bstart = ibstart(dentry->d_inode);
3077+ bend = ibend(dentry->d_inode);
3078+ if (bstart >= 0) {
3079+ struct inode *lower_inode;
3080+ for (bindex = bstart; bindex <= bend;
3081+ bindex++) {
3082+ lower_inode =
3083+ unionfs_lower_inode_idx(
3084+ dentry->d_inode,
3085+ bindex);
3086+ iput(lower_inode);
3087+ }
3088+ }
3089+ kfree(UNIONFS_I(dentry->d_inode)->lower_inodes);
3090+ UNIONFS_I(dentry->d_inode)->lower_inodes = NULL;
3091+ ibstart(dentry->d_inode) = -1;
3092+ ibend(dentry->d_inode) = -1;
3093+ if (locked)
3094+ mutex_unlock(&dentry->d_inode->i_mutex);
3095+ }
3096+
3097+ result = unionfs_lookup_backend(dentry, &lowernd,
3098+ interpose_flag);
3099+ if (result) {
3100+ if (IS_ERR(result)) {
3101+ valid = false;
3102+ goto out;
3103+ }
3104+ /*
3105+ * current unionfs_lookup_backend() doesn't return
3106+ * a valid dentry
3107+ */
3108+ dput(dentry);
3109+ dentry = result;
3110+ }
3111+
3112+ if (positive && UNIONFS_I(dentry->d_inode)->stale) {
3113+ make_bad_inode(dentry->d_inode);
3114+ d_drop(dentry);
3115+ valid = false;
3116+ goto out;
3117+ }
3118+ goto out;
3119+ }
3120+
3121+ /* The revalidation must occur across all branches */
3122+ bstart = dbstart(dentry);
3123+ bend = dbend(dentry);
3124+ BUG_ON(bstart == -1);
3125+ for (bindex = bstart; bindex <= bend; bindex++) {
3126+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3127+ if (!lower_dentry || !lower_dentry->d_op
3128+ || !lower_dentry->d_op->d_revalidate)
3129+ continue;
3130+ if (!lower_dentry->d_op->d_revalidate(lower_dentry,
3131+ &lowernd))
3132+ valid = false;
3133+ }
3134+
3135+ if (!dentry->d_inode)
3136+ valid = false;
3137+
3138+ if (valid) {
3139+ /*
3140+ * If we get here, and we copy the meta-data from the lower
3141+ * inode to our inode, then it is vital that we have already
3142+ * purged all unionfs-level file data. We do that in the
3143+ * caller (__unionfs_d_revalidate_chain) by calling
3144+ * purge_inode_data.
3145+ */
3146+ unionfs_copy_attr_all(dentry->d_inode,
3147+ unionfs_lower_inode(dentry->d_inode));
3148+ fsstack_copy_inode_size(dentry->d_inode,
3149+ unionfs_lower_inode(dentry->d_inode));
3150+ }
3151+
3152+out:
3153+ return valid;
3154+}
3155+
3156+/*
3157+ * Determine if the lower inode objects have changed from below the unionfs
3158+ * inode. Return true if changed, false otherwise.
3159+ */
3160+bool is_newer_lower(const struct dentry *dentry)
3161+{
3162+ int bindex;
3163+ struct inode *inode;
3164+ struct inode *lower_inode;
3165+
3166+ /* ignore if we're called on semi-initialized dentries/inodes */
3167+ if (!dentry || !UNIONFS_D(dentry))
3168+ return false;
3169+ inode = dentry->d_inode;
3170+ if (!inode || !UNIONFS_I(inode) ||
3171+ ibstart(inode) < 0 || ibend(inode) < 0)
3172+ return false;
3173+
3174+ for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3175+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
3176+ if (!lower_inode)
3177+ continue;
3178+ /*
3179+ * We may want to apply other tests to determine if the
3180+ * lower inode's data has changed, but checking for changed
3181+ * ctime and mtime on the lower inode should be enough.
3182+ */
3183+ if (timespec_compare(&inode->i_mtime,
3184+ &lower_inode->i_mtime) < 0) {
3185+ printk("unionfs: new lower inode mtime "
3186+ "(bindex=%d, name=%s)\n", bindex,
3187+ dentry->d_name.name);
3188+ show_dinode_times(dentry);
3189+ return true; /* mtime changed! */
3190+ }
3191+ if (timespec_compare(&inode->i_ctime,
3192+ &lower_inode->i_ctime) < 0) {
3193+ printk("unionfs: new lower inode ctime "
3194+ "(bindex=%d, name=%s)\n", bindex,
3195+ dentry->d_name.name);
3196+ show_dinode_times(dentry);
3197+ return true; /* ctime changed! */
3198+ }
3199+ }
3200+ return false; /* default: lower is not newer */
3201+}
3202+
3203+/*
3204+ * Purge/remove/unmap all data pages of a unionfs inode. This is called
3205+ * when the lower inode has changed, and we have to force processes to get
3206+ * the new data.
3207+ *
3208+ * XXX: Our implementation works as long as a user process causes Unionfs
3209+ * to be called, directly or indirectly, even if only to do
3210+ * ->d_revalidate; at that point we purge the current Unionfs data and the
3211+ * process will see the new data. For example, a process that continually
3212+ * re-reads the same file's data will see the NEW data as soon as the lower
3213+ * file had changed, upon the next read(2) syscall (even if the file is
3214+ * still open!) However, this doesn't work when the process re-reads the
3215+ * open file's data via mmap(2) (unless the user unmaps/closes the file and
3216+ * remaps/reopens it). Once we respond to ->readpage(s), then the kernel
3217+ * maps the page into the process's address space and there doesn't appear
3218+ * to be a way to force the kernel to invalidate those pages/mappings, and
3219+ * force the process to re-issue ->readpage. If there's a way to invalidate
3220+ * active mappings and force a ->readpage, let us know please
3221+ * (invalidate_inode_pages2 doesn't do the trick).
3222+ */
3223+static inline void purge_inode_data(struct inode *inode)
3224+{
3225+ /* remove all non-private mappings */
3226+ unmap_mapping_range(inode->i_mapping, 0, 0, 0);
3227+
3228+ if (inode->i_data.nrpages)
3229+ truncate_inode_pages(&inode->i_data, 0);
3230+}
3231+
3232+/*
3233+ * Revalidate a parent chain of dentries, then the actual node.
3234+ * Assumes that dentry is locked, but will lock all parents if/when needed.
3235+ *
3236+ * If 'willwrite' is true, and the lower inode times are not in sync, then
3237+ * *don't* purge_inode_data, as it could deadlock if ->write calls us and we
3238+ * try to truncate a locked page. Besides, if unionfs is about to write
3239+ * data to a file, then the data unionfs is about to write is more
3240+ * authoritative than what's below, therefore we can safely overwrite the
3241+ * lower inode times and data.
3242+ */
3243+bool __unionfs_d_revalidate_chain(struct dentry *dentry, struct nameidata *nd,
3244+ bool willwrite)
3245+{
3246+ bool valid = false; /* default is invalid */
3247+ struct dentry **chain = NULL; /* chain of dentries to reval */
3248+ int chain_len = 0;
3249+ struct dentry *dtmp;
3250+ int sbgen, dgen, i;
3251+ int saved_bstart, saved_bend, bindex;
3252+
3253+ /* find length of chain needed to revalidate */
3254+ /* XXX: should I grab some global (dcache?) lock? */
3255+ chain_len = 0;
3256+ sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation);
3257+ dtmp = dentry->d_parent;
3258+ dgen = atomic_read(&UNIONFS_D(dtmp)->generation);
3259+ /* XXX: should we check if is_newer_lower all the way up? */
3260+ if (is_newer_lower(dtmp)) {
3261+ /*
3262+ * Special case: the root dentry's generation number must
3263+ * always be valid, but its lower inode times don't have to
3264+ * be, so sync up the times only.
3265+ */
3266+ if (IS_ROOT(dtmp))
3267+ unionfs_copy_attr_times(dtmp->d_inode);
3268+ else {
3269+ /*
3270+ * reset generation number to zero, guaranteed to be
3271+ * "old"
3272+ */
3273+ dgen = 0;
3274+ atomic_set(&UNIONFS_D(dtmp)->generation, dgen);
3275+ }
3276+ purge_inode_data(dtmp->d_inode);
3277+ }
3278+ while (sbgen != dgen) {
3279+ /* The root entry should always be valid */
3280+ BUG_ON(IS_ROOT(dtmp));
3281+ chain_len++;
3282+ dtmp = dtmp->d_parent;
3283+ dgen = atomic_read(&UNIONFS_D(dtmp)->generation);
3284+ }
3285+ if (chain_len == 0)
3286+ goto out_this; /* shortcut if parents are OK */
3287+
3288+ /*
3289+ * Allocate array of dentries to reval. We could use linked lists,
3290+ * but the number of entries we need to alloc here is often small,
3291+ * and short lived, so locality will be better.
3292+ */
3293+ chain = kzalloc(chain_len * sizeof(struct dentry *), GFP_KERNEL);
3294+ if (!chain) {
3295+ printk("unionfs: no more memory in %s\n", __FUNCTION__);
3296+ goto out;
3297+ }
3298+
3299+ /*
3300+ * take a reference on every dentry in the chain, walking from child
3301+ * to parent; no locks are taken here (each dentry is locked below).
3302+ */
3303+ dtmp = dentry->d_parent;
3304+ for (i=chain_len-1; i>=0; i--) {
3305+ chain[i] = dget(dtmp);
3306+ dtmp = dtmp->d_parent;
3307+ }
3308+
3309+ /*
3310+ * call __unionfs_d_revalidate_one() on each dentry, but in parent
3311+ * to child order.
3312+ */
3313+ for (i=0; i<chain_len; i++) {
3314+ unionfs_lock_dentry(chain[i]);
3315+ saved_bstart = dbstart(chain[i]);
3316+ saved_bend = dbend(chain[i]);
3317+ sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation);
3318+ dgen = atomic_read(&UNIONFS_D(chain[i])->generation);
3319+
3320+ valid = __unionfs_d_revalidate_one(chain[i], nd);
3321+ /* XXX: is this the correct mntput condition?! */
3322+ if (valid && chain_len > 0 &&
3323+ sbgen != dgen && chain[i]->d_inode &&
3324+ S_ISDIR(chain[i]->d_inode->i_mode)) {
3325+ for (bindex = saved_bstart; bindex <= saved_bend;
3326+ bindex++)
3327+ unionfs_mntput(chain[i], bindex);
3328+ }
3329+ unionfs_unlock_dentry(chain[i]);
3330+
3331+ if (!valid)
3332+ goto out_free;
3333+ }
3334+
3335+
3336+out_this:
3337+ /* finally, revalidate this dentry, which the caller must have locked */
3338+ verify_locked(dentry);
3339+ dgen = atomic_read(&UNIONFS_D(dentry)->generation);
3340+ if (is_newer_lower(dentry)) {
3341+ /* root dentry special case as aforementioned */
3342+ if (IS_ROOT(dentry))
3343+ unionfs_copy_attr_times(dentry->d_inode);
3344+ else {
3345+ /*
3346+ * reset generation number to zero, guaranteed to be
3347+ * "old"
3348+ */
3349+ dgen = 0;
3350+ atomic_set(&UNIONFS_D(dentry)->generation, dgen);
3351+ }
3352+ if (!willwrite)
3353+ purge_inode_data(dentry->d_inode);
3354+ }
3355+ valid = __unionfs_d_revalidate_one(dentry, nd);
3356+
3357+ /*
3358+ * If __unionfs_d_revalidate_one() succeeded above, then it will
3359+ * have incremented the refcnt of the mnt's, but also the branch
3360+ * indices of the dentry will have been updated (to take into
3361+ * account any branch insertions/deletions). So the current
3362+ * dbstart/dbend match the current, and new, indices of the mnts
3363+ * which __unionfs_d_revalidate_one has incremented. Note: the "if"
3364+ * test below does not depend on whether chain_len was 0 or greater.
3365+ */
3366+ if (valid && sbgen != dgen)
3367+ for (bindex = dbstart(dentry);
3368+ bindex <= dbend(dentry);
3369+ bindex++)
3370+ unionfs_mntput(dentry, bindex);
3371+
3372+out_free:
3373+ /* dput all dentries in chain and return status */
3374+ if (chain_len > 0) {
3375+ for (i=0; i<chain_len; i++)
3376+ dput(chain[i]);
3377+ kfree(chain);
3378+ }
3379+out:
3380+ return valid;
3381+}
3382+
3383+static int unionfs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
3384+{
3385+ int err;
3386+
3387+ unionfs_read_lock(dentry->d_sb);
3388+
3389+ unionfs_lock_dentry(dentry);
3390+ err = __unionfs_d_revalidate_chain(dentry, nd, false);
3391+ unionfs_unlock_dentry(dentry);
3392+ if (err > 0) /* true==1: dentry is valid */
3393+ unionfs_check_dentry(dentry);
3394+
3395+ unionfs_read_unlock(dentry->d_sb);
3396+
3397+ return err;
3398+}
3399+
3400+/*
3401+ * At this point no one can reference this dentry, so we don't have to be
3402+ * careful about concurrent access.
3403+ */
3404+static void unionfs_d_release(struct dentry *dentry)
3405+{
3406+ int bindex, bstart, bend;
3407+
3408+ unionfs_read_lock(dentry->d_sb);
3409+
3410+ unionfs_check_dentry(dentry);
3411+ /* this could be a negative dentry, so check first */
3412+ if (!UNIONFS_D(dentry)) {
3413+ printk(KERN_DEBUG "unionfs: dentry without private data: %.*s\n",
3414+ dentry->d_name.len, dentry->d_name.name);
3415+ goto out;
3416+ } else if (dbstart(dentry) < 0) {
3417+ /* this is due to a failed lookup */
3418+ printk(KERN_DEBUG "unionfs: dentry without lower "
3419+ "dentries: %.*s\n",
3420+ dentry->d_name.len, dentry->d_name.name);
3421+ goto out_free;
3422+ }
3423+
3424+ /* Release all the lower dentries */
3425+ bstart = dbstart(dentry);
3426+ bend = dbend(dentry);
3427+ for (bindex = bstart; bindex <= bend; bindex++) {
3428+ dput(unionfs_lower_dentry_idx(dentry, bindex));
3429+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
3430+ /* NULL lower mnt is ok if this is a negative dentry */
3431+ if (!dentry->d_inode && !unionfs_lower_mnt_idx(dentry,bindex))
3432+ continue;
3433+ unionfs_mntput(dentry, bindex);
3434+ unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
3435+ }
3436+ /* free private data (unionfs_dentry_info) here */
3437+ kfree(UNIONFS_D(dentry)->lower_paths);
3438+ UNIONFS_D(dentry)->lower_paths = NULL;
3439+
3440+out_free:
3441+ /* No need to unlock it, because it is going away. */
3442+ free_dentry_private_data(dentry);
3443+
3444+out:
3445+ unionfs_read_unlock(dentry->d_sb);
3446+ return;
3447+}
3448+
3449+struct dentry_operations unionfs_dops = {
3450+ .d_revalidate = unionfs_d_revalidate,
3451+ .d_release = unionfs_d_release,
3452+};
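The parent-chain handling in __unionfs_d_revalidate_chain above can be hard to follow inside the diff. The sketch below is a userspace model only, assuming each node carries a generation counter that is compared with one filesystem-wide value; struct node, stale_chain_len() and collect_chain() are hypothetical names, not unionfs API. It shows the two passes: count the stale ancestors walking from child to parent, then fill an array from its end so that iterating it forward visits parents before children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: a parent link and a generation. */
struct node {
    struct node *parent;    /* the root points to itself, like a root dentry */
    int generation;
};

/* Count consecutive stale ancestors of n, starting at its parent. */
static int stale_chain_len(const struct node *n, int sbgen)
{
    const struct node *p = n->parent;
    int len = 0;

    while (p->generation != sbgen) {
        assert(p->parent != p);     /* the root must always be current */
        len++;
        p = p->parent;
    }
    return len;
}

/* Fill chain[0..len-1] so that chain[0] is the topmost stale ancestor. */
static void collect_chain(struct node *n, struct node **chain, int len)
{
    struct node *p = n->parent;
    int i;

    for (i = len - 1; i >= 0; i--) {
        chain[i] = p;
        p = p->parent;
    }
}
```

Filling the array backwards keeps the later revalidation loop a plain forward iteration, which matches the requirement that a parent is valid before its child is examined.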
3453diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c
3454new file mode 100644
3455index 0000000..c923e58
3456--- /dev/null
3457+++ b/fs/unionfs/dirfops.c
3458@@ -0,0 +1,278 @@
3459+/*
3460+ * Copyright (c) 2003-2007 Erez Zadok
3461+ * Copyright (c) 2003-2006 Charles P. Wright
3462+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3463+ * Copyright (c) 2005-2006 Junjiro Okajima
3464+ * Copyright (c) 2005 Arun M. Krishnakumar
3465+ * Copyright (c) 2004-2006 David P. Quigley
3466+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3467+ * Copyright (c) 2003 Puja Gupta
3468+ * Copyright (c) 2003 Harikesavan Krishnan
3469+ * Copyright (c) 2003-2007 Stony Brook University
3470+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
3471+ *
3472+ * This program is free software; you can redistribute it and/or modify
3473+ * it under the terms of the GNU General Public License version 2 as
3474+ * published by the Free Software Foundation.
3475+ */
3476+
3477+#include "union.h"
3478+
3479+/* Make sure our rdstate is playing by the rules. */
3480+static void verify_rdstate_offset(struct unionfs_dir_state *rdstate)
3481+{
3482+ BUG_ON(rdstate->offset >= DIREOF);
3483+ BUG_ON(rdstate->cookie >= MAXRDCOOKIE);
3484+}
3485+
3486+struct unionfs_getdents_callback {
3487+ struct unionfs_dir_state *rdstate;
3488+ void *dirent;
3489+ int entries_written;
3490+ int filldir_called;
3491+ int filldir_error;
3492+ filldir_t filldir;
3493+ struct super_block *sb;
3494+};
3495+
3496+/* based on generic filldir in fs/readdir.c */
3497+static int unionfs_filldir(void *dirent, const char *name, int namelen,
3498+ loff_t offset, u64 ino, unsigned int d_type)
3499+{
3500+ struct unionfs_getdents_callback *buf = dirent;
3501+ struct filldir_node *found = NULL;
3502+ int err = 0;
3503+ int is_wh_entry = 0;
3504+
3505+ buf->filldir_called++;
3506+
3507+ if ((namelen > UNIONFS_WHLEN) &&
3508+ !strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN)) {
3509+ name += UNIONFS_WHLEN;
3510+ namelen -= UNIONFS_WHLEN;
3511+ is_wh_entry = 1;
3512+ }
3513+
3514+ found = find_filldir_node(buf->rdstate, name, namelen);
3515+
3516+ if (found)
3517+ goto out;
3518+
3519+ /* if 'name' isn't a whiteout, filldir it. */
3520+ if (!is_wh_entry) {
3521+ off_t pos = rdstate2offset(buf->rdstate);
3522+ u64 unionfs_ino = ino;
3523+
3524+ if (!err) {
3525+ err = buf->filldir(buf->dirent, name, namelen, pos,
3526+ unionfs_ino, d_type);
3527+ buf->rdstate->offset++;
3528+ verify_rdstate_offset(buf->rdstate);
3529+ }
3530+ }
3531+ /*
3532+ * If we did fill it, stuff it in our hash, otherwise return an
3533+ * error.
3534+ */
3535+ if (err) {
3536+ buf->filldir_error = err;
3537+ goto out;
3538+ }
3539+ buf->entries_written++;
3540+ if ((err = add_filldir_node(buf->rdstate, name, namelen,
3541+ buf->rdstate->bindex, is_wh_entry)))
3542+ buf->filldir_error = err;
3543+
3544+out:
3545+ return err;
3546+}
3547+
3548+static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
3549+{
3550+ int err = 0;
3551+ struct file *lower_file = NULL;
3552+ struct inode *inode = NULL;
3553+ struct unionfs_getdents_callback buf;
3554+ struct unionfs_dir_state *uds;
3555+ int bend;
3556+ loff_t offset;
3557+
3558+ unionfs_read_lock(file->f_path.dentry->d_sb);
3559+
3560+ if ((err = unionfs_file_revalidate(file, false)))
3561+ goto out;
3562+
3563+ inode = file->f_path.dentry->d_inode;
3564+
3565+ uds = UNIONFS_F(file)->rdstate;
3566+ if (!uds) {
3567+ if (file->f_pos == DIREOF) {
3568+ goto out;
3569+ } else if (file->f_pos > 0) {
3570+ uds = find_rdstate(inode, file->f_pos);
3571+ if (!uds) {
3572+ err = -ESTALE;
3573+ goto out;
3574+ }
3575+ UNIONFS_F(file)->rdstate = uds;
3576+ } else {
3577+ init_rdstate(file);
3578+ uds = UNIONFS_F(file)->rdstate;
3579+ }
3580+ }
3581+ bend = fbend(file);
3582+
3583+ while (uds->bindex <= bend) {
3584+ lower_file = unionfs_lower_file_idx(file, uds->bindex);
3585+ if (!lower_file) {
3586+ uds->bindex++;
3587+ uds->dirpos = 0;
3588+ continue;
3589+ }
3590+
3591+ /* prepare callback buffer */
3592+ buf.filldir_called = 0;
3593+ buf.filldir_error = 0;
3594+ buf.entries_written = 0;
3595+ buf.dirent = dirent;
3596+ buf.filldir = filldir;
3597+ buf.rdstate = uds;
3598+ buf.sb = inode->i_sb;
3599+
3600+ /* Read starting from where we last left off. */
3601+ offset = vfs_llseek(lower_file, uds->dirpos, SEEK_SET);
3602+ if (offset < 0) {
3603+ err = offset;
3604+ goto out;
3605+ }
3606+ err = vfs_readdir(lower_file, unionfs_filldir, &buf);
3607+
3608+ /* Save the position for when we continue. */
3609+ offset = vfs_llseek(lower_file, 0, SEEK_CUR);
3610+ if (offset < 0) {
3611+ err = offset;
3612+ goto out;
3613+ }
3614+ uds->dirpos = offset;
3615+
3616+ /* Copy the atime. */
3617+ fsstack_copy_attr_atime(inode, lower_file->f_path.dentry->d_inode);
3618+
3619+ if (err < 0)
3620+ goto out;
3621+
3622+ if (buf.filldir_error)
3623+ break;
3624+
3625+ if (!buf.entries_written) {
3626+ uds->bindex++;
3627+ uds->dirpos = 0;
3628+ }
3629+ }
3630+
3631+ if (!buf.filldir_error && uds->bindex >= bend) {
3632+ /* Save the number of hash entries for next time. */
3633+ UNIONFS_I(inode)->hashsize = uds->hashentries;
3634+ free_rdstate(uds);
3635+ UNIONFS_F(file)->rdstate = NULL;
3636+ file->f_pos = DIREOF;
3637+ } else
3638+ file->f_pos = rdstate2offset(uds);
3639+
3640+out:
3641+ unionfs_read_unlock(file->f_path.dentry->d_sb);
3642+ return err;
3643+}
3644+
3645+/*
3646+ * This is not meant to be a generic repositioning function. If you do
3647+ * things that aren't supported, then we return EINVAL.
3648+ *
3649+ * What is allowed:
3650+ * (1) seeking to the same position that you are currently at
3651+ * This really has no effect, but returns where you are.
3652+ * (2) seeking to the beginning of the file
3653+ * This throws out all state, and lets you begin again.
3654+ */
3655+static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
3656+{
3657+ struct unionfs_dir_state *rdstate;
3658+ loff_t err;
3659+
3660+ unionfs_read_lock(file->f_path.dentry->d_sb);
3661+
3662+ if ((err = unionfs_file_revalidate(file, false)))
3663+ goto out;
3664+
3665+ rdstate = UNIONFS_F(file)->rdstate;
3666+
3667+ /*
3668+ * we let users seek to their current position or back to the start,
3669+ * but not anywhere else.
3670+ */
3671+ if (!offset) {
3672+ switch (origin) {
3673+ case SEEK_SET:
3674+ if (rdstate) {
3675+ free_rdstate(rdstate);
3676+ UNIONFS_F(file)->rdstate = NULL;
3677+ }
3678+ init_rdstate(file);
3679+ err = 0;
3680+ break;
3681+ case SEEK_CUR:
3682+ err = file->f_pos;
3683+ break;
3684+ case SEEK_END:
3685+ /* Unsupported, because we would break everything. */
3686+ err = -EINVAL;
3687+ break;
3688+ }
3689+ } else {
3690+ switch (origin) {
3691+ case SEEK_SET:
3692+ if (rdstate) {
3693+ if (offset == rdstate2offset(rdstate))
3694+ err = offset;
3695+ else if (file->f_pos == DIREOF)
3696+ err = DIREOF;
3697+ else
3698+ err = -EINVAL;
3699+ } else {
3700+ rdstate = find_rdstate(file->f_path.dentry->d_inode,
3701+ offset);
3702+ if (rdstate) {
3703+ UNIONFS_F(file)->rdstate = rdstate;
3704+ err = rdstate->offset;
3705+ } else
3706+ err = -EINVAL;
3707+ }
3708+ break;
3709+ case SEEK_CUR:
3710+ case SEEK_END:
3711+ /* Unsupported, because we would break everything. */
3712+ err = -EINVAL;
3713+ break;
3714+ }
3715+ }
3716+
3717+out:
3718+ unionfs_read_unlock(file->f_path.dentry->d_sb);
3719+ return err;
3720+}
3721+
3722+/*
3723+ * Trimmed directory operations; we shouldn't pass everything down since
3724+ * we don't want to operate on partial directories.
3725+ */
3726+struct file_operations unionfs_dir_fops = {
3727+ .llseek = unionfs_dir_llseek,
3728+ .read = generic_read_dir,
3729+ .readdir = unionfs_readdir,
3730+ .unlocked_ioctl = unionfs_ioctl,
3731+ .open = unionfs_open,
3732+ .release = unionfs_file_release,
3733+ .flush = unionfs_flush,
3734+ .fsync = unionfs_fsync,
3735+ .fasync = unionfs_fasync,
3736+};
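unionfs_filldir above merges names from several branches: a name is reported once, and a whiteout entry in a higher-priority branch hides the same name in lower ones. The following userspace sketch models that with a fixed-size list instead of the rdstate hash. WH_PREFIX is assumed here to be ".wh." (the patch only refers to UNIONFS_WHPFX, whose value is defined elsewhere), and struct seen, already_seen() and merge_name() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define WH_PREFIX ".wh."            /* assumed value of the whiteout prefix */
#define WH_LEN (sizeof(WH_PREFIX) - 1)
#define MAX_SEEN 64

struct seen {
    const char *names[MAX_SEEN];    /* names already handled, whiteouts included */
    int count;
};

static bool already_seen(const struct seen *s, const char *name)
{
    int i;

    for (i = 0; i < s->count; i++)
        if (!strcmp(s->names[i], name))
            return true;
    return false;
}

/*
 * Feed one raw directory entry, highest-priority branch first.
 * Returns true if the name should be shown to the caller.
 */
static bool merge_name(struct seen *s, const char *name)
{
    bool is_wh = false;

    if (strlen(name) > WH_LEN && !strncmp(name, WH_PREFIX, WH_LEN)) {
        name += WH_LEN;             /* remember the hidden name, not the marker */
        is_wh = true;
    }
    if (already_seen(s, name))
        return false;               /* duplicate, or masked by an earlier whiteout */
    assert(s->count < MAX_SEEN);
    s->names[s->count++] = name;
    return !is_wh;                  /* whiteouts are recorded but never listed */
}
```

As in the patch, a name that consists of the prefix alone is not treated as a whiteout, because the length test is strict.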
3737diff --git a/fs/unionfs/dirhelper.c b/fs/unionfs/dirhelper.c
3738new file mode 100644
3739index 0000000..a72f711
3740--- /dev/null
3741+++ b/fs/unionfs/dirhelper.c
3742@@ -0,0 +1,271 @@
3743+/*
3744+ * Copyright (c) 2003-2007 Erez Zadok
3745+ * Copyright (c) 2003-2006 Charles P. Wright
3746+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3747+ * Copyright (c) 2005-2006 Junjiro Okajima
3748+ * Copyright (c) 2005 Arun M. Krishnakumar
3749+ * Copyright (c) 2004-2006 David P. Quigley
3750+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3751+ * Copyright (c) 2003 Puja Gupta
3752+ * Copyright (c) 2003 Harikesavan Krishnan
3753+ * Copyright (c) 2003-2007 Stony Brook University
3754+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
3755+ *
3756+ * This program is free software; you can redistribute it and/or modify
3757+ * it under the terms of the GNU General Public License version 2 as
3758+ * published by the Free Software Foundation.
3759+ */
3760+
3761+#include "union.h"
3762+
3763+/*
3764+ * Delete all of the whiteouts in a given directory for rmdir.
3765+ *
3766+ * lower directory inode should be locked
3767+ */
3768+int do_delete_whiteouts(struct dentry *dentry, int bindex,
3769+ struct unionfs_dir_state *namelist)
3770+{
3771+ int err = 0;
3772+ struct dentry *lower_dir_dentry = NULL;
3773+ struct dentry *lower_dentry;
3774+ char *name = NULL, *p;
3775+ struct inode *lower_dir;
3776+ int i;
3777+ struct list_head *pos;
3778+ struct filldir_node *cursor;
3779+
3780+ /* Find out lower parent dentry */
3781+ lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3782+ BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
3783+ lower_dir = lower_dir_dentry->d_inode;
3784+ BUG_ON(!S_ISDIR(lower_dir->i_mode));
3785+
3786+ err = -ENOMEM;
3787+ name = __getname();
3788+ if (!name)
3789+ goto out;
3790+ strcpy(name, UNIONFS_WHPFX);
3791+ p = name + UNIONFS_WHLEN;
3792+
3793+ err = 0;
3794+ for (i = 0; !err && i < namelist->size; i++) {
3795+ list_for_each(pos, &namelist->list[i]) {
3796+ cursor =
3797+ list_entry(pos, struct filldir_node,
3798+ file_list);
3799+ /* Only operate on whiteouts in this branch. */
3800+ if (cursor->bindex != bindex)
3801+ continue;
3802+ if (!cursor->whiteout)
3803+ continue;
3804+
3805+ strcpy(p, cursor->name);
3806+ lower_dentry =
3807+ lookup_one_len(name, lower_dir_dentry,
3808+ cursor->namelen +
3809+ UNIONFS_WHLEN);
3810+ if (IS_ERR(lower_dentry)) {
3811+ err = PTR_ERR(lower_dentry);
3812+ break;
3813+ }
3814+ if (lower_dentry->d_inode)
3815+ err = vfs_unlink(lower_dir, lower_dentry);
3816+ dput(lower_dentry);
3817+ if (err)
3818+ break;
3819+ }
3820+ }
3821+
3822+ __putname(name);
3823+
3824+ /* After all of the removals, we should copy the attributes once. */
3825+ fsstack_copy_attr_times(dentry->d_inode, lower_dir_dentry->d_inode);
3826+
3827+out:
3828+ return err;
3829+}
3830+
3831+/* delete whiteouts in a dir (for rmdir operation) using sioq if necessary */
3832+int delete_whiteouts(struct dentry *dentry, int bindex,
3833+ struct unionfs_dir_state *namelist)
3834+{
3835+ int err;
3836+ struct super_block *sb;
3837+ struct dentry *lower_dir_dentry;
3838+ struct inode *lower_dir;
3839+ struct sioq_args args;
3840+
3841+ sb = dentry->d_sb;
3842+
3843+ BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
3844+ BUG_ON(bindex < dbstart(dentry));
3845+ BUG_ON(bindex > dbend(dentry));
3846+ err = is_robranch_super(sb, bindex);
3847+ if (err)
3848+ goto out;
3849+
3850+ lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3851+ BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
3852+ lower_dir = lower_dir_dentry->d_inode;
3853+ BUG_ON(!S_ISDIR(lower_dir->i_mode));
3854+
3855+ mutex_lock(&lower_dir->i_mutex);
3856+ if (!permission(lower_dir, MAY_WRITE | MAY_EXEC, NULL))
3857+ err = do_delete_whiteouts(dentry, bindex, namelist);
3858+ else {
3859+ args.deletewh.namelist = namelist;
3860+ args.deletewh.dentry = dentry;
3861+ args.deletewh.bindex = bindex;
3862+ run_sioq(__delete_whiteouts, &args);
3863+ err = args.err;
3864+ }
3865+ mutex_unlock(&lower_dir->i_mutex);
3866+
3867+out:
3868+ return err;
3869+}
3870+
3871+#define RD_NONE 0
3872+#define RD_CHECK_EMPTY 1
3873+/* The callback structure for check_empty. */
3874+struct unionfs_rdutil_callback {
3875+ int err;
3876+ int filldir_called;
3877+ struct unionfs_dir_state *rdstate;
3878+ int mode;
3879+};
3880+
3881+/* This filldir function makes sure only whiteouts exist within a directory. */
3882+static int readdir_util_callback(void *dirent, const char *name, int namelen,
3883+ loff_t offset, u64 ino, unsigned int d_type)
3884+{
3885+ int err = 0;
3886+ struct unionfs_rdutil_callback *buf = dirent;
3887+ int whiteout = 0;
3888+ struct filldir_node *found;
3889+
3890+ buf->filldir_called = 1;
3891+
3892+ if (name[0] == '.' && (namelen == 1 ||
3893+ (name[1] == '.' && namelen == 2)))
3894+ goto out;
3895+
3896+ if (namelen > UNIONFS_WHLEN &&
3897+ !strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN)) {
3898+ namelen -= UNIONFS_WHLEN;
3899+ name += UNIONFS_WHLEN;
3900+ whiteout = 1;
3901+ }
3902+
3903+ found = find_filldir_node(buf->rdstate, name, namelen);
3904+ /* If it was found in the table there was a previous whiteout. */
3905+ if (found)
3906+ goto out;
3907+
3908+ /*
3909+ * if it wasn't found and isn't a whiteout, the directory isn't
3910+ * empty.
3911+ */
3912+ err = -ENOTEMPTY;
3913+ if ((buf->mode == RD_CHECK_EMPTY) && !whiteout)
3914+ goto out;
3915+
3916+ err = add_filldir_node(buf->rdstate, name, namelen,
3917+ buf->rdstate->bindex, whiteout);
3918+
3919+out:
3920+ buf->err = err;
3921+ return err;
3922+}
3923+
3924+/* Is a directory logically empty? */
3925+int check_empty(struct dentry *dentry, struct unionfs_dir_state **namelist)
3926+{
3927+ int err = 0;
3928+ struct dentry *lower_dentry = NULL;
3929+ struct super_block *sb;
3930+ struct file *lower_file;
3931+ struct unionfs_rdutil_callback *buf = NULL;
3932+ int bindex, bstart, bend, bopaque;
3933+
3934+ sb = dentry->d_sb;
3935+
3936+
3937+ BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
3938+
3939+ if ((err = unionfs_partial_lookup(dentry)))
3940+ goto out;
3941+
3942+ bstart = dbstart(dentry);
3943+ bend = dbend(dentry);
3944+ bopaque = dbopaque(dentry);
3945+ if (0 <= bopaque && bopaque < bend)
3946+ bend = bopaque;
3947+
3948+ buf = kmalloc(sizeof(struct unionfs_rdutil_callback), GFP_KERNEL);
3949+ if (!buf) {
3950+ err = -ENOMEM;
3951+ goto out;
3952+ }
3953+ buf->err = 0;
3954+ buf->mode = RD_CHECK_EMPTY;
3955+ buf->rdstate = alloc_rdstate(dentry->d_inode, bstart);
3956+ if (!buf->rdstate) {
3957+ err = -ENOMEM;
3958+ goto out;
3959+ }
3960+
3961+ /* Process lower directories with readdir_util_callback as a filldir. */
3962+ for (bindex = bstart; bindex <= bend; bindex++) {
3963+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3964+ if (!lower_dentry)
3965+ continue;
3966+ if (!lower_dentry->d_inode)
3967+ continue;
3968+ if (!S_ISDIR(lower_dentry->d_inode->i_mode))
3969+ continue;
3970+
3971+ dget(lower_dentry);
3972+ unionfs_mntget(dentry, bindex);
3973+ branchget(sb, bindex);
3974+ lower_file =
3975+ dentry_open(lower_dentry,
3976+ unionfs_lower_mnt_idx(dentry, bindex),
3977+ O_RDONLY);
3978+ if (IS_ERR(lower_file)) {
3979+ err = PTR_ERR(lower_file);
3980+ dput(lower_dentry);
3981+ branchput(sb, bindex);
3982+ goto out;
3983+ }
3984+
3985+ do {
3986+ buf->filldir_called = 0;
3987+ buf->rdstate->bindex = bindex;
3988+ err = vfs_readdir(lower_file,
3989+ readdir_util_callback, buf);
3990+ if (buf->err)
3991+ err = buf->err;
3992+ } while ((err >= 0) && buf->filldir_called);
3993+
3994+ /* fput calls dput for lower_dentry */
3995+ fput(lower_file);
3996+ branchput(sb, bindex);
3997+
3998+ if (err < 0)
3999+ goto out;
4000+ }
4001+
4002+out:
4003+ if (buf) {
4004+ if (namelist && !err)
4005+ *namelist = buf->rdstate;
4006+ else if (buf->rdstate)
4007+ free_rdstate(buf->rdstate);
4008+ kfree(buf);
4009+ }
4010+
4011+
4012+ return err;
4013+}
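
The emptiness check above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the patch's implementation: it assumes the whiteout prefix is ".wh." (the value of UNIONFS_WHPFX is defined elsewhere in the patch), and every `demo_` name is hypothetical. Branches are scanned from highest to lowest priority; a name counts as visible only if no whiteout for it was recorded earlier in the scan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_WHPFX ".wh."	/* assumed whiteout prefix */
#define DEMO_WHLEN 4		/* strlen(DEMO_WHPFX) */
#define DEMO_MAX_NAMES 32	/* capacity of the toy name table */

/* One branch's directory listing; branches are passed highest priority first. */
struct demo_branch {
	const char **names;
	size_t count;
};

/* Whiteout names recorded so far (stands in for the rdstate hash table). */
struct demo_table {
	const char *names[DEMO_MAX_NAMES];
	size_t count;
};

static bool demo_table_has(const struct demo_table *t, const char *name)
{
	size_t i;

	for (i = 0; i < t->count; i++)
		if (strcmp(t->names[i], name) == 0)
			return true;
	return false;
}

/*
 * Returns 1 if the merged directory is logically empty, 0 if some name is
 * visible, and -1 if the toy table overflows.
 */
static int demo_check_empty(const struct demo_branch *branches, size_t n)
{
	struct demo_table seen = { .count = 0 };
	size_t b, i;

	for (b = 0; b < n; b++) {
		for (i = 0; i < branches[b].count; i++) {
			const char *name = branches[b].names[i];
			bool whiteout = false;

			if (!strcmp(name, ".") || !strcmp(name, ".."))
				continue;
			if (strlen(name) > DEMO_WHLEN &&
			    !strncmp(name, DEMO_WHPFX, DEMO_WHLEN)) {
				name += DEMO_WHLEN;	/* strip the prefix */
				whiteout = true;
			}
			if (demo_table_has(&seen, name))
				continue;	/* masked by an earlier whiteout */
			if (!whiteout)
				return 0;	/* a visible entry exists */
			if (seen.count == DEMO_MAX_NAMES)
				return -1;
			seen.names[seen.count++] = name;
		}
	}
	return 1;
}
```

As in the callback above, order matters: a whiteout hides a name only if it is seen before that name, which is why the scan runs from the highest-priority branch down.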
4014diff --git a/fs/unionfs/fanout.h b/fs/unionfs/fanout.h
4015new file mode 100644
4016index 0000000..51aa0de
4017--- /dev/null
4018+++ b/fs/unionfs/fanout.h
4019@@ -0,0 +1,352 @@
4020+/*
4021+ * Copyright (c) 2003-2007 Erez Zadok
4022+ * Copyright (c) 2003-2006 Charles P. Wright
4023+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4024+ * Copyright (c) 2005 Arun M. Krishnakumar
4025+ * Copyright (c) 2004-2006 David P. Quigley
4026+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4027+ * Copyright (c) 2003 Puja Gupta
4028+ * Copyright (c) 2003 Harikesavan Krishnan
4029+ * Copyright (c) 2003-2007 Stony Brook University
4030+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
4031+ *
4032+ * This program is free software; you can redistribute it and/or modify
4033+ * it under the terms of the GNU General Public License version 2 as
4034+ * published by the Free Software Foundation.
4035+ */
4036+
4037+#ifndef _FANOUT_H_
4038+#define _FANOUT_H_
4039+
4040+/*
4041+ * Inode to private data
4042+ *
4043+ * Since we use containers and the struct inode is _inside_ the
4044+ * unionfs_inode_info structure, UNIONFS_I will always (given a non-NULL
4045+ * inode pointer) return a valid non-NULL pointer.
4046+ */
4047+static inline struct unionfs_inode_info *UNIONFS_I(const struct inode *inode)
4048+{
4049+ return container_of(inode, struct unionfs_inode_info, vfs_inode);
4050+}
4051+
4052+#define ibstart(ino) (UNIONFS_I(ino)->bstart)
4053+#define ibend(ino) (UNIONFS_I(ino)->bend)
4054+
4055+/* Superblock to private data */
4056+#define UNIONFS_SB(super) ((struct unionfs_sb_info *)(super)->s_fs_info)
4057+#define sbstart(sb) 0
4058+#define sbend(sb) (UNIONFS_SB(sb)->bend)
4059+#define sbmax(sb) (UNIONFS_SB(sb)->bend + 1)
4060+#define sbhbid(sb) (UNIONFS_SB(sb)->high_branch_id)
4061+
4062+/* File to private data */
4063+#define UNIONFS_F(file) ((struct unionfs_file_info *)((file)->private_data))
4064+#define fbstart(file) (UNIONFS_F(file)->bstart)
4065+#define fbend(file) (UNIONFS_F(file)->bend)
4066+
4067+/* macros to manipulate branch IDs stored in our superblock */
4068+static inline int branch_id(struct super_block *sb, int index)
4069+{
4070+ BUG_ON(!sb || index < 0);
4071+ return UNIONFS_SB(sb)->data[index].branch_id;
4072+}
4073+
4074+static inline void set_branch_id(struct super_block *sb, int index, int val)
4075+{
4076+ BUG_ON(!sb || index < 0);
4077+ UNIONFS_SB(sb)->data[index].branch_id = val;
4078+}
4079+
4080+static inline void new_branch_id(struct super_block *sb, int index)
4081+{
4082+ BUG_ON(!sb || index < 0);
4083+ set_branch_id(sb, index, ++UNIONFS_SB(sb)->high_branch_id);
4084+}
4085+
4086+/*
4087+ * Find new index of matching branch with an existing superblock of a known
4088+ * (possibly old) id. This is needed because branches could have been
4089+ * added/deleted causing the branches of any open files to shift.
4090+ *
4091+ * @sb: the new superblock which may have new/different branch IDs
4092+ * @id: the old/existing id we're looking for
4093+ * Returns index of newly found branch (0 or greater), -1 otherwise.
4094+ */
4095+static inline int branch_id_to_idx(struct super_block *sb, int id)
4096+{
4097+ int i;
4098+ for (i = 0; i < sbmax(sb); i++) {
4099+ if (branch_id(sb, i) == id)
4100+ return i;
4101+ }
4102+ /* in the non-ODF code, this should really never happen */
4103+ printk(KERN_WARNING "unionfs: cannot find branch with id %d\n", id);
4104+ return -1;
4105+}
4106+
4107+/* File to lower file. */
4108+static inline struct file *unionfs_lower_file(const struct file *f)
4109+{
4110+ BUG_ON(!f);
4111+ return UNIONFS_F(f)->lower_files[fbstart(f)];
4112+}
4113+
4114+static inline struct file *unionfs_lower_file_idx(const struct file *f,
4115+ int index)
4116+{
4117+ BUG_ON(!f || index < 0);
4118+ return UNIONFS_F(f)->lower_files[index];
4119+}
4120+
4121+static inline void unionfs_set_lower_file_idx(struct file *f, int index,
4122+ struct file *val)
4123+{
4124+ BUG_ON(!f || index < 0);
4125+ UNIONFS_F(f)->lower_files[index] = val;
4126+ /* save branch ID (may be redundant?) */
4127+ UNIONFS_F(f)->saved_branch_ids[index] =
4128+ branch_id((f)->f_dentry->d_sb, index);
4129+}
4130+
4131+static inline void unionfs_set_lower_file(struct file *f, struct file *val)
4132+{
4133+ BUG_ON(!f);
4134+ unionfs_set_lower_file_idx((f), fbstart(f), (val));
4135+}
4136+
4137+/* Inode to lower inode. */
4138+static inline struct inode *unionfs_lower_inode(const struct inode *i)
4139+{
4140+ BUG_ON(!i);
4141+ return UNIONFS_I(i)->lower_inodes[ibstart(i)];
4142+}
4143+
4144+static inline struct inode *unionfs_lower_inode_idx(const struct inode *i,
4145+ int index)
4146+{
4147+ BUG_ON(!i || index < 0);
4148+ return UNIONFS_I(i)->lower_inodes[index];
4149+}
4150+
4151+static inline void unionfs_set_lower_inode_idx(struct inode *i, int index,
4152+ struct inode *val)
4153+{
4154+ BUG_ON(!i || index < 0);
4155+ UNIONFS_I(i)->lower_inodes[index] = val;
4156+}
4157+
4158+static inline void unionfs_set_lower_inode(struct inode *i, struct inode *val)
4159+{
4160+ BUG_ON(!i);
4161+ UNIONFS_I(i)->lower_inodes[ibstart(i)] = val;
4162+}
4163+
4164+/* Superblock to lower superblock. */
4165+static inline struct super_block *unionfs_lower_super(
4166+ const struct super_block *sb)
4167+{
4168+ BUG_ON(!sb);
4169+ return UNIONFS_SB(sb)->data[sbstart(sb)].sb;
4170+}
4171+
4172+static inline struct super_block *unionfs_lower_super_idx(
4173+ const struct super_block *sb,
4174+ int index)
4175+{
4176+ BUG_ON(!sb || index < 0);
4177+ return UNIONFS_SB(sb)->data[index].sb;
4178+}
4179+
4180+static inline void unionfs_set_lower_super_idx(struct super_block *sb,
4181+ int index,
4182+ struct super_block *val)
4183+{
4184+ BUG_ON(!sb || index < 0);
4185+ UNIONFS_SB(sb)->data[index].sb = val;
4186+}
4187+
4188+static inline void unionfs_set_lower_super(struct super_block *sb,
4189+ struct super_block *val)
4190+{
4191+ BUG_ON(!sb);
4192+ UNIONFS_SB(sb)->data[sbstart(sb)].sb = val;
4193+}
4194+
4195+/* Branch count macros. */
4196+static inline int branch_count(const struct super_block *sb, int index)
4197+{
4198+ BUG_ON(!sb || index < 0);
4199+ return atomic_read(&UNIONFS_SB(sb)->data[index].open_files);
4200+}
4201+
4202+static inline void set_branch_count(struct super_block *sb, int index, int val)
4203+{
4204+ BUG_ON(!sb || index < 0);
4205+ atomic_set(&UNIONFS_SB(sb)->data[index].open_files, val);
4206+}
4207+
4208+static inline void branchget(struct super_block *sb, int index)
4209+{
4210+ BUG_ON(!sb || index < 0);
4211+ atomic_inc(&UNIONFS_SB(sb)->data[index].open_files);
4212+}
4213+
4214+static inline void branchput(struct super_block *sb, int index)
4215+{
4216+ BUG_ON(!sb || index < 0);
4217+ atomic_dec(&UNIONFS_SB(sb)->data[index].open_files);
4218+}
4219+
4220+/* Dentry macros */
4221+static inline struct unionfs_dentry_info *UNIONFS_D(const struct dentry *dent)
4222+{
4223+ BUG_ON(!dent);
4224+ return dent->d_fsdata;
4225+}
4226+
4227+static inline int dbstart(const struct dentry *dent)
4228+{
4229+ BUG_ON(!dent);
4230+ return UNIONFS_D(dent)->bstart;
4231+}
4232+
4233+static inline void set_dbstart(struct dentry *dent, int val)
4234+{
4235+ BUG_ON(!dent);
4236+ UNIONFS_D(dent)->bstart = val;
4237+}
4238+
4239+static inline int dbend(const struct dentry *dent)
4240+{
4241+ BUG_ON(!dent);
4242+ return UNIONFS_D(dent)->bend;
4243+}
4244+
4245+static inline void set_dbend(struct dentry *dent, int val)
4246+{
4247+ BUG_ON(!dent);
4248+ UNIONFS_D(dent)->bend = val;
4249+}
4250+
4251+static inline int dbopaque(const struct dentry *dent)
4252+{
4253+ BUG_ON(!dent);
4254+ return UNIONFS_D(dent)->bopaque;
4255+}
4256+
4257+static inline void set_dbopaque(struct dentry *dent, int val)
4258+{
4259+ BUG_ON(!dent);
4260+ UNIONFS_D(dent)->bopaque = val;
4261+}
4262+
4263+static inline void unionfs_set_lower_dentry_idx(struct dentry *dent, int index,
4264+ struct dentry *val)
4265+{
4266+ BUG_ON(!dent || index < 0);
4267+ UNIONFS_D(dent)->lower_paths[index].dentry = val;
4268+}
4269+
4270+static inline struct dentry *unionfs_lower_dentry_idx(
4271+ const struct dentry *dent,
4272+ int index)
4273+{
4274+ BUG_ON(!dent || index < 0);
4275+ return UNIONFS_D(dent)->lower_paths[index].dentry;
4276+}
4277+
4278+static inline struct dentry *unionfs_lower_dentry(const struct dentry *dent)
4279+{
4280+ BUG_ON(!dent);
4281+ return unionfs_lower_dentry_idx(dent, dbstart(dent));
4282+}
4283+
4284+static inline void unionfs_set_lower_mnt_idx(struct dentry *dent, int index,
4285+ struct vfsmount *mnt)
4286+{
4287+ BUG_ON(!dent || index < 0);
4288+ UNIONFS_D(dent)->lower_paths[index].mnt = mnt;
4289+}
4290+
4291+static inline struct vfsmount *unionfs_lower_mnt_idx(
4292+ const struct dentry *dent,
4293+ int index)
4294+{
4295+ BUG_ON(!dent || index < 0);
4296+ return UNIONFS_D(dent)->lower_paths[index].mnt;
4297+}
4298+
4299+static inline struct vfsmount *unionfs_lower_mnt(const struct dentry *dent)
4300+{
4301+ BUG_ON(!dent);
4302+ return unionfs_lower_mnt_idx(dent, dbstart(dent));
4303+}
4304+
4305+/* Macros for locking a dentry. */
4306+static inline void unionfs_lock_dentry(struct dentry *d)
4307+{
4308+ BUG_ON(!d);
4309+ mutex_lock(&UNIONFS_D(d)->lock);
4310+}
4311+
4312+static inline void unionfs_unlock_dentry(struct dentry *d)
4313+{
4314+ BUG_ON(!d);
4315+ mutex_unlock(&UNIONFS_D(d)->lock);
4316+}
4317+
4318+static inline void verify_locked(struct dentry *d)
4319+{
4320+ BUG_ON(!d);
4321+ BUG_ON(!mutex_is_locked(&UNIONFS_D(d)->lock));
4322+}
4323+
4324+/* copy a/m/ctime from the lower branch with the newest times */
4325+static inline void unionfs_copy_attr_times(struct inode *upper)
4326+{
4327+ int bindex;
4328+ struct inode *lower;
4329+
4330+ if (!upper || ibstart(upper) < 0)
4331+ return;
4332+ for (bindex = ibstart(upper); bindex <= ibend(upper); bindex++) {
4333+ lower = unionfs_lower_inode_idx(upper, bindex);
4334+ if (!lower)
4335+ continue; /* not all lower dir objects may exist */
4336+ if (timespec_compare(&upper->i_mtime, &lower->i_mtime) < 0)
4337+ upper->i_mtime = lower->i_mtime;
4338+ if (timespec_compare(&upper->i_ctime, &lower->i_ctime) < 0)
4339+ upper->i_ctime = lower->i_ctime;
4340+ if (timespec_compare(&upper->i_atime, &lower->i_atime) < 0)
4341+ upper->i_atime = lower->i_atime;
4342+ }
4343+}
4344+
4345+/*
4346+ * A unionfs/fanout version of fsstack_copy_attr_all. Uses
4347+ * unionfs_get_nlinks to properly calculate the number of links to a file.
4348+ * Also, copies the max() of all a/m/ctimes for all lower inodes (which is
4349+ * important if the lower inode is a directory type)
4350+ */
4351+static inline void unionfs_copy_attr_all(struct inode *dest,
4352+ const struct inode *src)
4353+{
4354+ dest->i_mode = src->i_mode;
4355+ dest->i_uid = src->i_uid;
4356+ dest->i_gid = src->i_gid;
4357+ dest->i_rdev = src->i_rdev;
4358+
4359+ unionfs_copy_attr_times(dest);
4360+
4361+ dest->i_blkbits = src->i_blkbits;
4362+ dest->i_flags = src->i_flags;
4363+
4364+ /*
4365+ * Update the nlinks AFTER updating the above fields, because the
4366+ * unionfs_get_nlinks call may depend on them.
4367+ */
4368+ dest->i_nlink = unionfs_get_nlinks(dest);
4369+}
4370+
4371+#endif /* not _FANOUT_H_ */
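
The comment on branch_id_to_idx in fanout.h explains that branch indices shift when branches are added or removed, while branch IDs stay fixed. The sketch below models that idea in userspace under simplifying assumptions: a fixed-size array replaces the superblock's data[] array, and all `demo_` names are hypothetical rather than part of unionfs.

```c
#define DEMO_MAX_BRANCHES 8

struct demo_sb {
	int branch_id[DEMO_MAX_BRANCHES];	/* stable ID stored per index */
	int bend;				/* last valid index, -1 if none */
	int high_branch_id;			/* last ID handed out */
};

/* Append a branch and give it a fresh ID that is never reused. */
static int demo_add_branch(struct demo_sb *sb)
{
	if (sb->bend + 1 >= DEMO_MAX_BRANCHES)
		return -1;
	sb->bend++;
	sb->branch_id[sb->bend] = ++sb->high_branch_id;
	return sb->branch_id[sb->bend];
}

/* Remove the branch at index; lower-priority branches shift up by one. */
static void demo_del_branch(struct demo_sb *sb, int index)
{
	int i;

	if (index < 0 || index > sb->bend)
		return;
	for (i = index; i < sb->bend; i++)
		sb->branch_id[i] = sb->branch_id[i + 1];
	sb->bend--;
}

/* Map a saved ID back to its current index, or -1 if the branch is gone. */
static int demo_branch_id_to_idx(const struct demo_sb *sb, int id)
{
	int i;

	for (i = 0; i <= sb->bend; i++)
		if (sb->branch_id[i] == id)
			return i;
	return -1;
}
```

An open file that saved a branch ID can use a lookup like this to find where its branch now sits, or to learn that the branch was removed.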
4372diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
4373new file mode 100644
4374index 0000000..2409378
4375--- /dev/null
4376+++ b/fs/unionfs/file.c
4377@@ -0,0 +1,250 @@
4378+/*
4379+ * Copyright (c) 2003-2007 Erez Zadok
4380+ * Copyright (c) 2003-2006 Charles P. Wright
4381+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4382+ * Copyright (c) 2005-2006 Junjiro Okajima
4383+ * Copyright (c) 2005 Arun M. Krishnakumar
4384+ * Copyright (c) 2004-2006 David P. Quigley
4385+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4386+ * Copyright (c) 2003 Puja Gupta
4387+ * Copyright (c) 2003 Harikesavan Krishnan
4388+ * Copyright (c) 2003-2007 Stony Brook University
4389+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
4390+ *
4391+ * This program is free software; you can redistribute it and/or modify
4392+ * it under the terms of the GNU General Public License version 2 as
4393+ * published by the Free Software Foundation.
4394+ */
4395+
4396+#include "union.h"
4397+
4398+static ssize_t unionfs_read(struct file *file, char __user *buf,
4399+ size_t count, loff_t *ppos)
4400+{
4401+ int err;
4402+
4403+ unionfs_read_lock(file->f_path.dentry->d_sb);
4404+ if ((err = unionfs_file_revalidate(file, false)))
4405+ goto out;
4406+ unionfs_check_file(file);
4407+
4408+ err = do_sync_read(file, buf, count, ppos);
4409+
4410+ if (err >= 0)
4411+ touch_atime(unionfs_lower_mnt(file->f_path.dentry),
4412+ unionfs_lower_dentry(file->f_path.dentry));
4413+
4414+out:
4415+ unionfs_read_unlock(file->f_path.dentry->d_sb);
4416+ unionfs_check_file(file);
4417+ return err;
4418+}
4419+
4420+static ssize_t unionfs_aio_read(struct kiocb *iocb, const struct iovec *iov,
4421+ unsigned long nr_segs, loff_t pos)
4422+{
4423+ int err = 0;
4424+ struct file *file = iocb->ki_filp;
4425+
4426+ unionfs_read_lock(file->f_path.dentry->d_sb);
4427+ if ((err = unionfs_file_revalidate(file, false)))
4428+ goto out;
4429+ unionfs_check_file(file);
4430+
4431+ err = generic_file_aio_read(iocb, iov, nr_segs, pos);
4432+
4433+ if (err == -EIOCBQUEUED)
4434+ err = wait_on_sync_kiocb(iocb);
4435+
4436+ if (err >= 0)
4437+ touch_atime(unionfs_lower_mnt(file->f_path.dentry),
4438+ unionfs_lower_dentry(file->f_path.dentry));
4439+
4440+out:
4441+ unionfs_read_unlock(file->f_path.dentry->d_sb);
4442+ unionfs_check_file(file);
4443+ return err;
4444+}
4445+
4446+static ssize_t unionfs_write(struct file *file, const char __user *buf,
4447+ size_t count, loff_t *ppos)
4448+{
4449+ int err = 0;
4450+
4451+ unionfs_read_lock(file->f_path.dentry->d_sb);
4452+ if ((err = unionfs_file_revalidate(file, true)))
4453+ goto out;
4454+ unionfs_check_file(file);
4455+
4456+ err = do_sync_write(file, buf, count, ppos);
4457+ /* update our inode times upon a successful lower write */
4458+ if (err >= 0) {
4459+ unionfs_copy_attr_times(file->f_path.dentry->d_inode);
4460+ unionfs_check_file(file);
4461+ }
4462+
4463+out:
4464+ unionfs_read_unlock(file->f_path.dentry->d_sb);
4465+ return err;
4466+}
4467+
4468+static int unionfs_file_readdir(struct file *file, void *dirent,
4469+ filldir_t filldir)
4470+{
4471+ return -ENOTDIR;
4472+}
4473+
4474+static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
4475+{
4476+ int err = 0;
4477+ bool willwrite;
4478+ struct file *lower_file;
4479+
4480+ unionfs_read_lock(file->f_path.dentry->d_sb);
4481+
4482+ /* This might be deferred to mmap's writepage */
4483+ willwrite = ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags);
4484+ if ((err = unionfs_file_revalidate(file, willwrite)))
4485+ goto out;
4486+ unionfs_check_file(file);
4487+
4488+ /*
4489+ * File systems which do not implement ->writepage may use
4490+ * generic_file_readonly_mmap as their ->mmap op. If you call
4491+ * generic_file_readonly_mmap with VM_WRITE, you'd get an -EINVAL.
4492+ * But we cannot call the lower ->mmap op, so we can't tell that
4493+ * writeable mappings won't work. Therefore, our only choice is to
4494+ * check if the lower file system supports the ->writepage, and if
4495+ * not, return EINVAL (the same error that
4496+ * generic_file_readonly_mmap returns in that case).
4497+ */
4498+ lower_file = unionfs_lower_file(file);
4499+ if (willwrite && !lower_file->f_mapping->a_ops->writepage) {
4500+ err = -EINVAL;
4501+ printk("unionfs: branch %d file system does not support "
4502+ "writeable mmap\n", fbstart(file));
4503+ } else {
4504+ err = generic_file_mmap(file, vma);
4505+ if (err)
4506+ printk("unionfs: generic_file_mmap failed %d\n", err);
4507+ }
4508+
4509+out:
4510+ unionfs_read_unlock(file->f_path.dentry->d_sb);
4511+ if (!err) {
4512+ /* copyup could cause parent dir times to change */
4513+ unionfs_copy_attr_times(file->f_path.dentry->d_parent->d_inode);
4514+ unionfs_check_file(file);
4515+ unionfs_check_dentry(file->f_path.dentry->d_parent);
4516+ }
4517+ return err;
4518+}
4519+
4520+int unionfs_fsync(struct file *file, struct dentry *dentry, int datasync)
4521+{
4522+ int bindex, bstart, bend;
4523+ struct file *lower_file;
4524+ struct dentry *lower_dentry;
4525+ struct inode *lower_inode, *inode;
4526+ int err = -EINVAL;
4527+
4528+ unionfs_read_lock(file->f_path.dentry->d_sb);
4529+ if ((err = unionfs_file_revalidate(file, true)))
4530+ goto out;
4531+ unionfs_check_file(file);
4532+
4533+ bstart = fbstart(file);
4534+ bend = fbend(file);
4535+ if (bstart < 0 || bend < 0)
4536+ goto out;
4537+
4538+ inode = dentry->d_inode;
4539+ if (!inode) {
4540+ printk(KERN_ERR
4541+ "unionfs: null lower inode in unionfs_fsync\n");
4542+ goto out;
4543+ }
4544+ for (bindex = bstart; bindex <= bend; bindex++) {
4545+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
4546+ if (!lower_inode || !lower_inode->i_fop->fsync)
4547+ continue;
4548+ lower_file = unionfs_lower_file_idx(file, bindex);
4549+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4550+ mutex_lock(&lower_inode->i_mutex);
4551+ err = lower_inode->i_fop->fsync(lower_file,
4552+ lower_dentry,
4553+ datasync);
4554+ mutex_unlock(&lower_inode->i_mutex);
4555+ if (err)
4556+ goto out;
4557+ }
4558+
4559+ unionfs_copy_attr_times(inode);
4560+
4561+out:
4562+ unionfs_read_unlock(file->f_path.dentry->d_sb);
4563+ unionfs_check_file(file);
4564+ return err;
4565+}
4566+
4567+int unionfs_fasync(int fd, struct file *file, int flag)
4568+{
4569+ int bindex, bstart, bend;
4570+ struct file *lower_file;
4571+ struct dentry *dentry;
4572+ struct inode *lower_inode, *inode;
4573+ int err = 0;
4574+
4575+ unionfs_read_lock(file->f_path.dentry->d_sb);
4576+ if ((err = unionfs_file_revalidate(file, true)))
4577+ goto out;
4578+ unionfs_check_file(file);
4579+
4580+ bstart = fbstart(file);
4581+ bend = fbend(file);
4582+ if (bstart < 0 || bend < 0)
4583+ goto out;
4584+
4585+ dentry = file->f_path.dentry;
4586+ inode = dentry->d_inode;
4587+ if (!inode) {
4588+ printk(KERN_ERR
4589+ "unionfs: null lower inode in unionfs_fasync\n");
4590+ goto out;
4591+ }
4592+ for (bindex = bstart; bindex <= bend; bindex++) {
4593+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
4594+ if (!lower_inode || !lower_inode->i_fop->fasync)
4595+ continue;
4596+ lower_file = unionfs_lower_file_idx(file, bindex);
4597+ mutex_lock(&lower_inode->i_mutex);
4598+ err = lower_inode->i_fop->fasync(fd, lower_file, flag);
4599+ mutex_unlock(&lower_inode->i_mutex);
4600+ if (err)
4601+ goto out;
4602+ }
4603+
4604+ unionfs_copy_attr_times(inode);
4605+
4606+out:
4607+ unionfs_read_unlock(file->f_path.dentry->d_sb);
4608+ unionfs_check_file(file);
4609+ return err;
4610+}
4611+
4612+struct file_operations unionfs_main_fops = {
4613+ .llseek = generic_file_llseek,
4614+ .read = unionfs_read,
4615+ .aio_read = unionfs_aio_read,
4616+ .write = unionfs_write,
4617+ .aio_write = generic_file_aio_write,
4618+ .readdir = unionfs_file_readdir,
4619+ .unlocked_ioctl = unionfs_ioctl,
4620+ .mmap = unionfs_mmap,
4621+ .open = unionfs_open,
4622+ .flush = unionfs_flush,
4623+ .release = unionfs_file_release,
4624+ .fsync = unionfs_fsync,
4625+ .fasync = unionfs_fasync,
4626+ .sendfile = generic_file_sendfile,
4627+};
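
The willwrite test in unionfs_mmap relies on a small bit trick: OR-ing the two flags into vm_flags leaves the value unchanged only when both bits were already set. The userspace sketch below shows that behaviour. The `DEMO_VM_*` constants are stand-ins chosen for illustration and are not taken from the patch; the kernel's real values live in its own headers.

```c
#include <stdbool.h>

/* Illustrative flag bits; assumed values, not copied from the patch. */
#define DEMO_VM_READ	0x1UL
#define DEMO_VM_WRITE	0x2UL
#define DEMO_VM_SHARED	0x8UL

/*
 * True only when BOTH the shared and write bits are set, that is, when
 * writes through the mapping could reach the underlying file.
 */
static bool demo_will_write(unsigned long vm_flags)
{
	return (vm_flags | DEMO_VM_SHARED | DEMO_VM_WRITE) == vm_flags;
}
```

A private writable mapping does not pass the test, which matches the intent described in the mmap comment: only shared writable mappings need the lower file system to support ->writepage.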
4628diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
4629new file mode 100644
4630index 0000000..9638b64
4631--- /dev/null
4632+++ b/fs/unionfs/inode.c
4633@@ -0,0 +1,1133 @@
4634+/*
4635+ * Copyright (c) 2003-2007 Erez Zadok
4636+ * Copyright (c) 2003-2006 Charles P. Wright
4637+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4638+ * Copyright (c) 2005-2006 Junjiro Okajima
4639+ * Copyright (c) 2005 Arun M. Krishnakumar
4640+ * Copyright (c) 2004-2006 David P. Quigley
4641+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4642+ * Copyright (c) 2003 Puja Gupta
4643+ * Copyright (c) 2003 Harikesavan Krishnan
4644+ * Copyright (c) 2003-2007 Stony Brook University
4645+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
4646+ *
4647+ * This program is free software; you can redistribute it and/or modify
4648+ * it under the terms of the GNU General Public License version 2 as
4649+ * published by the Free Software Foundation.
4650+ */
4651+
4652+#include "union.h"
4653+
4654+static int unionfs_create(struct inode *parent, struct dentry *dentry,
4655+ int mode, struct nameidata *nd)
4656+{
4657+ int err = 0;
4658+ struct dentry *lower_dentry = NULL;
4659+ struct dentry *wh_dentry = NULL;
4660+ struct dentry *lower_parent_dentry = NULL;
4661+ char *name = NULL;
4662+ int valid = 0;
4663+
4664+ unionfs_read_lock(dentry->d_sb);
4665+ unionfs_lock_dentry(dentry);
4666+
4667+ unionfs_lock_dentry(dentry->d_parent);
4668+ valid = __unionfs_d_revalidate_chain(dentry->d_parent, nd, false);
4669+ unionfs_unlock_dentry(dentry->d_parent);
4670+ if (!valid) {
4671+ err = -ESTALE; /* same as what real_lookup does */
4672+ goto out;
4673+ }
4674+ valid = __unionfs_d_revalidate_chain(dentry, nd, false);
4675+ /*
4676+ * It's only a bug if this dentry was not negative and couldn't be
4677+ * revalidated (shouldn't happen).
4678+ */
4679+ BUG_ON(!valid && dentry->d_inode);
4680+
4681+ /*
4682+ * We shouldn't create things in a read-only branch; this check is a
4683+ * bit redundant as we don't allow branch 0 to be read-only at the
4684+ * moment
4685+ */
4686+ if ((err = is_robranch_super(dentry->d_sb, 0))) {
4687+ err = -EROFS;
4688+ goto out;
4689+ }
4690+
4691+ /*
4692+ * We _always_ create on branch 0
4693+ */
4694+ lower_dentry = unionfs_lower_dentry_idx(dentry, 0);
4695+ if (lower_dentry) {
4696+ /*
4697+ * check if whiteout exists in this branch, i.e. lookup .wh.foo
4698+ * first.
4699+ */
4700+ name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
4701+ if (IS_ERR(name)) {
4702+ err = PTR_ERR(name);
4703+ goto out;
4704+ }
4705+
4706+ wh_dentry = lookup_one_len(name, lower_dentry->d_parent,
4707+ dentry->d_name.len + UNIONFS_WHLEN);
4708+ if (IS_ERR(wh_dentry)) {
4709+ err = PTR_ERR(wh_dentry);
4710+ wh_dentry = NULL;
4711+ goto out;
4712+ }
4713+
4714+ if (wh_dentry->d_inode) {
4715+ /*
4716+ * .wh.foo has been found, so let's unlink it
4717+ */
4718+ struct dentry *lower_dir_dentry;
4719+
4720+ lower_dir_dentry = lock_parent(wh_dentry);
4721+ err = vfs_unlink(lower_dir_dentry->d_inode, wh_dentry);
4722+ unlock_dir(lower_dir_dentry);
4723+
4724+ if (err) {
4725+ printk("unionfs_create: could not unlink "
4726+ "whiteout, err = %d\n", err);
4727+ goto out;
4728+ }
4729+ }
4730+ } else {
4731+ /*
4732+ * if lower_dentry is NULL, create the entire
4733+ * dentry directory structure in branch 0.
4734+ */
4735+ lower_dentry = create_parents(parent, dentry, dentry->d_name.name, 0);
4736+ if (IS_ERR(lower_dentry)) {
4737+ err = PTR_ERR(lower_dentry);
4738+ goto out;
4739+ }
4740+ }
4741+
4742+ lower_parent_dentry = lock_parent(lower_dentry);
4743+ if (IS_ERR(lower_parent_dentry)) {
4744+ err = PTR_ERR(lower_parent_dentry);
4745+ goto out;
4746+ }
4747+
4748+ err = vfs_create(lower_parent_dentry->d_inode, lower_dentry, mode, nd);
4749+
4750+ if (!err) {
4751+ err = PTR_ERR(unionfs_interpose(dentry, parent->i_sb, 0));
4752+ if (!err) {
4753+ unionfs_copy_attr_times(parent);
4754+ fsstack_copy_inode_size(parent,
4755+ lower_parent_dentry->d_inode);
4756+ /* update no. of links on parent directory */
4757+ parent->i_nlink = unionfs_get_nlinks(parent);
4758+ }
4759+ }
4760+
4761+ unlock_dir(lower_parent_dentry);
4762+
4763+out:
4764+ dput(wh_dentry);
4765+ kfree(name);
4766+
4767+ if (!err)
4768+ unionfs_postcopyup_setmnt(dentry);
4769+ unionfs_unlock_dentry(dentry);
4770+ unionfs_read_unlock(dentry->d_sb);
4771+
4772+ unionfs_check_inode(parent);
4773+ if (!err)
4774+ unionfs_check_dentry(dentry->d_parent);
4775+ unionfs_check_dentry(dentry);
4776+ return err;
4777+}
4778+
4779+/*
4780+ * unionfs_lookup is the only special function which takes a dentry, yet we
4781+ * do NOT want to call __unionfs_d_revalidate_chain because by definition,
4782+ * we don't have a valid dentry here yet.
4783+ */
4784+static struct dentry *unionfs_lookup(struct inode *parent,
4785+ struct dentry *dentry,
4786+ struct nameidata *nd)
4787+{
4788+ struct path path_save;
4789+ struct dentry *ret;
4790+
4791+ unionfs_read_lock(dentry->d_sb);
4792+
4793+ /* save the dentry & vfsmnt from namei */
4794+ if (nd) {
4795+ path_save.dentry = nd->dentry;
4796+ path_save.mnt = nd->mnt;
4797+ }
4798+
4799+ /*
4800+ * unionfs_lookup_backend returns a locked dentry upon success,
4801+ * so we'll have to unlock it below.
4802+ */
4803+ ret = unionfs_lookup_backend(dentry, nd, INTERPOSE_LOOKUP);
4804+
4805+ /* restore the dentry & vfsmnt in namei */
4806+ if (nd) {
4807+ nd->dentry = path_save.dentry;
4808+ nd->mnt = path_save.mnt;
4809+ }
4810+ if (!IS_ERR(ret)) {
4811+ if (ret)
4812+ dentry = ret;
4813+ /* parent times may have changed */
4814+ unionfs_copy_attr_times(dentry->d_parent->d_inode);
4815+ unionfs_unlock_dentry(dentry);
4816+ }
4817+
4818+ unionfs_check_inode(parent);
4819+ unionfs_check_dentry(dentry);
4820+ unionfs_check_dentry(dentry->d_parent);
4821+ unionfs_read_unlock(dentry->d_sb);
4822+
4823+ return ret;
4824+}
4825+
4826+static int unionfs_link(struct dentry *old_dentry, struct inode *dir,
4827+ struct dentry *new_dentry)
4828+{
4829+ int err = 0;
4830+ struct dentry *lower_old_dentry = NULL;
4831+ struct dentry *lower_new_dentry = NULL;
4832+ struct dentry *lower_dir_dentry = NULL;
4833+ struct dentry *whiteout_dentry;
4834+ char *name = NULL;
4835+
4836+ unionfs_read_lock(old_dentry->d_sb);
4837+ unionfs_double_lock_dentry(new_dentry, old_dentry);
4838+
4839+ if (!__unionfs_d_revalidate_chain(old_dentry, NULL, false)) {
4840+ err = -ESTALE;
4841+ goto out;
4842+ }
4843+ if (new_dentry->d_inode &&
4844+ !__unionfs_d_revalidate_chain(new_dentry, NULL, false)) {
4845+ err = -ESTALE;
4846+ goto out;
4847+ }
4848+
4849+ lower_new_dentry = unionfs_lower_dentry(new_dentry);
4850+
4851+ /*
4852+ * check if whiteout exists in the branch of new dentry, i.e. lookup
4853+ * .wh.foo first. If present, delete it
4854+ */
4855+ name = alloc_whname(new_dentry->d_name.name, new_dentry->d_name.len);
4856+ if (IS_ERR(name)) {
4857+ err = PTR_ERR(name);
4858+ goto out;
4859+ }
4860+
4861+ whiteout_dentry = lookup_one_len(name, lower_new_dentry->d_parent,
4862+ new_dentry->d_name.len +
4863+ UNIONFS_WHLEN);
4864+ if (IS_ERR(whiteout_dentry)) {
4865+ err = PTR_ERR(whiteout_dentry);
4866+ goto out;
4867+ }
4868+
4869+ if (!whiteout_dentry->d_inode) {
4870+ dput(whiteout_dentry);
4871+ whiteout_dentry = NULL;
4872+ } else {
4873+ /* found a .wh.foo entry, unlink it and then call vfs_link() */
4874+ lower_dir_dentry = lock_parent(whiteout_dentry);
4875+ err = is_robranch_super(new_dentry->d_sb, dbstart(new_dentry));
4876+ if (!err)
4877+ err = vfs_unlink(lower_dir_dentry->d_inode,
4878+ whiteout_dentry);
4879+
4880+ fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
4881+ dir->i_nlink = unionfs_get_nlinks(dir);
4882+ unlock_dir(lower_dir_dentry);
4883+ lower_dir_dentry = NULL;
4884+ dput(whiteout_dentry);
4885+ if (err)
4886+ goto out;
4887+ }
4888+
4889+ if (dbstart(old_dentry) != dbstart(new_dentry)) {
4890+ lower_new_dentry = create_parents(dir, new_dentry,
4891+ new_dentry->d_name.name,
4892+ dbstart(old_dentry));
4893+ err = PTR_ERR(lower_new_dentry);
4894+ if (IS_COPYUP_ERR(err))
4895+ goto docopyup;
4896+ if (!lower_new_dentry || IS_ERR(lower_new_dentry))
4897+ goto out;
4898+ }
4899+ lower_new_dentry = unionfs_lower_dentry(new_dentry);
4900+ lower_old_dentry = unionfs_lower_dentry(old_dentry);
4901+
4902+ BUG_ON(dbstart(old_dentry) != dbstart(new_dentry));
4903+ lower_dir_dentry = lock_parent(lower_new_dentry);
4904+ if (!(err = is_robranch(old_dentry)))
4905+ err = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
4906+ lower_new_dentry);
4907+ unlock_dir(lower_dir_dentry);
4908+
4909+docopyup:
4910+ if (IS_COPYUP_ERR(err)) {
4911+ int old_bstart = dbstart(old_dentry);
4912+ int bindex;
4913+
4914+ for (bindex = old_bstart - 1; bindex >= 0; bindex--) {
4915+ err = copyup_dentry(old_dentry->d_parent->d_inode,
4916+ old_dentry, old_bstart,
4917+ bindex, old_dentry->d_name.name,
4918+ old_dentry->d_name.len, NULL,
4919+ old_dentry->d_inode->i_size);
4920+ if (!err) {
4921+ lower_new_dentry =
4922+ create_parents(dir, new_dentry,
4923+ new_dentry->d_name.name,
4924+ bindex);
4925+ lower_old_dentry =
4926+ unionfs_lower_dentry(old_dentry);
4927+ lower_dir_dentry =
4928+ lock_parent(lower_new_dentry);
4929+ /* do vfs_link */
4930+ err = vfs_link(lower_old_dentry,
4931+ lower_dir_dentry->d_inode,
4932+ lower_new_dentry);
4933+ unlock_dir(lower_dir_dentry);
4934+ goto check_link;
4935+ }
4936+ }
4937+ goto out;
4938+ }
4939+
4940+check_link:
4941+ if (err || !lower_new_dentry->d_inode)
4942+ goto out;
4943+
4944+ /* It's a hard link, so use the same inode */
4945+ new_dentry->d_inode = igrab(old_dentry->d_inode);
4946+ d_instantiate(new_dentry, new_dentry->d_inode);
4947+ unionfs_copy_attr_all(dir, lower_new_dentry->d_parent->d_inode);
4948+ fsstack_copy_inode_size(dir, lower_new_dentry->d_parent->d_inode);
4949+
4950+ /* propagate number of hard-links */
4951+ old_dentry->d_inode->i_nlink = unionfs_get_nlinks(old_dentry->d_inode);
4952+ /* new dentry's ctime may have changed due to hard-link counts */
4953+ unionfs_copy_attr_times(new_dentry->d_inode);
4954+
4955+out:
4956+ if (!new_dentry->d_inode)
4957+ d_drop(new_dentry);
4958+
4959+ kfree(name);
4960+ if (!err)
4961+ unionfs_postcopyup_setmnt(new_dentry);
4962+
4963+ unionfs_unlock_dentry(new_dentry);
4964+ unionfs_unlock_dentry(old_dentry);
4965+
4966+ unionfs_check_inode(dir);
4967+ unionfs_check_dentry(new_dentry);
4968+ unionfs_check_dentry(old_dentry);
4969+ unionfs_read_unlock(old_dentry->d_sb);
4970+
4971+ return err;
4972+}
4973+
4974+static int unionfs_symlink(struct inode *dir, struct dentry *dentry,
4975+ const char *symname)
4976+{
4977+ int err = 0;
4978+ struct dentry *lower_dentry = NULL;
4979+ struct dentry *whiteout_dentry = NULL;
4980+ struct dentry *lower_dir_dentry = NULL;
4981+ umode_t mode;
4982+ int bindex = 0, bstart;
4983+ char *name = NULL;
4984+
4985+ unionfs_read_lock(dentry->d_sb);
4986+ unionfs_lock_dentry(dentry);
4987+
4988+ if (dentry->d_inode &&
4989+ !__unionfs_d_revalidate_chain(dentry, NULL, false)) {
4990+ err = -ESTALE;
4991+ goto out;
4992+ }
4993+
4994+ /* We start out in the leftmost branch. */
4995+ bstart = dbstart(dentry);
4996+
4997+ lower_dentry = unionfs_lower_dentry(dentry);
4998+
4999+ /*
5000+ * check if whiteout exists in this branch, i.e. lookup .wh.foo
5001+ * first. If present, delete it
5002+ */
5003+ name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
5004+ if (IS_ERR(name)) {
5005+ err = PTR_ERR(name);
5006+ goto out;
5007+ }
5008+
5009+ whiteout_dentry =
5010+ lookup_one_len(name, lower_dentry->d_parent,
5011+ dentry->d_name.len + UNIONFS_WHLEN);
5012+ if (IS_ERR(whiteout_dentry)) {
5013+ err = PTR_ERR(whiteout_dentry);
5014+ goto out;
5015+ }
5016+
5017+ if (!whiteout_dentry->d_inode) {
5018+ dput(whiteout_dentry);
5019+ whiteout_dentry = NULL;
5020+ } else {
5021+ /*
5022+ * found a .wh.foo entry, unlink it and then call
5023+ * vfs_symlink().
5024+ */
5025+ lower_dir_dentry = lock_parent(whiteout_dentry);
5026+
5027+ if (!(err = is_robranch_super(dentry->d_sb, bstart)))
5028+ err = vfs_unlink(lower_dir_dentry->d_inode,
5029+ whiteout_dentry);
5030+ dput(whiteout_dentry);
5031+
5032+ fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
5033+ /* propagate number of hard-links */
5034+ dir->i_nlink = unionfs_get_nlinks(dir);
5035+
5036+ unlock_dir(lower_dir_dentry);
5037+
5038+ if (err) {
5039+ /* exit if the error returned was NOT -EROFS */
5040+ if (!IS_COPYUP_ERR(err))
5041+ goto out;
5042+ /*
5043+ * should now try to create the symlink in another
5044+ * branch.
5045+ */
5046+ bstart--;
5047+ }
5048+ }
5049+
5050+ /*
5051+ * deleted whiteout if it was present, now do a normal vfs_symlink()
5052+ * with possible recursive directory creation
5053+ */
5054+ for (bindex = bstart; bindex >= 0; bindex--) {
5055+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5056+ if (!lower_dentry) {
5057+ /*
5058+ * if lower_dentry is NULL, create the entire
5059+ * dentry directory structure in branch 'bindex'.
5060+ * lower_dentry will NOT be null when bindex ==
5061+ * bstart because lookup passed us a negative
5062+ * unionfs dentry pointing to a lone negative
5063+ * underlying dentry
5064+ */
5065+ lower_dentry = create_parents(dir, dentry,
5066+ dentry->d_name.name,
5067+ bindex);
5068+ if (!lower_dentry || IS_ERR(lower_dentry)) {
5069+ if (IS_ERR(lower_dentry))
5070+ err = PTR_ERR(lower_dentry);
5071+
5072+ printk(KERN_DEBUG "unionfs: lower dentry "
5073+ "NULL (or error) for bindex = %d\n",
5074+ bindex);
5075+ continue;
5076+ }
5077+ }
5078+
5079+ lower_dir_dentry = lock_parent(lower_dentry);
5080+
5081+ if (!(err = is_robranch_super(dentry->d_sb, bindex))) {
5082+ mode = S_IALLUGO;
5083+ err =
5084+ vfs_symlink(lower_dir_dentry->d_inode,
5085+ lower_dentry, symname, mode);
5086+ }
5087+ unlock_dir(lower_dir_dentry);
5088+
5089+ if (err || !lower_dentry->d_inode) {
5090+ /*
5091+ * break out of for loop if error returned was NOT
5092+ * -EROFS.
5093+ */
5094+ if (!IS_COPYUP_ERR(err))
5095+ break;
5096+ } else {
5097+ /*
5098+ * Only INTERPOSE_LOOKUP can return a value other
5099+ * than 0 on err.
5100+ */
5101+ err = PTR_ERR(unionfs_interpose(dentry,
5102+ dir->i_sb, 0));
5103+ if (!err) {
5104+ fsstack_copy_attr_times(dir,
5105+ lower_dir_dentry->
5106+ d_inode);
5107+ fsstack_copy_inode_size(dir,
5108+ lower_dir_dentry->
5109+ d_inode);
5110+ /*
5111+ * update number of links on parent
5112+ * directory.
5113+ */
5114+ dir->i_nlink = unionfs_get_nlinks(dir);
5115+ }
5116+ break;
5117+ }
5118+ }
5119+
5120+out:
5121+ if (!dentry->d_inode)
5122+ d_drop(dentry);
5123+
5124+ kfree(name);
5125+ if (!err)
5126+ unionfs_postcopyup_setmnt(dentry);
5127+ unionfs_unlock_dentry(dentry);
5128+
5129+ unionfs_check_inode(dir);
5130+ unionfs_check_dentry(dentry);
5131+ unionfs_read_unlock(dentry->d_sb);
5132+
5133+ return err;
5134+}
5135+
5136+static int unionfs_mkdir(struct inode *parent, struct dentry *dentry, int mode)
5137+{
5138+ int err = 0;
5139+ struct dentry *lower_dentry = NULL, *whiteout_dentry = NULL;
5140+ struct dentry *lower_parent_dentry = NULL;
5141+ int bindex = 0, bstart;
5142+ char *name = NULL;
5143+ int whiteout_unlinked = 0;
5144+ struct sioq_args args;
5145+
5146+ unionfs_read_lock(dentry->d_sb);
5147+ unionfs_lock_dentry(dentry);
5148+
5149+ if (dentry->d_inode &&
5150+ !__unionfs_d_revalidate_chain(dentry, NULL, false)) {
5151+ err = -ESTALE;
5152+ goto out;
5153+ }
5154+
5155+ bstart = dbstart(dentry);
5156+
5157+ lower_dentry = unionfs_lower_dentry(dentry);
5158+
5159+ /*
5160+ * check if whiteout exists in this branch, i.e. lookup .wh.foo
5161+ * first.
5162+ */
5163+ name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
5164+ if (IS_ERR(name)) {
5165+ err = PTR_ERR(name);
5166+ goto out;
5167+ }
5168+
5169+ whiteout_dentry = lookup_one_len(name, lower_dentry->d_parent,
5170+ dentry->d_name.len + UNIONFS_WHLEN);
5171+ if (IS_ERR(whiteout_dentry)) {
5172+ err = PTR_ERR(whiteout_dentry);
5173+ goto out;
5174+ }
5175+
5176+ if (!whiteout_dentry->d_inode) {
5177+ dput(whiteout_dentry);
5178+ whiteout_dentry = NULL;
5179+ } else {
5180+ lower_parent_dentry = lock_parent(whiteout_dentry);
5181+
5182+ /* found a .wh.foo entry, remove it then do vfs_mkdir */
5183+ if (!(err = is_robranch_super(dentry->d_sb, bstart))) {
5184+ args.unlink.parent = lower_parent_dentry->d_inode;
5185+ args.unlink.dentry = whiteout_dentry;
5186+ run_sioq(__unionfs_unlink, &args);
5187+ err = args.err;
5188+ }
5189+ dput(whiteout_dentry);
5190+
5191+ unlock_dir(lower_parent_dentry);
5192+
5193+ if (err) {
5194+ /* exit if the error returned was NOT -EROFS */
5195+ if (!IS_COPYUP_ERR(err))
5196+ goto out;
5197+ bstart--;
5198+ } else
5199+ whiteout_unlinked = 1;
5200+ }
5201+
5202+ for (bindex = bstart; bindex >= 0; bindex--) {
5203+ int i;
5204+ int bend = dbend(dentry);
5205+
5206+ if (is_robranch_super(dentry->d_sb, bindex))
5207+ continue;
5208+
5209+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5210+ if (!lower_dentry) {
5211+ lower_dentry = create_parents(parent, dentry,
5212+ dentry->d_name.name,
5213+ bindex);
5214+ if (!lower_dentry || IS_ERR(lower_dentry)) {
5215+ printk(KERN_DEBUG "unionfs: lower dentry "
5216+ "NULL for bindex = %d\n", bindex);
5217+ continue;
5218+ }
5219+ }
5220+
5221+ lower_parent_dentry = lock_parent(lower_dentry);
5222+
5223+ if (IS_ERR(lower_parent_dentry)) {
5224+ err = PTR_ERR(lower_parent_dentry);
5225+ goto out;
5226+ }
5227+
5228+ err = vfs_mkdir(lower_parent_dentry->d_inode, lower_dentry,
5229+ mode);
5230+
5231+ unlock_dir(lower_parent_dentry);
5232+
5233+ /* did the mkdir succeed? */
5234+ if (err)
5235+ break;
5236+
5237+ for (i = bindex + 1; i < bend; i++) {
5238+ if (unionfs_lower_dentry_idx(dentry, i)) {
5239+ dput(unionfs_lower_dentry_idx(dentry, i));
5240+ unionfs_set_lower_dentry_idx(dentry, i, NULL);
5241+ }
5242+ }
5243+ set_dbend(dentry, bindex);
5244+
5245+ /*
5246+ * Only INTERPOSE_LOOKUP can return a value other than 0 on
5247+ * err.
5248+ */
5249+ err = PTR_ERR(unionfs_interpose(dentry, parent->i_sb, 0));
5250+ if (!err) {
5251+ unionfs_copy_attr_times(parent);
5252+ fsstack_copy_inode_size(parent,
5253+ lower_parent_dentry->d_inode);
5254+
5255+ /* update number of links on parent directory */
5256+ parent->i_nlink = unionfs_get_nlinks(parent);
5257+ }
5258+
5259+ err = make_dir_opaque(dentry, dbstart(dentry));
5260+ if (err) {
5261+ printk(KERN_ERR "unionfs: mkdir: error creating "
5262+ ".wh.__dir_opaque: %d\n", err);
5263+ goto out;
5264+ }
5265+
5266+ /* we are done! */
5267+ break;
5268+ }
5269+
5270+out:
5271+ if (!dentry->d_inode)
5272+ d_drop(dentry);
5273+
5274+ kfree(name);
5275+
5276+ if (!err)
5277+ unionfs_copy_attr_times(dentry->d_inode);
5278+ unionfs_unlock_dentry(dentry);
5279+ unionfs_check_inode(parent);
5280+ unionfs_check_dentry(dentry);
5281+ unionfs_read_unlock(dentry->d_sb);
5282+
5283+ return err;
5284+}
5285+
5286+static int unionfs_mknod(struct inode *dir, struct dentry *dentry, int mode,
5287+ dev_t dev)
5288+{
5289+ int err = 0;
5290+ struct dentry *lower_dentry = NULL, *whiteout_dentry = NULL;
5291+ struct dentry *lower_parent_dentry = NULL;
5292+ int bindex = 0, bstart;
5293+ char *name = NULL;
5294+ int whiteout_unlinked = 0;
5295+
5296+ unionfs_read_lock(dentry->d_sb);
5297+ unionfs_lock_dentry(dentry);
5298+
5299+ if (dentry->d_inode &&
5300+ !__unionfs_d_revalidate_chain(dentry, NULL, false)) {
5301+ err = -ESTALE;
5302+ goto out;
5303+ }
5304+
5305+ bstart = dbstart(dentry);
5306+
5307+ lower_dentry = unionfs_lower_dentry(dentry);
5308+
5309+ /*
5310+ * check if whiteout exists in this branch, i.e. lookup .wh.foo
5311+ * first.
5312+ */
5313+ name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
5314+ if (IS_ERR(name)) {
5315+ err = PTR_ERR(name);
5316+ goto out;
5317+ }
5318+
5319+ whiteout_dentry = lookup_one_len(name, lower_dentry->d_parent,
5320+ dentry->d_name.len + UNIONFS_WHLEN);
5321+ if (IS_ERR(whiteout_dentry)) {
5322+ err = PTR_ERR(whiteout_dentry);
5323+ goto out;
5324+ }
5325+
5326+ if (!whiteout_dentry->d_inode) {
5327+ dput(whiteout_dentry);
5328+ whiteout_dentry = NULL;
5329+ } else {
5330+ /* found .wh.foo, unlink it */
5331+ lower_parent_dentry = lock_parent(whiteout_dentry);
5332+
5333+ /* found a .wh.foo entry, remove it then do vfs_mknod */
5334+ if (!(err = is_robranch_super(dentry->d_sb, bstart)))
5335+ err = vfs_unlink(lower_parent_dentry->d_inode,
5336+ whiteout_dentry);
5337+ dput(whiteout_dentry);
5338+
5339+ unlock_dir(lower_parent_dentry);
5340+
5341+ if (err) {
5342+ if (!IS_COPYUP_ERR(err))
5343+ goto out;
5344+ bstart--;
5345+ } else
5346+ whiteout_unlinked = 1;
5347+ }
5348+
5349+ for (bindex = bstart; bindex >= 0; bindex--) {
5350+ if (is_robranch_super(dentry->d_sb, bindex))
5351+ continue;
5352+
5353+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5354+ if (!lower_dentry) {
5355+ lower_dentry = create_parents(dir, dentry,
5356+ dentry->d_name.name,
5357+ bindex);
5358+ if (IS_ERR(lower_dentry)) {
5359+ printk(KERN_DEBUG "unionfs: failed to create "
5360+ "parents on %d, err = %ld\n",
5361+ bindex, PTR_ERR(lower_dentry));
5362+ continue;
5363+ }
5364+ }
5365+
5366+ lower_parent_dentry = lock_parent(lower_dentry);
5367+ if (IS_ERR(lower_parent_dentry)) {
5368+ err = PTR_ERR(lower_parent_dentry);
5369+ goto out;
5370+ }
5371+
5372+ err = vfs_mknod(lower_parent_dentry->d_inode,
5373+ lower_dentry, mode, dev);
5374+
5375+ if (err) {
5376+ unlock_dir(lower_parent_dentry);
5377+ break;
5378+ }
5379+
5380+ /*
5381+ * Only INTERPOSE_LOOKUP can return a value other than 0 on
5382+ * err.
5383+ */
5384+ err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5385+ if (!err) {
5386+ fsstack_copy_attr_times(dir,
5387+ lower_parent_dentry->d_inode);
5388+ fsstack_copy_inode_size(dir,
5389+ lower_parent_dentry->d_inode);
5390+ /* update number of links on parent directory */
5391+ dir->i_nlink = unionfs_get_nlinks(dir);
5392+ }
5393+ unlock_dir(lower_parent_dentry);
5394+
5395+ break;
5396+ }
5397+
5398+out:
5399+ if (!dentry->d_inode)
5400+ d_drop(dentry);
5401+
5402+ kfree(name);
5403+
5404+ if (!err)
5405+ unionfs_postcopyup_setmnt(dentry);
5406+ unionfs_unlock_dentry(dentry);
5407+
5408+ unionfs_check_inode(dir);
5409+ unionfs_check_dentry(dentry);
5410+ unionfs_read_unlock(dentry->d_sb);
5411+
5412+ return err;
5413+}
5414+
5415+static int unionfs_readlink(struct dentry *dentry, char __user *buf,
5416+ int bufsiz)
5417+{
5418+ int err;
5419+ struct dentry *lower_dentry;
5420+
5421+ unionfs_read_lock(dentry->d_sb);
5422+ unionfs_lock_dentry(dentry);
5423+
5424+ if (!__unionfs_d_revalidate_chain(dentry, NULL, false)) {
5425+ err = -ESTALE;
5426+ goto out;
5427+ }
5428+
5429+ lower_dentry = unionfs_lower_dentry(dentry);
5430+
5431+ if (!lower_dentry->d_inode->i_op ||
5432+ !lower_dentry->d_inode->i_op->readlink) {
5433+ err = -EINVAL;
5434+ goto out;
5435+ }
5436+
5437+ err = lower_dentry->d_inode->i_op->readlink(lower_dentry,
5438+ buf, bufsiz);
5439+ if (err > 0)
5440+ fsstack_copy_attr_atime(dentry->d_inode,
5441+ lower_dentry->d_inode);
5442+
5443+out:
5444+ unionfs_unlock_dentry(dentry);
5445+ unionfs_check_dentry(dentry);
5446+ unionfs_read_unlock(dentry->d_sb);
5447+
5448+ return err;
5449+}
5450+
5451+/*
5452+ * unionfs_follow_link takes a dentry, but it is simple. It only needs to
5453+ * allocate some memory and then call our ->readlink method. Our
5454+ * unionfs_readlink *does* lock our dentry and revalidate the dentry.
5455+ * Therefore, we do not lock our dentry here (that would deadlock), nor
5456+ * do we need to revalidate it. It is safe to not lock our
5457+ * dentry here, nor revalidate it, because unionfs_follow_link does not do
5458+ * anything (prior to calling ->readlink) which could become inconsistent
5459+ * due to branch management.
5460+ */
5461+static void *unionfs_follow_link(struct dentry *dentry, struct nameidata *nd)
5462+{
5463+ char *buf;
5464+ int len = PAGE_SIZE, err;
5465+ mm_segment_t old_fs;
5466+
5467+ unionfs_read_lock(dentry->d_sb);
5468+
5469+ /* This is freed by the put_link method assuming a successful call. */
5470+ buf = kmalloc(len, GFP_KERNEL);
5471+ if (!buf) {
5472+ err = -ENOMEM;
5473+ goto out;
5474+ }
5475+
5476+ /* read the symlink, and then we will follow it */
5477+ old_fs = get_fs();
5478+ set_fs(KERNEL_DS);
5479+ err = dentry->d_inode->i_op->readlink(dentry, (char __user *)buf, len);
5480+ set_fs(old_fs);
5481+ if (err < 0) {
5482+ kfree(buf);
5483+ buf = NULL;
5484+ goto out;
5485+ }
5486+ buf[err] = 0;
5487+ nd_set_link(nd, buf);
5488+ err = 0;
5489+
5490+out:
5491+ unionfs_check_dentry(dentry);
5492+ unionfs_read_unlock(dentry->d_sb);
5493+ return ERR_PTR(err);
5494+}
5495+
5496+/* FIXME: We may not have to lock here */
5497+static void unionfs_put_link(struct dentry *dentry, struct nameidata *nd,
5498+ void *cookie)
5499+{
5500+ unionfs_read_lock(dentry->d_sb);
5501+
5502+ unionfs_lock_dentry(dentry);
5503+ if (!__unionfs_d_revalidate_chain(dentry, nd, false))
5504+ printk("unionfs: put_link failed to revalidate dentry\n");
5505+ unionfs_unlock_dentry(dentry);
5506+
5507+ unionfs_check_dentry(dentry);
5508+ kfree(nd_get_link(nd));
5509+ unionfs_read_unlock(dentry->d_sb);
5510+}
5511+
5512+/*
5513+ * Basically copied from the kernel vfs permission(), but we've changed
5514+ * the following:
5515+ * (1) the IS_RDONLY check is skipped, and
5516+ * (2) We return 0 (success) if the non-leftmost branch is mounted
5517+ * readonly, to allow copyup to work.
5518+ * (3) we do call security_inode_permission, and therefore security checks
5519+ * inside SELinux, etc. are performed.
5520+ */
5521+static int inode_permission(struct super_block *sb, struct inode *inode, int mask,
5522+ struct nameidata *nd, int bindex)
5523+{
5524+ int retval, submask;
5525+
5526+ if (mask & MAY_WRITE) {
5527+ umode_t mode = inode->i_mode;
5528+ /* The first branch is allowed to be really readonly. */
5529+ if (bindex == 0 &&
5530+ IS_RDONLY(inode) &&
5531+ (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
5532+ return -EROFS;
5533+ /*
5534+ * Nobody gets write access to an immutable file.
5535+ */
5536+ if (IS_IMMUTABLE(inode))
5537+ return -EACCES;
5538+ /*
5539+ * For all branches other than the first one, we ignore
5540+ * EROFS (and a branch being mounted read-only), to let
5541+ * copyup take place.
5542+ */
5543+ if (bindex > 0 &&
5544+ is_robranch_super(sb, bindex) &&
5545+ (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
5546+ return 0;
5547+ }
5548+
5549+ /* Ordinary permission routines do not understand MAY_APPEND. */
5550+ submask = mask & ~MAY_APPEND;
5551+ if (inode->i_op && inode->i_op->permission)
5552+ retval = inode->i_op->permission(inode, submask, nd);
5553+ else
5554+ retval = generic_permission(inode, submask, NULL);
5555+
5556+ if (retval && retval != -EROFS) /* ignore EROFS */
5557+ return retval;
5558+
5559+ retval = security_inode_permission(inode, mask, nd);
5560+ return ((retval == -EROFS) ? 0 : retval); /* ignore EROFS */
5561+}
5562+
5563+/*
5564+ * Don't grab the superblock read-lock in unionfs_permission, which prevents
5565+ * a deadlock with the branch-management "add branch" code (which grabbed
5566+ * the write lock). It is safe to not grab the read lock here, because even
5567+ * with branch management taking place, there is no chance that
5568+ * unionfs_permission, or anything it calls, will use stale branch
5569+ * information.
5570+ */
5571+static int unionfs_permission(struct inode *inode, int mask,
5572+ struct nameidata *nd)
5573+{
5574+ struct inode *lower_inode = NULL;
5575+ int err = 0;
5576+ int bindex, bstart, bend;
5577+ const int is_file = !S_ISDIR(inode->i_mode);
5578+ const int write_mask = (mask & MAY_WRITE) && !(mask & MAY_READ);
5579+
5580+ bstart = ibstart(inode);
5581+ bend = ibend(inode);
5582+ if (bstart < 0 || bend < 0) {
5583+ /*
5584+ * With branch-management, we can get a stale inode here.
5585+ * If so, we return ESTALE back to link_path_walk, which
5586+ * would discard the dcache entry and re-lookup the
5587+ * dentry+inode. This should be equivalent to issuing
5588+ * __unionfs_d_revalidate_chain on nd.dentry here.
5589+ */
5590+ err = -ESTALE; /* force revalidate */
5591+ goto out;
5592+ }
5593+
5594+ for (bindex = bstart; bindex <= bend; bindex++) {
5595+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
5596+ if (!lower_inode)
5597+ continue;
5598+
5599+ /*
5600+ * Handle D-F-D stacking of underlying files/directories:
5601+ * if we are checking a directory, skip lower inodes that
5602+ * are not directories.
5603+ */
5604+ if (!is_file && !S_ISDIR(lower_inode->i_mode))
5605+ continue;
5606+
5607+ /*
5608+ * We use our own special version of permission, such that
5609+ * only the first branch returns -EROFS.
5610+ */
5611+ err = inode_permission(inode->i_sb, lower_inode, mask, nd,
5612+ bindex);
5613+
5614+ /*
5615+ * The permissions are an intersection of the overall directory
5616+ * permissions, so we fail if one fails.
5617+ */
5618+ if (err)
5619+ goto out;
5620+
5621+ /* only the leftmost file matters. */
5622+ if (is_file || write_mask) {
5623+ if (is_file && write_mask) {
5624+ err = get_write_access(lower_inode);
5625+ if (!err)
5626+ put_write_access(lower_inode);
5627+ }
5628+ break;
5629+ }
5630+ }
5631+ /* sync times which may have changed (asynchronously) below */
5632+ unionfs_copy_attr_times(inode);
5633+
5634+out:
5635+ unionfs_check_inode(inode);
5636+ return err;
5637+}
5638+
5639+static int unionfs_setattr(struct dentry *dentry, struct iattr *ia)
5640+{
5641+ int err = 0;
5642+ struct dentry *lower_dentry;
5643+ struct inode *inode = NULL;
5644+ struct inode *lower_inode = NULL;
5645+ int bstart, bend, bindex;
5646+ int i;
5647+ int copyup = 0;
5648+
5649+ unionfs_read_lock(dentry->d_sb);
5650+ unionfs_lock_dentry(dentry);
5651+
5652+ if (!__unionfs_d_revalidate_chain(dentry, NULL, false)) {
5653+ err = -ESTALE;
5654+ goto out;
5655+ }
5656+
5657+ bstart = dbstart(dentry);
5658+ bend = dbend(dentry);
5659+ inode = dentry->d_inode;
5660+
5661+ for (bindex = bstart; (bindex <= bend) || (bindex == bstart);
5662+ bindex++) {
5663+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5664+ if (!lower_dentry)
5665+ continue;
5666+ BUG_ON(lower_dentry->d_inode == NULL);
5667+
5668+ /* If the file is on a read only branch */
5669+ if (is_robranch_super(dentry->d_sb, bindex)
5670+ || IS_RDONLY(lower_dentry->d_inode)) {
5671+ if (copyup || (bindex != bstart))
5672+ continue;
5673+ /* Only if it's the leftmost file, copyup the file */
5674+ for (i = bstart - 1; i >= 0; i--) {
5675+ loff_t size = dentry->d_inode->i_size;
5676+ if (ia->ia_valid & ATTR_SIZE)
5677+ size = ia->ia_size;
5678+ err = copyup_dentry(dentry->d_parent->d_inode,
5679+ dentry, bstart, i,
5680+ dentry->d_name.name,
5681+ dentry->d_name.len,
5682+ NULL, size);
5683+
5684+ if (!err) {
5685+ copyup = 1;
5686+ lower_dentry =
5687+ unionfs_lower_dentry(dentry);
5688+ break;
5689+ }
5690+ /*
5691+ * if error is in the leftmost branch, pass
5692+ * it up.
5693+ */
5694+ if (i == 0)
5695+ goto out;
5696+ }
5697+
5698+ }
5699+ err = notify_change(lower_dentry, ia);
5700+ if (err)
5701+ goto out;
5702+ break;
5703+ }
5704+
5705+ /* for mmap */
5706+ if (ia->ia_valid & ATTR_SIZE) {
5707+ if (ia->ia_size != i_size_read(inode)) {
5708+ err = vmtruncate(inode, ia->ia_size);
5709+ if (err)
5710+ printk("unionfs_setattr: vmtruncate failed\n");
5711+ }
5712+ }
5713+
5714+ /* get the size from the first lower inode */
5715+ lower_inode = unionfs_lower_inode(inode);
5716+ unionfs_copy_attr_all(inode, lower_inode);
5717+ fsstack_copy_inode_size(inode, lower_inode);
5718+ /* if setattr succeeded, then parent dir may have changed */
5719+ unionfs_copy_attr_times(dentry->d_parent->d_inode);
5720+out:
5721+ unionfs_unlock_dentry(dentry);
5722+ unionfs_check_dentry(dentry);
5723+ unionfs_check_dentry(dentry->d_parent);
5724+ unionfs_read_unlock(dentry->d_sb);
5725+
5726+ return err;
5727+}
5728+
5729+struct inode_operations unionfs_symlink_iops = {
5730+ .readlink = unionfs_readlink,
5731+ .permission = unionfs_permission,
5732+ .follow_link = unionfs_follow_link,
5733+ .setattr = unionfs_setattr,
5734+ .put_link = unionfs_put_link,
5735+};
5736+
5737+struct inode_operations unionfs_dir_iops = {
5738+ .create = unionfs_create,
5739+ .lookup = unionfs_lookup,
5740+ .link = unionfs_link,
5741+ .unlink = unionfs_unlink,
5742+ .symlink = unionfs_symlink,
5743+ .mkdir = unionfs_mkdir,
5744+ .rmdir = unionfs_rmdir,
5745+ .mknod = unionfs_mknod,
5746+ .rename = unionfs_rename,
5747+ .permission = unionfs_permission,
5748+ .setattr = unionfs_setattr,
5749+#ifdef CONFIG_UNION_FS_XATTR
5750+ .setxattr = unionfs_setxattr,
5751+ .getxattr = unionfs_getxattr,
5752+ .removexattr = unionfs_removexattr,
5753+ .listxattr = unionfs_listxattr,
5754+#endif /* CONFIG_UNION_FS_XATTR */
5755+};
5756+
5757+struct inode_operations unionfs_main_iops = {
5758+ .permission = unionfs_permission,
5759+ .setattr = unionfs_setattr,
5760+#ifdef CONFIG_UNION_FS_XATTR
5761+ .setxattr = unionfs_setxattr,
5762+ .getxattr = unionfs_getxattr,
5763+ .removexattr = unionfs_removexattr,
5764+ .listxattr = unionfs_listxattr,
5765+#endif /* CONFIG_UNION_FS_XATTR */
5766+};
5767diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
5768new file mode 100644
5769index 0000000..da89ced
5770--- /dev/null
5771+++ b/fs/unionfs/lookup.c
5772@@ -0,0 +1,587 @@
5773+/*
5774+ * Copyright (c) 2003-2007 Erez Zadok
5775+ * Copyright (c) 2003-2006 Charles P. Wright
5776+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
5777+ * Copyright (c) 2005-2006 Junjiro Okajima
5778+ * Copyright (c) 2005 Arun M. Krishnakumar
5779+ * Copyright (c) 2004-2006 David P. Quigley
5780+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
5781+ * Copyright (c) 2003 Puja Gupta
5782+ * Copyright (c) 2003 Harikesavan Krishnan
5783+ * Copyright (c) 2003-2007 Stony Brook University
5784+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
5785+ *
5786+ * This program is free software; you can redistribute it and/or modify
5787+ * it under the terms of the GNU General Public License version 2 as
5788+ * published by the Free Software Foundation.
5789+ */
5790+
5791+#include "union.h"
5792+
5793+static int realloc_dentry_private_data(struct dentry *dentry);
5794+
5795+/* is the filename valid == !(whiteout for a file or opaque dir marker) */
5796+static int is_validname(const char *name)
5797+{
5798+ if (!strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN))
5799+ return 0;
5800+ if (!strncmp(name, UNIONFS_DIR_OPAQUE_NAME,
5801+ sizeof(UNIONFS_DIR_OPAQUE_NAME) - 1))
5802+ return 0;
5803+ return 1;
5804+}
5805+
5806+/* The rest of these are utility functions for lookup. */
5807+static noinline int is_opaque_dir(struct dentry *dentry, int bindex)
5808+{
5809+ int err = 0;
5810+ struct dentry *lower_dentry;
5811+ struct dentry *wh_lower_dentry;
5812+ struct inode *lower_inode;
5813+ struct sioq_args args;
5814+
5815+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5816+ lower_inode = lower_dentry->d_inode;
5817+
5818+ BUG_ON(!S_ISDIR(lower_inode->i_mode));
5819+
5820+ mutex_lock(&lower_inode->i_mutex);
5821+
5822+ if (!permission(lower_inode, MAY_EXEC, NULL))
5823+ wh_lower_dentry =
5824+ lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
5825+ sizeof(UNIONFS_DIR_OPAQUE) - 1);
5826+ else {
5827+ args.is_opaque.dentry = lower_dentry;
5828+ run_sioq(__is_opaque_dir, &args);
5829+ wh_lower_dentry = args.ret;
5830+ }
5831+
5832+ mutex_unlock(&lower_inode->i_mutex);
5833+
5834+ if (IS_ERR(wh_lower_dentry)) {
5835+ err = PTR_ERR(wh_lower_dentry);
5836+ goto out;
5837+ }
5838+
5839+ /* This is an opaque dir iff wh_lower_dentry is positive */
5840+ err = !!wh_lower_dentry->d_inode;
5841+
5842+ dput(wh_lower_dentry);
5843+out:
5844+ return err;
5845+}
5846+
5847+/*
5848+ * Main (and complex) driver function for Unionfs's lookup
5849+ *
5850+ * Returns: NULL (ok), ERR_PTR if an error occurred, or a non-null non-error
5851+ * PTR if d_splice returned a different dentry.
5852+ *
5853+ * If lookupmode is INTERPOSE_PARTIAL/REVAL/REVAL_NEG, the passed dentry's
5854+ * inode info must be locked. If lookupmode is INTERPOSE_LOOKUP (i.e., a
5855+ * newly looked-up dentry), then unionfs_lookup_backend will return a locked
5856+ * dentry's info, which the caller must unlock.
5857+ */
5858+struct dentry *unionfs_lookup_backend(struct dentry *dentry,
5859+ struct nameidata *nd, int lookupmode)
5860+{
5861+ int err = 0;
5862+ struct dentry *lower_dentry = NULL;
5863+ struct dentry *wh_lower_dentry = NULL;
5864+ struct dentry *lower_dir_dentry = NULL;
5865+ struct dentry *parent_dentry = NULL;
5866+ struct dentry *d_interposed = NULL;
5867+ int bindex, bstart = -1, bend, bopaque;
5868+ int dentry_count = 0; /* Number of positive dentries. */
5869+ int first_dentry_offset = -1; /* -1 is uninitialized */
5870+ struct dentry *first_dentry = NULL;
5871+ struct dentry *first_lower_dentry = NULL;
5872+ struct vfsmount *first_lower_mnt = NULL;
5873+ int locked_parent = 0;
5874+ int opaque;
5875+ char *whname = NULL;
5876+ const char *name;
5877+ int namelen;
5878+
5879+ /*
5880+ * We should already have a lock on this dentry in the case of a
5881+ * partial lookup, or a revalidation. Otherwise it is returned from
5882+ * new_dentry_private_data already locked.
5883+ */
5884+ if (lookupmode == INTERPOSE_PARTIAL || lookupmode == INTERPOSE_REVAL ||
5885+ lookupmode == INTERPOSE_REVAL_NEG)
5886+ verify_locked(dentry);
5887+ else /* this could only be INTERPOSE_LOOKUP */
5888+ BUG_ON(UNIONFS_D(dentry) != NULL);
5889+
5890+ switch (lookupmode) {
5891+ case INTERPOSE_PARTIAL:
5892+ break;
5893+ case INTERPOSE_LOOKUP:
5894+ if ((err = new_dentry_private_data(dentry)))
5895+ goto out;
5896+ break;
5897+ default:
5898+ /* default: can only be INTERPOSE_REVAL/REVAL_NEG */
5899+ if ((err = realloc_dentry_private_data(dentry)))
5900+ goto out;
5901+ break;
5902+ }
5903+
5904+ /* must initialize dentry operations */
5905+ dentry->d_op = &unionfs_dops;
5906+
5907+ parent_dentry = dget_parent(dentry);
5908+ /* We never partial lookup the root directory. */
5909+ if (parent_dentry != dentry) {
5910+ unionfs_lock_dentry(parent_dentry);
5911+ locked_parent = 1;
5912+ } else {
5913+ dput(parent_dentry);
5914+ parent_dentry = NULL;
5915+ goto out;
5916+ }
5917+
5918+ name = dentry->d_name.name;
5919+ namelen = dentry->d_name.len;
5920+
5921+ /* No dentries should get created for possible whiteout names. */
5922+ if (!is_validname(name)) {
5923+ err = -EPERM;
5924+ goto out_free;
5925+ }
5926+
5927+ /* Now start the actual lookup procedure. */
5928+ bstart = dbstart(parent_dentry);
5929+ bend = dbend(parent_dentry);
5930+ bopaque = dbopaque(parent_dentry);
5931+ BUG_ON(bstart < 0);
5932+
5933+ /*
5934+ * It would be ideal if we could convert partial lookups to only have
5935+ * to do this work when they really need to. It could probably improve
5936+ * performance quite a bit, and maybe simplify the rest of the code.
5937+ */
5938+ if (lookupmode == INTERPOSE_PARTIAL) {
5939+ bstart++;
5940+ if ((bopaque != -1) && (bopaque < bend))
5941+ bend = bopaque;
5942+ }
5943+
5944+ for (bindex = bstart; bindex <= bend; bindex++) {
5945+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5946+ if (lookupmode == INTERPOSE_PARTIAL && lower_dentry)
5947+ continue;
5948+ BUG_ON(lower_dentry != NULL);
5949+
5950+ lower_dir_dentry =
5951+ unionfs_lower_dentry_idx(parent_dentry, bindex);
5952+
5953+ /* if the parent lower dentry does not exist skip this */
5954+ if (!(lower_dir_dentry && lower_dir_dentry->d_inode))
5955+ continue;
5956+
5957+ /* also skip it if the parent isn't a directory. */
5958+ if (!S_ISDIR(lower_dir_dentry->d_inode->i_mode))
5959+ continue;
5960+
5961+ /* Reuse the whiteout name because its value doesn't change. */
5962+ if (!whname) {
5963+ whname = alloc_whname(name, namelen);
5964+ if (IS_ERR(whname)) {
5965+ err = PTR_ERR(whname);
5966+ goto out_free;
5967+ }
5968+ }
5969+
5970+ /* check if whiteout exists in this branch: lookup .wh.foo */
5971+ wh_lower_dentry = lookup_one_len(whname, lower_dir_dentry,
5972+ namelen + UNIONFS_WHLEN);
5973+ if (IS_ERR(wh_lower_dentry)) {
5974+ dput(first_lower_dentry);
5975+ unionfs_mntput(first_dentry, first_dentry_offset);
5976+ err = PTR_ERR(wh_lower_dentry);
5977+ goto out_free;
5978+ }
5979+
5980+ if (wh_lower_dentry->d_inode) {
5981+ /* We found a whiteout so let's give up. */
5982+ if (S_ISREG(wh_lower_dentry->d_inode->i_mode)) {
5983+ set_dbend(dentry, bindex);
5984+ set_dbopaque(dentry, bindex);
5985+ dput(wh_lower_dentry);
5986+ break;
5987+ }
5988+ err = -EIO;
5989+ printk(KERN_NOTICE "unionfs: EIO: invalid whiteout "
5990+ "entry type %d.\n",
5991+ wh_lower_dentry->d_inode->i_mode);
5992+ dput(wh_lower_dentry);
5993+ dput(first_lower_dentry);
5994+ unionfs_mntput(first_dentry, first_dentry_offset);
5995+ goto out_free;
5996+ }
5997+
5998+ dput(wh_lower_dentry);
5999+ wh_lower_dentry = NULL;
6000+
6001+ /* Now do regular lookup; lookup foo */
6002+ nd->dentry = unionfs_lower_dentry_idx(dentry, bindex);
6003+ /* FIXME: fix following line for mount point crossing */
6004+ nd->mnt = unionfs_lower_mnt_idx(parent_dentry, bindex);
6005+
6006+ lower_dentry = lookup_one_len_nd(name, lower_dir_dentry,
6007+ namelen, nd);
6008+ if (IS_ERR(lower_dentry)) {
6009+ dput(first_lower_dentry);
6010+ unionfs_mntput(first_dentry, first_dentry_offset);
6011+ err = PTR_ERR(lower_dentry);
6012+ goto out_free;
6013+ }
6014+
6015+ /*
6016+ * Store the first negative dentry specially, because if they
6017+ * are all negative we need this for future creates.
6018+ */
6019+ if (!lower_dentry->d_inode) {
6020+ if (!first_lower_dentry && (dbstart(dentry) == -1)) {
6021+ first_lower_dentry = lower_dentry;
6022+ /*
6023+ * FIXME: following line needs to be changed
6024+ * to allow mount-point crossing
6025+ */
6026+ first_dentry = parent_dentry;
6027+ first_lower_mnt =
6028+ unionfs_mntget(parent_dentry, bindex);
6029+ first_dentry_offset = bindex;
6030+ } else
6031+ dput(lower_dentry);
6032+
6033+ continue;
6034+ }
6035+
6036+ /* number of positive dentries */
6037+ dentry_count++;
6038+
6039+ /* store underlying dentry */
6040+ if (dbstart(dentry) == -1)
6041+ set_dbstart(dentry, bindex);
6042+ unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
6043+ /*
6044+ * FIXME: the following line needs to get fixed to allow
6045+ * mount-point crossing
6046+ */
6047+ unionfs_set_lower_mnt_idx(dentry, bindex,
6048+ unionfs_mntget(parent_dentry,
6049+ bindex));
6050+ set_dbend(dentry, bindex);
6051+
6052+ /* update parent directory's atime with the bindex */
6053+ fsstack_copy_attr_atime(parent_dentry->d_inode,
6054+ lower_dir_dentry->d_inode);
6055+
6056+ /* We terminate file lookups here. */
6057+ if (!S_ISDIR(lower_dentry->d_inode->i_mode)) {
6058+ if (lookupmode == INTERPOSE_PARTIAL)
6059+ continue;
6060+ if (dentry_count == 1)
6061+ goto out_positive;
6062+ /* This can only happen with mixed D-*-F-* */
6063+ BUG_ON(!S_ISDIR(unionfs_lower_dentry(dentry)->
6064+ d_inode->i_mode));
6065+ continue;
6066+ }
6067+
6068+ opaque = is_opaque_dir(dentry, bindex);
6069+ if (opaque < 0) {
6070+ dput(first_lower_dentry);
6071+ unionfs_mntput(first_dentry, first_dentry_offset);
6072+ err = opaque;
6073+ goto out_free;
6074+ } else if (opaque) {
6075+ set_dbend(dentry, bindex);
6076+ set_dbopaque(dentry, bindex);
6077+ break;
6078+ }
6079+ }
6080+
6081+ if (dentry_count)
6082+ goto out_positive;
6083+ else
6084+ goto out_negative;
6085+
6086+out_negative:
6087+ if (lookupmode == INTERPOSE_PARTIAL)
6088+ goto out;
6089+
6090+ /* If we've only got negative dentries, then use the leftmost one. */
6091+ if (lookupmode == INTERPOSE_REVAL) {
6092+ if (dentry->d_inode)
6093+ UNIONFS_I(dentry->d_inode)->stale = 1;
6094+ goto out;
6095+ }
6096+ /* This should only happen if we found a whiteout. */
6097+ if (first_dentry_offset == -1) {
6098+ nd->dentry = dentry;
6099+ /* FIXME: fix following line for mount point crossing */
6100+ nd->mnt = unionfs_lower_mnt_idx(parent_dentry, bindex);
6101+
6102+ first_lower_dentry =
6103+ lookup_one_len_nd(name, lower_dir_dentry,
6104+ namelen, nd);
6105+ first_dentry_offset = bindex;
6106+ if (IS_ERR(first_lower_dentry)) {
6107+ err = PTR_ERR(first_lower_dentry);
6108+ goto out;
6109+ }
6110+
6111+ /*
6112+ * FIXME: the following line needs to be changed to allow
6113+ * mount-point crossing
6114+ */
6115+ first_dentry = dentry;
6116+ first_lower_mnt = unionfs_mntget(dentry->d_sb->s_root,
6117+ bindex);
6118+ }
6119+ unionfs_set_lower_dentry_idx(dentry, first_dentry_offset,
6120+ first_lower_dentry);
6121+ unionfs_set_lower_mnt_idx(dentry, first_dentry_offset,
6122+ first_lower_mnt);
6123+ set_dbstart(dentry, first_dentry_offset);
6124+ set_dbend(dentry, first_dentry_offset);
6125+
6126+ if (lookupmode == INTERPOSE_REVAL_NEG)
6127+ BUG_ON(dentry->d_inode != NULL);
6128+ else
6129+ d_add(dentry, NULL);
6130+ goto out;
6131+
6132+/* This part of the code is for positive dentries. */
6133+out_positive:
6134+ BUG_ON(dentry_count <= 0);
6135+
6136+ /*
6137+ * If we're holding onto the first negative dentry & corresponding
6138+ * vfsmount - throw it out.
6139+ */
6140+ dput(first_lower_dentry);
6141+ unionfs_mntput(first_dentry, first_dentry_offset);
6142+
6143+ /* Partial lookups need to re-interpose, or throw away older negs. */
6144+ if (lookupmode == INTERPOSE_PARTIAL) {
6145+ if (dentry->d_inode) {
6146+ unionfs_reinterpose(dentry);
6147+ goto out;
6148+ }
6149+
6150+ /*
6151+ * This somehow turned positive, so it is as if we had a
6152+ * negative revalidation.
6153+ */
6154+ lookupmode = INTERPOSE_REVAL_NEG;
6155+
6156+ update_bstart(dentry);
6157+ bstart = dbstart(dentry);
6158+ bend = dbend(dentry);
6159+ }
6160+
6161+ /*
6162+ * Interpose can return a dentry if d_splice returned a different
6163+ * dentry.
6164+ */
6165+ d_interposed = unionfs_interpose(dentry, dentry->d_sb, lookupmode);
6166+ if (IS_ERR(d_interposed))
6167+ err = PTR_ERR(d_interposed);
6168+ else if (d_interposed)
6169+ dentry = d_interposed;
6170+
6171+ if (err)
6172+ goto out_drop;
6173+
6174+ goto out;
6175+
6176+out_drop:
6177+ d_drop(dentry);
6178+
6179+out_free:
6180+ /* should dput all the underlying dentries on error condition */
6181+ bstart = dbstart(dentry);
6182+ if (bstart >= 0) {
6183+ bend = dbend(dentry);
6184+ for (bindex = bstart; bindex <= bend; bindex++) {
6185+ dput(unionfs_lower_dentry_idx(dentry, bindex));
6186+ unionfs_mntput(dentry, bindex);
6187+ }
6188+ }
6189+ kfree(UNIONFS_D(dentry)->lower_paths);
6190+ UNIONFS_D(dentry)->lower_paths = NULL;
6191+ set_dbstart(dentry, -1);
6192+ set_dbend(dentry, -1);
6193+
6194+out:
6195+ if (!err && UNIONFS_D(dentry)) {
6196+ BUG_ON(dbend(dentry) > UNIONFS_D(dentry)->bcount);
6197+ BUG_ON(dbend(dentry) > sbmax(dentry->d_sb));
6198+ if (dbstart(dentry) < 0 &&
6199+ dentry->d_inode && bstart >= 0 &&
6200+ (!UNIONFS_I(dentry->d_inode) ||
6201+ !UNIONFS_I(dentry->d_inode)->lower_inodes)) {
6202+ unionfs_mntput(dentry->d_sb->s_root, bstart);
6203+ dput(first_lower_dentry);
6204+ UNIONFS_I(dentry->d_inode)->stale = 1;
6205+ }
6206+ }
6207+ kfree(whname);
6208+ if (locked_parent)
6209+ unionfs_unlock_dentry(parent_dentry);
6210+ dput(parent_dentry);
6211+ if (err && (lookupmode == INTERPOSE_LOOKUP))
6212+ unionfs_unlock_dentry(dentry);
6213+ if (!err && d_interposed)
6214+ return d_interposed;
6215+ if (dentry->d_inode && UNIONFS_I(dentry->d_inode)->stale &&
6216+ first_dentry_offset >= 0)
6217+ unionfs_mntput(dentry->d_sb->s_root, first_dentry_offset);
6218+ return ERR_PTR(err);
6219+}
6220+
6221+/*
6222+ * This is a utility function that fills in a unionfs dentry.
6223+ *
6224+ * Returns: 0 (ok), or -ERRNO if an error occurred.
6225+ */
6226+int unionfs_partial_lookup(struct dentry *dentry)
6227+{
6228+ struct dentry *tmp;
6229+ struct nameidata nd = { .flags = 0 };
6230+ int err = -ENOSYS;
6231+
6232+ tmp = unionfs_lookup_backend(dentry, &nd, INTERPOSE_PARTIAL);
6233+ if (!tmp) {
6234+ err = 0;
6235+ goto out;
6236+ }
6237+ if (IS_ERR(tmp)) {
6238+ err = PTR_ERR(tmp);
6239+ goto out;
6240+ }
6241+ /* need to change the interface */
6242+ BUG_ON(tmp != dentry);
6243+out:
6244+ return err;
6245+}
6246+
6247+/* The dentry cache is just so we have properly sized dentries. */
6248+static struct kmem_cache *unionfs_dentry_cachep;
6249+int unionfs_init_dentry_cache(void)
6250+{
6251+ unionfs_dentry_cachep =
6252+ kmem_cache_create("unionfs_dentry",
6253+ sizeof(struct unionfs_dentry_info),
6254+ 0, SLAB_RECLAIM_ACCOUNT, NULL, NULL);
6255+
6256+ return (unionfs_dentry_cachep ? 0 : -ENOMEM);
6257+}
6258+
6259+void unionfs_destroy_dentry_cache(void)
6260+{
6261+ if (unionfs_dentry_cachep)
6262+ kmem_cache_destroy(unionfs_dentry_cachep);
6263+}
6264+
6265+void free_dentry_private_data(struct dentry *dentry)
6266+{
6267+ if (!dentry || !dentry->d_fsdata)
6268+ return;
6269+ kmem_cache_free(unionfs_dentry_cachep, dentry->d_fsdata);
6270+ dentry->d_fsdata = NULL;
6271+}
6272+
6273+static inline int __realloc_dentry_private_data(struct dentry *dentry)
6274+{
6275+ struct unionfs_dentry_info *info = UNIONFS_D(dentry);
6276+ void *p;
6277+ int size;
6278+
6279+ BUG_ON(!info);
6280+
6281+ size = sizeof(struct path) * sbmax(dentry->d_sb);
6282+ p = krealloc(info->lower_paths, size, GFP_ATOMIC);
6283+ if (!p)
6284+ return -ENOMEM;
6285+
6286+ info->lower_paths = p;
6287+
6288+ info->bstart = -1;
6289+ info->bend = -1;
6290+ info->bopaque = -1;
6291+ info->bcount = sbmax(dentry->d_sb);
6292+ atomic_set(&info->generation,
6293+ atomic_read(&UNIONFS_SB(dentry->d_sb)->generation));
6294+
6295+ memset(info->lower_paths, 0, size);
6296+
6297+ return 0;
6298+}
6299+
6300+/* UNIONFS_D(dentry)->lock must be locked */
6301+static int realloc_dentry_private_data(struct dentry *dentry)
6302+{
6303+ if (!__realloc_dentry_private_data(dentry))
6304+ return 0;
6305+
6306+ kfree(UNIONFS_D(dentry)->lower_paths);
6307+ free_dentry_private_data(dentry);
6308+ return -ENOMEM;
6309+}
6310+
6311+/* allocate new dentry private data */
6312+int new_dentry_private_data(struct dentry *dentry)
6313+{
6314+ struct unionfs_dentry_info *info = UNIONFS_D(dentry);
6315+
6316+ BUG_ON(info);
6317+
6318+ info = kmem_cache_alloc(unionfs_dentry_cachep, GFP_ATOMIC);
6319+ if (!info)
6320+ return -ENOMEM;
6321+
6322+ mutex_init(&info->lock);
6323+ mutex_lock(&info->lock);
6324+
6325+ info->lower_paths = NULL;
6326+
6327+ dentry->d_fsdata = info;
6328+
6329+ if (!__realloc_dentry_private_data(dentry))
6330+ return 0;
6331+
6332+ mutex_unlock(&info->lock);
6333+ free_dentry_private_data(dentry);
6334+ return -ENOMEM;
6335+}
6336+
6337+/*
6338+ * scan through the lower dentry objects, and set bstart to reflect the
6339+ * starting branch
6340+ */
6341+void update_bstart(struct dentry *dentry)
6342+{
6343+ int bindex;
6344+ int bstart = dbstart(dentry);
6345+ int bend = dbend(dentry);
6346+ struct dentry *lower_dentry;
6347+
6348+ for (bindex = bstart; bindex <= bend; bindex++) {
6349+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6350+ if (!lower_dentry)
6351+ continue;
6352+ if (lower_dentry->d_inode) {
6353+ set_dbstart(dentry, bindex);
6354+ break;
6355+ }
6356+ dput(lower_dentry);
6357+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
6358+ }
6359+}
6360diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
6361new file mode 100644
6362index 0000000..4faae44
6363--- /dev/null
6364+++ b/fs/unionfs/main.c
6365@@ -0,0 +1,773 @@
6366+/*
6367+ * Copyright (c) 2003-2007 Erez Zadok
6368+ * Copyright (c) 2003-2006 Charles P. Wright
6369+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
6370+ * Copyright (c) 2005-2006 Junjiro Okajima
6371+ * Copyright (c) 2005 Arun M. Krishnakumar
6372+ * Copyright (c) 2004-2006 David P. Quigley
6373+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
6374+ * Copyright (c) 2003 Puja Gupta
6375+ * Copyright (c) 2003 Harikesavan Krishnan
6376+ * Copyright (c) 2003-2007 Stony Brook University
6377+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
6378+ *
6379+ * This program is free software; you can redistribute it and/or modify
6380+ * it under the terms of the GNU General Public License version 2 as
6381+ * published by the Free Software Foundation.
6382+ */
6383+
6384+#include "union.h"
6385+#include <linux/module.h>
6386+#include <linux/moduleparam.h>
6387+
6388+static void unionfs_fill_inode(struct dentry *dentry,
6389+ struct inode *inode)
6390+{
6391+ struct inode *lower_inode;
6392+ struct dentry *lower_dentry;
6393+ int bindex, bstart, bend;
6394+
6395+ bstart = dbstart(dentry);
6396+ bend = dbend(dentry);
6397+
6398+ for (bindex = bstart; bindex <= bend; bindex++) {
6399+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6400+ if (!lower_dentry) {
6401+ unionfs_set_lower_inode_idx(inode, bindex, NULL);
6402+ continue;
6403+ }
6404+
6405+ /* Initialize the lower inode to the new lower inode. */
6406+ if (!lower_dentry->d_inode)
6407+ continue;
6408+
6409+ unionfs_set_lower_inode_idx(inode, bindex,
6410+ igrab(lower_dentry->d_inode));
6411+ }
6412+
6413+ ibstart(inode) = dbstart(dentry);
6414+ ibend(inode) = dbend(dentry);
6415+
6416+ /* Use attributes from the first branch. */
6417+ lower_inode = unionfs_lower_inode(inode);
6418+
6419+ /* Use different set of inode ops for symlinks & directories */
6420+ if (S_ISLNK(lower_inode->i_mode))
6421+ inode->i_op = &unionfs_symlink_iops;
6422+ else if (S_ISDIR(lower_inode->i_mode))
6423+ inode->i_op = &unionfs_dir_iops;
6424+
6425+ /* Use different set of file ops for directories */
6426+ if (S_ISDIR(lower_inode->i_mode))
6427+ inode->i_fop = &unionfs_dir_fops;
6428+
6429+ /* properly initialize special inodes */
6430+ if (S_ISBLK(lower_inode->i_mode) || S_ISCHR(lower_inode->i_mode) ||
6431+ S_ISFIFO(lower_inode->i_mode) || S_ISSOCK(lower_inode->i_mode))
6432+ init_special_inode(inode, lower_inode->i_mode,
6433+ lower_inode->i_rdev);
6434+
6435+ /* all well, copy inode attributes */
6436+ unionfs_copy_attr_all(inode, lower_inode);
6437+ fsstack_copy_inode_size(inode, lower_inode);
6438+}
6439+
6440+/*
6441+ * Connect a unionfs dentry/inode with several lower ones. This is
6442+ * the classic stackable file system "vnode interposition" action.
6443+ *
6444+ * @sb: unionfs's super_block
6445+ */
6446+struct dentry *unionfs_interpose(struct dentry *dentry, struct super_block *sb,
6447+ int flag)
6448+{
6449+ int err = 0;
6450+ struct inode *inode;
6451+ int is_negative_dentry = 1;
6452+ int bindex, bstart, bend;
6453+ int need_fill_inode = 1;
6454+ struct dentry *spliced = NULL;
6455+
6456+ verify_locked(dentry);
6457+
6458+ bstart = dbstart(dentry);
6459+ bend = dbend(dentry);
6460+
6461+ /* Make sure that we didn't get a negative dentry. */
6462+ for (bindex = bstart; bindex <= bend; bindex++) {
6463+ if (unionfs_lower_dentry_idx(dentry, bindex) &&
6464+ unionfs_lower_dentry_idx(dentry, bindex)->d_inode) {
6465+ is_negative_dentry = 0;
6466+ break;
6467+ }
6468+ }
6469+ BUG_ON(is_negative_dentry);
6470+
6471+ /*
6472+ * We allocate our new inode below, by calling iget.
6473+ * iget will call our read_inode which will initialize some
6474+ * of the new inode's fields
6475+ */
6476+
6477+ /*
6478+ * On revalidate we've already got our own inode and just need
6479+ * to fix it up.
6480+ */
6481+ if (flag == INTERPOSE_REVAL) {
6482+ inode = dentry->d_inode;
6483+ UNIONFS_I(inode)->bstart = -1;
6484+ UNIONFS_I(inode)->bend = -1;
6485+ atomic_set(&UNIONFS_I(inode)->generation,
6486+ atomic_read(&UNIONFS_SB(sb)->generation));
6487+
6488+ UNIONFS_I(inode)->lower_inodes =
6489+ kcalloc(sbmax(sb), sizeof(struct inode *), GFP_KERNEL);
6490+ if (!UNIONFS_I(inode)->lower_inodes) {
6491+ err = -ENOMEM;
6492+ goto out;
6493+ }
6494+ } else {
6495+ /* get unique inode number for unionfs */
6496+ inode = iget(sb, iunique(sb, UNIONFS_ROOT_INO));
6497+ if (!inode) {
6498+ err = -EACCES;
6499+ goto out;
6500+ }
6501+ if (atomic_read(&inode->i_count) > 1)
6502+ goto skip;
6503+ }
6504+
6505+ need_fill_inode = 0;
6506+ unionfs_fill_inode(dentry, inode);
6507+
6508+skip:
6509+ /* only (our) lookup wants to do a d_add */
6510+ switch (flag) {
6511+ case INTERPOSE_DEFAULT:
6512+ case INTERPOSE_REVAL_NEG:
6513+ d_instantiate(dentry, inode);
6514+ break;
6515+ case INTERPOSE_LOOKUP:
6516+ spliced = d_splice_alias(inode, dentry);
6517+ if (IS_ERR(spliced))
6518+ err = PTR_ERR(spliced);
6519+ else if (spliced && spliced != dentry) {
6520+ /*
6521+ * d_splice can return a dentry if it was
6522+ * disconnected and had to be moved. We must ensure
6523+ * that the private data of the new dentry is
6524+ * correct and that the inode info was filled
6525+ * properly. Finally we must return this new
6526+ * dentry.
6527+ */
6528+ spliced->d_op = &unionfs_dops;
6529+ spliced->d_fsdata = dentry->d_fsdata;
6530+ dentry->d_fsdata = NULL;
6531+ dentry = spliced;
6532+ if (need_fill_inode) {
6533+ need_fill_inode = 0;
6534+ unionfs_fill_inode(dentry, inode);
6535+ }
6536+ goto out_spliced;
6537+ }
6538+ break;
6539+ case INTERPOSE_REVAL:
6540+ /* Do nothing. */
6541+ break;
6542+ default:
6543+ printk(KERN_ERR "unionfs: invalid interpose flag passed!\n");
6544+ BUG();
6545+ }
6546+ goto out;
6547+
6548+out_spliced:
6549+ if (!err)
6550+ return spliced;
6551+out:
6552+ return ERR_PTR(err);
6553+}
6554+
6555+/* like interpose above, but for an already existing dentry */
6556+void unionfs_reinterpose(struct dentry *dentry)
6557+{
6558+ struct dentry *lower_dentry;
6559+ struct inode *inode;
6560+ int bindex, bstart, bend;
6561+
6562+ verify_locked(dentry);
6563+
6564+ /* This is a pre-allocated inode */
6565+ inode = dentry->d_inode;
6566+
6567+ bstart = dbstart(dentry);
6568+ bend = dbend(dentry);
6569+ for (bindex = bstart; bindex <= bend; bindex++) {
6570+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6571+ if (!lower_dentry)
6572+ continue;
6573+
6574+ if (!lower_dentry->d_inode)
6575+ continue;
6576+ if (unionfs_lower_inode_idx(inode, bindex))
6577+ continue;
6578+ unionfs_set_lower_inode_idx(inode, bindex,
6579+ igrab(lower_dentry->d_inode));
6580+ }
6581+ ibstart(inode) = dbstart(dentry);
6582+ ibend(inode) = dbend(dentry);
6583+}
6584+
6585+/*
6586+ * make sure the branch we just looked up (nd) makes sense:
6587+ *
6588+ * 1) we're not trying to stack unionfs on top of unionfs
6589+ * 2) it exists
6590+ * 3) is a directory
6591+ */
6592+int check_branch(struct nameidata *nd)
6593+{
6594+ /* XXX: remove in ODF code -- stacking unions allowed there */
6595+ if (!strcmp(nd->dentry->d_sb->s_type->name, "unionfs"))
6596+ return -EINVAL;
6597+ if (!nd->dentry->d_inode)
6598+ return -ENOENT;
6599+ if (!S_ISDIR(nd->dentry->d_inode->i_mode))
6600+ return -ENOTDIR;
6601+ return 0;
6602+}
6603+
6604+/* checks if two lower_dentries have overlapping branches */
6605+static int is_branch_overlap(struct dentry *dent1, struct dentry *dent2)
6606+{
6607+ struct dentry *dent = NULL;
6608+
6609+ dent = dent1;
6610+ while ((dent != dent2) && (dent->d_parent != dent))
6611+ dent = dent->d_parent;
6612+
6613+ if (dent == dent2)
6614+ return 1;
6615+
6616+ dent = dent2;
6617+ while ((dent != dent1) && (dent->d_parent != dent))
6618+ dent = dent->d_parent;
6619+
6620+ return (dent == dent1);
6621+}
6622+
6623+/*
6624+ * Parse branch mode helper function
6625+ */
6626+int __parse_branch_mode(const char *name)
6627+{
6628+ if (!name)
6629+ return 0;
6630+ if (!strcmp(name, "ro"))
6631+ return MAY_READ;
6632+ if (!strcmp(name, "rw"))
6633+ return (MAY_READ | MAY_WRITE);
6634+ return 0;
6635+}
6636+
6637+/*
6638+ * Parse "ro" or "rw" options, but default to "rw" if no mode option
6639+ * was specified.
6640+ */
6641+int parse_branch_mode(const char *name)
6642+{
6643+ int perms = __parse_branch_mode(name);
6644+
6645+ if (perms == 0)
6646+ perms = MAY_READ | MAY_WRITE;
6647+ return perms;
6648+}
6649+
6650+/*
6651+ * parse the dirs= mount argument
6652+ *
6653+ * We don't need to lock the superblock private data's rwsem, as we get
6654+ * called only by unionfs_read_super - it is still a long time before anyone
6655+ * can even get a reference to us.
6656+ */
6657+static int parse_dirs_option(struct super_block *sb, struct unionfs_dentry_info
6658+ *lower_root_info, char *options)
6659+{
6660+ struct nameidata nd;
6661+ char *name;
6662+ int err = 0;
6663+ int branches = 1;
6664+ int bindex = 0;
6665+ int i = 0;
6666+ int j = 0;
6667+ struct dentry *dent1;
6668+ struct dentry *dent2;
6669+
6670+ if (options[0] == '\0') {
6671+ printk(KERN_WARNING "unionfs: no branches specified\n");
6672+ err = -EINVAL;
6673+ goto out;
6674+ }
6675+
6676+ /*
6677+ * Each colon means we have a separator; this is really just a rough
6678+ * guess, since strsep will handle empty fields for us.
6679+ */
6680+ for (i = 0; options[i]; i++)
6681+ if (options[i] == ':')
6682+ branches++;
6683+
6684+ /* allocate space for underlying pointers to lower dentry */
6685+ UNIONFS_SB(sb)->data =
6686+ kcalloc(branches, sizeof(struct unionfs_data), GFP_KERNEL);
6687+ if (!UNIONFS_SB(sb)->data) {
6688+ err = -ENOMEM;
6689+ goto out;
6690+ }
6691+
6692+ lower_root_info->lower_paths =
6693+ kcalloc(branches, sizeof(struct path), GFP_KERNEL);
6694+ if (!lower_root_info->lower_paths) {
6695+ err = -ENOMEM;
6696+ goto out;
6697+ }
6698+
6699+ /* now parsing a string such as "b1:b2=rw:b3=ro:b4" */
6700+ branches = 0;
6701+ while ((name = strsep(&options, ":")) != NULL) {
6702+ int perms;
6703+ char *mode = strchr(name, '=');
6704+
6705+ if (!name)
6706+ continue;
6707+ if (!*name) { /* bad use of ':' (extra colons) */
6708+ err = -EINVAL;
6709+ goto out;
6710+ }
6711+
6712+ branches++;
6713+
6714+ /* strip off '=' if any */
6715+ if (mode)
6716+ *mode++ = '\0';
6717+
6718+ perms = parse_branch_mode(mode);
6719+ if (!bindex && !(perms & MAY_WRITE)) {
6720+ err = -EINVAL;
6721+ goto out;
6722+ }
6723+
6724+ err = path_lookup(name, LOOKUP_FOLLOW, &nd);
6725+ if (err) {
6726+ printk(KERN_WARNING "unionfs: error accessing "
6727+ "lower directory '%s' (error %d)\n",
6728+ name, err);
6729+ goto out;
6730+ }
6731+
6732+ if ((err = check_branch(&nd))) {
6733+ printk(KERN_WARNING "unionfs: lower directory "
6734+ "'%s' is not a valid branch\n", name);
6735+ path_release(&nd);
6736+ goto out;
6737+ }
6738+
6739+ lower_root_info->lower_paths[bindex].dentry = nd.dentry;
6740+ lower_root_info->lower_paths[bindex].mnt = nd.mnt;
6741+
6742+ set_branchperms(sb, bindex, perms);
6743+ set_branch_count(sb, bindex, 0);
6744+ new_branch_id(sb, bindex);
6745+
6746+ if (lower_root_info->bstart < 0)
6747+ lower_root_info->bstart = bindex;
6748+ lower_root_info->bend = bindex;
6749+ bindex++;
6750+ }
6751+
6752+ if (branches == 0) {
6753+ printk(KERN_WARNING "unionfs: no branches specified\n");
6754+ err = -EINVAL;
6755+ goto out;
6756+ }
6757+
6758+ BUG_ON(branches != (lower_root_info->bend + 1));
6759+
6760+ /*
6761+ * Ensure that no overlaps exist in the branches.
6762+ *
6763+ * This test is required because the Linux kernel has no support
6764+ * currently for ensuring coherency between stackable layers and
6765+ * branches. If we were to allow overlapping branches, it would be
6766+ * possible, for example, to delete a file via one branch, which
6767+ * would not be reflected in another branch. Such incoherency could
6768+ * lead to inconsistencies and even kernel oopses. Rather than
6769+ * implement hacks to work around some of these cache-coherency
6770+ * problems, we prevent branch overlapping, for now. A complete
6771+ * solution will involve proper kernel/VFS support for cache
6772+ * coherency, at which time we could safely remove this
6773+ * branch-overlapping test.
6774+ */
6775+ for (i = 0; i < branches; i++) {
6776+ dent1 = lower_root_info->lower_paths[i].dentry;
6777+ for (j = i + 1; j < branches; j++) {
6778+ dent2 = lower_root_info->lower_paths[j].dentry;
6779+ if (is_branch_overlap(dent1, dent2)) {
6780+ printk(KERN_WARNING "unionfs: branches %d and "
6781+ "%d overlap\n", i, j);
6782+ err = -EINVAL;
6783+ goto out;
6784+ }
6785+ }
6786+ }
6787+
6788+out:
6789+ if (err) {
6790+ for (i = 0; i < branches; i++)
6791+ if (lower_root_info->lower_paths[i].dentry) {
6792+ dput(lower_root_info->lower_paths[i].dentry);
6793+ /* initialize: can't use unionfs_mntput here */
6794+ mntput(lower_root_info->lower_paths[i].mnt);
6795+ }
6796+
6797+ kfree(lower_root_info->lower_paths);
6798+ kfree(UNIONFS_SB(sb)->data);
6799+
6800+ /*
6801+ * MUST clear the pointers to prevent potential double free if
6802+ * the caller dies later on
6803+ */
6804+ lower_root_info->lower_paths = NULL;
6805+ UNIONFS_SB(sb)->data = NULL;
6806+ }
6807+ return err;
6808+}
6809+
6810+/*
6811+ * Parse mount options. See the manual page for usage instructions.
6812+ *
6813+ * Returns the dentry info describing the lower-level (lower) directories,
6814+ * or an ERR_PTR on failure; we mount our stackable file system on top of them.
6815+ */
6816+static struct unionfs_dentry_info *unionfs_parse_options(
6817+ struct super_block *sb,
6818+ char *options)
6819+{
6820+ struct unionfs_dentry_info *lower_root_info;
6821+ char *optname;
6822+ int err = 0;
6823+ int bindex;
6824+ int dirsfound = 0;
6825+
6826+ /* allocate private data area */
6827+ err = -ENOMEM;
6828+ lower_root_info =
6829+ kzalloc(sizeof(struct unionfs_dentry_info), GFP_KERNEL);
6830+ if (!lower_root_info)
6831+ goto out_error;
6832+ lower_root_info->bstart = -1;
6833+ lower_root_info->bend = -1;
6834+ lower_root_info->bopaque = -1;
6835+
6836+ while ((optname = strsep(&options, ",")) != NULL) {
6837+ char *optarg;
6838+ char *endptr;
6839+ int intval;
6840+
6841+ if (!optname || !*optname)
6842+ continue;
6843+
6844+ optarg = strchr(optname, '=');
6845+ if (optarg)
6846+ *optarg++ = '\0';
6847+
6848+ /*
6849+ * All of our options take an argument now. Insert ones that
6850+ * don't, above this check.
6851+ */
6852+ if (!optarg) {
6853+ printk("unionfs: %s requires an argument.\n", optname);
6854+ err = -EINVAL;
6855+ goto out_error;
6856+ }
6857+
6858+ if (!strcmp("dirs", optname)) {
6859+ if (++dirsfound > 1) {
6860+ printk(KERN_WARNING
6861+ "unionfs: multiple dirs specified\n");
6862+ err = -EINVAL;
6863+ goto out_error;
6864+ }
6865+ err = parse_dirs_option(sb, lower_root_info, optarg);
6866+ if (err)
6867+ goto out_error;
6868+ continue;
6869+ }
6870+
6871+ /* All of these options require an integer argument. */
6872+ intval = simple_strtoul(optarg, &endptr, 0);
6873+ if (*endptr) {
6874+ printk(KERN_WARNING
6875+ "unionfs: invalid %s option '%s'\n",
6876+ optname, optarg);
6877+ err = -EINVAL;
6878+ goto out_error;
6879+ }
6880+
6881+ err = -EINVAL;
6882+ printk(KERN_WARNING
6883+ "unionfs: unrecognized option '%s'\n", optname);
6884+ goto out_error;
6885+ }
6886+ if (dirsfound != 1) {
6887+ printk(KERN_WARNING "unionfs: dirs option required\n");
6888+ err = -EINVAL;
6889+ goto out_error;
6890+ }
6891+ goto out;
6892+
6893+out_error:
6894+ if (lower_root_info && lower_root_info->lower_paths) {
6895+ for (bindex = lower_root_info->bstart;
6896+ bindex >= 0 && bindex <= lower_root_info->bend;
6897+ bindex++) {
6898+ struct dentry *d;
6899+ struct vfsmount *m;
6900+
6901+ d = lower_root_info->lower_paths[bindex].dentry;
6902+ m = lower_root_info->lower_paths[bindex].mnt;
6903+
6904+ dput(d);
6905+ /* initializing: can't use unionfs_mntput here */
6906+ mntput(m);
6907+ }
6908+ }
6909+ if (lower_root_info)
6910+ kfree(lower_root_info->lower_paths);
6911+ kfree(lower_root_info);
6912+
6913+ kfree(UNIONFS_SB(sb)->data);
6914+ UNIONFS_SB(sb)->data = NULL;
6915+
6916+ lower_root_info = ERR_PTR(err);
6917+out:
6918+ return lower_root_info;
6919+}
6920+
6921+/*
6922+ * our custom d_alloc_root work-alike
6923+ *
6924+ * we can't use d_alloc_root if we want to use our own interpose function
6925+ * unchanged, so we simply call our own "fake" d_alloc_root
6926+ */
6927+static struct dentry *unionfs_d_alloc_root(struct super_block *sb)
6928+{
6929+ struct dentry *ret = NULL;
6930+
6931+ if (sb) {
6932+ static const struct qstr name = {.name = "/",.len = 1 };
6933+
6934+ ret = d_alloc(NULL, &name);
6935+ if (ret) {
6936+ ret->d_op = &unionfs_dops;
6937+ ret->d_sb = sb;
6938+ ret->d_parent = ret;
6939+ }
6940+ }
6941+ return ret;
6942+}
6943+
6944+/*
6945+ * There is no need to lock the unionfs_super_info's rwsem as there is no
6946+ * way anyone can have a reference to the superblock at this point in time.
6947+ */
6948+static int unionfs_read_super(struct super_block *sb, void *raw_data,
6949+ int silent)
6950+{
6951+ int err = 0;
6952+ struct unionfs_dentry_info *lower_root_info = NULL;
6953+ int bindex, bstart, bend;
6954+
6955+ if (!raw_data) {
6956+ printk(KERN_WARNING
6957+ "unionfs: read_super: missing data argument\n");
6958+ err = -EINVAL;
6959+ goto out;
6960+ }
6961+
6962+ /* Allocate superblock private data */
6963+ sb->s_fs_info = kzalloc(sizeof(struct unionfs_sb_info), GFP_KERNEL);
6964+ if (!UNIONFS_SB(sb)) {
6965+ printk(KERN_WARNING "unionfs: read_super: out of memory\n");
6966+ err = -ENOMEM;
6967+ goto out;
6968+ }
6969+
6970+ UNIONFS_SB(sb)->bend = -1;
6971+ atomic_set(&UNIONFS_SB(sb)->generation, 1);
6972+ init_rwsem(&UNIONFS_SB(sb)->rwsem);
6973+ UNIONFS_SB(sb)->high_branch_id = -1; /* -1 == invalid branch ID */
6974+
6975+ lower_root_info = unionfs_parse_options(sb, raw_data);
6976+ if (IS_ERR(lower_root_info)) {
6977+ printk(KERN_WARNING
6978+ "unionfs: read_super: error while parsing options "
6979+ "(err = %ld)\n", PTR_ERR(lower_root_info));
6980+ err = PTR_ERR(lower_root_info);
6981+ lower_root_info = NULL;
6982+ goto out_free;
6983+ }
6984+ if (lower_root_info->bstart == -1) {
6985+ err = -ENOENT;
6986+ goto out_free;
6987+ }
6988+
6989+ /* set the lower superblock field of upper superblock */
6990+ bstart = lower_root_info->bstart;
6991+ BUG_ON(bstart != 0);
6992+ sbend(sb) = bend = lower_root_info->bend;
6993+ for (bindex = bstart; bindex <= bend; bindex++) {
6994+ struct dentry *d = lower_root_info->lower_paths[bindex].dentry;
6995+ unionfs_set_lower_super_idx(sb, bindex, d->d_sb);
6996+ }
6997+
6998+ /* s_maxbytes is inherited from the highest priority branch */
6999+ sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
7000+
7001+ sb->s_op = &unionfs_sops;
7002+
7003+ /* See comment next to the definition of unionfs_d_alloc_root */
7004+ sb->s_root = unionfs_d_alloc_root(sb);
7005+ if (!sb->s_root) {
7006+ err = -ENOMEM;
7007+ goto out_dput;
7008+ }
7009+
7010+ /* link the upper and lower dentries */
7011+ sb->s_root->d_fsdata = NULL;
7012+ if ((err = new_dentry_private_data(sb->s_root)))
7013+ goto out_freedpd;
7014+
7015+ /* Set the lower dentries for s_root */
7016+ for (bindex = bstart; bindex <= bend; bindex++) {
7017+ struct dentry *d;
7018+ struct vfsmount *m;
7019+
7020+ d = lower_root_info->lower_paths[bindex].dentry;
7021+ m = lower_root_info->lower_paths[bindex].mnt;
7022+
7023+ unionfs_set_lower_dentry_idx(sb->s_root, bindex, d);
7024+ unionfs_set_lower_mnt_idx(sb->s_root, bindex, m);
7025+ }
7026+ set_dbstart(sb->s_root, bstart);
7027+ set_dbend(sb->s_root, bend);
7028+
7029+ /* Set the generation number to one, since this is for the mount. */
7030+ atomic_set(&UNIONFS_D(sb->s_root)->generation, 1);
7031+
7032+ /*
7033+ * Call interpose to create the upper level inode. Only
7034+ * INTERPOSE_LOOKUP can return a value other than 0 on err.
7035+ */
7036+ err = PTR_ERR(unionfs_interpose(sb->s_root, sb, 0));
7037+ unionfs_unlock_dentry(sb->s_root);
7038+ if (!err)
7039+ goto out;
7040+ /* else fall through */
7041+
7042+out_freedpd:
7043+ if (UNIONFS_D(sb->s_root)) {
7044+ kfree(UNIONFS_D(sb->s_root)->lower_paths);
7045+ free_dentry_private_data(sb->s_root);
7046+ }
7047+ dput(sb->s_root);
7048+
7049+out_dput:
7050+ if (lower_root_info && !IS_ERR(lower_root_info)) {
7051+ for (bindex = lower_root_info->bstart;
7052+ bindex <= lower_root_info->bend; bindex++) {
7053+ struct dentry *d;
7054+ struct vfsmount *m;
7055+
7056+ d = lower_root_info->lower_paths[bindex].dentry;
7057+ m = lower_root_info->lower_paths[bindex].mnt;
7058+
7059+ dput(d);
7060+ /* initializing: can't use unionfs_mntput here */
7061+ mntput(m);
7062+ }
7063+ kfree(lower_root_info->lower_paths);
7064+ kfree(lower_root_info);
7065+ lower_root_info = NULL;
7066+ }
7067+
7068+out_free:
7069+ kfree(UNIONFS_SB(sb)->data);
7070+ kfree(UNIONFS_SB(sb));
7071+ sb->s_fs_info = NULL;
7072+
7073+out:
7074+ if (lower_root_info && !IS_ERR(lower_root_info)) {
7075+ kfree(lower_root_info->lower_paths);
7076+ kfree(lower_root_info);
7077+ }
7078+ return err;
7079+}
7080+
7081+static int unionfs_get_sb(struct file_system_type *fs_type,
7082+ int flags, const char *dev_name,
7083+ void *raw_data, struct vfsmount *mnt)
7084+{
7085+ return get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
7086+}
7087+
7088+static struct file_system_type unionfs_fs_type = {
7089+ .owner = THIS_MODULE,
7090+ .name = "unionfs",
7091+ .get_sb = unionfs_get_sb,
7092+ .kill_sb = generic_shutdown_super,
7093+ .fs_flags = FS_REVAL_DOT,
7094+};
7095+
7096+static int __init init_unionfs_fs(void)
7097+{
7098+ int err;
7099+
7100+ printk("Registering unionfs " UNIONFS_VERSION "\n");
7101+
7102+ if ((err = unionfs_init_filldir_cache()))
7103+ goto out;
7104+ if ((err = unionfs_init_inode_cache()))
7105+ goto out;
7106+ if ((err = unionfs_init_dentry_cache()))
7107+ goto out;
7108+ if ((err = init_sioq()))
7109+ goto out;
7110+ err = register_filesystem(&unionfs_fs_type);
7111+out:
7112+ if (err) {
7113+ stop_sioq();
7114+ unionfs_destroy_filldir_cache();
7115+ unionfs_destroy_inode_cache();
7116+ unionfs_destroy_dentry_cache();
7117+ }
7118+ return err;
7119+}
7120+
7121+static void __exit exit_unionfs_fs(void)
7122+{
7123+ stop_sioq();
7124+ unionfs_destroy_filldir_cache();
7125+ unionfs_destroy_inode_cache();
7126+ unionfs_destroy_dentry_cache();
7127+ unregister_filesystem(&unionfs_fs_type);
7128+ printk(KERN_INFO "Completed unionfs module unload.\n");
7129+}
7130+
7131+MODULE_AUTHOR("Erez Zadok, Filesystems and Storage Lab, Stony Brook University"
7132+ " (http://www.fsl.cs.sunysb.edu)");
7133+MODULE_DESCRIPTION("Unionfs " UNIONFS_VERSION
7134+ " (http://unionfs.filesystems.org)");
7135+MODULE_LICENSE("GPL");
7136+
7137+module_init(init_unionfs_fs);
7138+module_exit(exit_unionfs_fs);
7139diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
7140new file mode 100644
7141index 0000000..88ef6a6
7142--- /dev/null
7143+++ b/fs/unionfs/mmap.c
7144@@ -0,0 +1,378 @@
7145+/*
7146+ * Copyright (c) 2003-2007 Erez Zadok
7147+ * Copyright (c) 2003-2006 Charles P. Wright
7148+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7149+ * Copyright (c) 2005-2006 Junjiro Okajima
7150+ * Copyright (c) 2006 Shaya Potter
7151+ * Copyright (c) 2005 Arun M. Krishnakumar
7152+ * Copyright (c) 2004-2006 David P. Quigley
7153+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7154+ * Copyright (c) 2003 Puja Gupta
7155+ * Copyright (c) 2003 Harikesavan Krishnan
7156+ * Copyright (c) 2003-2007 Stony Brook University
7157+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
7158+ *
7159+ * This program is free software; you can redistribute it and/or modify
7160+ * it under the terms of the GNU General Public License version 2 as
7161+ * published by the Free Software Foundation.
7162+ */
7163+
7164+#include "union.h"
7165+
7166+/*
7167+ * Unionfs doesn't implement ->writepages, which is OK with the VFS and
7168+ * keeps our code simpler and smaller. Nevertheless, somehow, our own
7169+ * ->writepage must be called so we can sync the upper pages with the lower
7170+ * pages: otherwise data changed at the upper layer won't get written to the
7171+ * lower layer.
7172+ *
7173+ * Some lower file systems (e.g., NFS) expect the VFS to call its writepages
7174+ * only, which in turn will call generic_writepages and invoke each of the
7175+ * lower file system's ->writepage. NFS in particular uses the
7176+ * wbc->fs_private field in its nfs_writepage, which is set in its
7177+ * nfs_writepages. So if we don't call the lower nfs_writepages first, then
7178+ * NFS's nfs_writepage will dereference a NULL wbc->fs_private and cause an
7179+ * OOPS. If, however, we implement a unionfs_writepages and then we do call
7180+ * the lower nfs_writepages, then we "lose control" over the pages we're
7181+ * trying to write to the lower file system: we won't be writing our own
7182+ * new/modified data from the upper pages to the lower pages, and any
7183+ * mmap-based changes are lost.
7184+ *
7185+ * This is a fundamental cache-coherency problem in Linux. The kernel isn't
7186+ * able to support such stacking abstractions cleanly. One possible clean
7187+ * way would be for a lower file system's ->writepage method to have some
7188+ * sort of callback to check whether any upper pages for the same
7189+ * file+offset exist and have newer content in them.
7190+ *
7191+ * This whole NULL ptr dereference is triggered at the lower file system
7192+ * (NFS) because the wbc->for_writepages is set to 1. Therefore, to avoid
7193+ * this NULL pointer dereference, we set this flag to 0 and restore it upon
7194+ * exit. This probably means that we're slightly less efficient in writing
7195+ * pages out, doing them one at a time, but at least we avoid the oops until
7196+ * such day as Linux can better support address_space_ops in a stackable
7197+ * fashion.
7198+ */
7199+static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
7200+{
7201+ int err = -EIO;
7202+ struct inode *inode;
7203+ struct inode *lower_inode;
7204+ struct page *lower_page;
7205+ char *kaddr, *lower_kaddr;
7206+ int saved_for_writepages = wbc->for_writepages;
7207+
7208+ inode = page->mapping->host;
7209+ lower_inode = unionfs_lower_inode(inode);
7210+
7211+ /*
7212+ * find lower page (returns a locked page)
7213+ *
7214+ * NOTE: we used to call grab_cache_page(), but that was unnecessary
7215+ * as it would have tried to create a new lower page if it didn't
7216+ * exist, leading to deadlocks (esp. under memory-pressure
7217+ * conditions, when it is really a bad idea to *consume* more
7218+ * memory). Instead, we assume the lower page exists, and if we can
7219+ * find it, then we ->writepage on it; if we can't find it, then it
7220+ * couldn't have disappeared unless the kernel already flushed it,
7221+ * in which case we're still OK. This is especially correct if
7222+ * wbc->sync_mode is WB_SYNC_NONE (as per
7223+ * Documentation/filesystems/vfs.txt). If we can't flush our page
7224+ * because we can't find a lower page, then at least we re-mark our
7225+ * page as dirty, and return AOP_WRITEPAGE_ACTIVATE as the VFS
7226+ * expects us to. (Note, if in the future it'd turn out that we
7227+ * have to find a lower page no matter what, then we'd have to
7228+ * resort to RAIF's page pointer flipping trick.)
7229+ */
7230+ lower_page = find_lock_page(lower_inode->i_mapping, page->index);
7231+ if (!lower_page) {
7232+ err = AOP_WRITEPAGE_ACTIVATE;
7233+ set_page_dirty(page);
7234+ goto out;
7235+ }
7236+
7237+ /* map both pages so we can copy the upper data to the lower page */
7238+ kaddr = kmap(page);
7239+ lower_kaddr = kmap(lower_page);
7240+
7241+ memcpy(lower_kaddr, kaddr, PAGE_CACHE_SIZE);
7242+
7243+ kunmap(page);
7244+ kunmap(lower_page);
7245+
7246+ BUG_ON(!lower_inode->i_mapping->a_ops->writepage);
7247+
7248+ /* workaround for some lower file systems: see big comment on top */
7249+ if (wbc->for_writepages && !wbc->fs_private)
7250+ wbc->for_writepages = 0;
7251+
7252+ /* call lower writepage (expects locked page) */
7253+ clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
7254+ err = lower_inode->i_mapping->a_ops->writepage(lower_page, wbc);
7255+ wbc->for_writepages = saved_for_writepages; /* restore value */
7256+
7257+ /* b/c find_lock_page locked it and ->writepage unlocks on success */
7258+ if (err)
7259+ unlock_page(lower_page);
7260+ /* b/c find_lock_page increased refcnt */
7261+ page_cache_release(lower_page);
7262+
7263+ if (err < 0) {
7264+ ClearPageUptodate(page);
7265+ goto out;
7266+ }
7267+ if (err == AOP_WRITEPAGE_ACTIVATE) {
7268+ /*
7269+ * Lower file systems such as ramfs and tmpfs, may return
7270+ * AOP_WRITEPAGE_ACTIVATE so that the VM won't try to
7271+ * (pointlessly) write the page again for a while. But
7272+ * those lower file systems also set the page dirty bit back
7273+ * again. So we mimic that behaviour here.
7274+ */
7275+ if (PageDirty(lower_page))
7276+ set_page_dirty(page);
7277+ goto out;
7278+ }
7279+
7280+ /* all is well */
7281+ SetPageUptodate(page);
7282+ /* lower mtimes have changed: update ours */
7283+ unionfs_copy_attr_times(inode);
7284+
7285+ unlock_page(page);
7286+
7287+out:
7288+ return err;
7289+}
7290+
7291+/*
7292+ * readpage is called from generic_page_read and the fault handler.
7293+ * If your file system uses generic_page_read for the read op, it
7294+ * must implement readpage.
7295+ *
7296+ * Readpage expects a locked page, and must unlock it.
7297+ */
7298+static int unionfs_do_readpage(struct file *file, struct page *page)
7299+{
7300+ int err = -EIO;
7301+ struct file *lower_file;
7302+ struct inode *inode;
7303+ mm_segment_t old_fs;
7304+ char *page_data = NULL;
7305+ loff_t offset;
7306+
7307+ if (!UNIONFS_F(file)) {
7308+ err = -ENOENT;
7309+ goto out;
7310+ }
7311+
7312+ lower_file = unionfs_lower_file(file);
7313+ /* FIXME: is this assertion right here? */
7314+ BUG_ON(lower_file == NULL);
7315+
7316+ inode = file->f_path.dentry->d_inode;
7317+
7318+ page_data = (char *) kmap(page);
7319+ /*
7320+ * Use vfs_read because some lower file systems don't have a
7321+ * readpage method, and some file systems (esp. distributed ones)
7322+ * don't like their pages to be accessed directly. Using vfs_read
7323+ * may be a little slower, but a lot safer, as the VFS does a lot of
7324+ * the necessary magic for us.
7325+ */
7326+ offset = lower_file->f_pos = ((loff_t) page->index << PAGE_CACHE_SHIFT);
7327+ old_fs = get_fs();
7328+ set_fs(KERNEL_DS);
7329+ err = vfs_read(lower_file, page_data, PAGE_CACHE_SIZE,
7330+ &lower_file->f_pos);
7331+ set_fs(old_fs);
7332+
7333+ kunmap(page);
7334+
7335+ if (err < 0)
7336+ goto out;
7337+ err = 0;
7338+
7339+ /* if vfs_read succeeded above, sync up our times */
7340+ unionfs_copy_attr_times(inode);
7341+
7342+ flush_dcache_page(page);
7343+
7344+out:
7345+ if (err == 0)
7346+ SetPageUptodate(page);
7347+ else
7348+ ClearPageUptodate(page);
7349+
7350+ return err;
7351+}
7352+
7353+static int unionfs_readpage(struct file *file, struct page *page)
7354+{
7355+ int err;
7356+
7357+ unionfs_read_lock(file->f_path.dentry->d_sb);
7358+ if ((err = unionfs_file_revalidate(file, false)))
7359+ goto out;
7360+ unionfs_check_file(file);
7361+
7362+ err = unionfs_do_readpage(file, page);
7363+
7364+ if (!err) {
7365+ touch_atime(unionfs_lower_mnt(file->f_path.dentry),
7366+ unionfs_lower_dentry(file->f_path.dentry));
7367+ unionfs_copy_attr_times(file->f_path.dentry->d_inode);
7368+ }
7369+
7370+ /*
7371+ * We have to unlock our page, because we _might_ have been given a
7372+ * locked page. We no longer have to wake up waiters on our page
7373+ * here, because unlock_page() does it.
7374+ */
7375+out:
7376+ unlock_page(page);
7377+ unionfs_check_file(file);
7378+ unionfs_read_unlock(file->f_path.dentry->d_sb);
7379+
7380+ return err;
7381+}
7382+
7383+static int unionfs_prepare_write(struct file *file, struct page *page,
7384+ unsigned from, unsigned to)
7385+{
7386+ int err;
7387+
7388+ unionfs_read_lock(file->f_path.dentry->d_sb);
7389+ /*
7390+ * This is the only place where we unconditionally copy the lower
7391+ * attribute times before calling unionfs_file_revalidate. The
7392+ * reason is that our ->write calls do_sync_write which in turn will
7393+ * call our ->prepare_write and then ->commit_write. Before our
7394+ * ->write is called, the lower mtimes are in sync, but by the time
7395+ * the VFS calls our ->commit_write, the lower mtimes have changed.
7396+ * Therefore, the only reasonable time for us to sync up from the
7397+ * changed lower mtimes, and avoid an invariant violation warning,
7398+ * is here, in ->prepare_write.
7399+ */
7400+ unionfs_copy_attr_times(file->f_path.dentry->d_inode);
7401+ err = unionfs_file_revalidate(file, true);
7402+ unionfs_check_file(file);
7403+ unionfs_read_unlock(file->f_path.dentry->d_sb);
7404+
7405+ return err;
7406+}
7407+
7408+static int unionfs_commit_write(struct file *file, struct page *page,
7409+ unsigned from, unsigned to)
7410+{
7411+ int err = -ENOMEM;
7412+ struct inode *inode, *lower_inode;
7413+ struct file *lower_file = NULL;
7414+ loff_t pos;
7415+ unsigned bytes = to - from;
7416+ char *page_data = NULL;
7417+ mm_segment_t old_fs;
7418+
7419+ BUG_ON(file == NULL);
7420+
7421+ unionfs_read_lock(file->f_path.dentry->d_sb);
7422+ if ((err = unionfs_file_revalidate(file, true)))
7423+ goto out;
7424+ unionfs_check_file(file);
7425+
7426+ inode = page->mapping->host;
7427+ lower_inode = unionfs_lower_inode(inode);
7428+
7429+ if (UNIONFS_F(file) != NULL)
7430+ lower_file = unionfs_lower_file(file);
7431+
7432+ /* FIXME: is this assertion right here? */
7433+ BUG_ON(lower_file == NULL);
7434+
7435+ page_data = (char *)kmap(page);
7436+ lower_file->f_pos = ((loff_t) page->index << PAGE_CACHE_SHIFT) + from;
7437+
7438+ /*
7439+ * SP: I use vfs_write instead of copying page data and the
7440+ * prepare_write/commit_write combo because file systems like
7441+ * GFS/OCFS2 don't like things touching those directly. Calling
7442+ * the underlying write op, while a little bit slower, will
7443+ * call all the FS-specific code as well.
7444+ */
7445+ old_fs = get_fs();
7446+ set_fs(KERNEL_DS);
7447+ err = vfs_write(lower_file, page_data + from, bytes,
7448+ &lower_file->f_pos);
7449+ set_fs(old_fs);
7450+
7451+ kunmap(page);
7452+
7453+ if (err < 0)
7454+ goto out;
7455+
7456+ inode->i_blocks = lower_inode->i_blocks;
7457+ /* we may have to update i_size */
7458+ pos = ((loff_t) page->index << PAGE_CACHE_SHIFT) + to;
7459+ if (pos > i_size_read(inode))
7460+ i_size_write(inode, pos);
7461+ /* if vfs_write succeeded above, sync up our times */
7462+ unionfs_copy_attr_times(inode);
7463+ mark_inode_dirty_sync(inode);
7464+
7465+out:
7466+ if (err < 0)
7467+ ClearPageUptodate(page);
7468+
7469+ unionfs_read_unlock(file->f_path.dentry->d_sb);
7470+ unionfs_check_file(file);
7471+ return err; /* assume all is ok */
7472+}
7473+
7474+static void unionfs_sync_page(struct page *page)
7475+{
7476+ struct inode *inode;
7477+ struct inode *lower_inode;
7478+ struct page *lower_page;
7479+ struct address_space *mapping;
7480+
7481+ inode = page->mapping->host;
7482+ lower_inode = unionfs_lower_inode(inode);
7483+
7484+ /*
7485+ * Find lower page (returns a locked page).
7486+ *
7487+ * NOTE: we used to call grab_cache_page(), but that was unnecessary
7488+ * as it would have tried to create a new lower page if it didn't
7489+ * exist, leading to deadlocks. All our sync_page method needs to
7490+ * do is ensure that pending I/O gets done.
7491+ */
7492+ lower_page = find_lock_page(lower_inode->i_mapping, page->index);
7493+ if (!lower_page) {
7494+ printk(KERN_DEBUG "unionfs: find_lock_page failed\n");
7495+ goto out;
7496+ }
7497+
7498+ /* do the actual sync */
7499+ mapping = lower_page->mapping;
7500+ /*
7501+ * XXX: can we optimize ala RAIF and set the lower page to be
7502+ * discarded after a successful sync_page?
7503+ */
7504+ if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
7505+ mapping->a_ops->sync_page(lower_page);
7506+
7507+ /* b/c find_lock_page locked it */
7508+ unlock_page(lower_page);
7509+ /* b/c find_lock_page increased refcnt */
7510+ page_cache_release(lower_page);
7511+
7512+out:
7513+ return;
7514+}
7515+
7516+struct address_space_operations unionfs_aops = {
7517+ .writepage = unionfs_writepage,
7518+ .readpage = unionfs_readpage,
7519+ .prepare_write = unionfs_prepare_write,
7520+ .commit_write = unionfs_commit_write,
7521+ .sync_page = unionfs_sync_page,
7522+};
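Note on the offset arithmetic in mmap.c above: when unionfs_commit_write()
updates i_size, it widens page->index to loff_t before shifting by
PAGE_CACHE_SHIFT. The user-space sketch below illustrates why the order of
widening and shifting matters, assuming a 32-bit page index (as on a 32-bit
build, where the index is an unsigned long). The names sketch_loff_t,
SKETCH_PAGE_SHIFT, offset_narrow and offset_wide are hypothetical and are not
part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's loff_t and PAGE_CACHE_SHIFT. */
typedef int64_t sketch_loff_t;
#define SKETCH_PAGE_SHIFT 12

/*
 * Shift done in 32 bits, then widened: any bits shifted past bit 31
 * are already lost by the time the value becomes 64 bits wide.
 */
static sketch_loff_t offset_narrow(uint32_t index)
{
	return (sketch_loff_t)(uint32_t)(index << SKETCH_PAGE_SHIFT);
}

/* Widen first, then shift: the full byte offset is preserved. */
static sketch_loff_t offset_wide(uint32_t index)
{
	return (sketch_loff_t)index << SKETCH_PAGE_SHIFT;
}

/* Small self-check: the two agree only while the offset fits in 32 bits. */
static void offset_demo(void)
{
	assert(offset_narrow(3u) == offset_wide(3u));
	assert(offset_narrow(0x100000u) != offset_wide(0x100000u));
}
```

With 4 KiB pages, page index 0x100000 corresponds to a byte offset of exactly
4 GiB, which is the first offset the narrow form cannot represent.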
7523diff --git a/fs/unionfs/rdstate.c b/fs/unionfs/rdstate.c
7524new file mode 100644
7525index 0000000..5c9d14b
7526--- /dev/null
7527+++ b/fs/unionfs/rdstate.c
7528@@ -0,0 +1,282 @@
7529+/*
7530+ * Copyright (c) 2003-2007 Erez Zadok
7531+ * Copyright (c) 2003-2006 Charles P. Wright
7532+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7533+ * Copyright (c) 2005-2006 Junjiro Okajima
7534+ * Copyright (c) 2005 Arun M. Krishnakumar
7535+ * Copyright (c) 2004-2006 David P. Quigley
7536+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7537+ * Copyright (c) 2003 Puja Gupta
7538+ * Copyright (c) 2003 Harikesavan Krishnan
7539+ * Copyright (c) 2003-2007 Stony Brook University
7540+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
7541+ *
7542+ * This program is free software; you can redistribute it and/or modify
7543+ * it under the terms of the GNU General Public License version 2 as
7544+ * published by the Free Software Foundation.
7545+ */
7546+
7547+#include "union.h"
7548+
7549+/* This file contains the routines for maintaining readdir state. */
7550+
7551+/*
7552+ * There are two structures here: rdstate, which is a hash table, and
7553+ * filldir_node, which is the type of the entries stored in that table.
7554+ */
7555+
7556+/*
7557+ * This is a struct kmem_cache for filldir nodes, because we allocate a lot
7558+ * of them and they shouldn't waste memory. If the node has a small name
7559+ * (as defined by the dentry structure), then we use an inline name to
7560+ * preserve kmalloc space.
7561+ */
7562+static struct kmem_cache *unionfs_filldir_cachep;
7563+
7564+int unionfs_init_filldir_cache(void)
7565+{
7566+ unionfs_filldir_cachep =
7567+ kmem_cache_create("unionfs_filldir",
7568+ sizeof(struct filldir_node), 0,
7569+ SLAB_RECLAIM_ACCOUNT, NULL, NULL);
7570+
7571+ return (unionfs_filldir_cachep ? 0 : -ENOMEM);
7572+}
7573+
7574+void unionfs_destroy_filldir_cache(void)
7575+{
7576+ if (unionfs_filldir_cachep)
7577+ kmem_cache_destroy(unionfs_filldir_cachep);
7578+}
7579+
7580+/*
7581+ * This is a tuning parameter that tells us roughly how big to make the
7582+ * hash table in directory entries per page. This isn't perfect, but
7583+ * at least we get a hash table size that shouldn't be too overloaded.
7584+ * The following averages are based on my home directory.
7585+ * 14.44693 Overall
7586+ * 12.29 Single Page Directories
7587+ * 117.93 Multi-page directories
7588+ */
7589+#define DENTPAGE 4096
7590+#define DENTPERONEPAGE 12
7591+#define DENTPERPAGE 118
7592+#define MINHASHSIZE 1
7593+static int guesstimate_hash_size(struct inode *inode)
7594+{
7595+ struct inode *lower_inode;
7596+ int bindex;
7597+ int hashsize = MINHASHSIZE;
7598+
7599+ if (UNIONFS_I(inode)->hashsize > 0)
7600+ return UNIONFS_I(inode)->hashsize;
7601+
7602+ for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
7603+ if (!(lower_inode = unionfs_lower_inode_idx(inode, bindex)))
7604+ continue;
7605+
7606+ if (lower_inode->i_size == DENTPAGE)
7607+ hashsize += DENTPERONEPAGE;
7608+ else
7609+ hashsize += (lower_inode->i_size / DENTPAGE) *
7610+ DENTPERPAGE;
7611+ }
7612+
7613+ return hashsize;
7614+}
7615+
7616+int init_rdstate(struct file *file)
7617+{
7618+ BUG_ON(sizeof(loff_t) !=
7619+ (sizeof(unsigned int) + sizeof(unsigned int)));
7620+ BUG_ON(UNIONFS_F(file)->rdstate != NULL);
7621+
7622+ UNIONFS_F(file)->rdstate = alloc_rdstate(file->f_path.dentry->d_inode,
7623+ fbstart(file));
7624+
7625+ return (UNIONFS_F(file)->rdstate ? 0 : -ENOMEM);
7626+}
7627+
7628+struct unionfs_dir_state *find_rdstate(struct inode *inode, loff_t fpos)
7629+{
7630+ struct unionfs_dir_state *rdstate = NULL;
7631+ struct list_head *pos;
7632+
7633+ spin_lock(&UNIONFS_I(inode)->rdlock);
7634+ list_for_each(pos, &UNIONFS_I(inode)->readdircache) {
7635+ struct unionfs_dir_state *r =
7636+ list_entry(pos, struct unionfs_dir_state, cache);
7637+ if (fpos == rdstate2offset(r)) {
7638+ UNIONFS_I(inode)->rdcount--;
7639+ list_del(&r->cache);
7640+ rdstate = r;
7641+ break;
7642+ }
7643+ }
7644+ spin_unlock(&UNIONFS_I(inode)->rdlock);
7645+ return rdstate;
7646+}
7647+
7648+struct unionfs_dir_state *alloc_rdstate(struct inode *inode, int bindex)
7649+{
7650+ int i = 0;
7651+ int hashsize;
7652+ unsigned long mallocsize = sizeof(struct unionfs_dir_state);
7653+ struct unionfs_dir_state *rdstate;
7654+
7655+ hashsize = guesstimate_hash_size(inode);
7656+ mallocsize += hashsize * sizeof(struct list_head);
7657+ mallocsize = __roundup_pow_of_two(mallocsize);
7658+
7659+ /* This should give us about 500 entries anyway. */
7660+ if (mallocsize > PAGE_SIZE)
7661+ mallocsize = PAGE_SIZE;
7662+
7663+ hashsize = (mallocsize - sizeof(struct unionfs_dir_state)) /
7664+ sizeof(struct list_head);
7665+
7666+ rdstate = kmalloc(mallocsize, GFP_KERNEL);
7667+ if (!rdstate)
7668+ return NULL;
7669+
7670+ spin_lock(&UNIONFS_I(inode)->rdlock);
7671+ if (UNIONFS_I(inode)->cookie >= (MAXRDCOOKIE - 1))
7672+ UNIONFS_I(inode)->cookie = 1;
7673+ else
7674+ UNIONFS_I(inode)->cookie++;
7675+
7676+ rdstate->cookie = UNIONFS_I(inode)->cookie;
7677+ spin_unlock(&UNIONFS_I(inode)->rdlock);
7678+ rdstate->offset = 1;
7679+ rdstate->access = jiffies;
7680+ rdstate->bindex = bindex;
7681+ rdstate->dirpos = 0;
7682+ rdstate->hashentries = 0;
7683+ rdstate->size = hashsize;
7684+ for (i = 0; i < rdstate->size; i++)
7685+ INIT_LIST_HEAD(&rdstate->list[i]);
7686+
7687+ return rdstate;
7688+}
7689+
7690+static void free_filldir_node(struct filldir_node *node)
7691+{
7692+ if (node->namelen >= DNAME_INLINE_LEN_MIN)
7693+ kfree(node->name);
7694+ kmem_cache_free(unionfs_filldir_cachep, node);
7695+}
7696+
7697+void free_rdstate(struct unionfs_dir_state *state)
7698+{
7699+ struct filldir_node *tmp;
7700+ int i;
7701+
7702+ for (i = 0; i < state->size; i++) {
7703+ struct list_head *head = &(state->list[i]);
7704+ struct list_head *pos, *n;
7705+
7706+ /* traverse the list and deallocate space */
7707+ list_for_each_safe(pos, n, head) {
7708+ tmp = list_entry(pos, struct filldir_node, file_list);
7709+ list_del(&tmp->file_list);
7710+ free_filldir_node(tmp);
7711+ }
7712+ }
7713+
7714+ kfree(state);
7715+}
7716+
7717+struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
7718+ const char *name, int namelen)
7719+{
7720+ int index;
7721+ unsigned int hash;
7722+ struct list_head *head;
7723+ struct list_head *pos;
7724+ struct filldir_node *cursor = NULL;
7725+ int found = 0;
7726+
7727+ BUG_ON(namelen <= 0);
7728+
7729+ hash = full_name_hash(name, namelen);
7730+ index = hash % rdstate->size;
7731+
7732+ head = &(rdstate->list[index]);
7733+ list_for_each(pos, head) {
7734+ cursor = list_entry(pos, struct filldir_node, file_list);
7735+
7736+ if (cursor->namelen == namelen && cursor->hash == hash &&
7737+ !strncmp(cursor->name, name, namelen)) {
7738+ /*
7739+ * a duplicate exists, so there is no need to add a
7740+ * new entry to the list
7741+ */
7742+ found = 1;
7743+
7744+ /*
7745+ * if the duplicate is in this branch, then the file
7746+ * system is corrupted.
7747+ */
7748+ if (cursor->bindex == rdstate->bindex) {
7749+ printk(KERN_DEBUG "unionfs: filldir: possible "
7750+ "I/O error: a file is duplicated "
7751+ "in the same branch %d: %s\n",
7752+ rdstate->bindex, cursor->name);
7753+ }
7754+ break;
7755+ }
7756+ }
7757+
7758+ if (!found)
7759+ cursor = NULL;
7760+
7761+ return cursor;
7762+}
7763+
7764+int add_filldir_node(struct unionfs_dir_state *rdstate, const char *name,
7765+ int namelen, int bindex, int whiteout)
7766+{
7767+ struct filldir_node *new;
7768+ unsigned int hash;
7769+ int index;
7770+ int err = 0;
7771+ struct list_head *head;
7772+
7773+ BUG_ON(namelen <= 0);
7774+
7775+ hash = full_name_hash(name, namelen);
7776+ index = hash % rdstate->size;
7777+ head = &(rdstate->list[index]);
7778+
7779+ new = kmem_cache_alloc(unionfs_filldir_cachep, GFP_KERNEL);
7780+ if (!new) {
7781+ err = -ENOMEM;
7782+ goto out;
7783+ }
7784+
7785+ INIT_LIST_HEAD(&new->file_list);
7786+ new->namelen = namelen;
7787+ new->hash = hash;
7788+ new->bindex = bindex;
7789+ new->whiteout = whiteout;
7790+
7791+ if (namelen < DNAME_INLINE_LEN_MIN)
7792+ new->name = new->iname;
7793+ else {
7794+ new->name = kmalloc(namelen + 1, GFP_KERNEL);
7795+ if (!new->name) {
7796+ kmem_cache_free(unionfs_filldir_cachep, new);
7797+ err = -ENOMEM;
7798+ goto out;
7799+ }
7800+ }
7801+
7802+ memcpy(new->name, name, namelen);
7803+ new->name[namelen] = '\0';
7804+
7805+ rdstate->hashentries++;
7806+
7807+ list_add(&(new->file_list), head);
7808+out:
7809+ return err;
7810+}
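Note on alloc_rdstate() in rdstate.c above: the final bucket count is derived
from the allocation size, not the other way around. The estimate is turned
into a byte count, rounded up to a power of two, capped at one page, and the
space left after the fixed header is divided among the buckets. A minimal
user-space sketch of that arithmetic follows, assuming a 4096-byte page and
caller-supplied structure sizes. sketch_roundup_pow2 and sketch_bucket_count
are hypothetical names, and the sizes used in the checks are illustrative,
not the kernel's real sizeof values.

```c
#include <assert.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest power of two that is >= n (for n >= 1). */
static unsigned long sketch_roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * header_size: bytes taken by the fixed part of the state structure
 * bucket_size: bytes taken by one hash bucket (one list head)
 * wanted:      estimated number of buckets
 * Returns the number of buckets that fit in the final allocation.
 */
static unsigned long sketch_bucket_count(unsigned long header_size,
					 unsigned long bucket_size,
					 unsigned long wanted)
{
	unsigned long bytes = header_size + wanted * bucket_size;

	bytes = sketch_roundup_pow2(bytes);
	if (bytes > SKETCH_PAGE_SIZE)
		bytes = SKETCH_PAGE_SIZE;	/* never more than one page */

	return (bytes - header_size) / bucket_size;
}

/* Small self-check using illustrative sizes (64-byte header, 16-byte bucket). */
static void bucket_demo(void)
{
	/* A tiny estimate is rounded up, so more buckets fit than requested. */
	assert(sketch_bucket_count(64, 16, 1) >= 1);
	/* A huge estimate is capped at what one page can hold. */
	assert(sketch_bucket_count(64, 16, 1000) < 1000);
}
```

One consequence of this design is that small directories get a few spare
buckets for free from the rounding, while very large directories share a
fixed, page-bounded table and rely on the per-bucket lists.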
7811diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
7812new file mode 100644
7813index 0000000..807ad73
7814--- /dev/null
7815+++ b/fs/unionfs/rename.c
7816@@ -0,0 +1,521 @@
7817+/*
7818+ * Copyright (c) 2003-2007 Erez Zadok
7819+ * Copyright (c) 2003-2006 Charles P. Wright
7820+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7821+ * Copyright (c) 2005-2006 Junjiro Okajima
7822+ * Copyright (c) 2005 Arun M. Krishnakumar
7823+ * Copyright (c) 2004-2006 David P. Quigley
7824+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7825+ * Copyright (c) 2003 Puja Gupta
7826+ * Copyright (c) 2003 Harikesavan Krishnan
7827+ * Copyright (c) 2003-2007 Stony Brook University
7828+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
7829+ *
7830+ * This program is free software; you can redistribute it and/or modify
7831+ * it under the terms of the GNU General Public License version 2 as
7832+ * published by the Free Software Foundation.
7833+ */
7834+
7835+#include "union.h"
7836+
7837+static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
7838+ struct inode *new_dir, struct dentry *new_dentry,
7839+ int bindex, struct dentry **wh_old)
7840+{
7841+ int err = 0;
7842+ struct dentry *lower_old_dentry;
7843+ struct dentry *lower_new_dentry;
7844+ struct dentry *lower_old_dir_dentry;
7845+ struct dentry *lower_new_dir_dentry;
7846+ struct dentry *lower_wh_dentry;
7847+ struct dentry *lower_wh_dir_dentry;
7848+ char *wh_name = NULL;
7849+
7850+ lower_new_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
7851+ lower_old_dentry = unionfs_lower_dentry_idx(old_dentry, bindex);
7852+
7853+ if (!lower_new_dentry) {
7854+ lower_new_dentry =
7855+ create_parents(new_dentry->d_parent->d_inode,
7856+ new_dentry, new_dentry->d_name.name,
7857+ bindex);
7858+ if (IS_ERR(lower_new_dentry)) {
7859+ printk(KERN_DEBUG "unionfs: error creating directory "
7860+ "tree for rename, bindex = %d, err = %ld\n",
7861+ bindex, PTR_ERR(lower_new_dentry));
7862+ err = PTR_ERR(lower_new_dentry);
7863+ goto out;
7864+ }
7865+ }
7866+
7867+ wh_name = alloc_whname(new_dentry->d_name.name,
7868+ new_dentry->d_name.len);
7869+ if (IS_ERR(wh_name)) {
7870+ err = PTR_ERR(wh_name);
7871+ goto out;
7872+ }
7873+
7874+ lower_wh_dentry = lookup_one_len(wh_name, lower_new_dentry->d_parent,
7875+ new_dentry->d_name.len +
7876+ UNIONFS_WHLEN);
7877+ if (IS_ERR(lower_wh_dentry)) {
7878+ err = PTR_ERR(lower_wh_dentry);
7879+ goto out;
7880+ }
7881+
7882+ if (lower_wh_dentry->d_inode) {
7883+ /* get rid of the existing whiteout */
7884+ if (lower_new_dentry->d_inode) {
7885+ printk(KERN_WARNING "unionfs: both a whiteout and a "
7886+ "dentry exist when doing a rename!\n");
7887+ err = -EIO;
7888+
7889+ dput(lower_wh_dentry);
7890+ goto out;
7891+ }
7892+
7893+ lower_wh_dir_dentry = lock_parent(lower_wh_dentry);
7894+ if (!(err = is_robranch_super(old_dentry->d_sb, bindex)))
7895+ err = vfs_unlink(lower_wh_dir_dentry->d_inode,
7896+ lower_wh_dentry);
7897+
7898+ dput(lower_wh_dentry);
7899+ unlock_dir(lower_wh_dir_dentry);
7900+ if (err)
7901+ goto out;
7902+ } else
7903+ dput(lower_wh_dentry);
7904+
7905+ dget(lower_old_dentry);
7906+ lower_old_dir_dentry = dget_parent(lower_old_dentry);
7907+ lower_new_dir_dentry = dget_parent(lower_new_dentry);
7908+
7909+ lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7910+
7911+ err = is_robranch_super(old_dentry->d_sb, bindex);
7912+ if (err)
7913+ goto out_unlock;
7914+
7915+ /*
7916+ * Get the whiteout dentry ready for old_dentry. The caller will
7917+ * create the actual whiteout, and must dput(*wh_old).
7918+ */
7919+ if (wh_old) {
7920+ char *whname;
7921+ whname = alloc_whname(old_dentry->d_name.name,
7922+ old_dentry->d_name.len);
7923+ err = PTR_ERR(whname);
7924+ if (IS_ERR(whname))
7925+ goto out_unlock;
7926+ *wh_old = lookup_one_len(whname, lower_old_dir_dentry,
7927+ old_dentry->d_name.len +
7928+ UNIONFS_WHLEN);
7929+ kfree(whname);
7930+ err = PTR_ERR(*wh_old);
7931+ if (IS_ERR(*wh_old)) {
7932+ *wh_old = NULL;
7933+ goto out_unlock;
7934+ }
7935+ }
7936+
7937+ err = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
7938+ lower_new_dir_dentry->d_inode, lower_new_dentry);
7939+
7940+out_unlock:
7941+ unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7942+
7943+ dput(lower_old_dir_dentry);
7944+ dput(lower_new_dir_dentry);
7945+ dput(lower_old_dentry);
7946+
7947+out:
7948+ if (!err) {
7949+ /* Fixup the new_dentry. */
7950+ if (bindex < dbstart(new_dentry))
7951+ set_dbstart(new_dentry, bindex);
7952+ else if (bindex > dbend(new_dentry))
7953+ set_dbend(new_dentry, bindex);
7954+ }
7955+
7956+ kfree(wh_name);
7957+
7958+ return err;
7959+}
7960+
7961+/*
7962+ * Main rename code. This is sufficiently complex, that it's documented in
7963+ * Documentation/filesystems/unionfs/rename.txt. This routine calls
7964+ * __unionfs_rename() above to perform some of the work.
7965+ */
7966+static int do_unionfs_rename(struct inode *old_dir,
7967+ struct dentry *old_dentry,
7968+ struct inode *new_dir,
7969+ struct dentry *new_dentry)
7970+{
7971+ int err = 0;
7972+ int bindex, bwh_old;
7973+ int old_bstart, old_bend;
7974+ int new_bstart, new_bend;
7975+ int do_copyup = -1;
7976+ struct dentry *parent_dentry;
7977+ int local_err = 0;
7978+ int eio = 0;
7979+ int revert = 0;
7980+ struct dentry *wh_old = NULL;
7981+
7982+ old_bstart = dbstart(old_dentry);
7983+ bwh_old = old_bstart;
7984+ old_bend = dbend(old_dentry);
7985+ parent_dentry = old_dentry->d_parent;
7986+
7987+ new_bstart = dbstart(new_dentry);
7988+ new_bend = dbend(new_dentry);
7989+
7990+ /* Rename source to destination. */
7991+ err = __unionfs_rename(old_dir, old_dentry, new_dir, new_dentry,
7992+ old_bstart, &wh_old);
7993+ if (err) {
7994+ if (!IS_COPYUP_ERR(err))
7995+ goto out;
7996+ do_copyup = old_bstart - 1;
7997+ } else
7998+ revert = 1;
7999+
8000+ /*
8001+ * Unlink all instances of destination that exist to the left of
8002+ * bstart of source. On error, revert back, goto out.
8003+ */
8004+ for (bindex = old_bstart - 1; bindex >= new_bstart; bindex--) {
8005+ struct dentry *unlink_dentry;
8006+ struct dentry *unlink_dir_dentry;
8007+
8008+ unlink_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
8009+ if (!unlink_dentry)
8010+ continue;
8011+
8012+ unlink_dir_dentry = lock_parent(unlink_dentry);
8013+ if (!(err = is_robranch_super(old_dir->i_sb, bindex)))
8014+ err = vfs_unlink(unlink_dir_dentry->d_inode,
8015+ unlink_dentry);
8016+
8017+ fsstack_copy_attr_times(new_dentry->d_parent->d_inode,
8018+ unlink_dir_dentry->d_inode);
8019+ /* propagate number of hard-links */
8020+ new_dentry->d_parent->d_inode->i_nlink =
8021+ unionfs_get_nlinks(new_dentry->d_parent->d_inode);
8022+
8023+ unlock_dir(unlink_dir_dentry);
8024+ if (!err) {
8025+ if (bindex != new_bstart) {
8026+ dput(unlink_dentry);
8027+ unionfs_set_lower_dentry_idx(new_dentry,
8028+ bindex, NULL);
8029+ }
8030+ } else if (IS_COPYUP_ERR(err)) {
8031+ do_copyup = bindex - 1;
8032+ } else if (revert) {
8033+ dput(wh_old);
8034+ goto revert;
8035+ }
8036+ }
8037+
8038+ if (do_copyup != -1) {
8039+ for (bindex = do_copyup; bindex >= 0; bindex--) {
8040+ /*
8041+ * copyup the file into some left directory, so that
8042+ * you can rename it
8043+ */
8044+ err = copyup_dentry(old_dentry->d_parent->d_inode,
8045+ old_dentry, old_bstart, bindex,
8046+ old_dentry->d_name.name,
8047+ old_dentry->d_name.len,
8048+ NULL, old_dentry->d_inode->i_size);
8049+ /* if copyup failed, try next branch to the left */
8050+ if (err)
8051+ continue;
8052+ dput(wh_old);
8053+ bwh_old = bindex;
8054+ err = __unionfs_rename(old_dir, old_dentry,
8055+ new_dir, new_dentry,
8056+ bindex, &wh_old);
8057+ break;
8058+ }
8059+ }
8060+
8061+ /* make it opaque */
8062+ if (S_ISDIR(old_dentry->d_inode->i_mode)) {
8063+ err = make_dir_opaque(old_dentry, dbstart(old_dentry));
8064+ if (err)
8065+ goto revert;
8066+ }
8067+
8068+ /*
8069+ * Create whiteout for source, only if:
8070+ * (1) There is more than one underlying instance of source.
8071+ * (2) We did a copy_up
8072+ */
8073+ if ((old_bstart != old_bend) || (do_copyup != -1)) {
8074+ struct dentry *lower_parent;
8075+ if (!wh_old || wh_old->d_inode || bwh_old < 0) {
8076+ printk(KERN_ERR "unionfs: rename error "
8077+ "(wh_old=%p/%p bwh_old=%d)\n", wh_old,
8078+ (wh_old ? wh_old->d_inode : NULL), bwh_old);
8079+ err = -EIO;
8080+ goto out;
8081+ }
8082+ lower_parent = lock_parent(wh_old);
8083+ local_err = vfs_create(lower_parent->d_inode, wh_old, S_IRUGO,
8084+ NULL);
8085+ unlock_dir(lower_parent);
8086+ if (!local_err)
8087+ set_dbopaque(old_dentry, bwh_old);
8088+ else {
8089+ /*
8090+ * we can't fix anything now, so we cop-out and use
8091+ * -EIO.
8092+ */
8093+ printk(KERN_ERR "unionfs: can't create a whiteout for "
8094+ "the source in rename!\n");
8095+ err = -EIO;
8096+ }
8097+ }
8098+
8099+out:
8100+ dput(wh_old);
8101+ return err;
8102+
8103+revert:
8104+ /* Do revert here. */
8105+ local_err = unionfs_refresh_lower_dentry(new_dentry, old_bstart);
8106+ if (local_err) {
8107+ printk(KERN_WARNING "unionfs: revert failed in rename: "
8108+ "the new refresh failed.\n");
8109+ eio = -EIO;
8110+ }
8111+
8112+ local_err = unionfs_refresh_lower_dentry(old_dentry, old_bstart);
8113+ if (local_err) {
8114+ printk(KERN_WARNING "unionfs: revert failed in rename: "
8115+ "the old refresh failed.\n");
8116+ eio = -EIO;
8117+ goto revert_out;
8118+ }
8119+
8120+ if (!unionfs_lower_dentry_idx(new_dentry, bindex) ||
8121+ !unionfs_lower_dentry_idx(new_dentry, bindex)->d_inode) {
8122+ printk(KERN_WARNING "unionfs: revert failed in rename: "
8123+ "the object disappeared from under us!\n");
8124+ eio = -EIO;
8125+ goto revert_out;
8126+ }
8127+
8128+ if (unionfs_lower_dentry_idx(old_dentry, bindex) &&
8129+ unionfs_lower_dentry_idx(old_dentry, bindex)->d_inode) {
8130+ printk(KERN_WARNING "unionfs: revert failed in rename: "
8131+ "the object was created underneath us!\n");
8132+ eio = -EIO;
8133+ goto revert_out;
8134+ }
8135+
8136+ local_err = __unionfs_rename(new_dir, new_dentry,
8137+ old_dir, old_dentry, old_bstart, NULL);
8138+
8139+ /* If we can't fix it, then we cop-out with -EIO. */
8140+ if (local_err) {
8141+ printk(KERN_WARNING "unionfs: revert failed in rename!\n");
8142+ eio = -EIO;
8143+ }
8144+
8145+ local_err = unionfs_refresh_lower_dentry(new_dentry, bindex);
8146+ if (local_err)
8147+ eio = -EIO;
8148+ local_err = unionfs_refresh_lower_dentry(old_dentry, bindex);
8149+ if (local_err)
8150+ eio = -EIO;
8151+
8152+revert_out:
8153+ if (eio)
8154+ err = eio;
8155+ return err;
8156+}
8157+
8158+static struct dentry *lookup_whiteout(struct dentry *dentry)
8159+{
8160+ char *whname;
8161+ int bindex = -1, bstart = -1, bend = -1;
8162+ struct dentry *parent, *lower_parent, *wh_dentry;
8163+
8164+ whname = alloc_whname(dentry->d_name.name, dentry->d_name.len);
8165+ if (IS_ERR(whname))
8166+ return (void *)whname;
8167+
8168+ parent = dget_parent(dentry);
8169+ unionfs_lock_dentry(parent);
8170+ bstart = dbstart(parent);
8171+ bend = dbend(parent);
8172+ wh_dentry = ERR_PTR(-ENOENT);
8173+ for (bindex = bstart; bindex <= bend; bindex++) {
8174+ lower_parent = unionfs_lower_dentry_idx(parent, bindex);
8175+ if (!lower_parent)
8176+ continue;
8177+ wh_dentry = lookup_one_len(whname, lower_parent,
8178+ dentry->d_name.len + UNIONFS_WHLEN);
8179+ if (IS_ERR(wh_dentry))
8180+ continue;
8181+ if (wh_dentry->d_inode)
8182+ break;
8183+ dput(wh_dentry);
8184+ wh_dentry = ERR_PTR(-ENOENT);
8185+ }
8186+ unionfs_unlock_dentry(parent);
8187+ dput(parent);
8188+ kfree(whname);
8189+ return wh_dentry;
8190+}
8191+
8192+/*
8193+ * We can't copyup a directory, because it may involve huge numbers of
8194+ * children, etc. Doing that in the kernel would be bad, so instead we
8195+ * return EXDEV to the user-space utility that caused this, and let the
8196+ * user-space recurse and ask us to copy up each file separately.
8197+ */
8198+static int may_rename_dir(struct dentry *dentry)
8199+{
8200+ int err, bstart;
8201+
8202+ err = check_empty(dentry, NULL);
8203+ if (err == -ENOTEMPTY) {
8204+ if (is_robranch(dentry))
8205+ return -EXDEV;
8206+ } else if (err)
8207+ return err;
8208+
8209+ bstart = dbstart(dentry);
8210+ if (dbend(dentry) == bstart || dbopaque(dentry) == bstart)
8211+ return 0;
8212+
8213+ set_dbstart(dentry, bstart + 1);
8214+ err = check_empty(dentry, NULL);
8215+ set_dbstart(dentry, bstart);
8216+ if (err == -ENOTEMPTY)
8217+ err = -EXDEV;
8218+ return err;
8219+}
8220+
8221+int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
8222+ struct inode *new_dir, struct dentry *new_dentry)
8223+{
8224+ int err = 0;
8225+ struct dentry *wh_dentry;
8226+
8227+ unionfs_read_lock(old_dentry->d_sb);
8228+ unionfs_double_lock_dentry(old_dentry, new_dentry);
8229+
8230+ if (!__unionfs_d_revalidate_chain(old_dentry, NULL, false)) {
8231+ err = -ESTALE;
8232+ goto out;
8233+ }
8234+ if (!d_deleted(new_dentry) && new_dentry->d_inode &&
8235+ !__unionfs_d_revalidate_chain(new_dentry, NULL, false)) {
8236+ err = -ESTALE;
8237+ goto out;
8238+ }
8239+
8240+ if (!S_ISDIR(old_dentry->d_inode->i_mode))
8241+ err = unionfs_partial_lookup(old_dentry);
8242+ else
8243+ err = may_rename_dir(old_dentry);
8244+
8245+ if (err)
8246+ goto out;
8247+
8248+ err = unionfs_partial_lookup(new_dentry);
8249+ if (err)
8250+ goto out;
8251+
8252+ /*
8253+ * if new_dentry is already lower because of whiteout,
8254+ * simply override it even if the whited-out dir is not empty.
8255+ */
8256+ wh_dentry = lookup_whiteout(new_dentry);
8257+ if (!IS_ERR(wh_dentry))
8258+ dput(wh_dentry);
8259+ else if (new_dentry->d_inode) {
8260+ if (S_ISDIR(old_dentry->d_inode->i_mode) !=
8261+ S_ISDIR(new_dentry->d_inode->i_mode)) {
8262+ err = S_ISDIR(old_dentry->d_inode->i_mode) ?
8263+ -ENOTDIR : -EISDIR;
8264+ goto out;
8265+ }
8266+
8267+ if (S_ISDIR(new_dentry->d_inode->i_mode)) {
8268+ struct unionfs_dir_state *namelist;
8269+ /* check if this unionfs directory is empty or not */
8270+ err = check_empty(new_dentry, &namelist);
8271+ if (err)
8272+ goto out;
8273+
8274+ if (!is_robranch(new_dentry))
8275+ err = delete_whiteouts(new_dentry,
8276+ dbstart(new_dentry),
8277+ namelist);
8278+
8279+ free_rdstate(namelist);
8280+
8281+ if (err)
8282+ goto out;
8283+ }
8284+ }
8285+ err = do_unionfs_rename(old_dir, old_dentry, new_dir, new_dentry);
8286+out:
8287+ if (err)
8288+ /* clear the new_dentry stuff created */
8289+ d_drop(new_dentry);
8290+ else {
8291+ /*
8292+ * force re-lookup since the dir on ro branch is not renamed,
8293+ * and lower dentries still indicate the un-renamed ones.
8294+ */
8295+ if (S_ISDIR(old_dentry->d_inode->i_mode))
8296+ atomic_dec(&UNIONFS_D(old_dentry)->generation);
8297+ else
8298+ unionfs_postcopyup_release(old_dentry);
8299+ if (new_dentry->d_inode &&
8300+ !S_ISDIR(new_dentry->d_inode->i_mode)) {
8301+ unionfs_postcopyup_release(new_dentry);
8302+ unionfs_postcopyup_setmnt(new_dentry);
8303+ if (!unionfs_lower_inode(new_dentry->d_inode)) {
8304+ /*
8305+ * If we get here, it means that no copyup
8306+ * was needed, and that a file by the old
8307+ * name already existed on the destination
8308+ * branch; that file got renamed earlier in
8309+ * this function, so all we need to do here
8310+ * is set the lower inode.
8311+ */
8312+ struct inode *inode;
8313+ inode = unionfs_lower_inode(
8314+ old_dentry->d_inode);
8315+ igrab(inode);
8316+ unionfs_set_lower_inode_idx(
8317+ new_dentry->d_inode,
8318+ dbstart(new_dentry), inode);
8319+ }
8320+
8321+ }
8322+ /* if all of this renaming succeeded, update our times */
8323+ unionfs_copy_attr_times(old_dir);
8324+ unionfs_copy_attr_times(new_dir);
8325+ unionfs_copy_attr_times(old_dentry->d_inode);
8326+ unionfs_copy_attr_times(new_dentry->d_inode);
8327+ unionfs_check_inode(old_dir);
8328+ unionfs_check_inode(new_dir);
8329+ unionfs_check_dentry(old_dentry);
8330+ unionfs_check_dentry(new_dentry);
8331+ }
8332+
8333+ unionfs_unlock_dentry(new_dentry);
8334+ unionfs_unlock_dentry(old_dentry);
8335+ unionfs_read_unlock(old_dentry->d_sb);
8336+ return err;
8337+}
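Note on the rename code above: both do_unionfs_rename() and create_whiteout() retry an operation on the next branch to the left when the current branch cannot take the write. The following is a minimal user-space sketch of that leftward retry loop, not the kernel code itself. The names try_branch() and pick_branch_leftward() and the readonly[] array are hypothetical, and treating -EROFS as the only copyup-class error is an assumption about what IS_COPYUP_ERR() tests.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of a branch stack: readonly[i] != 0 means branch i
 * rejects writes. Branch 0 is the leftmost (highest-priority) branch.
 */
static int try_branch(const int *readonly, int bindex)
{
	return readonly[bindex] ? -EROFS : 0;
}

/*
 * Start at 'start' and walk left until the operation succeeds.
 * Returns the index of the branch that accepted the write, or the last
 * negative errno if no branch did.
 */
static int pick_branch_leftward(const int *readonly, int start)
{
	int bindex, err = -EINVAL;

	assert(readonly);
	for (bindex = start; bindex >= 0; bindex--) {
		err = try_branch(readonly, bindex);
		if (!err)
			return bindex;
		/* assumed: only -EROFS counts as a copyup-class error */
		if (err != -EROFS)
			break;
	}
	return err;
}
```

With branches {rw, ro, ro} and start = 2, the loop skips the two read-only branches and settles on branch 0. If every branch is read-only, the caller sees -EROFS.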
8338diff --git a/fs/unionfs/sioq.c b/fs/unionfs/sioq.c
8339new file mode 100644
8340index 0000000..2a8c88e
8341--- /dev/null
8342+++ b/fs/unionfs/sioq.c
8343@@ -0,0 +1,119 @@
8344+/*
8345+ * Copyright (c) 2006-2007 Erez Zadok
8346+ * Copyright (c) 2006 Charles P. Wright
8347+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8348+ * Copyright (c) 2006 Junjiro Okajima
8349+ * Copyright (c) 2006 David P. Quigley
8350+ * Copyright (c) 2006-2007 Stony Brook University
8351+ * Copyright (c) 2006-2007 The Research Foundation of SUNY
8352+ *
8353+ * This program is free software; you can redistribute it and/or modify
8354+ * it under the terms of the GNU General Public License version 2 as
8355+ * published by the Free Software Foundation.
8356+ */
8357+
8358+#include "union.h"
8359+
8360+/*
8361+ * Super-user IO work Queue - sometimes we need to perform actions which
8362+ * would fail due to the unix permissions on the parent directory (e.g.,
8363+ * rmdir a directory which appears empty, but in reality contains
8364+ * whiteouts).
8365+ */
8366+
8367+static struct workqueue_struct *superio_workqueue;
8368+
8369+int __init init_sioq(void)
8370+{
8371+ int err;
8372+
8373+ superio_workqueue = create_workqueue("unionfs_siod");
8374+ if (!IS_ERR(superio_workqueue))
8375+ return 0;
8376+
8377+ err = PTR_ERR(superio_workqueue);
8378+ printk(KERN_ERR "unionfs: create_workqueue failed %d\n", err);
8379+ superio_workqueue = NULL;
8380+ return err;
8381+}
8382+
8383+void stop_sioq(void)
8384+{
8385+ if (superio_workqueue)
8386+ destroy_workqueue(superio_workqueue);
8387+}
8388+
8389+void run_sioq(work_func_t func, struct sioq_args *args)
8390+{
8391+ INIT_WORK(&args->work, func);
8392+
8393+ init_completion(&args->comp);
8394+ while (!queue_work(superio_workqueue, &args->work)) {
8395+ /* TODO: do accounting if needed */
8396+ schedule();
8397+ }
8398+ wait_for_completion(&args->comp);
8399+}
8400+
8401+void __unionfs_create(struct work_struct *work)
8402+{
8403+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8404+ struct create_args *c = &args->create;
8405+
8406+ args->err = vfs_create(c->parent, c->dentry, c->mode, c->nd);
8407+ complete(&args->comp);
8408+}
8409+
8410+void __unionfs_mkdir(struct work_struct *work)
8411+{
8412+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8413+ struct mkdir_args *m = &args->mkdir;
8414+
8415+ args->err = vfs_mkdir(m->parent, m->dentry, m->mode);
8416+ complete(&args->comp);
8417+}
8418+
8419+void __unionfs_mknod(struct work_struct *work)
8420+{
8421+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8422+ struct mknod_args *m = &args->mknod;
8423+
8424+ args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev);
8425+ complete(&args->comp);
8426+}
8427+
8428+void __unionfs_symlink(struct work_struct *work)
8429+{
8430+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8431+ struct symlink_args *s = &args->symlink;
8432+
8433+ args->err = vfs_symlink(s->parent, s->dentry, s->symbuf, s->mode);
8434+ complete(&args->comp);
8435+}
8436+
8437+void __unionfs_unlink(struct work_struct *work)
8438+{
8439+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8440+ struct unlink_args *u = &args->unlink;
8441+
8442+ args->err = vfs_unlink(u->parent, u->dentry);
8443+ complete(&args->comp);
8444+}
8445+
8446+void __delete_whiteouts(struct work_struct *work)
8447+{
8448+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8449+ struct deletewh_args *d = &args->deletewh;
8450+
8451+ args->err = do_delete_whiteouts(d->dentry, d->bindex, d->namelist);
8452+ complete(&args->comp);
8453+}
8454+
8455+void __is_opaque_dir(struct work_struct *work)
8456+{
8457+ struct sioq_args *args = container_of(work, struct sioq_args, work);
8458+
8459+ args->ret = lookup_one_len(UNIONFS_DIR_OPAQUE, args->is_opaque.dentry,
8460+ sizeof(UNIONFS_DIR_OPAQUE) - 1);
8461+ complete(&args->comp);
8462+}
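Note on sioq.c above: every helper recovers its struct sioq_args from the embedded work_struct with container_of(), reads its own member of the argument union, and stores the result back for the waiting caller. The sketch below shows that pattern in plain user-space C. It is an illustration under stated assumptions: struct work, struct job_args, do_add() and run_sync() are hypothetical names, and run_sync() calls the handler directly where the kernel code would use queue_work() followed by wait_for_completion().

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, minimal replacement for struct work_struct. */
struct work {
	void (*func)(struct work *);
};

struct add_args {
	int a, b;
};

struct neg_args {
	int v;
};

/* One argument block per request; the union holds per-operation input. */
struct job_args {
	struct work work;
	int result;
	int done;	/* stands in for struct completion */
	union {
		struct add_args add;
		struct neg_args neg;
	};
};

/* Handler: recover the outer block from the embedded work item. */
static void do_add(struct work *w)
{
	struct job_args *args = container_of(w, struct job_args, work);

	args->result = args->add.a + args->add.b;
	args->done = 1;
}

static void do_neg(struct work *w)
{
	struct job_args *args = container_of(w, struct job_args, work);

	args->result = -args->neg.v;
	args->done = 1;
}

/* Synchronous stand-in for run_sioq(): run the handler, then check it finished. */
static void run_sync(void (*func)(struct work *), struct job_args *args)
{
	assert(func && args);
	args->work.func = func;
	args->done = 0;
	args->work.func(&args->work);
	assert(args->done);
}
```

Because the caller blocks until the handler finishes, the argument block can live on the caller's stack, which is how the unionfs callers of run_sioq() appear to use it.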
8463diff --git a/fs/unionfs/sioq.h b/fs/unionfs/sioq.h
8464new file mode 100644
8465index 0000000..afb71ee
8466--- /dev/null
8467+++ b/fs/unionfs/sioq.h
8468@@ -0,0 +1,92 @@
8469+/*
8470+ * Copyright (c) 2006-2007 Erez Zadok
8471+ * Copyright (c) 2006 Charles P. Wright
8472+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8473+ * Copyright (c) 2006 Junjiro Okajima
8474+ * Copyright (c) 2006 David P. Quigley
8475+ * Copyright (c) 2006-2007 Stony Brook University
8476+ * Copyright (c) 2006-2007 The Research Foundation of SUNY
8477+ *
8478+ * This program is free software; you can redistribute it and/or modify
8479+ * it under the terms of the GNU General Public License version 2 as
8480+ * published by the Free Software Foundation.
8481+ */
8482+
8483+#ifndef _SIOQ_H
8484+#define _SIOQ_H
8485+
8486+struct deletewh_args {
8487+ struct unionfs_dir_state *namelist;
8488+ struct dentry *dentry;
8489+ int bindex;
8490+};
8491+
8492+struct is_opaque_args {
8493+ struct dentry *dentry;
8494+};
8495+
8496+struct create_args {
8497+ struct inode *parent;
8498+ struct dentry *dentry;
8499+ umode_t mode;
8500+ struct nameidata *nd;
8501+};
8502+
8503+struct mkdir_args {
8504+ struct inode *parent;
8505+ struct dentry *dentry;
8506+ umode_t mode;
8507+};
8508+
8509+struct mknod_args {
8510+ struct inode *parent;
8511+ struct dentry *dentry;
8512+ umode_t mode;
8513+ dev_t dev;
8514+};
8515+
8516+struct symlink_args {
8517+ struct inode *parent;
8518+ struct dentry *dentry;
8519+ char *symbuf;
8520+ umode_t mode;
8521+};
8522+
8523+struct unlink_args {
8524+ struct inode *parent;
8525+ struct dentry *dentry;
8526+};
8527+
8528+
8529+struct sioq_args {
8530+ struct completion comp;
8531+ struct work_struct work;
8532+ int err;
8533+ void *ret;
8534+
8535+ union {
8536+ struct deletewh_args deletewh;
8537+ struct is_opaque_args is_opaque;
8538+ struct create_args create;
8539+ struct mkdir_args mkdir;
8540+ struct mknod_args mknod;
8541+ struct symlink_args symlink;
8542+ struct unlink_args unlink;
8543+ };
8544+};
8545+
8546+/* Extern definitions for SIOQ functions */
8547+extern int __init init_sioq(void);
8548+extern void stop_sioq(void);
8549+extern void run_sioq(work_func_t func, struct sioq_args *args);
8550+
8551+/* Extern definitions for our privilege escalation helpers */
8552+extern void __unionfs_create(struct work_struct *work);
8553+extern void __unionfs_mkdir(struct work_struct *work);
8554+extern void __unionfs_mknod(struct work_struct *work);
8555+extern void __unionfs_symlink(struct work_struct *work);
8556+extern void __unionfs_unlink(struct work_struct *work);
8557+extern void __delete_whiteouts(struct work_struct *work);
8558+extern void __is_opaque_dir(struct work_struct *work);
8559+
8560+#endif /* not _SIOQ_H */
8561diff --git a/fs/unionfs/subr.c b/fs/unionfs/subr.c
8562new file mode 100644
8563index 0000000..c9a89ab
8564--- /dev/null
8565+++ b/fs/unionfs/subr.c
8566@@ -0,0 +1,213 @@
8567+/*
8568+ * Copyright (c) 2003-2007 Erez Zadok
8569+ * Copyright (c) 2003-2006 Charles P. Wright
8570+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8571+ * Copyright (c) 2005-2006 Junjiro Okajima
8572+ * Copyright (c) 2005 Arun M. Krishnakumar
8573+ * Copyright (c) 2004-2006 David P. Quigley
8574+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8575+ * Copyright (c) 2003 Puja Gupta
8576+ * Copyright (c) 2003 Harikesavan Krishnan
8577+ * Copyright (c) 2003-2007 Stony Brook University
8578+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
8579+ *
8580+ * This program is free software; you can redistribute it and/or modify
8581+ * it under the terms of the GNU General Public License version 2 as
8582+ * published by the Free Software Foundation.
8583+ */
8584+
8585+#include "union.h"
8586+
8587+/*
8588+ * Pass a unionfs dentry and a branch index. It will try to create a
8589+ * whiteout for the filename in dentry, starting in branch 'start'. On a
8590+ * copyup error, it will proceed to the next branch to the left.
8591+ */
8592+int create_whiteout(struct dentry *dentry, int start)
8593+{
8594+ int bstart, bend, bindex;
8595+ struct dentry *lower_dir_dentry;
8596+ struct dentry *lower_dentry;
8597+ struct dentry *lower_wh_dentry;
8598+ char *name = NULL;
8599+ int err = -EINVAL;
8600+
8601+ verify_locked(dentry);
8602+
8603+ bstart = dbstart(dentry);
8604+ bend = dbend(dentry);
8605+
8606+ /* create dentry's whiteout equivalent */
8607+ name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
8608+ if (IS_ERR(name)) {
8609+ err = PTR_ERR(name);
8610+ goto out;
8611+ }
8612+
8613+ for (bindex = start; bindex >= 0; bindex--) {
8614+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
8615+
8616+ if (!lower_dentry) {
8617+ /*
8618+ * if lower dentry is not present, create the
8619+ * entire lower dentry directory structure and go
8620+ * ahead. Since we want to just create whiteout, we
8621+ * only want the parent dentry, and hence get rid of
8622+ * this dentry.
8623+ */
8624+ lower_dentry = create_parents(dentry->d_inode,
8625+ dentry,
8626+ dentry->d_name.name,
8627+ bindex);
8628+ if (!lower_dentry || IS_ERR(lower_dentry)) {
8629+ printk(KERN_DEBUG "unionfs: create_parents "
8630+ "failed for bindex = %d\n", bindex);
8631+ continue;
8632+ }
8633+ }
8634+
8635+ lower_wh_dentry =
8636+ lookup_one_len(name, lower_dentry->d_parent,
8637+ dentry->d_name.len + UNIONFS_WHLEN);
8638+ if (IS_ERR(lower_wh_dentry))
8639+ continue;
8640+
8641+ /*
8642+ * The whiteout already exists. This used to be impossible,
8643+ * but now is possible because of opaqueness.
8644+ */
8645+ if (lower_wh_dentry->d_inode) {
8646+ dput(lower_wh_dentry);
8647+ err = 0;
8648+ goto out;
8649+ }
8650+
8651+ lower_dir_dentry = lock_parent(lower_wh_dentry);
8652+ if (!(err = is_robranch_super(dentry->d_sb, bindex)))
8653+ err = vfs_create(lower_dir_dentry->d_inode,
8654+ lower_wh_dentry,
8655+ ~current->fs->umask & S_IRWXUGO,
8656+ NULL);
8657+ unlock_dir(lower_dir_dentry);
8658+ dput(lower_wh_dentry);
8659+
8660+ if (!err || !IS_COPYUP_ERR(err))
8661+ break;
8662+ }
8663+
8664+ /* set dbopaque so that lookup will not proceed after this branch */
8665+ if (!err)
8666+ set_dbopaque(dentry, bindex);
8667+
8668+out:
8669+ kfree(name);
8670+ return err;
8671+}
8672+
8673+/*
8674+ * This is a helper function for rename, which ends up with hosed over
8675+ * dentries when it needs to revert.
8676+ */
8677+int unionfs_refresh_lower_dentry(struct dentry *dentry, int bindex)
8678+{
8679+ struct dentry *lower_dentry;
8680+ struct dentry *lower_parent;
8681+ int err = 0;
8682+
8683+ verify_locked(dentry);
8684+
8685+ unionfs_lock_dentry(dentry->d_parent);
8686+ lower_parent = unionfs_lower_dentry_idx(dentry->d_parent, bindex);
8687+ unionfs_unlock_dentry(dentry->d_parent);
8688+
8689+ BUG_ON(!S_ISDIR(lower_parent->d_inode->i_mode));
8690+
8691+ lower_dentry = lookup_one_len(dentry->d_name.name, lower_parent,
8692+ dentry->d_name.len);
8693+ if (IS_ERR(lower_dentry)) {
8694+ err = PTR_ERR(lower_dentry);
8695+ goto out;
8696+ }
8697+
8698+ dput(unionfs_lower_dentry_idx(dentry, bindex));
8699+ iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
8700+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex, NULL);
8701+
8702+ if (!lower_dentry->d_inode) {
8703+ dput(lower_dentry);
8704+ unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
8705+ } else {
8706+ unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
8707+ unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
8708+ igrab(lower_dentry->d_inode));
8709+ }
8710+
8711+out:
8712+ return err;
8713+}
8714+
8715+int make_dir_opaque(struct dentry *dentry, int bindex)
8716+{
8717+ int err = 0;
8718+ struct dentry *lower_dentry, *diropq;
8719+ struct inode *lower_dir;
8720+
8721+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
8722+ lower_dir = lower_dentry->d_inode;
8723+ BUG_ON(!S_ISDIR(dentry->d_inode->i_mode) ||
8724+ !S_ISDIR(lower_dir->i_mode));
8725+
8726+ mutex_lock(&lower_dir->i_mutex);
8727+ diropq = lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
8728+ sizeof(UNIONFS_DIR_OPAQUE) - 1);
8729+ if (IS_ERR(diropq)) {
8730+ err = PTR_ERR(diropq);
8731+ goto out;
8732+ }
8733+
8734+ if (!diropq->d_inode)
8735+ err = vfs_create(lower_dir, diropq, S_IRUGO, NULL);
8736+ if (!err)
8737+ set_dbopaque(dentry, bindex);
8738+
8739+ dput(diropq);
8740+
8741+out:
8742+ mutex_unlock(&lower_dir->i_mutex);
8743+ return err;
8744+}
8745+
8746+/*
8747+ * returns the right n_link value based on the inode type
8748+ */
8749+int unionfs_get_nlinks(const struct inode *inode)
8750+{
8751+ /* don't bother to do all the work since we're unlinked */
8752+ if (inode->i_nlink == 0)
8753+ return 0;
8754+
8755+ if (!S_ISDIR(inode->i_mode))
8756+ return unionfs_lower_inode(inode)->i_nlink;
8757+
8758+ /*
8759+ * For directories, we return 1. The only place that could care
8760+ * about links is readdir, and there's d_type there so even that
8761+ * doesn't matter.
8762+ */
8763+ return 1;
8764+}
8765+
8766+/* construct whiteout filename */
8767+char *alloc_whname(const char *name, int len)
8768+{
8769+ char *buf;
8770+
8771+ buf = kmalloc(len + UNIONFS_WHLEN + 1, GFP_KERNEL);
8772+ if (!buf)
8773+ return ERR_PTR(-ENOMEM);
8774+
8775+ strcpy(buf, UNIONFS_WHPFX);
8776+ strlcat(buf, name, len + UNIONFS_WHLEN + 1);
8777+
8778+ return buf;
8779+}
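Note on alloc_whname() above: a whiteout name is the original file name with a fixed prefix in front of it. The following user-space sketch builds such a name with explicit lengths. The prefix value ".wh." is an assumption about UNIONFS_WHPFX, make_whname() is a hypothetical name, and malloc() stands in for kmalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WH_PREFIX	".wh."	/* assumed value of UNIONFS_WHPFX */
#define WH_PREFIX_LEN	(sizeof(WH_PREFIX) - 1)

/*
 * Return a newly allocated "<prefix><name>" string, or NULL if the
 * allocation fails. Only the first 'len' bytes of 'name' are used, so
 * 'name' does not have to be NUL-terminated at that length.
 */
static char *make_whname(const char *name, size_t len)
{
	char *buf;

	assert(name);
	buf = malloc(WH_PREFIX_LEN + len + 1);
	if (!buf)
		return NULL;

	memcpy(buf, WH_PREFIX, WH_PREFIX_LEN);
	memcpy(buf + WH_PREFIX_LEN, name, len);
	buf[WH_PREFIX_LEN + len] = '\0';
	return buf;
}
```

The caller owns the returned buffer and must free() it, mirroring the kfree(name) calls in the kernel code.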
8780diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
8781new file mode 100644
8782index 0000000..80b3a73
8783--- /dev/null
8784+++ b/fs/unionfs/super.c
8785@@ -0,0 +1,1010 @@
8786+/*
8787+ * Copyright (c) 2003-2007 Erez Zadok
8788+ * Copyright (c) 2003-2006 Charles P. Wright
8789+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8790+ * Copyright (c) 2005-2006 Junjiro Okajima
8791+ * Copyright (c) 2005 Arun M. Krishnakumar
8792+ * Copyright (c) 2004-2006 David P. Quigley
8793+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8794+ * Copyright (c) 2003 Puja Gupta
8795+ * Copyright (c) 2003 Harikesavan Krishnan
8796+ * Copyright (c) 2003-2007 Stony Brook University
8797+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
8798+ *
8799+ * This program is free software; you can redistribute it and/or modify
8800+ * it under the terms of the GNU General Public License version 2 as
8801+ * published by the Free Software Foundation.
8802+ */
8803+
8804+#include "union.h"
8805+
8806+/*
8807+ * The inode cache is used with alloc_inode for both our inode info and the
8808+ * vfs inode.
8809+ */
8810+static struct kmem_cache *unionfs_inode_cachep;
8811+
8812+static void unionfs_read_inode(struct inode *inode)
8813+{
8814+ extern struct address_space_operations unionfs_aops;
8815+ int size;
8816+ struct unionfs_inode_info *info = UNIONFS_I(inode);
8817+
8818+ unionfs_read_lock(inode->i_sb);
8819+
8820+ memset(info, 0, offsetof(struct unionfs_inode_info, vfs_inode));
8821+ info->bstart = -1;
8822+ info->bend = -1;
8823+ atomic_set(&info->generation,
8824+ atomic_read(&UNIONFS_SB(inode->i_sb)->generation));
8825+ spin_lock_init(&info->rdlock);
8826+ info->rdcount = 1;
8827+ info->hashsize = -1;
8828+ INIT_LIST_HEAD(&info->readdircache);
8829+
8830+ size = sbmax(inode->i_sb) * sizeof(struct inode *);
8831+ info->lower_inodes = kzalloc(size, GFP_KERNEL);
8832+ if (!info->lower_inodes) {
8833+ printk(KERN_ERR "unionfs: no kernel memory when allocating "
8834+ "lower-pointer array!\n");
8835+ BUG();
8836+ }
8837+
8838+ inode->i_version++;
8839+ inode->i_op = &unionfs_main_iops;
8840+ inode->i_fop = &unionfs_main_fops;
8841+
8842+ inode->i_mapping->a_ops = &unionfs_aops;
8843+
8844+ unionfs_read_unlock(inode->i_sb);
8845+}
8846+
8847+/*
8848+ * we now define delete_inode, because there are two VFS paths that may
8849+ * destroy an inode: one of them calls clear_inode before doing everything
8850+ * else that's needed, and the other is fine. This way we truncate the inode
8851+ * size (and its pages) and then clear our own inode, which will do an iput
8852+ * on our and the lower inode.
8853+ *
8854+ * No need to lock sb info's rwsem.
8855+ */
8856+static void unionfs_delete_inode(struct inode *inode)
8857+{
8858+ inode->i_size = 0; /* every f/s seems to do that */
8859+
8860+ if (inode->i_data.nrpages)
8861+ truncate_inode_pages(&inode->i_data, 0);
8862+
8863+ clear_inode(inode);
8864+}
8865+
8866+/*
8867+ * final actions when unmounting a file system
8868+ *
8869+ * No need to lock rwsem.
8870+ */
8871+static void unionfs_put_super(struct super_block *sb)
8872+{
8873+ int bindex, bstart, bend;
8874+ struct unionfs_sb_info *spd;
8875+ int leaks = 0;
8876+
8877+ spd = UNIONFS_SB(sb);
8878+ if (!spd)
8879+ return;
8880+
8881+ bstart = sbstart(sb);
8882+ bend = sbend(sb);
8883+
8884+ /* Make sure we have no leaks of branchget/branchput. */
8885+ for (bindex = bstart; bindex <= bend; bindex++)
8886+ if (branch_count(sb, bindex) != 0) {
8887+ printk("unionfs: branch %d has %d references left!\n",
8888+ bindex, branch_count(sb, bindex));
8889+ leaks = 1;
8890+ }
8891+ BUG_ON(leaks != 0);
8892+
8893+ kfree(spd->data);
8894+ kfree(spd);
8895+ sb->s_fs_info = NULL;
8896+}
8897+
8898+/*
8899+ * Since people use this to answer the "How big of a file can I write?"
8900+ * question, we report the size of the highest priority branch as the size of
8901+ * the union.
8902+ */
8903+static int unionfs_statfs(struct dentry *dentry, struct kstatfs *buf)
8904+{
8905+ int err = 0;
8906+ struct super_block *sb;
8907+ struct dentry *lower_dentry;
8908+
8909+ sb = dentry->d_sb;
8910+
8911+ unionfs_read_lock(sb);
8912+ unionfs_lock_dentry(dentry);
8913+
8914+ if (!__unionfs_d_revalidate_chain(dentry, NULL, false)) {
8915+ err = -ESTALE;
8916+ goto out;
8917+ }
8918+ unionfs_check_dentry(dentry);
8919+
8920+ lower_dentry = unionfs_lower_dentry(sb->s_root);
8921+ err = vfs_statfs(lower_dentry, buf);
8922+
8923+ /* set return buf to our f/s to avoid confusing user-level utils */
8924+ buf->f_type = UNIONFS_SUPER_MAGIC;
8925+ /*
8926+ * Our maximum file name length is shorter by a few bytes because every
8927+ * file name could potentially be whited-out.
8928+ *
8929+ * XXX: this restriction goes away with ODF.
8930+ */
8931+ buf->f_namelen -= UNIONFS_WHLEN;
8932+
8933+ /*
8934+ * reset two fields to avoid confusing user-land.
8935+ * XXX: is this still necessary?
8936+ */
8937+ memset(&buf->f_fsid, 0, sizeof(__kernel_fsid_t));
8938+ memset(&buf->f_spare, 0, sizeof(buf->f_spare));
8939+
8940+out:
8941+ unionfs_unlock_dentry(dentry);
8942+ unionfs_check_dentry(dentry);
8943+ unionfs_read_unlock(sb);
8944+ return err;
8945+}
8946+
8947+/* handle mode changing during remount */
8948+static noinline int do_remount_mode_option(char *optarg, int cur_branches,
8949+ struct unionfs_data *new_data,
8950+ struct path *new_lower_paths)
8951+{
8952+ int err = -EINVAL;
8953+ int perms, idx;
8954+ char *modename = strchr(optarg, '=');
8955+ struct nameidata nd;
8956+
8957+ /* by now, optarg contains the branch name */
8958+ if (!*optarg) {
8959+ printk("unionfs: no branch specified for mode change.\n");
8960+ goto out;
8961+ }
8962+ if (!modename) {
8963+ printk("unionfs: branch \"%s\" requires a mode.\n", optarg);
8964+ goto out;
8965+ }
8966+ *modename++ = '\0';
8967+ perms = __parse_branch_mode(modename);
8968+ if (perms == 0) {
8969+ printk("unionfs: invalid mode \"%s\" for \"%s\".\n",
8970+ modename, optarg);
8971+ goto out;
8972+ }
8973+
8974+ /*
8975+ * Find matching branch index. For now, this assumes that nothing
8976+ * has been mounted on top of this Unionfs stack. Once we have /odf
8977+ * and cache-coherency resolved, we'll address the branch-path
8978+ * uniqueness.
8979+ */
8980+ err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8981+ if (err) {
8982+ printk(KERN_WARNING "unionfs: error accessing "
8983+ "lower directory \"%s\" (error %d)\n",
8984+ optarg, err);
8985+ goto out;
8986+ }
8987+ for (idx=0; idx<cur_branches; idx++)
8988+ if (nd.mnt == new_lower_paths[idx].mnt &&
8989+ nd.dentry == new_lower_paths[idx].dentry)
8990+ break;
8991+ path_release(&nd); /* no longer needed */
8992+ if (idx == cur_branches) {
8993+ err = -ENOENT; /* err may have been reset above */
8994+ printk(KERN_WARNING "unionfs: branch \"%s\" "
8995+ "not found\n", optarg);
8996+ goto out;
8997+ }
8998+ /* check/change mode for existing branch */
8999+ /* we don't warn if perms==branchperms */
9000+ new_data[idx].branchperms = perms;
9001+ err = 0;
9002+out:
9003+ return err;
9004+}
9005+
9006+/* handle branch deletion during remount */
9007+static noinline int do_remount_del_option(char *optarg, int cur_branches,
9008+ struct unionfs_data *new_data,
9009+ struct path *new_lower_paths)
9010+{
9011+ int err = -EINVAL;
9012+ int idx;
9013+ struct nameidata nd;
9014+
9015+ /* optarg contains the branch name to delete */
9016+
9017+ /*
9018+ * Find matching branch index. For now, this assumes that nothing
9019+ * has been mounted on top of this Unionfs stack. Once we have /odf
9020+ * and cache-coherency resolved, we'll address the branch-path
9021+ * uniqueness.
9022+ */
9023+ err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
9024+ if (err) {
9025+ printk(KERN_WARNING "unionfs: error accessing "
9026+ "lower directory \"%s\" (error %d)\n",
9027+ optarg, err);
9028+ goto out;
9029+ }
9030+ for (idx=0; idx < cur_branches; idx++)
9031+ if (nd.mnt == new_lower_paths[idx].mnt &&
9032+ nd.dentry == new_lower_paths[idx].dentry)
9033+ break;
9034+ path_release(&nd); /* no longer needed */
9035+ if (idx == cur_branches) {
9036+ printk(KERN_WARNING "unionfs: branch \"%s\" "
9037+ "not found\n", optarg);
9038+ err = -ENOENT;
9039+ goto out;
9040+ }
9041+ /* check if there are any open files on the branch to be deleted */
9042+ if (atomic_read(&new_data[idx].open_files) > 0) {
9043+ err = -EBUSY;
9044+ goto out;
9045+ }
9046+
9047+ /*
9048+ * Now we have to delete the branch. First, release any handles it
9049+ * has. Then, move the remaining array indexes past "idx" in
9050+ * new_data and new_lower_paths one to the left. Finally, adjust
9051+ * cur_branches.
9052+ */
9053+ pathput(&new_lower_paths[idx]);
9054+
9055+ if (idx < cur_branches - 1) {
9056+ /* if idx==cur_branches-1, we delete last branch: easy */
9057+ memmove(&new_data[idx], &new_data[idx+1],
9058+ (cur_branches - 1 - idx) *
9059+ sizeof(struct unionfs_data));
9060+ memmove(&new_lower_paths[idx], &new_lower_paths[idx+1],
9061+ (cur_branches - 1 - idx) * sizeof(struct path));
9062+ }
9063+
9064+ err = 0;
9065+out:
9066+ return err;
9067+}
9068+
9069+/* handle branch insertion during remount */
9070+static noinline int do_remount_add_option(char *optarg, int cur_branches,
9071+ struct unionfs_data *new_data,
9072+ struct path *new_lower_paths,
9073+ int *high_branch_id)
9074+{
9075+ int err = -EINVAL;
9076+ int perms;
9077+ int idx = 0; /* default: insert at beginning */
9078+ char *new_branch , *modename = NULL;
9079+ struct nameidata nd;
9080+
9081+ /*
9082+ * optarg can be of several forms:
9083+ *
9084+ * /bar:/foo insert /foo before /bar
9085+ * /bar:/foo=ro insert /foo in ro mode before /bar
9086+ * /foo insert /foo in the beginning (prepend)
9087+ * :/foo insert /foo at the end (append)
9088+ */
9089+ if (*optarg == ':') { /* append? */
9090+ new_branch = optarg + 1; /* skip ':' */
9091+ idx = cur_branches;
9092+ goto found_insertion_point;
9093+ }
9094+ new_branch = strchr(optarg, ':');
9095+ if (!new_branch) { /* prepend? */
9096+ new_branch = optarg;
9097+ goto found_insertion_point;
9098+ }
9099+ *new_branch++ = '\0'; /* holds path+mode of new branch */
9100+
9101+ /*
9102+ * Find matching branch index. For now, this assumes that nothing
9103+ * has been mounted on top of this Unionfs stack. Once we have /odf
9104+ * and cache-coherency resolved, we'll address the branch-path
9105+ * uniqueness.
9106+ */
9107+ err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
9108+ if (err) {
9109+ printk(KERN_WARNING "unionfs: error accessing "
9110+ "lower directory \"%s\" (error %d)\n",
9111+ optarg, err);
9112+ goto out;
9113+ }
9114+ for (idx=0; idx < cur_branches; idx++)
9115+ if (nd.mnt == new_lower_paths[idx].mnt &&
9116+ nd.dentry == new_lower_paths[idx].dentry)
9117+ break;
9118+ path_release(&nd); /* no longer needed */
9119+ if (idx == cur_branches) {
9120+ printk(KERN_WARNING "unionfs: branch \"%s\" "
9121+ "not found\n", optarg);
9122+ err = -ENOENT;
9123+ goto out;
9124+ }
9125+
9126+ /*
9127+ * At this point idx will hold the index where the new branch should
9128+ * be inserted before.
9129+ */
9130+found_insertion_point:
9131+ /* find the mode for the new branch */
9132+ if (new_branch)
9133+ modename = strchr(new_branch, '=');
9134+ if (modename)
9135+ *modename++ = '\0';
9136+ perms = parse_branch_mode(modename);
9137+
9138+ if (!new_branch || !*new_branch) {
9139+ printk(KERN_WARNING "unionfs: null new branch\n");
9140+ err = -EINVAL;
9141+ goto out;
9142+ }
9143+ err = path_lookup(new_branch, LOOKUP_FOLLOW, &nd);
9144+ if (err) {
9145+ printk(KERN_WARNING "unionfs: error accessing "
9146+ "lower directory \"%s\" (error %d)\n",
9147+ new_branch, err);
9148+ goto out;
9149+ }
9150+ /*
9151+ * It's probably safe to call check_branch on the new branch. Note:
9152+ * we don't allow inserting branches which are themselves unionfs
9153+ * mounts (check_branch returns EINVAL in that case). This is
9154+ * because this code base doesn't support stacking unionfs: the ODF
9155+ * code base supports that correctly.
9156+ */
9157+ if ((err = check_branch(&nd))) {
9158+ printk(KERN_WARNING "unionfs: lower directory "
9159+ "\"%s\" is not a valid branch\n", new_branch);
9160+ path_release(&nd);
9161+ goto out;
9162+ }
9163+
9164+ /*
9165+ * Now we have to insert the new branch. But first, move the bits
9166+ * to make space for the new branch, if needed. Finally, adjust
9167+ * cur_branches.
9168+ * We don't release nd here; it's kept until umount/remount.
9169+ */
9170+ if (idx < cur_branches) {
9171+ /* if idx==cur_branches, we append: easy */
9172+ memmove(&new_data[idx+1], &new_data[idx],
9173+ (cur_branches - idx) * sizeof(struct unionfs_data));
9174+ memmove(&new_lower_paths[idx+1], &new_lower_paths[idx],
9175+ (cur_branches - idx) * sizeof(struct path));
9176+ }
9177+ new_lower_paths[idx].dentry = nd.dentry;
9178+ new_lower_paths[idx].mnt = nd.mnt;
9179+
9180+ new_data[idx].sb = nd.dentry->d_sb;
9181+ atomic_set(&new_data[idx].open_files, 0);
9182+ new_data[idx].branchperms = perms;
9183+ new_data[idx].branch_id = ++*high_branch_id; /* assign new branch ID */
9184+
9185+ err = 0;
9186+out:
9187+ return err;
9188+}
9189+
9190+
9191+/*
9192+ * Support branch management options on remount.
9193+ *
9194+ * See Documentation/filesystems/unionfs/ for details.
9195+ *
9196+ * @flags: numeric mount options
9197+ * @options: mount options string
9198+ *
9199+ * This function can rearrange a mounted union dynamically, adding and
9200+ * removing branches, including changing branch modes. Clearly this has to
9201+ * be done safely and atomically. Luckily, the VFS already calls this
9202+ * function with lock_super(sb) and lock_kernel() held, preventing
9203+ * concurrent mixing of new mounts, remounts, and unmounts. Moreover,
9204+ * do_remount_sb(), our caller function, already called shrink_dcache_sb(sb)
9205+ * to purge dentries/inodes from our superblock, and also called
9206+ * fsync_super(sb) to purge any dirty pages. So we're good.
9207+ *
9208+ * XXX: however, our remount code may also need to invalidate mapped pages
9209+ * so as to force them to be re-gotten from the (newly reconfigured) lower
9210+ * branches. This has to wait for proper mmap and cache coherency support
9211+ * in the VFS.
9212+ *
9213+ */
9214+static int unionfs_remount_fs(struct super_block *sb, int *flags,
9215+ char *options)
9216+{
9217+ int err = 0;
9218+ int i;
9219+ char *optionstmp, *tmp_to_free; /* kstrdup'ed copy of "options" */
9220+ char *optname;
9221+ int cur_branches = 0; /* no. of current branches */
9222+ int new_branches = 0; /* no. of branches actually left in the end */
9223+ int add_branches; /* est. no. of branches to add */
9224+ int del_branches; /* est. no. of branches to del */
9225+ int max_branches; /* max possible no. of branches */
9226+ struct unionfs_data *new_data = NULL, *tmp_data = NULL;
9227+ struct path *new_lower_paths = NULL, *tmp_lower_paths = NULL;
9228+ struct inode **new_lower_inodes = NULL;
9229+ int new_high_branch_id; /* new high branch ID */
9230+ int size; /* memory allocation size, temp var */
9231+ int old_ibstart, old_ibend;
9232+
9233+ unionfs_write_lock(sb);
9234+
9235+ /*
9236+ * The VFS will take care of "ro" and "rw" flags, and we can safely
9237+ * ignore MS_SILENT, but anything else left over is an error. So we
9238+ * need to check if any other flags may have been passed (none are
9239+ * allowed/supported as of now).
9240+ */
9241+ if ((*flags & ~(MS_RDONLY | MS_SILENT)) != 0) {
9242+ printk(KERN_WARNING
9243+ "unionfs: remount flags 0x%x unsupported\n", *flags);
9244+ err = -EINVAL;
9245+ goto out_error;
9246+ }
9247+
9248+ /*
9249+ * If 'options' is NULL, it's probably because the user just changed
9250+ * the union to a "ro" or "rw" and the VFS took care of it. So
9251+ * nothing to do and we're done.
9252+ */
9253+ if (!options || options[0] == '\0')
9254+ goto out_error;
9255+
9256+ /*
9257+ * Find out how many branches we will have in the end, counting
9258+ * "add" and "del" commands. Copy the "options" string because
9259+ * strsep modifies the string and we need it later.
9260+ */
9261+ optionstmp = tmp_to_free = kstrdup(options, GFP_KERNEL);
9262+ if (!optionstmp) {
9263+ err = -ENOMEM;
9264+ goto out_free;
9265+ }
9266+ new_branches = cur_branches = sbmax(sb); /* current no. branches */
9267+ add_branches = del_branches = 0;
9268+ new_high_branch_id = sbhbid(sb); /* save current high_branch_id */
9269+ while ((optname = strsep(&optionstmp, ",")) != NULL) {
9270+ char *optarg;
9271+
9272+ if (!optname || !*optname)
9273+ continue;
9274+
9275+ optarg = strchr(optname, '=');
9276+ if (optarg)
9277+ *optarg++ = '\0';
9278+
9279+ if (!strcmp("add", optname))
9280+ add_branches++;
9281+ else if (!strcmp("del", optname))
9282+ del_branches++;
9283+ }
9284+ kfree(tmp_to_free);
9285+ /* after all changes, will we have at least one branch left? */
9286+ if ((new_branches + add_branches - del_branches) < 1) {
9287+ printk(KERN_WARNING
9288+ "unionfs: no branches left after remount\n");
9289+ err = -EINVAL;
9290+ goto out_free;
9291+ }
9292+
9293+ /*
9294+ * Since we haven't actually parsed all the add/del options, nor
9295+ * have we checked them for errors, we don't know for sure how many
9296+ * branches we will have after all changes have taken place. In
9297+ * fact, the total number of branches left could be less than what
9298+ * we have now. So we need to allocate space for a temporary
9299+ * placeholder that is at least as large as the maximum number of
9300+ * branches we *could* have, which is the current number plus all
9301+ * the additions. Once we're done with these temp placeholders, we
9302+ * may have to re-allocate the final size, copy over from the temp,
9303+ * and then free the temps (done near the end of this function).
9304+ */
9305+ max_branches = cur_branches + add_branches;
9306+ /* allocate space for new per-branch data */
9307+ tmp_data = kcalloc(max_branches,
9308+ sizeof(struct unionfs_data), GFP_KERNEL);
9309+ if (!tmp_data) {
9310+ err = -ENOMEM;
9311+ goto out_free;
9312+ }
9313+ /* allocate space for new pointers to lower paths */
9314+ tmp_lower_paths = kcalloc(max_branches,
9315+ sizeof(struct path), GFP_KERNEL);
9316+ if (!tmp_lower_paths) {
9317+ err = -ENOMEM;
9318+ goto out_free;
9319+ }
9320+ /* copy current info into new placeholders, incrementing refcnts */
9321+ memcpy(tmp_data, UNIONFS_SB(sb)->data,
9322+ cur_branches * sizeof(struct unionfs_data));
9323+ memcpy(tmp_lower_paths, UNIONFS_D(sb->s_root)->lower_paths,
9324+ cur_branches * sizeof(struct path));
9325+ for (i=0; i<cur_branches; i++)
9326+ pathget(&tmp_lower_paths[i]); /* drop refs at end of fxn */
9327+
9328+ /*******************************************************************
9329+ * For each branch command, do path_lookup on the requested branch,
9330+ * and apply the change to a temp branch list. To handle errors, we
9331+ * already dup'ed the old arrays (above), and increased the refcnts
9332+ * on various f/s objects. So now we can do all the path_lookups
9333+ * and branch-management commands on the new arrays. If this fails
9334+ * midway, we free the tmp arrays and *put all objects. If we succeed,
9335+ * then we free old arrays and *put their objects, and then replace
9336+ * the arrays with the new tmp list (we may have to re-allocate the
9337+ * memory because the temp lists could have been larger than what we
9338+ * actually needed).
9339+ *******************************************************************/
9340+
9341+ while ((optname = strsep(&options, ",")) != NULL) {
9342+ char *optarg;
9343+
9344+ if (!optname || !*optname)
9345+ continue;
9346+ /*
9347+ * At this stage optname holds a comma-delimited option, but
9348+ * without the commas. Next, we need to break the string on
9349+ * the '=' symbol to separate CMD=ARG, where ARG itself can
9350+ * be KEY=VAL. For example, in mode=/foo=rw, CMD is "mode",
9351+ * KEY is "/foo", and VAL is "rw".
9352+ */
9353+ optarg = strchr(optname, '=');
9354+ if (optarg)
9355+ *optarg++ = '\0';
9356+ /* incgen remount option (instead of old ioctl) */
9357+ if (!strcmp("incgen", optname)) {
9358+ err = 0;
9359+ goto out_no_change;
9360+ }
9361+
9362+ /*
9363+ * All of our options take an argument now. (Insert ones
9364+ * that don't above this check.) So at this stage optname
9365+ * contains the CMD part and optarg contains the ARG part.
9366+ */
9367+ if (!optarg || !*optarg) {
9368+ printk("unionfs: all remount options require "
9369+ "an argument (%s).\n", optname);
9370+ err = -EINVAL;
9371+ goto out_release;
9372+ }
9373+
9374+ if (!strcmp("add", optname)) {
9375+ err = do_remount_add_option(optarg, new_branches,
9376+ tmp_data,
9377+ tmp_lower_paths,
9378+ &new_high_branch_id);
9379+ if (err)
9380+ goto out_release;
9381+ new_branches++;
9382+ if (new_branches > UNIONFS_MAX_BRANCHES) {
9383+ printk("unionfs: command exceeds "
9384+ "%d branches\n", UNIONFS_MAX_BRANCHES);
9385+ err = -E2BIG;
9386+ goto out_release;
9387+ }
9388+ continue;
9389+ }
9390+ if (!strcmp("del", optname)) {
9391+ err = do_remount_del_option(optarg, new_branches,
9392+ tmp_data,
9393+ tmp_lower_paths);
9394+ if (err)
9395+ goto out_release;
9396+ new_branches--;
9397+ continue;
9398+ }
9399+ if (!strcmp("mode", optname)) {
9400+ err = do_remount_mode_option(optarg, new_branches,
9401+ tmp_data,
9402+ tmp_lower_paths);
9403+ if (err)
9404+ goto out_release;
9405+ continue;
9406+ }
9407+
9408+ /*
9409+ * When you use "mount -o remount,ro", mount(8) will
9410+ * reportedly pass the original dirs= string from
9411+ * /proc/mounts. So for now, we have to ignore dirs= and
9412+ * not consider it an error, unless we want to allow users
9413+ * to pass dirs= in remount. Note that to allow the VFS to
9414+ * actually process the ro/rw remount options, we have to
9415+ * return 0 from this function.
9416+ */
9417+ if (!strcmp("dirs", optname)) {
9418+ printk(KERN_WARNING
9419+ "unionfs: remount ignoring option \"%s\".\n",
9420+ optname);
9421+ continue;
9422+ }
9423+
9424+ err = -EINVAL;
9425+ printk(KERN_WARNING
9426+ "unionfs: unrecognized option \"%s\"\n", optname);
9427+ goto out_release;
9428+ }
9429+
9430+out_no_change:
9431+
9432+ /******************************************************************
9433+ * WE'RE ALMOST DONE: check if leftmost branch might be read-only,
9434+ * see if we need to allocate a small-sized new vector, copy the
9435+ * vectors to their correct place, release the refcnt of the older
9436+ * ones, and return. Also handle invalidating any pages that will
9437+ * have to be re-read.
9438+ *******************************************************************/
9439+
9440+ if (!(tmp_data[0].branchperms & MAY_WRITE)) {
9441+ printk("unionfs: leftmost branch cannot be read-only "
9442+ "(use \"remount,ro\" to create a read-only union)\n");
9443+ err = -EINVAL;
9444+ goto out_release;
9445+ }
9446+
9447+ /* (re)allocate space for new per-branch data */
9448+ size = new_branches * sizeof(struct unionfs_data);
9449+ new_data = krealloc(tmp_data, size, GFP_KERNEL);
9450+ if (!new_data) {
9451+ err = -ENOMEM;
9452+ goto out_release;
9453+ }
9454+
9455+ /* (re)allocate space for new pointers to lower paths */
9456+ size = new_branches * sizeof(struct path);
9457+ new_lower_paths = krealloc(tmp_lower_paths, size, GFP_KERNEL);
9458+ if (!new_lower_paths) {
9459+ err = -ENOMEM;
9460+ goto out_release;
9461+ }
9462+
9463+ /* allocate space for new pointers to lower inodes */
9464+ new_lower_inodes = kcalloc(new_branches,
9465+ sizeof(struct inode *), GFP_KERNEL);
9466+ if (!new_lower_inodes) {
9467+ err = -ENOMEM;
9468+ goto out_release;
9469+ }
9470+
9471+ /*
9472+ * OK, just before we actually put the new set of branches in place,
9473+ * we need to ensure that our own f/s has no dirty objects left.
9474+ * Luckily, do_remount_sb() already calls shrink_dcache_sb(sb) and
9475+ * fsync_super(sb), taking care of dentries, inodes, and dirty
9476+ * pages. So all that's left is for us to invalidate any leftover
9477+ * (non-dirty) pages to ensure that they will be re-read from the
9478+ * new lower branches (and to support mmap).
9479+ */
9480+
9481+ /*
9482+ * Now we call drop_pagecache_sb() to invalidate all pages in this
9483+ * super. This function calls invalidate_inode_pages(mapping),
9484+ * which calls invalidate_mapping_pages(): the latter, however, will
9485+ * not invalidate pages which are dirty, locked, under writeback, or
9486+ * mapped into page tables. We shouldn't have to worry about dirty
9487+ * or under-writeback pages, because do_remount_sb() called
9488+ * fsync_super() which would not have returned until all dirty pages
9489+ * were flushed.
9490+ *
9491+ * But do we have to worry about locked pages? Is there any chance
9492+ * that in here we'll get locked pages?
9493+ *
9494+ * XXX: what about pages mapped into pagetables? Are these pages
9495+ * which user processes may have mmap(2)'ed? If so, then we need to
9496+ * invalidate those too, no? Maybe we'll have to write our own
9497+ * version of invalidate_mapping_pages() which also handles mapped
9498+ * pages.
9499+ *
9500+ * XXX: Alternatively, maybe we should call truncate_inode_pages(),
9501+ * which uses two passes over the pages list, and will truncate all
9502+ * pages.
9503+ */
9504+ drop_pagecache_sb(sb);
9505+
9506+ /* copy new vectors into their correct place */
9507+ tmp_data = UNIONFS_SB(sb)->data;
9508+ UNIONFS_SB(sb)->data = new_data;
9509+ new_data = NULL; /* so don't free good pointers below */
9510+ tmp_lower_paths = UNIONFS_D(sb->s_root)->lower_paths;
9511+ UNIONFS_D(sb->s_root)->lower_paths = new_lower_paths;
9512+ new_lower_paths = NULL; /* so don't free good pointers below */
9513+
9514+ /* update our unionfs_sb_info and root dentry index of last branch */
9515+ i = sbmax(sb); /* save no. of branches to release at end */
9516+ sbend(sb) = new_branches - 1;
9517+ set_dbend(sb->s_root, new_branches - 1);
9518+ old_ibstart = ibstart(sb->s_root->d_inode);
9519+ old_ibend = ibend(sb->s_root->d_inode);
9520+ ibend(sb->s_root->d_inode) = new_branches - 1;
9521+ UNIONFS_D(sb->s_root)->bcount = new_branches;
9522+ new_branches = i; /* no. of branches to release below */
9523+
9524+ /*
9525+ * Update lower inodes: 3 steps
9526+ * 1. grab ref on all new lower inodes
9527+ */
9528+ for (i=dbstart(sb->s_root); i<=dbend(sb->s_root); i++) {
9529+ struct dentry *lower_dentry =
9530+ unionfs_lower_dentry_idx(sb->s_root, i);
9531+ igrab(lower_dentry->d_inode);
9532+ new_lower_inodes[i] = lower_dentry->d_inode;
9533+ }
9534+ /* 2. release reference on all older lower inodes */
9535+ for (i=old_ibstart; i<=old_ibend; i++) {
9536+ iput(unionfs_lower_inode_idx(sb->s_root->d_inode, i));
9537+ unionfs_set_lower_inode_idx(sb->s_root->d_inode, i, NULL);
9538+ }
9539+ kfree(UNIONFS_I(sb->s_root->d_inode)->lower_inodes);
9540+ /* 3. update root dentry's inode to new lower_inodes array */
9541+ UNIONFS_I(sb->s_root->d_inode)->lower_inodes = new_lower_inodes;
9542+ new_lower_inodes = NULL;
9543+
9544+ /* maxbytes may have changed */
9545+ sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
9546+ /* update high branch ID */
9547+ sbhbid(sb) = new_high_branch_id;
9548+
9549+ /* update our sb->generation for revalidating objects */
9550+ i = atomic_inc_return(&UNIONFS_SB(sb)->generation);
9551+ atomic_set(&UNIONFS_D(sb->s_root)->generation, i);
9552+ atomic_set(&UNIONFS_I(sb->s_root->d_inode)->generation, i);
9553+ if (!(*flags & MS_SILENT))
9554+ printk("unionfs: new generation number %d\n", i);
9555+ /* finally, update the root dentry's times */
9556+ unionfs_copy_attr_times(sb->s_root->d_inode);
9557+ err = 0; /* reset to success */
9558+
9559+ /*
9560+ * The code above falls through to the next label, and releases the
9561+ * refcnts of the older ones (stored in tmp_*): if we fell through
9562+ * here, it means success. However, if we jump directly to this
9563+ * label from any error above, then an error occurred after we
9564+ * grabbed various refcnts, and so we have to release the
9565+ * temporarily constructed structures.
9566+ */
9567+out_release:
9568+ /* no need to cleanup/release anything in tmp_data */
9569+ if (tmp_lower_paths)
9570+ for (i=0; i<new_branches; i++)
9571+ pathput(&tmp_lower_paths[i]);
9572+out_free:
9573+ kfree(tmp_lower_paths);
9574+ kfree(tmp_data);
9575+ kfree(new_lower_paths);
9576+ kfree(new_data);
9577+ kfree(new_lower_inodes);
9578+out_error:
9579+ unionfs_write_unlock(sb);
9580+ unionfs_check_dentry(sb->s_root);
9581+ return err;
9582+}
9583+
9584+/*
9585+ * Called by iput() when the inode reference count reached zero
9586+ * and the inode is not hashed anywhere. Used to clear anything
9587+ * that needs to be, before the inode is completely destroyed and put
9588+ * on the inode free list.
9589+ *
9590+ * No need to lock sb info's rwsem.
9591+ */
9592+static void unionfs_clear_inode(struct inode *inode)
9593+{
9594+ int bindex, bstart, bend;
9595+ struct inode *lower_inode;
9596+ struct list_head *pos, *n;
9597+ struct unionfs_dir_state *rdstate;
9598+
9599+ list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9600+ rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9601+ list_del(&rdstate->cache);
9602+ free_rdstate(rdstate);
9603+ }
9604+
9605+ /*
9606+ * Decrement a reference to a lower_inode, which was incremented
9607+ * by our read_inode when it was created initially.
9608+ */
9609+ bstart = ibstart(inode);
9610+ bend = ibend(inode);
9611+ if (bstart >= 0) {
9612+ for (bindex = bstart; bindex <= bend; bindex++) {
9613+ lower_inode = unionfs_lower_inode_idx(inode, bindex);
9614+ if (!lower_inode)
9615+ continue;
9616+ iput(lower_inode);
9617+ }
9618+ }
9619+
9620+ kfree(UNIONFS_I(inode)->lower_inodes);
9621+ UNIONFS_I(inode)->lower_inodes = NULL;
9622+}
9623+
9624+static struct inode *unionfs_alloc_inode(struct super_block *sb)
9625+{
9626+ struct unionfs_inode_info *i;
9627+
9628+ i = kmem_cache_alloc(unionfs_inode_cachep, GFP_KERNEL);
9629+ if (!i)
9630+ return NULL;
9631+
9632+ /* memset everything up to the inode to 0 */
9633+ memset(i, 0, offsetof(struct unionfs_inode_info, vfs_inode));
9634+
9635+ i->vfs_inode.i_version = 1;
9636+ return &i->vfs_inode;
9637+}
9638+
9639+static void unionfs_destroy_inode(struct inode *inode)
9640+{
9641+ kmem_cache_free(unionfs_inode_cachep, UNIONFS_I(inode));
9642+}
9643+
9644+/* unionfs inode cache constructor */
9645+static void init_once(void *v, struct kmem_cache *cachep, unsigned long flags)
9646+{
9647+ struct unionfs_inode_info *i = v;
9648+
9649+ inode_init_once(&i->vfs_inode);
9650+}
9651+
9652+int unionfs_init_inode_cache(void)
9653+{
9654+ int err = 0;
9655+
9656+ unionfs_inode_cachep =
9657+ kmem_cache_create("unionfs_inode_cache",
9658+ sizeof(struct unionfs_inode_info), 0,
9659+ SLAB_RECLAIM_ACCOUNT, init_once, NULL);
9660+ if (!unionfs_inode_cachep)
9661+ err = -ENOMEM;
9662+ return err;
9663+}
9664+
9665+/* unionfs inode cache destructor */
9666+void unionfs_destroy_inode_cache(void)
9667+{
9668+ if (unionfs_inode_cachep)
9669+ kmem_cache_destroy(unionfs_inode_cachep);
9670+}
9671+
9672+/*
9673+ * Called when we have a dirty inode. Here we only throw out the
9674+ * parts of our readdir list that are too old.
9675+ *
9676+ * No need to grab sb info's rwsem.
9677+ */
9678+static int unionfs_write_inode(struct inode *inode, int sync)
9679+{
9680+ struct list_head *pos, *n;
9681+ struct unionfs_dir_state *rdstate;
9682+
9683+ spin_lock(&UNIONFS_I(inode)->rdlock);
9684+ list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9685+ rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9686+ /* We keep this list in LRU order. */
9687+ if ((rdstate->access + RDCACHE_JIFFIES) > jiffies)
9688+ break;
9689+ UNIONFS_I(inode)->rdcount--;
9690+ list_del(&rdstate->cache);
9691+ free_rdstate(rdstate);
9692+ }
9693+ spin_unlock(&UNIONFS_I(inode)->rdlock);
9694+
9695+ return 0;
9696+}
9697+
9698+/*
9699+ * Used only in nfs, to kill any pending RPC tasks, so that subsequent
9700+ * code can actually succeed and won't leave tasks that need handling.
9701+ */
9702+static void unionfs_umount_begin(struct vfsmount *mnt, int flags)
9703+{
9704+ struct super_block *sb, *lower_sb;
9705+ struct vfsmount *lower_mnt;
9706+ int bindex, bstart, bend;
9707+
9708+ if (!(flags & MNT_FORCE))
9709+ /*
9710+ * we are not being MNT_FORCE'd, therefore we should emulate
9711+ * old behavior
9712+ */
9713+ return;
9714+
9715+ sb = mnt->mnt_sb;
9716+
9717+ unionfs_read_lock(sb);
9718+
9719+ bstart = sbstart(sb);
9720+ bend = sbend(sb);
9721+ for (bindex = bstart; bindex <= bend; bindex++) {
9722+ lower_mnt = unionfs_lower_mnt_idx(sb->s_root, bindex);
9723+ lower_sb = unionfs_lower_super_idx(sb, bindex);
9724+
9725+ if (lower_mnt && lower_sb && lower_sb->s_op &&
9726+ lower_sb->s_op->umount_begin)
9727+ lower_sb->s_op->umount_begin(lower_mnt, flags);
9728+ }
9729+
9730+ unionfs_read_unlock(sb);
9731+}
9732+
9733+static int unionfs_show_options(struct seq_file *m, struct vfsmount *mnt)
9734+{
9735+ struct super_block *sb = mnt->mnt_sb;
9736+ int ret = 0;
9737+ char *tmp_page;
9738+ char *path;
9739+ int bindex, bstart, bend;
9740+ int perms;
9741+
9742+ unionfs_read_lock(sb);
9743+
9744+ unionfs_lock_dentry(sb->s_root);
9745+
9746+ tmp_page = (char*) __get_free_page(GFP_KERNEL);
9747+ if (!tmp_page) {
9748+ ret = -ENOMEM;
9749+ goto out;
9750+ }
9751+
9752+ bstart = sbstart(sb);
9753+ bend = sbend(sb);
9754+
9755+ seq_printf(m, ",dirs=");
9756+ for (bindex = bstart; bindex <= bend; bindex++) {
9757+ path = d_path(unionfs_lower_dentry_idx(sb->s_root, bindex),
9758+ unionfs_lower_mnt_idx(sb->s_root, bindex),
9759+ tmp_page, PAGE_SIZE);
9760+ if (IS_ERR(path)) {
9761+ ret = PTR_ERR(path);
9762+ goto out;
9763+ }
9764+
9765+ perms = branchperms(sb, bindex);
9766+
9767+ seq_printf(m, "%s=%s", path,
9768+ perms & MAY_WRITE ? "rw" : "ro");
9769+ if (bindex != bend)
9770+ seq_printf(m, ":");
9771+ }
9772+
9773+out:
9774+ free_page((unsigned long) tmp_page);
9775+
9776+ unionfs_unlock_dentry(sb->s_root);
9777+
9778+ unionfs_read_unlock(sb);
9779+
9780+ return ret;
9781+}
9782+
9783+struct super_operations unionfs_sops = {
9784+ .read_inode = unionfs_read_inode,
9785+ .delete_inode = unionfs_delete_inode,
9786+ .put_super = unionfs_put_super,
9787+ .statfs = unionfs_statfs,
9788+ .remount_fs = unionfs_remount_fs,
9789+ .clear_inode = unionfs_clear_inode,
9790+ .umount_begin = unionfs_umount_begin,
9791+ .show_options = unionfs_show_options,
9792+ .write_inode = unionfs_write_inode,
9793+ .alloc_inode = unionfs_alloc_inode,
9794+ .destroy_inode = unionfs_destroy_inode,
9795+};
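
The argument forms accepted by do_remount_add_option() above ("/bar:/foo", "/bar:/foo=ro", "/foo", ":/foo") can be tried out in userspace. What follows is only a minimal sketch of the same in-place string splitting, not code from this patch: the names parse_add_arg, struct add_request and enum add_where are hypothetical, and the path lookups, branch checks and array shuffling done by the kernel function are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; none of these exist in unionfs. */
enum add_where { ADD_PREPEND, ADD_APPEND, ADD_BEFORE };

struct add_request {
	enum add_where where;
	const char *anchor;	/* existing branch to insert before, or NULL */
	const char *branch;	/* path of the new branch */
	const char *mode;	/* text after '=', or NULL if none was given */
};

/*
 * Split "arg" in place, following the same order of checks as
 * do_remount_add_option(): leading ':' means append, an embedded ':'
 * means insert before the named branch, otherwise prepend.
 * Returns 0 on success, -1 if the new branch path is empty.
 */
static int parse_add_arg(char *arg, struct add_request *req)
{
	char *branch, *mode;

	assert(arg != NULL && req != NULL);
	req->anchor = NULL;

	if (*arg == ':') {			/* ":/foo" */
		req->where = ADD_APPEND;
		branch = arg + 1;
	} else if ((branch = strchr(arg, ':')) != NULL) {
		req->where = ADD_BEFORE;	/* "/bar:/foo[=mode]" */
		*branch++ = '\0';
		req->anchor = arg;
	} else {				/* "/foo" */
		req->where = ADD_PREPEND;
		branch = arg;
	}

	/* an optional "=mode" suffix follows the new branch path */
	mode = strchr(branch, '=');
	if (mode)
		*mode++ = '\0';

	if (*branch == '\0')
		return -1;

	req->branch = branch;
	req->mode = mode;
	return 0;
}
```

Because the splitting is destructive, a caller would pass a writable copy of the option string, much as unionfs_remount_fs() duplicates "options" with kstrdup() before its first strsep() pass.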
9796diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
9797new file mode 100644
9798index 0000000..54320b5
9799--- /dev/null
9800+++ b/fs/unionfs/union.h
9801@@ -0,0 +1,563 @@
9802+/*
9803+ * Copyright (c) 2003-2007 Erez Zadok
9804+ * Copyright (c) 2003-2006 Charles P. Wright
9805+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
9806+ * Copyright (c) 2005 Arun M. Krishnakumar
9807+ * Copyright (c) 2004-2006 David P. Quigley
9808+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
9809+ * Copyright (c) 2003 Puja Gupta
9810+ * Copyright (c) 2003 Harikesavan Krishnan
9811+ * Copyright (c) 2003-2007 Stony Brook University
9812+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
9813+ *
9814+ * This program is free software; you can redistribute it and/or modify
9815+ * it under the terms of the GNU General Public License version 2 as
9816+ * published by the Free Software Foundation.
9817+ */
9818+
9819+#ifndef _UNION_H_
9820+#define _UNION_H_
9821+
9822+#include <linux/dcache.h>
9823+#include <linux/file.h>
9824+#include <linux/list.h>
9825+#include <linux/fs.h>
9826+#include <linux/mm.h>
9827+#include <linux/module.h>
9828+#include <linux/mount.h>
9829+#include <linux/namei.h>
9830+#include <linux/page-flags.h>
9831+#include <linux/pagemap.h>
9832+#include <linux/poll.h>
9833+#include <linux/security.h>
9834+#include <linux/seq_file.h>
9835+#include <linux/slab.h>
9836+#include <linux/spinlock.h>
9837+#include <linux/smp_lock.h>
9838+#include <linux/statfs.h>
9839+#include <linux/string.h>
9840+#include <linux/vmalloc.h>
9841+#include <linux/writeback.h>
9842+#include <linux/buffer_head.h>
9843+#include <linux/xattr.h>
9844+#include <linux/fs_stack.h>
9845+#include <linux/magic.h>
9846+#include <linux/log2.h>
9847+
9848+#include <asm/mman.h>
9849+#include <asm/system.h>
9850+
9851+#include <linux/union_fs.h>
9852+
9853+/* the file system name */
9854+#define UNIONFS_NAME "unionfs"
9855+
9856+/* unionfs root inode number */
9857+#define UNIONFS_ROOT_INO 1
9858+
9859+/* number of times we try to get a unique temporary file name */
9860+#define GET_TMPNAM_MAX_RETRY 5
9861+
9862+/* maximum number of branches we support, to avoid memory blowup */
9863+#define UNIONFS_MAX_BRANCHES 128
9864+
9865+/* Operations vectors defined in specific files. */
9866+extern struct file_operations unionfs_main_fops;
9867+extern struct file_operations unionfs_dir_fops;
9868+extern struct inode_operations unionfs_main_iops;
9869+extern struct inode_operations unionfs_dir_iops;
9870+extern struct inode_operations unionfs_symlink_iops;
9871+extern struct super_operations unionfs_sops;
9872+extern struct dentry_operations unionfs_dops;
9873+
9874+/* How long should an entry be allowed to persist */
9875+#define RDCACHE_JIFFIES (5*HZ)
9876+
9877+/* file private data. */
9878+struct unionfs_file_info {
9879+ int bstart;
9880+ int bend;
9881+ atomic_t generation;
9882+
9883+ struct unionfs_dir_state *rdstate;
9884+ struct file **lower_files;
9885+ int *saved_branch_ids; /* IDs of branches when file was opened */
9886+};
9887+
9888+/* unionfs inode data in memory */
9889+struct unionfs_inode_info {
9890+ int bstart;
9891+ int bend;
9892+ atomic_t generation;
9893+ int stale;
9894+ /* Stuff for readdir over NFS. */
9895+ spinlock_t rdlock;
9896+ struct list_head readdircache;
9897+ int rdcount;
9898+ int hashsize;
9899+ int cookie;
9900+
9901+ /* The lower inodes */
9902+ struct inode **lower_inodes;
9903+ /* to keep track of reads/writes for unlinks before closes */
9904+ atomic_t totalopens;
9905+
9906+ struct inode vfs_inode;
9907+};
9908+
9909+/* unionfs dentry data in memory */
9910+struct unionfs_dentry_info {
9911+ /*
9912+ * This mutex is used to lock the dentry as soon as we get into a
9913+ * unionfs function from the VFS. Our lock ordering is that children
9914+ * go before their parents.
9915+ */
9916+ struct mutex lock;
9917+ int bstart;
9918+ int bend;
9919+ int bopaque;
9920+ int bcount;
9921+ atomic_t generation;
9922+ struct path *lower_paths;
9923+};
9924+
9925+/* These are the pointers to our various objects. */
9926+struct unionfs_data {
9927+ struct super_block *sb;
9928+ atomic_t open_files; /* number of open files on branch */
9929+ int branchperms;
9930+ int branch_id; /* unique branch ID at re/mount time */
9931+};
9932+
9933+/* unionfs super-block data in memory */
9934+struct unionfs_sb_info {
9935+ int bend;
9936+
9937+ atomic_t generation;
9938+
9939+ /*
9940+ * This rwsem is used to make sure that a branch management
9941+ * operation...
9942+ * 1) will not begin before all currently in-flight operations
9943+ * complete
9944+ * 2) any new operations do not execute until the currently
9945+ * running branch management operation completes
9946+ */
9947+#ifdef CONFIG_PREEMPT_RT
9948+ struct compat_rw_semaphore rwsem;
9949+#else /* not CONFIG_PREEMPT_RT */
9950+ struct rw_semaphore rwsem;
9951+#endif /* not CONFIG_PREEMPT_RT */
9952+ int high_branch_id; /* last unique branch ID given */
9953+ struct unionfs_data *data;
9954+};
9955+
9956+/*
9957+ * structure for making the linked list of entries by readdir on left branch
9958+ * to compare with entries on right branch
9959+ */
9960+struct filldir_node {
9961+ struct list_head file_list; /* list for directory entries */
9962+ char *name; /* name entry */
9963+ int hash; /* name hash */
9964+ int namelen; /* name len since name is not 0 terminated */
9965+
9966+ /*
9967+ * we can check for duplicate whiteouts and files in the same branch
9968+ * in order to return -EIO.
9969+ */
9970+ int bindex;
9971+
9972+ /* is this a whiteout entry? */
9973+ int whiteout;
9974+
9975+ /* Inline name, so we don't need to separately kmalloc small ones */
9976+ char iname[DNAME_INLINE_LEN_MIN];
9977+};
9978+
9979+/* Directory hash table. */
9980+struct unionfs_dir_state {
9981+ unsigned int cookie; /* the cookie, based off of rdversion */
9982+ unsigned int offset; /* The entry we have returned. */
9983+ int bindex;
9984+ loff_t dirpos; /* offset within the lower level directory */
9985+ int size; /* How big is the hash table? */
9986+ int hashentries; /* How many entries have been inserted? */
9987+ unsigned long access;
9988+
9989+ /* This cache list is used when the inode keeps us around. */
9990+ struct list_head cache;
9991+ struct list_head list[0];
9992+};
9993+
9994+/* externs needed for fanout.h or sioq.h */
9995+extern int unionfs_get_nlinks(const struct inode *inode);
9996+
9997+/* include miscellaneous macros */
9998+#include "fanout.h"
9999+#include "sioq.h"
10000+
10001+/* externs for cache creation/deletion routines */
10002+extern void unionfs_destroy_filldir_cache(void);
10003+extern int unionfs_init_filldir_cache(void);
10004+extern int unionfs_init_inode_cache(void);
10005+extern void unionfs_destroy_inode_cache(void);
10006+extern int unionfs_init_dentry_cache(void);
10007+extern void unionfs_destroy_dentry_cache(void);
10008+
10009+/* Initialize and free readdir-specific state. */
10010+extern int init_rdstate(struct file *file);
10011+extern struct unionfs_dir_state *alloc_rdstate(struct inode *inode,
10012+ int bindex);
10013+extern struct unionfs_dir_state *find_rdstate(struct inode *inode,
10014+ loff_t fpos);
10015+extern void free_rdstate(struct unionfs_dir_state *state);
10016+extern int add_filldir_node(struct unionfs_dir_state *rdstate,
10017+ const char *name, int namelen, int bindex,
10018+ int whiteout);
10019+extern struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
10020+ const char *name, int namelen);
10021+
10022+extern struct dentry **alloc_new_dentries(int objs);
10023+extern struct unionfs_data *alloc_new_data(int objs);
10024+
10025+/* We can only use 32-bits of offset for rdstate --- blech! */
10026+#define DIREOF (0xfffff)
10027+#define RDOFFBITS 20 /* This is the number of bits in DIREOF. */
10028+#define MAXRDCOOKIE (0xfff)
10029+/* Turn an rdstate into an offset. */
10030+static inline off_t rdstate2offset(struct unionfs_dir_state *buf)
10031+{
10032+ off_t tmp;
10033+
10034+ tmp = ((buf->cookie & MAXRDCOOKIE) << RDOFFBITS)
10035+ | (buf->offset & DIREOF);
10036+ return tmp;
10037+}
10038+
10039+#define unionfs_read_lock(sb) down_read(&UNIONFS_SB(sb)->rwsem)
10040+#define unionfs_read_unlock(sb) up_read(&UNIONFS_SB(sb)->rwsem)
10041+#define unionfs_write_lock(sb) down_write(&UNIONFS_SB(sb)->rwsem)
10042+#define unionfs_write_unlock(sb) up_write(&UNIONFS_SB(sb)->rwsem)
10043+
10044+static inline void unionfs_double_lock_dentry(struct dentry *d1,
10045+ struct dentry *d2)
10046+{
10047+ if (d2 < d1) {
10048+ struct dentry *tmp = d1;
10049+ d1 = d2;
10050+ d2 = tmp;
10051+ }
10052+ unionfs_lock_dentry(d1);
10053+ unionfs_lock_dentry(d2);
10054+}
10055+
10056+extern int new_dentry_private_data(struct dentry *dentry);
10057+extern void free_dentry_private_data(struct dentry *dentry);
10058+extern void update_bstart(struct dentry *dentry);
10059+
10060+/*
10061+ * EXTERNALS:
10062+ */
10063+
10064+/* replicates the directory structure up to given dentry in given branch */
10065+extern struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
10066+ const char *name, int bindex);
10067+extern int make_dir_opaque(struct dentry *dir, int bindex);
10068+
10069+/* partial lookup */
10070+extern int unionfs_partial_lookup(struct dentry *dentry);
10071+
10072+/*
10073+ * Pass a unionfs dentry and an index and it will try to create a whiteout
10074+ * in branch 'index'.
10075+ *
10076+ * On error, it will proceed to the next branch to the left (lower index).
10077+ */
10078+extern int create_whiteout(struct dentry *dentry, int start);
10079+/* copies a file from dbstart to newbindex branch */
10080+extern int copyup_file(struct inode *dir, struct file *file, int bstart,
10081+ int newbindex, loff_t size);
10082+extern int copyup_named_file(struct inode *dir, struct file *file,
10083+ char *name, int bstart, int new_bindex,
10084+ loff_t len);
10085+/* copies a dentry from dbstart to newbindex branch */
10086+extern int copyup_dentry(struct inode *dir, struct dentry *dentry,
10087+ int bstart, int new_bindex, const char *name,
10088+ int namelen, struct file **copyup_file, loff_t len);
10089+/* helper functions for post-copyup actions */
10090+extern void unionfs_postcopyup_setmnt(struct dentry *dentry);
10091+extern void unionfs_postcopyup_release(struct dentry *dentry);
10092+
10093+extern int remove_whiteouts(struct dentry *dentry,
10094+ struct dentry *lower_dentry, int bindex);
10095+
10096+extern int do_delete_whiteouts(struct dentry *dentry, int bindex,
10097+ struct unionfs_dir_state *namelist);
10098+
10099+/* Is this directory empty: 0 if it is empty, -ENOTEMPTY if not. */
10100+extern int check_empty(struct dentry *dentry,
10101+ struct unionfs_dir_state **namelist);
10102+/* Delete whiteouts from this directory in branch bindex. */
10103+extern int delete_whiteouts(struct dentry *dentry, int bindex,
10104+ struct unionfs_dir_state *namelist);
10105+
10106+/* Re-lookup a lower dentry. */
10107+extern int unionfs_refresh_lower_dentry(struct dentry *dentry, int bindex);
10108+
10109+extern void unionfs_reinterpose(struct dentry *this_dentry);
10110+extern struct super_block *unionfs_duplicate_super(struct super_block *sb);
10111+
10112+/* Locking functions. */
10113+extern int unionfs_setlk(struct file *file, int cmd, struct file_lock *fl);
10114+extern int unionfs_getlk(struct file *file, struct file_lock *fl);
10115+
10116+/* Common file operations. */
10117+extern int unionfs_file_revalidate(struct file *file, bool willwrite);
10118+extern int unionfs_open(struct inode *inode, struct file *file);
10119+extern int unionfs_file_release(struct inode *inode, struct file *file);
10120+extern int unionfs_flush(struct file *file, fl_owner_t id);
10121+extern long unionfs_ioctl(struct file *file, unsigned int cmd,
10122+ unsigned long arg);
10123+extern int unionfs_fsync(struct file *file, struct dentry *dentry,
10124+ int datasync);
10125+extern int unionfs_fasync(int fd, struct file *file, int flag);
10126+
10127+/* Inode operations */
10128+extern int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
10129+ struct inode *new_dir, struct dentry *new_dentry);
10130+extern int unionfs_unlink(struct inode *dir, struct dentry *dentry);
10131+extern int unionfs_rmdir(struct inode *dir, struct dentry *dentry);
10132+
10133+extern bool __unionfs_d_revalidate_chain(struct dentry *dentry,
10134+ struct nameidata *nd, bool willwrite);
10135+extern bool is_newer_lower(const struct dentry *dentry);
10136+
10137+/* The values for unionfs_interpose's flag. */
10138+#define INTERPOSE_DEFAULT 0
10139+#define INTERPOSE_LOOKUP 1
10140+#define INTERPOSE_REVAL 2
10141+#define INTERPOSE_REVAL_NEG 3
10142+#define INTERPOSE_PARTIAL 4
10143+
10144+extern struct dentry *unionfs_interpose(struct dentry *this_dentry,
10145+ struct super_block *sb, int flag);
10146+
10147+#ifdef CONFIG_UNION_FS_XATTR
10148+/* Extended attribute functions. */
10149+extern void *unionfs_xattr_alloc(size_t size, size_t limit);
10150+static inline void unionfs_xattr_kfree(const void *p)
10151+{
10152+ kfree(p);
10153+}
10154+extern ssize_t unionfs_getxattr(struct dentry *dentry, const char *name,
10155+ void *value, size_t size);
10156+extern int unionfs_removexattr(struct dentry *dentry, const char *name);
10157+extern ssize_t unionfs_listxattr(struct dentry *dentry, char *list,
10158+ size_t size);
10159+extern int unionfs_setxattr(struct dentry *dentry, const char *name,
10160+ const void *value, size_t size, int flags);
10161+#endif /* CONFIG_UNION_FS_XATTR */
10162+
10163+/* The root directory is unhashed, but isn't deleted. */
10164+static inline int d_deleted(struct dentry *d)
10165+{
10166+ return d_unhashed(d) && (d != d->d_sb->s_root);
10167+}
10168+
10169+struct dentry *unionfs_lookup_backend(struct dentry *dentry,
10170+ struct nameidata *nd, int lookupmode);
10171+
10172+/* unionfs_permission, check if we should bypass error to facilitate copyup */
10173+#define IS_COPYUP_ERR(err) ((err) == -EROFS)
10174+
10175+/* unionfs_open, check if we need to copyup the file */
10176+#define OPEN_WRITE_FLAGS (O_WRONLY | O_RDWR | O_APPEND)
10177+#define IS_WRITE_FLAG(flag) ((flag) & OPEN_WRITE_FLAGS)
10178+
10179+static inline int branchperms(const struct super_block *sb, int index)
10180+{
10181+ BUG_ON(index < 0);
10182+ return UNIONFS_SB(sb)->data[index].branchperms;
10183+}
10184+
10185+static inline int set_branchperms(struct super_block *sb, int index, int perms)
10186+{
10187+ BUG_ON(index < 0);
10188+ UNIONFS_SB(sb)->data[index].branchperms = perms;
10189+ return perms;
10190+}
10191+
10192+/* Is the branch at this index read-only, per unionfs branch permissions? */
10193+static inline int is_robranch_super(const struct super_block *sb, int index)
10194+{
10195+ int ret;
10196+
10197+ ret = (!(branchperms(sb, index) & MAY_WRITE)) ? -EROFS : 0;
10198+ return ret;
10199+}
10200+
10201+/* Is this branch read-only, per branch permissions or the lower superblock? */
10202+static inline int is_robranch_idx(const struct dentry *dentry, int index)
10203+{
10204+ struct super_block *lower_sb;
10205+
10206+ BUG_ON(index < 0);
10207+
10208+ if (!(branchperms(dentry->d_sb, index) & MAY_WRITE))
10209+ return -EROFS;
10210+
10211+ lower_sb = unionfs_lower_super_idx(dentry->d_sb, index);
10212+ BUG_ON(lower_sb == NULL);
10213+ /*
10214+ * test sb flags directly, not IS_RDONLY(lower_inode) because the
10215+ * lower_dentry could be a negative.
10216+ */
10217+ if (lower_sb->s_flags & MS_RDONLY)
10218+ return -EROFS;
10219+
10220+ return 0;
10221+}
10222+
10223+static inline int is_robranch(const struct dentry *dentry)
10224+{
10225+ int index;
10226+
10227+ index = UNIONFS_D(dentry)->bstart;
10228+ BUG_ON(index < 0);
10229+
10230+ return is_robranch_idx(dentry, index);
10231+}
10232+
10233+/* What do we use for whiteouts. */
10234+#define UNIONFS_WHPFX ".wh."
10235+#define UNIONFS_WHLEN 4
10236+/*
10237+ * If a directory contains this file, then it is opaque. We start with the
10238+ * .wh. flag so that it is blocked by lookup.
10239+ */
10240+#define UNIONFS_DIR_OPAQUE_NAME "__dir_opaque"
10241+#define UNIONFS_DIR_OPAQUE UNIONFS_WHPFX UNIONFS_DIR_OPAQUE_NAME
10242+
10243+/*
10244+ * EXTERNALS:
10245+ */
10246+extern char *alloc_whname(const char *name, int len);
10247+extern int check_branch(struct nameidata *nd);
10248+extern int __parse_branch_mode(const char *name);
10249+extern int parse_branch_mode(const char *name);
10250+
10251+/*
10252+ * These two functions are here because it is kind of daft to copy and paste
10253+ * the contents of the two functions to 32+ places in unionfs
10254+ */
10255+static inline struct dentry *lock_parent(struct dentry *dentry)
10256+{
10257+ struct dentry *dir = dget(dentry->d_parent);
10258+
10259+ mutex_lock(&dir->d_inode->i_mutex);
10260+ return dir;
10261+}
10262+
10263+static inline void unlock_dir(struct dentry *dir)
10264+{
10265+ mutex_unlock(&dir->d_inode->i_mutex);
10266+ dput(dir);
10267+}
10268+
10269+static inline struct vfsmount *unionfs_mntget(struct dentry *dentry,
10270+ int bindex)
10271+{
10272+ struct vfsmount *mnt;
10273+
10274+ BUG_ON(!dentry || bindex < 0);
10275+
10276+ mnt = mntget(unionfs_lower_mnt_idx(dentry, bindex));
10277+#ifdef CONFIG_UNION_FS_DEBUG
10278+ if (!mnt)
10279+ printk(KERN_DEBUG "unionfs_mntget: mnt=%p bindex=%d\n",
10280+ mnt, bindex);
10281+#endif /* CONFIG_UNION_FS_DEBUG */
10282+
10283+ return mnt;
10284+}
10285+
10286+static inline void unionfs_mntput(struct dentry *dentry, int bindex)
10287+{
10288+ struct vfsmount *mnt;
10289+
10290+ if (!dentry && bindex < 0)
10291+ return;
10292+ BUG_ON(!dentry || bindex < 0);
10293+
10294+ mnt = unionfs_lower_mnt_idx(dentry, bindex);
10295+#ifdef CONFIG_UNION_FS_DEBUG
10296+ /*
10297+ * Directories can have NULL lower objects in between start/end, but
10298+ * NOT if at the start/end range. We cannot verify that this dentry
10299+ * is a type=DIR, because it may already be a negative dentry. But
10300+ * if dbstart is greater than dbend, we know that this couldn't have
10301+ * been a regular file: it had to have been a directory.
10302+ */
10303+ if (!mnt && !(bindex > dbstart(dentry) && bindex < dbend(dentry)))
10304+ printk(KERN_WARNING
10305+ "unionfs_mntput: mnt=%p bindex=%d\n",
10306+ mnt, bindex);
10307+#endif /* CONFIG_UNION_FS_DEBUG */
10308+ mntput(mnt);
10309+}
10310+
10311+#ifdef CONFIG_UNION_FS_DEBUG
10312+
10313+#define dprintk printk
10314+
10315+/* useful for tracking code reachability */
10316+#define UDBG printk("DBG:%s:%s:%d\n",__FILE__,__FUNCTION__,__LINE__)
10317+
10318+#define unionfs_check_inode(i) __unionfs_check_inode((i), \
10319+ __FILE__,__FUNCTION__,__LINE__)
10320+#define unionfs_check_dentry(d) __unionfs_check_dentry((d), \
10321+ __FILE__,__FUNCTION__,__LINE__)
10322+#define unionfs_check_file(f) __unionfs_check_file((f), \
10323+ __FILE__,__FUNCTION__,__LINE__)
10324+#define show_branch_counts(sb) __show_branch_counts((sb), \
10325+ __FILE__,__FUNCTION__,__LINE__)
10326+#define show_inode_times(i) __show_inode_times((i), \
10327+ __FILE__,__FUNCTION__,__LINE__)
10328+#define show_dinode_times(d) __show_dinode_times((d), \
10329+ __FILE__,__FUNCTION__,__LINE__)
10330+#define show_inode_counts(i) __show_inode_counts((i), \
10331+ __FILE__,__FUNCTION__,__LINE__)
10332+
10333+extern void __unionfs_check_inode(const struct inode *inode, const char *fname,
10334+ const char *fxn, int line);
10335+extern void __unionfs_check_dentry(const struct dentry *dentry,
10336+ const char *fname, const char *fxn,
10337+ int line);
10338+extern void __unionfs_check_file(const struct file *file,
10339+ const char *fname, const char *fxn, int line);
10340+extern void __show_branch_counts(const struct super_block *sb,
10341+ const char *file, const char *fxn, int line);
10342+extern void __show_inode_times(const struct inode *inode,
10343+ const char *file, const char *fxn, int line);
10344+extern void __show_dinode_times(const struct dentry *dentry,
10345+ const char *file, const char *fxn, int line);
10346+extern void __show_inode_counts(const struct inode *inode,
10347+ const char *file, const char *fxn, int line);
10348+
10349+#else /* not CONFIG_UNION_FS_DEBUG */
10350+
10351+#define dprintk(x...) do { ; } while (0)
10352+
10353+/* we leave useful hooks for these check functions throughout the code */
10354+#define unionfs_check_inode(i) do { } while(0)
10355+#define unionfs_check_dentry(d) do { } while(0)
10356+#define unionfs_check_file(f) do { } while(0)
10357+#define show_branch_counts(sb) do { } while(0)
10358+#define show_inode_times(i) do { } while(0)
10359+#define show_dinode_times(d) do { } while(0)
10360+#define show_inode_counts(i) do { } while(0)
10361+
10362+#endif /* not CONFIG_UNION_FS_DEBUG */
10363+
10364+#endif /* not _UNION_H_ */
10365diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
10366new file mode 100644
10367index 0000000..3924f7f
10368--- /dev/null
10369+++ b/fs/unionfs/unlink.c
10370@@ -0,0 +1,192 @@
10371+/*
10372+ * Copyright (c) 2003-2007 Erez Zadok
10373+ * Copyright (c) 2003-2006 Charles P. Wright
10374+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10375+ * Copyright (c) 2005-2006 Junjiro Okajima
10376+ * Copyright (c) 2005 Arun M. Krishnakumar
10377+ * Copyright (c) 2004-2006 David P. Quigley
10378+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10379+ * Copyright (c) 2003 Puja Gupta
10380+ * Copyright (c) 2003 Harikesavan Krishnan
10381+ * Copyright (c) 2003-2007 Stony Brook University
10382+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
10383+ *
10384+ * This program is free software; you can redistribute it and/or modify
10385+ * it under the terms of the GNU General Public License version 2 as
10386+ * published by the Free Software Foundation.
10387+ */
10388+
10389+#include "union.h"
10390+
10391+/* unlink a file by creating a whiteout */
10392+static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
10393+{
10394+ struct dentry *lower_dentry;
10395+ struct dentry *lower_dir_dentry;
10396+ int bindex;
10397+ int err = 0;
10398+
10399+ if ((err = unionfs_partial_lookup(dentry)))
10400+ goto out;
10401+
10402+ bindex = dbstart(dentry);
10403+
10404+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10405+ if (!lower_dentry)
10406+ goto out;
10407+
10408+ lower_dir_dentry = lock_parent(lower_dentry);
10409+
10410+ /* avoid destroying the lower inode if the file is in use */
10411+ dget(lower_dentry);
10412+ if (!(err = is_robranch_super(dentry->d_sb, bindex)))
10413+ err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
10414+ /* if vfs_unlink succeeded, update our inode's times */
10415+ if (!err)
10416+ unionfs_copy_attr_times(dentry->d_inode);
10417+ dput(lower_dentry);
10418+ fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10419+ unlock_dir(lower_dir_dentry);
10420+
10421+ if (err && !IS_COPYUP_ERR(err))
10422+ goto out;
10423+
10424+ if (err) {
10425+ if (dbstart(dentry) == 0)
10426+ goto out;
10427+ err = create_whiteout(dentry, dbstart(dentry) - 1);
10428+ } else if (dbopaque(dentry) != -1)
10429+ /* There is a lower-priority file with the same name. */
10430+ err = create_whiteout(dentry, dbopaque(dentry));
10431+ else
10432+ err = create_whiteout(dentry, dbstart(dentry));
10433+
10434+out:
10435+ if (!err)
10436+ dentry->d_inode->i_nlink--;
10437+
10438+ /* We don't want to leave negative leftover dentries for revalidate. */
10439+ if (!err && (dbopaque(dentry) != -1))
10440+ update_bstart(dentry);
10441+
10442+ return err;
10443+}
10444+
10445+int unionfs_unlink(struct inode *dir, struct dentry *dentry)
10446+{
10447+ int err = 0;
10448+
10449+ unionfs_read_lock(dentry->d_sb);
10450+ unionfs_lock_dentry(dentry);
10451+
10452+ if (!__unionfs_d_revalidate_chain(dentry, NULL, false)) {
10453+ err = -ESTALE;
10454+ goto out;
10455+ }
10456+ unionfs_check_dentry(dentry);
10457+
10458+ err = unionfs_unlink_whiteout(dir, dentry);
10459+ /* call d_drop so the system "forgets" about us */
10460+ if (!err) {
10461+ if (!S_ISDIR(dentry->d_inode->i_mode))
10462+ unionfs_postcopyup_release(dentry);
10463+ d_drop(dentry);
10464+ /*
10465+ * if unlink/whiteout succeeded, parent dir mtime has
10466+ * changed
10467+ */
10468+ unionfs_copy_attr_times(dir);
10469+ }
10470+
10471+out:
10472+ if (!err) {
10473+ unionfs_check_dentry(dentry);
10474+ unionfs_check_inode(dir);
10475+ }
10476+ unionfs_unlock_dentry(dentry);
10477+ unionfs_read_unlock(dentry->d_sb);
10478+ return err;
10479+}
10480+
10481+static int unionfs_rmdir_first(struct inode *dir, struct dentry *dentry,
10482+ struct unionfs_dir_state *namelist)
10483+{
10484+ int err;
10485+ struct dentry *lower_dentry;
10486+ struct dentry *lower_dir_dentry = NULL;
10487+
10488+ /* Here we need to remove whiteout entries. */
10489+ err = delete_whiteouts(dentry, dbstart(dentry), namelist);
10490+ if (err)
10491+ goto out;
10492+
10493+ lower_dentry = unionfs_lower_dentry(dentry);
10494+
10495+ lower_dir_dentry = lock_parent(lower_dentry);
10496+
10497+ /* avoid destroying the lower inode if the file is in use */
10498+ dget(lower_dentry);
10499+ if (!(err = is_robranch(dentry)))
10500+ err = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry);
10501+ dput(lower_dentry);
10502+
10503+ fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10504+ /* propagate number of hard-links */
10505+ dentry->d_inode->i_nlink = unionfs_get_nlinks(dentry->d_inode);
10506+
10507+out:
10508+ if (lower_dir_dentry)
10509+ unlock_dir(lower_dir_dentry);
10510+ return err;
10511+}
10512+
10513+int unionfs_rmdir(struct inode *dir, struct dentry *dentry)
10514+{
10515+ int err = 0;
10516+ struct unionfs_dir_state *namelist = NULL;
10517+
10518+ unionfs_read_lock(dentry->d_sb);
10519+ unionfs_lock_dentry(dentry);
10520+
10521+ if (!__unionfs_d_revalidate_chain(dentry, NULL, false)) {
10522+ err = -ESTALE;
10523+ goto out;
10524+ }
10525+ unionfs_check_dentry(dentry);
10526+
10527+ /* check if this unionfs directory is empty or not */
10528+ err = check_empty(dentry, &namelist);
10529+ if (err)
10530+ goto out;
10531+
10532+ err = unionfs_rmdir_first(dir, dentry, namelist);
10533+ /* create whiteout */
10534+ if (!err)
10535+ err = create_whiteout(dentry, dbstart(dentry));
10536+ else {
10537+ int new_err;
10538+
10539+ if (dbstart(dentry) == 0)
10540+ goto out;
10541+
10542+ /* exit if the error returned was NOT -EROFS */
10543+ if (!IS_COPYUP_ERR(err))
10544+ goto out;
10545+
10546+ new_err = create_whiteout(dentry, dbstart(dentry) - 1);
10547+ if (new_err != -EEXIST)
10548+ err = new_err;
10549+ }
10550+
10551+out:
10552+ /* call d_drop so the system "forgets" about us */
10553+ if (!err)
10554+ d_drop(dentry);
10555+
10556+ if (namelist)
10557+ free_rdstate(namelist);
10558+
10559+ unionfs_unlock_dentry(dentry);
10560+ unionfs_read_unlock(dentry->d_sb);
10561+ return err;
10562+}
10563diff --git a/fs/unionfs/xattr.c b/fs/unionfs/xattr.c
10564new file mode 100644
10565index 0000000..7f77d7d
10566--- /dev/null
10567+++ b/fs/unionfs/xattr.c
10568@@ -0,0 +1,153 @@
10569+/*
10570+ * Copyright (c) 2003-2007 Erez Zadok
10571+ * Copyright (c) 2003-2006 Charles P. Wright
10572+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10573+ * Copyright (c) 2005-2006 Junjiro Okajima
10574+ * Copyright (c) 2005 Arun M. Krishnakumar
10575+ * Copyright (c) 2004-2006 David P. Quigley
10576+ * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10577+ * Copyright (c) 2003 Puja Gupta
10578+ * Copyright (c) 2003 Harikesavan Krishnan
10579+ * Copyright (c) 2003-2007 Stony Brook University
10580+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
10581+ *
10582+ * This program is free software; you can redistribute it and/or modify
10583+ * it under the terms of the GNU General Public License version 2 as
10584+ * published by the Free Software Foundation.
10585+ */
10586+
10587+#include "union.h"
10588+
10589+/* This is lifted from fs/xattr.c */
10590+void *unionfs_xattr_alloc(size_t size, size_t limit)
10591+{
10592+ void *ptr;
10593+
10594+ if (size > limit)
10595+ return ERR_PTR(-E2BIG);
10596+
10597+ if (!size) /* size request, no buffer is needed */
10598+ return NULL;
10599+
10600+ ptr = kmalloc(size, GFP_KERNEL);
10601+ if (!ptr)
10602+ return ERR_PTR(-ENOMEM);
10603+ return ptr;
10604+}
10605+
10606+/*
10607+ * BKL held by caller.
10608+ * dentry->d_inode->i_mutex locked
10609+ */
10610+ssize_t unionfs_getxattr(struct dentry *dentry, const char *name, void *value,
10611+ size_t size)
10612+{
10613+ struct dentry *lower_dentry = NULL;
10614+ int err = -EOPNOTSUPP;
10615+
10616+ unionfs_read_lock(dentry->d_sb);
10617+ unionfs_lock_dentry(dentry);
10618+
10619+ if (!__unionfs_d_revalidate_chain(dentry, NULL, false)) {
10620+ err = -ESTALE;
10621+ goto out;
10622+ }
10623+
10624+ lower_dentry = unionfs_lower_dentry(dentry);
10625+
10626+ err = vfs_getxattr(lower_dentry, (char*) name, value, size);
10627+
10628+out:
10629+ unionfs_unlock_dentry(dentry);
10630+ unionfs_check_dentry(dentry);
10631+ unionfs_read_unlock(dentry->d_sb);
10632+ return err;
10633+}
10634+
10635+/*
10636+ * BKL held by caller.
10637+ * dentry->d_inode->i_mutex locked
10638+ */
10639+int unionfs_setxattr(struct dentry *dentry, const char *name,
10640+ const void *value, size_t size, int flags)
10641+{
10642+ struct dentry *lower_dentry = NULL;
10643+ int err = -EOPNOTSUPP;
10644+
10645+ unionfs_read_lock(dentry->d_sb);
10646+ unionfs_lock_dentry(dentry);
10647+
10648+ if (!__unionfs_d_revalidate_chain(dentry, NULL, false)) {
10649+ err = -ESTALE;
10650+ goto out;
10651+ }
10652+
10653+ lower_dentry = unionfs_lower_dentry(dentry);
10654+
10655+ err = vfs_setxattr(lower_dentry, (char*) name, (void*) value,
10656+ size, flags);
10657+
10658+out:
10659+ unionfs_unlock_dentry(dentry);
10660+ unionfs_check_dentry(dentry);
10661+ unionfs_read_unlock(dentry->d_sb);
10662+ return err;
10663+}
10664+
10665+/*
10666+ * BKL held by caller.
10667+ * dentry->d_inode->i_mutex locked
10668+ */
10669+int unionfs_removexattr(struct dentry *dentry, const char *name)
10670+{
10671+ struct dentry *lower_dentry = NULL;
10672+ int err = -EOPNOTSUPP;
10673+
10674+ unionfs_read_lock(dentry->d_sb);
10675+ unionfs_lock_dentry(dentry);
10676+
10677+ if (!__unionfs_d_revalidate_chain(dentry, NULL, false)) {
10678+ err = -ESTALE;
10679+ goto out;
10680+ }
10681+
10682+ lower_dentry = unionfs_lower_dentry(dentry);
10683+
10684+ err = vfs_removexattr(lower_dentry, (char*) name);
10685+
10686+out:
10687+ unionfs_unlock_dentry(dentry);
10688+ unionfs_check_dentry(dentry);
10689+ unionfs_read_unlock(dentry->d_sb);
10690+ return err;
10691+}
10692+
10693+/*
10694+ * BKL held by caller.
10695+ * dentry->d_inode->i_mutex locked
10696+ */
10697+ssize_t unionfs_listxattr(struct dentry *dentry, char *list, size_t size)
10698+{
10699+ struct dentry *lower_dentry = NULL;
10700+ int err = -EOPNOTSUPP;
10701+ char *encoded_list = NULL;
10702+
10703+ unionfs_read_lock(dentry->d_sb);
10704+ unionfs_lock_dentry(dentry);
10705+
10706+ if (!__unionfs_d_revalidate_chain(dentry, NULL, false)) {
10707+ err = -ESTALE;
10708+ goto out;
10709+ }
10710+
10711+ lower_dentry = unionfs_lower_dentry(dentry);
10712+
10713+ encoded_list = list;
10714+ err = vfs_listxattr(lower_dentry, encoded_list, size);
10715+
10716+out:
10717+ unionfs_unlock_dentry(dentry);
10718+ unionfs_check_dentry(dentry);
10719+ unionfs_read_unlock(dentry->d_sb);
10720+ return err;
10721+}
10722diff --git a/include/linux/fs_stack.h b/include/linux/fs_stack.h
10723index bb516ce..6b52faf 100644
10724--- a/include/linux/fs_stack.h
10725+++ b/include/linux/fs_stack.h
10726@@ -1,17 +1,28 @@
10727+/*
10728+ * Copyright (c) 2006-2007 Erez Zadok
10729+ * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
10730+ * Copyright (c) 2006-2007 Stony Brook University
10731+ * Copyright (c) 2006-2007 The Research Foundation of SUNY
10732+ *
10733+ * This program is free software; you can redistribute it and/or modify
10734+ * it under the terms of the GNU General Public License version 2 as
10735+ * published by the Free Software Foundation.
10736+ */
10737+
10738 #ifndef _LINUX_FS_STACK_H
10739 #define _LINUX_FS_STACK_H
10740
10741-/* This file defines generic functions used primarily by stackable
10742+/*
10743+ * This file defines generic functions used primarily by stackable
10744 * filesystems; none of these functions require i_mutex to be held.
10745 */
10746
10747 #include <linux/fs.h>
10748
10749 /* externs for fs/stack.c */
10750-extern void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
10751- int (*get_nlinks)(struct inode *));
10752-
10753-extern void fsstack_copy_inode_size(struct inode *dst, const struct inode *src);
10754+extern void fsstack_copy_attr_all(struct inode *dest, const struct inode *src);
10755+extern void fsstack_copy_inode_size(struct inode *dst,
10756+ const struct inode *src);
10757
10758 /* inlines */
10759 static inline void fsstack_copy_attr_atime(struct inode *dest,
10760diff --git a/include/linux/magic.h b/include/linux/magic.h
10761index 9d713c0..a1da278 100644
10762--- a/include/linux/magic.h
10763+++ b/include/linux/magic.h
10764@@ -36,6 +36,8 @@
10765 #define REISER2FS_SUPER_MAGIC_STRING "ReIsEr2Fs"
10766 #define REISER2FS_JR_SUPER_MAGIC_STRING "ReIsEr3Fs"
10767
10768+#define UNIONFS_SUPER_MAGIC 0xf15f083d
10769+
10770 #define SMB_SUPER_MAGIC 0x517B
10771 #define USBDEVICE_SUPER_MAGIC 0x9fa2
10772
10773diff --git a/include/linux/mm.h b/include/linux/mm.h
10774index 1c12074..063f7dc 100644
10775--- a/include/linux/mm.h
10776+++ b/include/linux/mm.h
10777@@ -1199,6 +1199,7 @@ int drop_caches_sysctl_handler(struct ctl_table *, int, struct file *,
10778 void __user *, size_t *, loff_t *);
10779 unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
10780 unsigned long lru_pages);
10781+extern void drop_pagecache_sb(struct super_block *);
10782 void drop_pagecache(void);
10783 void drop_slab(void);
10784
10785diff --git a/include/linux/namei.h b/include/linux/namei.h
10786index b7dd249..1658291 100644
10787--- a/include/linux/namei.h
10788+++ b/include/linux/namei.h
10789@@ -3,6 +3,7 @@
10790
10791 #include <linux/dcache.h>
10792 #include <linux/linkage.h>
10793+#include <linux/mount.h>
10794
10795 struct vfsmount;
10796
10797@@ -47,6 +48,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
10798 * - internal "there are more path compnents" flag
10799 * - locked when lookup done with dcache_lock held
10800 * - dentry cache is untrusted; force a real lookup
10801+ * - lookup path from given dentry/vfsmount pair
10802 */
10803 #define LOOKUP_FOLLOW 1
10804 #define LOOKUP_DIRECTORY 2
10805@@ -54,6 +56,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
10806 #define LOOKUP_PARENT 16
10807 #define LOOKUP_NOALT 32
10808 #define LOOKUP_REVAL 64
10809+#define LOOKUP_ONE 128
10810 /*
10811 * Intent data
10812 */
10813@@ -81,9 +84,16 @@ extern struct file *lookup_instantiate_filp(struct nameidata *nd, struct dentry
10814 extern struct file *nameidata_to_filp(struct nameidata *nd, int flags);
10815 extern void release_open_intent(struct nameidata *);
10816
10817-extern struct dentry * lookup_one_len(const char *, struct dentry *, int);
10818+extern struct dentry * lookup_one_len_nd(const char *, struct dentry *,
10819+ int, struct nameidata *);
10820 extern struct dentry *lookup_one_len_kern(const char *, struct dentry *, int);
10821
10822+static inline struct dentry *lookup_one_len(const char *name,
10823+ struct dentry *dir, int len)
10824+{
10825+ return lookup_one_len_nd(name, dir, len, NULL);
10826+}
10827+
10828 extern int follow_down(struct vfsmount **, struct dentry **);
10829 extern int follow_up(struct vfsmount **, struct dentry **);
10830
10831@@ -100,4 +110,16 @@ static inline char *nd_get_link(struct nameidata *nd)
10832 return nd->saved_names[nd->depth];
10833 }
10834
10835+static inline void pathget(struct path *path)
10836+{
10837+ mntget(path->mnt);
10838+ dget(path->dentry);
10839+}
10840+
10841+static inline void pathput(struct path *path)
10842+{
10843+ dput(path->dentry);
10844+ mntput(path->mnt);
10845+}
10846+
10847 #endif /* _LINUX_NAMEI_H */
10848diff --git a/include/linux/union_fs.h b/include/linux/union_fs.h
10849new file mode 100644
10850index 0000000..d13eb48
10851--- /dev/null
10852+++ b/include/linux/union_fs.h
10853@@ -0,0 +1,25 @@
10854+/*
10855+ * Copyright (c) 2003-2007 Erez Zadok
10856+ * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10857+ * Copyright (c) 2003-2007 Stony Brook University
10858+ * Copyright (c) 2003-2007 The Research Foundation of SUNY
10859+ *
10860+ * This program is free software; you can redistribute it and/or modify
10861+ * it under the terms of the GNU General Public License version 2 as
10862+ * published by the Free Software Foundation.
10863+ */
10864+
10865+#ifndef _LINUX_UNION_FS_H
10866+#define _LINUX_UNION_FS_H
10867+
10868+/*
10869+ * DEFINITIONS FOR USER AND KERNEL CODE:
10870+ */
10871+# define UNIONFS_IOCTL_INCGEN _IOR(0x15, 11, int)
10872+# define UNIONFS_IOCTL_QUERYFILE _IOR(0x15, 15, int)
10873+
10874+/* We don't support normal remount, but unionctl uses it. */
10875+# define UNIONFS_REMOUNT_MAGIC 0x4a5a4380
10876+
10877+#endif /* _LINUX_UNION_FS_H */
10878+