git.pld-linux.org Git - packages/kernel.git/blame - linux-2.6-suspend2.patch
4e97e4e9 1diff --git a/Documentation/power/tuxonice-internals.txt b/Documentation/power/tuxonice-internals.txt
2new file mode 100644
7f9d2ee0 3index 0000000..afba75a
4e97e4e9 4--- /dev/null
5+++ b/Documentation/power/tuxonice-internals.txt
6@@ -0,0 +1,469 @@
7f9d2ee0 7+ TuxOnIce 3.0 Internal Documentation.
8+ Updated to 11 March 2008
24613191 9+
10+1. Introduction.
11+
7f9d2ee0 12+ TuxOnIce 3.0 is an addition to the Linux Kernel, designed to
24613191 13+ allow the user to quickly shut down and quickly boot a computer, without
14+ needing to close documents or programs. It is equivalent to the
15+ hibernate facility in some laptops. This implementation, however,
16+ requires no special BIOS or hardware support.
17+
18+ The code in these files is based upon the original implementation
19+ prepared by Gabor Kuti and additional work by Pavel Machek and a
20+ host of others. This code has been substantially reworked by Nigel
21+ Cunningham, again with the help and testing of many others, not the
22+ least of whom is Michael Frank. At its heart, however, the operation is
23+ essentially the same as Gabor's version.
24+
25+2. Overview of operation.
26+
27+ The basic sequence of operations is as follows:
28+
29+ a. Quiesce all other activity.
30+ b. Ensure enough memory and storage space are available, and attempt
31+ to free memory/storage if necessary.
32+ c. Allocate the required memory and storage space.
33+ d. Write the image.
34+ e. Power down.
35+
36+ There are a number of complicating factors which mean that things are
37+ not as simple as the above would imply, however...
38+
39+ o The activity of each process must be stopped at a point where it will
40+ not be holding locks necessary for saving the image, or unexpectedly
41+ restart operations due to something like a timeout and thereby make
42+ our image inconsistent.
43+
44+ o It is desirable that we sync outstanding I/O to disk before calculating
45+ image statistics. This reduces corruption if one should suspend but
46+ then not resume, and also makes later parts of the operation safer (see
47+ below).
48+
49+ o We need to get as close as we can to an atomic copy of the data.
50+ Inconsistencies in the image will result in inconsistent memory contents at
51+ resume time, and thus in instability of the system and/or file system
52+ corruption. This would appear to imply a maximum image size of one half of
53+ the amount of RAM, but we have a solution... (again, below).
54+
55+ o In 2.6, we choose to play nicely with the other suspend-to-disk
56+ implementations.
57+
58+3. Detailed description of internals.
59+
60+ a. Quiescing activity.
61+
4e97e4e9 62+ Safely quiescing the system is achieved using three separate but related
63+ aspects.
24613191 64+
65+ First, we note that the vast majority of processes don't need to run during
66+ suspend. They can be 'frozen'. We therefore implement a refrigerator
67+ routine, which processes enter and in which they remain until the cycle is
68+ complete. Processes enter the refrigerator via try_to_freeze() invocations
69+ at appropriate places. A process cannot be frozen in any old place. It
70+ must not be holding locks that will be needed for writing the image or
71+ freezing other processes. For this reason, userspace processes generally
72+ enter the refrigerator via the signal handling code, and kernel threads at
73+ the place in their event loops where they drop locks and yield to other
74+ processes or sleep.
75+
4e97e4e9 76+ The task of freezing processes is complicated by the fact that there can be
77+ interdependencies between processes. Freezing process A before process B may
78+ mean that process B cannot be frozen, because it blocks waiting for
79+ process A rather than stopping in the refrigerator. This issue is seen where
80+ userspace waits on freezeable kernel threads or fuse filesystem threads. To
81+ address this issue, we implement the following algorithm for quiescing
82+ activity:
83+
84+ - Freeze filesystems (including fuse - userspace programs starting
85+ new requests are immediately frozen; programs already running
86+ requests complete their work before being frozen in the next
87+ step)
24613191 88+ - Freeze userspace
4e97e4e9 89+ - Thaw filesystems (this is safe now that userspace is frozen and no
90+ fuse requests are outstanding).
91+ - Invoke sys_sync (noop on fuse).
24613191 92+ - Freeze filesystems
93+ - Freeze kernel threads
94+
95+ If we need to free memory, we thaw kernel threads and filesystems, but not
96+ userspace. We can then free caches without worrying about deadlocks due to
97+ swap files being on frozen filesystems or such like.
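The ordering above can be modelled in a few lines. The following is an illustrative userspace sketch, not the patch's code: the structure and function names are invented, and the real kernel work happens in the freezer and filesystem code.

```c
/* Userspace simulation of the TuxOnIce quiescing order described above.
 * All names here are invented for illustration. */
#include <assert.h>
#include <stdbool.h>

struct system_state {
    bool fs_frozen;
    bool userspace_frozen;
    bool kthreads_frozen;
    int  pending_fuse_requests;  /* drained during the first fs freeze */
    bool synced;
};

/* Apply the six quiescing steps from the algorithm above, in order. */
static void quiesce(struct system_state *s)
{
    s->fs_frozen = true;          /* 1. freeze filesystems (fuse included) */
    s->pending_fuse_requests = 0; /*    running requests complete first    */
    s->userspace_frozen = true;   /* 2. freeze userspace                   */
    s->fs_frozen = false;         /* 3. thaw filesystems: safe, since no
                                   *    new fuse requests can start        */
    s->synced = true;             /* 4. sys_sync (noop on fuse)            */
    s->fs_frozen = true;          /* 5. freeze filesystems again           */
    s->kthreads_frozen = true;    /* 6. freeze kernel threads              */
}
```

The point of the ordering is visible in step 3: userspace is already frozen, so thawing filesystems to sync cannot generate new fuse requests.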
98+
24613191 99+ b. Ensure enough memory & storage are available.
100+
101+ We have a number of constraints to meet in order to be able to successfully
102+ suspend and resume.
103+
104+ First, the image will be written in two parts, described below. One of these
105+ parts needs to have an atomic copy made, which of course implies a maximum
106+ size of one half of the amount of system memory. The other part ('pageset')
107+ is not atomically copied, and can therefore be as large or small as desired.
108+
109+ Second, we have constraints on the amount of storage available. In these
110+ calculations, we may also consider any compression that will be done. The
111+ cryptoapi module allows the user to configure an expected compression ratio.
112+
113+ Third, the user can specify an arbitrary limit on the image size, in
114+ megabytes. This limit is treated as a soft limit, so that we don't fail the
115+ attempt to suspend if we cannot meet this constraint.
116+
117+ c. Allocate the required memory and storage space.
118+
119+ Having done the initial freeze, we determine whether the above constraints
120+ are met, and seek to allocate the metadata for the image. If the constraints
121+ are not met, or we fail to allocate the required space for the metadata, we
122+ seek to free the amount of memory that we calculate is needed and try again.
123+ We allow up to four iterations of this loop before aborting the cycle. If we
7f9d2ee0 124+ do fail, it should only be because of a bug in TuxOnIce's calculations.
24613191 125+
126+ These steps are merged together in the prepare_image function, found in
127+ prepare_image.c. The functions are merged because of the cyclical nature
128+ of the problem of calculating how much memory and storage is needed. Since
129+ the data structures containing the information about the image must
130+ themselves take memory and use storage, the amount of memory and storage
131+ required changes as we prepare the image. Since the changes are not large,
132+ only one or two iterations will be required to achieve a solution.
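The retry loop in prepare_image can be sketched as follows. This is a simplified model under assumed names (image_plan, free_some_memory are not from the patch); it only shows the bounded free-and-retry shape described above.

```c
/* Illustrative sketch of the prepare_image() retry loop: check the
 * constraints, free memory if they aren't met, give up after four tries.
 * Names and units are simplifying assumptions, not the real code. */
#include <assert.h>
#include <stdbool.h>

#define MAX_ATTEMPTS 4

struct image_plan {
    long pages_needed;     /* metadata + image requirements */
    long pages_available;  /* memory/storage currently free */
};

/* Simulated reclaim: each attempt frees some pages. */
static void free_some_memory(struct image_plan *p, long reclaim)
{
    p->pages_available += reclaim;
}

/* Returns true if the image could be prepared within MAX_ATTEMPTS. */
static bool prepare_image(struct image_plan *p, long reclaim_per_try)
{
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        if (p->pages_available >= p->pages_needed)
            return true;                 /* constraints met */
        free_some_memory(p, reclaim_per_try);
    }
    return p->pages_available >= p->pages_needed;
}
```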
133+
134+ The recursive nature of the algorithm is minimised by keeping user space
135+ frozen while preparing the image, and by the fact that our records of which
136+ pages are to be saved and which pageset they are saved in use bitmaps (so
137+ that changes in number or fragmentation of the pages to be saved don't
138+ feed back via changes in the amount of memory needed for metadata). The
139+ recursiveness is thus limited to any extra slab pages allocated to store the
140+ extents that record storage used, and the effects of seeking to free memory.
141+
142+ d. Write the image.
143+
144+ We previously mentioned the need to create an atomic copy of the data, and
145+ the half-of-memory limitation that is implied in this. This limitation is
146+ circumvented by dividing the memory to be saved into two parts, called
147+ pagesets.
148+
149+ Pageset2 contains the page cache - the pages on the active and inactive
4e97e4e9 150+ lists. These pages aren't needed or modified while TuxOnIce is running, so
24613191 151+ they can be safely written without an atomic copy. They are therefore
4e97e4e9 152+ saved first and reloaded last. While saving these pages, TuxOnIce carefully
24613191 153+ ensures that the work of writing the pages doesn't make the image
154+ inconsistent.
155+
156+ Once pageset2 has been saved, we prepare to do the atomic copy of remaining
157+ memory. As part of the preparation, we power down drivers, thereby providing
158+ them with the opportunity to have their state recorded in the image. The
159+ amount of memory allocated by drivers for this is usually negligible, but if
160+ DRI is in use, video drivers may require significant amounts. Ideally we
161+ would be able to query drivers while preparing the image as to the amount of
162+ memory they will need. Unfortunately no such mechanism exists at the time of
4e97e4e9 163+ writing. For this reason, TuxOnIce allows the user to set an
24613191 164+ 'extra_pages_allowance', which is used to seek to ensure sufficient memory
4e97e4e9 165+ is available for drivers at this point. TuxOnIce also lets the user set this
24613191 166+ value to 0. In this case, a test driver suspend is done while preparing the
167+ image, and the difference (plus a margin) used instead.
168+
169+ Having suspended the drivers, we save the CPU context before making an
170+ atomic copy of pageset1, resuming the drivers and saving the atomic copy.
171+ After saving the two pagesets, we just need to save our metadata before
172+ powering down.
173+
174+ As we mentioned earlier, the contents of pageset2 pages aren't needed once
175+ they've been saved. We therefore use them as the destination of our atomic
176+ copy. In the unlikely event that pageset1 is larger, extra pages are
177+ allocated while the image is being prepared. This is normally only a real
178+ possibility when the system has just been booted and the page cache is
179+ small.
180+
181+ This is where we need to be careful about syncing, however. Pageset2 will
182+ probably contain filesystem meta data. If this is overwritten with pageset1
183+ and then a sync occurs, the filesystem will be corrupted - at least until
184+ resume time and another sync of the restored data. Since there is a
185+ possibility that the user might not resume or (may it never be!) that
186+ suspend might oops, we do our utmost to avoid syncing filesystems after
187+ copying pageset1.
188+
189+ e. Power down.
190+
4e97e4e9 191+ Powering down uses standard kernel routines. TuxOnIce supports powering down
24613191 192+ using the ACPI S3, S4 and S5 methods or the kernel's non-ACPI power-off.
193+ Supporting suspend to ram (S3) as a power off option might sound strange,
194+ but it allows the user to quickly get their system up and running again if
195+ the battery doesn't run out (we just need to re-read the overwritten pages)
196+ and if the battery does run out (or the user removes power), they can still
197+ resume.
198+
199+4. Data Structures.
200+
4e97e4e9 201+ TuxOnIce uses three main structures to store its metadata and configuration
24613191 202+ information:
203+
204+ a) Pageflags bitmaps.
205+
7f9d2ee0 206+ TuxOnIce records which pages will be in pageset1, pageset2, the destination
24613191 207+ of the atomic copy and the source of the atomically restored image using
208+ bitmaps. These bitmaps are created from order zero allocations to maximise
209+ reliability. The individual pages are combined together with pointers to
210+ form per-zone bitmaps, which are in turn combined with another layer of
211+ pointers to construct the overall bitmap.
212+
213+ The pageset1 bitmap is thus easily stored in the image header for use at
214+ resume time.
215+
216+ As mentioned above, using bitmaps also means that the amount of memory and
217+ storage required for recording the above information is constant. This
218+ greatly simplifies the work of preparing the image. In earlier versions of
4e97e4e9 219+ TuxOnIce, extents were used to record which pages would be stored. In that
24613191 220+ case, however, eating memory could result in greater fragmentation of the
221+ lists of pages, which in turn required more memory to store the extents and
222+ more storage in the image header. These could in turn require further
223+ freeing of memory, and another iteration. All of this complexity is removed
224+ by having bitmaps.
225+
4e97e4e9 226+ Bitmaps also make a lot of sense because TuxOnIce only ever iterates
24613191 227+ through the lists. There is therefore no cost to not being able to find the
228+ nth page in O(1) time. We only need to worry about the cost of finding
229+ the (n+1)th page, given the location of the nth page. Bitwise optimisations
230+ help here.
231+
232+ The data structure is: unsigned long ***.
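A minimal sketch of such a structure, built only from page-sized allocations as described above, might look like this. This is a userspace illustration with a single zone and invented helper names, not the patch's implementation:

```c
/* Sketch of a three-level pageflags bitmap (unsigned long ***):
 * top pointer -> per-zone pointer arrays -> order-0 bitmap pages.
 * One zone only; all names invented for illustration. */
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define PAGE_SIZE       4096UL
#define BITS_PER_PAGE   (PAGE_SIZE * 8)
#define BITS_PER_ULONG  (sizeof(unsigned long) * CHAR_BIT)

static unsigned long ***bitmap_create(unsigned long pages)
{
    unsigned long bitmap_pages = (pages + BITS_PER_PAGE - 1) / BITS_PER_PAGE;
    unsigned long ***top = calloc(1, sizeof(*top));        /* one zone */
    top[0] = calloc(bitmap_pages, sizeof(unsigned long *));
    for (unsigned long i = 0; i < bitmap_pages; i++)
        top[0][i] = calloc(1, PAGE_SIZE);                  /* order-0 page */
    return top;
}

static void bitmap_set(unsigned long ***bm, unsigned long pfn)
{
    unsigned long page = pfn / BITS_PER_PAGE;
    unsigned long bit  = pfn % BITS_PER_PAGE;
    bm[0][page][bit / BITS_PER_ULONG] |= 1UL << (bit % BITS_PER_ULONG);
}

static int bitmap_test(unsigned long ***bm, unsigned long pfn)
{
    unsigned long page = pfn / BITS_PER_PAGE;
    unsigned long bit  = pfn % BITS_PER_PAGE;
    return !!(bm[0][page][bit / BITS_PER_ULONG] &
              (1UL << (bit % BITS_PER_ULONG)));
}
```

Because every allocation is a single page, the structure can be built reliably even under memory pressure, which is the design point made above.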
233+
234+ b) Extents for block data.
235+
4e97e4e9 236+ TuxOnIce supports writing the image to multiple block devices. In the case
24613191 237+ of swap, multiple partitions and/or files may be in use, and we happily use
238+ them all. This is accomplished as follows:
239+
240+ Whatever the actual source of the allocated storage, the destination of the
241+ image can be viewed in terms of one or more block devices, and on each
242+ device, a list of sectors. To simplify matters, we only use contiguous,
243+ PAGE_SIZE aligned sectors, like the swap code does.
244+
245+ Since sector numbers on each bdev may well not start at 0, it makes much
246+ more sense to use extents here. Contiguous ranges of pages can thus be
247+ represented in the extents by contiguous values.
248+
249+ Variations in block size are taken into account when transforming this data
250+ into the parameters for bio submission.
251+
4e97e4e9 252+ We can thus implement a layer of abstraction wherein the core of TuxOnIce
24613191 253+ doesn't have to worry about which device we're currently writing to or
254+ where in the device we are. It simply requests that the next page in the
255+ pageset or header be written, leaving the details to this lower layer.
256+ The lower layer remembers where in the sequence of devices and blocks each
257+ pageset starts. The header always starts at the beginning of the allocated
258+ storage.
259+
260+ So extents are:
261+
262+ struct extent {
263+ unsigned long minimum, maximum;
264+ struct extent *next;
265+ };
266+
267+ These are combined into chains of extents for a device:
268+
269+ struct extent_chain {
270+ int size; /* size of the extent ie sum (max-min+1) */
271+ int allocs, frees;
272+ char *name;
273+ struct extent *first, *last_touched;
274+ };
275+
276+ For each bdev, we need to store a little more info:
277+
278+ struct suspend_bdev_info {
279+ struct block_device *bdev;
280+ dev_t dev_t;
281+ int bmap_shift;
282+ int blocks_per_page;
283+ };
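To make the "contiguous values" point concrete, here is a hedged sketch of how a page might be appended to an extent chain, growing the last extent when the new value is contiguous. The helper name chain_add_page is invented; the real patch's allocation and bookkeeping differ.

```c
/* Illustrative helper (not from the patch): append one storage page to an
 * extent chain, extending the last extent when the value is contiguous. */
#include <assert.h>
#include <stdlib.h>

struct extent {
    unsigned long minimum, maximum;
    struct extent *next;
};

struct extent_chain {
    int size;                     /* sum over extents of (max - min + 1) */
    struct extent *first, *last_touched;
};

static void chain_add_page(struct extent_chain *chain, unsigned long value)
{
    struct extent *last = chain->last_touched;

    if (last && value == last->maximum + 1) {
        last->maximum = value;    /* contiguous: just grow the extent */
    } else {
        struct extent *e = malloc(sizeof(*e));
        e->minimum = e->maximum = value;
        e->next = NULL;
        if (last)
            last->next = e;
        else
            chain->first = e;
        chain->last_touched = e;
    }
    chain->size++;
}
```

A contiguous run of any length thus costs one extent, which is why fragmentation, not total size, drives the metadata overhead.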
284+
285+ The dev_t is used to identify the device in the stored image. As a result,
286+ we expect devices at resume time to have the same major and minor numbers
287+ as they had while suspending. This is primarily a concern where the user
288+ utilises LVM for storage, as they will need to dmsetup their partitions in
289+ such a way as to maintain this consistency at resume time.
290+
291+ bmap_shift and blocks_per_page record the effects of variations in
292+ blocks-per-page settings for the filesystem and underlying bdev. For most
293+ filesystems, these are the same, but for xfs, they can have independent
294+ values.
295+
296+ Combining these two structures together, we have everything we need to
297+ record what devices and what blocks on each device are being used to
298+ store the image, and to submit i/o using bio_submit.
299+
300+ The last elements in the picture are a means of recording how the storage
301+ is being used.
302+
303+ We do this first and foremost by implementing a layer of abstraction on
304+ top of the devices and extent chains which allows us to view however many
305+ devices there might be as one long storage tape, with a single 'head' that
306+ tracks a 'current position' on the tape:
307+
308+ struct extent_iterate_state {
309+ struct extent_chain *chains;
310+ int num_chains;
311+ int current_chain;
312+ struct extent *current_extent;
313+ unsigned long current_offset;
314+ };
315+
316+ That is, *chains points to an array of size num_chains of extent chains.
317+ For the filewriter, this is always a single chain. For the swapwriter, the
318+ array is of size MAX_SWAPFILES.
319+
320+ current_chain, current_extent and current_offset thus point to the current
321+ index in the chains array (and into a matching array of struct
322+ suspend_bdev_info), the current extent in that chain (to optimise access),
323+ and the current value in the offset.
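Advancing the 'head' along the tape can be sketched as below. This is an assumed simplification (a tape_next helper that walks chains of contiguous page values); the real code also maps values to bdevs and sectors.

```c
/* Sketch of advancing the single 'head' over the storage tape: walk the
 * current extent, then the next extent, then the next chain. The function
 * name tape_next is invented; 0 is assumed never to be a valid page. */
#include <assert.h>
#include <stddef.h>

struct extent { unsigned long minimum, maximum; struct extent *next; };
struct extent_chain { struct extent *first; };

struct extent_iterate_state {
    struct extent_chain *chains;
    int num_chains;
    int current_chain;
    struct extent *current_extent;
    unsigned long current_offset;
};

static unsigned long tape_next(struct extent_iterate_state *st)
{
    while (st->current_chain < st->num_chains) {
        if (!st->current_extent) {        /* entering a new chain */
            st->current_extent = st->chains[st->current_chain].first;
            if (st->current_extent)
                st->current_offset = st->current_extent->minimum;
        }
        while (st->current_extent) {
            if (st->current_offset <= st->current_extent->maximum)
                return st->current_offset++;
            st->current_extent = st->current_extent->next;
            if (st->current_extent)
                st->current_offset = st->current_extent->minimum;
        }
        st->current_chain++;              /* move to the next device */
    }
    return 0;                             /* tape exhausted */
}
```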
324+
325+ The image is divided into three parts:
326+ - The header
327+ - Pageset 1
328+ - Pageset 2
329+
330+ The header always starts at the first device and first block. We know its
331+ size before we begin to save the image because we carefully account for
332+ everything that will be stored in it.
333+
334+ The second pageset (LRU) is stored first. It begins on the next page after
335+ the end of the header.
336+
337+ The first pageset is stored second. Its start location is only known once
338+ pageset2 has been saved, since pageset2 may be compressed as it is written.
339+ This location is thus recorded at the end of saving pageset2. It is also
340+ page aligned.
341+
342+ Since this information is needed at resume time, and the location of extents
343+ in memory will differ at resume time, this needs to be stored in a portable
344+ way:
345+
346+ struct extent_iterate_saved_state {
347+ int chain_num;
348+ int extent_num;
349+ unsigned long offset;
350+ };
351+
4e97e4e9 352+ We can thus implement a layer of abstraction wherein the core of TuxOnIce
24613191 353+ doesn't have to worry about which device we're currently writing to or
354+ where in the device we are. It simply requests that the next page in the
355+ pageset or header be written, leaving the details to this layer, and
356+ invokes the routines to remember and restore the position, without having
357+ to worry about the details of how the data is arranged on disk or such like.
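The translation between the in-memory and portable forms can be sketched like this. The helper names save_position/restore_position are invented; the point is that the extent is recorded by index, since pointer values differ between the hibernating and resuming kernels.

```c
/* Sketch: serialise the head position into the portable saved-state form
 * and restore it later. Struct fields follow the definitions above;
 * helper names are invented for illustration. */
#include <assert.h>

struct extent { unsigned long minimum, maximum; struct extent *next; };
struct extent_chain { struct extent *first; };

struct extent_iterate_state {
    struct extent_chain *chains;
    int num_chains;
    int current_chain;
    struct extent *current_extent;
    unsigned long current_offset;
};

struct extent_iterate_saved_state {
    int chain_num;
    int extent_num;
    unsigned long offset;
};

static void save_position(const struct extent_iterate_state *st,
                          struct extent_iterate_saved_state *sv)
{
    int n = 0;
    /* Record the extent by index: pointers won't survive a reboot. */
    for (struct extent *e = st->chains[st->current_chain].first;
         e && e != st->current_extent; e = e->next)
        n++;
    sv->chain_num = st->current_chain;
    sv->extent_num = n;
    sv->offset = st->current_offset;
}

static void restore_position(struct extent_iterate_state *st,
                             const struct extent_iterate_saved_state *sv)
{
    struct extent *e = st->chains[sv->chain_num].first;
    for (int i = 0; i < sv->extent_num && e; i++)
        e = e->next;
    st->current_chain = sv->chain_num;
    st->current_extent = e;
    st->current_offset = sv->offset;
}
```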
358+
359+ c) Modules
360+
4e97e4e9 361+ One aim in designing TuxOnIce was to make it flexible. We wanted to allow
24613191 362+ for the implementation of different methods of transforming a page to be
363+ written to disk and different methods of getting the pages stored.
364+
365+ In early versions (the betas and perhaps Suspend1), compression support was
366+ inlined in the image writing code, and the data structures and code for
367+ managing swap were intertwined with the rest of the code. A number of people
368+ had expressed interest in implementing image encryption, and alternative
369+ methods of storing the image.
370+
4e97e4e9 371+ In order to achieve this, TuxOnIce was given a modular design.
24613191 372+
373+ A module is a single file which encapsulates the functionality needed
374+ to transform a pageset of data (encryption or compression, for example),
375+ or to write the pageset to a device. The former type of module is called
376+ a 'page-transformer', the latter a 'writer'.
377+
378+ Modules are linked together in pipeline fashion. There may be zero or more
379+ page transformers in a pipeline, and there is always exactly one writer.
380+ The pipeline follows this pattern:
381+
382+ ---------------------------------
4e97e4e9 383+ | TuxOnIce Core |
24613191 384+ ---------------------------------
385+ |
386+ |
387+ ---------------------------------
388+ | Page transformer 1 |
389+ ---------------------------------
390+ |
391+ |
392+ ---------------------------------
393+ | Page transformer 2 |
394+ ---------------------------------
395+ |
396+ |
397+ ---------------------------------
398+ | Writer |
399+ ---------------------------------
400+
401+ During the writing of an image, the core code feeds pages one at a time
402+ to the first module. This module performs whatever transformations it
403+ implements on the incoming data, completely consuming the incoming data and
404+ feeding output in a similar manner to the next module. A module may buffer
405+ its output.
406+
407+ During reading, the pipeline works in the reverse direction. The core code
408+ calls the first module with the address of a buffer which should be filled.
409+ (Note that the buffer size is always PAGE_SIZE at this time). This module
410+ will in turn request data from the next module and so on down until the
411+ writer is made to read from the stored image.
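The write direction of the pipeline can be modelled in miniature as below. This is a toy userspace sketch under assumed names (xor_transform standing in for a compressor, writer_store for the writer); the real modules stream full PAGE_SIZE pages and may buffer output.

```c
/* Toy model of the write pipeline: the core hands a page to the first
 * module, each transformer rewrites it and passes it on, and the writer
 * stores the result. All names here are invented for illustration. */
#include <assert.h>
#include <string.h>

#define PAGE_BYTES 16                    /* tiny "page" for the example */

typedef void (*transform_fn)(unsigned char *page);

static unsigned char stored[PAGE_BYTES]; /* what the "writer" persisted */

static void xor_transform(unsigned char *page) /* stand-in transformer */
{
    for (int i = 0; i < PAGE_BYTES; i++)
        page[i] ^= 0x5a;
}

static void writer_store(unsigned char *page)  /* stand-in writer */
{
    memcpy(stored, page, PAGE_BYTES);
}

/* Core: push one page through the transformer pipeline to the writer. */
static void write_page(unsigned char *page, transform_fn *pipeline, int n)
{
    for (int i = 0; i < n; i++)
        pipeline[i](page);
    writer_store(page);
}
```

Reading runs the same chain in reverse: each module asks the next for data until the writer reads from the stored image.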
412+
413+ Part of the definition of the structure of a module thus looks like this:
414+
415+ int (*rw_init) (int rw, int stream_number);
416+ int (*rw_cleanup) (int rw);
417+ int (*write_chunk) (struct page *buffer_page);
418+ int (*read_chunk) (struct page *buffer_page, int sync);
419+
420+ It should be noted that the _cleanup routine may be called before the
421+ full stream of data has been read or written. While writing the image,
422+ the user may (depending upon settings) choose to abort suspending, and
423+ if we are in the midst of writing the last portion of the image, a portion
424+ of the second pageset may be reread. This may also happen if an error
425+ occurs and we seek to abort the process of writing the image.
426+
427+ The modular design is also useful in a number of other ways. It provides
428+ a means whereby we can add support for:
429+
430+ - providing overall initialisation and cleanup routines;
431+ - serialising configuration information in the image header;
432+ - providing debugging information to the user;
433+ - determining memory and image storage requirements;
434+ - dis/enabling components at run-time;
435+ - configuring the module (see below);
436+
437+ ...and routines for writers specific to their work:
4e97e4e9 438+ - Parsing a resume= location;
24613191 439+ - Determining whether an image exists;
440+ - Marking a resume as having been attempted;
441+ - Invalidating an image;
442+
443+ Since some parts of the core - the user interface and storage manager
444+ support - have use for some of these functions, they are registered as
445+ 'miscellaneous' modules as well.
446+
447+ d) Sysfs data structures.
448+
4e97e4e9 449+ This brings us naturally to support for configuring TuxOnIce. We desired to
450+ provide a way to make TuxOnIce as flexible and configurable as possible.
24613191 451+ The user shouldn't have to reboot just because they now want to suspend to
452+ a file instead of a partition, for example.
453+
4e97e4e9 454+ To accomplish this, TuxOnIce implements a very generic means whereby the
455+ core and modules can register new sysfs entries. All TuxOnIce entries use
24613191 456+ a single _store and _show routine, both of which are found in sysfs.c in
457+ the kernel/power directory. These routines handle the most common operations
458+ - getting and setting the values of bits, integers, longs, unsigned longs
459+ and strings in one place, and allow overrides for customised get and set
460+ options as well as side-effect routines for all reads and writes.
461+
462+ When combined with some simple macros, a new sysfs entry can then be defined
463+ in just a couple of lines:
464+
4e97e4e9 465+ { TOI_ATTR("progress_granularity", SYSFS_RW),
24613191 466+ SYSFS_INT(&progress_granularity, 1, 2048)
467+ },
468+
469+ This defines a sysfs entry named "progress_granularity" which is rw and
470+ allows the user to access an integer stored at &progress_granularity, giving
471+ it a value between 1 and 2048 inclusive.
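The shared integer _store path might look roughly like the sketch below. This is an assumption-laden illustration (whether out-of-range writes are clamped or rejected is not specified here, and the struct layout is simplified), not the code from sysfs.c:

```c
/* Sketch of a generic integer _store: parse the written string and clamp
 * it to the [minimum, maximum] range from the entry definition. Clamping
 * behaviour and struct layout are simplifying assumptions. */
#include <assert.h>
#include <stdlib.h>

struct sysfs_int_entry {
    int *value;            /* e.g. &progress_granularity */
    int minimum, maximum;  /* e.g. 1 and 2048             */
};

static void sysfs_int_store(struct sysfs_int_entry *e, const char *buf)
{
    long v = strtol(buf, NULL, 10);
    if (v < e->minimum)
        v = e->minimum;
    if (v > e->maximum)
        v = e->maximum;
    *e->value = (int)v;
}
```

Because every entry shares this one routine, adding a new tunable is just the two-line table entry shown above.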
472+
4e97e4e9 473+ Sysfs entries are registered under /sys/power/tuxonice, and entries for
24613191 474+ modules are located in a subdirectory named after the module.
475+
4e97e4e9 476diff --git a/Documentation/power/tuxonice.txt b/Documentation/power/tuxonice.txt
477new file mode 100644
7f9d2ee0 478index 0000000..c6d0778
4e97e4e9 479--- /dev/null
480+++ b/Documentation/power/tuxonice.txt
7f9d2ee0 481@@ -0,0 +1,758 @@
482+ --- TuxOnIce, version 3.0 ---
24613191 483+
484+1. What is it?
485+2. Why would you want it?
486+3. What do you need to use it?
487+4. Why not just use the version already in the kernel?
488+5. How do you use it?
4e97e4e9 489+6. What do all those entries in /sys/power/tuxonice do?
24613191 490+7. How do you get support?
491+8. I think I've found a bug. What should I do?
492+9. When will XXX be supported?
493+10. How does it work?
4e97e4e9 494+11. Who wrote TuxOnIce?
24613191 495+
496+1. What is it?
497+
498+ Imagine you're sitting at your computer, working away. For some reason, you
499+ need to turn off your computer for a while - perhaps it's time to go home
500+ for the day. When you come back to your computer next, you're going to want
501+ to carry on where you left off. Now imagine that you could push a button and
502+ have your computer store the contents of its memory to disk and power down.
503+ Then, when you next start up your computer, it loads that image back into
504+ memory and you can carry on from where you were, just as if you'd never
7f9d2ee0 505+ turned the computer off. Starting up takes far less time, with no reopening of
506+ applications or hunting for the directory you put that file in yesterday.
4e97e4e9 507+ That's what TuxOnIce does.
24613191 508+
4e97e4e9 509+ TuxOnIce has a long heritage. It began life as work by Gabor Kuti, who,
24613191 510+ with some help from Pavel Machek, got an early version going in 1999. The
511+ project was then taken over by Florent Chabaud while still in alpha version
512+ numbers. Nigel Cunningham came on the scene when Florent was unable to
513+ continue, moving the project into betas, then 1.0, 2.0 and so on up to
4e97e4e9 514+ the present series. During the 2.0 series, the name was contracted to
515+ Suspend2 and the website suspend2.net created. Beginning around July 2007,
516+ a transition to calling the software TuxOnIce was made, to seek to help
517+ make it clear that TuxOnIce is more concerned with hibernation than suspend
518+ to ram.
519+
520+ Pavel Machek's swsusp code, which was merged around 2.5.17, retains the
521+ original name, and was essentially a fork of the beta code until Rafael
522+ Wysocki came on the scene in 2005 and began to improve it further.
24613191 523+
524+2. Why would you want it?
525+
526+ Why wouldn't you want it?
527+
528+ Being able to save the state of your system and quickly restore it improves
529+ your productivity - you get a useful system in far less time than through
530+ the normal boot process.
531+
532+3. What do you need to use it?
533+
534+ a. Kernel Support.
535+
4e97e4e9 536+ i) The TuxOnIce patch.
24613191 537+
4e97e4e9 538+ TuxOnIce is part of the Linux Kernel. This version is not part of Linus's
24613191 539+ 2.6 tree at the moment, so you will need to download the kernel source and
540+ apply the latest patch. Having done that, enable the appropriate options in
7f9d2ee0 541+ make [menu|x]config (under Power Management Options - look for "Enhanced
542+ Hibernation"), compile and install your kernel. TuxOnIce works with SMP,
543+ Highmem, preemption, fuse filesystems, x86-32, PPC and x86_64.
24613191 544+
4e97e4e9 545+ TuxOnIce patches are available from http://tuxonice.net.
24613191 546+
4e97e4e9 547+ ii) Compression support.
24613191 548+
4e97e4e9 549+ Compression support is implemented via the cryptoapi. You will therefore want
550+ to select any Cryptoapi transforms that you want to use on your image from
7f9d2ee0 551+ the Cryptoapi menu while configuring your kernel. Part of the TuxOnIce patch
552+ adds a new cryptoapi compression called LZF. We recommend the use of this
553+ compression method - it is very fast and still achieves good compression.
24613191 554+
4e97e4e9 555+ You can also tell TuxOnIce to write its image to an encrypted and/or
24613191 556+ compressed filesystem/swap partition. In that case, you don't need to do
4e97e4e9 557+ anything special for TuxOnIce when it comes to kernel configuration.
24613191 558+
559+ iii) Configuring other options.
560+
561+ While you're configuring your kernel, try to configure as much as possible
562+ to build as modules. We recommend this because there are a number of drivers
563+ that are still in the process of implementing proper power management
564+ support. In those cases, the best way to work around their current lack is
7f9d2ee0 565+ to build them as modules and remove the modules while hibernating. You might
24613191 566+ also bug the driver authors to get their support up to speed, or even help!
567+
568+ b. Storage.
569+
570+ i) Swap.
571+
7f9d2ee0 572+ TuxOnIce can store the hibernation image in your swap partition, a swap file or
24613191 573+ a combination thereof. Whichever combination you choose, you will probably
574+ want to create enough swap space to store the largest image you could have,
575+ plus the space you'd normally use for swap. A good rule of thumb would be
4e97e4e9 576+ to calculate the amount of swap you'd want without using TuxOnIce, and then
24613191 577+ add the amount of memory you have. This swapspace can be arranged in any way
578+ you'd like. It can be in one partition or file, or spread over a number. The
7f9d2ee0 579+ only requirement is that they be active when you start a hibernation cycle.
24613191 580+
4e97e4e9 581+ There is one exception to this requirement. TuxOnIce has the ability to turn
7f9d2ee0 582+ on one swap file or partition at the start of hibernating and turn it back off
24613191 583+ at the end. If you want to ensure you have enough memory to store an image
584+ when your memory is fully used, you might want to make one swap partition or
4e97e4e9 585+ file for 'normal' use, and another for TuxOnIce to activate & deactivate
24613191 586+ automatically. (Further details below).
587+
588+ ii) Normal files.
589+
4e97e4e9 590+ TuxOnIce includes a 'file allocator'. The file allocator can store your
591+ image in a simple file. Since Linux has the concept of everything being a
592+ file, this is more powerful than it initially sounds. If, for example, you
7f9d2ee0 593+ were to set up a network block device file, you could hibernate to a network
4e97e4e9 594+ server. This has been tested and works to a point, but nbd itself isn't
595+ stateless enough for our purposes.
24613191 596+
4e97e4e9 597+ Take extra care when setting up the file allocator. If you just type
7f9d2ee0 598+ commands without thinking and then try to hibernate, you could cause
4e97e4e9 599+ irreversible corruption on your filesystems! Make sure you have backups.
24613191 600+
7f9d2ee0 601+ Most people will only want to hibernate to a local file. To achieve that, do
24613191 602+ something along the lines of:
603+
7f9d2ee0 604+ echo "TuxOnIce" > /hibernation-file
605+ dd if=/dev/zero bs=1M count=512 >> /hibernation-file
24613191 606+
7f9d2ee0 607+ This will create a 512MB file called /hibernation-file. To get TuxOnIce to use
24613191 608+ it:
609+
7f9d2ee0 610+ echo /hibernation-file > /sys/power/tuxonice/file/target
24613191 611+
612+ Then
613+
4e97e4e9 614+ cat /sys/power/tuxonice/resume
24613191 615+
616+ Put the results of this into your bootloader's configuration (see also step
7f9d2ee0 617+ C, below):
24613191 618+
619+ ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
4e97e4e9 620+ # cat /sys/power/tuxonice/resume
24613191 621+ file:/dev/hda2:0x1e001
622+
623+ In this example, we would edit the append= line of our lilo.conf|menu.lst
624+ so that it included:
625+
4e97e4e9 626+ resume=file:/dev/hda2:0x1e001
24613191 627+ ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
628+
629+ For those who are thinking 'Could I make the file sparse?', the answer is
4e97e4e9 630+ 'No!'. At the moment, there is no way for TuxOnIce to fill in the holes in
7f9d2ee0 631+ a sparse file while hibernating. In the longer term (post merge!), I'd like
632+ to change things so that the file could be dynamically resized and have
633+ holes filled as needed. Right now, however, that's not possible and not a
634+ priority.
24613191 635+
636+ c. Bootloader configuration.
637+
4e97e4e9 638+ Using TuxOnIce also requires that you add an extra parameter to
24613191 639+ your lilo.conf or equivalent. Here's an example for a swap partition:
640+
4e97e4e9 641+ append="resume=swap:/dev/hda1"
24613191 642+
4e97e4e9 643+ This would tell TuxOnIce that /dev/hda1 is a swap partition you
644+ have. TuxOnIce will use the swap signature of this partition as a
7f9d2ee0 645+ pointer to your data when you hibernate. This means that (in this example)
24613191 646+ /dev/hda1 doesn't need to be _the_ swap partition where all of your data
647+ is actually stored. It just needs to be a swap partition that has a
648+ valid signature.
649+
4e97e4e9 650+ You don't need to have a swap partition for this purpose. TuxOnIce
24613191 651+ can also use a swap file, but usage is a little more complex. Having made
652+ your swap file, turn it on and do
653+
4e97e4e9 654+ cat /sys/power/tuxonice/swap/headerlocations
24613191 655+
4e97e4e9 656+ (this assumes you've already compiled your kernel with TuxOnIce
24613191 657+ support and booted it). The results of the cat command will tell you
658+ what you need to put in lilo.conf:
659+
4e97e4e9 660+ For swap partitions like /dev/hda1, simply use resume=/dev/hda1.
661+ For swapfile `swapfile`, use resume=swap:/dev/hda2:0x242d.
24613191 662+
663+ If the swapfile changes for any reason (it is moved to a different
664+ location, it is deleted and recreated, or the filesystem is
665+ defragmented) then you will have to check
4e97e4e9 666+ /sys/power/tuxonice/swap/headerlocations for a new resume_block value.
24613191 667+
668+ Once you've compiled and installed the kernel and adjusted your bootloader
669+ configuration, you should only need to reboot for the most basic part
4e97e4e9 670+ of TuxOnIce to be ready.
24613191 671+
4e97e4e9 672+ If you only compile in the swap allocator, or only compile in the file
673+ allocator, you don't need to add the "swap:" part of the resume=
7f9d2ee0 674+ parameters above. resume=/dev/hda2:0x242d will work just as well. If you
675+ have compiled both and your storage is on swap, you can also use this
676+ format (the swap allocator is the default allocator).
677+
678+ When compiling your kernel, one of the options in the 'Power Management
679+ Support' menu, just above the 'Enhanced Hibernation (TuxOnIce)' entry is
680+ called 'Default resume partition'. This can be used to set a default value
681+ for the resume= parameter.
24613191 682+
683+ d. The hibernate script.
684+
685+ Since the driver model in 2.6 kernels is still being developed, you may need
7f9d2ee0 686+ to do more than just configure TuxOnIce. Users of TuxOnIce usually start the
687+ process via a script which prepares for the hibernation cycle, tells the
688+ kernel to do its stuff and then restores things afterwards. This script might
689+ involve:
24613191 690+
691+ - Switching to a text console and back if X doesn't like the video card
692+ status on resume.
7f9d2ee0 693+ - Un/reloading drivers that don't play well with hibernation.
24613191 694+
695+ Note that you might not be able to unload some drivers if there are
696+ processes using them. You might have to kill off processes that hold
697+ devices open. Hint: if your X server accesses a USB mouse, doing a
698+ 'chvt' to a text console releases the device and you can unload the
699+ module.
700+
4e97e4e9 701+ Check out the latest script (available on tuxonice.net).
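 A minimal wrapper along the lines described above might look like the sketch
 below. It is an illustration only, not the real hibernate script; do_cycle,
 the module names you would pass it, and the TOI_SYSFS override (which exists
 here only so the logic can be tried outside a real sysfs) are all invented
 for the example.

```shell
#!/bin/sh
# Hypothetical minimal hibernate wrapper - illustration only. The real
# hibernate script (tuxonice.net) is far more thorough. TOI_SYSFS defaults
# to the real sysfs path but can be pointed elsewhere for testing.
TOI_SYSFS="${TOI_SYSFS:-/sys/power/tuxonice}"

do_cycle() {
    # 1. Unload drivers that don't play well with hibernation
    #    (the caller passes their names as arguments).
    for mod in "$@"; do
        modprobe -r "$mod" 2>/dev/null || echo "warning: $mod still in use"
    done
    # 2. Trigger the cycle. After a successful resume, execution
    #    continues here, because the original echo eventually returns.
    echo > "$TOI_SYSFS/do_hibernate"
    # 3. Reload the drivers afterwards.
    for mod in "$@"; do
        modprobe "$mod" 2>/dev/null || true
    done
}
```

 Called as, say, do_cycle ehci_hcd, the function would unload the module,
 start the cycle, and reload it on resume.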
7f9d2ee0 702+
703+ e. The userspace user interface.
704+
705+ TuxOnIce has very limited support for displaying status if you only apply
706+ the kernel patch - it can printk messages, but that is all. In addition,
707+ some of the functions mentioned in this document (such as cancelling a cycle
708+ or performing interactive debugging) are unavailable. To utilise these
709+ functions, or simply get a nice display, you need the 'userui' component.
710+ Userui comes in three flavours: usplash, fbsplash and text. Text should
711+ work on any console. Usplash and fbsplash require the appropriate
712+ (distro specific?) support.
713+
714+ To utilise a userui, TuxOnIce just needs to be told where to find the
715+ userspace binary:
716+
717+ echo "/usr/local/sbin/tuxoniceui_fbsplash" > /sys/power/tuxonice/user_interface/program
718+
719+ The hibernate script can do this for you, and a default value for this
720+ setting can be configured when compiling the kernel. This path is also
721+ stored in the image header, so if you have an initrd or initramfs, you can
722+ use the userui during the first part of resuming (prior to the atomic
723+ restore) by putting the binary in the same path in your initrd/ramfs.
724+ Alternatively, you can put it in a different location and do an echo
725+ similar to the above prior to the echo > do_resume. The value saved in the
726+ image header will then be ignored.
727+
24613191 728+4. Why not just use the version already in the kernel?
729+
7f9d2ee0 730+ The version in the vanilla kernel has a number of drawbacks. The most
731+ serious of these are:
24613191 732+ - it has a maximum image size of 1/2 total memory.
733+ - it doesn't allocate storage until after it has snapshotted memory.
7f9d2ee0 734+ This means that you can't be sure hibernating will work until you
24613191 735+ see it start to write the image.
736+ - it performs all of its I/O synchronously.
737+ - it does not allow you to press escape to cancel a cycle.
738+ - it does not allow you to automatically swapon a file when
739+ starting a cycle.
740+ - it does not allow you to use multiple swap partitions.
741+ - it does not allow you to use swapfiles.
742+ - it does not allow you to use ordinary files.
743+ - it just invalidates an image and continues to boot if you
7f9d2ee0 744+ accidentally boot the wrong kernel after hibernating.
745+ - it doesn't support any sort of nice display while hibernating.
24613191 746+ - it is moving toward requiring that you have an initrd/initramfs
747+ to ever have a hope of resuming (uswsusp). While uswsusp will
7f9d2ee0 748+ address some of the concerns above, it won't address all of them,
749+ and will be more complicated to get set up.
24613191 750+
751+5. How do you use it?
752+
7f9d2ee0 753+ A hibernation cycle can be started directly by doing:
24613191 754+
ad8f4a28 755+ echo > /sys/power/tuxonice/do_hibernate
24613191 756+
757+ In practice, though, you'll probably want to use the hibernate script
758+ to unload modules, configure the kernel the way you like it and so on.
759+ In that case, you'd do (as root):
760+
761+ hibernate
762+
763+ See the hibernate script's man page for more details on the options it
764+ takes.
765+
7f9d2ee0 766+ If you're using the text or splash user interface modules, one feature of
767+ TuxOnIce that you might find useful is that you can press Escape at any time
768+ during hibernating, and the process will be aborted.
769+
770+ Due to the way hibernation works, this means you'll have your system back and
24613191 771+ perfectly usable almost instantly. The only exception is when it's at the
7f9d2ee0 772+ very end of writing the image. Then it will need to reload a small (usually
773+ 4-50MB, depending upon the image characteristics) portion first.
24613191 774+
7f9d2ee0 775+ Likewise, when resuming, you can press escape and resuming will be aborted.
776+ The computer will then power down again, or reboot, according to the
777+ powerdown and reboot settings in effect at that time.
778+
779+ You can change the settings for powering down while the image is being
780+ written by pressing 'R' to toggle rebooting and 'O' to toggle between
781+ suspending to ram and powering down completely.
782+
4e97e4e9 783+ If you run into problems with resuming, adding the "noresume" option to
24613191 784+ the kernel command line will let you skip the resume step and recover your
7f9d2ee0 785+ system. This option shouldn't normally be needed, because TuxOnIce modifies
786+ the image header prior to the atomic restore, and will thus prompt you
787+ if it detects that you've tried to resume an image before (this flag is
788+ removed if you press Escape to cancel a resume, so you won't be prompted
789+ then).
790+
791+ Recent kernels (2.6.24 onwards) add support for resuming from a different
792+ kernel to the one that was hibernated (thanks to Rafael for his work on
793+ this - I've just embraced and enhanced the support for TuxOnIce). This
794+ should further reduce the need for you to use the noresume option.
24613191 795+
4e97e4e9 796+6. What do all those entries in /sys/power/tuxonice do?
24613191 797+
4e97e4e9 798+ /sys/power/tuxonice is the directory which contains files you can use to
799+ tune and configure TuxOnIce to your liking. The exact contents of
800+ the directory will depend upon the version of TuxOnIce you're
24613191 801+ running and the options you selected at compile time. In the following
802+ descriptions, names in brackets refer to compile time options.
7f9d2ee0 803+ (Note that they're all dependent upon you having selected CONFIG_TUXONICE
24613191 804+ in the first place!).
805+
7f9d2ee0 806+ Since changing these settings can open security risks, the
807+ writeable ones are accessible only to the root user. You may want to
808+ configure sudo to allow you to invoke your hibernate script as an ordinary
809+ user.
24613191 810+
811+ - checksum/enabled
812+
813+ Use cryptoapi hashing routines to verify that Pageset2 pages don't change
814+ while we're saving the first part of the image, and to get any pages that
815+ do change resaved in the atomic copy. This should normally not be needed,
816+ but if you're seeing issues, please enable this. If your issues prevent you
7f9d2ee0 817+ from resuming, enable this option, hibernate and cancel the cycle
24613191 818+ after the atomic copy is done. If the debugging info shows a non-zero
819+ number of pages resaved, please report this to Nigel.
820+
821+ - compression/algorithm
822+
823+ Set the cryptoapi algorithm used for compressing the image.
824+
825+ - compression/expected_compression
826+
7f9d2ee0 827+ This value allows you to set an expected compression ratio, which TuxOnIce
828+ will use in calculating whether it meets constraints on the image size. If
829+ this expected compression ratio is not attained, the hibernation cycle will
24613191 830+ abort, so it is wise to allow some margin. You can see what compression
7f9d2ee0 831+ ratio is achieved in the logs after hibernating.
24613191 832+
833+ - debug_info:
834+
835+ This file returns information about your configuration that may be helpful
7f9d2ee0 836+ in diagnosing problems with hibernating.
24613191 837+
7f9d2ee0 838+ - do_hibernate:
24613191 839+
4e97e4e9 840+ When anything is written to this file, the kernel side of TuxOnIce will
24613191 841+ begin to attempt to write an image to disk and power down. You'll normally
842+ want to run the hibernate script instead, to get modules unloaded first.
843+
7f9d2ee0 844+ - do_resume:
24613191 845+
7f9d2ee0 846+ When anything is written to this file TuxOnIce will attempt to read and
847+ restore an image. If there is no image, it will return almost immediately.
848+ If an image exists, the echo > will never return. Instead, the original
849+ kernel context will be restored and the original echo > do_hibernate will
850+ return.
24613191 851+
852+ - */enabled
853+
7f9d2ee0 854+ These options can be used to temporarily disable various parts of TuxOnIce.
24613191 855+
24613191 856+ - extra_pages_allowance
857+
4e97e4e9 858+ When TuxOnIce does its atomic copy, it calls the driver model suspend
24613191 859+ and resume methods. If you have DRI enabled with a driver such as fglrx,
860+ this can result in the driver allocating a substantial amount of memory
7f9d2ee0 861+ for storing its state. Extra_pages_allowance tells TuxOnIce how much
24613191 862+ extra memory it should ensure is available for those allocations. If
7f9d2ee0 863+ your attempts at hibernating end with a message in dmesg indicating that
24613191 864+ insufficient extra pages were allowed, you need to increase this value.
865+
4e97e4e9 866+ - file/target:
24613191 867+
7f9d2ee0 868+ Read this value to get the current setting. Write to it to point TuxOnIce
869+ at a new storage location for the file allocator. See section 3.b.ii above
870+ for details of how to set up the file allocator.
24613191 871+
872+ - freezer_test
873+
7f9d2ee0 874+ This entry can be used to get TuxOnIce to just test the freezer and prepare
875+ an image without actually doing a hibernation cycle. It is useful for
876+ diagnosing freezing and image preparation issues.
24613191 877+
878+ - image_exists:
879+
880+ Can be used in a script to determine whether a valid image exists at the
4e97e4e9 881+ location currently pointed to by resume=. Returns up to three lines.
24613191 882+ The first is whether an image exists (-1 for unsure, otherwise 0 or 1).
883+ If an image exists, additional lines will return the machine and version.
884+ Echoing anything to this entry removes any current image.
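 A boot or maintenance script might branch on the first line of this entry.
 The fragment below is a sketch only; image_status and the TOI_SYSFS override
 (present so the logic can be exercised against an ordinary directory) are
 invented for the example.

```shell
#!/bin/sh
# Hypothetical script fragment: read the first line of image_exists and
# report what it means. Illustration only, not part of TuxOnIce.
TOI_SYSFS="${TOI_SYSFS:-/sys/power/tuxonice}"

image_status() {
    # First line: -1 for unsure, otherwise 0 (no image) or 1 (image present).
    read -r status < "$TOI_SYSFS/image_exists"
    case "$status" in
        1) echo "image present" ;;
        0) echo "no image" ;;
        *) echo "unsure" ;;
    esac
}
```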
885+
886+ - image_size_limit:
887+
7f9d2ee0 888+ The maximum size of the hibernation image written to disk, measured in
24613191 889+ megabytes (1024*1024 bytes).
890+
891+ - interface_version:
892+
893+ The value returned by this file can be used by scripts and configuration
894+ tools to determine what entries should be looked for. The value is
4e97e4e9 895+ incremented whenever an entry in /sys/power/tuxonice is obsoleted or
24613191 896+ added.
897+
898+ - last_result:
899+
7f9d2ee0 900+ The result of the last hibernation cycle, as defined in
24613191 901+ include/linux/suspend-debug.h with the values SUSPEND_ABORTED to
902+ SUSPEND_KEPT_IMAGE. This is a bitmask.
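 Because last_result is a bitmask, a script should test individual bits
 rather than compare whole values. In the sketch below the bit positions are
 invented purely for illustration; the real values are the SUSPEND_* flags in
 include/linux/suspend-debug.h.

```shell
#!/bin/sh
# Illustration of decoding a bitmask result value. The bit assignments
# here are assumptions made up for the example, NOT the real flags.
HYPOTHETICAL_BIT_ABORTED=1      # assumption: bit 0 = cycle aborted
HYPOTHETICAL_BIT_KEPT_IMAGE=2   # assumption: bit 1 = image was kept

describe_result() {
    result="$1"
    [ $(( result & HYPOTHETICAL_BIT_ABORTED )) -ne 0 ] && echo "aborted"
    [ $(( result & HYPOTHETICAL_BIT_KEPT_IMAGE )) -ne 0 ] && echo "kept image"
    [ "$result" -eq 0 ] && echo "success"
    return 0
}
```

 With the real header values substituted, the same bit tests would apply to
 the contents of /sys/power/tuxonice/last_result.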
903+
904+ - log_everything (CONFIG_PM_DEBUG):
905+
906+ Setting this option results in all messages printed being logged. Normally,
907+ only a subset are logged, so as to not slow the process and not clutter the
908+ logs. Useful for debugging. It can be toggled during a cycle by pressing
909+ 'L'.
910+
911+ - pause_between_steps (CONFIG_PM_DEBUG):
912+
4e97e4e9 913+ This option is used during debugging, to make TuxOnIce pause between
24613191 914+ each step of the process. It is ignored when the nice display is on.
915+
916+ - powerdown_method:
917+
4e97e4e9 918+ Used to select a method by which TuxOnIce should powerdown after writing the
24613191 919+ image. Currently:
920+
921+ 0: Don't use ACPI to power off.
922+ 3: Attempt to enter Suspend-to-ram.
923+ 4: Attempt to enter ACPI S4 mode.
924+ 5: Attempt to power down via ACPI S5 mode.
925+
24613191 926+ Note that these options are highly dependent upon your hardware & software:
927+
7f9d2ee0 928+ 3: When successful, your machine suspends to ram instead of powering off.
24613191 929+ The advantage of using this mode is that it doesn't matter whether your
930+ battery has enough charge to make it through to your next resume. If it
931+ lasts, you will simply resume from suspend to ram (and the image on disk
932+ will be discarded). If the battery runs out, you will resume from disk
933+ instead. The disadvantage is that it takes longer than a normal
934+ suspend-to-ram to enter the state, since the suspend-to-disk image needs
935+ to be written first.
24613191 936+ 4/5: When successful, your machine will be off and consume (almost) no power.
937+ But it might still react to some external events like opening the lid or
938+ traffic on a network or usb device. For the bios, resume is then the same
939+ as warm boot, similar to a situation where you used the command `reboot'
940+ to reboot your machine. If your machine has problems on warm boot or if
941+ you want to protect your machine with the bios password, this is probably
942+ not the right choice. Mode 4 may be necessary on some machines where ACPI
943+ wake up methods need to be run to properly reinitialise hardware after a
7f9d2ee0 944+ hibernation cycle.
24613191 945+ 0: Switch the machine completely off. The only possible wakeup is the power
946+ button. For the bios, resume is then the same as a cold boot, in
947+ particular you would have to provide your bios boot password if your
948+ machine uses that feature for booting.
949+
950+ - progressbar_granularity_limit:
951+
952+ This option can be used to limit the granularity of the progress bar
953+ displayed with a bootsplash screen. The value is the maximum number of
954+ steps. That is, 10 will make the progress bar jump in 10% increments.
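 To illustrate the arithmetic: a granularity limit of 10 quantises progress
 to ten steps of 10% each. The helper below is a sketch of that calculation
 only, not TuxOnIce code.

```shell
#!/bin/sh
# Quantise a progress fraction (done_units of total) to at most $limit
# steps, as a progressbar_granularity_limit would. Illustration only.
quantise_progress() {
    done_units="$1"; total="$2"; limit="$3"
    step=$(( done_units * limit / total ))   # which of the 0..limit steps
    echo $(( step * 100 / limit ))           # displayed percentage
}
```

 For example, with a limit of 10, progress of 57 out of 100 pages displays
 as 50%, and only jumps to 60% once 60 pages are done.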
955+
956+ - reboot:
957+
4e97e4e9 958+ This option causes TuxOnIce to reboot rather than powering down
24613191 959+ at the end of saving an image. It can be toggled during a cycle by pressing
960+ 'R'.
961+
962+ - resume_commandline:
963+
964+ This entry can be read after resuming to see the commandline that was used
965+ when resuming began. You might use this to set up two bootloader entries
966+ that are the same apart from the fact that one includes an extra append=
967+ argument "at_work=1". You could then grep resume_commandline in your
968+ post-resume scripts and configure networking (for example) differently
969+ depending upon whether you're at home or work. resume_commandline can be
970+ set to arbitrary text if you wish to remove sensitive contents.
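 A post-resume script along those lines might look like the following sketch.
 configure_network and the CMDLINE_FILE override (present so the logic can be
 tested against a plain file) are invented for the example; at_work=1 is the
 marker argument from the text above.

```shell
#!/bin/sh
# Hypothetical post-resume fragment: branch on a marker in the boot
# command line recorded by resume_commandline. Illustration only.
CMDLINE_FILE="${CMDLINE_FILE:-/sys/power/tuxonice/resume_commandline}"

configure_network() {
    if grep -q 'at_work=1' "$CMDLINE_FILE"; then
        echo "applying work network profile"
    else
        echo "applying home network profile"
    fi
}
```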
971+
4e97e4e9 972+ - swap/swapfilename:
24613191 973+
974+ This entry is used to specify the swapfile or partition that
4e97e4e9 975+ TuxOnIce will attempt to swapon/swapoff automatically. Thus, if
24613191 976+ I normally use /dev/hda1 for swap, and want to use /dev/hda2 specifically
7f9d2ee0 977+ for my hibernation image, I would
24613191 978+
4e97e4e9 979+ echo /dev/hda2 > /sys/power/tuxonice/swap/swapfilename
24613191 980+
981+ /dev/hda2 would then be automatically swapon'd and swapoff'd. Note that the
982+ swapon and swapoff occur while other processes are frozen (including kswapd)
983+ so this swap file will not be used up when attempting to free memory. The
984+ partition/file is also given the highest priority, so other swapfiles/partitions
985+ will only be used to save the image when this one is filled.
986+
987+ The value of this file is used by headerlocations along with any currently
988+ activated swapfiles/partitions.
989+
4e97e4e9 990+ - swap/headerlocations:
24613191 991+
4e97e4e9 992+ This option tells you the resume= options to use for swap devices you
24613191 993+ currently have activated. It is particularly useful when you only want to
994+ use a swap file to store your image. See above for further details.
995+
996+ - toggle_process_nofreeze
997+
998+ This entry can be used to toggle the NOFREEZE flag on a process, to allow it
7f9d2ee0 999+ to run during hibernating. It should be used with extreme caution. There are
1000+ strict limitations on what a process running during hibernation can do. This
1001+ is really only intended for use by TuxOnIce's helpers (userui in particular).
24613191 1002+
1003+ - userui_program
1004+
7f9d2ee0 1005+ This entry is used to tell TuxOnIce what userspace program to use for
1006+ providing a user interface while hibernating. The program uses a netlink
24613191 1007+ socket to pass messages back and forth to the kernel, providing all of the
1008+ functions formerly implemented in the kernel user interface components.
1009+
1010+ - user_interface/debug_sections (CONFIG_PM_DEBUG):
1011+
1012+ This value, together with the console log level, controls what debugging
1013+ information is displayed. The console log level determines the level of
1014+ detail, and this value determines what detail is displayed. This value is
1015+ a bit vector, and the meaning of the bits can be found in the kernel tree
4e97e4e9 1016+ in include/linux/tuxonice.h. It can be overridden using the kernel's
24613191 1017+ command line option suspend_dbg.
1018+
1019+ - user_interface/default_console_level (CONFIG_PM_DEBUG):
1020+
1021+ This determines the value of the console log level at the start of a
7f9d2ee0 1022+ hibernation cycle. If debugging is compiled in, the console log level can be
24613191 1023+ changed during a cycle by pressing the digit keys. Meanings are:
1024+
1025+ 0: Nice display.
1026+ 1: Nice display plus numerical progress.
1027+ 2: Errors only.
1028+ 3: Low level debugging info.
1029+ 4: Medium level debugging info.
1030+ 5: High level debugging info.
1031+ 6: Verbose debugging info.
1032+
1033+ - user_interface/enable_escape:
1034+
7f9d2ee0 1035+ Setting this to "1" will enable you to abort a hibernation cycle or resuming
24613191 1036+ by pressing escape; "0" (default) disables this feature. Note that enabling
7f9d2ee0 1037+ this option means that you cannot initiate a hibernation cycle and then walk
1038+ away from your computer, expecting it to be secure. With this feature
24613191 1039+ disabled, you can validly have this expectation once TuxOnIce begins to
7f9d2ee0 1040+ write the image to disk. (Prior to this point, it is possible that TuxOnIce
1041+ might abort because of failure to freeze all processes or because
24613191 1042+ constraints on its ability to save the image are not met.)
1044+
1045+ - version:
1046+
7f9d2ee0 1047+ The version of TuxOnIce you have compiled into the currently running kernel.
24613191 1048+
1049+7. How do you get support?
1050+
4e97e4e9 1051+ Glad you asked. TuxOnIce is being actively maintained and supported
24613191 1052+ by Nigel (the guy doing most of the kernel coding at the moment), Bernard
1053+ (who maintains the hibernate script and userspace user interface components)
1054+ and its users.
1055+
1056+ Resources available include HowTos, FAQs and a Wiki, all accessible via
4e97e4e9 1057+ tuxonice.net. You can find the mailing lists there.
24613191 1058+
1059+8. I think I've found a bug. What should I do?
1060+
4e97e4e9 1061+ By far and away, the most common problems people have with TuxOnIce
24613191 1062+ relate to drivers not having adequate power management support. In this
4e97e4e9 1063+ case, it is not a bug with TuxOnIce, but we can still help you. As we
24613191 1064+ mentioned above, such issues can usually be worked around by building the
7f9d2ee0 1065+ functionality as modules and unloading them while hibernating. Please visit
24613191 1066+ the Wiki for up-to-date lists of known issues and workarounds.
1067+
1068+ If this information doesn't help, try running:
1069+
1070+ hibernate --bug-report
1071+
1072+ ...and sending the output to the users mailing list.
1073+
1074+ Good information on how to provide us with useful information from an
1075+ oops is found in the file REPORTING-BUGS, in the top level directory
1076+ of the kernel tree. If you get an oops, please especially note the
1077+ information about running what is printed on the screen through ksymoops.
1078+ The raw information is useless.
1079+
1080+9. When will XXX be supported?
1081+
4e97e4e9 1082+ If there's a feature missing from TuxOnIce that you'd like, feel free to
24613191 1083+ ask. We try to be obliging, within reason.
1084+
1085+ Patches are welcome. Please send to the list.
1086+
1087+10. How does it work?
1088+
4e97e4e9 1089+ TuxOnIce does its work in a number of steps.
24613191 1090+
1091+ a. Freezing system activity.
1092+
7f9d2ee0 1093+ The first main stage in hibernating is to stop all other activity. This is
24613191 1094+ achieved in stages. Processes are considered in four groups, which we will
1095+ describe in reverse order for clarity's sake: Threads with the PF_NOFREEZE
1096+ flag, kernel threads without this flag, userspace processes with the
1097+ PF_SYNCTHREAD flag and all other processes. The first set (PF_NOFREEZE) are
7f9d2ee0 1098+ untouched by the refrigerator code. They are allowed to run during hibernating
24613191 1099+ and resuming, and are used to support user interaction, storage access or the
7f9d2ee0 1100+ like. Other kernel threads (those unneeded while hibernating) are frozen last.
24613191 1101+ This leaves us with userspace processes that need to be frozen. When a
1102+ process enters one of the *_sync system calls, we set a PF_SYNCTHREAD flag on
1103+ that process for the duration of that call. Processes that have this flag are
1104+ frozen after processes without it, so that we can seek to ensure that dirty
1105+ data is synced to disk as quickly as possible in a situation where other
1106+ processes may be submitting writes at the same time. Freezing the processes
1107+ that are submitting data stops new I/O from being submitted. Syncthreads can
1108+ then cleanly finish their work. So the order is:
1109+
1110+ - Userspace processes without PF_SYNCTHREAD or PF_NOFREEZE;
1111+ - Userspace processes with PF_SYNCTHREAD (they won't have NOFREEZE);
1112+ - Kernel processes without PF_NOFREEZE.
1113+
1114+ b. Eating memory.
1115+
7f9d2ee0 1116+ For a successful hibernation cycle, you need to have enough disk space to store the
4e97e4e9 1117+ image and enough memory for the various limitations of TuxOnIce's
24613191 1118+ algorithm. You can also specify a maximum image size. In order to meet
4e97e4e9 1119+ those constraints, TuxOnIce may 'eat' memory. If, after freezing
1120+ processes, the constraints aren't met, TuxOnIce will thaw all the
24613191 1121+ other processes and begin to eat memory until its calculations indicate
1122+ the constraints are met. It will then freeze processes again and recheck
1123+ its calculations.
1124+
1125+ c. Allocation of storage.
1126+
4e97e4e9 1127+ Next, TuxOnIce allocates the storage that will be used to save
24613191 1128+ the image.
1129+
4e97e4e9 1130+ The core of TuxOnIce knows nothing about how or where pages are stored. We
1131+ therefore request the active allocator (remember you might have compiled in
24613191 1132+ more than one!) to allocate enough storage for our expected image size. If
1133+ this request cannot be fulfilled, we eat more memory and try again. If it
1134+ is fulfilled, we seek to allocate additional storage, just in case our
1135+ expected compression ratio (if any) isn't achieved. This time, however, we
1136+ just continue if we can't allocate enough storage.
1137+
4e97e4e9 1138+ If these calls to our allocator change the characteristics of the image
1139+ such that we haven't allocated enough memory, we also loop. (The allocator
1140+ may well need to allocate space for its storage information).
24613191 1141+
1142+ d. Write the first part of the image.
1143+
4e97e4e9 1144+ TuxOnIce stores the image in two sets of pages called 'pagesets'.
24613191 1145+ Pageset 2 contains pages on the active and inactive lists; essentially
1146+ the page cache. Pageset 1 contains all other pages, including the kernel.
1147+ We use two pagesets for one important reason: We need to make an atomic copy
1148+ of the kernel to ensure consistency of the image. Without a second pageset,
1149+ that would limit us to an image that was at most half the amount of memory
1150+ available. Using two pagesets allows us to store a full image. Since pageset
1151+ 2 pages won't be needed in saving pageset 1, we first save pageset 2 pages.
1152+ We can then make our atomic copy of the remaining pages using both pageset 2
1153+ pages and any other pages that are free. While saving both pagesets, we are
1154+ careful not to corrupt the image. Among other things, we use lowlevel block
1155+ I/O routines that don't change the pagecache contents.
1156+
1157+ The next step, then, is writing pageset 2.
1158+
1159+ e. Suspending drivers and storing processor context.
1160+
4e97e4e9 1161+ Having written pageset2, TuxOnIce calls the power management functions to
7f9d2ee0 1162+ notify drivers of the hibernation, and saves the processor state in preparation
24613191 1163+ for the atomic copy of memory we are about to make.
1164+
1165+ f. Atomic copy.
1166+
4e97e4e9 1167+ At this stage, everything else but the TuxOnIce code is halted. Processes
24613191 1168+ are frozen or idling, drivers are quiesced and have stored (ideally and where
1169+ necessary) their configuration in memory we are about to atomically copy.
1170+ In our lowlevel architecture specific code, we have saved the CPU state.
1171+ We can therefore now do our atomic copy before resuming drivers etc.
1172+
1173+ g. Save the atomic copy (pageset 1).
1174+
7f9d2ee0 1175+ TuxOnIce can then write the atomic copy of the remaining pages. Since we
24613191 1176+ have copied the pages into other locations, we can continue to use the
1177+ normal block I/O routines without fear of corrupting our image.
1178+
7f9d2ee0 1179+ h. Save the image header.
24613191 1180+
1181+ Nearly there! We save our settings and other parameters needed for
7f9d2ee0 1182+ reloading pageset 1 in an 'image header'. We also tell our allocator to
24613191 1183+ serialise its data at this stage, so that it can reread the image at resume
4e97e4e9 1184+ time.
24613191 1185+
24613191 1186+ i. Set the image header.
1187+
4e97e4e9 1188+ Finally, we edit the header at our resume= location. The signature is
1189+ changed by the allocator to reflect the fact that an image exists, and to
1190+ point to the start of that data if necessary (swap allocator).
24613191 1191+
1192+ j. Power down.
1193+
1194+ Or reboot if we're debugging and the appropriate option is selected.
1195+
1196+ Whew!
1197+
1198+ Reloading the image.
1199+ --------------------
1200+
1201+ Reloading the image is essentially the reverse of all the above. We load
1202+ our copy of pageset 1, being careful to choose locations that aren't going
1203+ to be overwritten as we copy it back (We start very early in the boot
1204+ process, so there are no other processes to quiesce here). We then copy
1205+ pageset 1 back to its original location in memory and restore the process
1206+ context. We are now running with the original kernel. Next, we reload the
4e97e4e9 1207+ pageset 2 pages, free the memory and swap used by TuxOnIce, restore
24613191 1208+ the pageset header and restart processes. Sounds easy in comparison to
7f9d2ee0 1209+ hibernating, doesn't it!
24613191 1210+
4e97e4e9 1211+ There is of course more to TuxOnIce than this, but this explanation
24613191 1212+ should be a good start. If there's interest, I'll write further
1213+ documentation on range pages and the low level I/O.
1214+
4e97e4e9 1215+11. Who wrote TuxOnIce?
24613191 1216+
1217+ (Answer based on the writings of Florent Chabaud, credits in files and
1218+ Nigel's limited knowledge; apologies to anyone missed out!)
1219+
4e97e4e9 1220+ The main developers of TuxOnIce have been...
24613191 1221+
1222+ Gabor Kuti
1223+ Pavel Machek
1224+ Florent Chabaud
1225+ Bernard Blackham
1226+ Nigel Cunningham
1227+
4e97e4e9 1228+ Significant portions of swsusp, the code in the vanilla kernel which
1229+ TuxOnIce enhances, have been worked on by Rafael Wysocki. Thanks should
1230+ also be expressed to him.
1231+
1232+ The above mentioned developers have been aided in their efforts by a host
1233+ of hundreds, if not thousands of testers and people who have submitted bug
1234+ fixes & suggestions. Of special note are the efforts of Michael Frank, who
7f9d2ee0 1235+ had his computers repeatedly hibernate and resume for literally tens of
4e97e4e9 1236+ thousands of cycles and developed scripts to stress the system and test
1237+ TuxOnIce far beyond the point most of us (Nigel included!) would consider
1238+ testing. His efforts have contributed as much to TuxOnIce as any of the
1239+ names above.
1240diff --git a/MAINTAINERS b/MAINTAINERS
7f9d2ee0 1241index e467758..1921b3a 100644
4e97e4e9 1242--- a/MAINTAINERS
1243+++ b/MAINTAINERS
7f9d2ee0 1244@@ -3935,6 +3935,13 @@ P: Maciej W. Rozycki
4e97e4e9 1245 M: macro@linux-mips.org
24613191 1246 S: Maintained
1247
4e97e4e9 1248+TUXONICE (ENHANCED HIBERNATION)
24613191 1249+P: Nigel Cunningham
4e97e4e9 1250+M: nigel@tuxonice.net
1251+L: suspend2-devel@tuxonice.net
1252+W: http://tuxonice.net
24613191 1253+S: Maintained
1254+
4e97e4e9 1255 U14-34F SCSI DRIVER
1256 P: Dario Ballabio
1257 M: ballabio_dario@emc.com
7f9d2ee0 1258diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
1259index ec08d83..c417de4 100644
1260--- a/arch/x86/mm/fault.c
1261+++ b/arch/x86/mm/fault.c
43540741 1262@@ -25,6 +25,7 @@
24613191 1263 #include <linux/kprobes.h>
1264 #include <linux/uaccess.h>
43540741 1265 #include <linux/kdebug.h>
24613191 1266+#include <linux/suspend.h>
1267
1268 #include <asm/system.h>
7f9d2ee0 1269 #include <asm/desc.h>
1270@@ -49,6 +50,11 @@
1271 #define PF_RSVD (1<<3)
1272 #define PF_INSTR (1<<4)
24613191 1273
7f9d2ee0 1274+#ifdef CONFIG_X86_32
ad8f4a28 1275+int toi_faulted;
7f9d2ee0 1276+EXPORT_SYMBOL_GPL(toi_faulted);
1277+#endif
24613191 1278+
ad8f4a28 1279 static inline int notify_page_fault(struct pt_regs *regs)
24613191 1280 {
7f9d2ee0 1281 #ifdef CONFIG_KPROBES
1282@@ -599,6 +605,22 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
24613191 1283
1284 si_code = SEGV_MAPERR;
1285
4e97e4e9 1286+ /* During a TuxOnIce atomic copy, with DEBUG_SLAB, we will
24613191 1287+ * get page faults where slab has been unmapped. Map them
4e97e4e9 1288+ * temporarily and set the variable that tells TuxOnIce to
24613191 1289+ * unmap afterwards.
1290+ */
1291+
7f9d2ee0 1292+#ifdef CONFIG_DEBUG_PAGEALLOC /* X86_32 only */
4e97e4e9 1293+ if (unlikely(toi_running && !toi_faulted)) {
24613191 1294+ struct page *page = NULL;
4e97e4e9 1295+ toi_faulted = 1;
24613191 1296+ page = virt_to_page(address);
1297+ kernel_map_pages(page, 1, 1);
1298+ return;
1299+ }
4e97e4e9 1300+#endif
24613191 1301+
7f9d2ee0 1302 if (notify_page_fault(regs))
1303 return;
24613191 1304
4e97e4e9 1305diff --git a/crypto/Kconfig b/crypto/Kconfig
7f9d2ee0 1306index 69f1be6..21c95e0 100644
4e97e4e9 1307--- a/crypto/Kconfig
1308+++ b/crypto/Kconfig
7f9d2ee0 1309@@ -529,6 +529,14 @@ config CRYPTO_DEFLATE
24613191 1310
1311 You will most probably want this if using IPSec.
1312
1313+config CRYPTO_LZF
1314+ tristate "LZF compression algorithm"
1315+ default y
1316+ select CRYPTO_ALGAPI
1317+ help
4e97e4e9 1318+ This is the LZF algorithm. It is especially useful for TuxOnIce,
24613191 1319+ because it achieves good compression quickly.
1320+
1321 config CRYPTO_MICHAEL_MIC
1322 tristate "Michael MIC keyed digest algorithm"
1323 select CRYPTO_ALGAPI
4e97e4e9 1324diff --git a/crypto/Makefile b/crypto/Makefile
7f9d2ee0 1325index 7cf3625..af17245 100644
4e97e4e9 1326--- a/crypto/Makefile
1327+++ b/crypto/Makefile
7f9d2ee0 1328@@ -60,6 +60,7 @@ obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
24613191 1329 obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
1330 obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
1331 obj-$(CONFIG_CRYPTO_CRC32C) += crc32c.o
1332+obj-$(CONFIG_CRYPTO_LZF) += lzf.o
ad8f4a28 1333 obj-$(CONFIG_CRYPTO_AUTHENC) += authenc.o
7f9d2ee0 1334 obj-$(CONFIG_CRYPTO_LZO) += lzo.o
24613191 1335
4e97e4e9 1336diff --git a/crypto/lzf.c b/crypto/lzf.c
1337new file mode 100644
7f9d2ee0 1338index 0000000..3e0aa8c
4e97e4e9 1339--- /dev/null
1340+++ b/crypto/lzf.c
7f9d2ee0 1341@@ -0,0 +1,326 @@
ad8f4a28 1342+/*
24613191 1343+ * Cryptoapi LZF compression module.
1344+ *
4e97e4e9 1345+ * Copyright (c) 2004-2005 Nigel Cunningham <nigel at tuxonice net>
24613191 1346+ *
1347+ * based on the deflate.c file:
ad8f4a28 1348+ *
24613191 1349+ * Copyright (c) 2003 James Morris <jmorris@intercode.com.au>
ad8f4a28 1350+ *
4e97e4e9 1351+ * and upon the LZF compression module donated to the TuxOnIce project with
24613191 1352+ * the following copyright:
1353+ *
1354+ * This program is free software; you can redistribute it and/or modify it
1355+ * under the terms of the GNU General Public License as published by the Free
ad8f4a28 1356+ * Software Foundation; either version 2 of the License, or (at your option)
24613191 1357+ * any later version.
1358+ * Copyright (c) 2000-2003 Marc Alexander Lehmann <pcg@goof.com>
ad8f4a28 1359+ *
24613191 1360+ * Redistribution and use in source and binary forms, with or without modifica-
1361+ * tion, are permitted provided that the following conditions are met:
ad8f4a28 1362+ *
24613191 1363+ * 1. Redistributions of source code must retain the above copyright notice,
1364+ * this list of conditions and the following disclaimer.
ad8f4a28 1365+ *
24613191 1366+ * 2. Redistributions in binary form must reproduce the above copyright
1367+ * notice, this list of conditions and the following disclaimer in the
1368+ * documentation and/or other materials provided with the distribution.
ad8f4a28 1369+ *
24613191 1370+ * 3. The name of the author may not be used to endorse or promote products
1371+ * derived from this software without specific prior written permission.
ad8f4a28 1372+ *
24613191 1373+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
1374+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
1375+ * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
1376+ * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE-
1377+ * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
1378+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
1379+ * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
1380+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH-
1381+ * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
1382+ * OF THE POSSIBILITY OF SUCH DAMAGE.
1383+ *
1384+ * Alternatively, the contents of this file may be used under the terms of
1385+ * the GNU General Public License version 2 (the "GPL"), in which case the
1386+ * provisions of the GPL are applicable instead of the above. If you wish to
1387+ * allow the use of your version of this file only under the terms of the
1388+ * GPL and not to allow others to use your version of this file under the
1389+ * BSD license, indicate your decision by deleting the provisions above and
1390+ * replace them with the notice and other provisions required by the GPL. If
1391+ * you do not delete the provisions above, a recipient may use your version
1392+ * of this file under either the BSD or the GPL.
1393+ */
1394+
1395+#include <linux/kernel.h>
1396+#include <linux/module.h>
1397+#include <linux/init.h>
1398+#include <linux/module.h>
1399+#include <linux/crypto.h>
1400+#include <linux/err.h>
1401+#include <linux/vmalloc.h>
1402+#include <asm/string.h>
1403+
1404+struct lzf_ctx {
1405+ void *hbuf;
1406+ unsigned int bufofs;
1407+};
1408+
1409+/*
1410+ * size of hashtable is (1 << hlog) * sizeof (char *)
1411+ * decompression is independent of the hash table size
1412+ * the difference between 15 and 14 is very small
1413+ * for small blocks (and 14 is also faster).
1414+ * For a low-memory configuration, use hlog == 13;
1415+ * For best compression, use 15 or 16.
1416+ */
ad8f4a28 1417+static const int hlog = 13;
24613191 1418+
1419+/*
1420+ * don't play with this unless you benchmark!
1421+ * decompression is not dependent on the hash function
1422+ * the hashing function might seem strange, just believe me
1423+ * it works ;)
1424+ */
1425+static inline u16 first(const u8 *p)
1426+{
1427+ return ((p[0]) << 8) + p[1];
1428+}
1429+
1430+static inline u16 next(u8 v, const u8 *p)
1431+{
1432+ return ((v) << 8) + p[2];
1433+}
1434+
1435+static inline u32 idx(unsigned int h)
1436+{
1437+ return (((h ^ (h << 5)) >> (3*8 - hlog)) + h*3) & ((1 << hlog) - 1);
1438+}
1439+
1440+/*
1441+ * IDX works because it is very similar to a multiplicative hash, e.g.
1442+ * (h * 57321 >> (3*8 - hlog))
1443+ * the next one is also quite good, albeit slow ;)
1444+ * (int)(cos(h & 0xffffff) * 1e6)
1445+ */
1446+
1447+static const int max_lit = (1 << 5);
1448+static const int max_off = (1 << 13);
1449+static const int max_ref = ((1 << 8) + (1 << 3));
1450+
1451+/*
1452+ * compressed format
1453+ *
1454+ * 000LLLLL <L+1> ; literal
1455+ * LLLOOOOO oooooooo ; backref L
1456+ * 111OOOOO LLLLLLLL oooooooo ; backref L+7
1457+ *
1458+ */
1459+
1460+static void lzf_compress_exit(struct crypto_tfm *tfm)
1461+{
1462+ struct lzf_ctx *ctx = crypto_tfm_ctx(tfm);
1463+
e8d0ad9d 1464+ if (!ctx->hbuf)
1465+ return;
1466+
1467+ vfree(ctx->hbuf);
1468+ ctx->hbuf = NULL;
24613191 1469+}
1470+
1471+static int lzf_compress_init(struct crypto_tfm *tfm)
1472+{
1473+ struct lzf_ctx *ctx = crypto_tfm_ctx(tfm);
1474+
1475+ /* Get LZF ready to go */
1476+ ctx->hbuf = vmalloc_32((1 << hlog) * sizeof(char *));
e8d0ad9d 1477+ if (ctx->hbuf)
1478+ return 0;
1479+
1480+ printk(KERN_WARNING "Failed to allocate %ld bytes for lzf workspace\n",
1481+ (long) ((1 << hlog) * sizeof(char *)));
1482+ return -ENOMEM;
24613191 1483+}
1484+
1485+static int lzf_compress(struct crypto_tfm *tfm, const u8 *in_data,
1486+ unsigned int in_len, u8 *out_data, unsigned int *out_len)
1487+{
1488+ struct lzf_ctx *ctx = crypto_tfm_ctx(tfm);
1489+ const u8 **htab = ctx->hbuf;
1490+ const u8 **hslot;
1491+ const u8 *ip = in_data;
1492+ u8 *op = out_data;
1493+ const u8 *in_end = ip + in_len;
1494+ u8 *out_end = op + *out_len - 3;
1495+ const u8 *ref;
1496+
1497+ unsigned int hval = first(ip);
1498+ unsigned long off;
1499+ int lit = 0;
1500+
1501+	memset(htab, 0, (1 << hlog) * sizeof(char *));
1502+
1503+ for (;;) {
1504+ if (ip < in_end - 2) {
1505+ hval = next(hval, ip);
1506+ hslot = htab + idx(hval);
1507+ ref = *hslot;
1508+ *hslot = ip;
1509+
1510+ off = ip - ref - 1;
1511+ if (off < max_off
24613191 1512+ && ip + 4 < in_end && ref > in_data
1513+ && *(u16 *) ref == *(u16 *) ip && ref[2] == ip[2]
1514+ ) {
1515+ /* match found at *ref++ */
1516+ unsigned int len = 2;
1517+ unsigned int maxlen = in_end - ip - len;
1518+ maxlen = maxlen > max_ref ? max_ref : maxlen;
1519+
1520+ do
1521+ len++;
1522+ while (len < maxlen && ref[len] == ip[len]);
1523+
1524+ if (op + lit + 1 + 3 >= out_end) {
1525+ *out_len = PAGE_SIZE;
1526+ return 0;
1527+ }
1528+
1529+ if (lit) {
1530+ *op++ = lit - 1;
1531+ lit = -lit;
ad8f4a28 1532+ do {
24613191 1533+ *op++ = ip[lit];
ad8f4a28 1534+ } while (++lit);
24613191 1535+ }
1536+
1537+ len -= 2;
1538+ ip++;
1539+
1540+ if (len < 7) {
1541+ *op++ = (off >> 8) + (len << 5);
1542+ } else {
1543+ *op++ = (off >> 8) + (7 << 5);
1544+ *op++ = len - 7;
1545+ }
1546+
1547+ *op++ = off;
1548+
1549+ ip += len;
1550+ hval = first(ip);
1551+ hval = next(hval, ip);
1552+ htab[idx(hval)] = ip;
1553+ ip++;
1554+ continue;
1555+ }
1556+ } else if (ip == in_end)
1557+ break;
1558+
1559+ /* one more literal byte we must copy */
1560+ lit++;
1561+ ip++;
1562+
1563+ if (lit == max_lit) {
1564+ if (op + 1 + max_lit >= out_end) {
1565+ *out_len = PAGE_SIZE;
1566+ return 0;
1567+ }
1568+
1569+ *op++ = max_lit - 1;
1570+ memcpy(op, ip - max_lit, max_lit);
1571+ op += max_lit;
1572+ lit = 0;
1573+ }
1574+ }
1575+
1576+ if (lit) {
1577+ if (op + lit + 1 >= out_end) {
1578+ *out_len = PAGE_SIZE;
1579+ return 0;
1580+ }
1581+
1582+ *op++ = lit - 1;
1583+ lit = -lit;
ad8f4a28 1584+ do {
24613191 1585+ *op++ = ip[lit];
ad8f4a28 1586+ } while (++lit);
24613191 1587+ }
1588+
1589+ *out_len = op - out_data;
1590+ return 0;
1591+}
1592+
1593+static int lzf_decompress(struct crypto_tfm *tfm, const u8 *src,
1594+ unsigned int slen, u8 *dst, unsigned int *dlen)
1595+{
1596+ u8 const *ip = src;
1597+ u8 *op = dst;
1598+ u8 const *const in_end = ip + slen;
1599+ u8 *const out_end = op + *dlen;
1600+
e8d0ad9d 1601+ *dlen = PAGE_SIZE;
24613191 1602+ do {
1603+ unsigned int ctrl = *ip++;
1604+
1605+ if (ctrl < (1 << 5)) {
1606+ /* literal run */
24613191 1607+ ctrl++;
1608+
e8d0ad9d 1609+ if (op + ctrl > out_end)
24613191 1610+ return 0;
24613191 1611+ memcpy(op, ip, ctrl);
1612+ op += ctrl;
1613+ ip += ctrl;
1614+ } else { /* back reference */
1615+
1616+ unsigned int len = ctrl >> 5;
1617+
1618+ u8 *ref = op - ((ctrl & 0x1f) << 8) - 1;
1619+
1620+ if (len == 7)
1621+ len += *ip++;
1622+
1623+ ref -= *ip++;
e8d0ad9d 1624+ len += 2;
24613191 1625+
e8d0ad9d 1626+ if (op + len > out_end || ref < (u8 *) dst)
24613191 1627+ return 0;
24613191 1628+
ad8f4a28 1629+ do {
24613191 1630+ *op++ = *ref++;
ad8f4a28 1631+ } while (--len);
24613191 1632+ }
7f9d2ee0 1633+ } while (op < out_end && ip < in_end);
24613191 1634+
1635+ *dlen = op - (u8 *) dst;
1636+ return 0;
1637+}
1638+
1639+static struct crypto_alg alg = {
1640+ .cra_name = "lzf",
1641+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
4e97e4e9 1642+ .cra_ctxsize = sizeof(struct lzf_ctx),
24613191 1643+ .cra_module = THIS_MODULE,
1644+ .cra_list = LIST_HEAD_INIT(alg.cra_list),
1645+ .cra_init = lzf_compress_init,
1646+ .cra_exit = lzf_compress_exit,
1647+ .cra_u = { .compress = {
1648+ .coa_compress = lzf_compress,
1649+ .coa_decompress = lzf_decompress } }
1650+};
1651+
1652+static int __init init(void)
1653+{
1654+ return crypto_register_alg(&alg);
1655+}
1656+
1657+static void __exit fini(void)
1658+{
1659+ crypto_unregister_alg(&alg);
1660+}
1661+
1662+module_init(init);
1663+module_exit(fini);
1664+
1665+MODULE_LICENSE("GPL");
1666+MODULE_DESCRIPTION("LZF Compression Algorithm");
1667+MODULE_AUTHOR("Marc Alexander Lehmann & Nigel Cunningham");
4e97e4e9 1668diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c
7f9d2ee0 1669index d6365a9..03e1eff 100644
4e97e4e9 1670--- a/drivers/macintosh/via-pmu.c
1671+++ b/drivers/macintosh/via-pmu.c
7f9d2ee0 1672@@ -39,7 +39,6 @@
24613191 1673 #include <linux/interrupt.h>
1674 #include <linux/device.h>
1675 #include <linux/sysdev.h>
1676-#include <linux/freezer.h>
1677 #include <linux/syscalls.h>
1678 #include <linux/suspend.h>
1679 #include <linux/cpu.h>
4e97e4e9 1680diff --git a/drivers/md/md.c b/drivers/md/md.c
7f9d2ee0 1681index 61ccbd2..5083aae 100644
4e97e4e9 1682--- a/drivers/md/md.c
1683+++ b/drivers/md/md.c
7f9d2ee0 1684@@ -5623,6 +5623,8 @@ void md_do_sync(mddev_t *mddev)
e8d0ad9d 1685 last_mark = next;
1686 }
1687
ad8f4a28 1688+ while (freezer_is_on())
e8d0ad9d 1689+ yield();
1690
7f9d2ee0 1691 if (kthread_should_stop())
1692 goto interrupted;
4e97e4e9 1693diff --git a/fs/buffer.c b/fs/buffer.c
7f9d2ee0 1694index 39ff144..915fc52 100644
4e97e4e9 1695--- a/fs/buffer.c
1696+++ b/fs/buffer.c
ad8f4a28 1697@@ -247,6 +247,93 @@ void thaw_bdev(struct block_device *bdev, struct super_block *sb)
4e97e4e9 1698 }
1699 EXPORT_SYMBOL(thaw_bdev);
24613191 1700
1701+/* #define DEBUG_FS_FREEZING */
1702+
4e97e4e9 1703+/**
1704+ * freeze_filesystems - lock all filesystems and force them into a consistent
1705+ * state
1706+ */
ad8f4a28 1707+void freeze_filesystems(int which)
4e97e4e9 1708+{
1709+ struct super_block *sb;
24613191 1710+
4e97e4e9 1711+ lockdep_off();
1712+
1713+ /*
1714+	 * Freeze in reverse order so filesystems dependent upon others are
1715+ * frozen in the right order (eg. loopback on ext3).
1716+ */
1717+ list_for_each_entry_reverse(sb, &super_blocks, s_list) {
1718+#ifdef DEBUG_FS_FREEZING
ad8f4a28 1719+ printk(KERN_INFO "Considering %s.%s: (root %p, bdev %x)",
4e97e4e9 1720+ sb->s_type->name ? sb->s_type->name : "?",
1721+ sb->s_subtype ? sb->s_subtype : "", sb->s_root,
1722+ sb->s_bdev ? sb->s_bdev->bd_dev : 0);
1723+#endif
1724+
1725+ if (sb->s_type->fs_flags & FS_IS_FUSE &&
1726+ sb->s_frozen == SB_UNFROZEN &&
1727+ which & FS_FREEZER_FUSE) {
4e97e4e9 1728+ sb->s_frozen = SB_FREEZE_TRANS;
1729+ sb->s_flags |= MS_FROZEN;
1730+			printk(KERN_INFO "Fuse filesystem done.\n");
1731+ continue;
1732+ }
1733+
1734+ if (!sb->s_root || !sb->s_bdev ||
1735+ (sb->s_frozen == SB_FREEZE_TRANS) ||
1736+ (sb->s_flags & MS_RDONLY) ||
1737+ (sb->s_flags & MS_FROZEN) ||
1738+ !(which & FS_FREEZER_NORMAL)) {
4e97e4e9 1739+#ifdef DEBUG_FS_FREEZING
ad8f4a28 1740+ printk(KERN_INFO "Nope.\n");
4e97e4e9 1741+#endif
1742+ continue;
1743+ }
1744+
1745+#ifdef DEBUG_FS_FREEZING
ad8f4a28 1746+ printk(KERN_INFO "Freezing %x... ", sb->s_bdev->bd_dev);
4e97e4e9 1747+#endif
1748+ freeze_bdev(sb->s_bdev);
1749+ sb->s_flags |= MS_FROZEN;
1750+#ifdef DEBUG_FS_FREEZING
ad8f4a28 1751+ printk(KERN_INFO "Done.\n");
4e97e4e9 1752+#endif
1753+ }
1754+
1755+ lockdep_on();
1756+}
1757+
1758+/**
1759+ * thaw_filesystems - unlock all filesystems
1760+ */
1761+void thaw_filesystems(int which)
1762+{
1763+ struct super_block *sb;
1764+
1765+ lockdep_off();
1766+
1767+ list_for_each_entry(sb, &super_blocks, s_list) {
1768+ if (!(sb->s_flags & MS_FROZEN))
1769+ continue;
ad8f4a28 1770+
4e97e4e9 1771+ if (sb->s_type->fs_flags & FS_IS_FUSE) {
ad8f4a28 1772+ if (!(which & FS_FREEZER_FUSE))
4e97e4e9 1773+ continue;
1774+
1775+ sb->s_frozen = SB_UNFROZEN;
1776+ } else {
ad8f4a28 1777+ if (!(which & FS_FREEZER_NORMAL))
4e97e4e9 1778+ continue;
1779+
1780+ thaw_bdev(sb->s_bdev, sb);
1781+ }
1782+ sb->s_flags &= ~MS_FROZEN;
1783+ }
1784+
1785+ lockdep_on();
1786+}
1787+
1788 /*
1789 * Various filesystems appear to want __find_get_block to be non-blocking.
1790 * But it's the page lock which protects the buffers. To get around this,
1791diff --git a/fs/fuse/control.c b/fs/fuse/control.c
1792index 105d4a2..57eeca4 100644
1793--- a/fs/fuse/control.c
1794+++ b/fs/fuse/control.c
1795@@ -207,6 +207,7 @@ static void fuse_ctl_kill_sb(struct super_block *sb)
1796 static struct file_system_type fuse_ctl_fs_type = {
1797 .owner = THIS_MODULE,
1798 .name = "fusectl",
1799+ .fs_flags = FS_IS_FUSE,
1800 .get_sb = fuse_ctl_get_sb,
1801 .kill_sb = fuse_ctl_kill_sb,
1802 };
1803diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
7f9d2ee0 1804index af63980..d417551 100644
4e97e4e9 1805--- a/fs/fuse/dev.c
1806+++ b/fs/fuse/dev.c
1807@@ -7,6 +7,7 @@
1808 */
24613191 1809
4e97e4e9 1810 #include "fuse_i.h"
1811+#include "fuse.h"
1812
1813 #include <linux/init.h>
1814 #include <linux/module.h>
1815@@ -16,6 +17,7 @@
1816 #include <linux/pagemap.h>
1817 #include <linux/file.h>
1818 #include <linux/slab.h>
1819+#include <linux/freezer.h>
1820
1821 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
1822
7f9d2ee0 1823@@ -723,6 +725,8 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
4e97e4e9 1824 if (!fc)
1825 return -EPERM;
1826
1827+ FUSE_MIGHT_FREEZE(file->f_mapping->host->i_sb, "fuse_dev_read");
1828+
1829 restart:
1830 spin_lock(&fc->lock);
1831 err = -EAGAIN;
7f9d2ee0 1832@@ -849,6 +853,9 @@ static ssize_t fuse_dev_write(struct kiocb *iocb, const struct iovec *iov,
4e97e4e9 1833 if (!fc)
1834 return -EPERM;
1835
1836+ FUSE_MIGHT_FREEZE(iocb->ki_filp->f_mapping->host->i_sb,
1837+ "fuse_dev_write");
1838+
1839 fuse_copy_init(&cs, fc, 0, NULL, iov, nr_segs);
1840 if (nbytes < sizeof(struct fuse_out_header))
1841 return -EINVAL;
1842diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
7f9d2ee0 1843index c4807b3..d8b4526 100644
4e97e4e9 1844--- a/fs/fuse/dir.c
1845+++ b/fs/fuse/dir.c
1846@@ -7,12 +7,14 @@
1847 */
1848
1849 #include "fuse_i.h"
1850+#include "fuse.h"
1851
1852 #include <linux/pagemap.h>
1853 #include <linux/file.h>
1854 #include <linux/gfp.h>
1855 #include <linux/sched.h>
1856 #include <linux/namei.h>
1857+#include <linux/freezer.h>
1858
1859 #if BITS_PER_LONG >= 64
1860 static inline void fuse_dentry_settime(struct dentry *entry, u64 time)
ad8f4a28 1861@@ -176,6 +178,9 @@ static int fuse_dentry_revalidate(struct dentry *entry, struct nameidata *nd)
4e97e4e9 1862 return 0;
1863
1864 fc = get_fuse_conn(inode);
1865+
1866+ FUSE_MIGHT_FREEZE(inode->i_sb, "fuse_dentry_revalidate");
1867+
1868 req = fuse_get_req(fc);
1869 if (IS_ERR(req))
1870 return 0;
ad8f4a28 1871@@ -271,6 +276,8 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
4e97e4e9 1872 if (IS_ERR(req))
7f9d2ee0 1873 return ERR_CAST(req);
4e97e4e9 1874
1875+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_lookup");
1876+
1877 forget_req = fuse_get_req(fc);
1878 if (IS_ERR(forget_req)) {
1879 fuse_put_request(fc, req);
ad8f4a28 1880@@ -361,6 +368,8 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry, int mode,
4e97e4e9 1881 if (IS_ERR(forget_req))
1882 return PTR_ERR(forget_req);
1883
1884+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_create_open");
1885+
1886 req = fuse_get_req(fc);
1887 err = PTR_ERR(req);
1888 if (IS_ERR(req))
7f9d2ee0 1889@@ -447,6 +456,8 @@ static int create_new_entry(struct fuse_conn *fc, struct fuse_req *req,
4e97e4e9 1890 int err;
1891 struct fuse_req *forget_req;
1892
1893+ FUSE_MIGHT_FREEZE(dir->i_sb, "create_new_entry");
1894+
1895 forget_req = fuse_get_req(fc);
1896 if (IS_ERR(forget_req)) {
1897 fuse_put_request(fc, req);
7f9d2ee0 1898@@ -544,7 +555,11 @@ static int fuse_mkdir(struct inode *dir, struct dentry *entry, int mode)
4e97e4e9 1899 {
1900 struct fuse_mkdir_in inarg;
1901 struct fuse_conn *fc = get_fuse_conn(dir);
1902- struct fuse_req *req = fuse_get_req(fc);
1903+ struct fuse_req *req;
1904+
1905+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_mkdir");
1906+
1907+ req = fuse_get_req(fc);
1908 if (IS_ERR(req))
1909 return PTR_ERR(req);
1910
7f9d2ee0 1911@@ -564,7 +579,11 @@ static int fuse_symlink(struct inode *dir, struct dentry *entry,
4e97e4e9 1912 {
1913 struct fuse_conn *fc = get_fuse_conn(dir);
1914 unsigned len = strlen(link) + 1;
1915- struct fuse_req *req = fuse_get_req(fc);
1916+ struct fuse_req *req;
1917+
1918+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_symlink");
1919+
1920+ req = fuse_get_req(fc);
1921 if (IS_ERR(req))
1922 return PTR_ERR(req);
1923
7f9d2ee0 1924@@ -581,7 +600,11 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
4e97e4e9 1925 {
1926 int err;
1927 struct fuse_conn *fc = get_fuse_conn(dir);
1928- struct fuse_req *req = fuse_get_req(fc);
1929+ struct fuse_req *req;
1930+
1931+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_unlink");
1932+
1933+ req = fuse_get_req(fc);
1934 if (IS_ERR(req))
1935 return PTR_ERR(req);
1936
7f9d2ee0 1937@@ -612,7 +635,11 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
4e97e4e9 1938 {
1939 int err;
1940 struct fuse_conn *fc = get_fuse_conn(dir);
1941- struct fuse_req *req = fuse_get_req(fc);
1942+ struct fuse_req *req;
1943+
1944+ FUSE_MIGHT_FREEZE(dir->i_sb, "fuse_rmdir");
1945+
1946+ req = fuse_get_req(fc);
1947 if (IS_ERR(req))
1948 return PTR_ERR(req);
1949
1950diff --git a/fs/fuse/file.c b/fs/fuse/file.c
7f9d2ee0 1951index 676b0bc..56bb4eb 100644
4e97e4e9 1952--- a/fs/fuse/file.c
1953+++ b/fs/fuse/file.c
1954@@ -7,11 +7,13 @@
1955 */
1956
1957 #include "fuse_i.h"
1958+#include "fuse.h"
1959
1960 #include <linux/pagemap.h>
1961 #include <linux/slab.h>
1962 #include <linux/kernel.h>
1963 #include <linux/sched.h>
1964+#include <linux/freezer.h>
1965
1966 static const struct file_operations fuse_direct_io_file_operations;
1967
1968@@ -23,6 +25,8 @@ static int fuse_send_open(struct inode *inode, struct file *file, int isdir,
1969 struct fuse_req *req;
1970 int err;
1971
1972+ FUSE_MIGHT_FREEZE(inode->i_sb, "fuse_send_open");
1973+
1974 req = fuse_get_req(fc);
1975 if (IS_ERR(req))
1976 return PTR_ERR(req);
7f9d2ee0 1977@@ -546,6 +550,8 @@ static int fuse_buffered_write(struct file *file, struct inode *inode,
4e97e4e9 1978 if (is_bad_inode(inode))
1979 return -EIO;
1980
1981+ FUSE_MIGHT_FREEZE(inode->i_sb, "fuse_commit_write");
1982+
1983 req = fuse_get_req(fc);
1984 if (IS_ERR(req))
1985 return PTR_ERR(req);
7f9d2ee0 1986@@ -639,6 +645,8 @@ static ssize_t fuse_direct_io(struct file *file, const char __user *buf,
4e97e4e9 1987 if (is_bad_inode(inode))
1988 return -EIO;
1989
1990+ FUSE_MIGHT_FREEZE(file->f_mapping->host->i_sb, "fuse_direct_io");
1991+
1992 req = fuse_get_req(fc);
1993 if (IS_ERR(req))
1994 return PTR_ERR(req);
7f9d2ee0 1995@@ -791,6 +799,8 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
4e97e4e9 1996 struct fuse_lk_out outarg;
1997 int err;
24613191 1998
4e97e4e9 1999+ FUSE_MIGHT_FREEZE(file->f_mapping->host->i_sb, "fuse_getlk");
2000+
2001 req = fuse_get_req(fc);
2002 if (IS_ERR(req))
2003 return PTR_ERR(req);
7f9d2ee0 2004@@ -821,6 +831,8 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
4e97e4e9 2005 if (fl->fl_flags & FL_CLOSE)
2006 return 0;
2007
2008+ FUSE_MIGHT_FREEZE(file->f_mapping->host->i_sb, "fuse_setlk");
2009+
2010 req = fuse_get_req(fc);
2011 if (IS_ERR(req))
2012 return PTR_ERR(req);
7f9d2ee0 2013@@ -885,6 +897,8 @@ static sector_t fuse_bmap(struct address_space *mapping, sector_t block)
4e97e4e9 2014 if (!inode->i_sb->s_bdev || fc->no_bmap)
2015 return 0;
2016
2017+ FUSE_MIGHT_FREEZE(inode->i_sb, "fuse_bmap");
2018+
2019 req = fuse_get_req(fc);
2020 if (IS_ERR(req))
2021 return 0;
2022diff --git a/fs/fuse/fuse.h b/fs/fuse/fuse.h
2023new file mode 100644
ad8f4a28 2024index 0000000..170e49a
4e97e4e9 2025--- /dev/null
2026+++ b/fs/fuse/fuse.h
ad8f4a28 2027@@ -0,0 +1,13 @@
4e97e4e9 2028+#define FUSE_MIGHT_FREEZE(superblock, desc) \
2029+do { \
2030+ int printed = 0; \
ad8f4a28 2031+ while (superblock->s_frozen != SB_UNFROZEN) { \
4e97e4e9 2032+ if (!printed) { \
2033+ printk(KERN_INFO "%d frozen in " desc ".\n", \
2034+ current->pid); \
4e97e4e9 2035+ printed = 1; \
2036+ } \
2037+ try_to_freeze(); \
2038+ yield(); \
2039+ } \
ad8f4a28 2040+} while (0)
4e97e4e9 2041diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
7f9d2ee0 2042index 033f7bd..ac6805a 100644
4e97e4e9 2043--- a/fs/fuse/inode.c
2044+++ b/fs/fuse/inode.c
ad8f4a28 2045@@ -702,7 +702,7 @@ static int fuse_get_sb(struct file_system_type *fs_type,
4e97e4e9 2046 static struct file_system_type fuse_fs_type = {
2047 .owner = THIS_MODULE,
2048 .name = "fuse",
2049- .fs_flags = FS_HAS_SUBTYPE,
2050+ .fs_flags = FS_HAS_SUBTYPE | FS_IS_FUSE,
2051 .get_sb = fuse_get_sb,
2052 .kill_sb = kill_anon_super,
2053 };
ad8f4a28 2054@@ -721,7 +721,7 @@ static struct file_system_type fuseblk_fs_type = {
4e97e4e9 2055 .name = "fuseblk",
2056 .get_sb = fuse_get_sb_blk,
2057 .kill_sb = kill_block_super,
2058- .fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE,
2059+ .fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_IS_FUSE,
2060 };
2061
2062 static inline int register_fuseblk(void)
2063diff --git a/fs/ioctl.c b/fs/ioctl.c
7f9d2ee0 2064index f32fbde..9014fe4 100644
4e97e4e9 2065--- a/fs/ioctl.c
2066+++ b/fs/ioctl.c
7f9d2ee0 2067@@ -211,3 +211,4 @@ asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
4e97e4e9 2068 out:
2069 return error;
2070 }
4e97e4e9 2071+EXPORT_SYMBOL(sys_ioctl);
ad8f4a28 2072diff --git a/fs/namei.c b/fs/namei.c
7f9d2ee0 2073index 8cf9bb9..c1abd13 100644
2074--- a/fs/namei.c
2075+++ b/fs/namei.c
7f9d2ee0 2076@@ -2177,6 +2177,8 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
2077 if (!dir->i_op || !dir->i_op->unlink)
2078 return -EPERM;
24613191 2079
ad8f4a28 2080+ vfs_check_frozen(dir->i_sb, SB_FREEZE_WRITE);
24613191 2081+
2082 DQUOT_INIT(dir);
2083
2084 mutex_lock(&dentry->d_inode->i_mutex);
4e97e4e9 2085diff --git a/include/asm-powerpc/suspend.h b/include/asm-powerpc/suspend.h
ad8f4a28 2086index cbf2c94..e0756c2 100644
4e97e4e9 2087--- a/include/asm-powerpc/suspend.h
2088+++ b/include/asm-powerpc/suspend.h
2089@@ -6,4 +6,7 @@ static inline int arch_prepare_suspend(void) { return 0; }
2090 void save_processor_state(void);
2091 void restore_processor_state(void);
2092
2093+#define toi_faulted (0)
ad8f4a28 2094+#define clear_toi_fault() do { } while (0)
4e97e4e9 2095+
2096 #endif /* __ASM_POWERPC_SUSPEND_H */
2097diff --git a/include/asm-ppc/suspend.h b/include/asm-ppc/suspend.h
ad8f4a28 2098index 3df9f32..1e2e73d 100644
4e97e4e9 2099--- a/include/asm-ppc/suspend.h
2100+++ b/include/asm-ppc/suspend.h
2101@@ -10,3 +10,6 @@ static inline void save_processor_state(void)
24613191 2102 static inline void restore_processor_state(void)
2103 {
2104 }
2105+
4e97e4e9 2106+#define toi_faulted (0)
2107+#define clear_toi_fault() do { } while (0)
2108diff --git a/include/asm-x86/suspend_32.h b/include/asm-x86/suspend_32.h
7f9d2ee0 2109index 1bbda3a..e58d1b5 100644
2110--- a/include/asm-x86/suspend_32.h
2111+++ b/include/asm-x86/suspend_32.h
2112@@ -8,6 +8,9 @@
2113
2114 static inline int arch_prepare_suspend(void) { return 0; }
2115
2116+extern int toi_faulted;
2117+#define clear_toi_fault() do { toi_faulted = 0; } while (0)
2118+
2119 /* image of the saved processor state */
2120 struct saved_context {
2121 u16 es, fs, gs, ss;
2122diff --git a/include/asm-x86/suspend_64.h b/include/asm-x86/suspend_64.h
7f9d2ee0 2123index 2eb92cb..2d429e7 100644
2124--- a/include/asm-x86/suspend_64.h
2125+++ b/include/asm-x86/suspend_64.h
2126@@ -15,6 +15,9 @@ arch_prepare_suspend(void)
24613191 2127 return 0;
2128 }
2129
4e97e4e9 2130+#define toi_faulted (0)
ad8f4a28 2131+#define clear_toi_fault() do { } while (0)
24613191 2132+
7f9d2ee0 2133 /*
2134 * Image of the saved processor state, used by the low level ACPI suspend to
2135 * RAM code and by the low level hibernation code.
4e97e4e9 2136diff --git a/include/linux/Kbuild b/include/linux/Kbuild
7f9d2ee0 2137index cedbbd8..7a731ba 100644
4e97e4e9 2138--- a/include/linux/Kbuild
2139+++ b/include/linux/Kbuild
ad8f4a28 2140@@ -202,6 +202,7 @@ unifdef-y += filter.h
4e97e4e9 2141 unifdef-y += flat.h
2142 unifdef-y += futex.h
2143 unifdef-y += fs.h
2144+unifdef-y += freezer.h
2145 unifdef-y += gameport.h
2146 unifdef-y += generic_serial.h
7f9d2ee0 2147 unifdef-y += gfs2_ondisk.h
4e97e4e9 2148diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
7f9d2ee0 2149index 932eb02..1436765 100644
4e97e4e9 2150--- a/include/linux/buffer_head.h
2151+++ b/include/linux/buffer_head.h
2152@@ -172,6 +172,11 @@ wait_queue_head_t *bh_waitq_head(struct buffer_head *bh);
2153 int fsync_bdev(struct block_device *);
2154 struct super_block *freeze_bdev(struct block_device *);
2155 void thaw_bdev(struct block_device *, struct super_block *);
2156+#define FS_FREEZER_FUSE 1
2157+#define FS_FREEZER_NORMAL 2
2158+#define FS_FREEZER_ALL (FS_FREEZER_FUSE | FS_FREEZER_NORMAL)
2159+void freeze_filesystems(int which);
4e97e4e9 2160+void thaw_filesystems(int which);
2161 int fsync_super(struct super_block *);
2162 int fsync_no_super(struct block_device *);
2163 struct buffer_head *__find_get_block(struct block_device *bdev, sector_t block,
2164diff --git a/include/linux/dyn_pageflags.h b/include/linux/dyn_pageflags.h
2165new file mode 100644
ad8f4a28 2166index 0000000..e85c3ee
4e97e4e9 2167--- /dev/null
2168+++ b/include/linux/dyn_pageflags.h
ad8f4a28 2169@@ -0,0 +1,66 @@
24613191 2170+/*
2171+ * include/linux/dyn_pageflags.h
2172+ *
4e97e4e9 2173+ * Copyright (C) 2004-2007 Nigel Cunningham <nigel at tuxonice net>
24613191 2174+ *
2175+ * This file is released under the GPLv2.
2176+ *
2177+ * It implements support for dynamically allocated bitmaps that are
2178+ * used for temporary or infrequently used pageflags, in lieu of
2179+ * bits in the struct page flags entry.
2180+ */
2181+
2182+#ifndef DYN_PAGEFLAGS_H
2183+#define DYN_PAGEFLAGS_H
2184+
2185+#include <linux/mm.h>
2186+
4e97e4e9 2187+struct dyn_pageflags {
2188+ unsigned long ****bitmap; /* [pg_dat][zone][page_num] */
2189+ int sparse, initialised;
2190+ struct list_head list;
2191+ spinlock_t struct_lock;
2192+};
24613191 2193+
4e97e4e9 2194+#define DYN_PAGEFLAGS_INIT(name) { \
2195+ .list = LIST_HEAD_INIT(name.list), \
2196+ .struct_lock = __SPIN_LOCK_UNLOCKED(name.lock) \
2197+}
24613191 2198+
4e97e4e9 2199+#define DECLARE_DYN_PAGEFLAGS(name) \
2200+ struct dyn_pageflags name = DYN_PAGEFLAGS_INIT(name);
24613191 2201+
2202+#define BITMAP_FOR_EACH_SET(BITMAP, CTR) \
2203+ for (CTR = get_next_bit_on(BITMAP, max_pfn + 1); CTR <= max_pfn; \
2204+ CTR = get_next_bit_on(BITMAP, CTR))
24613191 2205+
4e97e4e9 2206+extern void clear_dyn_pageflags(struct dyn_pageflags *pagemap);
2207+extern int allocate_dyn_pageflags(struct dyn_pageflags *pagemap, int sparse);
2208+extern void free_dyn_pageflags(struct dyn_pageflags *pagemap);
2209+extern unsigned long get_next_bit_on(struct dyn_pageflags *bitmap,
2210+ unsigned long counter);
24613191 2211+
4e97e4e9 2212+extern int test_dynpageflag(struct dyn_pageflags *bitmap, struct page *page);
2213+/*
2214+ * In sparse bitmaps, setting a flag can fail (we can fail to allocate
2215+ * the page to store the bit). If this happens, we will BUG(). If you don't
2216+ * want this behaviour, don't allocate sparse pageflags.
2217+ */
2218+extern void set_dynpageflag(struct dyn_pageflags *bitmap, struct page *page);
2219+extern void clear_dynpageflag(struct dyn_pageflags *bitmap, struct page *page);
2220+extern void dump_pagemap(struct dyn_pageflags *pagemap);
24613191 2221+
ad8f4a28 2222+/*
24613191 2223+ * With the above macros defined, you can do...
2224+ * #define PagePageset1(page) (test_dynpageflag(&pageset1_map, page))
2225+ * #define SetPagePageset1(page) (set_dynpageflag(&pageset1_map, page))
2226+ * #define ClearPagePageset1(page) (clear_dynpageflag(&pageset1_map, page))
2227+ */
2228+
4e97e4e9 2229+extern void __init dyn_pageflags_init(void);
2230+extern void __init dyn_pageflags_use_kzalloc(void);
2231+
2232+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
2233+extern void dyn_pageflags_hotplug(struct zone *zone);
2234+#endif
2235+#endif
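The header above declares a lazily-allocated bitmap: when a dyn_pageflags map is sparse, storage for a region is only allocated once a bit in it is first set. As a rough user-space illustration of that idea (the names, sizes, and flat two-level layout below are invented for the sketch, not the kernel code), it might look like:

```c
#include <stdlib.h>
#include <string.h>

/* Toy analogue of a sparse dyn_pageflags bitmap: leaf blocks are only
 * allocated when a bit in them is first set, so untouched regions cost
 * one NULL pointer instead of a whole block. */

#define LEAF_BITS   4096                 /* bits per leaf block */
#define LEAF_BYTES  (LEAF_BITS / 8)
#define NLEAVES     64                   /* capacity: 64 * 4096 bits */

struct sparse_bitmap {
	unsigned char *leaf[NLEAVES];    /* NULL until first set in range */
};

static int sbm_set(struct sparse_bitmap *b, unsigned long bit)
{
	unsigned long l = bit / LEAF_BITS;

	if (l >= NLEAVES)
		return -1;
	if (!b->leaf[l]) {
		/* Where the kernel's sparse path would BUG() on failure,
		 * this sketch just reports it. */
		b->leaf[l] = calloc(1, LEAF_BYTES);
		if (!b->leaf[l])
			return -1;
	}
	b->leaf[l][(bit % LEAF_BITS) / 8] |= 1u << (bit % 8);
	return 0;
}

static int sbm_test(const struct sparse_bitmap *b, unsigned long bit)
{
	unsigned long l = bit / LEAF_BITS;

	if (l >= NLEAVES || !b->leaf[l])
		return 0;                /* unallocated leaf == all clear */
	return !!(b->leaf[l][(bit % LEAF_BITS) / 8] & (1u << (bit % 8)));
}
```

Testing a bit in an unallocated leaf simply returns 0, which is what lets the real structure stand in for per-page flag bits without reserving memory for every page frame up front.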
2236diff --git a/include/linux/freezer.h b/include/linux/freezer.h
ad8f4a28 2237index 0893499..01e9dc6 100644
4e97e4e9 2238--- a/include/linux/freezer.h
2239+++ b/include/linux/freezer.h
ad8f4a28 2240@@ -127,6 +127,19 @@ static inline void set_freezable(void)
4e97e4e9 2241 current->flags &= ~PF_NOFREEZE;
43540741 2242 }
24613191 2243
2244+extern int freezer_state;
2245+#define FREEZER_OFF 0
4e97e4e9 2246+#define FREEZER_FILESYSTEMS_FROZEN 1
2247+#define FREEZER_USERSPACE_FROZEN 2
2248+#define FREEZER_FULLY_ON 3
24613191 2249+
2250+static inline int freezer_is_on(void)
2251+{
2252+ return (freezer_state == FREEZER_FULLY_ON);
2253+}
2254+
2255+extern void thaw_kernel_threads(void);
2256+
2257 /*
2258 * Freezer-friendly wrappers around wait_event_interruptible() and
2259 * wait_event_interruptible_timeout(), originally defined in <linux/wait.h>
2260@@ -169,6 +182,8 @@ static inline int freeze_processes(void) { BUG(); return 0; }
24613191 2261 static inline void thaw_processes(void) {}
2262
2263 static inline int try_to_freeze(void) { return 0; }
2264+static inline int freezer_is_on(void) { return 0; }
2265+static inline void thaw_kernel_threads(void) { }
2266
43540741 2267 static inline void freezer_do_not_count(void) {}
2268 static inline void freezer_count(void) {}
4e97e4e9 2269diff --git a/include/linux/fs.h b/include/linux/fs.h
7f9d2ee0 2270index b84b848..7c14b03 100644
4e97e4e9 2271--- a/include/linux/fs.h
2272+++ b/include/linux/fs.h
2273@@ -8,6 +8,7 @@
2274
2275 #include <linux/limits.h>
2276 #include <linux/ioctl.h>
2277+#include <linux/freezer.h>
2278
2279 /*
2280 * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
2281@@ -93,6 +94,7 @@ extern int dir_notify_enable;
2282 #define FS_REQUIRES_DEV 1
2283 #define FS_BINARY_MOUNTDATA 2
2284 #define FS_HAS_SUBTYPE 4
2285+#define FS_IS_FUSE 8 /* Fuse filesystem - bdev freeze these too */
2286 #define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */
2287 #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move()
2288 * during rename() internally.
7f9d2ee0 2289@@ -125,6 +127,7 @@ extern int dir_notify_enable;
4e97e4e9 2290 #define MS_RELATIME (1<<21) /* Update atime relative to mtime/ctime. */
ad8f4a28 2291 #define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
7f9d2ee0 2292 #define MS_I_VERSION (1<<23) /* Update inode I_version field */
2293+#define MS_FROZEN (1<<24) /* Frozen by freeze_filesystems() */
4e97e4e9 2294 #define MS_ACTIVE (1<<30)
2295 #define MS_NOUSER (1<<31)
2296
7f9d2ee0 2297@@ -1057,8 +1060,11 @@ enum {
4e97e4e9 2298 SB_FREEZE_TRANS = 2,
2299 };
2300
2301-#define vfs_check_frozen(sb, level) \
2302- wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
2303+#define vfs_check_frozen(sb, level) do { \
2304+ freezer_do_not_count(); \
2305+ wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level))); \
2306+ freezer_count(); \
ad8f4a28 2307+} while (0)
4e97e4e9 2308
2309 #define get_fs_excl() atomic_inc(&current->fs_excl)
2310 #define put_fs_excl() atomic_dec(&current->fs_excl)
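The reworked vfs_check_frozen() brackets its wait_event() with freezer_do_not_count()/freezer_count(), so a task sleeping on a frozen filesystem is skipped by the freezer instead of making try_to_freeze_tasks() time out. A toy single-threaded model of that flag protocol (the names mimic the kernel's, but the bodies are invented for illustration):

```c
/* Toy model of the freezer_do_not_count()/freezer_count() bracket the
 * patch puts around vfs_check_frozen()'s wait.  While the skip flag is
 * raised, the freezer treats the task as already frozen, so a sleeper
 * on a frozen filesystem cannot stall the freeze. */

static int freezer_skip;                 /* stands in for PF_FREEZER_SKIP */

static void freezer_do_not_count(void) { freezer_skip = 1; }
static void freezer_count(void)        { freezer_skip = 0; }

/* Would the freezer have to wait for this task right now? */
static int freezer_must_wait(void)
{
	return !freezer_skip;
}

static int vfs_check_frozen_sketch(void)
{
	int stalled;

	freezer_do_not_count();
	/* wait_event(sb->s_wait_unfrozen, ...) would sleep here */
	stalled = freezer_must_wait();   /* 0: freezer skips us while asleep */
	freezer_count();
	return stalled;
}
```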
2311diff --git a/include/linux/kernel.h b/include/linux/kernel.h
7f9d2ee0 2312index 2df44e7..4d22ae4 100644
4e97e4e9 2313--- a/include/linux/kernel.h
2314+++ b/include/linux/kernel.h
7f9d2ee0 2315@@ -151,6 +151,8 @@ extern int vsprintf(char *buf, const char *, va_list)
24613191 2316 __attribute__ ((format (printf, 2, 0)));
2317 extern int snprintf(char * buf, size_t size, const char * fmt, ...)
2318 __attribute__ ((format (printf, 3, 4)));
2319+extern int snprintf_used(char *buffer, int buffer_size,
2320+ const char *fmt, ...);
2321 extern int vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
2322 __attribute__ ((format (printf, 3, 0)));
2323 extern int scnprintf(char * buf, size_t size, const char * fmt, ...)
4e97e4e9 2324diff --git a/include/linux/netlink.h b/include/linux/netlink.h
7f9d2ee0 2325index fb0713b..2e14fde 100644
4e97e4e9 2326--- a/include/linux/netlink.h
2327+++ b/include/linux/netlink.h
73c609d5 2328@@ -24,6 +24,8 @@
24613191 2329 /* leave room for NETLINK_DM (DM Events) */
2330 #define NETLINK_SCSITRANSPORT 18 /* SCSI Transports */
73c609d5 2331 #define NETLINK_ECRYPTFS 19
2332+#define NETLINK_TOI_USERUI 20 /* TuxOnIce's userui */
2333+#define NETLINK_TOI_USM 21 /* Userspace storage manager */
24613191 2334
2335 #define MAX_LINKS 32
2336
4e97e4e9 2337diff --git a/include/linux/suspend.h b/include/linux/suspend.h
7f9d2ee0 2338index 1d7d4c5..bbbb987 100644
4e97e4e9 2339--- a/include/linux/suspend.h
2340+++ b/include/linux/suspend.h
7f9d2ee0 2341@@ -255,4 +255,69 @@ static inline void register_nosave_region_late(unsigned long b, unsigned long e)
4e97e4e9 2342 }
2343 #endif
24613191 2344
2345+enum {
4e97e4e9 2346+ TOI_CAN_HIBERNATE,
2347+ TOI_CAN_RESUME,
2348+ TOI_RESUME_DEVICE_OK,
2349+ TOI_NORESUME_SPECIFIED,
2350+ TOI_SANITY_CHECK_PROMPT,
2351+ TOI_CONTINUE_REQ,
2352+ TOI_RESUMED_BEFORE,
2353+ TOI_BOOT_TIME,
2354+ TOI_NOW_RESUMING,
2355+ TOI_IGNORE_LOGLEVEL,
2356+ TOI_TRYING_TO_RESUME,
2357+ TOI_LOADING_ALT_IMAGE,
2358+ TOI_STOP_RESUME,
2359+ TOI_IO_STOPPED,
2360+ TOI_NOTIFIERS_PREPARE,
ad8f4a28 2361+ TOI_CLUSTER_MODE,
24613191 2362+};
2363+
4e97e4e9 2364+#ifdef CONFIG_TOI
24613191 2365+
2366+/* Used in init dir files */
4e97e4e9 2367+extern unsigned long toi_state;
2368+#define set_toi_state(bit) (set_bit(bit, &toi_state))
2369+#define clear_toi_state(bit) (clear_bit(bit, &toi_state))
2370+#define test_toi_state(bit) (test_bit(bit, &toi_state))
2371+extern int toi_running;
2372+
2373+#else /* !CONFIG_TOI */
2374+
2375+#define toi_state (0)
ad8f4a28 2376+#define set_toi_state(bit) do { } while (0)
4e97e4e9 2377+#define clear_toi_state(bit) do { } while (0)
2378+#define test_toi_state(bit) (0)
2379+#define toi_running (0)
2380+#endif /* CONFIG_TOI */
2381+
2382+#ifdef CONFIG_HIBERNATION
2383+#ifdef CONFIG_TOI
2384+extern void toi_try_resume(void);
e8d0ad9d 2385+#else
ad8f4a28 2386+#define toi_try_resume() do { } while (0)
e8d0ad9d 2387+#endif
2388+
2389+extern int resume_attempted;
24613191 2390+extern int software_resume(void);
e8d0ad9d 2391+
2392+static inline void check_resume_attempted(void)
2393+{
2394+ if (resume_attempted)
2395+ return;
2396+
2397+ software_resume();
2398+}
2399+#else
ad8f4a28 2400+#define check_resume_attempted() do { } while (0)
e8d0ad9d 2401+#define resume_attempted (0)
24613191 2402+#endif
2403+
2404+#ifdef CONFIG_PRINTK_NOSAVE
2405+#define POSS_NOSAVE __nosavedata
2406+#else
2407+#define POSS_NOSAVE
2408+#endif
2409+
ad8f4a28 2410 #endif /* _LINUX_SUSPEND_H */
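The TOI_* enum above indexes bits in the single toi_state word, manipulated through the set/clear/test macros. In the kernel those expand to the atomic set_bit()/clear_bit()/test_bit(); outside it, the same pattern reduces to plain bit arithmetic, as this illustrative-only sketch shows:

```c
/* Plain-C version of the toi_state bit-flag pattern.  The kernel uses
 * atomic bitops; these macros keep only the bit arithmetic, which is
 * enough to show how the flags compose. */

enum {
	TOI_CAN_HIBERNATE,
	TOI_CAN_RESUME,
	TOI_NORESUME_SPECIFIED,
};

static unsigned long toi_state;

#define set_toi_state(bit)   (toi_state |=  1UL << (bit))
#define clear_toi_state(bit) (toi_state &= ~(1UL << (bit)))
#define test_toi_state(bit)  (!!(toi_state & (1UL << (bit))))
```

With CONFIG_TOI disabled, the patch makes these macros no-ops, so callers such as noresume_setup() compile away cleanly.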
4e97e4e9 2411diff --git a/include/linux/swap.h b/include/linux/swap.h
7f9d2ee0 2412index 878459a..55315f9 100644
4e97e4e9 2413--- a/include/linux/swap.h
2414+++ b/include/linux/swap.h
7f9d2ee0 2415@@ -164,6 +164,7 @@ extern unsigned long totalram_pages;
2416 extern unsigned long totalreserve_pages;
2417 extern long nr_swap_pages;
2418 extern unsigned int nr_free_buffer_pages(void);
2419+extern unsigned int nr_unallocated_buffer_pages(void);
2420 extern unsigned int nr_free_pagecache_pages(void);
2421
2422 /* Definition of global_page_state not available yet */
7f9d2ee0 2423@@ -187,6 +188,8 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
2424 gfp_t gfp_mask);
2425 extern int __isolate_lru_page(struct page *page, int mode);
24613191 2426 extern unsigned long shrink_all_memory(unsigned long nr_pages);
4e97e4e9 2427+extern void shrink_one_zone(struct zone *zone, unsigned long desired_size,
2428+ int ps_wanted);
24613191 2429 extern int vm_swappiness;
2430 extern int remove_mapping(struct address_space *mapping, struct page *page);
2431 extern long vm_total_pages;
7f9d2ee0 2432@@ -356,5 +359,10 @@ static inline swp_entry_t get_swap_page(void)
24613191 2433 #define disable_swap_token() do { } while(0)
2434
2435 #endif /* CONFIG_SWAP */
2436+
4e97e4e9 2437+/* For TuxOnIce - unlink LRU pages while saving separately */
24613191 2438+void unlink_lru_lists(void);
2439+void relink_lru_lists(void);
2440+
2441 #endif /* __KERNEL__*/
2442 #endif /* _LINUX_SWAP_H */
4e97e4e9 2443diff --git a/init/do_mounts.c b/init/do_mounts.c
7f9d2ee0 2444index 3885e70..131427c 100644
4e97e4e9 2445--- a/init/do_mounts.c
2446+++ b/init/do_mounts.c
7f9d2ee0 2447@@ -373,6 +373,8 @@ void __init prepare_namespace(void)
24613191 2448 if (is_floppy && rd_doload && rd_load_disk(0))
2449 ROOT_DEV = Root_RAM0;
2450
e8d0ad9d 2451+ check_resume_attempted();
2452+
2453 mount_root();
2454 out:
2455 sys_mount(".", "/", NULL, MS_MOVE, NULL);
4e97e4e9 2456diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
ad8f4a28 2457index 614241b..f3ea292 100644
4e97e4e9 2458--- a/init/do_mounts_initrd.c
2459+++ b/init/do_mounts_initrd.c
24613191 2460@@ -6,6 +6,7 @@
2461 #include <linux/romfs_fs.h>
2462 #include <linux/initrd.h>
2463 #include <linux/sched.h>
2464+#include <linux/suspend.h>
2465 #include <linux/freezer.h>
2466
2467 #include "do_mounts.h"
2468@@ -68,6 +69,11 @@ static void __init handle_initrd(void)
2469
2470 current->flags &= ~PF_FREEZER_SKIP;
24613191 2471
e8d0ad9d 2472+ if (!resume_attempted)
4e97e4e9 2473+ printk(KERN_ERR "TuxOnIce: No attempt was made to resume from "
73c609d5 2474+ "any image that might exist.\n");
4e97e4e9 2475+ clear_toi_state(TOI_BOOT_TIME);
24613191 2476+
2477 /* move initrd to rootfs' /old */
2478 sys_fchdir(old_fd);
2479 sys_mount("/", ".", NULL, MS_MOVE, NULL);
4e97e4e9 2480diff --git a/init/main.c b/init/main.c
7f9d2ee0 2481index 99ce949..f4f8330 100644
4e97e4e9 2482--- a/init/main.c
2483+++ b/init/main.c
ad8f4a28 2484@@ -56,6 +56,7 @@
24613191 2485 #include <linux/pid_namespace.h>
2486 #include <linux/device.h>
43540741 2487 #include <linux/kthread.h>
4e97e4e9 2488+#include <linux/dyn_pageflags.h>
ad8f4a28 2489 #include <linux/sched.h>
7f9d2ee0 2490 #include <linux/signal.h>
24613191 2491
ad8f4a28 2492@@ -572,6 +573,7 @@ asmlinkage void __init start_kernel(void)
4e97e4e9 2493 softirq_init();
2494 timekeeping_init();
2495 time_init();
2496+ dyn_pageflags_init();
2497 profile_init();
2498 if (!irqs_disabled())
2499 printk("start_kernel(): bug: interrupts were enabled early\n");
7f9d2ee0 2500@@ -610,6 +612,7 @@ asmlinkage void __init start_kernel(void)
2501 enable_debug_pagealloc();
2502 cpu_hotplug_init();
4e97e4e9 2503 kmem_cache_init();
2504+ dyn_pageflags_use_kzalloc();
2505 setup_per_cpu_pageset();
2506 numa_policy_init();
2507 if (late_time_init)
2508diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
7f9d2ee0 2509index 6233f3b..8bd74dc 100644
4e97e4e9 2510--- a/kernel/power/Kconfig
2511+++ b/kernel/power/Kconfig
2512@@ -44,6 +44,18 @@ config PM_VERBOSE
2513 ---help---
2514 This option enables verbose messages from the Power Management code.
24613191 2515
2516+config PRINTK_NOSAVE
e8d0ad9d 2517+ depends on PM && PM_DEBUG
24613191 2518+ bool "Preserve printk data from boot kernel when resuming."
2519+ default n
2520+ ---help---
2521+ This option gives printk data and the associated variables the
e8d0ad9d 2522+ attribute __nosave, which means that they will not be saved as
24613191 2523+ part of the image. The net effect is that after resuming, your
2524+ dmesg will show the messages from prior to the atomic restore,
2525+ instead of the messages from the resumed kernel. This may be
2526+ useful for debugging hibernation.
2527+
7f9d2ee0 2528 config CAN_PM_TRACE
2529 def_bool y
2530 depends on PM_DEBUG && PM_SLEEP && EXPERIMENTAL
2531@@ -178,6 +190,258 @@ config PM_STD_PARTITION
4e97e4e9 2532 suspended image to. It will simply pick the first available swap
2533 device.
2534
2535+menuconfig TOI_CORE
2536+ tristate "Enhanced Hibernation (TuxOnIce)"
2537+ depends on HIBERNATION
24613191 2538+ default y
2539+ ---help---
4e97e4e9 2540+ TuxOnIce is the 'new and improved' suspend support.
24613191 2541+
4e97e4e9 2542+ See the TuxOnIce home page (tuxonice.net)
24613191 2543+ for FAQs, HOWTOs and other documentation.
2544+
2545+ comment "Image Storage (you need at least one allocator)"
4e97e4e9 2546+ depends on TOI_CORE
24613191 2547+
4e97e4e9 2548+ config TOI_FILE
24613191 2549+ tristate "File Allocator"
4e97e4e9 2550+ depends on TOI_CORE
24613191 2551+ default y
2552+ ---help---
2553+ This option enables support for storing an image in a
2554+ simple file. This should be possible, but we're still
2555+ testing it.
2556+
4e97e4e9 2557+ config TOI_SWAP
24613191 2558+ tristate "Swap Allocator"
4e97e4e9 2559+ depends on TOI_CORE && SWAP
24613191 2560+ default y
24613191 2561+ ---help---
2562+ This option enables support for storing an image in your
2563+ swap space.
2564+
2565+ comment "General Options"
4e97e4e9 2566+ depends on TOI_CORE
24613191 2567+
2568+ config TOI_DEFAULT_PRE_HIBERNATE
2569+ string "Default pre-hibernate command"
2570+ depends on TOI_CORE
2571+ ---help---
2572+ This entry allows you to specify a command to be run prior
2573+ to starting a hibernation cycle. If this command returns
7f9d2ee0 2574+ a non-zero result code, hibernating will be aborted. If
2575+ you're starting hibernation via the hibernate script,
2576+ this value should probably be blank.
2577+
2578+ config TOI_DEFAULT_POST_HIBERNATE
2579+ string "Default post-resume command"
2580+ depends on TOI_CORE
2581+ ---help---
2582+ This entry allows you to specify a command to be run after
2583+ completing a hibernation cycle. The return code of this
7f9d2ee0 2584+ command is ignored. If you're starting hibernation via the
2585+ hibernate script, this value should probably be blank.
ad8f4a28 2586+
4e97e4e9 2587+ config TOI_CRYPTO
24613191 2588+ tristate "Compression support"
4e97e4e9 2589+ depends on TOI_CORE && CRYPTO
24613191 2590+ default y
2591+ ---help---
2592+ This option adds support for using cryptoapi compression
2593+ algorithms. Compression is particularly useful as
4e97e4e9 2594+ the LZF support that comes with the TuxOnIce patch can double
24613191 2595+ your suspend and resume speed.
2596+
2597+ You probably want this, so say Y here.
2598+
2599+ comment "No compression support available without Cryptoapi support."
4e97e4e9 2600+ depends on TOI_CORE && !CRYPTO
24613191 2601+
4e97e4e9 2602+ config TOI_USERUI
73c609d5 2603+ tristate "Userspace User Interface support"
4e97e4e9 2604+ depends on TOI_CORE && NET && (VT || SERIAL_CONSOLE)
73c609d5 2605+ default y
2606+ ---help---
2607+	  This option enables support for a userspace-based user interface
4e97e4e9 2608+ to TuxOnIce, which allows you to have a nice display while suspending
73c609d5 2609+ and resuming, and also enables features such as pressing escape to
2610+ cancel a cycle or interactive debugging.
2611+
4e97e4e9 2612+ config TOI_USERUI_DEFAULT_PATH
2613+ string "Default userui program location"
2614+ default "/usr/local/sbin/tuxonice_fbsplash"
2615+ depends on TOI_USERUI
24613191 2616+ ---help---
4e97e4e9 2617+ This entry allows you to specify a default path to the userui binary.
24613191 2618+
4e97e4e9 2619+ config TOI_KEEP_IMAGE
24613191 2620+ bool "Allow Keep Image Mode"
4e97e4e9 2621+ depends on TOI_CORE
24613191 2622+ ---help---
24613191 2623+	  This option allows you to keep an image and reuse it. It is intended
2624+ __ONLY__ for use with systems where all filesystems are mounted read-
2625+ only (kiosks, for example). To use it, compile this option in and boot
4e97e4e9 2626+ normally. Set the KEEP_IMAGE flag in /sys/power/tuxonice and suspend.
24613191 2627+ When you resume, the image will not be removed. You will be unable to turn
2628+ off swap partitions (assuming you are using the swap allocator), but future
2629+ suspends simply do a power-down. The image can be updated using the
2630+ kernel command line parameter suspend_act= to turn off the keep image
2631+ bit. Keep image mode is a little less user friendly on purpose - it
2632+ should not be used without thought!
2633+
4e97e4e9 2634+ config TOI_REPLACE_SWSUSP
24613191 2635+ bool "Replace swsusp by default"
2636+ default y
4e97e4e9 2637+ depends on TOI_CORE
24613191 2638+ ---help---
4e97e4e9 2639+ TuxOnIce can replace swsusp. This option makes that the default state,
2640+ requiring you to echo 0 > /sys/power/tuxonice/replace_swsusp if you want
24613191 2641+ to use the vanilla kernel functionality. Note that your initrd/ramfs will
2642+ need to do this before trying to resume, too.
2643+ With overriding swsusp enabled, echoing disk to /sys/power/state will
2644+ start a TuxOnIce cycle. If resume= doesn't specify an allocator and both
2645+ the swap and file allocators are compiled in, the swap allocator will be
2646+ used by default.
2647+
2648+ menuconfig TOI_CLUSTER
24613191 2649+ tristate "Cluster support"
2650+ default n
7f9d2ee0 2651+ depends on TOI_CORE && NET && BROKEN
24613191 2652+ ---help---
2653+ Support for linking multiple machines in a cluster so that they suspend
2654+ and resume together.
2655+
2656+ config TOI_DEFAULT_CLUSTER_INTERFACE
2657+ string "Default cluster interface"
4e97e4e9 2658+ depends on TOI_CLUSTER
24613191 2659+ ---help---
2660+ The default interface on which to communicate with other nodes in
2661+ the cluster.
2662+
24613191 2663+ If no value is set here, cluster support will be disabled by default.
2664+
2665+ config TOI_DEFAULT_CLUSTER_KEY
2666+ string "Default cluster key"
2667+ default "Default"
2668+ depends on TOI_CLUSTER
2669+ ---help---
2670+ The default key used by this node. All nodes in the same cluster
2671+ have the same key. Multiple clusters may coexist on the same lan
2672+ by using different values for this key.
2673+
2674+ config TOI_CLUSTER_IMAGE_TIMEOUT
2675+ int "Timeout when checking for image"
2676+ default 15
2677+ depends on TOI_CLUSTER
2678+ ---help---
2679+ Timeout (seconds) before continuing to boot when waiting to see
2680+ whether other nodes might have an image. Set to -1 to wait
2681+	  indefinitely. If WAIT_UNTIL_NODES is non-zero, we might continue
2682+ booting sooner than this timeout.
2683+
2684+ config TOI_CLUSTER_WAIT_UNTIL_NODES
2685+ int "Nodes without image before continuing"
2686+ default 0
2687+ depends on TOI_CLUSTER
2688+ ---help---
2689+ When booting and no image is found, we wait to see if other nodes
2690+ have an image before continuing to boot. This value lets us
2691+ continue after seeing a certain number of nodes without an image,
2692+ instead of continuing to wait for the timeout. Set to 0 to only
2693+ use the timeout.
2694+
2695+ config TOI_DEFAULT_CLUSTER_PRE_HIBERNATE
2696+ string "Default pre-hibernate script"
2697+ depends on TOI_CLUSTER
2698+ ---help---
2699+ The default script to be called when starting to hibernate.
2700+
2701+ config TOI_DEFAULT_CLUSTER_POST_HIBERNATE
2702+ string "Default post-hibernate script"
2703+ depends on TOI_CLUSTER
2704+ ---help---
2705+ The default script to be called after resuming from hibernation.
2706+
4e97e4e9 2707+ config TOI_CHECKSUM
24613191 2708+ bool "Checksum pageset2"
4e97e4e9 2709+ default y
2710+ depends on TOI_CORE
24613191 2711+ select CRYPTO
2712+ select CRYPTO_ALGAPI
4e97e4e9 2713+ select CRYPTO_MD4
24613191 2714+ ---help---
2715+ Adds support for checksumming pageset2 pages, to ensure you really get an
4e97e4e9 2716+ atomic copy. Since some filesystems (XFS especially) change metadata even
2717+ when there's no other activity, we need this to check for pages that have
2718+ been changed while we were saving the page cache. If your debugging output
2719+ always says no pages were resaved, you may be able to safely disable this
2720+ option.
2721+
2722+ config TOI_DEFAULT_WAIT
2723+ int "Default waiting time for emergency boot messages"
2724+ default "25"
2725+ range -1 32768
2726+ depends on TOI_CORE
2727+ help
2728+ TuxOnIce can display warnings very early in the process of resuming,
2729+ if (for example) it appears that you have booted a kernel that doesn't
2730+ match an image on disk. It can then give you the opportunity to either
2731+ continue booting that kernel, or reboot the machine. This option can be
2732+ used to control how long to wait in such circumstances. -1 means wait
2733+ forever. 0 means don't wait at all (do the default action, which will
2734+ generally be to continue booting and remove the image). Values of 1 or
2735+ more indicate a number of seconds (up to 255) to wait before doing the
2736+ default.
2737+
2738+ config TOI_PAGEFLAGS_TEST
2739+ tristate "Test pageflags"
2740+	  default n
2741+ depends on TOI_CORE
2742+ help
2743+ Test pageflags.
2744+
2745+config TOI_PAGEFLAGS_EXPORTS
2746+ bool
2747+ depends on TOI_PAGEFLAGS_TEST=m
2748+ default y
2749+
4e97e4e9 2750+config TOI_USERUI_EXPORTS
24613191 2751+ bool
4e97e4e9 2752+ depends on TOI_USERUI=m
24613191 2753+ default y
2754+
4e97e4e9 2755+config TOI_SWAP_EXPORTS
73c609d5 2756+ bool
4e97e4e9 2757+ depends on TOI_SWAP=m
73c609d5 2758+ default y
2759+
4e97e4e9 2760+config TOI_FILE_EXPORTS
24613191 2761+ bool
4e97e4e9 2762+ depends on TOI_FILE=m
24613191 2763+ default y
2764+
4e97e4e9 2765+config TOI_CRYPTO_EXPORTS
24613191 2766+ bool
4e97e4e9 2767+ depends on TOI_CRYPTO=m
24613191 2768+ default y
2769+
4e97e4e9 2770+config TOI_CORE_EXPORTS
24613191 2771+ bool
4e97e4e9 2772+ depends on TOI_CORE=m
24613191 2773+ default y
2774+
4e97e4e9 2775+config TOI_EXPORTS
24613191 2776+ bool
4e97e4e9 2777+ depends on TOI_SWAP_EXPORTS || TOI_FILE_EXPORTS || \
2778+ TOI_CRYPTO_EXPORTS || TOI_CLUSTER=m || \
ad8f4a28 2779+ TOI_USERUI_EXPORTS || TOI_PAGEFLAGS_EXPORTS
24613191 2780+ default y
2781+
4e97e4e9 2782+config TOI
24613191 2783+ bool
4e97e4e9 2784+ depends on TOI_CORE!=n
24613191 2785+ default y
2786+
4e97e4e9 2787 config APM_EMULATION
2788 tristate "Advanced Power Management Emulation"
2789 depends on PM && SYS_SUPPORTS_APM_EMULATION
2790diff --git a/kernel/power/Makefile b/kernel/power/Makefile
ad8f4a28 2791index f7dfff2..8ea53fa 100644
4e97e4e9 2792--- a/kernel/power/Makefile
2793+++ b/kernel/power/Makefile
ad8f4a28 2794@@ -5,6 +5,37 @@ endif
24613191 2795
4e97e4e9 2796 obj-y := main.o
24613191 2797 obj-$(CONFIG_PM_LEGACY) += pm.o
24613191 2798+
4e97e4e9 2799+tuxonice_core-objs := tuxonice_modules.o tuxonice_sysfs.o tuxonice_highlevel.o \
2800+ tuxonice_io.o tuxonice_pagedir.o tuxonice_prepare_image.o \
2801+ tuxonice_extent.o tuxonice_pageflags.o tuxonice_ui.o \
2802+ tuxonice_power_off.o tuxonice_atomic_copy.o
24613191 2803+
4e97e4e9 2804+obj-$(CONFIG_TOI) += tuxonice_builtin.o
24613191 2805+
2806+ifdef CONFIG_PM_DEBUG
2807+tuxonice_core-objs += tuxonice_alloc.o
2808+endif
2809+
4e97e4e9 2810+ifdef CONFIG_TOI_CHECKSUM
2811+tuxonice_core-objs += tuxonice_checksum.o
24613191 2812+endif
2813+
2814+ifdef CONFIG_NET
4e97e4e9 2815+tuxonice_core-objs += tuxonice_storage.o tuxonice_netlink.o
24613191 2816+endif
2817+
4e97e4e9 2818+obj-$(CONFIG_TOI_CORE) += tuxonice_core.o
2819+obj-$(CONFIG_TOI_CRYPTO) += tuxonice_compress.o
24613191 2820+
4e97e4e9 2821+obj-$(CONFIG_TOI_SWAP) += tuxonice_block_io.o tuxonice_swap.o
2822+obj-$(CONFIG_TOI_FILE) += tuxonice_block_io.o tuxonice_file.o
2823+obj-$(CONFIG_TOI_CLUSTER) += tuxonice_cluster.o
24613191 2824+
4e97e4e9 2825+obj-$(CONFIG_TOI_USERUI) += tuxonice_userui.o
2826+
2827+obj-$(CONFIG_TOI_PAGEFLAGS_TEST) += toi_pageflags_test.o
73c609d5 2828+
4e97e4e9 2829 obj-$(CONFIG_PM_SLEEP) += process.o console.o
2830 obj-$(CONFIG_HIBERNATION) += swsusp.o disk.o snapshot.o swap.o user.o
24613191 2831
4e97e4e9 2832diff --git a/kernel/power/disk.c b/kernel/power/disk.c
7f9d2ee0 2833index 14a656c..141606e 100644
4e97e4e9 2834--- a/kernel/power/disk.c
2835+++ b/kernel/power/disk.c
7f9d2ee0 2836@@ -24,9 +24,11 @@
4e97e4e9 2837
2838 #include "power.h"
2839
2840+#include "tuxonice.h"
2841+#include "tuxonice_builtin.h"
2842
2843 static int noresume = 0;
7f9d2ee0 2844-static char resume_file[256] = CONFIG_PM_STD_PARTITION;
2845+char resume_file[256] = CONFIG_PM_STD_PARTITION;
2846 dev_t swsusp_resume_device;
2847 sector_t swsusp_resume_block;
2848
2849@@ -104,7 +106,7 @@ static int hibernation_test(int level) { return 0; }
2850 * hibernation
2851 */
2852
7f9d2ee0 2853-static int platform_begin(int platform_mode)
2854+int platform_begin(int platform_mode)
2855 {
2856 return (platform_mode && hibernation_ops) ?
7f9d2ee0 2857 hibernation_ops->begin() : 0;
2858@@ -115,7 +117,7 @@ static int platform_begin(int platform_mode)
2859 * working state
2860 */
2861
2862-static void platform_end(int platform_mode)
2863+void platform_end(int platform_mode)
2864 {
2865 if (platform_mode && hibernation_ops)
2866 hibernation_ops->end();
2867@@ -126,7 +128,7 @@ static void platform_end(int platform_mode)
2868 * platform driver if so configured and return an error code if it fails
2869 */
4e97e4e9 2870
2871-static int platform_pre_snapshot(int platform_mode)
2872+int platform_pre_snapshot(int platform_mode)
2873 {
2874 return (platform_mode && hibernation_ops) ?
2875 hibernation_ops->pre_snapshot() : 0;
7f9d2ee0 2876@@ -137,7 +139,7 @@ static int platform_pre_snapshot(int platform_mode)
2877 * of operation using the platform driver (called with interrupts disabled)
2878 */
4e97e4e9 2879
2880-static void platform_leave(int platform_mode)
2881+void platform_leave(int platform_mode)
2882 {
2883 if (platform_mode && hibernation_ops)
2884 hibernation_ops->leave();
7f9d2ee0 2885@@ -148,7 +150,7 @@ static void platform_leave(int platform_mode)
2886 * using the platform driver (must be called after platform_prepare())
2887 */
4e97e4e9 2888
2889-static void platform_finish(int platform_mode)
2890+void platform_finish(int platform_mode)
2891 {
2892 if (platform_mode && hibernation_ops)
2893 hibernation_ops->finish();
7f9d2ee0 2894@@ -160,7 +162,7 @@ static void platform_finish(int platform_mode)
ad8f4a28
AM
2895 * called, platform_restore_cleanup() must be called.
2896 */
2897
2898-static int platform_pre_restore(int platform_mode)
2899+int platform_pre_restore(int platform_mode)
2900 {
2901 return (platform_mode && hibernation_ops) ?
2902 hibernation_ops->pre_restore() : 0;
7f9d2ee0 2903@@ -173,7 +175,7 @@ static int platform_pre_restore(int platform_mode)
ad8f4a28
AM
2904 * regardless of the result of platform_pre_restore().
2905 */
2906
2907-static void platform_restore_cleanup(int platform_mode)
2908+void platform_restore_cleanup(int platform_mode)
2909 {
2910 if (platform_mode && hibernation_ops)
2911 hibernation_ops->restore_cleanup();
7f9d2ee0 2912@@ -477,6 +479,11 @@ int hibernate(void)
4e97e4e9 2913 {
2914 int error;
2915
2916+#ifdef CONFIG_TOI
2917+ if (test_action_state(TOI_REPLACE_SWSUSP))
2918+ return toi_try_hibernate(1);
2919+#endif
24613191 2920+
4e97e4e9 2921 mutex_lock(&pm_mutex);
2922 /* The snapshot device should not be opened while we're running */
2923 if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
7f9d2ee0 2924@@ -549,10 +556,21 @@ int hibernate(void)
4e97e4e9 2925 *
2926 */
2927
2928-static int software_resume(void)
2929+int software_resume(void)
2930 {
2931 int error;
2932 unsigned int flags;
4e97e4e9 2933+ resume_attempted = 1;
24613191 2934+
4e97e4e9 2935+#ifdef CONFIG_TOI
ad8f4a28 2936+ /*
4e97e4e9 2937+ * We can't know (until an image header - if any - is loaded), whether
2938+ * we did override swsusp. We therefore ensure that both are tried.
2939+ */
2940+ if (test_action_state(TOI_REPLACE_SWSUSP))
ad8f4a28 2941+ printk(KERN_INFO "Replacing swsusp.\n");
4e97e4e9 2942+ toi_try_resume();
2943+#endif
2944
2945 /*
2946 * name_to_dev_t() below takes a sysfs buffer mutex when sysfs
7f9d2ee0 2947@@ -565,6 +583,7 @@ static int software_resume(void)
2948 * here to avoid lockdep complaining.
2949 */
2950 mutex_lock_nested(&pm_mutex, SINGLE_DEPTH_NESTING);
24613191 2951+
4e97e4e9 2952 if (!swsusp_resume_device) {
2953 if (!strlen(resume_file)) {
ad8f4a28 2954 mutex_unlock(&pm_mutex);
7f9d2ee0 2955@@ -636,9 +655,6 @@ static int software_resume(void)
4e97e4e9 2956 return error;
2957 }
2958
2959-late_initcall(software_resume);
2960-
2961-
2962 static const char * const hibernation_modes[] = {
2963 [HIBERNATION_PLATFORM] = "platform",
2964 [HIBERNATION_SHUTDOWN] = "shutdown",
7f9d2ee0 2965@@ -851,6 +867,7 @@ static int __init resume_offset_setup(char *str)
4e97e4e9 2966 static int __init noresume_setup(char *str)
2967 {
2968 noresume = 1;
2969+ set_toi_state(TOI_NORESUME_SPECIFIED);
2970 return 1;
2971 }
2972
2973diff --git a/kernel/power/power.h b/kernel/power/power.h
7f9d2ee0 2974index 700f44e..fdc558a 100644
4e97e4e9 2975--- a/kernel/power/power.h
2976+++ b/kernel/power/power.h
7f9d2ee0 2977@@ -1,7 +1,16 @@
24613191 2978+/*
4e97e4e9 2979+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 2980+ */
2981+
2982+#ifndef KERNEL_POWER_POWER_H
2983+#define KERNEL_POWER_POWER_H
24613191 2984+
4e97e4e9 2985 #include <linux/suspend.h>
7f9d2ee0 2986 #include <linux/suspend_ioctls.h>
4e97e4e9 2987 #include <linux/utsname.h>
7f9d2ee0 2988 #include <linux/freezer.h>
4e97e4e9 2989+#include "tuxonice.h"
2990+#include "tuxonice_builtin.h"
2991
2992 struct swsusp_info {
2993 struct new_utsname uts;
7f9d2ee0 2994@@ -21,18 +30,22 @@ struct swsusp_info {
2995 extern int arch_hibernation_header_save(void *addr, unsigned int max_size);
2996 extern int arch_hibernation_header_restore(void *addr);
2997
2998-static inline int init_header_complete(struct swsusp_info *info)
2999+static inline int init_swsusp_header_complete(struct swsusp_info *info)
3000 {
3001 return arch_hibernation_header_save(info, MAX_ARCH_HEADER_SIZE);
3002 }
4e97e4e9 3003
3004-static inline char *check_image_kernel(struct swsusp_info *info)
3005+static inline char *check_swsusp_image_kernel(struct swsusp_info *info)
3006 {
3007 return arch_hibernation_header_restore(info) ?
3008 "architecture specific data" : NULL;
3009 }
3010+#else
3011+extern char *check_swsusp_image_kernel(struct swsusp_info *info);
3012 #endif /* CONFIG_ARCH_HIBERNATION_HEADER */
3013+extern int init_swsusp_header(struct swsusp_info *info);
4e97e4e9 3014
4e97e4e9 3015+extern char resume_file[256];
3016 /*
3017 * Keep some memory free so that I/O operations can succeed without paging
3018 * [Might this be more than 4 MB?]
7f9d2ee0 3019@@ -65,6 +78,8 @@ static struct kobj_attribute _name##_attr = { \
3020 .store = _name##_store, \
3021 }
4e97e4e9 3022
3023+extern struct pbe *restore_pblist;
3024+
3025 /* Preferred image size in bytes (default 500 MB) */
3026 extern unsigned long image_size;
3027 extern int in_suspend;
7f9d2ee0 3028@@ -225,3 +240,26 @@ static inline void suspend_thaw_processes(void)
3029 {
4e97e4e9 3030 }
7f9d2ee0 3031 #endif
4e97e4e9 3032+
3033+extern struct page *saveable_page(unsigned long pfn);
24613191 3034+#ifdef CONFIG_HIGHMEM
4e97e4e9 3035+extern struct page *saveable_highmem_page(unsigned long pfn);
3036+#else
3037+static inline void *saveable_highmem_page(unsigned long pfn) { return NULL; }
3038+#endif
24613191 3039+
4e97e4e9 3040+#define PBES_PER_PAGE (PAGE_SIZE / sizeof(struct pbe))
3041+extern struct list_head nosave_regions;
3042+
3043+/**
3044+ * This structure represents a range of page frames the contents of which
3045+ * should not be saved during the suspend.
3046+ */
3047+
3048+struct nosave_region {
3049+ struct list_head list;
3050+ unsigned long start_pfn;
3051+ unsigned long end_pfn;
3052+};
3053+
3054+#endif
4e97e4e9 3055diff --git a/kernel/power/process.c b/kernel/power/process.c
7f9d2ee0 3056index f1d0b34..b835412 100644
4e97e4e9 3057--- a/kernel/power/process.c
3058+++ b/kernel/power/process.c
ad8f4a28 3059@@ -13,6 +13,10 @@
4e97e4e9 3060 #include <linux/module.h>
3061 #include <linux/syscalls.h>
3062 #include <linux/freezer.h>
3063+#include <linux/buffer_head.h>
24613191 3064+
3065+int freezer_state;
3066+EXPORT_SYMBOL(freezer_state);
4e97e4e9 3067
3068 /*
3069 * Timeout for stopping processes
3070@@ -74,6 +78,7 @@ void refrigerator(void)
3071 pr_debug("%s left refrigerator\n", current->comm);
3072 __set_current_state(save);
3073 }
3074+EXPORT_SYMBOL(refrigerator);
3075
7f9d2ee0 3076 static void fake_signal_wake_up(struct task_struct *p)
ad8f4a28 3077 {
7f9d2ee0 3078@@ -214,7 +219,8 @@ static int try_to_freeze_tasks(int freeze_user_space)
4e97e4e9 3079 do_each_thread(g, p) {
3080 task_lock(p);
3081 if (freezing(p) && !freezer_should_skip(p))
3082- printk(KERN_ERR " %s\n", p->comm);
3083+ printk(KERN_ERR " %s (%d) failed to freeze.\n",
3084+ p->comm, p->pid);
4e97e4e9 3085 cancel_freezing(p);
3086 task_unlock(p);
3087 } while_each_thread(g, p);
7f9d2ee0 3088@@ -234,17 +240,25 @@ int freeze_processes(void)
4e97e4e9 3089 {
3090 int error;
3091
3092- printk("Freezing user space processes ... ");
3093+ printk(KERN_INFO "Stopping fuse filesystems.\n");
3094+ freeze_filesystems(FS_FREEZER_FUSE);
4e97e4e9 3095+ freezer_state = FREEZER_FILESYSTEMS_FROZEN;
ad8f4a28 3096+ printk(KERN_INFO "Freezing user space processes ... ");
4e97e4e9 3097 error = try_to_freeze_tasks(FREEZER_USER_SPACE);
3098 if (error)
3099 goto Exit;
3100- printk("done.\n");
3101+ printk(KERN_INFO "done.\n");
4e97e4e9 3102
3103- printk("Freezing remaining freezable tasks ... ");
3104+ sys_sync();
3105+ printk(KERN_INFO "Stopping normal filesystems.\n");
3106+ freeze_filesystems(FS_FREEZER_NORMAL);
4e97e4e9 3107+ freezer_state = FREEZER_USERSPACE_FROZEN;
ad8f4a28 3108+ printk(KERN_INFO "Freezing remaining freezable tasks ... ");
4e97e4e9 3109 error = try_to_freeze_tasks(FREEZER_KERNEL_THREADS);
3110 if (error)
ad8f4a28 3111 goto Exit;
7f9d2ee0 3112 printk("done.");
4e97e4e9 3113+ freezer_state = FREEZER_FULLY_ON;
ad8f4a28 3114 Exit:
4e97e4e9 3115 BUG_ON(in_atomic());
ad8f4a28 3116 printk("\n");
7f9d2ee0 3117@@ -270,11 +284,33 @@ static void thaw_tasks(int thaw_user_space)
4e97e4e9 3118
3119 void thaw_processes(void)
3120 {
3121- printk("Restarting tasks ... ");
3122- thaw_tasks(FREEZER_KERNEL_THREADS);
4e97e4e9 3123+ int old_state = freezer_state;
24613191 3124+
4e97e4e9 3125+ if (old_state == FREEZER_OFF)
24613191 3126+ return;
3127+
ad8f4a28 3128+ /*
4e97e4e9 3129+ * Change state beforehand because thawed tasks might submit I/O
3130+ * immediately.
3131+ */
3132+ freezer_state = FREEZER_OFF;
24613191 3133+
3134+ printk(KERN_INFO "Restarting all filesystems ...\n");
3135+ thaw_filesystems(FS_FREEZER_ALL);
24613191 3136+
ad8f4a28 3137+ printk(KERN_INFO "Restarting tasks ... ");
24613191 3138+
4e97e4e9 3139+ if (old_state == FREEZER_FULLY_ON)
3140+ thaw_tasks(FREEZER_KERNEL_THREADS);
3141 thaw_tasks(FREEZER_USER_SPACE);
3142 schedule();
3143 printk("done.\n");
3144 }
3145
ad8f4a28 3146-EXPORT_SYMBOL(refrigerator);
4e97e4e9 3147+void thaw_kernel_threads(void)
3148+{
3149+ freezer_state = FREEZER_USERSPACE_FROZEN;
3150+ printk(KERN_INFO "Restarting normal filesystems.\n");
3151+ thaw_filesystems(FS_FREEZER_NORMAL);
4e97e4e9 3152+ thaw_tasks(FREEZER_KERNEL_THREADS);
3153+}
4e97e4e9 3154diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
7f9d2ee0 3155index 5f91a07..816b60c 100644
4e97e4e9 3156--- a/kernel/power/snapshot.c
3157+++ b/kernel/power/snapshot.c
3158@@ -33,6 +33,7 @@
3159 #include <asm/io.h>
3160
3161 #include "power.h"
3162+#include "tuxonice_builtin.h"
3163
3164 static int swsusp_page_is_free(struct page *);
3165 static void swsusp_set_page_forbidden(struct page *);
3166@@ -44,6 +45,13 @@ static void swsusp_unset_page_forbidden(struct page *);
3167 * directly to their "original" page frames.
3168 */
3169 struct pbe *restore_pblist;
3170+int resume_attempted;
3171+EXPORT_SYMBOL_GPL(resume_attempted);
24613191 3172+
4e97e4e9 3173+#ifdef CONFIG_TOI
3174+#include "tuxonice_pagedir.h"
3175+int toi_post_context_save(void);
3176+#endif
3177
3178 /* Pointer to an auxiliary buffer (1 page) */
3179 static void *buffer;
3180@@ -86,6 +94,11 @@ static void *get_image_page(gfp_t gfp_mask, int safe_needed)
3181
3182 unsigned long get_safe_page(gfp_t gfp_mask)
3183 {
3184+#ifdef CONFIG_TOI
3185+ if (toi_running)
3186+ return toi_get_nonconflicting_page();
3187+#endif
24613191 3188+
4e97e4e9 3189 return (unsigned long)get_image_page(gfp_mask, PG_SAFE);
3190 }
3191
7f9d2ee0 3192@@ -607,18 +620,8 @@ static unsigned long memory_bm_next_pfn(struct memory_bitmap *bm)
3193 return bb->start_pfn + chunk * BM_BITS_PER_CHUNK + bit;
3194 }
4e97e4e9 3195
3196-/**
3197- * This structure represents a range of page frames the contents of which
3198- * should not be saved during the suspend.
3199- */
3200-
3201-struct nosave_region {
3202- struct list_head list;
3203- unsigned long start_pfn;
3204- unsigned long end_pfn;
3205-};
3206-
4e97e4e9 3207-static LIST_HEAD(nosave_regions);
3208+LIST_HEAD(nosave_regions);
4e97e4e9 3209+EXPORT_SYMBOL_GPL(nosave_regions);
3210
3211 /**
3212 * register_nosave_region - register a range of page frames the contents
7f9d2ee0 3213@@ -855,7 +858,7 @@ static unsigned int count_free_highmem_pages(void)
4e97e4e9 3214 * and it isn't a part of a free chunk of pages.
3215 */
3216
3217-static struct page *saveable_highmem_page(unsigned long pfn)
3218+struct page *saveable_highmem_page(unsigned long pfn)
3219 {
3220 struct page *page;
3221
7f9d2ee0 3222@@ -897,8 +900,6 @@ unsigned int count_highmem_pages(void)
3223 }
4e97e4e9 3224 return n;
3225 }
7f9d2ee0 3226-#else
4e97e4e9 3227-static inline void *saveable_highmem_page(unsigned long pfn) { return NULL; }
4e97e4e9 3228 #endif /* CONFIG_HIGHMEM */
3229
7f9d2ee0 3230 /**
3231@@ -910,7 +911,7 @@ static inline void *saveable_highmem_page(unsigned long pfn) { return NULL; }
4e97e4e9 3232 * a free chunk of pages.
3233 */
3234
3235-static struct page *saveable_page(unsigned long pfn)
3236+struct page *saveable_page(unsigned long pfn)
3237 {
3238 struct page *page;
3239
7f9d2ee0 3240@@ -1244,6 +1245,11 @@ asmlinkage int swsusp_save(void)
4e97e4e9 3241 {
3242 unsigned int nr_pages, nr_highmem;
3243
3244+#ifdef CONFIG_TOI
3245+ if (toi_running)
3246+ return toi_post_context_save();
24613191 3247+#endif
4e97e4e9 3248+
7f9d2ee0 3249 printk(KERN_INFO "PM: Creating hibernation image: \n");
4e97e4e9 3250
7f9d2ee0 3251 drain_local_pages(NULL);
3252@@ -1284,14 +1290,14 @@ asmlinkage int swsusp_save(void)
3253 }
3254
3255 #ifndef CONFIG_ARCH_HIBERNATION_HEADER
3256-static int init_header_complete(struct swsusp_info *info)
3257+int init_swsusp_header_complete(struct swsusp_info *info)
3258 {
3259 memcpy(&info->uts, init_utsname(), sizeof(struct new_utsname));
3260 info->version_code = LINUX_VERSION_CODE;
3261 return 0;
3262 }
3263
3264-static char *check_image_kernel(struct swsusp_info *info)
3265+char *check_swsusp_image_kernel(struct swsusp_info *info)
3266 {
3267 if (info->version_code != LINUX_VERSION_CODE)
3268 return "kernel version";
7f9d2ee0 3269@@ -1305,6 +1311,7 @@ static char *check_image_kernel(struct swsusp_info *info)
3270 return "machine";
3271 return NULL;
3272 }
3273+EXPORT_SYMBOL_GPL(check_swsusp_image_kernel);
3274 #endif /* CONFIG_ARCH_HIBERNATION_HEADER */
3275
7f9d2ee0 3276 unsigned long snapshot_get_image_size(void)
3277@@ -1312,7 +1319,7 @@ unsigned long snapshot_get_image_size(void)
3278 return nr_copy_pages + nr_meta_pages + 1;
3279 }
3280
3281-static int init_header(struct swsusp_info *info)
3282+int init_swsusp_header(struct swsusp_info *info)
3283 {
3284 memset(info, 0, sizeof(struct swsusp_info));
3285 info->num_physpages = num_physpages;
7f9d2ee0 3286@@ -1320,7 +1327,7 @@ static int init_header(struct swsusp_info *info)
3287 info->pages = snapshot_get_image_size();
3288 info->size = info->pages;
3289 info->size <<= PAGE_SHIFT;
3290- return init_header_complete(info);
3291+ return init_swsusp_header_complete(info);
3292 }
3293
3294 /**
7f9d2ee0 3295@@ -1376,7 +1383,7 @@ int snapshot_read_next(struct snapshot_handle *handle, size_t count)
3296 if (!handle->offset) {
3297 int error;
3298
3299- error = init_header((struct swsusp_info *)buffer);
3300+ error = init_swsusp_header((struct swsusp_info *)buffer);
3301 if (error)
3302 return error;
3303 handle->buffer = buffer;
7f9d2ee0 3304@@ -1473,7 +1480,7 @@ static int check_header(struct swsusp_info *info)
3305 {
3306 char *reason;
3307
3308- reason = check_image_kernel(info);
3309+ reason = check_swsusp_image_kernel(info);
3310 if (!reason && info->num_physpages != num_physpages)
3311 reason = "memory size";
3312 if (reason) {
3313diff --git a/kernel/power/toi_pageflags_test.c b/kernel/power/toi_pageflags_test.c
3314new file mode 100644
3315index 0000000..381f05b
3316--- /dev/null
3317+++ b/kernel/power/toi_pageflags_test.c
3318@@ -0,0 +1,80 @@
3319+/*
3320+ * TuxOnIce pageflags tester.
3321+ */
3322+
3323+#include <linux/module.h>
3324+#include <linux/bootmem.h>
3325+#include <linux/sched.h>
3326+#include <linux/dyn_pageflags.h>
3327+
3328+DECLARE_DYN_PAGEFLAGS(test_map);
3329+
3330+static char *bits_on(void)
3331+{
3332+ char *page = (char *) get_zeroed_page(GFP_KERNEL);
3333+ unsigned long index = get_next_bit_on(&test_map, max_pfn + 1);
3334+ int pos = 0;
3335+
3336+ while (index <= max_pfn) {
3337+ pos += snprintf_used(page + pos, PAGE_SIZE - pos - 1, "%lu ",
3338+ index);
3339+ index = get_next_bit_on(&test_map, index);
3340+ }
3341+
3342+ return page;
3343+}
3344+
3345+static __init int do_check(void)
3346+{
3347+ unsigned long index;
3348+ int step = 1, steps = 100;
3349+
3350+ allocate_dyn_pageflags(&test_map, 0);
3351+
3352+ for (index = 1; index < max_pfn; index++) {
3353+ char *result;
3354+ char compare[100];
3355+
3356+ if (index > (max_pfn / steps * step)) {
3357+ printk(KERN_INFO "%d/%d\r", step, steps);
3358+ step++;
3359+ }
3360+
3361+
3362+ if (!pfn_valid(index))
3363+ continue;
3364+
3365+ clear_dyn_pageflags(&test_map);
3366+ set_dynpageflag(&test_map, pfn_to_page(0));
3367+ set_dynpageflag(&test_map, pfn_to_page(index));
3368+
3369+ sprintf(compare, "0 %lu ", index);
3370+
3371+ result = bits_on();
3372+
3373+ if (strcmp(result, compare)) {
3374+ printk(KERN_INFO "Expected \"%s\", got \"%s\"\n",
3375+ compare, result);
3376+ }
3377+
3378+ free_page((unsigned long) result);
3379+ schedule();
3380+ }
3381+
3382+ free_dyn_pageflags(&test_map);
3383+ return 0;
3384+}
3385+
3386+#ifdef MODULE
3387+static __exit void check_unload(void)
3388+{
3389+}
3390+
3391+module_init(do_check);
3392+module_exit(check_unload);
3393+MODULE_AUTHOR("Nigel Cunningham");
3394+MODULE_DESCRIPTION("Pageflags testing");
3395+MODULE_LICENSE("GPL");
3396+#else
3397+late_initcall(do_check);
3398+#endif
4e97e4e9 3399diff --git a/kernel/power/tuxonice.h b/kernel/power/tuxonice.h
3400new file mode 100644
7f9d2ee0 3401index 0000000..6943b24
4e97e4e9 3402--- /dev/null
3403+++ b/kernel/power/tuxonice.h
7f9d2ee0 3404@@ -0,0 +1,210 @@
4e97e4e9 3405+/*
3406+ * kernel/power/tuxonice.h
3407+ *
3408+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
3409+ *
3410+ * This file is released under the GPLv2.
3411+ *
3412+ * It contains declarations used throughout swsusp.
3413+ *
3414+ */
3415+
3416+#ifndef KERNEL_POWER_TOI_H
3417+#define KERNEL_POWER_TOI_H
3418+
3419+#include <linux/delay.h>
3420+#include <linux/bootmem.h>
3421+#include <linux/suspend.h>
3422+#include <linux/dyn_pageflags.h>
3423+#include <linux/fs.h>
ad8f4a28 3424+#include <linux/kmod.h>
4e97e4e9 3425+#include <asm/setup.h>
3426+#include "tuxonice_pageflags.h"
3427+
7f9d2ee0 3428+#define TOI_CORE_VERSION "3.0-rc7"
3429+
3430+#define MY_BOOT_KERNEL_DATA_VERSION 1
3431+
3432+struct toi_boot_kernel_data {
3433+ int version;
3434+ int size;
3435+ unsigned long toi_action;
3436+ unsigned long toi_debug_state;
3437+ int toi_default_console_level;
3438+ int toi_io_time[2][2];
3439+ char toi_nosave_commandline[COMMAND_LINE_SIZE];
3440+};
3441+
3442+extern struct toi_boot_kernel_data toi_bkd;
3443+
3444+/* Location of boot kernel data struct in kernel being resumed */
3445+extern unsigned long boot_kernel_data_buffer;
4e97e4e9 3446+
3447+/* == Action states == */
3448+
3449+enum {
3450+ TOI_REBOOT,
3451+ TOI_PAUSE,
4e97e4e9 3452+ TOI_LOGALL,
3453+ TOI_CAN_CANCEL,
3454+ TOI_KEEP_IMAGE,
3455+ TOI_FREEZER_TEST,
3456+ TOI_SINGLESTEP,
3457+ TOI_PAUSE_NEAR_PAGESET_END,
3458+ TOI_TEST_FILTER_SPEED,
3459+ TOI_TEST_BIO,
3460+ TOI_NO_PAGESET2,
3461+ TOI_PM_PREPARE_CONSOLE,
3462+ TOI_IGNORE_ROOTFS,
3463+ TOI_REPLACE_SWSUSP,
3464+ TOI_PAGESET2_FULL,
3465+ TOI_ABORT_ON_RESAVE_NEEDED,
3466+ TOI_NO_MULTITHREADED_IO,
3467+ TOI_NO_DIRECT_LOAD,
3468+ TOI_LATE_CPU_HOTPLUG,
7f9d2ee0 3469+ TOI_GET_MAX_MEM_ALLOCD,
3470+ TOI_NO_FLUSHER_THREAD,
4e97e4e9 3471+};
3472+
3473+#define clear_action_state(bit) (test_and_clear_bit(bit, &toi_bkd.toi_action))
3474+#define test_action_state(bit) (test_bit(bit, &toi_bkd.toi_action))
4e97e4e9 3475+
3476+/* == Result states == */
3477+
3478+enum {
3479+ TOI_ABORTED,
3480+ TOI_ABORT_REQUESTED,
3481+ TOI_NOSTORAGE_AVAILABLE,
3482+ TOI_INSUFFICIENT_STORAGE,
3483+ TOI_FREEZING_FAILED,
4e97e4e9 3484+ TOI_KEPT_IMAGE,
3485+ TOI_WOULD_EAT_MEMORY,
3486+ TOI_UNABLE_TO_FREE_ENOUGH_MEMORY,
3487+ TOI_PM_SEM,
3488+ TOI_DEVICE_REFUSED,
3489+ TOI_EXTRA_PAGES_ALLOW_TOO_SMALL,
3490+ TOI_UNABLE_TO_PREPARE_IMAGE,
3491+ TOI_FAILED_MODULE_INIT,
3492+ TOI_FAILED_MODULE_CLEANUP,
3493+ TOI_FAILED_IO,
3494+ TOI_OUT_OF_MEMORY,
3495+ TOI_IMAGE_ERROR,
3496+ TOI_PLATFORM_PREP_FAILED,
3497+ TOI_CPU_HOTPLUG_FAILED,
3498+ TOI_ARCH_PREPARE_FAILED,
3499+ TOI_RESAVE_NEEDED,
3500+ TOI_CANT_SUSPEND,
3501+ TOI_NOTIFIERS_PREPARE_FAILED,
3502+ TOI_PRE_SNAPSHOT_FAILED,
3503+ TOI_PRE_RESTORE_FAILED,
4e97e4e9 3504+};
3505+
3506+extern unsigned long toi_result;
3507+
3508+#define set_result_state(bit) (test_and_set_bit(bit, &toi_result))
ad8f4a28 3509+#define set_abort_result(bit) (test_and_set_bit(TOI_ABORTED, &toi_result), \
4e97e4e9 3510+ test_and_set_bit(bit, &toi_result))
3511+#define clear_result_state(bit) (test_and_clear_bit(bit, &toi_result))
3512+#define test_result_state(bit) (test_bit(bit, &toi_result))
3513+
3514+/* == Debug sections and levels == */
3515+
3516+/* debugging levels. */
3517+enum {
3518+ TOI_STATUS = 0,
3519+ TOI_ERROR = 2,
3520+ TOI_LOW,
3521+ TOI_MEDIUM,
3522+ TOI_HIGH,
3523+ TOI_VERBOSE,
3524+};
3525+
3526+enum {
3527+ TOI_ANY_SECTION,
3528+ TOI_EAT_MEMORY,
3529+ TOI_IO,
3530+ TOI_HEADER,
3531+ TOI_WRITER,
3532+ TOI_MEMORY,
3533+};
3534+
ad8f4a28 3535+#define set_debug_state(bit) (test_and_set_bit(bit, &toi_bkd.toi_debug_state))
7f9d2ee0 3536+#define clear_debug_state(bit) \
3537+ (test_and_clear_bit(bit, &toi_bkd.toi_debug_state))
ad8f4a28 3538+#define test_debug_state(bit) (test_bit(bit, &toi_bkd.toi_debug_state))
4e97e4e9 3539+
3540+/* == Steps in hibernating == */
3541+
3542+enum {
3543+ STEP_HIBERNATE_PREPARE_IMAGE,
3544+ STEP_HIBERNATE_SAVE_IMAGE,
3545+ STEP_HIBERNATE_POWERDOWN,
3546+ STEP_RESUME_CAN_RESUME,
3547+ STEP_RESUME_LOAD_PS1,
3548+ STEP_RESUME_DO_RESTORE,
3549+ STEP_RESUME_READ_PS2,
3550+ STEP_RESUME_GO,
3551+ STEP_RESUME_ALT_IMAGE,
3552+ STEP_CLEANUP,
3553+ STEP_QUIET_CLEANUP
4e97e4e9 3554+};
3555+
3556+/* == TuxOnIce states ==
3557+ (see also include/linux/suspend.h) */
3558+
3559+#define get_toi_state() (toi_state)
3560+#define restore_toi_state(saved_state) \
ad8f4a28 3561+ do { toi_state = saved_state; } while (0)
4e97e4e9 3562+
3563+/* == Module support == */
3564+
3565+struct toi_core_fns {
3566+ int (*post_context_save)(void);
3567+ unsigned long (*get_nonconflicting_page)(void);
3568+ int (*try_hibernate)(int have_pmsem);
3569+ void (*try_resume)(void);
3570+};
3571+
3572+extern struct toi_core_fns *toi_core_fns;
3573+
3574+/* == All else == */
3575+#define KB(x) ((x) << (PAGE_SHIFT - 10))
3576+#define MB(x) ((x) >> (20 - PAGE_SHIFT))
3577+
3578+extern int toi_start_anything(int toi_or_resume);
3579+extern void toi_finish_anything(int toi_or_resume);
3580+
3581+extern int save_image_part1(void);
3582+extern int toi_atomic_restore(void);
3583+
3584+extern int _toi_try_hibernate(int have_pmsem);
3585+extern void __toi_try_resume(void);
3586+
3587+extern int __toi_post_context_save(void);
3588+
3589+extern unsigned int nr_hibernates;
3590+extern char alt_resume_param[256];
3591+
3592+extern void copyback_post(void);
3593+extern int toi_hibernate(void);
7f9d2ee0 3594+extern long extra_pd1_pages_used;
4e97e4e9 3595+
4e97e4e9 3596+#define SECTOR_SIZE 512
3597+
3598+extern void toi_early_boot_message(int can_erase_image, int default_answer,
3599+ char *warning_reason, ...);
4e97e4e9 3600+
3601+static inline int load_direct(struct page *page)
3602+{
3603+ return test_action_state(TOI_NO_DIRECT_LOAD) ? 0 :
3604+ PagePageset1Copy(page);
4e97e4e9 3605+}
3606+
3607+extern int pre_resume_freeze(void);
3608+extern int do_check_can_resume(void);
3609+extern int do_toi_step(int step);
3610+extern int toi_launch_userspace_program(char *command, int channel_no,
3611+ enum umh_wait wait);
7f9d2ee0 3612+
3613+extern char *tuxonice_signature;
4e97e4e9 3614+#endif
ad8f4a28 3615diff --git a/kernel/power/tuxonice_alloc.c b/kernel/power/tuxonice_alloc.c
4e97e4e9 3616new file mode 100644
7f9d2ee0 3617index 0000000..101f65b
4e97e4e9 3618--- /dev/null
ad8f4a28 3619+++ b/kernel/power/tuxonice_alloc.c
7f9d2ee0 3620@@ -0,0 +1,293 @@
4e97e4e9 3621+/*
ad8f4a28 3622+ * kernel/power/tuxonice_alloc.c
4e97e4e9 3623+ *
ad8f4a28 3624+ * Copyright (C) 2007 Nigel Cunningham (nigel at tuxonice net)
4e97e4e9 3625+ *
ad8f4a28 3626+ * This file is released under the GPLv2.
4e97e4e9 3627+ *
4e97e4e9 3628+ */
3629+
3630+#ifdef CONFIG_PM_DEBUG
3631+#include <linux/slab.h>
3632+#include <linux/module.h>
3633+#include "tuxonice_modules.h"
3634+#include "tuxonice_sysfs.h"
4e97e4e9 3635+
7f9d2ee0 3636+#define TOI_ALLOC_PATHS 39
3637+
3638+DEFINE_MUTEX(toi_alloc_mutex);
3639+
7f9d2ee0 3640+static struct toi_module_ops toi_alloc_ops;
3641+
3642+static int toi_fail_num;
3643+static atomic_t toi_alloc_count[TOI_ALLOC_PATHS],
3644+ toi_free_count[TOI_ALLOC_PATHS],
3645+ toi_test_count[TOI_ALLOC_PATHS],
3646+ toi_fail_count[TOI_ALLOC_PATHS];
3647+int toi_cur_allocd[TOI_ALLOC_PATHS], toi_max_allocd[TOI_ALLOC_PATHS];
3648+int cur_allocd, max_allocd;
3649+
3650+static char *toi_alloc_desc[TOI_ALLOC_PATHS] = {
3651+ "", /* 0 */
3652+ "get_io_info_struct",
3653+ "extent",
3654+ "extent (loading chain)",
3655+ "userui channel",
3656+ "userui arg", /* 5 */
3657+ "attention list metadata",
3658+ "extra pagedir memory metadata",
3659+ "bdev metadata",
3660+ "extra pagedir memory",
3661+ "header_locations_read", /* 10 */
3662+ "bio queue",
3663+ "prepare_readahead",
3664+ "i/o buffer",
3665+ "writer buffer in bio_init",
3666+ "checksum buffer", /* 15 */
3667+ "compression buffer",
3668+ "filewriter signature op",
3669+ "set resume param alloc1",
3670+ "set resume param alloc2",
3671+ "debugging info buffer", /* 20 */
3672+ "check can resume buffer",
3673+ "write module config buffer",
3674+ "read module config buffer",
3675+ "write image header buffer",
3676+ "read pageset1 buffer", /* 25 */
3677+ "get_have_image_data buffer",
3678+ "checksum page",
3679+ "worker rw loop",
3680+ "get nonconflicting page",
3681+ "ps1 load addresses", /* 30 */
3682+ "remove swap image",
3683+ "swap image exists",
3684+ "swap parse sig location",
3685+ "sysfs kobj",
3686+ "swap mark resume attempted buffer", /* 35 */
3687+ "cluster member",
7f9d2ee0 3688+ "boot kernel data buffer",
3689+ "setting swap signature"
ad8f4a28 3690+};
24613191 3691+
3692+#define MIGHT_FAIL(FAIL_NUM, FAIL_VAL) \
3693+ do { \
3694+ BUG_ON(FAIL_NUM >= TOI_ALLOC_PATHS); \
3695+ \
3696+ if (FAIL_NUM == toi_fail_num) { \
3697+ atomic_inc(&toi_test_count[FAIL_NUM]); \
3698+ toi_fail_num = 0; \
3699+ return FAIL_VAL; \
3700+ } \
3701+ } while (0)
3702+
3703+static void alloc_update_stats(int fail_num, void *result)
24613191 3704+{
3705+ if (!result) {
3706+ atomic_inc(&toi_fail_count[fail_num]);
3707+ return;
3708+ }
24613191 3709+
3710+ atomic_inc(&toi_alloc_count[fail_num]);
3711+ if (unlikely(test_action_state(TOI_GET_MAX_MEM_ALLOCD))) {
3712+ mutex_lock(&toi_alloc_mutex);
3713+ toi_cur_allocd[fail_num]++;
3714+ cur_allocd++;
3715+ if (unlikely(cur_allocd > max_allocd)) {
3716+ int i;
3717+
3718+ for (i = 0; i < TOI_ALLOC_PATHS; i++)
3719+ toi_max_allocd[i] = toi_cur_allocd[i];
3720+ max_allocd = cur_allocd;
3721+ }
3722+ mutex_unlock(&toi_alloc_mutex);
3723+ }
3724+}
3725+
3726+static void free_update_stats(int fail_num)
3727+{
7f9d2ee0 3728+ BUG_ON(fail_num >= TOI_ALLOC_PATHS);
3729+ atomic_inc(&toi_free_count[fail_num]);
3730+ if (unlikely(test_action_state(TOI_GET_MAX_MEM_ALLOCD))) {
3731+ mutex_lock(&toi_alloc_mutex);
3732+ cur_allocd--;
3733+ toi_cur_allocd[fail_num]--;
3734+ mutex_unlock(&toi_alloc_mutex);
3735+ }
3736+}
3737+
3738+void *toi_kzalloc(int fail_num, size_t size, gfp_t flags)
3739+{
3740+ void *result;
3741+
7f9d2ee0 3742+ if (toi_alloc_ops.enabled)
3743+ MIGHT_FAIL(fail_num, NULL);
ad8f4a28 3744+ result = kzalloc(size, flags);
7f9d2ee0 3745+ if (toi_alloc_ops.enabled)
3746+ alloc_update_stats(fail_num, result);
3747+ return result;
3748+}
3749+
3750+unsigned long toi_get_free_pages(int fail_num, gfp_t mask,
3751+ unsigned int order)
3752+{
3753+ unsigned long result;
3754+
7f9d2ee0 3755+ if (toi_alloc_ops.enabled)
3756+ MIGHT_FAIL(fail_num, 0);
ad8f4a28 3757+ result = __get_free_pages(mask, order);
7f9d2ee0 3758+ if (toi_alloc_ops.enabled)
3759+ alloc_update_stats(fail_num, (void *) result);
3760+ return result;
3761+}
3762+
3763+struct page *toi_alloc_page(int fail_num, gfp_t mask)
3764+{
3765+ struct page *result;
3766+
7f9d2ee0 3767+ if (toi_alloc_ops.enabled)
3768+ MIGHT_FAIL(fail_num, 0);
ad8f4a28 3769+ result = alloc_page(mask);
7f9d2ee0 3770+ if (toi_alloc_ops.enabled)
3771+ alloc_update_stats(fail_num, (void *) result);
3772+ return result;
3773+}
3774+
3775+unsigned long toi_get_zeroed_page(int fail_num, gfp_t mask)
3776+{
3777+ unsigned long result;
3778+
7f9d2ee0 3779+ if (toi_alloc_ops.enabled)
3780+ MIGHT_FAIL(fail_num, 0);
ad8f4a28 3781+ result = get_zeroed_page(mask);
7f9d2ee0 3782+ if (toi_alloc_ops.enabled)
3783+ alloc_update_stats(fail_num, (void *) result);
3784+ return result;
3785+}
3786+
3787+void toi_kfree(int fail_num, const void *arg)
3788+{
7f9d2ee0 3789+ if (arg && toi_alloc_ops.enabled)
3790+ free_update_stats(fail_num);
3791+
3792+ kfree(arg);
3793+}
3794+
3795+void toi_free_page(int fail_num, unsigned long virt)
3796+{
7f9d2ee0 3797+ if (virt && toi_alloc_ops.enabled)
3798+ free_update_stats(fail_num);
3799+
3800+ free_page(virt);
3801+}
3802+
3803+void toi__free_page(int fail_num, struct page *page)
3804+{
7f9d2ee0 3805+ if (page && toi_alloc_ops.enabled)
3806+ free_update_stats(fail_num);
3807+
3808+ __free_page(page);
3809+}
3810+
3811+void toi_free_pages(int fail_num, struct page *page, int order)
3812+{
7f9d2ee0 3813+ if (page && toi_alloc_ops.enabled)
3814+ free_update_stats(fail_num);
3815+
3816+ __free_pages(page, order);
3817+}
3818+
3819+void toi_alloc_print_debug_stats(void)
3820+{
7f9d2ee0 3821+ int i, header_done = 0;
ad8f4a28 3822+
7f9d2ee0 3823+ if (!toi_alloc_ops.enabled)
3824+ return;
3825+
3826+ for (i = 0; i < TOI_ALLOC_PATHS; i++)
7f9d2ee0 3827+ if (atomic_read(&toi_alloc_count[i]) !=
3828+ atomic_read(&toi_free_count[i])) {
3829+ if (!header_done) {
3830+ printk(KERN_INFO "Idx Allocs Frees Tests "
3831+ " Fails Max Description\n");
3832+ header_done = 1;
3833+ }
3834+
3835+ printk(KERN_INFO "%3d %7d %7d %7d %7d %7d %s\n", i,
3836+ atomic_read(&toi_alloc_count[i]),
3837+ atomic_read(&toi_free_count[i]),
3838+ atomic_read(&toi_test_count[i]),
3839+ atomic_read(&toi_fail_count[i]),
3840+ toi_max_allocd[i],
3841+ toi_alloc_desc[i]);
7f9d2ee0 3842+ }
3843+}
3844+EXPORT_SYMBOL_GPL(toi_alloc_print_debug_stats);
3845+
3846+static int toi_alloc_initialise(int starting_cycle)
3847+{
3848+ int i;
3849+
7f9d2ee0 3850+ if (starting_cycle && toi_alloc_ops.enabled) {
3851+ for (i = 0; i < TOI_ALLOC_PATHS; i++) {
3852+ atomic_set(&toi_alloc_count[i], 0);
3853+ atomic_set(&toi_free_count[i], 0);
3854+ atomic_set(&toi_test_count[i], 0);
3855+ atomic_set(&toi_fail_count[i], 0);
3856+ toi_cur_allocd[i] = 0;
3857+ toi_max_allocd[i] = 0;
3858+ }
3859+ max_allocd = 0;
3860+ cur_allocd = 0;
3861+ }
3862+
3863+ return 0;
3864+}
3865+
3866+static struct toi_sysfs_data sysfs_params[] = {
3867+ { TOI_ATTR("failure_test", SYSFS_RW),
3868+ SYSFS_INT(&toi_fail_num, 0, 99, 0)
3869+ },
3870+
3871+ { TOI_ATTR("find_max_mem_allocated", SYSFS_RW),
3872+ SYSFS_BIT(&toi_bkd.toi_action, TOI_GET_MAX_MEM_ALLOCD, 0)
7f9d2ee0 3873+ },
3874+
3875+ { TOI_ATTR("enabled", SYSFS_RW),
3876+ SYSFS_INT(&toi_alloc_ops.enabled, 0, 1, 0)
3877+ }
3878+};
3879+
3880+static struct toi_module_ops toi_alloc_ops = {
3881+ .type = MISC_HIDDEN_MODULE,
3882+ .name = "allocation debugging",
3883+ .directory = "alloc",
3884+ .module = THIS_MODULE,
3885+ .early = 1,
3886+ .initialise = toi_alloc_initialise,
3887+
3888+ .sysfs_data = sysfs_params,
3889+ .num_sysfs_entries = sizeof(sysfs_params) /
3890+ sizeof(struct toi_sysfs_data),
3891+};
3892+
3893+int toi_alloc_init(void)
3894+{
7f9d2ee0 3895+ int result = toi_register_module(&toi_alloc_ops);
3896+ toi_alloc_ops.enabled = 0;
3897+ return result;
3898+}
3899+
3900+void toi_alloc_exit(void)
3901+{
3902+ toi_unregister_module(&toi_alloc_ops);
3903+}
3904+#ifdef CONFIG_TOI_EXPORTS
3905+EXPORT_SYMBOL_GPL(toi_kzalloc);
3906+EXPORT_SYMBOL_GPL(toi_get_free_pages);
3907+EXPORT_SYMBOL_GPL(toi_get_zeroed_page);
3908+EXPORT_SYMBOL_GPL(toi_kfree);
3909+EXPORT_SYMBOL_GPL(toi_free_page);
3910+EXPORT_SYMBOL_GPL(toi__free_page);
7f9d2ee0 3911+EXPORT_SYMBOL_GPL(toi_alloc_page);
3912+#endif
3913+#endif
3914diff --git a/kernel/power/tuxonice_alloc.h b/kernel/power/tuxonice_alloc.h
3915new file mode 100644
3916index 0000000..146c2bd
3917--- /dev/null
3918+++ b/kernel/power/tuxonice_alloc.h
3919@@ -0,0 +1,51 @@
3920+/*
3921+ * kernel/power/tuxonice_alloc.h
3922+ *
3923+ * Copyright (C) 2007 Nigel Cunningham (nigel at tuxonice net)
3924+ *
3925+ * This file is released under the GPLv2.
3926+ *
3927+ */
3928+
3929+#define TOI_WAIT_GFP (GFP_KERNEL | __GFP_NOWARN)
3930+#define TOI_ATOMIC_GFP (GFP_ATOMIC | __GFP_NOWARN)
3931+
3932+#ifdef CONFIG_PM_DEBUG
3933+extern void *toi_kzalloc(int fail_num, size_t size, gfp_t flags);
3934+extern void toi_kfree(int fail_num, const void *arg);
3935+
3936+extern unsigned long toi_get_free_pages(int fail_num, gfp_t mask,
3937+ unsigned int order);
3938+#define toi_get_free_page(FAIL_NUM, MASK) toi_get_free_pages(FAIL_NUM, MASK, 0)
3939+extern unsigned long toi_get_zeroed_page(int fail_num, gfp_t mask);
3940+extern void toi_free_page(int fail_num, unsigned long buf);
3941+extern void toi__free_page(int fail_num, struct page *page);
3942+extern void toi_free_pages(int fail_num, struct page *page, int order);
3943+extern struct page *toi_alloc_page(int fail_num, gfp_t mask);
3944+extern int toi_alloc_init(void);
3945+extern void toi_alloc_exit(void);
3946+
3947+extern void toi_alloc_print_debug_stats(void);
3948+
3949+#else /* CONFIG_PM_DEBUG */
3950+
3951+#define toi_kzalloc(FAIL, SIZE, FLAGS) (kzalloc(SIZE, FLAGS))
3952+#define toi_kfree(FAIL, ALLOCN) (kfree(ALLOCN))
3953+
3954+#define toi_get_free_pages(FAIL, FLAGS, ORDER) __get_free_pages(FLAGS, ORDER)
3955+#define toi_get_free_page(FAIL, FLAGS) __get_free_page(FLAGS)
3956+#define toi_get_zeroed_page(FAIL, FLAGS) get_zeroed_page(FLAGS)
3957+#define toi_free_page(FAIL, ALLOCN) do { free_page(ALLOCN); } while (0)
3958+#define toi__free_page(FAIL, PAGE) __free_page(PAGE)
3959+#define toi_free_pages(FAIL, PAGE, ORDER) __free_pages(PAGE, ORDER)
3960+#define toi_alloc_page(FAIL, MASK) alloc_page(MASK)
3961+static inline int toi_alloc_init(void)
3962+{
3963+ return 0;
3964+}
3965+
3966+static inline void toi_alloc_exit(void) { }
3967+
3968+static inline void toi_alloc_print_debug_stats(void) { }
3969+
3970+#endif
3971diff --git a/kernel/power/tuxonice_atomic_copy.c b/kernel/power/tuxonice_atomic_copy.c
3972new file mode 100644
7f9d2ee0 3973index 0000000..597c0d9
3974--- /dev/null
3975+++ b/kernel/power/tuxonice_atomic_copy.c
7f9d2ee0 3976@@ -0,0 +1,378 @@
3977+/*
3978+ * kernel/power/tuxonice_atomic_copy.c
3979+ *
3980+ * Copyright 2004-2007 Nigel Cunningham (nigel at tuxonice net)
3981+ * Copyright (C) 2006 Red Hat, inc.
3982+ *
3983+ * Distributed under GPLv2.
3984+ *
3985+ * Routines for doing the atomic save/restore.
3986+ */
3987+
3988+#include <linux/suspend.h>
3989+#include <linux/highmem.h>
3990+#include <linux/cpu.h>
3991+#include <linux/freezer.h>
3992+#include <linux/console.h>
3993+#include "tuxonice.h"
3994+#include "tuxonice_storage.h"
3995+#include "tuxonice_power_off.h"
3996+#include "tuxonice_ui.h"
3997+#include "power.h"
3998+#include "tuxonice_io.h"
3999+#include "tuxonice_prepare_image.h"
4000+#include "tuxonice_pageflags.h"
4001+#include "tuxonice_checksum.h"
4002+#include "tuxonice_builtin.h"
4003+#include "tuxonice_atomic_copy.h"
4004+#include "tuxonice_alloc.h"
4005+
7f9d2ee0 4006+long extra_pd1_pages_used;
4007+
4008+/**
4009+ * free_pbe_list: Free page backup entries used by the atomic copy code.
4010+ *
4011+ * Normally, this function isn't used. If, however, we need to abort before
4012+ * doing the atomic copy, we use this to free the pbes previously allocated.
4013+ **/
4014+static void free_pbe_list(struct pbe **list, int highmem)
4015+{
4016+ while (*list) {
4017+ int i;
4018+ struct pbe *free_pbe, *next_page = NULL;
4019+ struct page *page;
4020+
4021+ if (highmem) {
4022+ page = (struct page *) *list;
24613191 4023+ free_pbe = (struct pbe *) kmap(page);
e8d0ad9d 4024+ } else {
4025+ page = virt_to_page(*list);
4026+ free_pbe = *list;
4027+ }
24613191 4028+
4029+ for (i = 0; i < PBES_PER_PAGE; i++) {
4030+ if (!free_pbe)
4031+ break;
e8d0ad9d 4032+ if (highmem)
ad8f4a28 4033+ toi__free_page(29, free_pbe->address);
e8d0ad9d 4034+ else
4035+ toi_free_page(29,
4036+ (unsigned long) free_pbe->address);
24613191 4037+ free_pbe = free_pbe->next;
4038+ }
4039+
4040+ if (highmem) {
24613191 4041+ if (free_pbe)
e8d0ad9d 4042+ next_page = free_pbe;
24613191 4043+ kunmap(page);
e8d0ad9d 4044+ } else {
4045+ if (free_pbe)
4046+ next_page = free_pbe;
24613191 4047+ }
4048+
ad8f4a28 4049+ toi__free_page(29, page);
e8d0ad9d 4050+ *list = (struct pbe *) next_page;
4051+ }
24613191 4052+}
4053+
4054+/**
4055+ * copyback_post: Post atomic-restore actions.
4056+ *
4057+ * After doing the atomic restore, we have a few more things to do:
4058+ * 1) We want to retain some values across the restore, so we now copy
4059+ * these from the nosave variables to the normal ones.
4060+ * 2) Set the status flags.
4061+ * 3) Resume devices.
e8d0ad9d 4062+ * 4) Tell userui so it can redraw & restore settings.
24613191 4063+ * 5) Reread the page cache.
4064+ **/
4065+
4066+void copyback_post(void)
4067+{
4068+ struct toi_boot_kernel_data *bkd =
4069+ (struct toi_boot_kernel_data *) boot_kernel_data_buffer;
24613191 4070+
4071+ /*
4072+ * The boot kernel's data may be larger (newer version) or
4073+ * smaller (older version) than ours. Copy the minimum
4074+ * of the two sizes, so that we don't overwrite valid values
4075+ * from pre-atomic copy.
4076+ */
24613191 4077+
4078+ memcpy(&toi_bkd, (char *) boot_kernel_data_buffer,
4079+ min_t(int, sizeof(struct toi_boot_kernel_data),
4080+ bkd->size));
24613191 4081+
4e97e4e9 4082+ if (toi_activate_storage(1))
24613191 4083+ panic("Failed to reactivate our storage.");
4084+
4e97e4e9 4085+ toi_ui_post_atomic_restore();
24613191 4086+
4e97e4e9 4087+ toi_cond_pause(1, "About to reload secondary pagedir.");
24613191 4088+
4089+ if (read_pageset2(0))
4090+ panic("Unable to successfully reread the page cache.");
4091+
4e97e4e9 4092+ /*
4093+ * If the user wants to sleep again after resuming from full-off,
4094+ * it's most likely to be in order to suspend to ram, so we'll
4095+ * do this check after loading pageset2, to give them the fastest
4096+ * wakeup when they are ready to use the computer again.
4097+ */
4098+ toi_check_resleep();
24613191 4099+}
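The min-of-sizes memcpy in copyback_post is a general trick for exchanging a versioned struct between two kernels that may disagree on its length: copy only the prefix both sides share. A minimal userspace sketch of the same idea, with hypothetical struct names:

```c
#include <string.h>
#include <stddef.h>

/* Two hypothetical versions of a shared data block. 'size' records how
 * large the writer's struct really was, so a reader built against a
 * different version copies only the common prefix, as copyback_post
 * does with toi_bkd and boot_kernel_data_buffer. */
struct bkd_v1 { int size; int flags; };
struct bkd_v2 { int size; int flags; int extra; };

static void copy_versioned(void *dst, size_t dst_size,
			   const void *src, size_t src_size)
{
	/* Copy the minimum of the two sizes, like the min_t() above. */
	memcpy(dst, src, dst_size < src_size ? dst_size : src_size);
}
```

Either side can safely read the other's buffer; fields beyond the smaller struct are simply ignored.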
4100+
4101+/**
4e97e4e9 4102+ * toi_copy_pageset1: Do the atomic copy of pageset1.
24613191 4103+ *
4104+ * Make the atomic copy of pageset1. We can't use copy_page (as we once did)
4105+ * because we can't be sure what side effects it has. On my old Duron, with
4106+ * 3DNOW, kernel_fpu_begin increments preempt count, making our preempt
4107+ * count at resume time 4 instead of 3.
ad8f4a28 4108+ *
24613191 4109+ * We don't want to call kmap_atomic unconditionally because it has the side
4110+ * effect of incrementing the preempt count, which will leave it one too high
4111+ * post resume (the page containing the preempt count will be copied after
4112+ it's incremented). This is essentially the same problem.
4113+ **/
4114+
4e97e4e9 4115+void toi_copy_pageset1(void)
24613191 4116+{
4117+ int i;
4118+ unsigned long source_index, dest_index;
4119+
4e97e4e9 4120+ source_index = get_next_bit_on(&pageset1_map, max_pfn + 1);
4121+ dest_index = get_next_bit_on(&pageset1_copy_map, max_pfn + 1);
24613191 4122+
4123+ for (i = 0; i < pagedir1.size; i++) {
4124+ unsigned long *origvirt, *copyvirt;
4125+ struct page *origpage, *copypage;
4126+ int loop = (PAGE_SIZE / sizeof(unsigned long)) - 1;
4127+
4128+ origpage = pfn_to_page(source_index);
4129+ copypage = pfn_to_page(dest_index);
ad8f4a28 4130+
24613191 4131+ origvirt = PageHighMem(origpage) ?
4132+ kmap_atomic(origpage, KM_USER0) :
4133+ page_address(origpage);
4134+
ad8f4a28 4135+ copyvirt = PageHighMem(copypage) ?
24613191 4136+ kmap_atomic(copypage, KM_USER1) :
4137+ page_address(copypage);
4138+
4139+ while (loop >= 0) {
4140+ *(copyvirt + loop) = *(origvirt + loop);
4141+ loop--;
4142+ }
ad8f4a28 4143+
24613191 4144+ if (PageHighMem(origpage))
4145+ kunmap_atomic(origvirt, KM_USER0);
4e97e4e9 4146+ else if (toi_faulted) {
4147+ printk(KERN_INFO "%p (%lu) being unmapped after "
4148+ "faulting during atomic copy.\n", origpage,
4149+ source_index);
24613191 4150+ kernel_map_pages(origpage, 1, 0);
4e97e4e9 4151+ clear_toi_fault();
24613191 4152+ }
4153+
4154+ if (PageHighMem(copypage))
4155+ kunmap_atomic(copyvirt, KM_USER1);
ad8f4a28 4156+
4e97e4e9 4157+ source_index = get_next_bit_on(&pageset1_map, source_index);
4158+ dest_index = get_next_bit_on(&pageset1_copy_map, dest_index);
24613191 4159+ }
4160+}
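The copy loop above deliberately moves one word at a time rather than calling copy_page or memcpy, so no helper can touch the preempt count mid-copy. Stripped of the kmap handling, the core loop looks like this (standalone sketch, PAGE_SIZE assumed 4096):

```c
#define PAGE_SIZE 4096

/* Word-at-a-time copy of one page, mirroring the loop in
 * toi_copy_pageset1: no library copy routine, so no hidden side
 * effects (the kernel version avoids kernel_fpu_begin's
 * preempt-count increment). */
static void copy_one_page(unsigned long *dst, const unsigned long *src)
{
	int loop = (PAGE_SIZE / sizeof(unsigned long)) - 1;

	while (loop >= 0) {
		dst[loop] = src[loop];
		loop--;
	}
}
```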
4161+
4162+/**
4e97e4e9 4163+ * __toi_post_context_save: Steps after saving the cpu context.
24613191 4164+ *
4165+ * Steps taken after saving the CPU state to make the actual
4166+ * atomic copy.
4167+ *
4e97e4e9 4168+ * Called from swsusp_save in snapshot.c via toi_post_context_save.
24613191 4169+ **/
4170+
4e97e4e9 4171+int __toi_post_context_save(void)
24613191 4172+{
7f9d2ee0 4173+ long old_ps1_size = pagedir1.size;
ad8f4a28 4174+
4e97e4e9 4175+ check_checksums();
24613191 4176+
4177+ free_checksum_pages();
4178+
4e97e4e9 4179+ toi_recalculate_image_contents(1);
24613191 4180+
4181+ extra_pd1_pages_used = pagedir1.size - old_ps1_size;
4182+
4183+ if (extra_pd1_pages_used > extra_pd1_pages_allowance) {
7f9d2ee0 4184+ printk(KERN_INFO "Pageset1 has grown by %ld pages. "
4185+ "extra_pages_allowance is currently only %lu.\n",
24613191 4186+ pagedir1.size - old_ps1_size,
4187+ extra_pd1_pages_allowance);
4e97e4e9 4188+ set_abort_result(TOI_EXTRA_PAGES_ALLOW_TOO_SMALL);
24613191 4189+ return -1;
4190+ }
4191+
4e97e4e9 4192+ if (!test_action_state(TOI_TEST_FILTER_SPEED) &&
4193+ !test_action_state(TOI_TEST_BIO))
4194+ toi_copy_pageset1();
24613191 4195+
4196+ return 0;
4197+}
4198+
4199+/**
4e97e4e9 4200+ * toi_hibernate: High level code for doing the atomic copy.
24613191 4201+ *
4202+ * High-level code which prepares to do the atomic copy. Loosely based
4203+ * on the swsusp version, but with the following twists:
4e97e4e9 4204+ * - We set toi_running so the swsusp code uses our code paths.
24613191 4205+ * - We give better feedback regarding what goes wrong if there is a problem.
4206+ * - We use an extra function to call the assembly, just in case this code
4207+ * is in a module (return address).
4208+ **/
4209+
4e97e4e9 4210+int toi_hibernate(void)
24613191 4211+{
4212+ int error;
4213+
4e97e4e9 4214+ toi_running = 1; /* For the swsusp code we use :< */
24613191 4215+
4e97e4e9 4216+ error = toi_lowlevel_builtin();
24613191 4217+
4e97e4e9 4218+ toi_running = 0;
24613191 4219+ return error;
4220+}
4221+
4222+/**
4e97e4e9 4223+ * toi_atomic_restore: Prepare to do the atomic restore.
24613191 4224+ *
4225+ * Get ready to do the atomic restore. This part gets us into the same
4e97e4e9 4226+ * state we are in prior to calling do_toi_lowlevel while
4227+ * hibernating: hot-unplugging secondary cpus and freezing processes,
24613191 4228+ * before starting the thread that will do the restore.
4229+ **/
4230+
4e97e4e9 4231+int toi_atomic_restore(void)
24613191 4232+{
ad8f4a28 4233+ int error;
24613191 4234+
4e97e4e9 4235+ toi_running = 1;
24613191 4236+
4e97e4e9 4237+ toi_prepare_status(DONT_CLEAR_BAR, "Atomic restore.");
73c609d5 4238+
ad8f4a28 4239+ if (add_boot_kernel_data_pbe())
4e97e4e9 4240+ goto Failed;
24613191 4241+
4242+ if (toi_go_atomic(PMSG_PRETHAW, 0))
4243+ goto Failed;
24613191 4244+
4245+ /* We'll ignore saved state, but this gets preempt count (etc) right */
4246+ save_processor_state();
4247+
4248+ error = swsusp_arch_resume();
ad8f4a28 4249+ /*
24613191 4250+ * Code below is only ever reached in case of failure. Otherwise
4251+ * execution continues at the place where swsusp_arch_suspend was called.
4252+ *
4253+ * We don't know whether it's safe to continue (this shouldn't happen),
4254+ * so let's err on the side of caution.
ad8f4a28 4255+ */
24613191 4256+ BUG();
4257+
4e97e4e9 4258+Failed:
24613191 4259+ free_pbe_list(&restore_pblist, 0);
4260+#ifdef CONFIG_HIGHMEM
4261+ free_pbe_list(&restore_highmem_pblist, 1);
4262+#endif
4e97e4e9 4263+ if (test_action_state(TOI_PM_PREPARE_CONSOLE))
e8d0ad9d 4264+ pm_restore_console();
4e97e4e9 4265+ toi_running = 0;
24613191 4266+ return 1;
4267+}
4e97e4e9 4268+
4269+int toi_go_atomic(pm_message_t state, int suspend_time)
4270+{
7f9d2ee0 4271+ toi_prepare_status(DONT_CLEAR_BAR, "Doing atomic copy/restore.");
4e97e4e9 4272+
7f9d2ee0 4273+ if (suspend_time && toi_platform_begin()) {
4e97e4e9 4274+ set_abort_result(TOI_PLATFORM_PREP_FAILED);
7f9d2ee0 4275+ toi_end_atomic(ATOMIC_STEP_PLATFORM_END, suspend_time);
4e97e4e9 4276+ return 1;
4277+ }
4278+
4279+ suspend_console();
4280+
4281+ if (device_suspend(state)) {
4282+ set_abort_result(TOI_DEVICE_REFUSED);
4283+ toi_end_atomic(ATOMIC_STEP_RESUME_CONSOLE, suspend_time);
4284+ return 1;
4285+ }
4286+
4287+ if (suspend_time && toi_platform_pre_snapshot()) {
4288+ set_abort_result(TOI_PRE_SNAPSHOT_FAILED);
7f9d2ee0 4289+ toi_end_atomic(ATOMIC_STEP_DEVICE_RESUME, suspend_time);
4290+ return 1;
4291+ }
4292+
4293+ if (!suspend_time && toi_platform_pre_restore()) {
4294+ set_abort_result(TOI_PRE_RESTORE_FAILED);
7f9d2ee0 4295+ toi_end_atomic(ATOMIC_STEP_DEVICE_RESUME, suspend_time);
4296+ return 1;
4297+ }
4298+
4e97e4e9 4299+ if (test_action_state(TOI_LATE_CPU_HOTPLUG)) {
4e97e4e9 4300+ if (disable_nonboot_cpus()) {
4301+ set_abort_result(TOI_CPU_HOTPLUG_FAILED);
7f9d2ee0 4302+ toi_end_atomic(ATOMIC_STEP_CPU_HOTPLUG,
4e97e4e9 4303+ suspend_time);
4304+ return 1;
4305+ }
4306+ }
4307+
4308+ if (suspend_time && arch_prepare_suspend()) {
4309+ set_abort_result(TOI_ARCH_PREPARE_FAILED);
4310+ toi_end_atomic(ATOMIC_STEP_CPU_HOTPLUG, suspend_time);
4311+ return 1;
4312+ }
4313+
4314+ local_irq_disable();
4315+
4316+ /* At this point, device_suspend() has been called, but *not*
4317+ * device_power_down(). We *must* device_power_down() now.
4318+ * Otherwise, drivers for some devices (e.g. interrupt controllers)
4319+ * become desynchronized with the actual state of the hardware
4320+ * at resume time, and evil weirdness ensues.
4321+ */
4322+
4323+ if (device_power_down(state)) {
4324+ set_abort_result(TOI_DEVICE_REFUSED);
4325+ toi_end_atomic(ATOMIC_STEP_IRQS, suspend_time);
4326+ return 1;
4327+ }
4328+
4329+ return 0;
4330+}
4331+
4332+void toi_end_atomic(int stage, int suspend_time)
4333+{
4334+ switch (stage) {
4335+ case ATOMIC_ALL_STEPS:
4336+ if (!suspend_time)
4337+ toi_platform_leave();
4338+ device_power_up();
4339+ case ATOMIC_STEP_IRQS:
4340+ local_irq_enable();
4341+ case ATOMIC_STEP_CPU_HOTPLUG:
4342+ if (test_action_state(TOI_LATE_CPU_HOTPLUG))
4343+ enable_nonboot_cpus();
ad8f4a28 4344+ toi_platform_finish();
7f9d2ee0 4345+ case ATOMIC_STEP_DEVICE_RESUME:
4346+ device_resume();
4347+ case ATOMIC_STEP_RESUME_CONSOLE:
4348+ resume_console();
7f9d2ee0 4349+ case ATOMIC_STEP_PLATFORM_END:
4350+ toi_platform_end();
4351+
4352+ toi_prepare_status(DONT_CLEAR_BAR, "Post atomic.");
4e97e4e9 4353+ }
4354+}
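The cascading switch in toi_end_atomic relies on deliberate case fall-through: entering at any stage also runs every earlier (less deeply nested) cleanup step. The pattern in isolation (step names here are illustrative, not the kernel's):

```c
/* Staged-teardown sketch in the style of toi_end_atomic: each later
 * entry point falls through into all the earlier cleanup steps. */
enum { ALL_STEPS, STEP_IRQS, STEP_CPU };

static int undone[3];

static void end_atomic_sketch(int stage)
{
	switch (stage) {
	case ALL_STEPS:
		undone[0] = 1;	/* e.g. device_power_up() */
		/* fall through */
	case STEP_IRQS:
		undone[1] = 1;	/* e.g. local_irq_enable() */
		/* fall through */
	case STEP_CPU:
		undone[2] = 1;	/* e.g. enable_nonboot_cpus() */
	}
}
```

Each failure site in toi_go_atomic passes the stage matching how far setup got, so exactly the completed steps are unwound.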
4355diff --git a/kernel/power/tuxonice_atomic_copy.h b/kernel/power/tuxonice_atomic_copy.h
4356new file mode 100644
7f9d2ee0 4357index 0000000..5cb9dfe
4e97e4e9 4358--- /dev/null
4359+++ b/kernel/power/tuxonice_atomic_copy.h
4360@@ -0,0 +1,21 @@
24613191 4361+/*
4e97e4e9 4362+ * kernel/power/tuxonice_atomic_copy.h
24613191 4363+ *
4e97e4e9 4364+ * Copyright 2007 Nigel Cunningham (nigel at tuxonice net)
24613191 4365+ *
4366+ * Distributed under GPLv2.
4367+ *
4e97e4e9 4368+ * Routines for doing the atomic save/restore.
24613191 4369+ */
4370+
4e97e4e9 4371+enum {
4372+ ATOMIC_ALL_STEPS,
4373+ ATOMIC_STEP_IRQS,
4374+ ATOMIC_STEP_CPU_HOTPLUG,
4375+ ATOMIC_STEP_DEVICE_RESUME,
4376+ ATOMIC_STEP_RESUME_CONSOLE,
7f9d2ee0 4377+ ATOMIC_STEP_PLATFORM_END,
24613191 4378+};
4379+
4e97e4e9 4380+int toi_go_atomic(pm_message_t state, int toi_time);
4381+void toi_end_atomic(int stage, int toi_time);
4382diff --git a/kernel/power/tuxonice_block_io.c b/kernel/power/tuxonice_block_io.c
4383new file mode 100644
7f9d2ee0 4384index 0000000..913e813
4e97e4e9 4385--- /dev/null
4386+++ b/kernel/power/tuxonice_block_io.c
7f9d2ee0 4387@@ -0,0 +1,1186 @@
24613191 4388+/*
4e97e4e9 4389+ * kernel/power/tuxonice_block_io.c
24613191 4390+ *
4e97e4e9 4391+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 4392+ *
4e97e4e9 4393+ * Distributed under GPLv2.
4394+ *
4395+ * This file contains block io functions for TuxOnIce. These are
4396+ * used by the swapwriter and it is planned that they will also
4397+ * be used by the NFSwriter.
24613191 4398+ *
24613191 4399+ */
4400+
4e97e4e9 4401+#include <linux/blkdev.h>
4402+#include <linux/syscalls.h>
24613191 4403+#include <linux/suspend.h>
24613191 4404+
4e97e4e9 4405+#include "tuxonice.h"
4406+#include "tuxonice_sysfs.h"
4407+#include "tuxonice_modules.h"
4408+#include "tuxonice_prepare_image.h"
4409+#include "tuxonice_block_io.h"
4410+#include "tuxonice_ui.h"
ad8f4a28 4411+#include "tuxonice_alloc.h"
7f9d2ee0 4412+#include "tuxonice_io.h"
24613191 4413+
24613191 4414+
4e97e4e9 4415+#if 0
7f9d2ee0 4416+static int pr_index;
4417+
4418+static inline void reset_pr_index(void)
4419+{
4420+ pr_index = 0;
4421+}
4422+
4423+#define PR_DEBUG(a, b...) do { \
4424+ if (pr_index < 20) \
4425+ printk(a, ##b); \
4426+} while (0)
7f9d2ee0 4427+
4428+static inline void inc_pr_index(void)
4429+{
4430+ pr_index++;
4431+}
24613191 4432+#else
ad8f4a28 4433+#define PR_DEBUG(a, b...) do { } while (0)
7f9d2ee0 4434+#define reset_pr_index() do { } while (0)
4435+#define inc_pr_index() do { } while (0)
24613191 4436+#endif
4437+
ad8f4a28 4438+#define TARGET_OUTSTANDING_IO 16384
7f9d2ee0 4439+
4440+#define MEASURE_MUTEX_CONTENTION
4441+#ifndef MEASURE_MUTEX_CONTENTION
4442+#define my_mutex_lock(index, the_lock) mutex_lock(the_lock)
4443+#define my_mutex_unlock(index, the_lock) mutex_unlock(the_lock)
4444+#else
4445+unsigned long mutex_times[2][2][NR_CPUS];
4446+#define my_mutex_lock(index, the_lock) do { \
4447+ int have_mutex; \
4448+ have_mutex = mutex_trylock(the_lock); \
4449+ if (!have_mutex) { \
4450+ mutex_lock(the_lock); \
4451+ mutex_times[index][0][smp_processor_id()]++; \
4452+ } else { \
4453+ mutex_times[index][1][smp_processor_id()]++; \
4454+ }
4455+
4456+#define my_mutex_unlock(index, the_lock) \
4457+ mutex_unlock(the_lock); \
4458+} while (0)
4459+#endif
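The MEASURE_MUTEX_CONTENTION macros use trylock-then-lock: if the trylock fails the mutex was contended, so the blocking path is taken and counted. A userspace equivalent with POSIX threads, reduced to two global counters (names are illustrative):

```c
#include <pthread.h>

/* Contention counter in the spirit of my_mutex_lock above: a failed
 * trylock means someone else held the lock, so count the contended
 * case and fall back to a blocking lock. */
static unsigned long contended, uncontended;

static void counted_lock(pthread_mutex_t *m)
{
	if (pthread_mutex_trylock(m)) {
		pthread_mutex_lock(m);
		contended++;
	} else {
		uncontended++;
	}
}
```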
4460+
4461+static int target_outstanding_io = 1024;
4462+static int max_outstanding_writes, max_outstanding_reads;
24613191 4463+
4e97e4e9 4464+static struct page *bio_queue_head, *bio_queue_tail;
4465+static DEFINE_SPINLOCK(bio_queue_lock);
24613191 4466+
7f9d2ee0 4467+static int free_mem_throttle;
4468+static int more_readahead = 1;
4469+static struct page *readahead_list_head, *readahead_list_tail;
24613191 4470+
4e97e4e9 4471+static struct page *waiting_on;
24613191 4472+
4e97e4e9 4473+static atomic_t toi_io_in_progress;
4e97e4e9 4474+static DECLARE_WAIT_QUEUE_HEAD(num_in_progress_wait);
24613191 4475+
ad8f4a28 4476+static int extra_page_forward;
24613191 4477+
4e97e4e9 4478+static int current_stream;
7f9d2ee0 4479+/* 0 = Header, 1 = Pageset1, 2 = Pageset2, 3 = End of PS1 */
4480+struct extent_iterate_saved_state toi_writer_posn_save[4];
4e97e4e9 4481+
4482+/* Pointer to current entry being loaded/saved. */
4483+struct extent_iterate_state toi_writer_posn;
24613191 4484+
4e97e4e9 4485+/* Not static, so that the allocators can setup and complete
4486+ * writing the header */
4487+char *toi_writer_buffer;
4488+int toi_writer_buffer_posn;
4489+
4490+static struct toi_bdev_info *toi_devinfo;
4491+
4e97e4e9 4492+DEFINE_MUTEX(toi_bio_mutex);
ad8f4a28 4493+
7f9d2ee0 4494+static struct task_struct *toi_queue_flusher;
4495+static int toi_bio_queue_flush_pages(int dedicated_thread);
4496+
4497+/**
4498+ * set_throttle: Set the point where we pause to avoid oom.
4499+ *
4500+ * Initially, this value is zero, but when we first fail to allocate memory,
4501+ * we set it (plus a buffer) and thereafter throttle i/o once that limit is
4502+ * reached.
4503+ */
4504+
4505+static void set_throttle(void)
4506+{
7f9d2ee0 4507+ int new_throttle = nr_unallocated_buffer_pages() + 256;
ad8f4a28 4508+
7f9d2ee0 4509+ if (new_throttle > free_mem_throttle)
4510+ free_mem_throttle = new_throttle;
24613191 4511+}
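set_throttle implements a high-water mark: the threshold only ever rises, and is pinned to how much memory was free when an allocation first failed, plus a buffer of 256 pages. A standalone sketch of the same logic:

```c
/* High-water-mark sketch of set_throttle: the threshold is based on
 * what was free the moment an allocation failed, plus a safety
 * buffer, and is only ever raised. */
static int free_mem_throttle_sketch;

static void set_throttle_sketch(int free_pages_now)
{
	int new_throttle = free_pages_now + 256;

	if (new_throttle > free_mem_throttle_sketch)
		free_mem_throttle_sketch = new_throttle;
}
```

Once set, throttle_if_memory_low pauses submissions whenever free pages drop below this mark while i/o is still outstanding.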
4512+
7f9d2ee0 4513+#define NUM_REASONS 10
4514+static atomic_t reasons[NUM_REASONS];
4515+static char *reason_name[NUM_REASONS] = {
4e97e4e9 4516+ "readahead not ready",
4517+ "bio allocation",
4518+ "io_struct allocation",
4519+ "submit buffer",
4520+ "synchronous I/O",
4521+ "bio mutex when reading",
4522+ "bio mutex when writing",
7f9d2ee0 4523+ "toi_bio_get_new_page",
4524+ "memory low",
4525+ "readahead buffer allocation"
4e97e4e9 4526+};
4527+
4528+/**
4529+ * do_bio_wait: Wait for some TuxOnIce i/o to complete.
4530+ *
4531+ * Submit any I/O that's batched up (if we're not already doing
4532+ * that, schedule and clean up whatever we can.
4533+ */
4534+static void do_bio_wait(int reason)
24613191 4535+{
4536+ struct page *was_waiting_on = waiting_on;
4537+
4538+ /* On SMP, waiting_on can be reset, so we make a copy */
4539+ if (was_waiting_on) {
4540+ if (PageLocked(was_waiting_on)) {
4541+ wait_on_page_bit(was_waiting_on, PG_locked);
4542+ atomic_inc(&reasons[reason]);
4543+ }
4e97e4e9 4544+ } else {
4545+ atomic_inc(&reasons[reason]);
4546+
ad8f4a28 4547+ wait_event(num_in_progress_wait,
7f9d2ee0 4548+ !atomic_read(&toi_io_in_progress) ||
4549+ nr_unallocated_buffer_pages() > free_mem_throttle);
4e97e4e9 4550+ }
24613191 4551+}
4552+
7f9d2ee0 4553+static void throttle_if_memory_low(void)
24613191 4554+{
7f9d2ee0 4555+ int free_pages = nr_unallocated_buffer_pages();
24613191 4556+
7f9d2ee0 4557+ /* Getting low on memory and I/O is in progress? */
4558+ while (unlikely(free_pages < free_mem_throttle) &&
4559+ atomic_read(&toi_io_in_progress)) {
4560+ do_bio_wait(8);
4561+ free_pages = nr_unallocated_buffer_pages();
4562+ }
24613191 4563+
7f9d2ee0 4564+ wait_event(num_in_progress_wait,
4565+ atomic_read(&toi_io_in_progress) < target_outstanding_io);
24613191 4566+}
4567+
4e97e4e9 4568+/**
7f9d2ee0 4569+ * toi_finish_all_io: Complete all outstanding i/o.
24613191 4570+ */
7f9d2ee0 4571+static void toi_finish_all_io(void)
24613191 4572+{
7f9d2ee0 4573+ wait_event(num_in_progress_wait, !atomic_read(&toi_io_in_progress));
24613191 4574+}
24613191 4575+
4e97e4e9 4576+/**
4577+ * toi_end_bio: bio completion function.
4578+ *
4579+ * @bio: bio that has completed.
4e97e4e9 4580+ * @err: Error value. Yes, like end_swap_bio_read, we ignore it.
4581+ *
4582+ * Function called by block driver from interrupt context when I/O is completed.
7f9d2ee0 4583+ * Nearly the fs/buffer.c version, but we want to do our cleanup too. We only
4584+ * free pages if they were buffers used when writing the image.
24613191 4585+ */
ad8f4a28 4586+static void toi_end_bio(struct bio *bio, int err)
24613191 4587+{
4e97e4e9 4588+ struct page *page = bio->bi_io_vec[0].bv_page;
24613191 4589+
4e97e4e9 4590+ BUG_ON(!test_bit(BIO_UPTODATE, &bio->bi_flags));
24613191 4591+
7f9d2ee0 4592+ unlock_page(page);
4593+ bio_put(bio);
4594+
4595+ if (waiting_on == page)
4596+ waiting_on = NULL;
24613191 4597+
7f9d2ee0 4598+ put_page(page);
4599+
4600+ if (bio->bi_private)
4601+ toi__free_page((int) ((unsigned long) bio->bi_private), page);
24613191 4602+
4e97e4e9 4603+ bio_put(bio);
4604+
4605+ atomic_dec(&toi_io_in_progress);
4e97e4e9 4606+
4607+ wake_up(&num_in_progress_wait);
24613191 4608+}
4609+
4e97e4e9 4610+/**
4611+ * submit - submit BIO request.
4612+ * @writing: READ or WRITE.
4e97e4e9 4613+ *
4614+ * Based on Patrick's pmdisk code from long ago:
4615+ * "Straight from the textbook - allocate and initialize the bio.
4616+ * If we're writing, make sure the page is marked as dirty.
4617+ * Then submit it and carry on."
4618+ *
4619+ * With a twist, though - we handle block_size != PAGE_SIZE.
4620+ * Caller has already checked that our page is not fragmented.
4621+ */
7f9d2ee0 4622+static int submit(int writing, struct block_device *dev, sector_t first_block,
4623+ struct page *page, int free_group)
4e97e4e9 4624+{
4625+ struct bio *bio = NULL;
7f9d2ee0 4626+ int cur_outstanding_io;
4627+
4628+ throttle_if_memory_low();
24613191 4629+
4e97e4e9 4630+ while (!bio) {
4631+ bio = bio_alloc(TOI_ATOMIC_GFP, 1);
4632+ if (!bio) {
4633+ set_throttle();
4e97e4e9 4634+ do_bio_wait(1);
ad8f4a28 4635+ }
24613191 4636+ }
24613191 4637+
7f9d2ee0 4638+ bio->bi_bdev = dev;
4639+ bio->bi_sector = first_block;
4640+ bio->bi_private = (void *) ((unsigned long) free_group);
4e97e4e9 4641+ bio->bi_end_io = toi_end_bio;
24613191 4642+
7f9d2ee0 4643+ if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
ad8f4a28 4644+ printk(KERN_INFO "ERROR: adding page to bio at %lld\n",
7f9d2ee0 4645+ (unsigned long long) first_block);
4e97e4e9 4646+ bio_put(bio);
4647+ return -EFAULT;
4648+ }
24613191 4649+
4e97e4e9 4650+ bio_get(bio);
24613191 4651+
7f9d2ee0 4652+ cur_outstanding_io = atomic_add_return(1, &toi_io_in_progress);
4653+ if (writing) {
4654+ if (cur_outstanding_io > max_outstanding_writes)
4655+ max_outstanding_writes = cur_outstanding_io;
4656+ } else {
4657+ if (cur_outstanding_io > max_outstanding_reads)
4658+ max_outstanding_reads = cur_outstanding_io;
4659+ }
4e97e4e9 4660+
4e97e4e9 4661+
4662+ if (unlikely(test_action_state(TOI_TEST_FILTER_SPEED))) {
4663+ /* Fake having done the hard work */
4664+ set_bit(BIO_UPTODATE, &bio->bi_flags);
ad8f4a28 4665+ toi_end_bio(bio, 0);
4e97e4e9 4666+ } else
7f9d2ee0 4667+ submit_bio(writing | (1 << BIO_RW_SYNC), bio);
4e97e4e9 4668+
4669+ return 0;
24613191 4670+}
4671+
4e97e4e9 4672+/**
4e97e4e9 4673+ * toi_do_io: Prepare to do some i/o on a page and submit or batch it.
24613191 4674+ *
4e97e4e9 4675+ * @writing: Whether reading or writing.
4676+ * @bdev: The block device which we're using.
4677+ * @block0: The first sector we're reading or writing.
4678+ * @page: The page on which I/O is being done.
4679+ * @is_readahead: Whether the page is being added to the readahead list.
4680+ * @syncio: Whether the i/o is being done synchronously.
24613191 4681+ *
4e97e4e9 4682+ * Prepare and start a read or write operation.
24613191 4683+ *
4e97e4e9 4684+ * Note that we always work with our own page. If writing, we might be given a
4685+ * compression buffer that will immediately be used to start compressing the
4686+ * next page. For reading, we do readahead and therefore don't know the final
4687+ * address where the data needs to go.
24613191 4688+ */
7f9d2ee0 4689+static int toi_do_io(int writing, struct block_device *bdev, long block0,
4690+ struct page *page, int is_readahead, int syncio, int free_group)
4e97e4e9 4691+{
7f9d2ee0 4692+ page->private = 0;
24613191 4693+
7f9d2ee0 4694+ /* Do here so we don't race against toi_bio_get_next_page_read */
4695+ lock_page(page);
24613191 4696+
7f9d2ee0 4697+ if (is_readahead) {
4698+ if (readahead_list_head)
4699+ readahead_list_tail->private = (unsigned long) page;
4700+ else
4701+ readahead_list_head = page;
24613191 4702+
7f9d2ee0 4703+ readahead_list_tail = page;
4e97e4e9 4704+ }
4705+
4706+ /* Done before submitting to avoid races. */
4707+ if (syncio)
7f9d2ee0 4708+ waiting_on = page;
4e97e4e9 4709+
4710+ /* Submit the page */
7f9d2ee0 4711+ get_page(page);
4e97e4e9 4712+
7f9d2ee0 4713+ if (submit(writing, bdev, block0, page, free_group))
4714+ return -EFAULT;
4e97e4e9 4715+
4716+ if (syncio)
4717+ do_bio_wait(4);
7f9d2ee0 4718+
4719+ return 0;
24613191 4720+}
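toi_do_io threads readahead pages into a singly linked list through the otherwise-unused page->private field, keeping head and tail pointers for O(1) append. The same append with an explicit next field (struct and names are illustrative):

```c
#include <stddef.h>

/* Head/tail append as used for the readahead list above, with an
 * explicit next pointer standing in for page->private. */
struct fake_page {
	struct fake_page *next;
	int id;
};

static struct fake_page *ra_head, *ra_tail;

static void ra_append(struct fake_page *page)
{
	page->next = NULL;
	if (ra_head)
		ra_tail->next = page;	/* link onto the current tail */
	else
		ra_head = page;		/* first entry becomes the head */
	ra_tail = page;
}
```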
4721+
4e97e4e9 4722+/**
4723+ * toi_bdev_page_io: Simpler interface to do directly i/o on a single page.
24613191 4724+ *
4e97e4e9 4725+ * @writing: Whether reading or writing.
4726+ * @bdev: Block device on which we're operating.
4727+ * @pos: Sector at which page to read starts.
4728+ * @page: Page to be read/written.
4729+ *
4730+ * We used to use bread here, but it doesn't correctly handle
4731+ * blocksize != PAGE_SIZE. Now we create a submit_info to get the data we
4732+ * want and use our normal routines (synchronously).
24613191 4733+ */
7f9d2ee0 4734+static int toi_bdev_page_io(int writing, struct block_device *bdev,
4e97e4e9 4735+ long pos, struct page *page)
24613191 4736+{
7f9d2ee0 4737+ return toi_do_io(writing, bdev, pos, page, 0, 1, 0);
24613191 4738+}
4739+
4e97e4e9 4740+/**
4741+ * toi_bio_memory_needed: Report amount of memory needed for block i/o.
24613191 4742+ *
4743+ * We want to have at least enough memory so as to have target_outstanding_io
4744+ * or more transactions on the fly at once. If we can do more, fine.
24613191 4745+ */
4e97e4e9 4746+static int toi_bio_memory_needed(void)
24613191 4747+{
7f9d2ee0 4748+ return (target_outstanding_io * (PAGE_SIZE + sizeof(struct request) +
4749+ sizeof(struct bio)));
24613191 4750+}
4751+
4752+/*
4753+ * toi_bio_print_debug_stats
4754+ *
4755+ * Description: Print the statistics gathered while doing block i/o.
4756+ */
4757+static int toi_bio_print_debug_stats(char *buffer, int size)
4758+{
7f9d2ee0 4759+ int len = snprintf_used(buffer, size, "- Max outstanding reads %d. Max "
4760+ "writes %d.\n", max_outstanding_reads,
4761+ max_outstanding_writes);
4762+
4763+ len += snprintf_used(buffer + len, size - len,
7f9d2ee0 4764+ " Memory_needed: %d x (%lu + %u + %u) = %d bytes.\n",
4765+ target_outstanding_io,
ad8f4a28 4766+ PAGE_SIZE, (unsigned int) sizeof(struct request),
7f9d2ee0 4767+ (unsigned int) sizeof(struct bio), toi_bio_memory_needed());
ad8f4a28 4768+
7f9d2ee0 4769+#ifdef MEASURE_MUTEX_CONTENTION
4770+ {
4771+ int i;
4772+
4773+ len += snprintf_used(buffer + len, size - len,
4774+ " Mutex contention while reading:\n Contended Free\n");
4775+
4776+ for (i = 0; i < NR_CPUS; i++)
4777+ len += snprintf_used(buffer + len, size - len,
4778+ " %9lu %9lu\n",
4779+ mutex_times[0][0][i], mutex_times[0][1][i]);
4780+
4781+ len += snprintf_used(buffer + len, size - len,
4782+ " Mutex contention while writing:\n Contended Free\n");
4783+
4784+ for (i = 0; i < NR_CPUS; i++)
4785+ len += snprintf_used(buffer + len, size - len,
4786+ " %9lu %9lu\n",
4787+ mutex_times[1][0][i], mutex_times[1][1][i]);
4788+
4789+ }
4790+#endif
4791+
4792+ return len + snprintf_used(buffer + len, size - len,
4793+ " Free mem throttle point reached %d.\n", free_mem_throttle);
4794+}
4795+
4e97e4e9 4796+/**
4797+ * toi_set_devinfo: Set the bdev info used for i/o.
24613191 4798+ *
4e97e4e9 4799+ * @info: Pointer to array of struct toi_bdev_info - the list of
4800+ * bdevs and blocks on them in which the image is stored.
4801+ *
4802+ * Set the list of bdevs and blocks in which the image will be stored.
4803+ * Sort of like putting a tape in the cassette player.
24613191 4804+ */
4e97e4e9 4805+static void toi_set_devinfo(struct toi_bdev_info *info)
24613191 4806+{
4e97e4e9 4807+ toi_devinfo = info;
24613191 4808+}
4809+
4e97e4e9 4810+/**
4811+ * dump_block_chains: Print the contents of the bdev info array.
24613191 4812+ */
4e97e4e9 4813+static void dump_block_chains(void)
4814+{
4815+ int i;
24613191 4816+
4e97e4e9 4817+ for (i = 0; i < toi_writer_posn.num_chains; i++) {
4818+ struct extent *this;
24613191 4819+
4e97e4e9 4820+ this = (toi_writer_posn.chains + i)->first;
24613191 4821+
4e97e4e9 4822+ if (!this)
4823+ continue;
4824+
4825+ printk(KERN_INFO "Chain %d:", i);
24613191 4826+
4e97e4e9 4827+ while (this) {
4828+ printk(" [%lu-%lu]%s", this->minimum,
4829+ this->maximum, this->next ? "," : "");
4e97e4e9 4830+ this = this->next;
4831+ }
4832+
4833+ printk("\n");
4834+ }
4835+
7f9d2ee0 4836+ for (i = 0; i < 4; i++)
4837+ printk(KERN_INFO "Posn %d: Chain %d, extent %d, offset %lu.\n",
4838+ i, toi_writer_posn_save[i].chain_num,
4e97e4e9 4839+ toi_writer_posn_save[i].extent_num,
4840+ toi_writer_posn_save[i].offset);
4841+}
24613191 4842+
4e97e4e9 4843+/**
4844+ * go_next_page: Skip blocks to the start of the next page.
4845+ *
4846+ * Go forward one page, or two if extra_page_forward is set. It only gets
4847+ * set at the start of reading the image header, to skip the first page
4848+ * of the header, which is read without using the extent chains.
4849+ */
ad8f4a28 4850+static int go_next_page(int writing)
24613191 4851+{
4e97e4e9 4852+ int i, max = (toi_writer_posn.current_chain == -1) ? 1 :
4853+ toi_devinfo[toi_writer_posn.current_chain].blocks_per_page;
24613191 4854+
4e97e4e9 4855+ for (i = 0; i < max; i++)
4856+ toi_extent_state_next(&toi_writer_posn);
4857+
4858+ if (toi_extent_state_eof(&toi_writer_posn)) {
4859+ /* Don't complain if readahead falls off the end */
4860+ if (writing) {
4861+ printk(KERN_INFO "Extent state eof. "
4862+ "Expected compression ratio too optimistic?\n");
4863+ dump_block_chains();
4864+ }
4e97e4e9 4865+ return -ENODATA;
4866+ }
4867+
4868+ if (extra_page_forward) {
4869+ extra_page_forward = 0;
ad8f4a28 4870+ return go_next_page(writing);
4e97e4e9 4871+ }
4872+
4873+ return 0;
24613191 4874+}
4875+
4e97e4e9 4876+/**
4877+ * set_extra_page_forward: Make us skip an extra page on next go_next_page.
4878+ *
4879+ * Used in reading header, to jump to 2nd page after getting 1st page
4880+ * direct from image header.
4881+ */
4882+static void set_extra_page_forward(void)
24613191 4883+{
4e97e4e9 4884+ extra_page_forward = 1;
24613191 4885+}
4886+
4e97e4e9 4887+/**
4888+ * toi_bio_rw_page: Do i/o on the next disk page in the image.
24613191 4889+ *
4e97e4e9 4890+ * @writing: Whether reading or writing.
4891+ * @page: Page to do i/o on.
4892+ * @is_readahead: Whether the page is a readahead buffer.
24613191 4893+ *
4e97e4e9 4894+ * Submit a page for reading or writing, possibly readahead.
24613191 4895+ */
4e97e4e9 4896+static int toi_bio_rw_page(int writing, struct page *page,
7f9d2ee0 4897+ int is_readahead, int free_group)
4e97e4e9 4898+{
4899+ struct toi_bdev_info *dev_info;
7f9d2ee0 4900+ int result;
24613191 4901+
4902+ if (go_next_page(writing)) {
4903+ printk(KERN_INFO "Failed to advance a page in the extent "
4904+ "data.\n");
4e97e4e9 4905+ return -ENODATA;
4906+ }
24613191 4907+
4e97e4e9 4908+ if (current_stream == 0 && writing &&
4909+ toi_writer_posn.current_chain ==
4910+ toi_writer_posn_save[2].chain_num &&
4911+ toi_writer_posn.current_offset ==
4912+ toi_writer_posn_save[2].offset) {
4e97e4e9 4913+ dump_block_chains();
4914+ BUG();
4915+ }
24613191 4916+
4e97e4e9 4917+ dev_info = &toi_devinfo[toi_writer_posn.current_chain];
e8d0ad9d 4918+
7f9d2ee0 4919+ result = toi_do_io(writing, dev_info->bdev,
4e97e4e9 4920+ toi_writer_posn.current_offset <<
4921+ dev_info->bmap_shift,
7f9d2ee0 4922+ page, is_readahead, 0, free_group);
4923+
4924+ if (result) {
4925+ more_readahead = 0;
4926+ return result;
4927+ }
4928+
4929+ if (!writing) {
4930+ int compare_to = 0;
24613191 4931+
7f9d2ee0 4932+ switch (current_stream) {
4933+ case 0:
4934+ compare_to = 2;
4935+ break;
4936+ case 1:
4937+ compare_to = 3;
4938+ break;
4939+ case 2:
4940+ compare_to = 1;
4941+ break;
4942+ }
4943+
4944+ if (toi_writer_posn.current_chain ==
4945+ toi_writer_posn_save[compare_to].chain_num &&
4946+ toi_writer_posn.current_offset ==
4947+ toi_writer_posn_save[compare_to].offset)
4948+ more_readahead = 0;
4949+ }
4e97e4e9 4950+ return 0;
4951+}
4952+
4953+/**
4954+ * toi_rw_init: Prepare to read or write a stream in the image.
24613191 4955+ *
4e97e4e9 4956+ * @writing: Whether reading or writing.
4957+ * @stream_number: Section of the image being processed.
24613191 4958+ */
4e97e4e9 4959+static int toi_rw_init(int writing, int stream_number)
4960+{
7f9d2ee0 4961+ if (stream_number)
4962+ toi_extent_state_restore(&toi_writer_posn,
4963+ &toi_writer_posn_save[stream_number]);
4964+ else
4965+ toi_extent_state_goto_start(&toi_writer_posn);
24613191 4966+
7f9d2ee0 4967+ toi_writer_buffer = (char *) toi_get_zeroed_page(11, TOI_ATOMIC_GFP);
4e97e4e9 4968+ toi_writer_buffer_posn = writing ? 0 : PAGE_SIZE;
4969+
4970+ current_stream = stream_number;
4971+
7f9d2ee0 4972+ reset_pr_index();
4973+ more_readahead = 1;
4e97e4e9 4974+
7f9d2ee0 4975+ return toi_writer_buffer ? 0 : -ENOMEM;
4e97e4e9 4976+}
24613191 4977+
4e97e4e9 4978+/**
4979+ * toi_read_header_init: Prepare to read the image header.
24613191 4980+ *
4e97e4e9 4981+ * Reset readahead indices prior to starting to read a section of the image.
24613191 4982+ */
4e97e4e9 4983+static void toi_read_header_init(void)
24613191 4984+{
7f9d2ee0 4985+ toi_writer_buffer = (char *) toi_get_zeroed_page(11, TOI_ATOMIC_GFP);
4986+ more_readahead = 1;
24613191 4987+}
4988+
7f9d2ee0 4989+/*
4990+ * toi_bio_queue_write
4991+ */
4992+static void toi_bio_queue_write(char **full_buffer)
4993+{
4994+ struct page *page = virt_to_page(*full_buffer);
4995+ unsigned long flags;
4996+
4997+ page->private = 0;
4998+
4999+ spin_lock_irqsave(&bio_queue_lock, flags);
5000+ if (!bio_queue_head)
5001+ bio_queue_head = page;
5002+ else
5003+ bio_queue_tail->private = (unsigned long) page;
5004+
5005+ bio_queue_tail = page;
5006+
5007+ spin_unlock_irqrestore(&bio_queue_lock, flags);
5008+ wake_up(&toi_io_queue_flusher);
5009+
5010+ *full_buffer = NULL;
5011+}
5012+
5013+/**
5014+ * toi_rw_cleanup: Cleanup after i/o.
24613191 5015+ *
4e97e4e9 5016+ * @writing: Whether we were reading or writing.
24613191 5017+ */
4e97e4e9 5018+static int toi_rw_cleanup(int writing)
24613191 5019+{
4e97e4e9 5020+ int i;
24613191 5021+
5022+ if (writing) {
5023+ if (toi_writer_buffer_posn)
7f9d2ee0 5024+ toi_bio_queue_write(&toi_writer_buffer);
5025+
5026+ toi_bio_queue_flush_pages(0);
24613191 5027+
7f9d2ee0 5028+ if (current_stream == 2)
5029+ toi_extent_state_save(&toi_writer_posn,
5030+ &toi_writer_posn_save[1]);
5031+ else if (current_stream == 1)
5032+ toi_extent_state_save(&toi_writer_posn,
5033+ &toi_writer_posn_save[3]);
5034+ }
4e97e4e9 5035+
5036+ toi_finish_all_io();
5037+
7f9d2ee0 5038+ while (readahead_list_head) {
5039+ void *next = (void *) readahead_list_head->private;
5040+ toi__free_page(12, readahead_list_head);
5041+ readahead_list_head = next;
5042+ }
5043+
5044+ readahead_list_tail = NULL;
4e97e4e9 5045+
7f9d2ee0 5046+ if (!current_stream)
5047+ return 0;
4e97e4e9 5048+
5049+ for (i = 0; i < NUM_REASONS; i++) {
5050+ if (!atomic_read(&reasons[i]))
4e97e4e9 5051+ continue;
5052+ printk(KERN_INFO "Waited for i/o due to %s %d times.\n",
5053+ reason_name[i], atomic_read(&reasons[i]));
5054+ atomic_set(&reasons[i], 0);
24613191 5055+ }
7f9d2ee0 5056+
5057+ current_stream = 0;
4e97e4e9 5058+ return 0;
24613191 5059+}
5060+
7f9d2ee0 5061+int toi_start_one_readahead(int dedicated_thread)
24613191 5062+{
7f9d2ee0 5063+ char *buffer = NULL;
5064+ int oom = 0;
24613191 5065+
7f9d2ee0 5066+ throttle_if_memory_low();
24613191 5067+
7f9d2ee0 5068+ while (!buffer) {
5069+ buffer = (char *) toi_get_zeroed_page(12,
5070+ TOI_ATOMIC_GFP);
5071+ if (!buffer) {
5072+ if (oom && !dedicated_thread)
5073+ return -EIO;
5074+
5075+ oom = 1;
5076+ set_throttle();
5077+ do_bio_wait(9);
4e97e4e9 5078+ }
4e97e4e9 5079+ }
24613191 5080+
7f9d2ee0 5081+ return toi_bio_rw_page(READ, virt_to_page(buffer), 1, 0);
5082+}
5083+
5084+/*
5085+ * toi_start_new_readahead
5086+ *
5087+ * Start readahead of image pages.
5088+ *
5089+ * No mutex needed because this is only ever called by one cpu.
5090+ */
5091+static int toi_start_new_readahead(int dedicated_thread)
5092+{
5093+ int last_result, num_submitted = 0;
5094+
5095+ /* Start a new readahead? */
5096+ if (!more_readahead)
5097+ return 0;
5098+
4e97e4e9 5099+ do {
7f9d2ee0 5100+ int result = toi_start_one_readahead(dedicated_thread);
24613191 5101+
7f9d2ee0 5102+ if (result == -EIO)
5103+ return 0;
5104+ else
5105+ last_result = result;
ad8f4a28 5106+
7f9d2ee0 5107+ if (last_result == -ENODATA)
5108+ more_readahead = 0;
24613191 5109+
7f9d2ee0 5110+ if (!more_readahead && last_result) {
5111+ /*
5112+ * Don't complain about failing to do readahead past
5113+ * the end of storage.
5114+ */
7f9d2ee0 5115+ if (last_result != -ENODATA)
5116+ printk(KERN_INFO
5117+ "Begin read chunk returned %d.\n",
5118+ last_result);
5119+ } else
5120+ num_submitted++;
24613191 5121+
7f9d2ee0 5122+ } while (more_readahead &&
5123+ (dedicated_thread ||
5124+ (num_submitted < target_outstanding_io &&
5125+ atomic_read(&toi_io_in_progress) < target_outstanding_io)));
5126+ return 0;
5127+}
24613191 5128+
7f9d2ee0 5129+static void bio_io_flusher(int writing)
5130+{
24613191 5131+
7f9d2ee0 5132+ if (writing)
5133+ toi_bio_queue_flush_pages(1);
5134+ else
5135+ toi_start_new_readahead(1);
5136+}
24613191 5137+
7f9d2ee0 5138+/**
5139+ * toi_bio_get_next_page_read: Read a disk page with readahead.
5140+ *
5141+ * Read a page from disk, submitting readahead and cleaning up finished i/o
5142+ * while we wait for the page we're after.
5143+ */
5144+static int toi_bio_get_next_page_read(int no_readahead)
5145+{
5146+ unsigned long *virt;
5147+ struct page *next;
24613191 5148+
7f9d2ee0 5149+ /*
5150+ * When reading the second page of the header, we have to
5151+ * delay submitting the read until after we've gotten the
5152+ * extents out of the first page.
5153+ */
5154+ if (unlikely(no_readahead && toi_start_one_readahead(0))) {
5155+ printk("No readahead and toi_start_one_readahead returned non-zero.\n");
5156+ return -EIO;
5157+ }
5158+
5159+ /*
5160+ * On SMP, we may need to wait for the first readahead
5161+ * to be submitted.
5162+ */
5163+ if (unlikely(!readahead_list_head)) {
5164+ BUG_ON(!more_readahead);
5165+ do {
5166+ cpu_relax();
5167+ } while (!readahead_list_head);
5168+ }
5169+
5170+ if (PageLocked(readahead_list_head)) {
5171+ waiting_on = readahead_list_head;
5172+ do_bio_wait(0);
5173+ }
24613191 5174+
7f9d2ee0 5175+ virt = page_address(readahead_list_head);
5176+ memcpy(toi_writer_buffer, virt, PAGE_SIZE);
24613191 5177+
7f9d2ee0 5178+ next = (struct page *) readahead_list_head->private;
5179+ toi__free_page(12, readahead_list_head);
5180+ readahead_list_head = next;
24613191 5181+ return 0;
5182+}
5183+
4e97e4e9 5184+/*
5185+ * toi_bio_queue_flush_pages
24613191 5186+ */
4e97e4e9 5187+
7f9d2ee0 5188+static int toi_bio_queue_flush_pages(int dedicated_thread)
24613191 5189+{
4e97e4e9 5190+ unsigned long flags;
5191+ int result = 0;
24613191 5192+
7f9d2ee0 5193+top:
4e97e4e9 5194+ spin_lock_irqsave(&bio_queue_lock, flags);
5195+ while (bio_queue_head) {
5196+ struct page *page = bio_queue_head;
5197+ bio_queue_head = (struct page *) page->private;
5198+ if (bio_queue_tail == page)
5199+ bio_queue_tail = NULL;
4e97e4e9 5200+ spin_unlock_irqrestore(&bio_queue_lock, flags);
7f9d2ee0 5201+ result = toi_bio_rw_page(WRITE, page, 0, 11);
4e97e4e9 5202+ if (result)
7f9d2ee0 5203+ return result;
4e97e4e9 5204+ spin_lock_irqsave(&bio_queue_lock, flags);
24613191 5205+ }
4e97e4e9 5206+ spin_unlock_irqrestore(&bio_queue_lock, flags);
7f9d2ee0 5207+
5208+ if (dedicated_thread) {
5209+ wait_event(toi_io_queue_flusher, bio_queue_head ||
5210+ toi_bio_queue_flusher_should_finish);
5211+ if (likely(!toi_bio_queue_flusher_should_finish))
5212+ goto top;
5213+ toi_bio_queue_flusher_should_finish = 0;
5214+ }
5215+ return 0;
4e97e4e9 5216+}
24613191 5217+
4e97e4e9 5218+/*
7f9d2ee0 5219+ * toi_bio_get_new_page
4e97e4e9 5220+ */
7f9d2ee0 5221+static void toi_bio_get_new_page(char **full_buffer)
4e97e4e9 5222+{
7f9d2ee0 5223+ throttle_if_memory_low();
4e97e4e9 5224+
5225+ while (!*full_buffer) {
5226+ *full_buffer = (char *) toi_get_zeroed_page(11, TOI_ATOMIC_GFP);
5227+ if (!*full_buffer) {
5228+ set_throttle();
4e97e4e9 5229+ do_bio_wait(7);
ad8f4a28 5230+ }
24613191 5231+ }
24613191 5232+}
5233+
4e97e4e9 5234+/*
5235+ * toi_rw_buffer: Combine smaller buffers into PAGE_SIZE I/O.
24613191 5236+ *
4e97e4e9 5237+ * @writing: Bool - whether writing (or reading).
5238+ * @buffer: The start of the buffer to write or fill.
5239+ * @buffer_size: The size of the buffer to write or fill.
24613191 5240+ */
7f9d2ee0 5241+static int toi_rw_buffer(int writing, char *buffer, int buffer_size, int no_readahead)
24613191 5242+{
4e97e4e9 5243+ int bytes_left = buffer_size;
24613191 5244+
4e97e4e9 5245+ while (bytes_left) {
5246+ char *source_start = buffer + buffer_size - bytes_left;
5247+ char *dest_start = toi_writer_buffer + toi_writer_buffer_posn;
5248+ int capacity = PAGE_SIZE - toi_writer_buffer_posn;
5249+ char *to = writing ? dest_start : source_start;
5250+ char *from = writing ? source_start : dest_start;
24613191 5251+
4e97e4e9 5252+ if (bytes_left <= capacity) {
5253+ memcpy(to, from, bytes_left);
5254+ toi_writer_buffer_posn += bytes_left;
5255+ return 0;
24613191 5256+ }
4e97e4e9 5257+
5258+ /* Complete this page and start a new one */
5259+ memcpy(to, from, capacity);
5260+ bytes_left -= capacity;
5261+
5262+ if (!writing) {
7f9d2ee0 5263+ int result = toi_bio_get_next_page_read(no_readahead);
5264+ if (result)
5265+ return result;
5266+ } else {
5267+ toi_bio_queue_write(&toi_writer_buffer);
5268+ toi_bio_get_new_page(&toi_writer_buffer);
5269+ }
4e97e4e9 5270+
5271+ toi_writer_buffer_posn = 0;
5272+ toi_cond_pause(0, NULL);
24613191 5273+ }
4e97e4e9 5274+
24613191 5275+ return 0;
5276+}
5277+
4e97e4e9 5278+/**
5279+ * toi_bio_read_page - read a page of the image.
24613191 5280+ *
4e97e4e9 5281+ * @pfn: The pfn where the data belongs.
5282+ * @buffer_page: The page containing the (possibly compressed) data.
5283+ * @buf_size: The number of bytes on @buffer_page used.
24613191 5284+ *
4e97e4e9 5285+ * Read a (possibly compressed) page from the image, into buffer_page,
5286+ * returning its pfn and the buffer size.
24613191 5287+ */
4e97e4e9 5288+static int toi_bio_read_page(unsigned long *pfn, struct page *buffer_page,
5289+ unsigned int *buf_size)
24613191 5290+{
4e97e4e9 5291+ int result = 0;
5292+ char *buffer_virt = kmap(buffer_page);
24613191 5293+
7f9d2ee0 5294+ inc_pr_index;
24613191 5295+
7f9d2ee0 5296+ /* Only call start_new_readahead if we don't have a dedicated thread */
5297+ if (current == toi_queue_flusher && toi_start_new_readahead(0)) {
5298+ printk("Queue flusher and toi_start_new_readahead returned non-zero.\n");
5299+ return -EIO;
5300+ }
5301+
5302+ my_mutex_lock(0, &toi_bio_mutex);
24613191 5303+
7f9d2ee0 5304+ if (toi_rw_buffer(READ, (char *) pfn, sizeof(unsigned long), 0) ||
5305+ toi_rw_buffer(READ, (char *) buf_size, sizeof(int), 0) ||
5306+ toi_rw_buffer(READ, buffer_virt, *buf_size, 0)) {
4e97e4e9 5307+ abort_hibernate(TOI_FAILED_IO, "Read of data failed.");
5308+ result = 1;
5309+ } else
5310+ PR_DEBUG("%d: PFN %ld, %d bytes.\n", pr_index, *pfn, *buf_size);
24613191 5311+
7f9d2ee0 5312+ my_mutex_unlock(0, &toi_bio_mutex);
4e97e4e9 5313+ kunmap(buffer_page);
5314+ return result;
24613191 5315+}
5316+
4e97e4e9 5317+/**
5318+ * toi_bio_write_page - Write a page of the image.
24613191 5319+ *
4e97e4e9 5320+ * @pfn: The pfn where the data belongs.
5321+ * @buffer_page: The page containing the (possibly compressed) data.
5322+ * @buf_size: The number of bytes on @buffer_page used.
5323+ *
5324+ * Write a (possibly compressed) page to the image from the buffer, together
5325+ * with its index and buffer size.
24613191 5326+ */
4e97e4e9 5327+static int toi_bio_write_page(unsigned long pfn, struct page *buffer_page,
5328+ unsigned int buf_size)
24613191 5329+{
ad8f4a28 5330+ char *buffer_virt;
7f9d2ee0 5331+ int result = 0, result2 = 0;
4e97e4e9 5332+
7f9d2ee0 5333+ inc_pr_index;
4e97e4e9 5334+
5335+ if (unlikely(test_action_state(TOI_TEST_FILTER_SPEED)))
5336+ return 0;
5337+
7f9d2ee0 5338+ my_mutex_lock(1, &toi_bio_mutex);
ad8f4a28 5339+ buffer_virt = kmap(buffer_page);
4e97e4e9 5340+
7f9d2ee0 5341+ if (toi_rw_buffer(WRITE, (char *) &pfn, sizeof(unsigned long), 0) ||
5342+ toi_rw_buffer(WRITE, (char *) &buf_size, sizeof(int), 0) ||
5343+ toi_rw_buffer(WRITE, buffer_virt, buf_size, 0)) {
5344+ printk("toi_rw_buffer returned non-zero to toi_bio_write_page.\n");
4e97e4e9 5345+ result = -EIO;
7f9d2ee0 5346+ }
4e97e4e9 5347+
5348+ PR_DEBUG("%d: Index %ld, %d bytes. Result %d.\n", pr_index, pfn,
5349+ buf_size, result);
5350+
4e97e4e9 5351+ kunmap(buffer_page);
7f9d2ee0 5352+ my_mutex_unlock(1, &toi_bio_mutex);
5353+
5354+ if (current == toi_queue_flusher)
5355+ result2 = toi_bio_queue_flush_pages(0);
5356+
5357+ return result ? result : result2;
24613191 5358+}
5359+
4e97e4e9 5360+/**
5361+ * toi_rw_header_chunk: Read or write a portion of the image header.
24613191 5362+ *
4e97e4e9 5363+ * @writing: Whether reading or writing.
5364+ * @owner: The module for which we're writing. Used for confirming that modules
5365+ * don't use more header space than they asked for.
5366+ * @buffer: Address of the data to write.
5367+ * @buffer_size: Size of the data buffer.
7f9d2ee0 5368+ * @no_readahead: Don't try to start readahead (when still getting extents)
24613191 5369+ */
7f9d2ee0 5370+static int _toi_rw_header_chunk(int writing, struct toi_module_ops *owner,
5371+ char *buffer, int buffer_size, int no_readahead)
24613191 5372+{
7f9d2ee0 5373+ int result = 0;
24613191 5374+
4e97e4e9 5375+ if (owner) {
5376+ owner->header_used += buffer_size;
5377+ toi_message(TOI_HEADER, TOI_LOW, 1,
5378+ "Header: %s : %d bytes (%d/%d).\n",
5379+ owner->name, buffer_size, owner->header_used,
5380+ owner->header_requested);
5381+ if (owner->header_used > owner->header_requested) {
7f9d2ee0 5382+ printk(KERN_EMERG "TuxOnIce module %s is using more "
4e97e4e9 5383+ "header space (%u) than it requested (%u).\n",
5384+ owner->name,
5385+ owner->header_used,
5386+ owner->header_requested);
5387+ return buffer_size;
5388+ }
5389+ } else
5390+ toi_message(TOI_HEADER, TOI_LOW, 1,
5391+ "Header: (No owner): %d bytes.\n", buffer_size);
24613191 5392+
7f9d2ee0 5393+ if (!writing && !no_readahead)
5394+ result = toi_start_new_readahead(0);
5395+
5396+ if (!result)
5397+ result = toi_rw_buffer(writing, buffer, buffer_size, no_readahead);
5398+
4e97e4e9 5399+ return result;
24613191 5400+}
5401+
7f9d2ee0 5402+static int toi_rw_header_chunk(int writing, struct toi_module_ops *owner,
5403+ char *buffer, int size)
5404+{
5405+ return _toi_rw_header_chunk(writing, owner, buffer, size, 0);
5406+}
5407+
5408+static int toi_rw_header_chunk_noreadahead(int writing,
5409+ struct toi_module_ops *owner, char *buffer, int size)
5410+{
5411+ return _toi_rw_header_chunk(writing, owner, buffer, size, 1);
5412+}
5413+
4e97e4e9 5414+/**
5415+ * write_header_chunk_finish: Flush any buffered header data.
24613191 5416+ */
4e97e4e9 5417+static int write_header_chunk_finish(void)
24613191 5418+{
5419+ int result = 0;
5420+
7f9d2ee0 5421+ if (toi_writer_buffer_posn)
5422+ toi_bio_queue_write(&toi_writer_buffer);
ad8f4a28 5423+
7f9d2ee0 5424+ toi_bio_queue_flush_pages(0);
ad8f4a28 5425+ toi_finish_all_io();
24613191 5426+
ad8f4a28 5427+ return result;
4e97e4e9 5428+}
24613191 5429+
4e97e4e9 5430+/**
5431+ * toi_bio_storage_needed: Get the amount of storage needed for my fns.
5432+ */
5433+static int toi_bio_storage_needed(void)
5434+{
5435+ return 2 * sizeof(int);
24613191 5436+}
5437+
4e97e4e9 5438+/**
5439+ * toi_bio_save_config_info: Save block i/o config to image header.
24613191 5440+ *
4e97e4e9 5441+ * @buf: PAGE_SIZE'd buffer into which data should be saved.
5442+ */
5443+static int toi_bio_save_config_info(char *buf)
5444+{
5445+ int *ints = (int *) buf;
ad8f4a28 5446+ ints[0] = target_outstanding_io;
7f9d2ee0 5447+ return sizeof(int);
4e97e4e9 5448+}
5449+
5450+/**
5451+ * toi_bio_load_config_info: Restore block i/o config.
24613191 5452+ *
4e97e4e9 5453+ * @buf: Data to be reloaded.
5454+ * @size: Size of the buffer saved.
5455+ */
5456+static void toi_bio_load_config_info(char *buf, int size)
5457+{
5458+ int *ints = (int *) buf;
ad8f4a28 5459+ target_outstanding_io = ints[0];
4e97e4e9 5460+}
5461+
5462+/**
5463+ * toi_bio_initialise: Initialise bio code at start of some action.
24613191 5464+ *
4e97e4e9 5465+ * @starting_cycle: Whether starting a hibernation cycle, or just reading or
5466+ * writing a sysfs value.
24613191 5467+ */
4e97e4e9 5468+static int toi_bio_initialise(int starting_cycle)
5469+{
7f9d2ee0 5470+ if (starting_cycle) {
5471+ max_outstanding_writes = 0;
5472+ max_outstanding_reads = 0;
5473+ toi_queue_flusher = current;
5474+#ifdef MEASURE_MUTEX_CONTENTION
5475+ {
5476+ int i, j, k;
24613191 5477+
7f9d2ee0 5478+ for (i = 0; i < 2; i++)
5479+ for (j = 0; j < 2; j++)
5480+ for (k = 0; k < NR_CPUS; k++)
5481+ mutex_times[i][j][k] = 0;
5482+ }
5483+#endif
5484+ }
ad8f4a28 5485+
7f9d2ee0 5486+ return 0;
4e97e4e9 5487+}
24613191 5488+
4e97e4e9 5489+/**
5490+ * toi_bio_cleanup: Cleanup after some action.
5491+ *
5492+ * @finishing_cycle: Whether completing a cycle.
5493+ */
5494+static void toi_bio_cleanup(int finishing_cycle)
5495+{
5496+ if (toi_writer_buffer) {
7f9d2ee0 5497+ toi_free_page(11, (unsigned long) toi_writer_buffer);
4e97e4e9 5498+ toi_writer_buffer = NULL;
5499+ }
4e97e4e9 5500+}
24613191 5501+
4e97e4e9 5502+struct toi_bio_ops toi_bio_ops = {
5503+ .bdev_page_io = toi_bdev_page_io,
5504+ .finish_all_io = toi_finish_all_io,
5505+ .forward_one_page = go_next_page,
5506+ .set_extra_page_forward = set_extra_page_forward,
5507+ .set_devinfo = toi_set_devinfo,
5508+ .read_page = toi_bio_read_page,
5509+ .write_page = toi_bio_write_page,
5510+ .rw_init = toi_rw_init,
5511+ .rw_cleanup = toi_rw_cleanup,
5512+ .read_header_init = toi_read_header_init,
5513+ .rw_header_chunk = toi_rw_header_chunk,
7f9d2ee0 5514+ .rw_header_chunk_noreadahead = toi_rw_header_chunk_noreadahead,
4e97e4e9 5515+ .write_header_chunk_finish = write_header_chunk_finish,
7f9d2ee0 5516+ .io_flusher = bio_io_flusher,
24613191 5517+};
5518+
4e97e4e9 5519+static struct toi_sysfs_data sysfs_params[] = {
5520+ { TOI_ATTR("target_outstanding_io", SYSFS_RW),
5521+ SYSFS_INT(&target_outstanding_io, 0, TARGET_OUTSTANDING_IO, 0),
7f9d2ee0 5522+ }
24613191 5523+};
5524+
ad8f4a28 5525+static struct toi_module_ops toi_blockwriter_ops = {
4e97e4e9 5526+ .name = "lowlevel i/o",
5527+ .type = MISC_HIDDEN_MODULE,
5528+ .directory = "block_io",
5529+ .module = THIS_MODULE,
ad8f4a28 5530+ .print_debug_info = toi_bio_print_debug_stats,
4e97e4e9 5531+ .memory_needed = toi_bio_memory_needed,
5532+ .storage_needed = toi_bio_storage_needed,
5533+ .save_config_info = toi_bio_save_config_info,
5534+ .load_config_info = toi_bio_load_config_info,
5535+ .initialise = toi_bio_initialise,
5536+ .cleanup = toi_bio_cleanup,
24613191 5537+
4e97e4e9 5538+ .sysfs_data = sysfs_params,
5539+ .num_sysfs_entries = sizeof(sysfs_params) /
5540+ sizeof(struct toi_sysfs_data),
4e97e4e9 5541+};
24613191 5542+
4e97e4e9 5543+/**
5544+ * toi_block_io_load: Load time routine for block i/o module.
5545+ *
5546+ * Register block i/o ops and sysfs entries.
5547+ */
5548+static __init int toi_block_io_load(void)
5549+{
5550+ return toi_register_module(&toi_blockwriter_ops);
5551+}
24613191 5552+
4e97e4e9 5553+#if defined(CONFIG_TOI_FILE_EXPORTS) || defined(CONFIG_TOI_SWAP_EXPORTS)
5554+EXPORT_SYMBOL_GPL(toi_writer_posn);
5555+EXPORT_SYMBOL_GPL(toi_writer_posn_save);
5556+EXPORT_SYMBOL_GPL(toi_writer_buffer);
5557+EXPORT_SYMBOL_GPL(toi_writer_buffer_posn);
5558+EXPORT_SYMBOL_GPL(toi_bio_ops);
5559+#endif
5560+#ifdef MODULE
5561+static __exit void toi_block_io_unload(void)
5562+{
5563+ toi_unregister_module(&toi_blockwriter_ops);
5564+}
24613191 5565+
4e97e4e9 5566+module_init(toi_block_io_load);
5567+module_exit(toi_block_io_unload);
5568+MODULE_LICENSE("GPL");
5569+MODULE_AUTHOR("Nigel Cunningham");
5570+MODULE_DESCRIPTION("TuxOnIce block io functions");
5571+#else
5572+late_initcall(toi_block_io_load);
24613191 5573+#endif
4e97e4e9 5574diff --git a/kernel/power/tuxonice_block_io.h b/kernel/power/tuxonice_block_io.h
5575new file mode 100644
7f9d2ee0 5576index 0000000..833a685
4e97e4e9 5577--- /dev/null
5578+++ b/kernel/power/tuxonice_block_io.h
7f9d2ee0 5579@@ -0,0 +1,57 @@
24613191 5580+/*
4e97e4e9 5581+ * kernel/power/tuxonice_block_io.h
24613191 5582+ *
4e97e4e9 5583+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
5584+ * Copyright (C) 2006 Red Hat, inc.
24613191 5585+ *
4e97e4e9 5586+ * Distributed under GPLv2.
24613191 5587+ *
4e97e4e9 5588+ * This file contains declarations for functions exported from
5589+ * tuxonice_block_io.c, which contains low level io functions.
24613191 5590+ */
5591+
4e97e4e9 5592+#include <linux/buffer_head.h>
5593+#include "tuxonice_extent.h"
24613191 5594+
4e97e4e9 5595+struct toi_bdev_info {
5596+ struct block_device *bdev;
5597+ dev_t dev_t;
5598+ int bmap_shift;
5599+ int blocks_per_page;
5600+};
24613191 5601+
ad8f4a28 5602+/*
4e97e4e9 5603+ * Our exported interface so the swapwriter and filewriter don't
5604+ * need these functions duplicated.
24613191 5605+ */
4e97e4e9 5606+struct toi_bio_ops {
7f9d2ee0 5607+ int (*bdev_page_io) (int rw, struct block_device *bdev, long pos,
4e97e4e9 5608+ struct page *page);
5609+ void (*check_io_stats) (void);
5610+ void (*reset_io_stats) (void);
5611+ void (*finish_all_io) (void);
ad8f4a28 5612+ int (*forward_one_page) (int writing);
4e97e4e9 5613+ void (*set_extra_page_forward) (void);
5614+ void (*set_devinfo) (struct toi_bdev_info *info);
5615+ int (*read_page) (unsigned long *index, struct page *buffer_page,
5616+ unsigned int *buf_size);
5617+ int (*write_page) (unsigned long index, struct page *buffer_page,
5618+ unsigned int buf_size);
5619+ void (*read_header_init) (void);
5620+ int (*rw_header_chunk) (int rw, struct toi_module_ops *owner,
5621+ char *buffer, int buffer_size);
7f9d2ee0 5622+ int (*rw_header_chunk_noreadahead) (int rw,
5623+ struct toi_module_ops *owner,
5624+ char *buffer, int buffer_size);
4e97e4e9 5625+ int (*write_header_chunk_finish) (void);
5626+ int (*rw_init) (int rw, int stream_number);
5627+ int (*rw_cleanup) (int rw);
7f9d2ee0 5628+ void (*io_flusher) (int rw);
4e97e4e9 5629+};
24613191 5630+
4e97e4e9 5631+extern struct toi_bio_ops toi_bio_ops;
5632+
5633+extern char *toi_writer_buffer;
5634+extern int toi_writer_buffer_posn;
7f9d2ee0 5635+extern struct extent_iterate_saved_state toi_writer_posn_save[4];
4e97e4e9 5636+extern struct extent_iterate_state toi_writer_posn;
5637diff --git a/kernel/power/tuxonice_builtin.c b/kernel/power/tuxonice_builtin.c
5638new file mode 100644
7f9d2ee0 5639index 0000000..11cf575
4e97e4e9 5640--- /dev/null
5641+++ b/kernel/power/tuxonice_builtin.c
7f9d2ee0 5642@@ -0,0 +1,400 @@
4e97e4e9 5643+/*
5644+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
5645+ *
5646+ * This file is released under the GPLv2.
5647+ */
5648+#include <linux/module.h>
5649+#include <linux/resume-trace.h>
5650+#include <linux/syscalls.h>
5651+#include <linux/kernel.h>
5652+#include <linux/swap.h>
5653+#include <linux/syscalls.h>
5654+#include <linux/bio.h>
5655+#include <linux/root_dev.h>
5656+#include <linux/freezer.h>
5657+#include <linux/reboot.h>
5658+#include <linux/writeback.h>
5659+#include <linux/tty.h>
5660+#include <linux/crypto.h>
5661+#include <linux/cpu.h>
5662+#include <linux/dyn_pageflags.h>
ad8f4a28 5663+#include <linux/ctype.h>
4e97e4e9 5664+#include "tuxonice_io.h"
5665+#include "tuxonice.h"
5666+#include "tuxonice_extent.h"
5667+#include "tuxonice_block_io.h"
5668+#include "tuxonice_netlink.h"
5669+#include "tuxonice_prepare_image.h"
5670+#include "tuxonice_ui.h"
5671+#include "tuxonice_sysfs.h"
5672+#include "tuxonice_pagedir.h"
5673+#include "tuxonice_modules.h"
5674+#include "tuxonice_builtin.h"
ad8f4a28 5675+#include "tuxonice_power_off.h"
7f9d2ee0 5676+#include "power.h"
24613191 5677+
4e97e4e9 5678+/*
5679+ * Highmem related functions (x86 only).
5680+ */
24613191 5681+
4e97e4e9 5682+#ifdef CONFIG_HIGHMEM
24613191 5683+
4e97e4e9 5684+/**
5685+ * copyback_high: Restore highmem pages.
5686+ *
5687+ * Highmem data and pbe lists are/can be stored in highmem.
5688+ * The format is slightly different to the lowmem pbe lists
5689+ * used for the assembly code: the last pbe in each page is
5690+ * a struct page * instead of struct pbe *, pointing to the
5691+ * next page where pbes are stored (or NULL if happens to be
5692+ * the end of the list). Since we don't want to generate
5693+ * unnecessary deltas against swsusp code, we use a cast
5694+ * instead of a union.
5695+ **/
24613191 5696+
4e97e4e9 5697+static void copyback_high(void)
5698+{
ad8f4a28 5699+ struct page *pbe_page = (struct page *) restore_highmem_pblist;
4e97e4e9 5700+ struct pbe *this_pbe, *first_pbe;
5701+ unsigned long *origpage, *copypage;
5702+ int pbe_index = 1;
24613191 5703+
4e97e4e9 5704+ if (!pbe_page)
5705+ return;
24613191 5706+
4e97e4e9 5707+ this_pbe = (struct pbe *) kmap_atomic(pbe_page, KM_BOUNCE_READ);
5708+ first_pbe = this_pbe;
5709+
5710+ while (this_pbe) {
5711+ int loop = (PAGE_SIZE / sizeof(unsigned long)) - 1;
5712+
5713+ origpage = kmap_atomic((struct page *) this_pbe->orig_address,
5714+ KM_BIO_DST_IRQ);
5715+ copypage = kmap_atomic((struct page *) this_pbe->address,
5716+ KM_BIO_SRC_IRQ);
5717+
5718+ while (loop >= 0) {
5719+ *(origpage + loop) = *(copypage + loop);
5720+ loop--;
24613191 5721+ }
24613191 5722+
4e97e4e9 5723+ kunmap_atomic(origpage, KM_BIO_DST_IRQ);
5724+ kunmap_atomic(copypage, KM_BIO_SRC_IRQ);
24613191 5725+
4e97e4e9 5726+ if (!this_pbe->next)
5727+ break;
24613191 5728+
4e97e4e9 5729+ if (pbe_index < PBES_PER_PAGE) {
5730+ this_pbe++;
5731+ pbe_index++;
5732+ } else {
5733+ pbe_page = (struct page *) this_pbe->next;
5734+ kunmap_atomic(first_pbe, KM_BOUNCE_READ);
5735+ if (!pbe_page)
5736+ return;
5737+ this_pbe = (struct pbe *) kmap_atomic(pbe_page,
5738+ KM_BOUNCE_READ);
5739+ first_pbe = this_pbe;
5740+ pbe_index = 1;
5741+ }
24613191 5742+ }
4e97e4e9 5743+ kunmap_atomic(first_pbe, KM_BOUNCE_READ);
24613191 5744+}
5745+
4e97e4e9 5746+#else /* CONFIG_HIGHMEM */
5747+void copyback_high(void) { }
5748+#endif
24613191 5749+
5750+char toi_wait_for_keypress_dev_console(int timeout)
5751+{
5752+ int fd, this_timeout = 255;
5753+ char key = '\0';
5754+ struct termios t, t_backup;
5755+
5756+ /* We should be guaranteed /dev/console exists after populate_rootfs()
5757+ * in init/main.c.
5758+ */
5759+ fd = sys_open("/dev/console", O_RDONLY, 0);
5760+ if (fd < 0) {
5761+ printk(KERN_INFO "Couldn't open /dev/console.\n");
5762+ return key;
5763+ }
5764+
5765+ if (sys_ioctl(fd, TCGETS, (long)&t) < 0)
5766+ goto out_close;
5767+
5768+ memcpy(&t_backup, &t, sizeof(t));
5769+
5770+ t.c_lflag &= ~(ISIG|ICANON|ECHO);
5771+ t.c_cc[VMIN] = 0;
5772+
5773+new_timeout:
5774+ if (timeout > 0) {
5775+ this_timeout = timeout < 26 ? timeout : 25;
5776+ timeout -= this_timeout;
5777+ this_timeout *= 10;
5778+ }
5779+
5780+ t.c_cc[VTIME] = this_timeout;
5781+
5782+ if (sys_ioctl(fd, TCSETS, (long)&t) < 0)
5783+ goto out_restore;
5784+
5785+ while (1) {
5786+ if (sys_read(fd, &key, 1) <= 0) {
5787+ if (timeout)
5788+ goto new_timeout;
5789+ key = '\0';
5790+ break;
5791+ }
5792+ key = tolower(key);
5793+ if (test_toi_state(TOI_SANITY_CHECK_PROMPT)) {
5794+ if (key == 'c') {
5795+ set_toi_state(TOI_CONTINUE_REQ);
5796+ break;
5797+ } else if (key == ' ')
5798+ break;
5799+ } else
5800+ break;
5801+ }
5802+
5803+out_restore:
5804+ sys_ioctl(fd, TCSETS, (long)&t_backup);
5805+out_close:
5806+ sys_close(fd);
5807+
5808+ return key;
5809+}
5810+
7f9d2ee0 5811+struct toi_boot_kernel_data toi_bkd __nosavedata
ad8f4a28 5812+ __attribute__((aligned(PAGE_SIZE))) = {
7f9d2ee0 5813+ MY_BOOT_KERNEL_DATA_VERSION,
5814+ 0,
ad8f4a28 5815+#ifdef CONFIG_TOI_REPLACE_SWSUSP
7f9d2ee0 5816+ (1 << TOI_REPLACE_SWSUSP) |
ad8f4a28 5817+#endif
7f9d2ee0 5818+ (1 << TOI_PAGESET2_FULL) | (1 << TOI_LATE_CPU_HOTPLUG),
5819+};
5820+EXPORT_SYMBOL_GPL(toi_bkd);
5821+
5822+struct block_device *toi_open_by_devnum(dev_t dev, unsigned mode)
5823+{
5824+ struct block_device *bdev = bdget(dev);
5825+ int err = -ENOMEM;
5826+ int flags = mode & FMODE_WRITE ? O_RDWR : O_RDONLY;
5827+ flags |= O_NONBLOCK;
5828+ if (bdev)
5829+ err = blkdev_get(bdev, mode, flags);
5830+ return err ? ERR_PTR(err) : bdev;
5831+}
5832+EXPORT_SYMBOL_GPL(toi_open_by_devnum);
5833+
5834+EXPORT_SYMBOL_GPL(toi_wait_for_keypress_dev_console);
4e97e4e9 5835+EXPORT_SYMBOL_GPL(hibernation_platform_enter);
7f9d2ee0 5836+EXPORT_SYMBOL_GPL(platform_begin);
5837+EXPORT_SYMBOL_GPL(platform_pre_snapshot);
5838+EXPORT_SYMBOL_GPL(platform_leave);
7f9d2ee0 5839+EXPORT_SYMBOL_GPL(platform_end);
5840+EXPORT_SYMBOL_GPL(platform_finish);
5841+EXPORT_SYMBOL_GPL(platform_pre_restore);
5842+EXPORT_SYMBOL_GPL(platform_restore_cleanup);
7f9d2ee0 5843+EXPORT_SYMBOL_GPL(power_kobj);
5844+EXPORT_SYMBOL_GPL(pm_notifier_call_chain);
5845+EXPORT_SYMBOL_GPL(init_swsusp_header);
5846+
5847+#ifdef CONFIG_ARCH_HIBERNATION_HEADER
5848+EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
5849+EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
5850+#endif
24613191 5851+
4e97e4e9 5852+#ifdef CONFIG_TOI_CORE_EXPORTS
4e97e4e9 5853+#ifdef CONFIG_X86_64
5854+EXPORT_SYMBOL_GPL(restore_processor_state);
5855+EXPORT_SYMBOL_GPL(save_processor_state);
5856+#endif
5857+
4e97e4e9 5858+EXPORT_SYMBOL_GPL(drop_pagecache);
5859+EXPORT_SYMBOL_GPL(restore_pblist);
5860+EXPORT_SYMBOL_GPL(pm_mutex);
5861+EXPORT_SYMBOL_GPL(pm_restore_console);
5862+EXPORT_SYMBOL_GPL(super_blocks);
5863+EXPORT_SYMBOL_GPL(next_zone);
5864+
5865+EXPORT_SYMBOL_GPL(freeze_processes);
5866+EXPORT_SYMBOL_GPL(thaw_processes);
5867+EXPORT_SYMBOL_GPL(thaw_kernel_threads);
5868+EXPORT_SYMBOL_GPL(shrink_all_memory);
5869+EXPORT_SYMBOL_GPL(shrink_one_zone);
5870+EXPORT_SYMBOL_GPL(saveable_page);
5871+EXPORT_SYMBOL_GPL(swsusp_arch_suspend);
5872+EXPORT_SYMBOL_GPL(swsusp_arch_resume);
4e97e4e9 5873+EXPORT_SYMBOL_GPL(pm_prepare_console);
5874+EXPORT_SYMBOL_GPL(follow_page);
5875+EXPORT_SYMBOL_GPL(machine_halt);
5876+EXPORT_SYMBOL_GPL(block_dump);
5877+EXPORT_SYMBOL_GPL(unlink_lru_lists);
5878+EXPORT_SYMBOL_GPL(relink_lru_lists);
4e97e4e9 5879+EXPORT_SYMBOL_GPL(machine_power_off);
5880+EXPORT_SYMBOL_GPL(suspend_devices_and_enter);
5881+EXPORT_SYMBOL_GPL(first_online_pgdat);
5882+EXPORT_SYMBOL_GPL(next_online_pgdat);
5883+EXPORT_SYMBOL_GPL(machine_restart);
5884+EXPORT_SYMBOL_GPL(saved_command_line);
5885+EXPORT_SYMBOL_GPL(tasklist_lock);
5886+#ifdef CONFIG_PM_SLEEP_SMP
5887+EXPORT_SYMBOL_GPL(disable_nonboot_cpus);
5888+EXPORT_SYMBOL_GPL(enable_nonboot_cpus);
5889+#endif
5890+#endif
5891+
5892+int toi_wait = CONFIG_TOI_DEFAULT_WAIT;
5893+
5894+#ifdef CONFIG_TOI_USERUI_EXPORTS
5895+EXPORT_SYMBOL_GPL(kmsg_redirect);
4e97e4e9 5896+#endif
ad8f4a28 5897+EXPORT_SYMBOL_GPL(toi_wait);
4e97e4e9 5898+
5899+#if defined(CONFIG_TOI_USERUI_EXPORTS) || defined(CONFIG_TOI_CORE_EXPORTS)
5900+EXPORT_SYMBOL_GPL(console_printk);
5901+#endif
5902+#ifdef CONFIG_TOI_SWAP_EXPORTS /* TuxOnIce swap specific */
5903+EXPORT_SYMBOL_GPL(sys_swapon);
5904+EXPORT_SYMBOL_GPL(sys_swapoff);
5905+EXPORT_SYMBOL_GPL(si_swapinfo);
5906+EXPORT_SYMBOL_GPL(map_swap_page);
5907+EXPORT_SYMBOL_GPL(get_swap_page);
5908+EXPORT_SYMBOL_GPL(swap_free);
5909+EXPORT_SYMBOL_GPL(get_swap_info_struct);
5910+#endif
24613191 5911+
4e97e4e9 5912+#ifdef CONFIG_TOI_FILE_EXPORTS
5913+/* TuxOnIce file allocator specific support */
5914+EXPORT_SYMBOL_GPL(sys_unlink);
5915+EXPORT_SYMBOL_GPL(sys_mknod);
5916+#endif
5917+
5918+/* Swap or file */
5919+#if defined(CONFIG_TOI_FILE_EXPORTS) || defined(CONFIG_TOI_SWAP_EXPORTS)
5920+EXPORT_SYMBOL_GPL(bio_set_pages_dirty);
5921+EXPORT_SYMBOL_GPL(name_to_dev_t);
5922+#endif
5923+
4e97e4e9 5924+#if defined(CONFIG_TOI_FILE_EXPORTS) || defined(CONFIG_TOI_SWAP_EXPORTS) || \
5925+ defined(CONFIG_TOI_CORE_EXPORTS)
5926+EXPORT_SYMBOL_GPL(resume_file);
5927+#endif
5928+struct toi_core_fns *toi_core_fns;
5929+EXPORT_SYMBOL_GPL(toi_core_fns);
5930+
5931+DECLARE_DYN_PAGEFLAGS(pageset1_map);
5932+DECLARE_DYN_PAGEFLAGS(pageset1_copy_map);
5933+EXPORT_SYMBOL_GPL(pageset1_map);
5934+EXPORT_SYMBOL_GPL(pageset1_copy_map);
5935+
ad8f4a28 5936+unsigned long toi_result;
4e97e4e9 5937+struct pagedir pagedir1 = {1};
5938+
4e97e4e9 5939+EXPORT_SYMBOL_GPL(toi_result);
5940+EXPORT_SYMBOL_GPL(pagedir1);
5941+
5942+unsigned long toi_get_nonconflicting_page(void)
5943+{
5944+ return toi_core_fns->get_nonconflicting_page();
24613191 5945+}
5946+
4e97e4e9 5947+int toi_post_context_save(void)
5948+{
5949+ return toi_core_fns->post_context_save();
5950+}
24613191 5951+
4e97e4e9 5952+int toi_try_hibernate(int have_pmsem)
24613191 5953+{
4e97e4e9 5954+ if (!toi_core_fns)
5955+ return -ENODEV;
24613191 5956+
4e97e4e9 5957+ return toi_core_fns->try_hibernate(have_pmsem);
24613191 5958+}
5959+
4e97e4e9 5960+void toi_try_resume(void)
5961+{
5962+ if (toi_core_fns)
5963+ toi_core_fns->try_resume();
5964+ else
5965+ printk(KERN_INFO "TuxOnIce core not loaded yet.\n");
4e97e4e9 5966+}
24613191 5967+
4e97e4e9 5968+int toi_lowlevel_builtin(void)
24613191 5969+{
4e97e4e9 5970+ int error = 0;
24613191 5971+
4e97e4e9 5972+ save_processor_state();
5973+ error = swsusp_arch_suspend();
5974+ if (error)
4e97e4e9 5975+ printk(KERN_ERR "Error %d hibernating\n", error);
5976+
5977+ /* Restore control flow appears here */
5978+ if (!toi_in_hibernate) {
5979+ copyback_high();
5980+ set_toi_state(TOI_NOW_RESUMING);
5981+ }
5982+
5983+ restore_processor_state();
5984+
5985+ return error;
24613191 5986+}
5987+
4e97e4e9 5988+EXPORT_SYMBOL_GPL(toi_lowlevel_builtin);
24613191 5989+
4e97e4e9 5990+unsigned long toi_compress_bytes_in, toi_compress_bytes_out;
5991+EXPORT_SYMBOL_GPL(toi_compress_bytes_in);
5992+EXPORT_SYMBOL_GPL(toi_compress_bytes_out);
24613191 5993+
4e97e4e9 5994+unsigned long toi_state = ((1 << TOI_BOOT_TIME) |
5995+ (1 << TOI_IGNORE_LOGLEVEL) |
5996+ (1 << TOI_IO_STOPPED));
5997+EXPORT_SYMBOL_GPL(toi_state);
24613191 5998+
4e97e4e9 5999+/* The number of hibernates we have started (some may have been cancelled) */
6000+unsigned int nr_hibernates;
6001+EXPORT_SYMBOL_GPL(nr_hibernates);
24613191 6002+
ad8f4a28 6003+int toi_running;
4e97e4e9 6004+EXPORT_SYMBOL_GPL(toi_running);
24613191 6005+
4e97e4e9 6006+int toi_in_hibernate __nosavedata;
6007+EXPORT_SYMBOL_GPL(toi_in_hibernate);
24613191 6008+
4e97e4e9 6009+__nosavedata struct pbe *restore_highmem_pblist;
24613191 6010+
4e97e4e9 6011+#ifdef CONFIG_TOI_CORE_EXPORTS
6012+#ifdef CONFIG_HIGHMEM
6013+EXPORT_SYMBOL_GPL(nr_free_highpages);
6014+EXPORT_SYMBOL_GPL(saveable_highmem_page);
6015+EXPORT_SYMBOL_GPL(restore_highmem_pblist);
6016+#endif
ad8f4a28 6017+#endif
4e97e4e9 6018+
6019+#if defined(CONFIG_TOI_CORE_EXPORTS) || defined(CONFIG_TOI_PAGEFLAGS_EXPORTS)
6020+EXPORT_SYMBOL_GPL(max_pfn);
6021+#endif
6022+
6023+#if defined(CONFIG_TOI_EXPORTS) || defined(CONFIG_TOI_CORE_EXPORTS)
6024+EXPORT_SYMBOL_GPL(snprintf_used);
4e97e4e9 6025+#endif
6026+
6027+static int __init toi_wait_setup(char *str)
24613191 6028+{
4e97e4e9 6029+ int value;
24613191 6030+
4e97e4e9 6031+ if (sscanf(str, "=%d", &value)) {
6032+ if (value < -1 || value > 255)
6033+ printk(KERN_INFO "TuxOnIce_wait outside range -1 to "
6034+ "255.\n");
24613191 6035+ else
4e97e4e9 6036+ toi_wait = value;
24613191 6037+ }
6038+
4e97e4e9 6039+ return 1;
6040+}
24613191 6041+
4e97e4e9 6042+__setup("toi_wait", toi_wait_setup);
6043diff --git a/kernel/power/tuxonice_builtin.h b/kernel/power/tuxonice_builtin.h
6044new file mode 100644
7f9d2ee0 6045index 0000000..f4966c4
4e97e4e9 6046--- /dev/null
6047+++ b/kernel/power/tuxonice_builtin.h
7f9d2ee0 6048@@ -0,0 +1,31 @@
4e97e4e9 6049+/*
6050+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
6051+ *
6052+ * This file is released under the GPLv2.
6053+ */
6054+#include <linux/dyn_pageflags.h>
6055+#include <asm/setup.h>
24613191 6056+
4e97e4e9 6057+extern struct toi_core_fns *toi_core_fns;
6058+extern unsigned long toi_compress_bytes_in, toi_compress_bytes_out;
4e97e4e9 6059+extern unsigned int nr_hibernates;
6060+extern int toi_in_hibernate;
6061+
4e97e4e9 6062+extern __nosavedata struct pbe *restore_highmem_pblist;
24613191 6063+
4e97e4e9 6064+int toi_lowlevel_builtin(void);
24613191 6065+
4e97e4e9 6066+extern struct dyn_pageflags __nosavedata toi_nosave_origmap;
6067+extern struct dyn_pageflags __nosavedata toi_nosave_copymap;
6068+
6069+#ifdef CONFIG_HIGHMEM
6070+extern __nosavedata struct zone_data *toi_nosave_zone_list;
6071+extern __nosavedata unsigned long toi_nosave_max_pfn;
6072+#endif
24613191 6073+
4e97e4e9 6074+extern unsigned long toi_get_nonconflicting_page(void);
6075+extern int toi_post_context_save(void);
6076+extern int toi_try_hibernate(int have_pmsem);
6077+extern char toi_wait_for_keypress_dev_console(int timeout);
6078+extern struct block_device *toi_open_by_devnum(dev_t dev, unsigned mode);
7f9d2ee0 6079+extern int toi_wait;
4e97e4e9 6080diff --git a/kernel/power/tuxonice_checksum.c b/kernel/power/tuxonice_checksum.c
6081new file mode 100644
ad8f4a28 6082index 0000000..eea3029
4e97e4e9 6083--- /dev/null
6084+++ b/kernel/power/tuxonice_checksum.c
ad8f4a28 6085@@ -0,0 +1,389 @@
24613191 6086+/*
4e97e4e9 6087+ * kernel/power/tuxonice_checksum.c
24613191 6088+ *
4e97e4e9 6089+ * Copyright (C) 2006-2007 Nigel Cunningham (nigel at tuxonice net)
6090+ * Copyright (C) 2006 Red Hat, inc.
6091+ *
6092+ * This file is released under the GPLv2.
6093+ *
6094+ * This file contains data checksum routines for TuxOnIce,
6095+ * using cryptoapi. They are used to locate any modifications
6096+ * made to pageset 2 while we're saving it.
24613191 6097+ */
24613191 6098+
4e97e4e9 6099+#include <linux/suspend.h>
6100+#include <linux/module.h>
6101+#include <linux/highmem.h>
6102+#include <linux/vmalloc.h>
6103+#include <linux/crypto.h>
6104+#include <linux/scatterlist.h>
24613191 6105+
4e97e4e9 6106+#include "tuxonice.h"
6107+#include "tuxonice_modules.h"
6108+#include "tuxonice_sysfs.h"
6109+#include "tuxonice_io.h"
6110+#include "tuxonice_pageflags.h"
6111+#include "tuxonice_checksum.h"
6112+#include "tuxonice_pagedir.h"
ad8f4a28 6113+#include "tuxonice_alloc.h"
24613191 6114+
4e97e4e9 6115+static struct toi_module_ops toi_checksum_ops;
24613191 6116+
4e97e4e9 6117+/* Constant at the mo, but I might allow tuning later */
6118+static char toi_checksum_name[32] = "md4";
6119+/* Bytes per checksum */
6120+#define CHECKSUM_SIZE (16)
24613191 6121+
4e97e4e9 6122+#define CHECKSUMS_PER_PAGE ((PAGE_SIZE - sizeof(void *)) / CHECKSUM_SIZE)
24613191 6123+
4e97e4e9 6124+struct cpu_context {
6125+ struct crypto_hash *transform;
6126+ struct hash_desc desc;
6127+ struct scatterlist sg[2];
6128+ char *buf;
6129+};
24613191 6130+
4e97e4e9 6131+static DEFINE_PER_CPU(struct cpu_context, contexts);
6132+static int pages_allocated;
6133+static unsigned long page_list;
24613191 6134+
6135+static int toi_num_resaved;
6136+
6137+static unsigned long this_checksum, next_page;
6138+static int checksum_index;
24613191 6139+
6140+static inline int checksum_pages_needed(void)
6141+{
6142+ return DIV_ROUND_UP(pagedir2.size, CHECKSUMS_PER_PAGE);
6143+}
24613191 6144+
4e97e4e9 6145+/* ---- Local buffer management ---- */
24613191 6146+
ad8f4a28 6147+/*
4e97e4e9 6148+ * toi_checksum_cleanup
6149+ *
6150+ * Frees memory allocated for our labours.
6151+ */
6152+static void toi_checksum_cleanup(int ending_cycle)
6153+{
6154+ int cpu;
24613191 6155+
4e97e4e9 6156+ if (ending_cycle) {
6157+ for_each_online_cpu(cpu) {
6158+ struct cpu_context *this = &per_cpu(contexts, cpu);
6159+ if (this->transform) {
6160+ crypto_free_hash(this->transform);
6161+ this->transform = NULL;
6162+ this->desc.tfm = NULL;
24613191 6163+ }
6164+
4e97e4e9 6165+ if (this->buf) {
ad8f4a28 6166+ toi_free_page(27, (unsigned long) this->buf);
4e97e4e9 6167+ this->buf = NULL;
24613191 6168+ }
6169+ }
4e97e4e9 6170+ }
6171+}
24613191 6172+
6173+/*
6174+ * toi_crypto_initialise
4e97e4e9 6175+ *
6176+ * Prepare to do some work by allocating buffers and transforms.
6177+ * Returns: Int: Zero. Even if we can't set up checksum, we still
6178+ * seek to hibernate.
6179+ */
ad8f4a28 6180+static int toi_checksum_initialise(int starting_cycle)
4e97e4e9 6181+{
6182+ int cpu;
24613191 6183+
4e97e4e9 6184+ if (!(starting_cycle & SYSFS_HIBERNATE) || !toi_checksum_ops.enabled)
6185+ return 0;
24613191 6186+
4e97e4e9 6187+ if (!*toi_checksum_name) {
ad8f4a28 6188+ printk(KERN_INFO "TuxOnIce: No checksum algorithm name set.\n");
4e97e4e9 6189+ return 1;
6190+ }
24613191 6191+
4e97e4e9 6192+ for_each_online_cpu(cpu) {
6193+ struct cpu_context *this = &per_cpu(contexts, cpu);
6194+ struct page *page;
24613191 6195+
4e97e4e9 6196+ this->transform = crypto_alloc_hash(toi_checksum_name, 0, 0);
6197+ if (IS_ERR(this->transform)) {
6198+ printk(KERN_INFO "TuxOnIce: Failed to initialise the "
6199+ "%s checksum algorithm: %ld.\n",
4e97e4e9 6200+ toi_checksum_name, (long) this->transform);
6201+ this->transform = NULL;
6202+ return 1;
24613191 6203+ }
24613191 6204+
4e97e4e9 6205+ this->desc.tfm = this->transform;
6206+ this->desc.flags = 0;
24613191 6207+
ad8f4a28 6208+ page = toi_alloc_page(27, GFP_KERNEL);
4e97e4e9 6209+ if (!page)
6210+ return 1;
6211+ this->buf = page_address(page);
6212+ sg_set_buf(&this->sg[0], this->buf, PAGE_SIZE);
6213+ }
6214+ return 0;
6215+}
24613191 6216+
ad8f4a28 6217+/*
4e97e4e9 6218+ * toi_checksum_print_debug_stats
6219+ * @buffer: Pointer to a buffer into which the debug info will be printed.
6220+ * @size: Size of the buffer.
6221+ *
6222+ * Print information to be recorded for debugging purposes into a buffer.
6223+ * Returns: Number of characters written to the buffer.
6224+ */
24613191 6225+
4e97e4e9 6226+static int toi_checksum_print_debug_stats(char *buffer, int size)
6227+{
6228+ int len;
24613191 6229+
4e97e4e9 6230+ if (!toi_checksum_ops.enabled)
6231+ return snprintf_used(buffer, size,
6232+ "- Checksumming disabled.\n");
ad8f4a28 6233+
4e97e4e9 6234+ len = snprintf_used(buffer, size, "- Checksum method is '%s'.\n",
6235+ toi_checksum_name);
ad8f4a28 6236+ len += snprintf_used(buffer + len, size - len,
4e97e4e9 6237+ " %d pages resaved in atomic copy.\n", toi_num_resaved);
6238+ return len;
24613191 6239+}
6240+
6241+static int toi_checksum_memory_needed(void)
6242+{
6243+ return toi_checksum_ops.enabled ?
6244+ checksum_pages_needed() << PAGE_SHIFT : 0;
6245+}
6246+
4e97e4e9 6247+static int toi_checksum_storage_needed(void)
24613191 6248+{
4e97e4e9 6249+ if (toi_checksum_ops.enabled)
6250+ return strlen(toi_checksum_name) + sizeof(int) + 1;
6251+ else
6252+ return 0;
6253+}
24613191 6254+
ad8f4a28 6255+/*
4e97e4e9 6256+ * toi_checksum_save_config_info
6257+ * @buffer: Pointer to a buffer of size PAGE_SIZE.
6258+ *
6259+ * Save information needed when reloading the image at resume time.
6260+ * Returns: Number of bytes used for saving our data.
6261+ */
6262+static int toi_checksum_save_config_info(char *buffer)
6263+{
6264+ int namelen = strlen(toi_checksum_name) + 1;
6265+ int total_len;
ad8f4a28 6266+
4e97e4e9 6267+ *((unsigned int *) buffer) = namelen;
6268+ strncpy(buffer + sizeof(unsigned int), toi_checksum_name, namelen);
6269+ total_len = sizeof(unsigned int) + namelen;
6270+ return total_len;
24613191 6271+}
6272+
4e97e4e9 6273+/* toi_checksum_load_config_info
6274+ * @buffer: Pointer to the start of the data.
6275+ * @size: Number of bytes that were saved.
24613191 6276+ *
4e97e4e9 6277+ * Description: Reload information needed for dechecksumming the image at
6278+ * resume time.
24613191 6279+ */
4e97e4e9 6280+static void toi_checksum_load_config_info(char *buffer, int size)
24613191 6281+{
4e97e4e9 6282+ int namelen;
24613191 6283+
4e97e4e9 6284+ namelen = *((unsigned int *) (buffer));
6285+ strncpy(toi_checksum_name, buffer + sizeof(unsigned int),
6286+ namelen);
6287+ return;
6288+}
24613191 6289+
4e97e4e9 6290+/*
6291+ * Free Checksum Memory
6292+ */
24613191 6293+
4e97e4e9 6294+void free_checksum_pages(void)
6295+{
6296+ while (pages_allocated) {
6297+ unsigned long next = *((unsigned long *) page_list);
6298+ ClearPageNosave(virt_to_page(page_list));
ad8f4a28 6299+ toi_free_page(15, (unsigned long) page_list);
4e97e4e9 6300+ page_list = next;
6301+ pages_allocated--;
24613191 6302+ }
4e97e4e9 6303+}
24613191 6304+
4e97e4e9 6305+/*
6306+ * Allocate Checksum Memory
6307+ */
24613191 6308+
4e97e4e9 6309+int allocate_checksum_pages(void)
6310+{
ad8f4a28 6311+ int pages_needed = checksum_pages_needed();
24613191 6312+
4e97e4e9 6313+ if (!toi_checksum_ops.enabled)
6314+ return 0;
24613191 6315+
4e97e4e9 6316+ while (pages_allocated < pages_needed) {
6317+ unsigned long *new_page =
6318+ (unsigned long *) toi_get_zeroed_page(15, TOI_ATOMIC_GFP);
6319+ if (!new_page) {
6320+			printk(KERN_ERR "Unable to allocate checksum pages.\n");
4e97e4e9 6321+ return -ENOMEM;
ad8f4a28 6322+ }
4e97e4e9 6323+ SetPageNosave(virt_to_page(new_page));
6324+ (*new_page) = page_list;
6325+ page_list = (unsigned long) new_page;
6326+ pages_allocated++;
24613191 6327+ }
6328+
4e97e4e9 6329+ next_page = (unsigned long) page_list;
6330+ checksum_index = 0;
e8d0ad9d 6331+
4e97e4e9 6332+ return 0;
6333+}
24613191 6334+
4e97e4e9 6335+#if 0
6336+static void print_checksum(char *buf, int size)
6337+{
6338+ int index;
24613191 6339+
4e97e4e9 6340+ for (index = 0; index < size; index++)
ad8f4a28 6341+ printk(KERN_INFO "%x ", buf[index]);
24613191 6342+
4e97e4e9 6343+ printk("\n");
6344+}
6345+#endif
24613191 6346+
ad8f4a28 6347+char *tuxonice_get_next_checksum(void)
4e97e4e9 6348+{
6349+ if (!toi_checksum_ops.enabled)
6350+ return NULL;
24613191 6351+
4e97e4e9 6352+ if (checksum_index % CHECKSUMS_PER_PAGE)
6353+ this_checksum += CHECKSUM_SIZE;
6354+ else {
6355+ this_checksum = next_page + sizeof(void *);
6356+ next_page = *((unsigned long *) next_page);
24613191 6357+ }
6358+
4e97e4e9 6359+ checksum_index++;
6360+ return (char *) this_checksum;
6361+}
24613191 6362+
4e97e4e9 6363+int tuxonice_calc_checksum(struct page *page, char *checksum_locn)
6364+{
6365+ char *pa;
6366+ int result, cpu = smp_processor_id();
6367+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
24613191 6368+
4e97e4e9 6369+ if (!toi_checksum_ops.enabled)
6370+ return 0;
24613191 6371+
4e97e4e9 6372+ pa = kmap(page);
6373+ memcpy(ctx->buf, pa, PAGE_SIZE);
6374+ kunmap(page);
6375+ result = crypto_hash_digest(&ctx->desc, ctx->sg, PAGE_SIZE,
6376+ checksum_locn);
6377+ return result;
24613191 6378+}
4e97e4e9 6379+/*
6380+ * Calculate checksums
24613191 6381+ */
6382+
4e97e4e9 6383+void check_checksums(void)
24613191 6384+{
4e97e4e9 6385+ int pfn, index = 0, cpu = smp_processor_id();
6386+ unsigned long next_page, this_checksum = 0;
6387+ char current_checksum[CHECKSUM_SIZE];
6388+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
24613191 6389+
4e97e4e9 6390+ if (!toi_checksum_ops.enabled)
6391+ return;
24613191 6392+
4e97e4e9 6393+ next_page = (unsigned long) page_list;
24613191 6394+
4e97e4e9 6395+ toi_num_resaved = 0;
24613191 6396+
4e97e4e9 6397+ BITMAP_FOR_EACH_SET(&pageset2_map, pfn) {
6398+ int ret;
6399+ char *pa;
6400+ struct page *page = pfn_to_page(pfn);
24613191 6401+
4e97e4e9 6402+ if (index % CHECKSUMS_PER_PAGE) {
6403+ this_checksum += CHECKSUM_SIZE;
6404+ } else {
6405+ this_checksum = next_page + sizeof(void *);
6406+ next_page = *((unsigned long *) next_page);
6407+ }
24613191 6408+
4e97e4e9 6409+ /* Done when IRQs disabled so must be atomic */
6410+ pa = kmap_atomic(page, KM_USER1);
6411+ memcpy(ctx->buf, pa, PAGE_SIZE);
6412+ kunmap_atomic(pa, KM_USER1);
ad8f4a28 6413+ ret = crypto_hash_digest(&ctx->desc, ctx->sg, PAGE_SIZE,
4e97e4e9 6414+ current_checksum);
24613191 6415+
4e97e4e9 6416+ if (ret) {
ad8f4a28 6417+ printk(KERN_INFO "Digest failed. Returned %d.\n", ret);
4e97e4e9 6418+ return;
24613191 6419+ }
24613191 6420+
4e97e4e9 6421+ if (memcmp(current_checksum, (char *) this_checksum,
6422+ CHECKSUM_SIZE)) {
6423+ SetPageResave(pfn_to_page(pfn));
6424+ toi_num_resaved++;
6425+ if (test_action_state(TOI_ABORT_ON_RESAVE_NEEDED))
6426+ set_abort_result(TOI_RESAVE_NEEDED);
6427+ }
24613191 6428+
4e97e4e9 6429+ index++;
24613191 6430+ }
4e97e4e9 6431+}
24613191 6432+
4e97e4e9 6433+static struct toi_sysfs_data sysfs_params[] = {
6434+ { TOI_ATTR("enabled", SYSFS_RW),
6435+ SYSFS_INT(&toi_checksum_ops.enabled, 0, 1, 0)
6436+ },
24613191 6437+
4e97e4e9 6438+ { TOI_ATTR("abort_if_resave_needed", SYSFS_RW),
ad8f4a28 6439+ SYSFS_BIT(&toi_bkd.toi_action, TOI_ABORT_ON_RESAVE_NEEDED, 0)
24613191 6440+ }
4e97e4e9 6441+};
6442+
6443+/*
6444+ * Ops structure.
6445+ */
6446+static struct toi_module_ops toi_checksum_ops = {
6447+ .type = MISC_MODULE,
6448+ .name = "checksumming",
6449+ .directory = "checksum",
6450+ .module = THIS_MODULE,
ad8f4a28 6451+ .initialise = toi_checksum_initialise,
4e97e4e9 6452+ .cleanup = toi_checksum_cleanup,
6453+ .print_debug_info = toi_checksum_print_debug_stats,
6454+ .save_config_info = toi_checksum_save_config_info,
6455+ .load_config_info = toi_checksum_load_config_info,
ad8f4a28 6456+ .memory_needed = toi_checksum_memory_needed,
4e97e4e9 6457+ .storage_needed = toi_checksum_storage_needed,
24613191 6458+
4e97e4e9 6459+ .sysfs_data = sysfs_params,
6460+ .num_sysfs_entries = sizeof(sysfs_params) /
6461+ sizeof(struct toi_sysfs_data),
4e97e4e9 6462+};
6463+
6464+/* ---- Registration ---- */
6465+int toi_checksum_init(void)
6466+{
6467+ int result = toi_register_module(&toi_checksum_ops);
24613191 6468+ return result;
6469+}
6470+
4e97e4e9 6471+void toi_checksum_exit(void)
6472+{
6473+ toi_unregister_module(&toi_checksum_ops);
6474+}
6475diff --git a/kernel/power/tuxonice_checksum.h b/kernel/power/tuxonice_checksum.h
6476new file mode 100644
ad8f4a28 6477index 0000000..81b928d
4e97e4e9 6478--- /dev/null
6479+++ b/kernel/power/tuxonice_checksum.h
6480@@ -0,0 +1,32 @@
6481+/*
6482+ * kernel/power/tuxonice_checksum.h
24613191 6483+ *
4e97e4e9 6484+ * Copyright (C) 2006-2007 Nigel Cunningham (nigel at tuxonice net)
6485+ * Copyright (C) 2006 Red Hat, inc.
6486+ *
6487+ * This file is released under the GPLv2.
6488+ *
6489+ * This file contains data checksum routines for TuxOnIce,
6490+ * using cryptoapi. They are used to locate any modifications
6491+ * made to pageset 2 while we're saving it.
24613191 6492+ */
24613191 6493+
4e97e4e9 6494+#if defined(CONFIG_TOI_CHECKSUM)
6495+extern int toi_checksum_init(void);
6496+extern void toi_checksum_exit(void);
6497+void check_checksums(void);
6498+int allocate_checksum_pages(void);
6499+void free_checksum_pages(void);
ad8f4a28 6500+char *tuxonice_get_next_checksum(void);
4e97e4e9 6501+int tuxonice_calc_checksum(struct page *page, char *checksum_locn);
6502+#else
6503+static inline int toi_checksum_init(void) { return 0; }
6504+static inline void toi_checksum_exit(void) { }
6505+static inline void check_checksums(void) { };
6506+static inline int allocate_checksum_pages(void) { return 0; };
6507+static inline void free_checksum_pages(void) { };
6508+static inline char *tuxonice_get_next_checksum(void) { return NULL; };
6509+static inline int tuxonice_calc_checksum(struct page *page, char *checksum_locn)
6510+ { return 0; }
6511+#endif
24613191 6512+
4e97e4e9 6513diff --git a/kernel/power/tuxonice_cluster.c b/kernel/power/tuxonice_cluster.c
6514new file mode 100644
7f9d2ee0 6515index 0000000..b5c9ea1
4e97e4e9 6516--- /dev/null
6517+++ b/kernel/power/tuxonice_cluster.c
7f9d2ee0 6518@@ -0,0 +1,1088 @@
4e97e4e9 6519+/*
6520+ * kernel/power/tuxonice_cluster.c
6521+ *
6522+ * Copyright (C) 2006-2007 Nigel Cunningham (nigel at tuxonice net)
6523+ *
6524+ * This file is released under the GPLv2.
6525+ *
6526+ * This file contains routines for cluster hibernation support.
6527+ *
6528+ * Based on ip autoconfiguration code in net/ipv4/ipconfig.c.
6529+ *
6530+ * How does it work?
6531+ *
6532+ * There is no 'master' node that tells everyone else what to do. All nodes
6533+ * send messages to the broadcast address/port, maintain a list of peers
6534+ * and figure out when to progress to the next step in hibernating or resuming.
6535+ * This makes us more fault tolerant when it comes to nodes coming and going
6536+ * (which may be more of an issue if we're hibernating when power supplies
6537+ * are being unreliable).
6538+ *
6539+ * At boot time, we start a ktuxonice thread that handles communication with
6540+ * other nodes. This node maintains a state machine that controls our progress
6541+ * through hibernating and resuming, keeping us in step with other nodes. Nodes
6542+ * are identified by their hw address.
6543+ *
6544+ * On startup, the node sends CLUSTER_PING on the configured interface's
6545+ * broadcast address, port $toi_cluster_port (see below) and begins to listen
6546+ * for other broadcast messages. CLUSTER_PING messages are repeated at
6547+ * intervals of 5 minutes, with a random offset to spread traffic out.
6548+ *
6549+ * A hibernation cycle is initiated from any node via
6550+ *
6551+ * echo > /sys/power/tuxonice/do_hibernate
6552+ *
6553+ * and (possibly) the hibernate script. At each step of the process, the node
6554+ * completes its work, and waits for all other nodes to signal completion of
6555+ * their work (or timeout) before progressing to the next step.
6556+ *
6557+ * Request/state Action before reply Possible reply Next state
6558+ * HIBERNATE capable, pre-script HIBERNATE|ACK NODE_PREP
6559+ * HIBERNATE|NACK INIT_0
6560+ *
6561+ * PREP prepare_image PREP|ACK IMAGE_WRITE
6562+ * PREP|NACK INIT_0
6563+ * ABORT RUNNING
6564+ *
6565+ * IO write image IO|ACK power off
6566+ * ABORT POST_RESUME
6567+ *
6568+ * (Boot time) check for image IMAGE|ACK RESUME_PREP
6569+ * (Note 1)
6570+ * IMAGE|NACK (Note 2)
6571+ *
6572+ * PREP prepare read image PREP|ACK IMAGE_READ
6573+ * PREP|NACK (As NACK_IMAGE)
6574+ *
6575+ * IO read image IO|ACK POST_RESUME
6576+ *
6577+ * POST_RESUME thaw, post-script RUNNING
6578+ *
6579+ * INIT_0 init 0
6580+ *
6581+ * Other messages:
6582+ *
6583+ * - PING: Request for all other live nodes to send a PONG. Used at startup to
6584+ * announce presence, when a node is suspected dead and periodically, in case
6585+ * segments of the network are [un]plugged.
6586+ *
6587+ * - PONG: Response to a PING.
6588+ *
6589+ * - ABORT: Request to cancel writing an image.
6590+ *
6591+ * - BYE: Notification that this node is shutting down.
6592+ *
6593+ * Note 1: Repeated at 3s intervals until we continue to boot/resume, so that
6594+ * nodes which are slower to start up can get state synchronised. If a node
6595+ * starting up sees other nodes sending RESUME_PREP or IMAGE_READ, it may send
6596+ * ACK_IMAGE and they will wait for it to catch up. If it sees ACK_READ, it
6597+ * must invalidate its image (if any) and boot normally.
6598+ *
6599+ * Note 2: May occur when one node lost power or powered off while others
6600+ * hibernated. This node waits for others to complete resuming (ACK_READ)
6601+ * before completing its boot, so that it appears as a fail node restarting.
6602+ *
6603+ * If any node has an image, then it also has a list of nodes that hibernated
6604+ * in synchronisation with it. The node will wait for other nodes to appear
6605+ * or timeout before beginning its restoration.
6606+ *
6607+ * If a node has no image, it needs to wait, in case other nodes which do have
6608+ * an image are going to resume, but are taking longer to announce their
6609+ * presence. For this reason, the user can specify a timeout value and a number
6610+ * of nodes detected before we just continue. (We might want to assume in a
6611+ * cluster of, say, 15 nodes, if 8 others have booted without finding an image,
6612+ * the remaining nodes will too. This might help in situations where some nodes
6613+ * are much slower to boot, or more subject to hardware failures or such like).
4e97e4e9 6614+ */
24613191 6615+
4e97e4e9 6616+#include <linux/suspend.h>
6617+#include <linux/module.h>
6618+#include <linux/moduleparam.h>
6619+#include <linux/if.h>
6620+#include <linux/rtnetlink.h>
6621+#include <linux/ip.h>
6622+#include <linux/udp.h>
6623+#include <linux/in.h>
6624+#include <linux/if_arp.h>
6625+#include <linux/kthread.h>
6626+#include <linux/wait.h>
7f9d2ee0 6627+#include <linux/netdevice.h>
ad8f4a28 6628+#include <net/ip.h>
24613191 6629+
4e97e4e9 6630+#include "tuxonice.h"
6631+#include "tuxonice_modules.h"
6632+#include "tuxonice_sysfs.h"
7f9d2ee0 6633+#include "tuxonice_alloc.h"
4e97e4e9 6634+#include "tuxonice_io.h"
24613191 6635+
6636+#if 1
6637+#define PRINTK(a, b...) do { printk(a, ##b); } while (0)
6638+#else
6639+#define PRINTK(a, b...) do { } while (0)
6640+#endif
6641+
6642+static int loopback_mode;
6643+static int num_local_nodes = 1;
6644+#define MAX_LOCAL_NODES 8
6645+#define SADDR (loopback_mode ? b->sid : h->saddr)
6646+
6647+#define MYNAME "TuxOnIce Clustering"
6648+
6649+enum cluster_message {
6650+ MSG_ACK = 1,
6651+ MSG_NACK = 2,
6652+ MSG_PING = 4,
6653+ MSG_ABORT = 8,
6654+ MSG_BYE = 16,
6655+ MSG_HIBERNATE = 32,
6656+ MSG_IMAGE = 64,
6657+ MSG_IO = 128,
6658+ MSG_RUNNING = 256
6659+};
6660+
6661+static char *str_message(int message)
6662+{
6663+ switch (message) {
6664+ case 4:
6665+ return "Ping";
6666+ case 8:
6667+ return "Abort";
6668+ case 9:
6669+ return "Abort acked";
6670+ case 10:
6671+ return "Abort nacked";
6672+ case 16:
6673+ return "Bye";
6674+ case 17:
6675+ return "Bye acked";
6676+ case 18:
6677+ return "Bye nacked";
6678+ case 32:
6679+ return "Hibernate request";
6680+ case 33:
6681+ return "Hibernate ack";
6682+ case 34:
6683+ return "Hibernate nack";
6684+ case 64:
6685+ return "Image exists?";
6686+ case 65:
6687+ return "Image does exist";
6688+ case 66:
6689+ return "No image here";
6690+ case 128:
6691+ return "I/O";
6692+ case 129:
6693+ return "I/O okay";
6694+ case 130:
6695+ return "I/O failed";
6696+ case 256:
6697+ return "Running";
6698+ default:
6699+ printk("Unrecognised message %d.\n", message);
6700+ return "Unrecognised message (see dmesg)";
6701+ }
6702+}
6703+
6704+#define MSG_ACK_MASK (MSG_ACK | MSG_NACK)
6705+#define MSG_STATE_MASK (~MSG_ACK_MASK)
6706+
6707+struct node_info {
6708+ struct list_head member_list;
6709+ wait_queue_head_t member_events;
6710+ spinlock_t member_list_lock;
6711+ spinlock_t receive_lock;
6712+ int peer_count, ignored_peer_count;
6713+ struct toi_sysfs_data sysfs_data;
6714+ enum cluster_message current_message;
6715+};
6716+
6717+struct node_info node_array[MAX_LOCAL_NODES];
6718+
6719+struct cluster_member {
6720+ __be32 addr;
6721+ enum cluster_message message;
6722+ struct list_head list;
6723+ int ignore;
6724+};
24613191 6725+
6726+#define toi_cluster_port_send 3501
6727+#define toi_cluster_port_recv 3502
6728+
6729+static struct net_device *net_dev;
4e97e4e9 6730+static struct toi_module_ops toi_cluster_ops;
24613191 6731+
6732+static int toi_recv(struct sk_buff *skb, struct net_device *dev,
6733+ struct packet_type *pt, struct net_device *orig_dev);
6734+
6735+static struct packet_type toi_cluster_packet_type = {
6736+ .type = __constant_htons(ETH_P_IP),
6737+ .func = toi_recv,
6738+};
6739+
6740+struct toi_pkt { /* BOOTP packet format */
6741+ struct iphdr iph; /* IP header */
6742+ struct udphdr udph; /* UDP header */
6743+ u8 htype; /* HW address type */
6744+ u8 hlen; /* HW address length */
6745+ __be32 xid; /* Transaction ID */
6746+ __be16 secs; /* Seconds since we started */
6747+ __be16 flags; /* Just what it says */
6748+ u8 hw_addr[16]; /* Sender's HW address */
6749+ u16 message; /* Message */
6750+ unsigned long sid; /* Source ID for loopback testing */
6751+};
6752+
6753+static char toi_cluster_iface[IFNAMSIZ] = CONFIG_TOI_DEFAULT_CLUSTER_INTERFACE;
6754+
6755+static int added_pack;
6756+
7f9d2ee0 6757+static int others_have_image;
6758+
6759+/* Key used to allow multiple clusters on the same lan */
6760+static char toi_cluster_key[32] = CONFIG_TOI_DEFAULT_CLUSTER_KEY;
6761+static char pre_hibernate_script[255] =
6762+ CONFIG_TOI_DEFAULT_CLUSTER_PRE_HIBERNATE;
6763+static char post_hibernate_script[255] =
6764+ CONFIG_TOI_DEFAULT_CLUSTER_POST_HIBERNATE;
6765+
6766+/* List of cluster members */
6767+static unsigned long continue_delay = 5 * HZ;
6768+static unsigned long cluster_message_timeout = 3 * HZ;
6769+
6770+/* === Membership list === */
6771+
6772+static void print_member_info(int index)
6773+{
6774+ struct cluster_member *this;
6775+
6776+ printk(KERN_INFO "==> Dumping node %d.\n", index);
6777+
6778+ list_for_each_entry(this, &node_array[index].member_list, list)
6779+ printk(KERN_INFO "%d.%d.%d.%d last message %s. %s\n",
6780+ NIPQUAD(this->addr),
6781+ str_message(this->message),
6782+ this->ignore ? "(Ignored)" : "");
6783+ printk(KERN_INFO "== Done ==\n");
6784+}
6785+
6786+static struct cluster_member *__find_member(int index, __be32 addr)
6787+{
6788+ struct cluster_member *this;
6789+
6790+ list_for_each_entry(this, &node_array[index].member_list, list) {
6791+ if (this->addr != addr)
6792+ continue;
6793+
6794+ return this;
6795+ }
6796+
6797+ return NULL;
6798+}
6799+
6800+static void set_ignore(int index, __be32 addr, struct cluster_member *this)
6801+{
6802+ if (this->ignore) {
6803+ PRINTK("Node %d already ignoring %d.%d.%d.%d.\n",
6804+ index, NIPQUAD(addr));
6805+ return;
6806+ }
6807+
6808+ PRINTK("Node %d sees node %d.%d.%d.%d now being ignored.\n",
6809+ index, NIPQUAD(addr));
6810+ this->ignore = 1;
6811+ node_array[index].ignored_peer_count++;
6812+}
6813+
6814+static int __add_update_member(int index, __be32 addr, int message)
6815+{
6816+ struct cluster_member *this;
6817+
6818+ this = __find_member(index, addr);
6819+ if (this) {
6820+ if (this->message != message) {
6821+ this->message = message;
6822+ if ((message & MSG_NACK) &&
6823+ (message & (MSG_HIBERNATE | MSG_IMAGE | MSG_IO)))
6824+ set_ignore(index, addr, this);
6825+ PRINTK("Node %d sees node %d.%d.%d.%d now sending "
6826+ "%s.\n", index, NIPQUAD(addr),
6827+ str_message(message));
6828+ wake_up(&node_array[index].member_events);
6829+ }
6830+ return 0;
6831+ }
6832+
6833+ this = (struct cluster_member *) toi_kzalloc(36,
6834+ sizeof(struct cluster_member), GFP_KERNEL);
6835+
6836+ if (!this)
6837+ return -1;
6838+
6839+ this->addr = addr;
6840+ this->message = message;
6841+ this->ignore = 0;
6842+ INIT_LIST_HEAD(&this->list);
6843+
6844+ node_array[index].peer_count++;
6845+
6846+ PRINTK("Node %d sees node %d.%d.%d.%d sending %s.\n", index,
6847+ NIPQUAD(addr), str_message(message));
6848+
6849+ if ((message & MSG_NACK) &&
6850+ (message & (MSG_HIBERNATE | MSG_IMAGE | MSG_IO)))
6851+ set_ignore(index, addr, this);
6852+ list_add_tail(&this->list, &node_array[index].member_list);
6853+ return 1;
6854+}
6855+
6856+static int add_update_member(int index, __be32 addr, int message)
6857+{
6858+ int result;
6859+ unsigned long flags;
6860+ spin_lock_irqsave(&node_array[index].member_list_lock, flags);
6861+ result = __add_update_member(index, addr, message);
6862+ spin_unlock_irqrestore(&node_array[index].member_list_lock, flags);
6863+
6864+ print_member_info(index);
6865+
6866+ wake_up(&node_array[index].member_events);
6867+
6868+ return result;
6869+}
6870+
6871+static void del_member(int index, __be32 addr)
6872+{
6873+ struct cluster_member *this;
6874+ unsigned long flags;
6875+
6876+ spin_lock_irqsave(&node_array[index].member_list_lock, flags);
6877+ this = __find_member(index, addr);
6878+
6879+ if (this) {
6880+ list_del_init(&this->list);
6881+ toi_kfree(36, this);
6882+ node_array[index].peer_count--;
6883+ }
6884+
6885+ spin_unlock_irqrestore(&node_array[index].member_list_lock, flags);
6886+}
6887+
6888+/* === Message transmission === */
6889+
6890+static void toi_send_if(int message, unsigned long my_id);
6891+
6892+/*
6893+ * Process received TOI packet.
6894+ */
6895+static int toi_recv(struct sk_buff *skb, struct net_device *dev,
6896+ struct packet_type *pt, struct net_device *orig_dev)
6897+{
6898+ struct toi_pkt *b;
6899+ struct iphdr *h;
6900+ int len, result, index;
6901+ unsigned long addr, message, ack;
6902+
6903+ /* Perform verifications before taking the lock. */
6904+ if (skb->pkt_type == PACKET_OTHERHOST)
6905+ goto drop;
6906+
6907+ if (dev != net_dev)
6908+ goto drop;
6909+
7f9d2ee0 6910+ skb = skb_share_check(skb, GFP_ATOMIC);
6911+ if (!skb)
6912+ return NET_RX_DROP;
6913+
6914+ if (!pskb_may_pull(skb,
6915+ sizeof(struct iphdr) +
6916+ sizeof(struct udphdr)))
6917+ goto drop;
6918+
6919+ b = (struct toi_pkt *)skb_network_header(skb);
6920+ h = &b->iph;
6921+
6922+ if (h->ihl != 5 || h->version != 4 || h->protocol != IPPROTO_UDP)
6923+ goto drop;
6924+
6925+ /* Fragments are not supported */
6926+ if (h->frag_off & htons(IP_OFFSET | IP_MF)) {
6927+ if (net_ratelimit())
6928+ printk(KERN_ERR "TuxOnIce: Ignoring fragmented "
6929+ "cluster message.\n");
6930+ goto drop;
6931+ }
6932+
6933+ if (skb->len < ntohs(h->tot_len))
6934+ goto drop;
6935+
6936+ if (ip_fast_csum((char *) h, h->ihl))
6937+ goto drop;
6938+
6939+ if (b->udph.source != htons(toi_cluster_port_send) ||
6940+ b->udph.dest != htons(toi_cluster_port_recv))
6941+ goto drop;
6942+
6943+ if (ntohs(h->tot_len) < ntohs(b->udph.len) + sizeof(struct iphdr))
6944+ goto drop;
6945+
6946+ len = ntohs(b->udph.len) - sizeof(struct udphdr);
6947+
6948+ /* Ok the front looks good, make sure we can get at the rest. */
6949+ if (!pskb_may_pull(skb, skb->len))
6950+ goto drop;
6951+
6952+ b = (struct toi_pkt *)skb_network_header(skb);
6953+ h = &b->iph;
6954+
6955+ addr = SADDR;
6956+ PRINTK(">>> Message %s received from " NIPQUAD_FMT ".\n",
6957+ str_message(b->message), NIPQUAD(addr));
6958+
6959+ message = b->message & MSG_STATE_MASK;
6960+ ack = b->message & MSG_ACK_MASK;
6961+
6962+ for (index = 0; index < num_local_nodes; index++) {
6963+ int new_message = node_array[index].current_message,
6964+ old_message = new_message;
6965+
6966+ if (index == SADDR || !old_message) {
6967+ PRINTK("Ignoring node %d (offline or self).\n", index);
6968+ continue;
6969+ }
6970+
6971+ /* One message at a time, please. */
6972+ spin_lock(&node_array[index].receive_lock);
6973+
6974+ result = add_update_member(index, SADDR, b->message);
6975+ if (result == -1) {
6976+ printk(KERN_INFO "Failed to add new cluster member "
6977+ NIPQUAD_FMT ".\n",
6978+ NIPQUAD(addr));
6979+ goto drop_unlock;
6980+ }
6981+
6982+ switch (b->message & MSG_STATE_MASK) {
6983+ case MSG_PING:
6984+ break;
6985+ case MSG_ABORT:
6986+ break;
6987+ case MSG_BYE:
6988+ break;
6989+ case MSG_HIBERNATE:
6990+ /* Can I hibernate? */
6991+ new_message = MSG_HIBERNATE |
6992+ ((index & 1) ? MSG_NACK : MSG_ACK);
6993+ break;
6994+ case MSG_IMAGE:
6995+ /* Can I resume? */
6996+ new_message = MSG_IMAGE |
6997+ ((index & 1) ? MSG_NACK : MSG_ACK);
6998+ if (new_message != old_message)
6999+ printk("Setting whether I can resume to %d.\n",
7000+ new_message);
7001+ break;
7002+ case MSG_IO:
7003+ new_message = MSG_IO | MSG_ACK;
7004+ break;
7005+ case MSG_RUNNING:
7006+ break;
7007+ default:
7008+ if (net_ratelimit())
7009+ printk(KERN_ERR "Unrecognised TuxOnIce cluster"
7010+ " message %d from " NIPQUAD_FMT ".\n",
7011+ b->message, NIPQUAD(addr));
7012+		}
7013+
7014+ if (old_message != new_message) {
7015+ node_array[index].current_message = new_message;
7016+ printk(KERN_INFO ">>> Sending new message for node "
7017+ "%d.\n", index);
7018+ toi_send_if(new_message, index);
7019+ } else if (!ack) {
7020+ printk(KERN_INFO ">>> Resending message for node %d.\n",
7021+ index);
7022+ toi_send_if(new_message, index);
7023+ }
7024+drop_unlock:
7025+ spin_unlock(&node_array[index].receive_lock);
7026+	}
7027+
7028+drop:
7029+ /* Throw the packet out. */
7030+ kfree_skb(skb);
7031+
7032+ return 0;
7033+}
7034+
7035+/*
7036+ * Send cluster message to single interface.
7037+ */
7038+static void toi_send_if(int message, unsigned long my_id)
7039+{
7040+ struct sk_buff *skb;
7041+ struct toi_pkt *b;
7042+ int hh_len = LL_RESERVED_SPACE(net_dev);
7043+ struct iphdr *h;
7044+
7045+ /* Allocate packet */
7046+ skb = alloc_skb(sizeof(struct toi_pkt) + hh_len + 15, GFP_KERNEL);
7047+ if (!skb)
7048+ return;
7049+ skb_reserve(skb, hh_len);
7050+ b = (struct toi_pkt *) skb_put(skb, sizeof(struct toi_pkt));
7051+ memset(b, 0, sizeof(struct toi_pkt));
7052+
7053+ /* Construct IP header */
7054+ skb_reset_network_header(skb);
7055+ h = ip_hdr(skb);
7056+ h->version = 4;
7057+ h->ihl = 5;
7058+ h->tot_len = htons(sizeof(struct toi_pkt));
7059+ h->frag_off = htons(IP_DF);
7060+ h->ttl = 64;
7061+ h->protocol = IPPROTO_UDP;
7062+ h->daddr = htonl(INADDR_BROADCAST);
7063+ h->check = ip_fast_csum((unsigned char *) h, h->ihl);
7064+
7065+ /* Construct UDP header */
7066+ b->udph.source = htons(toi_cluster_port_send);
7067+ b->udph.dest = htons(toi_cluster_port_recv);
7068+ b->udph.len = htons(sizeof(struct toi_pkt) - sizeof(struct iphdr));
7069+ /* UDP checksum not calculated -- explicitly allowed in BOOTP RFC */
7070+
7071+ /* Construct message */
7072+ b->message = message;
7073+ b->sid = my_id;
7074+ b->htype = net_dev->type; /* can cause undefined behavior */
7075+ b->hlen = net_dev->addr_len;
7076+ memcpy(b->hw_addr, net_dev->dev_addr, net_dev->addr_len);
7077+ b->secs = htons(3); /* 3 seconds */
7078+
7079+ /* Chain packet down the line... */
7080+ skb->dev = net_dev;
7081+ skb->protocol = htons(ETH_P_IP);
7f9d2ee0 7082+ if ((dev_hard_header(skb, net_dev, ntohs(skb->protocol),
7083+ net_dev->broadcast, net_dev->dev_addr, skb->len) < 0) ||
7084+ dev_queue_xmit(skb) < 0)
7085+ printk(KERN_INFO "E");
7086+}
7087+
7088+/* ========================================= */
7089+
7090+/* kTOICluster */
7091+
7092+static atomic_t num_cluster_threads;
7093+static DECLARE_WAIT_QUEUE_HEAD(clusterd_events);
7094+
7095+static int kTOICluster(void *data)
7096+{
7097+ unsigned long my_id;
7098+
7099+ my_id = atomic_add_return(1, &num_cluster_threads) - 1;
7100+ node_array[my_id].current_message = (unsigned long) data;
7101+
7102+ PRINTK("kTOICluster daemon %lu starting.\n", my_id);
7103+
7104+ current->flags |= PF_NOFREEZE;
7105+
7106+ while (node_array[my_id].current_message) {
7107+ toi_send_if(node_array[my_id].current_message, my_id);
7108+ sleep_on_timeout(&clusterd_events,
7109+ cluster_message_timeout);
7110+ PRINTK("Link state %lu is %d.\n", my_id,
7111+ node_array[my_id].current_message);
7112+ }
7113+
7114+ toi_send_if(MSG_BYE, my_id);
7115+ atomic_dec(&num_cluster_threads);
7116+ wake_up(&clusterd_events);
7117+
7118+ PRINTK("kTOICluster daemon %lu exiting.\n", my_id);
7119+ __set_current_state(TASK_RUNNING);
7120+ return 0;
7121+}
7122+
7123+static void kill_clusterd(void)
7124+{
7125+ int i;
7126+
7127+ for (i = 0; i < num_local_nodes; i++) {
7128+ if (node_array[i].current_message) {
7129+ PRINTK("Seeking to kill clusterd %d.\n", i);
7130+ node_array[i].current_message = 0;
7131+ }
7132+ }
7133+ wait_event(clusterd_events,
7134+ !atomic_read(&num_cluster_threads));
7135+ PRINTK("All cluster daemons have exited.\n");
7136+}
7137+
7138+static int peers_not_in_message(int index, int message, int precise)
7139+{
7140+ struct cluster_member *this;
7141+ unsigned long flags;
7142+ int result = 0;
7143+
7144+ spin_lock_irqsave(&node_array[index].member_list_lock, flags);
7145+ list_for_each_entry(this, &node_array[index].member_list, list) {
7146+ if (this->ignore)
7147+ continue;
7148+
7149+ PRINTK("Peer %d.%d.%d.%d sending %s. "
7150+ "Seeking %s.\n",
7151+ NIPQUAD(this->addr),
7152+ str_message(this->message), str_message(message));
7153+ if ((precise ? this->message :
7154+ this->message & MSG_STATE_MASK) !=
7155+ message)
7156+ result++;
7157+ }
7158+ spin_unlock_irqrestore(&node_array[index].member_list_lock, flags);
7159+ PRINTK("%d peers in sought message.\n", result);
7160+ return result;
7161+}
7162+
7163+static void reset_ignored(int index)
7164+{
7165+ struct cluster_member *this;
7166+ unsigned long flags;
7167+
7168+ spin_lock_irqsave(&node_array[index].member_list_lock, flags);
7169+ list_for_each_entry(this, &node_array[index].member_list, list)
7170+ this->ignore = 0;
7171+ node_array[index].ignored_peer_count = 0;
7172+ spin_unlock_irqrestore(&node_array[index].member_list_lock, flags);
7173+}
7174+
7175+static int peers_in_message(int index, int message, int precise)
7176+{
7177+ return node_array[index].peer_count -
7178+ node_array[index].ignored_peer_count -
7179+ peers_not_in_message(index, message, precise);
7180+}
7181+
7182+static int time_to_continue(int index, unsigned long start, int message)
7183+{
7184+ int first = peers_not_in_message(index, message, 0);
7185+ int second = peers_in_message(index, message, 1);
7186+
7187+ PRINTK("First part returns %d, second returns %d.\n", first, second);
7188+
7189+ if (!first && !second) {
7190+ PRINTK("All peers answered message %d.\n",
7191+ message);
7192+ return 1;
7193+ }
7194+
7195+ if (time_after(jiffies, start + continue_delay)) {
7196+ PRINTK("Timeout reached.\n");
7197+ return 1;
7198+ }
7199+
7200+ PRINTK("Not time to continue yet (%lu < %lu).\n", jiffies,
7201+ start + continue_delay);
7202+ return 0;
7203+}
7204+
7205+void toi_initiate_cluster_hibernate(void)
7206+{
7207+ int result;
7208+ unsigned long start;
7209+
7210+ result = do_toi_step(STEP_HIBERNATE_PREPARE_IMAGE);
7211+ if (result)
7212+ return;
7213+
7214+ toi_send_if(MSG_HIBERNATE, 0);
7215+
7216+ start = jiffies;
7217+ wait_event(node_array[0].member_events,
7218+ time_to_continue(0, start, MSG_HIBERNATE));
7219+
7220+ if (test_action_state(TOI_FREEZER_TEST)) {
7221+ toi_send_if(MSG_ABORT, 0);
7222+
7223+ start = jiffies;
7224+ wait_event(node_array[0].member_events,
7225+ time_to_continue(0, start, MSG_RUNNING));
7226+
7227+ do_toi_step(STEP_QUIET_CLEANUP);
7228+ return;
7229+ }
7230+
7231+ toi_send_if(MSG_IO, 0);
7232+
7233+ result = do_toi_step(STEP_HIBERNATE_SAVE_IMAGE);
7234+ if (result)
7235+ return;
7236+
7237+ /* This code runs at resume time too! */
7238+ if (toi_in_hibernate)
7239+ result = do_toi_step(STEP_HIBERNATE_POWERDOWN);
7240+}
7241+EXPORT_SYMBOL_GPL(toi_initiate_cluster_hibernate);
7242+
4e97e4e9 7243+/* toi_cluster_print_debug_stats
7244+ *
7245+ * Description: Print information to be recorded for debugging purposes into a
7246+ * buffer.
7247+ * Arguments: buffer: Pointer to a buffer into which the debug info will be
7248+ * printed.
7249+ * size: Size of the buffer.
7250+ * Returns: Number of characters written to the buffer.
7251+ */
7252+static int toi_cluster_print_debug_stats(char *buffer, int size)
7253+{
7254+ int len;
7255+
7256+ if (strlen(toi_cluster_iface))
7257+ len = snprintf_used(buffer, size,
7258+ "- Cluster interface is '%s'.\n",
7259+ toi_cluster_iface);
4e97e4e9 7260+ else
7261+ len = snprintf_used(buffer, size,
7262+ "- Cluster support is disabled.\n");
4e97e4e9 7263+ return len;
7264+}
7265+
7266+/* cluster_memory_needed
7267+ *
7268+ * Description: Tell the caller how much memory we need to operate during
7269+ * hibernate/resume.
7270+ * Returns: Unsigned long. Maximum number of bytes of memory required for
7271+ * operation.
7272+ */
7273+static int toi_cluster_memory_needed(void)
7274+{
24613191 7275+ return 0;
7276+}
7277+
4e97e4e9 7278+static int toi_cluster_storage_needed(void)
7279+{
ad8f4a28 7280+ return 1 + strlen(toi_cluster_iface);
4e97e4e9 7281+}
ad8f4a28 7282+
4e97e4e9 7283+/* toi_cluster_save_config_info
24613191 7284+ *
4e97e4e9 7285+ * Description:	Save information needed when reloading the image at resume time.
7286+ * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE.
7287+ * Returns: Number of bytes used for saving our data.
24613191 7288+ */
4e97e4e9 7289+static int toi_cluster_save_config_info(char *buffer)
7290+{
7291+ strcpy(buffer, toi_cluster_iface);
7292+	return strlen(toi_cluster_iface) + 1;
4e97e4e9 7293+}
24613191 7294+
4e97e4e9 7295+/* toi_cluster_load_config_info
7296+ *
ad8f4a28 7297+ * Description: Reload information needed for declustering the image at
4e97e4e9 7298+ * resume time.
7299+ * Arguments: Buffer: Pointer to the start of the data.
7300+ * Size: Number of bytes that were saved.
7301+ */
7302+static void toi_cluster_load_config_info(char *buffer, int size)
24613191 7303+{
ad8f4a28 7304+ strncpy(toi_cluster_iface, buffer, size);
4e97e4e9 7305+ return;
7306+}
24613191 7307+
7308+static void cluster_startup(void)
7309+{
7310+ int have_image = do_check_can_resume(), i;
7311+ unsigned long start = jiffies, initial_message;
7312+ struct task_struct *p;
7313+
7314+ initial_message = MSG_IMAGE;
7315+
7316+ have_image = 1;
7317+
7318+ for (i = 0; i < num_local_nodes; i++) {
7319+ PRINTK("Starting ktoiclusterd %d.\n", i);
7320+ p = kthread_create(kTOICluster, (void *) initial_message,
7321+ "ktoiclusterd/%d", i);
7322+ if (IS_ERR(p)) {
7323+			printk(KERN_ERR "Failed to start ktoiclusterd.\n");
7324+ return;
7325+ }
7326+
7327+ wake_up_process(p);
7328+ }
7329+
7330+ /* Wait for delay or someone else sending first message */
7331+ wait_event(node_array[0].member_events, time_to_continue(0, start,
7332+ MSG_IMAGE));
7333+
7334+ others_have_image = peers_in_message(0, MSG_IMAGE | MSG_ACK, 1);
7335+
7336+ printk(KERN_INFO "Continuing. I %shave an image. Peers with image:"
7337+ " %d.\n", have_image ? "" : "don't ", others_have_image);
7338+
7339+ if (have_image) {
7340+ int result;
7341+
7342+ /* Start to resume */
7343+ printk(KERN_INFO " === Starting to resume === \n");
7344+ node_array[0].current_message = MSG_IO;
7345+ toi_send_if(MSG_IO, 0);
7346+
7347+ /* result = do_toi_step(STEP_RESUME_LOAD_PS1); */
7348+ result = 0;
7349+
7350+ if (!result) {
7351+ /*
7352+ * Atomic restore - we'll come back in the hibernation
7353+ * path.
7354+ */
7355+
7356+ /* result = do_toi_step(STEP_RESUME_DO_RESTORE); */
7357+ result = 0;
7358+
7359+ /* do_toi_step(STEP_QUIET_CLEANUP); */
7360+ }
7361+
7362+ node_array[0].current_message |= MSG_NACK;
7363+
7364+ /* For debugging - disable for real life? */
7365+ wait_event(node_array[0].member_events,
7366+ time_to_continue(0, start, MSG_IO));
7367+ }
7368+
7369+ if (others_have_image) {
7370+ /* Wait for them to resume */
7371+ printk(KERN_INFO "Waiting for other nodes to resume.\n");
7372+ start = jiffies;
7373+ wait_event(node_array[0].member_events,
7374+ time_to_continue(0, start, MSG_RUNNING));
7375+ if (peers_not_in_message(0, MSG_RUNNING, 0))
7376+ printk(KERN_INFO "Timed out while waiting for other "
7377+ "nodes to resume.\n");
7378+ }
7379+
7380+ /* Find out whether an image exists here. Send ACK_IMAGE or NACK_IMAGE
7381+ * as appropriate.
7382+ *
7383+ * If we don't have an image:
7384+ * - Wait until someone else says they have one, or conditions are met
7385+ * for continuing to boot (n machines or t seconds).
7386+ * - If anyone has an image, wait for them to resume before continuing
7387+ * to boot.
7388+ *
7389+ * If we have an image:
7390+ * - Wait until conditions are met before continuing to resume (n
7391+ * machines or t seconds). Send RESUME_PREP and freeze processes.
7392+ * NACK_PREP if freezing fails (shouldn't) and follow logic for
7393+ * us having no image above. On success, wait for [N]ACK_PREP from
7394+ * other machines. Read image (including atomic restore) until done.
7395+ * Wait for ACK_READ from others (should never fail). Thaw processes
7396+ * and do post-resume. (The section after the atomic restore is done
7397+ * via the code for hibernating).
7398+ */
7399+
7400+ node_array[0].current_message = MSG_RUNNING;
7401+}
7402+
7403+/* toi_cluster_open_iface
7404+ *
7405+ * Description: Prepare to use an interface.
7406+ */
7407+
7408+static int toi_cluster_open_iface(void)
7409+{
7410+ struct net_device *dev;
7411+
7412+ rtnl_lock();
7413+
7f9d2ee0 7414+ for_each_netdev(&init_net, dev) {
7415+ if (/* dev == &init_net.loopback_dev || */
7416+ strcmp(dev->name, toi_cluster_iface))
7417+ continue;
7418+
7419+ net_dev = dev;
7420+ break;
7421+ }
7422+
7423+ rtnl_unlock();
7424+
7425+ if (!net_dev) {
7426+ printk(KERN_ERR MYNAME ": Device %s not found.\n",
7427+ toi_cluster_iface);
7428+ return -ENODEV;
7429+ }
7430+
7431+ dev_add_pack(&toi_cluster_packet_type);
7432+ added_pack = 1;
7433+
7f9d2ee0 7434+ loopback_mode = (net_dev == init_net.loopback_dev);
7435+ num_local_nodes = loopback_mode ? 8 : 1;
7436+
7437+ PRINTK("Loopback mode is %s. Number of local nodes is %d.\n",
7438+ loopback_mode ? "on" : "off", num_local_nodes);
7439+
7440+ cluster_startup();
7441+ return 0;
7442+}
7443+
7444+/* toi_cluster_close_iface
7445+ *
7446+ * Description: Stop using an interface.
7447+ */
7448+
7449+static int toi_cluster_close_iface(void)
7450+{
7451+ kill_clusterd();
7452+ if (added_pack) {
7453+ dev_remove_pack(&toi_cluster_packet_type);
7454+ added_pack = 0;
7455+ }
7456+ return 0;
7457+}
7458+
7459+static void write_side_effect(void)
7460+{
7461+ if (toi_cluster_ops.enabled) {
7462+ toi_cluster_open_iface();
7463+ set_toi_state(TOI_CLUSTER_MODE);
7464+ } else {
7465+ toi_cluster_close_iface();
7466+ clear_toi_state(TOI_CLUSTER_MODE);
7467+ }
7468+}
7469+
7470+static void node_write_side_effect(void)
7471+{
7472+}
7473+
7474+/*
7475+ * data for our sysfs entries.
7476+ */
7477+static struct toi_sysfs_data sysfs_params[] = {
7478+ {
7479+ TOI_ATTR("interface", SYSFS_RW),
7480+ SYSFS_STRING(toi_cluster_iface, IFNAMSIZ, 0)
7481+ },
7482+
7483+ {
7484+ TOI_ATTR("enabled", SYSFS_RW),
7485+ SYSFS_INT(&toi_cluster_ops.enabled, 0, 1, 0),
7486+ .write_side_effect = write_side_effect,
7487+ },
7488+
7489+ {
7490+ TOI_ATTR("cluster_name", SYSFS_RW),
7491+ SYSFS_STRING(toi_cluster_key, 32, 0)
7492+ },
7493+
4e97e4e9 7494+ {
7495+ TOI_ATTR("pre-hibernate-script", SYSFS_RW),
7496+ SYSFS_STRING(pre_hibernate_script, 256, 0)
4e97e4e9 7497+ },
7498+
7499+ {
7500+ TOI_ATTR("post-hibernate-script", SYSFS_RW),
7501+ SYSFS_STRING(post_hibernate_script, 256, 0)
7502+ },
7503+
7504+ {
7505+ TOI_ATTR("continue_delay", SYSFS_RW),
7506+ SYSFS_UL(&continue_delay, HZ / 2, 60 * HZ, 0)
24613191 7507+ }
4e97e4e9 7508+};
7509+
7510+/*
7511+ * Ops structure.
7512+ */
7513+
7514+static struct toi_module_ops toi_cluster_ops = {
7515+ .type = FILTER_MODULE,
7516+ .name = "Cluster",
7517+ .directory = "cluster",
7518+ .module = THIS_MODULE,
7519+ .memory_needed = toi_cluster_memory_needed,
7520+ .print_debug_info = toi_cluster_print_debug_stats,
7521+ .save_config_info = toi_cluster_save_config_info,
7522+ .load_config_info = toi_cluster_load_config_info,
7523+ .storage_needed = toi_cluster_storage_needed,
ad8f4a28 7524+
4e97e4e9 7525+ .sysfs_data = sysfs_params,
7526+ .num_sysfs_entries = sizeof(sysfs_params) /
7527+ sizeof(struct toi_sysfs_data),
4e97e4e9 7528+};
24613191 7529+
4e97e4e9 7530+/* ---- Registration ---- */
24613191 7531+
4e97e4e9 7532+#ifdef MODULE
4e97e4e9 7533+#define INIT static __init
7534+#define EXIT static __exit
7535+#else
7536+#define INIT
7537+#define EXIT
7538+#endif
24613191 7539+
4e97e4e9 7540+INIT int toi_cluster_init(void)
7541+{
7542+ int temp = toi_register_module(&toi_cluster_ops), i;
7543+ struct kobject *kobj = toi_cluster_ops.dir_kobj;
7544+
7545+ for (i = 0; i < MAX_LOCAL_NODES; i++) {
7546+ node_array[i].current_message = 0;
7547+ INIT_LIST_HEAD(&node_array[i].member_list);
7548+ init_waitqueue_head(&node_array[i].member_events);
7549+ spin_lock_init(&node_array[i].member_list_lock);
7550+ spin_lock_init(&node_array[i].receive_lock);
7551+
7552+ /* Set up sysfs entry */
7f9d2ee0 7553+ node_array[i].sysfs_data.attr.name = toi_kzalloc(8,
7554+ sizeof(node_array[i].sysfs_data.attr.name),
7555+ GFP_KERNEL);
7556+ sprintf((char *) node_array[i].sysfs_data.attr.name, "node_%d",
7557+ i);
7558+ node_array[i].sysfs_data.attr.mode = SYSFS_RW;
7559+ node_array[i].sysfs_data.type = TOI_SYSFS_DATA_INTEGER;
7560+ node_array[i].sysfs_data.flags = 0;
7561+ node_array[i].sysfs_data.data.integer.variable =
7f9d2ee0 7562+ (int *) &node_array[i].current_message;
7563+ node_array[i].sysfs_data.data.integer.minimum = 0;
7564+ node_array[i].sysfs_data.data.integer.maximum = INT_MAX;
7565+ node_array[i].sysfs_data.write_side_effect =
7566+ node_write_side_effect;
7567+ toi_register_sysfs_file(kobj, &node_array[i].sysfs_data);
7568+ }
7569+
7570+ toi_cluster_ops.enabled = (strlen(toi_cluster_iface) > 0);
24613191 7571+
7572+ if (toi_cluster_ops.enabled)
7573+ toi_cluster_open_iface();
7574+
7575+ return temp;
4e97e4e9 7576+}
24613191 7577+
4e97e4e9 7578+EXIT void toi_cluster_exit(void)
7579+{
7580+ int i;
7581+ toi_cluster_close_iface();
7582+
7583+ for (i = 0; i < MAX_LOCAL_NODES; i++)
7584+ toi_unregister_sysfs_file(toi_cluster_ops.dir_kobj,
7585+ &node_array[i].sysfs_data);
4e97e4e9 7586+ toi_unregister_module(&toi_cluster_ops);
24613191 7587+}
7588+
7589+static int __init toi_cluster_iface_setup(char *iface)
7590+{
7591+ toi_cluster_ops.enabled = (*iface &&
7592+ strcmp(iface, "off"));
7593+
7594+ if (toi_cluster_ops.enabled)
7595+ strncpy(toi_cluster_iface, iface, strlen(iface));
7596+
7597+	return 1;
7598+}
7597+
7598+__setup("toi_cluster=", toi_cluster_iface_setup);
7599+
4e97e4e9 7600+#ifdef MODULE
7601+MODULE_LICENSE("GPL");
7602+module_init(toi_cluster_init);
7603+module_exit(toi_cluster_exit);
7604+MODULE_AUTHOR("Nigel Cunningham");
7605+MODULE_DESCRIPTION("Cluster Support for TuxOnIce");
7606+#endif
7607diff --git a/kernel/power/tuxonice_cluster.h b/kernel/power/tuxonice_cluster.h
7608new file mode 100644
ad8f4a28 7609index 0000000..cd9ee3a
4e97e4e9 7610--- /dev/null
7611+++ b/kernel/power/tuxonice_cluster.h
ad8f4a28 7612@@ -0,0 +1,19 @@
4e97e4e9 7613+/*
7614+ * kernel/power/tuxonice_cluster.h
24613191 7615+ *
4e97e4e9 7616+ * Copyright (C) 2006-2007 Nigel Cunningham (nigel at tuxonice net)
7617+ * Copyright (C) 2006 Red Hat, inc.
7618+ *
7619+ * This file is released under the GPLv2.
24613191 7620+ */
7621+
4e97e4e9 7622+#ifdef CONFIG_TOI_CLUSTER
7623+extern int toi_cluster_init(void);
7624+extern void toi_cluster_exit(void);
ad8f4a28 7625+extern void toi_initiate_cluster_hibernate(void);
4e97e4e9 7626+#else
7627+static inline int toi_cluster_init(void) { return 0; }
7628+static inline void toi_cluster_exit(void) { }
ad8f4a28 7629+static inline void toi_initiate_cluster_hibernate(void) { }
4e97e4e9 7630+#endif
24613191 7631+
4e97e4e9 7632diff --git a/kernel/power/tuxonice_compress.c b/kernel/power/tuxonice_compress.c
7633new file mode 100644
7f9d2ee0 7634index 0000000..9925485
4e97e4e9 7635--- /dev/null
7636+++ b/kernel/power/tuxonice_compress.c
7f9d2ee0 7637@@ -0,0 +1,436 @@
4e97e4e9 7638+/*
7639+ * kernel/power/tuxonice_compress.c
7640+ *
7641+ * Copyright (C) 2003-2007 Nigel Cunningham (nigel at tuxonice net)
7642+ *
7643+ * This file is released under the GPLv2.
7644+ *
7645+ * This file contains data compression routines for TuxOnIce,
7646+ * using cryptoapi.
7647+ */
24613191 7648+
4e97e4e9 7649+#include <linux/module.h>
7650+#include <linux/suspend.h>
7651+#include <linux/highmem.h>
7652+#include <linux/vmalloc.h>
7653+#include <linux/crypto.h>
24613191 7654+
4e97e4e9 7655+#include "tuxonice_builtin.h"
7656+#include "tuxonice.h"
7657+#include "tuxonice_modules.h"
7658+#include "tuxonice_sysfs.h"
7659+#include "tuxonice_io.h"
7660+#include "tuxonice_ui.h"
ad8f4a28 7661+#include "tuxonice_alloc.h"
24613191 7662+
ad8f4a28 7663+static int toi_expected_compression;
24613191 7664+
4e97e4e9 7665+static struct toi_module_ops toi_compression_ops;
7666+static struct toi_module_ops *next_driver;
24613191 7667+
4e97e4e9 7668+static char toi_compressor_name[32] = "lzf";
24613191 7669+
4e97e4e9 7670+static DEFINE_MUTEX(stats_lock);
24613191 7671+
4e97e4e9 7672+struct cpu_context {
ad8f4a28 7673+ u8 *page_buffer;
4e97e4e9 7674+ struct crypto_comp *transform;
7675+ unsigned int len;
7676+ char *buffer_start;
7677+};
24613191 7678+
4e97e4e9 7679+static DEFINE_PER_CPU(struct cpu_context, contexts);
24613191 7680+
4e97e4e9 7681+static int toi_compress_prepare_result;
24613191 7682+
ad8f4a28 7683+/*
4e97e4e9 7684+ * toi_compress_cleanup
24613191 7685+ *
4e97e4e9 7686+ * Frees memory allocated for our labours.
24613191 7687+ */
4e97e4e9 7688+static void toi_compress_cleanup(int toi_or_resume)
24613191 7689+{
4e97e4e9 7690+ int cpu;
24613191 7691+
4e97e4e9 7692+ if (!toi_or_resume)
7693+ return;
24613191 7694+
4e97e4e9 7695+ for_each_online_cpu(cpu) {
7696+ struct cpu_context *this = &per_cpu(contexts, cpu);
7697+ if (this->transform) {
7698+ crypto_free_comp(this->transform);
7699+ this->transform = NULL;
7700+ }
24613191 7701+
4e97e4e9 7702+ if (this->page_buffer)
ad8f4a28 7703+ toi_free_page(16, (unsigned long) this->page_buffer);
24613191 7704+
4e97e4e9 7705+ this->page_buffer = NULL;
24613191 7706+ }
24613191 7707+}
7708+
ad8f4a28 7709+/*
4e97e4e9 7710+ * toi_crypto_prepare
24613191 7711+ *
4e97e4e9 7712+ * Prepare to do some work by allocating buffers and transforms.
24613191 7713+ */
4e97e4e9 7714+static int toi_compress_crypto_prepare(void)
7715+{
7716+ int cpu;
24613191 7717+
4e97e4e9 7718+ if (!*toi_compressor_name) {
7719+ printk(KERN_INFO "TuxOnIce: Compression enabled but no "
7720+ "compressor name set.\n");
4e97e4e9 7721+ return 1;
24613191 7722+ }
7723+
4e97e4e9 7724+ for_each_online_cpu(cpu) {
7725+ struct cpu_context *this = &per_cpu(contexts, cpu);
7726+ this->transform = crypto_alloc_comp(toi_compressor_name, 0, 0);
7727+ if (IS_ERR(this->transform)) {
7728+ printk(KERN_INFO "TuxOnIce: Failed to initialise the "
7729+ "%s compression transform.\n",
4e97e4e9 7730+ toi_compressor_name);
7731+ this->transform = NULL;
7732+ return 1;
24613191 7733+ }
24613191 7734+
4e97e4e9 7735+ this->page_buffer =
7736+ (char *) toi_get_zeroed_page(16, TOI_ATOMIC_GFP);
ad8f4a28 7737+
4e97e4e9 7738+ if (!this->page_buffer) {
7739+ printk(KERN_ERR
7740+ "Failed to allocate a page buffer for TuxOnIce "
7741+ "encryption driver.\n");
7742+ return -ENOMEM;
7743+ }
24613191 7744+ }
7745+
4e97e4e9 7746+ return 0;
7747+}
24613191 7748+
4e97e4e9 7749+/*
7750+ * toi_compress_init
7751+ */
24613191 7752+
4e97e4e9 7753+static int toi_compress_init(int toi_or_resume)
7754+{
7755+ if (!toi_or_resume)
7756+ return 0;
24613191 7757+
4e97e4e9 7758+ toi_compress_bytes_in = toi_compress_bytes_out = 0;
24613191 7759+
4e97e4e9 7760+ next_driver = toi_get_next_filter(&toi_compression_ops);
e8d0ad9d 7761+
ad8f4a28 7762+ if (!next_driver)
4e97e4e9 7763+ return -ECHILD;
24613191 7764+
4e97e4e9 7765+ toi_compress_prepare_result = toi_compress_crypto_prepare();
24613191 7766+
4e97e4e9 7767+ return 0;
7768+}
e8d0ad9d 7769+
4e97e4e9 7770+/*
7771+ * toi_compress_rw_init()
7772+ */
24613191 7773+
4e97e4e9 7774+int toi_compress_rw_init(int rw, int stream_number)
7775+{
7776+ if (toi_compress_prepare_result) {
7777+		printk(KERN_ERR "Failed to initialise compression algorithm.\n");
7778+ if (rw == READ)
7779+ return -ENODEV;
7780+ else
7781+ toi_compression_ops.enabled = 0;
24613191 7782+ }
7783+
4e97e4e9 7784+ return 0;
7785+}
24613191 7786+
ad8f4a28 7787+/*
4e97e4e9 7788+ * toi_compress_write_page()
7789+ *
7790+ * Compress a page of data, buffering output and passing on filled
7791+ * pages to the next module in the pipeline.
ad8f4a28 7792+ *
4e97e4e9 7793+ * Buffer_page: Pointer to a buffer of size PAGE_SIZE, containing
7794+ * data to be compressed.
7795+ *
7796+ * Returns: 0 on success. Otherwise the error is that returned by later
7797+ * modules, -ECHILD if we have a broken pipeline or -EIO if
7798+ * zlib errs.
7799+ */
7800+static int toi_compress_write_page(unsigned long index,
7801+ struct page *buffer_page, unsigned int buf_size)
7802+{
7803+ int ret, cpu = smp_processor_id();
7804+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
ad8f4a28 7805+
4e97e4e9 7806+ if (!ctx->transform)
7807+ return next_driver->write_page(index, buffer_page, buf_size);
24613191 7808+
4e97e4e9 7809+ ctx->buffer_start = kmap(buffer_page);
24613191 7810+
4e97e4e9 7811+ ctx->len = buf_size;
24613191 7812+
4e97e4e9 7813+ ret = crypto_comp_compress(ctx->transform,
7814+ ctx->buffer_start, buf_size,
7815+ ctx->page_buffer, &ctx->len);
ad8f4a28 7816+
4e97e4e9 7817+ kunmap(buffer_page);
24613191 7818+
4e97e4e9 7819+ if (ret) {
ad8f4a28 7820+ printk(KERN_INFO "Compression failed.\n");
4e97e4e9 7821+ goto failure;
7822+ }
ad8f4a28 7823+
4e97e4e9 7824+ mutex_lock(&stats_lock);
7825+ toi_compress_bytes_in += buf_size;
7826+ toi_compress_bytes_out += ctx->len;
7827+ mutex_unlock(&stats_lock);
24613191 7828+
4e97e4e9 7829+ if (ctx->len < buf_size) /* some compression */
7830+ ret = next_driver->write_page(index,
7831+ virt_to_page(ctx->page_buffer),
7832+ ctx->len);
7833+ else
7834+ ret = next_driver->write_page(index, buffer_page, buf_size);
24613191 7835+
4e97e4e9 7836+failure:
7837+ return ret;
24613191 7838+}
7839+
ad8f4a28 7840+/*
4e97e4e9 7841+ * toi_compress_read_page()
7842+ * @buffer_page: struct page *. Pointer to a buffer of size PAGE_SIZE.
24613191 7843+ *
4e97e4e9 7844+ * Retrieve data from later modules and decompress it until the input buffer
7845+ * is filled.
7846+ * Zero if successful. Error condition from me or from downstream on failure.
24613191 7847+ */
4e97e4e9 7848+static int toi_compress_read_page(unsigned long *index,
7849+ struct page *buffer_page, unsigned int *buf_size)
24613191 7850+{
ad8f4a28 7851+ int ret, cpu = smp_processor_id();
4e97e4e9 7852+ unsigned int len;
7853+ unsigned int outlen = PAGE_SIZE;
7854+ char *buffer_start;
7855+ struct cpu_context *ctx = &per_cpu(contexts, cpu);
24613191 7856+
4e97e4e9 7857+ if (!ctx->transform)
7858+ return next_driver->read_page(index, buffer_page, buf_size);
24613191 7859+
ad8f4a28 7860+ /*
4e97e4e9 7861+ * All our reads must be synchronous - we can't decompress
7862+ * data that hasn't been read yet.
7863+ */
24613191 7864+
4e97e4e9 7865+ *buf_size = PAGE_SIZE;
24613191 7866+
4e97e4e9 7867+ ret = next_driver->read_page(index, buffer_page, &len);
24613191 7868+
4e97e4e9 7869+ /* Error or uncompressed data */
7870+ if (ret || len == PAGE_SIZE)
7871+ return ret;
24613191 7872+
4e97e4e9 7873+ buffer_start = kmap(buffer_page);
7874+ memcpy(ctx->page_buffer, buffer_start, len);
7875+ ret = crypto_comp_decompress(
7876+ ctx->transform,
7877+ ctx->page_buffer,
7878+ len, buffer_start, &outlen);
7879+ if (ret)
7880+ abort_hibernate(TOI_FAILED_IO,
7881+ "Compress_read returned %d.\n", ret);
7882+ else if (outlen != PAGE_SIZE) {
7883+ abort_hibernate(TOI_FAILED_IO,
7884+ "Decompression yielded %d bytes instead of %ld.\n",
7885+ outlen, PAGE_SIZE);
7f9d2ee0 7886+		printk(KERN_ERR "Decompression yielded %d bytes instead of %ld.\n",
7887+ outlen, PAGE_SIZE);
4e97e4e9 7888+ ret = -EIO;
7889+ *buf_size = outlen;
24613191 7890+ }
4e97e4e9 7891+ kunmap(buffer_page);
7892+ return ret;
7893+}
24613191 7894+
ad8f4a28 7895+/*
4e97e4e9 7896+ * toi_compress_print_debug_stats
7897+ * @buffer: Pointer to a buffer into which the debug info will be printed.
7898+ * @size: Size of the buffer.
7899+ *
7900+ * Print information to be recorded for debugging purposes into a buffer.
7901+ * Returns: Number of characters written to the buffer.
7902+ */
24613191 7903+
4e97e4e9 7904+static int toi_compress_print_debug_stats(char *buffer, int size)
7905+{
7906+ unsigned long pages_in = toi_compress_bytes_in >> PAGE_SHIFT,
7907+ pages_out = toi_compress_bytes_out >> PAGE_SHIFT;
7908+ int len;
ad8f4a28 7909+
4e97e4e9 7910+ /* Output the compression ratio achieved. */
7911+ if (*toi_compressor_name)
7912+ len = snprintf_used(buffer, size, "- Compressor is '%s'.\n",
7913+ toi_compressor_name);
7914+ else
7915+ len = snprintf_used(buffer, size, "- Compressor is not set.\n");
24613191 7916+
4e97e4e9 7917+ if (pages_in)
ad8f4a28 7918+ len += snprintf_used(buffer+len, size - len,
4e97e4e9 7919+ " Compressed %lu bytes into %lu (%d percent compression).\n",
7920+ toi_compress_bytes_in,
7921+ toi_compress_bytes_out,
7922+ (pages_in - pages_out) * 100 / pages_in);
7923+ return len;
24613191 7924+}
7925+
ad8f4a28 7926+/*
4e97e4e9 7927+ * toi_compress_compression_memory_needed
24613191 7928+ *
4e97e4e9 7929+ * Tell the caller how much memory we need to operate during hibernate/resume.
7930+ * Returns: Unsigned long. Maximum number of bytes of memory required for
7931+ * operation.
24613191 7932+ */
4e97e4e9 7933+static int toi_compress_memory_needed(void)
24613191 7934+{
4e97e4e9 7935+ return 2 * PAGE_SIZE;
7936+}
24613191 7937+
4e97e4e9 7938+static int toi_compress_storage_needed(void)
7939+{
7940+ return 4 * sizeof(unsigned long) + strlen(toi_compressor_name) + 1;
24613191 7941+}
7942+
ad8f4a28 7943+/*
4e97e4e9 7944+ * toi_compress_save_config_info
7945+ * @buffer: Pointer to a buffer of size PAGE_SIZE.
7946+ *
7947+ * Save information needed when reloading the image at resume time.
7948+ * Returns: Number of bytes used for saving our data.
24613191 7949+ */
4e97e4e9 7950+static int toi_compress_save_config_info(char *buffer)
24613191 7951+{
4e97e4e9 7952+ int namelen = strlen(toi_compressor_name) + 1;
7953+ int total_len;
ad8f4a28 7954+
4e97e4e9 7955+ *((unsigned long *) buffer) = toi_compress_bytes_in;
7956+ *((unsigned long *) (buffer + 1 * sizeof(unsigned long))) =
7957+ toi_compress_bytes_out;
7958+ *((unsigned long *) (buffer + 2 * sizeof(unsigned long))) =
7959+ toi_expected_compression;
7960+ *((unsigned long *) (buffer + 3 * sizeof(unsigned long))) = namelen;
ad8f4a28 7961+ strncpy(buffer + 4 * sizeof(unsigned long), toi_compressor_name,
4e97e4e9 7962+ namelen);
7963+ total_len = 4 * sizeof(unsigned long) + namelen;
7964+ return total_len;
24613191 7965+}
7966+
4e97e4e9 7967+/* toi_compress_load_config_info
7968+ * @buffer: Pointer to the start of the data.
7969+ * @size: Number of bytes that were saved.
7970+ *
7971+ * Description: Reload information needed for decompressing the image at
7972+ * resume time.
24613191 7973+ */
4e97e4e9 7974+static void toi_compress_load_config_info(char *buffer, int size)
24613191 7975+{
4e97e4e9 7976+ int namelen;
ad8f4a28 7977+
4e97e4e9 7978+ toi_compress_bytes_in = *((unsigned long *) buffer);
7979+ toi_compress_bytes_out = *((unsigned long *) (buffer + 1 *
7980+ sizeof(unsigned long)));
4e97e4e9 7981+ toi_expected_compression = *((unsigned long *) (buffer + 2 *
7982+ sizeof(unsigned long)));
7983+ namelen = *((unsigned long *) (buffer + 3 * sizeof(unsigned long)));
7984+ strncpy(toi_compressor_name, buffer + 4 * sizeof(unsigned long),
7985+ namelen);
7986+ return;
7987+}
24613191 7988+
ad8f4a28 7989+/*
4e97e4e9 7990+ * toi_compress_expected_ratio
ad8f4a28 7991+ *
4e97e4e9 7992+ * Description: Returns the expected ratio between data passed into this module
7993+ * and the amount of data output when writing.
7994+ * Returns: 100 if the module is disabled. Otherwise 100 minus the
7995+ * compression percentage set by the user via our sysfs entry.
7996+ */
24613191 7997+
4e97e4e9 7998+static int toi_compress_expected_ratio(void)
7999+{
8000+ if (!toi_compression_ops.enabled)
8001+ return 100;
8002+ else
8003+ return 100 - toi_expected_compression;
24613191 8004+}
8005+
24613191 8006+/*
4e97e4e9 8007+ * data for our sysfs entries.
24613191 8008+ */
4e97e4e9 8009+static struct toi_sysfs_data sysfs_params[] = {
8010+ {
8011+ TOI_ATTR("expected_compression", SYSFS_RW),
8012+ SYSFS_INT(&toi_expected_compression, 0, 99, 0)
8013+ },
24613191 8014+
4e97e4e9 8015+ {
8016+ TOI_ATTR("enabled", SYSFS_RW),
8017+ SYSFS_INT(&toi_compression_ops.enabled, 0, 1, 0)
8018+ },
24613191 8019+
4e97e4e9 8020+ {
8021+ TOI_ATTR("algorithm", SYSFS_RW),
8022+ SYSFS_STRING(toi_compressor_name, 31, 0)
8023+ }
24613191 8024+};
8025+
4e97e4e9 8026+/*
8027+ * Ops structure.
8028+ */
8029+static struct toi_module_ops toi_compression_ops = {
8030+ .type = FILTER_MODULE,
8031+ .name = "compression",
8032+ .directory = "compression",
8033+ .module = THIS_MODULE,
8034+ .initialise = toi_compress_init,
8035+ .cleanup = toi_compress_cleanup,
8036+ .memory_needed = toi_compress_memory_needed,
8037+ .print_debug_info = toi_compress_print_debug_stats,
8038+ .save_config_info = toi_compress_save_config_info,
8039+ .load_config_info = toi_compress_load_config_info,
8040+ .storage_needed = toi_compress_storage_needed,
8041+ .expected_compression = toi_compress_expected_ratio,
ad8f4a28 8042+
4e97e4e9 8043+ .rw_init = toi_compress_rw_init,
24613191 8044+
4e97e4e9 8045+ .write_page = toi_compress_write_page,
8046+ .read_page = toi_compress_read_page,
e8d0ad9d 8047+
4e97e4e9 8048+ .sysfs_data = sysfs_params,
8049+ .num_sysfs_entries = sizeof(sysfs_params) /
8050+ sizeof(struct toi_sysfs_data),
4e97e4e9 8051+};
e8d0ad9d 8052+
4e97e4e9 8053+/* ---- Registration ---- */
24613191 8054+
4e97e4e9 8055+static __init int toi_compress_load(void)
8056+{
8057+ return toi_register_module(&toi_compression_ops);
8058+}
8059+
8060+#ifdef MODULE
8061+static __exit void toi_compress_unload(void)
8062+{
8063+ toi_unregister_module(&toi_compression_ops);
8064+}
8065+
8066+module_init(toi_compress_load);
8067+module_exit(toi_compress_unload);
8068+MODULE_LICENSE("GPL");
8069+MODULE_AUTHOR("Nigel Cunningham");
8070+MODULE_DESCRIPTION("Compression Support for TuxOnIce");
8071+#else
8072+late_initcall(toi_compress_load);
8073+#endif
8074diff --git a/kernel/power/tuxonice_extent.c b/kernel/power/tuxonice_extent.c
8075new file mode 100644
7f9d2ee0 8076index 0000000..be7b721
4e97e4e9 8077--- /dev/null
8078+++ b/kernel/power/tuxonice_extent.c
8079@@ -0,0 +1,312 @@
8080+/*
4e97e4e9 8081+ * kernel/power/tuxonice_extent.c
ad8f4a28 8082+ *
4e97e4e9 8083+ * Copyright (C) 2003-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 8084+ *
4e97e4e9 8085+ * Distributed under GPLv2.
ad8f4a28 8086+ *
4e97e4e9 8087+ * These functions encapsulate the manipulation of storage metadata. For
8088+ * pageflags, we use dynamically allocated bitmaps.
24613191 8089+ */
8090+
24613191 8091+#include <linux/module.h>
4e97e4e9 8092+#include <linux/suspend.h>
8093+#include "tuxonice_modules.h"
8094+#include "tuxonice_extent.h"
ad8f4a28 8095+#include "tuxonice_alloc.h"
4e97e4e9 8096+#include "tuxonice_ui.h"
8097+#include "tuxonice.h"
24613191 8098+
4e97e4e9 8099+/* toi_get_extent
24613191 8100+ *
4e97e4e9 8101+ * Returns a free extent. May fail, returning NULL instead.
24613191 8102+ */
4e97e4e9 8103+static struct extent *toi_get_extent(void)
24613191 8104+{
4e97e4e9 8105+ struct extent *result;
8106+
8107+ result = toi_kzalloc(2, sizeof(struct extent), TOI_ATOMIC_GFP);
8108+ if (!result)
4e97e4e9 8109+ return NULL;
24613191 8110+
4e97e4e9 8111+ result->minimum = result->maximum = 0;
8112+ result->next = NULL;
8113+
8114+ return result;
24613191 8115+}
8116+
4e97e4e9 8117+/* toi_put_extent_chain.
24613191 8118+ *
4e97e4e9 8119+ * Frees a whole chain of extents.
24613191 8120+ */
4e97e4e9 8121+void toi_put_extent_chain(struct extent_chain *chain)
24613191 8122+{
4e97e4e9 8123+ struct extent *this;
24613191 8124+
4e97e4e9 8125+ this = chain->first;
24613191 8126+
ad8f4a28 8127+ while (this) {
4e97e4e9 8128+ struct extent *next = this->next;
ad8f4a28 8129+ toi_kfree(2, this);
4e97e4e9 8130+ chain->num_extents--;
8131+ this = next;
8132+ }
ad8f4a28 8133+
4e97e4e9 8134+ chain->first = chain->last_touched = NULL;
8135+ chain->size = 0;
24613191 8136+}
8137+
ad8f4a28 8138+/*
4e97e4e9 8139+ * toi_add_to_extent_chain
24613191 8140+ *
4e97e4e9 8141+ * Add an extent to an existing chain.
24613191 8142+ */
ad8f4a28 8143+int toi_add_to_extent_chain(struct extent_chain *chain,
4e97e4e9 8144+ unsigned long minimum, unsigned long maximum)
24613191 8145+{
4e97e4e9 8146+ struct extent *new_extent = NULL, *start_at;
24613191 8147+
4e97e4e9 8148+ /* Find the right place in the chain */
ad8f4a28 8149+ start_at = (chain->last_touched &&
4e97e4e9 8150+ (chain->last_touched->minimum < minimum)) ?
8151+ chain->last_touched : NULL;
8152+
8153+ if (!start_at && chain->first && chain->first->minimum < minimum)
8154+ start_at = chain->first;
8155+
8156+ while (start_at && start_at->next && start_at->next->minimum < minimum)
8157+ start_at = start_at->next;
8158+
8159+ if (start_at && start_at->maximum == (minimum - 1)) {
8160+ start_at->maximum = maximum;
8161+
8162+ /* Merge with the following one? */
8163+ if (start_at->next &&
8164+ start_at->maximum + 1 == start_at->next->minimum) {
8165+ struct extent *to_free = start_at->next;
8166+ start_at->maximum = start_at->next->maximum;
8167+ start_at->next = start_at->next->next;
8168+ chain->num_extents--;
ad8f4a28 8169+ toi_kfree(2, to_free);
4e97e4e9 8170+ }
8171+
8172+ chain->last_touched = start_at;
ad8f4a28 8173+ chain->size += (maximum - minimum + 1);
4e97e4e9 8174+
8175+ return 0;
24613191 8176+ }
8177+
4e97e4e9 8178+ new_extent = toi_get_extent();
8179+ if (!new_extent) {
8180+ printk(KERN_INFO "Error: unable to append a new extent to the "
8181+ "chain.\n");
7f9d2ee0 8182+ return -ENOMEM;
4e97e4e9 8183+ }
24613191 8184+
4e97e4e9 8185+ chain->num_extents++;
ad8f4a28 8186+ chain->size += (maximum - minimum + 1);
4e97e4e9 8187+ new_extent->minimum = minimum;
8188+ new_extent->maximum = maximum;
8189+ new_extent->next = NULL;
24613191 8190+
4e97e4e9 8191+ chain->last_touched = new_extent;
8192+
8193+ if (start_at) {
8194+ struct extent *next = start_at->next;
8195+ start_at->next = new_extent;
8196+ new_extent->next = next;
8197+ } else {
8198+ if (chain->first)
8199+ new_extent->next = chain->first;
8200+ chain->first = new_extent;
24613191 8201+ }
8202+
4e97e4e9 8203+ return 0;
24613191 8204+}
8205+
4e97e4e9 8206+/* toi_serialise_extent_chain
8207+ *
8208+ * Write a chain in the image.
24613191 8209+ */
4e97e4e9 8210+int toi_serialise_extent_chain(struct toi_module_ops *owner,
8211+ struct extent_chain *chain)
24613191 8212+{
4e97e4e9 8213+ struct extent *this;
8214+ int ret, i = 0;
8215+
8216+ ret = toiActiveAllocator->rw_header_chunk(WRITE, owner, (char *) chain,
8217+ 2 * sizeof(int));
8218+ if (ret)
4e97e4e9 8219+ return ret;
24613191 8220+
4e97e4e9 8221+ this = chain->first;
8222+ while (this) {
8223+ ret = toiActiveAllocator->rw_header_chunk(WRITE, owner,
8224+ (char *) this, 2 * sizeof(unsigned long));
8225+ if (ret)
4e97e4e9 8226+ return ret;
8227+ this = this->next;
8228+ i++;
24613191 8229+ }
8230+
4e97e4e9 8231+ if (i != chain->num_extents) {
8232+ printk(KERN_EMERG "Saved %d extents but chain metadata says "
8233+ "there should be %d.\n", i, chain->num_extents);
4e97e4e9 8234+ return 1;
8235+ }
24613191 8236+
4e97e4e9 8237+ return ret;
24613191 8238+}
8239+
4e97e4e9 8240+/* toi_load_extent_chain
24613191 8241+ *
4e97e4e9 8242+ * Read back a chain saved in the image.
24613191 8243+ */
4e97e4e9 8244+int toi_load_extent_chain(struct extent_chain *chain)
24613191 8245+{
4e97e4e9 8246+ struct extent *this, *last = NULL;
8247+ int i, ret;
24613191 8248+
7f9d2ee0 8249+ ret = toiActiveAllocator->rw_header_chunk_noreadahead(READ, NULL,
8250+ (char *) chain, 2 * sizeof(int));
ad8f4a28 8251+ if (ret) {
4e97e4e9 8252+ printk(KERN_ERR "Failed to read size of extent chain.\n");
8253+ return 1;
24613191 8254+ }
8255+
4e97e4e9 8256+ for (i = 0; i < chain->num_extents; i++) {
ad8f4a28 8257+ this = toi_kzalloc(3, sizeof(struct extent), TOI_ATOMIC_GFP);
4e97e4e9 8258+ if (!this) {
ad8f4a28 8259+ printk(KERN_INFO "Failed to allocate a new extent.\n");
4e97e4e9 8260+ return -ENOMEM;
8261+ }
8262+ this->next = NULL;
7f9d2ee0 8263+ ret = toiActiveAllocator->rw_header_chunk_noreadahead(READ,
8264+ NULL, (char *) this, 2 * sizeof(unsigned long));
8265+ if (ret) {
8266+ printk(KERN_INFO "Failed to read an extent.\n");
4e97e4e9 8267+ return 1;
8268+ }
8269+ if (last)
8270+ last->next = this;
8271+ else
8272+ chain->first = this;
8273+ last = this;
24613191 8274+ }
4e97e4e9 8275+ return 0;
8276+}
24613191 8277+
4e97e4e9 8278+/* toi_extent_state_next
8279+ *
8280+ * Given a state, progress to the next valid entry. We may begin in an
8281+ * invalid state, as we do when invoked after extent_state_goto_start below.
8282+ *
8283+ * When using compression and expected_compression > 0, we let the image size
8284+ * be larger than storage, so we can validly run out of data to return.
8285+ */
8286+unsigned long toi_extent_state_next(struct extent_iterate_state *state)
8287+{
8288+ if (state->current_chain == state->num_chains)
8289+ return 0;
24613191 8290+
4e97e4e9 8291+ if (state->current_extent) {
8292+ if (state->current_offset == state->current_extent->maximum) {
8293+ if (state->current_extent->next) {
8294+ state->current_extent =
8295+ state->current_extent->next;
8296+ state->current_offset =
8297+ state->current_extent->minimum;
4e97e4e9 8298+ } else {
8299+ state->current_extent = NULL;
8300+ state->current_offset = 0;
73c609d5 8301+ }
73c609d5 8302+ } else
4e97e4e9 8303+ state->current_offset++;
24613191 8304+ }
73c609d5 8305+
ad8f4a28 8306+ while (!state->current_extent) {
4e97e4e9 8307+ int chain_num = ++(state->current_chain);
24613191 8308+
4e97e4e9 8309+ if (chain_num == state->num_chains)
8310+ return 0;
24613191 8311+
4e97e4e9 8312+ state->current_extent = (state->chains + chain_num)->first;
24613191 8313+
4e97e4e9 8314+ if (!state->current_extent)
8315+ continue;
24613191 8316+
4e97e4e9 8317+ state->current_offset = state->current_extent->minimum;
24613191 8318+ }
4e97e4e9 8319+
8320+ return state->current_offset;
24613191 8321+}
8322+
4e97e4e9 8323+/* toi_extent_state_goto_start
24613191 8324+ *
4e97e4e9 8325+ * Find the first valid value in a group of chains.
24613191 8326+ */
4e97e4e9 8327+void toi_extent_state_goto_start(struct extent_iterate_state *state)
24613191 8328+{
4e97e4e9 8329+ state->current_chain = -1;
8330+ state->current_extent = NULL;
8331+ state->current_offset = 0;
24613191 8332+}
8333+
4e97e4e9 8334+/* toi_extent_start_save
24613191 8335+ *
4e97e4e9 8336+ * Given a state and a struct extent_state_store, save the current
8337+ * position in a format that can be used with relocated chains (at
8338+ * resume time).
24613191 8339+ */
4e97e4e9 8340+void toi_extent_state_save(struct extent_iterate_state *state,
8341+ struct extent_iterate_saved_state *saved_state)
24613191 8342+{
4e97e4e9 8343+ struct extent *extent;
24613191 8344+
4e97e4e9 8345+ saved_state->chain_num = state->current_chain;
8346+ saved_state->extent_num = 0;
8347+ saved_state->offset = state->current_offset;
24613191 8348+
4e97e4e9 8349+ if (saved_state->chain_num == -1)
8350+ return;
ad8f4a28 8351+
4e97e4e9 8352+ extent = (state->chains + state->current_chain)->first;
8353+
8354+ while (extent != state->current_extent) {
8355+ saved_state->extent_num++;
8356+ extent = extent->next;
24613191 8357+ }
8358+}
8359+
4e97e4e9 8360+/* toi_extent_start_restore
24613191 8361+ *
4e97e4e9 8362+ * Restore the position saved by extent_state_save.
24613191 8363+ */
4e97e4e9 8364+void toi_extent_state_restore(struct extent_iterate_state *state,
8365+ struct extent_iterate_saved_state *saved_state)
24613191 8366+{
4e97e4e9 8367+ int posn = saved_state->extent_num;
24613191 8368+
4e97e4e9 8369+ if (saved_state->chain_num == -1) {
8370+ toi_extent_state_goto_start(state);
8371+ return;
24613191 8372+ }
8373+
4e97e4e9 8374+ state->current_chain = saved_state->chain_num;
8375+ state->current_extent = (state->chains + state->current_chain)->first;
8376+ state->current_offset = saved_state->offset;
24613191 8377+
4e97e4e9 8378+ while (posn--)
8379+ state->current_extent = state->current_extent->next;
24613191 8380+}
8381+
4e97e4e9 8382+#ifdef CONFIG_TOI_EXPORTS
8383+EXPORT_SYMBOL_GPL(toi_add_to_extent_chain);
8384+EXPORT_SYMBOL_GPL(toi_put_extent_chain);
8385+EXPORT_SYMBOL_GPL(toi_load_extent_chain);
8386+EXPORT_SYMBOL_GPL(toi_serialise_extent_chain);
8387+EXPORT_SYMBOL_GPL(toi_extent_state_save);
8388+EXPORT_SYMBOL_GPL(toi_extent_state_restore);
8389+EXPORT_SYMBOL_GPL(toi_extent_state_goto_start);
8390+EXPORT_SYMBOL_GPL(toi_extent_state_next);
24613191 8391+#endif
4e97e4e9 8392diff --git a/kernel/power/tuxonice_extent.h b/kernel/power/tuxonice_extent.h
8393new file mode 100644
ad8f4a28 8394index 0000000..d7dd07e
4e97e4e9 8395--- /dev/null
8396+++ b/kernel/power/tuxonice_extent.h
ad8f4a28 8397@@ -0,0 +1,78 @@
24613191 8398+/*
4e97e4e9 8399+ * kernel/power/tuxonice_extent.h
24613191 8400+ *
4e97e4e9 8401+ * Copyright (C) 2003-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 8402+ *
8403+ * This file is released under the GPLv2.
8404+ *
4e97e4e9 8405+ * It contains declarations related to extents. Extents are
8406+ * TuxOnIce's method of storing some of the metadata for the image.
8407+ * See tuxonice_extent.c for more info.
24613191 8408+ *
8409+ */
8410+
4e97e4e9 8411+#include "tuxonice_modules.h"
24613191 8412+
4e97e4e9 8413+#ifndef EXTENT_H
8414+#define EXTENT_H
24613191 8415+
4e97e4e9 8416+struct extent {
8417+ unsigned long minimum, maximum;
8418+ struct extent *next;
24613191 8419+};
8420+
4e97e4e9 8421+struct extent_chain {
8422+ int size; /* size of the chain, i.e. sum (max-min+1) */
8423+ int num_extents;
8424+ struct extent *first, *last_touched;
24613191 8425+};
8426+
4e97e4e9 8427+struct extent_iterate_state {
8428+ struct extent_chain *chains;
8429+ int num_chains;
8430+ int current_chain;
8431+ struct extent *current_extent;
8432+ unsigned long current_offset;
24613191 8433+};
8434+
4e97e4e9 8435+struct extent_iterate_saved_state {
8436+ int chain_num;
8437+ int extent_num;
8438+ unsigned long offset;
8439+};
24613191 8440+
8441+#define toi_extent_state_eof(state) \
8442+ ((state)->num_chains == (state)->current_chain)
24613191 8443+
4e97e4e9 8444+/* Simplify iterating through all the values in an extent chain */
8445+#define toi_extent_for_each(extent_chain, extentpointer, value) \
8446+if ((extent_chain)->first) \
8447+ for ((extentpointer) = (extent_chain)->first, (value) = \
8448+ (extentpointer)->minimum; \
8449+ ((extentpointer) && ((extentpointer)->next || (value) <= \
8450+ (extentpointer)->maximum)); \
8451+ (((value) == (extentpointer)->maximum) ? \
8452+ ((extentpointer) = (extentpointer)->next, (value) = \
8453+ ((extentpointer) ? (extentpointer)->minimum : 0)) : \
8454+ (value)++))
24613191 8455+
4e97e4e9 8456+void toi_put_extent_chain(struct extent_chain *chain);
ad8f4a28 8457+int toi_add_to_extent_chain(struct extent_chain *chain,
4e97e4e9 8458+ unsigned long minimum, unsigned long maximum);
8459+int toi_serialise_extent_chain(struct toi_module_ops *owner,
8460+ struct extent_chain *chain);
8461+int toi_load_extent_chain(struct extent_chain *chain);
24613191 8462+
ad8f4a28 8463+/* swap_entry_to_extent_val & extent_val_to_swap_entry:
4e97e4e9 8464+ * We are putting offset in the low bits so consecutive swap entries
8465+ * make consecutive extent values */
8466+#define swap_entry_to_extent_val(swp_entry) (swp_entry.val)
8467+#define extent_val_to_swap_entry(val) (swp_entry_t) { (val) }
24613191 8468+
4e97e4e9 8469+void toi_extent_state_save(struct extent_iterate_state *state,
8470+ struct extent_iterate_saved_state *saved_state);
8471+void toi_extent_state_restore(struct extent_iterate_state *state,
8472+ struct extent_iterate_saved_state *saved_state);
8473+void toi_extent_state_goto_start(struct extent_iterate_state *state);
8474+unsigned long toi_extent_state_next(struct extent_iterate_state *state);
8475+#endif
8476diff --git a/kernel/power/tuxonice_file.c b/kernel/power/tuxonice_file.c
8477new file mode 100644
7f9d2ee0 8478index 0000000..d702479
4e97e4e9 8479--- /dev/null
8480+++ b/kernel/power/tuxonice_file.c
7f9d2ee0 8481@@ -0,0 +1,1124 @@
4e97e4e9 8482+/*
8483+ * kernel/power/tuxonice_file.c
8484+ *
8485+ * Copyright (C) 2005-2007 Nigel Cunningham (nigel at tuxonice net)
8486+ *
8487+ * Distributed under GPLv2.
ad8f4a28 8488+ *
4e97e4e9 8489+ * This file encapsulates functions for usage of a simple file as a
8490+ * backing store. It is based upon the swapallocator, and shares the
8491+ * same basic working. Here, though, we have nothing to do with
8492+ * swapspace, and only one device to worry about.
8493+ *
8494+ * The user can just
8495+ *
8496+ * echo TuxOnIce > /path/to/my_file
8497+ *
8498+ * dd if=/dev/zero bs=1M count=<file_size_desired> >> /path/to/my_file
8499+ *
8500+ * and
8501+ *
8502+ * echo /path/to/my_file > /sys/power/tuxonice/file/target
8503+ *
8504+ * then put what they find in /sys/power/tuxonice/resume
8505+ * as their resume= parameter in lilo.conf (and rerun lilo if using it).
8506+ *
8507+ * Having done this, they're ready to hibernate and resume.
8508+ *
8509+ * TODO:
8510+ * - File resizing.
8511+ */
24613191 8512+
4e97e4e9 8513+#include <linux/suspend.h>
8514+#include <linux/module.h>
8515+#include <linux/blkdev.h>
8516+#include <linux/file.h>
8517+#include <linux/stat.h>
8518+#include <linux/mount.h>
8519+#include <linux/statfs.h>
8520+#include <linux/syscalls.h>
8521+#include <linux/namei.h>
8522+#include <linux/fs.h>
8523+#include <linux/root_dev.h>
24613191 8524+
4e97e4e9 8525+#include "tuxonice.h"
8526+#include "tuxonice_sysfs.h"
8527+#include "tuxonice_modules.h"
8528+#include "tuxonice_ui.h"
8529+#include "tuxonice_extent.h"
8530+#include "tuxonice_io.h"
8531+#include "tuxonice_storage.h"
8532+#include "tuxonice_block_io.h"
ad8f4a28 8533+#include "tuxonice_alloc.h"
24613191 8534+
4e97e4e9 8535+static struct toi_module_ops toi_fileops;
24613191 8536+
4e97e4e9 8537+/* Details of our target. */
24613191 8538+
4e97e4e9 8539+char toi_file_target[256];
8540+static struct inode *target_inode;
8541+static struct file *target_file;
8542+static struct block_device *toi_file_target_bdev;
8543+static dev_t resume_file_dev_t;
8544+static int used_devt;
8545+static int setting_toi_file_target;
8546+static sector_t target_firstblock, target_header_start;
8547+static int target_storage_available;
8548+static int target_claim;
24613191 8549+
7f9d2ee0 8550+/* Old signatures */
4e97e4e9 8551+static char HaveImage[] = "HaveImage\n";
8552+static char NoImage[] = "TuxOnIce\n";
8553+#define sig_size (sizeof(HaveImage) + 1)
8554+
8555+struct toi_file_header {
8556+ char sig[sig_size];
8557+ int resumed_before;
8558+ unsigned long first_header_block;
7f9d2ee0 8559+ int have_image;
24613191 8560+};
8561+
4e97e4e9 8562+/* Header Page Information */
7f9d2ee0 8563+static int header_pages_reserved;
4e97e4e9 8564+
8565+/* Main Storage Pages */
8566+static int main_pages_allocated, main_pages_requested;
24613191 8567+
4e97e4e9 8568+#define target_is_normal_file() (S_ISREG(target_inode->i_mode))
24613191 8569+
4e97e4e9 8570+static struct toi_bdev_info devinfo;
24613191 8571+
4e97e4e9 8572+/* Extent chain for blocks */
8573+static struct extent_chain block_chain;
24613191 8574+
4e97e4e9 8575+/* Signature operations */
8576+enum {
8577+ GET_IMAGE_EXISTS,
8578+ INVALIDATE,
8579+ MARK_RESUME_ATTEMPTED,
8580+ UNMARK_RESUME_ATTEMPTED,
8581+};
24613191 8582+
4e97e4e9 8583+static void set_devinfo(struct block_device *bdev, int target_blkbits)
8584+{
8585+ devinfo.bdev = bdev;
8586+ if (!target_blkbits) {
8587+ devinfo.bmap_shift = devinfo.blocks_per_page = 0;
8588+ } else {
8589+ devinfo.bmap_shift = target_blkbits - 9;
8590+ devinfo.blocks_per_page = (1 << (PAGE_SHIFT - target_blkbits));
8591+ }
8592+}
24613191 8593+
7f9d2ee0 8594+static long raw_to_real(long raw)
4e97e4e9 8595+{
7f9d2ee0 8596+ long result;
8597+
8598+ result = raw - (raw * (sizeof(unsigned long) + sizeof(int)) +
8599+ (PAGE_SIZE + sizeof(unsigned long) + sizeof(int) + 1)) /
8600+ (PAGE_SIZE + sizeof(unsigned long) + sizeof(int));
8601+
8602+ return result < 0 ? 0 : result;
4e97e4e9 8603+}
24613191 8604+
4e97e4e9 8605+static int toi_file_storage_available(void)
8606+{
8607+ int result = 0;
ad8f4a28 8608+ struct block_device *bdev = toi_file_target_bdev;
24613191 8609+
4e97e4e9 8610+ if (!target_inode)
8611+ return 0;
24613191 8612+
4e97e4e9 8613+ switch (target_inode->i_mode & S_IFMT) {
8614+ case S_IFSOCK:
8615+ case S_IFCHR:
8616+ case S_IFIFO: /* Socket, Char, Fifo */
8617+ return -1;
8618+ case S_IFREG: /* Regular file: current size - holes + free
8619+ space on part */
8620+ result = target_storage_available;
8621+ break;
8622+ case S_IFBLK: /* Block device */
8623+ if (!bdev->bd_disk) {
8624+ printk(KERN_INFO "bdev->bd_disk null.\n");
8625+ return 0;
8626+ }
24613191 8627+
8628+ result = (bdev->bd_part ?
8629+ bdev->bd_part->nr_sects :
8630+ bdev->bd_disk->capacity) >> (PAGE_SHIFT - 9);
4e97e4e9 8631+ }
24613191 8632+
7f9d2ee0 8633+ return raw_to_real(result);
4e97e4e9 8634+}
24613191 8635+
4e97e4e9 8636+static int has_contiguous_blocks(int page_num)
24613191 8637+{
4e97e4e9 8638+ int j;
8639+ sector_t last = 0;
24613191 8640+
4e97e4e9 8641+ for (j = 0; j < devinfo.blocks_per_page; j++) {
8642+ sector_t this = bmap(target_inode,
8643+ page_num * devinfo.blocks_per_page + j);
8644+
8645+ if (!this || (last && (last + 1) != this))
24613191 8646+ break;
8647+
4e97e4e9 8648+ last = this;
24613191 8649+ }
ad8f4a28 8650+
4e97e4e9 8651+ return (j == devinfo.blocks_per_page);
24613191 8652+}
8653+
4e97e4e9 8654+static int size_ignoring_ignored_pages(void)
24613191 8655+{
4e97e4e9 8656+ int mappable = 0, i;
ad8f4a28 8657+
4e97e4e9 8658+ if (!target_is_normal_file())
8659+ return toi_file_storage_available();
24613191 8660+
4e97e4e9 8661+ for (i = 0; i < (target_inode->i_size >> PAGE_SHIFT) ; i++)
8662+ if (has_contiguous_blocks(i))
8663+ mappable++;
ad8f4a28 8664+
4e97e4e9 8665+ return mappable;
24613191 8666+}
8667+
7f9d2ee0 8668+static int __populate_block_list(int min, int max)
24613191 8669+{
4e97e4e9 8670+ if (test_action_state(TOI_TEST_BIO))
8671+ printk(KERN_INFO "Adding extent %d-%d.\n",
8672+ min << devinfo.bmap_shift,
8673+ ((max + 1) << devinfo.bmap_shift) - 1);
4e97e4e9 8674+
7f9d2ee0 8675+ return toi_add_to_extent_chain(&block_chain, min, max);
24613191 8676+}
8677+
7f9d2ee0 8678+static int apply_header_reservation(void)
24613191 8679+{
4e97e4e9 8680+ int i;
7f9d2ee0 8681+
8682+ /* Apply header space reservation */
8683+ toi_extent_state_goto_start(&toi_writer_posn);
8684+ toi_bio_ops.forward_one_page(1); /* To first page */
8685+
8686+ for (i = 0; i < header_pages_reserved; i++)
8687+ if (toi_bio_ops.forward_one_page(1))
8688+ return -ENOSPC;
8689+
8690+ /* The end of header pages will be the start of pageset 2 */
8691+ toi_extent_state_save(&toi_writer_posn, &toi_writer_posn_save[2]);
8692+
8693+ return 0;
8694+}
8695+
8696+static int populate_block_list(void)
8697+{
8698+ int i, extent_min = -1, extent_max = -1, got_header = 0, result = 0;
ad8f4a28 8699+
4e97e4e9 8700+ if (block_chain.first)
8701+ toi_put_extent_chain(&block_chain);
24613191 8702+
4e97e4e9 8703+ if (!target_is_normal_file()) {
7f9d2ee0 8704+ return (target_storage_available > 0) ?
ad8f4a28 8705+ __populate_block_list(devinfo.blocks_per_page,
4e97e4e9 8706+ (target_storage_available + 1) *
7f9d2ee0 8707+ devinfo.blocks_per_page - 1) : 0;
24613191 8708+ }
8709+
4e97e4e9 8710+ for (i = 0; i < (target_inode->i_size >> PAGE_SHIFT); i++) {
8711+ sector_t new_sector;
24613191 8712+
4e97e4e9 8713+ if (!has_contiguous_blocks(i))
8714+ continue;
24613191 8715+
4e97e4e9 8716+ new_sector = bmap(target_inode,
8717+ (i * devinfo.blocks_per_page));
24613191 8718+
ad8f4a28 8719+ /*
4e97e4e9 8720+ * Ignore the first block in the file.
8721+ * It gets the header.
8722+ */
8723+ if (new_sector == target_firstblock >> devinfo.bmap_shift) {
8724+ got_header = 1;
8725+ continue;
8726+ }
24613191 8727+
8728+ /*
8729+ * I'd love to be able to fill in holes and resize
4e97e4e9 8730+ * files, but not yet...
8731+ */
24613191 8732+
4e97e4e9 8733+ if (new_sector == extent_max + 1)
ad8f4a28 8734+ extent_max += devinfo.blocks_per_page;
4e97e4e9 8735+ else {
7f9d2ee0 8736+ if (extent_min > -1) {
8737+ result = __populate_block_list(extent_min,
4e97e4e9 8738+ extent_max);
7f9d2ee0 8739+ if (result)
8740+ return result;
8741+ }
24613191 8742+
4e97e4e9 8743+ extent_min = new_sector;
8744+ extent_max = extent_min +
8745+ devinfo.blocks_per_page - 1;
8746+ }
8747+ }
8748+
7f9d2ee0 8749+ if (extent_min > -1) {
8750+ result = __populate_block_list(extent_min, extent_max);
8751+ if (result)
8752+ return result;
8753+ }
8754+
8755+ return apply_header_reservation();
24613191 8756+}
8757+
4e97e4e9 8758+static void toi_file_cleanup(int finishing_cycle)
24613191 8759+{
4e97e4e9 8760+ if (toi_file_target_bdev) {
8761+ if (target_claim) {
8762+ bd_release(toi_file_target_bdev);
8763+ target_claim = 0;
8764+ }
24613191 8765+
4e97e4e9 8766+ if (used_devt) {
8767+ blkdev_put(toi_file_target_bdev);
8768+ used_devt = 0;
8769+ }
8770+ toi_file_target_bdev = NULL;
8771+ target_inode = NULL;
8772+ set_devinfo(NULL, 0);
8773+ target_storage_available = 0;
8774+ }
8775+
8776+ if (target_file > 0) {
8777+ filp_close(target_file, NULL);
8778+ target_file = NULL;
8779+ }
24613191 8780+}
8781+
ad8f4a28 8782+/*
4e97e4e9 8783+ * reopen_resume_devt
8784+ *
8785+ * Having opened resume= once, we remember the major and
8786+ * minor nodes and use them to reopen the bdev for checking
8787+ * whether an image exists (possibly when starting a resume).
24613191 8788+ */
4e97e4e9 8789+static void reopen_resume_devt(void)
24613191 8790+{
7f9d2ee0 8791+ toi_file_target_bdev = toi_open_by_devnum(resume_file_dev_t,
8792+ FMODE_READ);
4e97e4e9 8793+ if (IS_ERR(toi_file_target_bdev)) {
ad8f4a28 8794+ printk(KERN_INFO "Got a dev_num (%lx) but failed to open it.\n",
4e97e4e9 8795+ (unsigned long) resume_file_dev_t);
8796+ return;
24613191 8797+ }
4e97e4e9 8798+ target_inode = toi_file_target_bdev->bd_inode;
8799+ set_devinfo(toi_file_target_bdev, target_inode->i_blkbits);
8800+}
24613191 8801+
4e97e4e9 8802+static void toi_file_get_target_info(char *target, int get_size,
8803+ int resume_param)
8804+{
8805+ if (target_file)
8806+ toi_file_cleanup(0);
24613191 8807+
4e97e4e9 8808+ if (!target || !strlen(target))
8809+ return;
24613191 8810+
4e97e4e9 8811+ target_file = filp_open(target, O_RDWR|O_LARGEFILE, 0);
24613191 8812+
4e97e4e9 8813+ if (IS_ERR(target_file) || !target_file) {
24613191 8814+
4e97e4e9 8815+ if (!resume_param) {
ad8f4a28 8816+ printk(KERN_INFO "Open file %s returned %p.\n",
4e97e4e9 8817+ target, target_file);
8818+ target_file = NULL;
8819+ return;
8820+ }
24613191 8821+
4e97e4e9 8822+ target_file = NULL;
8823+ resume_file_dev_t = name_to_dev_t(target);
8824+ if (!resume_file_dev_t) {
8825+ struct kstat stat;
8826+ int error = vfs_stat(target, &stat);
8827+ printk(KERN_INFO "Open file %s returned %p and "
8828+ "name_to_dev_t failed.\n", target,
8829+ target_file);
4e97e4e9 8830+ if (error)
ad8f4a28 8831+ printk(KERN_INFO "Stating the file also failed."
4e97e4e9 8832+ " Nothing more we can do.\n");
8833+ else
8834+ resume_file_dev_t = stat.rdev;
8835+ return;
8836+ }
24613191 8837+
ad8f4a28 8838+ toi_file_target_bdev = toi_open_by_devnum(resume_file_dev_t,
4e97e4e9 8839+ FMODE_READ);
8840+ if (IS_ERR(toi_file_target_bdev)) {
8841+ printk(KERN_INFO "Got a dev_num (%lx) but failed to "
8842+ "open it.\n",
4e97e4e9 8843+ (unsigned long) resume_file_dev_t);
8844+ return;
8845+ }
8846+ used_devt = 1;
8847+ target_inode = toi_file_target_bdev->bd_inode;
8848+ } else
8849+ target_inode = target_file->f_mapping->host;
24613191 8850+
4e97e4e9 8851+ if (S_ISLNK(target_inode->i_mode) || S_ISDIR(target_inode->i_mode) ||
8852+ S_ISSOCK(target_inode->i_mode) || S_ISFIFO(target_inode->i_mode)) {
8853+ printk(KERN_INFO "File support works with regular files,"
8854+ " character files and block devices.\n");
4e97e4e9 8855+ goto cleanup;
24613191 8856+ }
8857+
4e97e4e9 8858+ if (!used_devt) {
8859+ if (S_ISBLK(target_inode->i_mode)) {
8860+ toi_file_target_bdev = I_BDEV(target_inode);
8861+ if (!bd_claim(toi_file_target_bdev, &toi_fileops))
8862+ target_claim = 1;
8863+ } else
8864+ toi_file_target_bdev = target_inode->i_sb->s_bdev;
8865+ resume_file_dev_t = toi_file_target_bdev->bd_dev;
24613191 8866+ }
8867+
4e97e4e9 8868+ set_devinfo(toi_file_target_bdev, target_inode->i_blkbits);
24613191 8869+
4e97e4e9 8870+ if (get_size)
8871+ target_storage_available = size_ignoring_ignored_pages();
8872+
8873+ if (!resume_param)
8874+ target_firstblock = bmap(target_inode, 0) << devinfo.bmap_shift;
ad8f4a28 8875+
4e97e4e9 8876+ return;
8877+cleanup:
8878+ target_inode = NULL;
8879+ if (target_file) {
8880+ filp_close(target_file, NULL);
8881+ target_file = NULL;
8882+ }
8883+ set_devinfo(NULL, 0);
8884+ target_storage_available = 0;
24613191 8885+}
8886+
7f9d2ee0 8887+static void toi_file_noresume_reset(void)
8888+{
8889+ toi_bio_ops.rw_cleanup(READ);
8890+}
8891+
4e97e4e9 8892+static int parse_signature(struct toi_file_header *header)
24613191 8893+{
4e97e4e9 8894+ int have_image = !memcmp(HaveImage, header->sig, sizeof(HaveImage) - 1);
ad8f4a28
AM
8895+ int no_image_header = !memcmp(NoImage, header->sig,
8896+ sizeof(NoImage) - 1);
7f9d2ee0 8897+ int binary_sig = !memcmp(tuxonice_signature, header->sig,
8898+ sizeof(tuxonice_signature));
24613191 8899+
7f9d2ee0 8900+ if (no_image_header || (binary_sig && !header->have_image))
4e97e4e9 8901+ return 0;
24613191 8902+
7f9d2ee0 8903+ if (!have_image && !binary_sig)
4e97e4e9 8904+ return -1;
24613191 8905+
4e97e4e9 8906+ if (header->resumed_before)
8907+ set_toi_state(TOI_RESUMED_BEFORE);
8908+ else
8909+ clear_toi_state(TOI_RESUMED_BEFORE);
24613191 8910+
4e97e4e9 8911+ target_header_start = header->first_header_block;
8912+ return 1;
8913+}
24613191 8914+
4e97e4e9 8915+/* prepare_signature */
8916+
8917+static int prepare_signature(struct toi_file_header *current_header,
8918+ unsigned long first_header_block)
8919+{
7f9d2ee0 8920+ strncpy(current_header->sig, tuxonice_signature,
8921+ sizeof(tuxonice_signature));
4e97e4e9 8922+ current_header->resumed_before = 0;
8923+ current_header->first_header_block = first_header_block;
7f9d2ee0 8924+ current_header->have_image = 1;
24613191 8925+ return 0;
8926+}
8927+
4e97e4e9 8928+static int toi_file_storage_allocated(void)
24613191 8929+{
4e97e4e9 8930+ if (!target_inode)
8931+ return 0;
24613191 8932+
4e97e4e9 8933+ if (target_is_normal_file())
7f9d2ee0 8934+ return (int) raw_to_real(target_storage_available);
4e97e4e9 8935+ else
7f9d2ee0 8936+ return (int) raw_to_real(main_pages_requested);
4e97e4e9 8937+}
24613191 8938+
4e97e4e9 8939+static int toi_file_release_storage(void)
8940+{
8941+ if (test_action_state(TOI_KEEP_IMAGE) &&
8942+ test_toi_state(TOI_NOW_RESUMING))
8943+ return 0;
24613191 8944+
4e97e4e9 8945+ toi_put_extent_chain(&block_chain);
24613191 8946+
7f9d2ee0 8947+ header_pages_reserved = 0;
4e97e4e9 8948+ main_pages_allocated = 0;
8949+ main_pages_requested = 0;
8950+ return 0;
24613191 8951+}
8952+
7f9d2ee0 8953+static void toi_file_reserve_header_space(int request)
24613191 8954+{
7f9d2ee0 8955+ header_pages_reserved = request;
8956+ apply_header_reservation();
24613191 8957+}
8958+
7f9d2ee0 8959+static int toi_file_allocate_storage(int main_space_requested)
24613191 8960+{
4e97e4e9 8961+ int result = 0;
24613191 8962+
4e97e4e9 8963+ int extra_pages = DIV_ROUND_UP(main_space_requested *
8964+ (sizeof(unsigned long) + sizeof(int)), PAGE_SIZE);
8965+ int pages_to_get = main_space_requested + extra_pages +
7f9d2ee0 8966+ header_pages_reserved;
4e97e4e9 8967+ int blocks_to_get = pages_to_get - block_chain.size;
ad8f4a28 8968+
4e97e4e9 8969+ /* Only release_storage reduces the size */
8970+ if (blocks_to_get < 1)
8971+ return 0;
24613191 8972+
7f9d2ee0 8973+ result = populate_block_list();
8974+
8975+ if (result)
8976+ return result;
24613191 8977+
4e97e4e9 8978+ toi_message(TOI_WRITER, TOI_MEDIUM, 0,
8979+ "Finished with block_chain.size == %d.\n",
8980+ block_chain.size);
24613191 8981+
4e97e4e9 8982+ if (block_chain.size < pages_to_get) {
ad8f4a28 8983+ printk("Block chain size (%d) < header pages (%d) + extra "
8984+ "pages (%d) + main pages (%d) (=%d pages).\n",
7f9d2ee0 8985+ block_chain.size, header_pages_reserved, extra_pages,
4e97e4e9 8986+ main_space_requested, pages_to_get);
8987+ result = -ENOSPC;
24613191 8988+ }
8989+
4e97e4e9 8990+ main_pages_requested = main_space_requested;
8991+ main_pages_allocated = main_space_requested + extra_pages;
4e97e4e9 8992+ return result;
24613191 8993+}
8994+
4e97e4e9 8995+static int toi_file_write_header_init(void)
24613191 8996+{
7f9d2ee0 8997+ int result;
24613191 8998+
7f9d2ee0 8999+ toi_bio_ops.rw_init(WRITE, 0);
4e97e4e9 9000+ toi_writer_buffer_posn = 0;
24613191 9001+
4e97e4e9 9002+ /* Info needed to bootstrap goes at the start of the header.
9003+ * First we save the basic info needed for reading, including the number
9004+ * of header pages. Then we save the structs containing data needed
9005+ * for reading the header pages back.
9006+ * Note that even if header pages take more than one page, when we
9007+ * read back the info, we will have restored the location of the
9008+ * next header page by the time we go to use it.
9009+ */
24613191 9010+
7f9d2ee0 9011+ result = toi_bio_ops.rw_header_chunk(WRITE, &toi_fileops,
ad8f4a28 9012+ (char *) &toi_writer_posn_save,
4e97e4e9 9013+ sizeof(toi_writer_posn_save));
9014+
7f9d2ee0 9015+ if (result)
9016+ return result;
9017+
9018+ result = toi_bio_ops.rw_header_chunk(WRITE, &toi_fileops,
4e97e4e9 9019+ (char *) &devinfo, sizeof(devinfo));
24613191 9020+
7f9d2ee0 9021+ if (result)
9022+ return result;
9023+
4e97e4e9 9024+ toi_serialise_extent_chain(&toi_fileops, &block_chain);
ad8f4a28 9025+
24613191 9026+ return 0;
9027+}
73c609d5 9028+
4e97e4e9 9029+static int toi_file_write_header_cleanup(void)
9030+{
9031+ struct toi_file_header *header;
7f9d2ee0 9032+ int result;
9033+ unsigned long sig_page = toi_get_zeroed_page(38, TOI_ATOMIC_GFP);
24613191 9034+
4e97e4e9 9035+ /* Write any unsaved data */
9036+ if (toi_writer_buffer_posn)
9037+ toi_bio_ops.write_header_chunk_finish();
24613191 9038+
4e97e4e9 9039+ toi_bio_ops.finish_all_io();
24613191 9040+
4e97e4e9 9041+ toi_extent_state_goto_start(&toi_writer_posn);
ad8f4a28 9042+ toi_bio_ops.forward_one_page(1);
24613191 9043+
4e97e4e9 9044+ /* Adjust image header */
7f9d2ee0 9045+ result = toi_bio_ops.bdev_page_io(READ, toi_file_target_bdev,
4e97e4e9 9046+ target_firstblock,
7f9d2ee0 9047+ virt_to_page(sig_page));
9048+ if (result)
9049+ goto out;
24613191 9050+
7f9d2ee0 9051+ header = (struct toi_file_header *) sig_page;
4e97e4e9 9052+
9053+ prepare_signature(header,
9054+ toi_writer_posn.current_offset <<
9055+ devinfo.bmap_shift);
ad8f4a28 9056+
7f9d2ee0 9057+ result = toi_bio_ops.bdev_page_io(WRITE, toi_file_target_bdev,
4e97e4e9 9058+ target_firstblock,
7f9d2ee0 9059+ virt_to_page(sig_page));
4e97e4e9 9060+
7f9d2ee0 9061+out:
4e97e4e9 9062+ toi_bio_ops.finish_all_io();
7f9d2ee0 9063+ toi_free_page(38, sig_page);
4e97e4e9 9064+
7f9d2ee0 9065+ return result;
4e97e4e9 9066+}
9067+
9068+/* HEADER READING */
9069+
24613191 9070+/*
4e97e4e9 9071+ * read_header_init()
ad8f4a28 9072+ *
4e97e4e9 9073+ * Description:
9074+ * 1. Attempt to read the device specified with resume=.
9075+ * 2. Check the contents of the header for our signature.
9076+ * 3. Warn, ignore, reset and/or continue as appropriate.
9077+ * 4. If continuing, read the toi_file configuration section
9078+ * of the header and set up block device info so we can read
9079+ * the rest of the header & image.
24613191 9080+ *
4e97e4e9 9081+ * Returns:
9082+ * May not return if the user chooses to reboot at a warning.
9083+ * -EINVAL if cannot resume at this time. Booting should continue
9084+ * normally.
24613191 9085+ */
9086+
4e97e4e9 9087+static int toi_file_read_header_init(void)
9088+{
9089+ int result;
9090+ struct block_device *tmp;
24613191 9091+
7f9d2ee0 9092+ toi_bio_ops.read_header_init();
9093+
9094+ /* Read toi_file configuration */
9095+ result = toi_bio_ops.bdev_page_io(READ, toi_file_target_bdev,
9096+ target_header_start,
9097+ virt_to_page((unsigned long) toi_writer_buffer));
ad8f4a28 9098+
4e97e4e9 9099+ if (result) {
9100+ printk("FileAllocator read header init: Failed to initialise "
9101+ "reading the first page of data.\n");
7f9d2ee0 9102+ toi_bio_ops.rw_cleanup(READ);
4e97e4e9 9103+ return result;
9104+ }
24613191 9105+
7f9d2ee0 9106+ memcpy(&toi_writer_posn_save, toi_writer_buffer,
4e97e4e9 9107+ sizeof(toi_writer_posn_save));
ad8f4a28 9108+
7f9d2ee0 9109+ toi_writer_buffer_posn = sizeof(toi_writer_posn_save);
24613191 9110+
4e97e4e9 9111+ tmp = devinfo.bdev;
24613191 9112+
4e97e4e9 9113+ memcpy(&devinfo,
9114+ toi_writer_buffer + toi_writer_buffer_posn,
9115+ sizeof(devinfo));
24613191 9116+
4e97e4e9 9117+ devinfo.bdev = tmp;
9118+ toi_writer_buffer_posn += sizeof(devinfo);
24613191 9119+
4e97e4e9 9120+ toi_extent_state_goto_start(&toi_writer_posn);
9121+ toi_bio_ops.set_extra_page_forward();
24613191 9122+
4e97e4e9 9123+ return toi_load_extent_chain(&block_chain);
9124+}
24613191 9125+
4e97e4e9 9126+static int toi_file_read_header_cleanup(void)
9127+{
9128+ toi_bio_ops.rw_cleanup(READ);
9129+ return 0;
9130+}
9131+
9132+static int toi_file_signature_op(int op)
9133+{
9134+ char *cur;
9135+ int result = 0, changed = 0;
9136+ struct toi_file_header *header;
ad8f4a28 9137+
9138+ if (toi_file_target_bdev <= 0)
4e97e4e9 9139+ return -1;
24613191 9140+
4e97e4e9 9141+ cur = (char *) toi_get_zeroed_page(17, TOI_ATOMIC_GFP);
9142+ if (!cur) {
9143+ printk("Unable to allocate a page for reading the image "
9144+ "signature.\n");
9145+ return -ENOMEM;
9146+ }
24613191 9147+
7f9d2ee0 9148+ result = toi_bio_ops.bdev_page_io(READ, toi_file_target_bdev,
4e97e4e9 9149+ target_firstblock,
9150+ virt_to_page(cur));
24613191 9151+
7f9d2ee0 9152+ if (result)
9153+ goto out;
9154+
4e97e4e9 9155+ header = (struct toi_file_header *) cur;
9156+ result = parse_signature(header);
ad8f4a28 9157+
4e97e4e9 9158+ switch (op) {
ad8f4a28 9159+ case INVALIDATE:
9160+ if (result == -1)
9161+ goto out;
24613191 9162+
7f9d2ee0 9163+ strcpy(header->sig, tuxonice_signature);
ad8f4a28 9164+ header->resumed_before = 0;
7f9d2ee0 9165+ header->have_image = 0;
ad8f4a28 9166+ result = changed = 1;
9167+ break;
9168+ case MARK_RESUME_ATTEMPTED:
9169+ if (result == 1) {
9170+ header->resumed_before = 1;
9171+ changed = 1;
9172+ }
9173+ break;
9174+ case UNMARK_RESUME_ATTEMPTED:
9175+ if (result == 1) {
4e97e4e9 9176+ header->resumed_before = 0;
ad8f4a28 9177+ changed = 1;
9178+ }
9179+ break;
24613191 9180+ }
9181+
7f9d2ee0 9182+ if (changed) {
9183+ int io_result = toi_bio_ops.bdev_page_io(WRITE,
9184+ toi_file_target_bdev, target_firstblock,
4e97e4e9 9185+ virt_to_page(cur));
7f9d2ee0 9186+ if (io_result)
9187+ result = io_result;
9188+ }
24613191 9189+
4e97e4e9 9190+out:
9191+ toi_bio_ops.finish_all_io();
ad8f4a28 9192+ toi_free_page(17, (unsigned long) cur);
4e97e4e9 9193+ return result;
24613191 9194+}
9195+
4e97e4e9 9196+/* Print debug info
24613191 9197+ *
4e97e4e9 9198+ * Description:
24613191 9199+ */
9200+
4e97e4e9 9201+static int toi_file_print_debug_stats(char *buffer, int size)
24613191 9202+{
4e97e4e9 9203+ int len = 0;
ad8f4a28 9204+
4e97e4e9 9205+ if (toiActiveAllocator != &toi_fileops) {
ad8f4a28 9206+ len = snprintf_used(buffer, size,
9207+ "- FileAllocator inactive.\n");
4e97e4e9 9208+ return len;
e8d0ad9d 9209+ }
9210+
4e97e4e9 9211+ len = snprintf_used(buffer, size, "- FileAllocator active.\n");
24613191 9212+
ad8f4a28 9213+ len += snprintf_used(buffer+len, size-len, " Storage available for "
9214+ "image: %d pages.\n",
4e97e4e9 9215+ toi_file_storage_allocated());
24613191 9216+
4e97e4e9 9217+ return len;
24613191 9218+}
9219+
4e97e4e9 9220+/*
9221+ * Storage needed
9222+ *
9223+ * Returns amount of space in the image header required
9224+ * for the toi_file's data.
9225+ *
9226+ * We ensure the space is allocated, but actually save the
9227+ * data from write_header_init and therefore don't also define a
9228+ * save_config_info routine.
9229+ */
9230+static int toi_file_storage_needed(void)
24613191 9231+{
4e97e4e9 9232+ return sig_size + strlen(toi_file_target) + 1 +
7f9d2ee0 9233+ sizeof(toi_writer_posn_save) +
4e97e4e9 9234+ sizeof(devinfo) +
9235+ sizeof(struct extent_chain) - 2 * sizeof(void *) +
9236+ (2 * sizeof(unsigned long) * block_chain.num_extents);
24613191 9237+}
9238+
ad8f4a28 9239+/*
4e97e4e9 9240+ * toi_file_remove_image
ad8f4a28 9241+ *
4e97e4e9 9242+ */
9243+static int toi_file_remove_image(void)
9244+{
9245+ toi_file_release_storage();
9246+ return toi_file_signature_op(INVALIDATE);
9247+}
24613191 9248+
9249+/*
4e97e4e9 9250+ * Image_exists
9251+ *
24613191 9252+ */
9253+
7f9d2ee0 9254+static int toi_file_image_exists(int quiet)
24613191 9255+{
4e97e4e9 9256+ if (!toi_file_target_bdev)
9257+ reopen_resume_devt();
9258+
9259+ return toi_file_signature_op(GET_IMAGE_EXISTS);
24613191 9260+}
9261+
4e97e4e9 9262+/*
9263+ * Mark resume attempted.
24613191 9264+ *
4e97e4e9 9265+ * Record that we tried to resume from this image.
24613191 9266+ */
9267+
7f9d2ee0 9268+static int toi_file_mark_resume_attempted(int mark)
24613191 9269+{
7f9d2ee0 9270+ return toi_file_signature_op(mark ? MARK_RESUME_ATTEMPTED:
4e97e4e9 9271+ UNMARK_RESUME_ATTEMPTED);
9272+}
24613191 9273+
4e97e4e9 9274+static void toi_file_set_resume_param(void)
9275+{
9276+ char *buffer = (char *) toi_get_zeroed_page(18, TOI_ATOMIC_GFP);
9277+ char *buffer2 = (char *) toi_get_zeroed_page(19, TOI_ATOMIC_GFP);
9278+ unsigned long sector = bmap(target_inode, 0);
9279+ int offset = 0;
24613191 9280+
4e97e4e9 9281+ if (!buffer || !buffer2) {
9282+ if (buffer)
ad8f4a28 9283+ toi_free_page(18, (unsigned long) buffer);
4e97e4e9 9284+ if (buffer2)
ad8f4a28 9285+ toi_free_page(19, (unsigned long) buffer2);
4e97e4e9 9286+ printk("TuxOnIce: Failed to allocate memory while setting "
9287+ "resume= parameter.\n");
9288+ return;
24613191 9289+ }
9290+
4e97e4e9 9291+ if (toi_file_target_bdev) {
9292+ set_devinfo(toi_file_target_bdev, target_inode->i_blkbits);
24613191 9293+
4e97e4e9 9294+ bdevname(toi_file_target_bdev, buffer2);
ad8f4a28 9295+ offset += snprintf(buffer + offset, PAGE_SIZE - offset,
4e97e4e9 9296+ "/dev/%s", buffer2);
ad8f4a28 9297+
4e97e4e9 9298+ if (sector)
9299+ offset += snprintf(buffer + offset, PAGE_SIZE - offset,
9300+ ":0x%lx", sector << devinfo.bmap_shift);
9301+ } else
9302+ offset += snprintf(buffer + offset, PAGE_SIZE - offset,
9303+ "%s is not a valid target.", toi_file_target);
ad8f4a28 9304+
4e97e4e9 9305+ sprintf(resume_file, "file:%s", buffer);
24613191 9306+
ad8f4a28 9307+ toi_free_page(18, (unsigned long) buffer);
9308+ toi_free_page(19, (unsigned long) buffer2);
4e97e4e9 9309+
9310+ toi_attempt_to_parse_resume_device(1);
24613191 9311+}
9312+
4e97e4e9 9313+static int __test_toi_file_target(char *target, int resume_time, int quiet)
24613191 9314+{
4e97e4e9 9315+ toi_file_get_target_info(target, 0, resume_time);
9316+ if (toi_file_signature_op(GET_IMAGE_EXISTS) > -1) {
9317+ if (!quiet)
ad8f4a28 9318+ printk(KERN_INFO "TuxOnIce: FileAllocator: File "
9319+ "signature found.\n");
4e97e4e9 9320+ if (!resume_time)
9321+ toi_file_set_resume_param();
ad8f4a28 9322+
4e97e4e9 9323+ toi_bio_ops.set_devinfo(&devinfo);
9324+ toi_writer_posn.chains = &block_chain;
9325+ toi_writer_posn.num_chains = 1;
24613191 9326+
4e97e4e9 9327+ if (!resume_time)
9328+ set_toi_state(TOI_CAN_HIBERNATE);
9329+ return 0;
9330+ }
24613191 9331+
4e97e4e9 9332+ clear_toi_state(TOI_CAN_HIBERNATE);
24613191 9333+
4e97e4e9 9334+ if (quiet)
9335+ return 1;
24613191 9336+
4e97e4e9 9337+ if (*target)
ad8f4a28 9338+ printk(KERN_INFO "TuxOnIce: FileAllocator: Sorry. No signature "
9339+ "found at %s.\n", target);
4e97e4e9 9340+ else
9341+ if (!resume_time)
ad8f4a28 9342+ printk(KERN_INFO "TuxOnIce: FileAllocator: Sorry. "
9343+ "Target is not set for hibernating.\n");
24613191 9344+
4e97e4e9 9345+ return 1;
9346+}
24613191 9347+
4e97e4e9 9348+static void test_toi_file_target(void)
9349+{
9350+ setting_toi_file_target = 1;
ad8f4a28 9351+
9352+ printk(KERN_INFO "TuxOnIce: Hibernating %sabled.\n",
4e97e4e9 9353+ __test_toi_file_target(toi_file_target, 0, 1) ?
9354+ "dis" : "en");
ad8f4a28 9355+
4e97e4e9 9356+ setting_toi_file_target = 0;
9357+}
24613191 9358+
4e97e4e9 9359+/*
9360+ * Parse Image Location
9361+ *
9362+ * Attempt to parse a resume= parameter.
9363+ * File Allocator accepts:
9364+ * resume=file:DEVNAME[:FIRSTBLOCK][@BLOCKSIZE]
9365+ *
9366+ * Where:
9367+ * DEVNAME is convertible to a dev_t by name_to_dev_t
9368+ * FIRSTBLOCK is the location of the first block in the file.
ad8f4a28 9369+ * BLOCKSIZE is the logical blocksize >= SECTOR_SIZE & <= PAGE_SIZE,
4e97e4e9 9370+ * mod SECTOR_SIZE == 0 of the device.
9371+ * Data is validated by attempting to read a header from the
9372+ * location given. Failure will result in toi_file refusing to
9373+ * save an image, and a reboot with correct parameters will be
9374+ * necessary.
9375+ */
9376+
9377+static int toi_file_parse_sig_location(char *commandline,
9378+ int only_writer, int quiet)
9379+{
9380+ char *thischar, *devstart = NULL, *colon = NULL, *at_symbol = NULL;
9381+ int result = -EINVAL, target_blocksize = 0;
9382+
9383+ if (strncmp(commandline, "file:", 5)) {
9384+ if (!only_writer)
9385+ return 1;
9386+ } else
9387+ commandline += 5;
24613191 9388+
ad8f4a28 9389+ /*
4e97e4e9 9390+ * Don't check signature again if we're beginning a cycle. If we already
ad8f4a28 9391+ * did the initialisation successfully, assume we'll be okay when it
9392+ * comes to resuming.
24613191 9393+ */
4e97e4e9 9394+ if (toi_file_target_bdev)
9395+ return 0;
ad8f4a28 9396+
4e97e4e9 9397+ devstart = thischar = commandline;
9398+ while ((*thischar != ':') && (*thischar != '@') &&
9399+ ((thischar - commandline) < 250) && (*thischar))
9400+ thischar++;
24613191 9401+
4e97e4e9 9402+ if (*thischar == ':') {
9403+ colon = thischar;
9404+ *colon = 0;
9405+ thischar++;
24613191 9406+ }
9407+
ad8f4a28 9408+ while ((*thischar != '@') && ((thischar - commandline) < 250)
9409+ && (*thischar))
4e97e4e9 9410+ thischar++;
24613191 9411+
4e97e4e9 9412+ if (*thischar == '@') {
9413+ at_symbol = thischar;
9414+ *at_symbol = 0;
24613191 9415+ }
ad8f4a28 9416+
9417+ /*
4e97e4e9 9418+ * For the toi_file, you may be able to resume but not hibernate,
 9419+ * because resume= is set correctly, but the toi_file_target
ad8f4a28 9420+ * isn't.
4e97e4e9 9421+ *
9422+ * We may have come here as a result of setting resume or
9423+ * toi_file_target. We only test the toi_file target in the
9424+ * former case (it's already done in the latter), and we do it before
9425+ * setting the block number ourselves. It will overwrite the values
9426+ * given on the command line if we don't.
24613191 9427+ */
24613191 9428+
4e97e4e9 9429+ if (!setting_toi_file_target)
9430+ __test_toi_file_target(toi_file_target, 1, 0);
24613191 9431+
4e97e4e9 9432+ if (colon)
9433+ target_firstblock = (int) simple_strtoul(colon + 1, NULL, 0);
9434+ else
9435+ target_firstblock = 0;
24613191 9436+
4e97e4e9 9437+ if (at_symbol) {
9438+ target_blocksize = (int) simple_strtoul(at_symbol + 1, NULL, 0);
9439+ if (target_blocksize & (SECTOR_SIZE - 1)) {
ad8f4a28 9440+ printk(KERN_INFO "FileAllocator: Blocksizes are "
9441+ "multiples of %d.\n", SECTOR_SIZE);
4e97e4e9 9442+ result = -EINVAL;
9443+ goto out;
24613191 9444+ }
9445+ }
ad8f4a28 9446+
4e97e4e9 9447+ if (!quiet)
ad8f4a28 9448+ printk(KERN_INFO "TuxOnIce FileAllocator: Testing whether you"
9449+ " can resume:\n");
24613191 9450+
4e97e4e9 9451+ toi_file_get_target_info(commandline, 0, 1);
24613191 9452+
4e97e4e9 9453+ if (!toi_file_target_bdev || IS_ERR(toi_file_target_bdev)) {
9454+ toi_file_target_bdev = NULL;
9455+ result = -1;
9456+ goto out;
24613191 9457+ }
9458+
4e97e4e9 9459+ if (target_blocksize)
9460+ set_devinfo(toi_file_target_bdev, ffs(target_blocksize));
24613191 9461+
4e97e4e9 9462+ result = __test_toi_file_target(commandline, 1, 0);
9463+
9464+out:
9465+ if (result)
9466+ clear_toi_state(TOI_CAN_HIBERNATE);
9467+
9468+ if (!quiet)
ad8f4a28 9469+ printk(KERN_INFO "Resuming %sabled.\n", result ? "dis" : "en");
4e97e4e9 9470+
9471+ if (colon)
9472+ *colon = ':';
9473+ if (at_symbol)
9474+ *at_symbol = '@';
9475+
9476+ return result;
24613191 9477+}
4e97e4e9 9478+
9479+/* toi_file_save_config_info
24613191 9480+ *
ad8f4a28 9481+ * Description: Save the target's name, not for resume time, but for
9482+ * all_settings.
4e97e4e9 9483+ * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE.
9484+ * Returns: Number of bytes used for saving our data.
24613191 9485+ */
9486+
4e97e4e9 9487+static int toi_file_save_config_info(char *buffer)
9488+{
9489+ strcpy(buffer, toi_file_target);
9490+ return strlen(toi_file_target) + 1;
9491+}
24613191 9492+
4e97e4e9 9493+/* toi_file_load_config_info
24613191 9494+ *
4e97e4e9 9495+ * Description: Reload target's name.
9496+ * Arguments: Buffer: Pointer to the start of the data.
9497+ * Size: Number of bytes that were saved.
24613191 9498+ */
9499+
4e97e4e9 9500+static void toi_file_load_config_info(char *buffer, int size)
9501+{
9502+ strcpy(toi_file_target, buffer);
9503+}
24613191 9504+
4e97e4e9 9505+static int toi_file_initialise(int starting_cycle)
9506+{
9507+ if (starting_cycle) {
9508+ if (toiActiveAllocator != &toi_fileops)
9509+ return 0;
24613191 9510+
4e97e4e9 9511+ if (starting_cycle & SYSFS_HIBERNATE && !*toi_file_target) {
ad8f4a28 9512+ printk(KERN_INFO "FileAllocator is the active writer, "
4e97e4e9 9513+ "but no filename has been set.\n");
9514+ return 1;
9515+ }
9516+ }
9517+
7f9d2ee0 9518+ if (*toi_file_target)
4e97e4e9 9519+ toi_file_get_target_info(toi_file_target, starting_cycle, 0);
9520+
7f9d2ee0 9521+ if (starting_cycle && (toi_file_image_exists(1) == -1)) {
ad8f4a28 9522+ printk("%s does not have a valid signature for "
9523+ "hibernating.\n", toi_file_target);
4e97e4e9 9524+ return 1;
9525+ }
9526+
9527+ return 0;
9528+}
9529+
9530+static struct toi_sysfs_data sysfs_params[] = {
9531+
9532+ {
9533+ TOI_ATTR("target", SYSFS_RW),
9534+ SYSFS_STRING(toi_file_target, 256, SYSFS_NEEDS_SM_FOR_WRITE),
9535+ .write_side_effect = test_toi_file_target,
9536+ },
9537+
9538+ {
9539+ TOI_ATTR("enabled", SYSFS_RW),
9540+ SYSFS_INT(&toi_fileops.enabled, 0, 1, 0),
9541+ .write_side_effect = attempt_to_parse_resume_device2,
9542+ }
9543+};
9544+
9545+static struct toi_module_ops toi_fileops = {
9546+ .type = WRITER_MODULE,
9547+ .name = "file storage",
9548+ .directory = "file",
9549+ .module = THIS_MODULE,
9550+ .print_debug_info = toi_file_print_debug_stats,
9551+ .save_config_info = toi_file_save_config_info,
9552+ .load_config_info = toi_file_load_config_info,
9553+ .storage_needed = toi_file_storage_needed,
9554+ .initialise = toi_file_initialise,
9555+ .cleanup = toi_file_cleanup,
9556+
7f9d2ee0 9557+ .noresume_reset = toi_file_noresume_reset,
4e97e4e9 9558+ .storage_available = toi_file_storage_available,
9559+ .storage_allocated = toi_file_storage_allocated,
9560+ .release_storage = toi_file_release_storage,
7f9d2ee0 9561+ .reserve_header_space = toi_file_reserve_header_space,
4e97e4e9 9562+ .allocate_storage = toi_file_allocate_storage,
9563+ .image_exists = toi_file_image_exists,
9564+ .mark_resume_attempted = toi_file_mark_resume_attempted,
9565+ .write_header_init = toi_file_write_header_init,
9566+ .write_header_cleanup = toi_file_write_header_cleanup,
9567+ .read_header_init = toi_file_read_header_init,
9568+ .read_header_cleanup = toi_file_read_header_cleanup,
9569+ .remove_image = toi_file_remove_image,
9570+ .parse_sig_location = toi_file_parse_sig_location,
24613191 9571+
4e97e4e9 9572+ .sysfs_data = sysfs_params,
ad8f4a28 9573+ .num_sysfs_entries = sizeof(sysfs_params) /
9574+ sizeof(struct toi_sysfs_data),
4e97e4e9 9575+};
24613191 9576+
4e97e4e9 9577+/* ---- Registration ---- */
9578+static __init int toi_file_load(void)
9579+{
9580+ toi_fileops.rw_init = toi_bio_ops.rw_init;
9581+ toi_fileops.rw_cleanup = toi_bio_ops.rw_cleanup;
9582+ toi_fileops.read_page = toi_bio_ops.read_page;
9583+ toi_fileops.write_page = toi_bio_ops.write_page;
9584+ toi_fileops.rw_header_chunk = toi_bio_ops.rw_header_chunk;
7f9d2ee0 9585+ toi_fileops.rw_header_chunk_noreadahead =
9586+ toi_bio_ops.rw_header_chunk_noreadahead;
9587+ toi_fileops.io_flusher = toi_bio_ops.io_flusher;
24613191 9588+
4e97e4e9 9589+ return toi_register_module(&toi_fileops);
9590+}
24613191 9591+
4e97e4e9 9592+#ifdef MODULE
9593+static __exit void toi_file_unload(void)
9594+{
9595+ toi_unregister_module(&toi_fileops);
9596+}
24613191 9597+
4e97e4e9 9598+module_init(toi_file_load);
9599+module_exit(toi_file_unload);
9600+MODULE_LICENSE("GPL");
9601+MODULE_AUTHOR("Nigel Cunningham");
9602+MODULE_DESCRIPTION("TuxOnIce FileAllocator");
9603+#else
9604+late_initcall(toi_file_load);
24613191 9605+#endif
4e97e4e9 9606diff --git a/kernel/power/tuxonice_highlevel.c b/kernel/power/tuxonice_highlevel.c
9607new file mode 100644
7f9d2ee0 9608index 0000000..f362017
4e97e4e9 9609--- /dev/null
9610+++ b/kernel/power/tuxonice_highlevel.c
7f9d2ee0 9611@@ -0,0 +1,1329 @@
24613191 9612+/*
4e97e4e9 9613+ * kernel/power/tuxonice_highlevel.c
9614+ */
9615+/** \mainpage TuxOnIce.
24613191 9616+ *
4e97e4e9 9617+ * TuxOnIce provides support for saving and restoring an image of
9618+ * system memory to an arbitrary storage device, either on the local computer,
9619+ * or across some network. The support is entirely OS based, so TuxOnIce
9620+ * works without requiring BIOS, APM or ACPI support. The vast majority of the
9621+ * code is also architecture independent, so it should be very easy to port
9622+ * the code to new architectures. TuxOnIce includes support for SMP, 4G HighMem
9623+ * and preemption. Initramfses and initrds are also supported.
24613191 9624+ *
4e97e4e9 9625+ * TuxOnIce uses a modular design, in which the method of storing the image is
9626+ * completely abstracted from the core code, as are transformations on the data
9627+ * such as compression and/or encryption (multiple 'modules' can be used to
9628+ * provide arbitrary combinations of functionality). The user interface is also
9629+ * modular, so that arbitrarily simple or complex interfaces can be used to
9630+ * provide anything from debugging information through to eye candy.
9631+ *
9632+ * \section Copyright
9633+ *
9634+ * TuxOnIce is released under the GPLv2.
9635+ *
9636+ * Copyright (C) 1998-2001 Gabor Kuti <seasons@fornax.hu><BR>
9637+ * Copyright (C) 1998,2001,2002 Pavel Machek <pavel@suse.cz><BR>
9638+ * Copyright (C) 2002-2003 Florent Chabaud <fchabaud@free.fr><BR>
9639+ * Copyright (C) 2002-2007 Nigel Cunningham (nigel at tuxonice net)<BR>
9640+ *
9641+ * \section Credits
9642+ *
9643+ * Nigel would like to thank the following people for their work:
9644+ *
9645+ * Bernard Blackham <bernard@blackham.com.au><BR>
9646+ * Web page & Wiki administration, some coding. A person without whom
9647+ * TuxOnIce would not be where it is.
9648+ *
9649+ * Michael Frank <mhf@linuxmail.org><BR>
9650+ * Extensive testing and help with improving stability. I was constantly
9651+ * amazed by the quality and quantity of Michael's help.
9652+ *
9653+ * Pavel Machek <pavel@ucw.cz><BR>
ad8f4a28 9654+ * Modifications, defectiveness pointing, being with Gabor at the very
9655+ * beginning, suspend to swap space, stop all tasks. Port to 2.4.18-ac and
9656+ * 2.5.17. Even though Pavel and I disagree on the direction suspend to
9657+ * disk should take, I appreciate the valuable work he did in helping Gabor
9658+ * get the concept working.
4e97e4e9 9659+ *
9660+ * ..and of course the myriads of TuxOnIce users who have helped diagnose
9661+ * and fix bugs, made suggestions on how to improve the code, proofread
9662+ * documentation, and donated time and money.
9663+ *
9664+ * Thanks also to corporate sponsors:
9665+ *
9666+ * <B>Redhat.</B> Sometime employer from May 2006 (my fault, not Redhat's!).
9667+ *
9668+ * <B>Cyclades.com.</B> Nigel's employers from Dec 2004 until May 2006, who
9669+ * allowed him to work on TuxOnIce and PM related issues on company time.
9670+ *
ad8f4a28 9671+ * <B>LinuxFund.org.</B> Sponsored Nigel's work on TuxOnIce for four months Oct
9672+ * 2003 to Jan 2004.
4e97e4e9 9673+ *
9674+ * <B>LAC Linux.</B> Donated P4 hardware that enabled development and ongoing
9675+ * maintenance of SMP and Highmem support.
9676+ *
ad8f4a28 9677+ * <B>OSDL.</B> Provided access to various hardware configurations, made
9678+ * occasional small donations to the project.
24613191 9679+ */
9680+
24613191 9681+#include <linux/suspend.h>
4e97e4e9 9682+#include <linux/module.h>
9683+#include <linux/freezer.h>
9684+#include <linux/utsrelease.h>
9685+#include <linux/cpu.h>
9686+#include <linux/console.h>
ad8f4a28 9687+#include <linux/writeback.h>
4e97e4e9 9688+#include <asm/uaccess.h>
24613191 9689+
4e97e4e9 9690+#include "tuxonice_modules.h"
9691+#include "tuxonice_sysfs.h"
9692+#include "tuxonice_prepare_image.h"
9693+#include "tuxonice_io.h"
9694+#include "tuxonice_ui.h"
9695+#include "tuxonice_power_off.h"
9696+#include "tuxonice_storage.h"
9697+#include "tuxonice_checksum.h"
9698+#include "tuxonice_cluster.h"
9699+#include "tuxonice_builtin.h"
9700+#include "tuxonice_atomic_copy.h"
ad8f4a28 9701+#include "tuxonice_alloc.h"
24613191 9702+
4e97e4e9 9703+/*! Pageset metadata. */
9704+struct pagedir pagedir2 = {2};
24613191 9705+
4e97e4e9 9706+static int get_pmsem = 0, got_pmsem;
9707+static mm_segment_t oldfs;
9708+static atomic_t actions_running;
9709+static int block_dump_save;
ad8f4a28 9710+ static char pre_hibernate_command[256];
9711+static char post_hibernate_command[256];
24613191 9712+
7f9d2ee0 9713+char *tuxonice_signature = "\xed\xc3\x02\xe9\x98\x56\xe5\x0c";
9714+
4e97e4e9 9715+int toi_fail_num;
24613191 9716+
4e97e4e9 9717+int do_toi_step(int step);
24613191 9718+
ad8f4a28 9719+ unsigned long boot_kernel_data_buffer;
9720+
4e97e4e9 9721+/**
9722+ * toi_finish_anything - Cleanup after doing anything.
24613191 9723+ *
4e97e4e9 9724+ * @toi_or_resume: Whether finishing a cycle or attempt at resuming.
9725+ *
9726+ * This is our basic clean-up routine, matching start_anything below. We
9727+ * call cleanup routines, drop module references and restore process fs and
9728+ * cpus allowed masks, together with the global block_dump variable's value.
24613191 9729+ */
ad8f4a28 9730+void toi_finish_anything(int hibernate_or_resume)
24613191 9731+{
4e97e4e9 9732+ if (!atomic_dec_and_test(&actions_running))
24613191 9733+ return;
9734+
ad8f4a28 9735+ toi_cleanup_modules(hibernate_or_resume);
4e97e4e9 9736+ toi_put_modules();
9737+ set_fs(oldfs);
ad8f4a28 9738+ if (hibernate_or_resume) {
4e97e4e9 9739+ block_dump = block_dump_save;
9740+ set_cpus_allowed(current, CPU_MASK_ALL);
ad8f4a28 9741+ toi_alloc_print_debug_stats();
9742+
9743+ if (hibernate_or_resume == SYSFS_HIBERNATE &&
9744+ strlen(post_hibernate_command))
9745+ toi_launch_userspace_program(post_hibernate_command,
9746+ 0, UMH_WAIT_PROC);
24613191 9747+ }
24613191 9748+}
9749+
4e97e4e9 9750+/**
9751+ * toi_start_anything - Basic initialisation for TuxOnIce.
24613191 9752+ *
4e97e4e9 9753+ * @toi_or_resume: Whether starting a cycle or attempt at resuming.
9754+ *
9755+ * Our basic initialisation routine. Take references on modules, use the
9756+ * kernel segment, recheck resume= if no active allocator is set, initialise
9757+ * modules, save and reset block_dump and ensure we're running on CPU0.
24613191 9758+ */
ad8f4a28 9759+int toi_start_anything(int hibernate_or_resume)
24613191 9760+{
4e97e4e9 9761+ if (atomic_add_return(1, &actions_running) != 1) {
ad8f4a28 9762+ if (hibernate_or_resume) {
9763+ printk(KERN_INFO "Can't start a cycle when actions are "
4e97e4e9 9764+ "already running.\n");
9765+ atomic_dec(&actions_running);
9766+ return -EBUSY;
9767+ } else
9768+ return 0;
9769+ }
24613191 9770+
4e97e4e9 9771+ oldfs = get_fs();
9772+ set_fs(KERNEL_DS);
24613191 9773+
ad8f4a28 9774+ if (hibernate_or_resume == SYSFS_HIBERNATE &&
9775+ strlen(pre_hibernate_command)) {
9776+ int result = toi_launch_userspace_program(pre_hibernate_command,
9777+ 0, UMH_WAIT_PROC);
9778+ if (result) {
9779+ printk("Pre-hibernate command '%s' returned %d. "
9780+ "Aborting.\n", pre_hibernate_command,
9781+ result);
9782+ goto out_err;
9783+ }
9784+ }
7f9d2ee0 9785+
ad8f4a28 9786+ if (hibernate_or_resume == SYSFS_HIBERNATE)
9787+ toi_print_modules();
24613191 9788+
4e97e4e9 9789+ if (toi_get_modules()) {
9790+ printk("TuxOnIce: Get modules failed!\n");
9791+ goto out_err;
9792+ }
24613191 9793+
ad8f4a28 9794+ if (hibernate_or_resume) {
4e97e4e9 9795+ block_dump_save = block_dump;
9796+ block_dump = 0;
7f9d2ee0 9797+ set_cpus_allowed(current,
9798+ cpumask_of_cpu(first_cpu(cpu_online_map)));
24613191 9799+ }
9800+
ad8f4a28 9801+ if (toi_initialise_modules_early(hibernate_or_resume))
9802+ goto out_err;
9803+
9804+ if (!toiActiveAllocator)
9805+ toi_attempt_to_parse_resume_device(!hibernate_or_resume);
9806+
9807+ if (toi_initialise_modules_late(hibernate_or_resume))
9808+ goto out_err;
9809+
24613191 9810+ return 0;
4e97e4e9 9811+
9812+out_err:
ad8f4a28 9813+ if (hibernate_or_resume)
4e97e4e9 9814+ block_dump_save = block_dump;
ad8f4a28 9815+ toi_finish_anything(hibernate_or_resume);
4e97e4e9 9816+ return -EBUSY;
24613191 9817+}
4e97e4e9 9818+
24613191 9819+/*
4e97e4e9 9820+ * Nosave page tracking.
24613191 9821+ *
4e97e4e9 9822+ * Here rather than in prepare_image because we want to do it once only at the
9823+ * start of a cycle.
9824+ */
4e97e4e9 9825+
9826+/**
9827+ * mark_nosave_pages - Set up our Nosave bitmap.
24613191 9828+ *
4e97e4e9 9829+ * Build a bitmap of Nosave pages from the list. The bitmap allows faster
9830+ * use when preparing the image.
24613191 9831+ */
4e97e4e9 9832+static void mark_nosave_pages(void)
9833+{
9834+ struct nosave_region *region;
24613191 9835+
4e97e4e9 9836+ list_for_each_entry(region, &nosave_regions, list) {
9837+ unsigned long pfn;
24613191 9838+
4e97e4e9 9839+ for (pfn = region->start_pfn; pfn < region->end_pfn; pfn++)
9840+ SetPageNosave(pfn_to_page(pfn));
9841+ }
9842+}
24613191 9843+
4e97e4e9 9844+/**
9845+ * allocate_bitmaps: Allocate bitmaps used to record page states.
9846+ *
9847+ * Allocate the bitmaps we use to record the various TuxOnIce related
9848+ * page states.
9849+ */
9850+static int allocate_bitmaps(void)
9851+{
9852+ if (allocate_dyn_pageflags(&pageset1_map, 0) ||
9853+ allocate_dyn_pageflags(&pageset1_copy_map, 0) ||
9854+ allocate_dyn_pageflags(&pageset2_map, 0) ||
9855+ allocate_dyn_pageflags(&io_map, 0) ||
9856+ allocate_dyn_pageflags(&nosave_map, 0) ||
9857+ allocate_dyn_pageflags(&free_map, 0) ||
9858+ allocate_dyn_pageflags(&page_resave_map, 0))
9859+ return 1;
24613191 9860+
4e97e4e9 9861+ return 0;
9862+}
24613191 9863+
4e97e4e9 9864+/**
9865+ * free_bitmaps: Free the bitmaps used to record page states.
9866+ *
9867+ * Free the bitmaps allocated above. It is not an error to call
9868+ * free_dyn_pageflags on a bitmap that isn't currently allocated.
9869+ */
9870+static void free_bitmaps(void)
9871+{
9872+ free_dyn_pageflags(&pageset1_map);
9873+ free_dyn_pageflags(&pageset1_copy_map);
9874+ free_dyn_pageflags(&pageset2_map);
9875+ free_dyn_pageflags(&io_map);
9876+ free_dyn_pageflags(&nosave_map);
9877+ free_dyn_pageflags(&free_map);
9878+ free_dyn_pageflags(&page_resave_map);
9879+}
24613191 9880+
4e97e4e9 9881+/**
9882+ * io_MB_per_second: Return the number of MB/s read or written.
9883+ *
9884+ * @write: Whether to return the speed at which we wrote.
9885+ *
9886+ * Calculate the number of megabytes per second that were read or written.
9887+ */
9888+static int io_MB_per_second(int write)
9889+{
ad8f4a28 9890+	return (toi_bkd.toi_io_time[write][1]) ?
9891+ MB((unsigned long) toi_bkd.toi_io_time[write][0]) * HZ /
9892+ toi_bkd.toi_io_time[write][1] : 0;
4e97e4e9 9893+}
24613191 9894+
4e97e4e9 9895+/**
9896+ * get_toi_debug_info: Fill a buffer with debugging information.
9897+ *
9898+ * @buffer: The buffer to be filled.
9899+ * @count: The size of the buffer, in bytes.
9900+ *
9901+ * Fill a (usually PAGE_SIZEd) buffer with the debugging info that we will
9902+ * either printk or return via sysfs.
9903+ */
9904+#define SNPRINTF(a...) len += snprintf_used(((char *)buffer) + len, \
9905+ count - len - 1, ## a)
9906+static int get_toi_debug_info(const char *buffer, int count)
9907+{
9908+ int len = 0;
24613191 9909+
4e97e4e9 9910+ SNPRINTF("TuxOnIce debugging info:\n");
9911+ SNPRINTF("- TuxOnIce core : " TOI_CORE_VERSION "\n");
9912+ SNPRINTF("- Kernel Version : " UTS_RELEASE "\n");
9913+ SNPRINTF("- Compiler vers. : %d.%d\n", __GNUC__, __GNUC_MINOR__);
9914+ SNPRINTF("- Attempt number : %d\n", nr_hibernates);
9915+ SNPRINTF("- Parameters : %ld %ld %ld %d %d %ld\n",
9916+ toi_result,
ad8f4a28 9917+		toi_bkd.toi_action,
9918+ toi_bkd.toi_debug_state,
9919+ toi_bkd.toi_default_console_level,
4e97e4e9 9920+ image_size_limit,
9921+ toi_poweroff_method);
9922+ SNPRINTF("- Overall expected compression percentage: %d.\n",
9923+ 100 - toi_expected_compression_ratio());
ad8f4a28 9924+ len += toi_print_module_debug_info(((char *) buffer) + len,
4e97e4e9 9925+ count - len - 1);
ad8f4a28 9926+ if (toi_bkd.toi_io_time[0][1]) {
4e97e4e9 9927+ if ((io_MB_per_second(0) < 5) || (io_MB_per_second(1) < 5)) {
9928+ SNPRINTF("- I/O speed: Write %d KB/s",
ad8f4a28 9929+			(KB((unsigned long) toi_bkd.toi_io_time[0][0]) * HZ /
9930+ toi_bkd.toi_io_time[0][1]));
9931+ if (toi_bkd.toi_io_time[1][1])
4e97e4e9 9932+ SNPRINTF(", Read %d KB/s",
ad8f4a28 9933+				(KB((unsigned long)
9934+ toi_bkd.toi_io_time[1][0]) * HZ /
9935+ toi_bkd.toi_io_time[1][1]));
4e97e4e9 9936+ } else {
9937+ SNPRINTF("- I/O speed: Write %d MB/s",
ad8f4a28 9938+			(MB((unsigned long) toi_bkd.toi_io_time[0][0]) * HZ /
9939+ toi_bkd.toi_io_time[0][1]));
9940+ if (toi_bkd.toi_io_time[1][1])
4e97e4e9 9941+ SNPRINTF(", Read %d MB/s",
ad8f4a28 9942+				(MB((unsigned long)
9943+ toi_bkd.toi_io_time[1][0]) * HZ /
9944+ toi_bkd.toi_io_time[1][1]));
4e97e4e9 9945+ }
9946+ SNPRINTF(".\n");
ad8f4a28 9947+ } else
4e97e4e9 9948+ SNPRINTF("- No I/O speed stats available.\n");
7f9d2ee0 9949+ SNPRINTF("- Extra pages : %ld used/%ld.\n",
4e97e4e9 9950+ extra_pd1_pages_used, extra_pd1_pages_allowance);
43540741 9951+
4e97e4e9 9952+ return len;
9953+}
43540741 9954+
4e97e4e9 9955+/**
9956+ * do_cleanup: Cleanup after attempting to hibernate or resume.
9957+ *
9958+ * @get_debug_info: Whether to allocate and return debugging info.
9959+ *
9960+ * Cleanup after attempting to hibernate or resume, possibly getting
9961+ * debugging info as we do so.
24613191 9962+ */
4e97e4e9 9963+static void do_cleanup(int get_debug_info)
9964+{
9965+ int i = 0;
9966+ char *buffer = NULL;
24613191 9967+
4e97e4e9 9968+ if (get_debug_info)
9969+ toi_prepare_status(DONT_CLEAR_BAR, "Cleaning up...");
9970+ relink_lru_lists();
43540741 9971+
4e97e4e9 9972+ free_checksum_pages();
43540741 9973+
4e97e4e9 9974+ if (get_debug_info)
9975+ buffer = (char *) toi_get_zeroed_page(20, TOI_ATOMIC_GFP);
24613191 9976+
4e97e4e9 9977+ if (buffer)
9978+ i = get_toi_debug_info(buffer, PAGE_SIZE);
9979+
9980+ toi_free_extra_pagedir_memory();
9981+
9982+ pagedir1.size = pagedir2.size = 0;
9983+ set_highmem_size(pagedir1, 0);
9984+ set_highmem_size(pagedir2, 0);
9985+
ad8f4a28 9986+	if (boot_kernel_data_buffer) {
9987+ toi_free_page(37, boot_kernel_data_buffer);
9988+ boot_kernel_data_buffer = 0;
9989+ }
9990+
4e97e4e9 9991+ if (test_toi_state(TOI_NOTIFIERS_PREPARE)) {
9992+ pm_notifier_call_chain(PM_POST_HIBERNATION);
9993+ clear_toi_state(TOI_NOTIFIERS_PREPARE);
9994+ }
9995+
9996+ thaw_processes();
9997+
9998+#ifdef CONFIG_TOI_KEEP_IMAGE
9999+ if (test_action_state(TOI_KEEP_IMAGE) &&
10000+ !test_result_state(TOI_ABORTED)) {
10001+ toi_message(TOI_ANY_SECTION, TOI_LOW, 1,
10002+ "TuxOnIce: Not invalidating the image due "
10003+ "to Keep Image being enabled.\n");
10004+ set_result_state(TOI_KEPT_IMAGE);
10005+ } else
24613191 10006+#endif
4e97e4e9 10007+ if (toiActiveAllocator)
10008+ toiActiveAllocator->remove_image();
24613191 10009+
4e97e4e9 10010+ free_bitmaps();
24613191 10011+
4e97e4e9 10012+ if (buffer && i) {
10013+ /* Printk can only handle 1023 bytes, including
10014+ * its level mangling. */
10015+ for (i = 0; i < 3; i++)
10016+ printk("%s", buffer + (1023 * i));
ad8f4a28 10017+ toi_free_page(20, (unsigned long) buffer);
4e97e4e9 10018+ }
24613191 10019+
4e97e4e9 10020+ if (!test_action_state(TOI_LATE_CPU_HOTPLUG))
10021+ enable_nonboot_cpus();
10022+ toi_cleanup_console();
24613191 10023+
4e97e4e9 10024+ free_attention_list();
43540741 10025+
4e97e4e9 10026+ toi_deactivate_storage(0);
43540741 10027+
4e97e4e9 10028+ clear_toi_state(TOI_IGNORE_LOGLEVEL);
10029+ clear_toi_state(TOI_TRYING_TO_RESUME);
10030+ clear_toi_state(TOI_NOW_RESUMING);
10031+
10032+ if (got_pmsem) {
10033+ mutex_unlock(&pm_mutex);
10034+ got_pmsem = 0;
10035+ }
43540741 10036+}
10037+
4e97e4e9 10038+/**
10039+ * check_still_keeping_image: We kept an image; check whether to reuse it.
10040+ *
10041+ * We enter this routine when we have kept an image. If the user has said they
10042+ * want to still keep it, all we need to do is powerdown. If powering down
10043+ * means hibernating to ram and the power doesn't run out, we'll return 1.
10044+ * If we do power off properly or the battery runs out, we'll resume via the
10045+ * normal paths.
10046+ *
10047+ * If the user has said they want to remove the previously kept image, we
10048+ * remove it, and return 0. We'll then store a new image.
24613191 10049+ */
4e97e4e9 10050+static int check_still_keeping_image(void)
24613191 10051+{
4e97e4e9 10052+ if (test_action_state(TOI_KEEP_IMAGE)) {
10053+ printk("Image already stored: powering down immediately.");
10054+ do_toi_step(STEP_HIBERNATE_POWERDOWN);
10055+ return 1; /* Just in case we're using S3 */
24613191 10056+ }
10057+
4e97e4e9 10058+ printk("Invalidating previous image.\n");
10059+ toiActiveAllocator->remove_image();
e8d0ad9d 10060+
4e97e4e9 10061+ return 0;
10062+}
24613191 10063+
4e97e4e9 10064+/**
10065+ * toi_init: Prepare to hibernate to disk.
10066+ *
10067+ * Initialise variables & data structures, in preparation for
10068+ * hibernating to disk.
10069+ */
10070+static int toi_init(void)
10071+{
10072+ int result;
24613191 10073+
4e97e4e9 10074+ toi_result = 0;
e8d0ad9d 10075+
4e97e4e9 10076+ printk(KERN_INFO "Initiating a hibernation cycle.\n");
e8d0ad9d 10077+
4e97e4e9 10078+ nr_hibernates++;
e8d0ad9d 10079+
ad8f4a28 10080+	toi_bkd.toi_io_time[0][0] = toi_bkd.toi_io_time[0][1] =
10081+ toi_bkd.toi_io_time[1][0] = toi_bkd.toi_io_time[1][1] = 0;
e8d0ad9d 10082+
4e97e4e9 10083+ if (!test_toi_state(TOI_CAN_HIBERNATE) ||
10084+ allocate_bitmaps())
10085+ return 1;
e8d0ad9d 10086+
4e97e4e9 10087+ mark_nosave_pages();
e8d0ad9d 10088+
4e97e4e9 10089+ toi_prepare_console();
e8d0ad9d 10090+
4e97e4e9 10091+ result = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
10092+ if (result) {
10093+ set_result_state(TOI_NOTIFIERS_PREPARE_FAILED);
10094+ return 1;
e8d0ad9d 10095+ }
4e97e4e9 10096+ set_toi_state(TOI_NOTIFIERS_PREPARE);
24613191 10097+
ad8f4a28 10098+	boot_kernel_data_buffer = toi_get_zeroed_page(37, TOI_ATOMIC_GFP);
10099+ if (!boot_kernel_data_buffer) {
7f9d2ee0 10100+ printk(KERN_ERR "TuxOnIce: Failed to allocate "
10101+ "boot_kernel_data_buffer.\n");
ad8f4a28 10102+		set_result_state(TOI_OUT_OF_MEMORY);
10103+ return 1;
10104+ }
10105+
4e97e4e9 10106+ if (test_action_state(TOI_LATE_CPU_HOTPLUG) ||
10107+ !disable_nonboot_cpus())
10108+ return 1;
73c609d5 10109+
4e97e4e9 10110+ set_abort_result(TOI_CPU_HOTPLUG_FAILED);
10111+ return 0;
43540741 10112+}
10113+
4e97e4e9 10114+/**
10115+ * can_hibernate: Perform basic 'Can we hibernate?' tests.
24613191 10116+ *
4e97e4e9 10117+ * Perform basic tests that must pass if we're going to be able to hibernate:
10118+ * Can we get the pm_mutex? Is resume= valid (we need to know where to write
10119+ * the image header).
24613191 10120+ */
4e97e4e9 10121+static int can_hibernate(void)
10122+{
10123+ if (get_pmsem) {
10124+ if (!mutex_trylock(&pm_mutex)) {
ad8f4a28 10125+			printk(KERN_INFO "TuxOnIce: Failed to obtain "
10126+ "pm_mutex.\n");
4e97e4e9 10127+ dump_stack();
10128+ set_abort_result(TOI_PM_SEM);
10129+ return 0;
10130+ }
10131+ got_pmsem = 1;
10132+ }
24613191 10133+
4e97e4e9 10134+ if (!test_toi_state(TOI_CAN_HIBERNATE))
10135+ toi_attempt_to_parse_resume_device(0);
24613191 10136+
4e97e4e9 10137+ if (!test_toi_state(TOI_CAN_HIBERNATE)) {
ad8f4a28 10138+ printk(KERN_INFO "TuxOnIce: Hibernation is disabled.\n"
4e97e4e9 10139+ "This may be because you haven't put something along "
10140+ "the lines of\n\nresume=swap:/dev/hda1\n\n"
10141+ "in lilo.conf or equivalent. (Where /dev/hda1 is your "
10142+ "swap partition).\n");
10143+ set_abort_result(TOI_CANT_SUSPEND);
10144+ if (!got_pmsem) {
10145+ mutex_unlock(&pm_mutex);
10146+ got_pmsem = 0;
10147+ }
10148+ return 0;
10149+ }
24613191 10150+
4e97e4e9 10151+ return 1;
10152+}
e8d0ad9d 10153+
4e97e4e9 10154+/**
10155+ * do_post_image_write: Having written an image, figure out what to do next.
10156+ *
10157+ * After writing an image, we might load an alternate image or power down.
10158+ * Powering down might involve hibernating to ram, in which case we also
10159+ * need to handle reloading pageset2.
e8d0ad9d 10160+ */
4e97e4e9 10161+static int do_post_image_write(void)
10162+{
10163+ /* If switching images fails, do normal powerdown */
10164+ if (alt_resume_param[0])
10165+ do_toi_step(STEP_RESUME_ALT_IMAGE);
24613191 10166+
4e97e4e9 10167+ toi_cond_pause(1, "About to power down or reboot.");
10168+ toi_power_down();
24613191 10169+
4e97e4e9 10170+ /* If we return, it's because we hibernated to ram */
10171+ if (read_pageset2(1))
10172+ panic("Attempt to reload pagedir 2 failed. Try rebooting.");
24613191 10173+
4e97e4e9 10174+ barrier();
10175+ mb();
10176+ do_cleanup(1);
10177+ return 0;
10178+}
24613191 10179+
4e97e4e9 10180+/**
10181+ * __save_image: Do the hard work of saving the image.
e8d0ad9d 10182+ *
4e97e4e9 10183+ * High level routine for getting the image saved. The key assumptions made
10184+ * are that processes have been frozen and sufficient memory is available.
10185+ *
10186+ * We also exit through here at resume time, coming back from toi_hibernate
10187+ * after the atomic restore. This is the reason for the toi_in_hibernate
10188+ * test.
24613191 10189+ */
4e97e4e9 10190+static int __save_image(void)
24613191 10191+{
4e97e4e9 10192+ int temp_result, did_copy = 0;
24613191 10193+
4e97e4e9 10194+ toi_prepare_status(DONT_CLEAR_BAR, "Starting to save the image..");
24613191 10195+
4e97e4e9 10196+ toi_message(TOI_ANY_SECTION, TOI_LOW, 1,
10197+ " - Final values: %d and %d.\n",
10198+ pagedir1.size, pagedir2.size);
e8d0ad9d 10199+
4e97e4e9 10200+ toi_cond_pause(1, "About to write pagedir2.");
24613191 10201+
4e97e4e9 10202+ temp_result = write_pageset(&pagedir2);
24613191 10203+
4e97e4e9 10204+ if (temp_result == -1 || test_result_state(TOI_ABORTED))
10205+ return 1;
24613191 10206+
4e97e4e9 10207+ toi_cond_pause(1, "About to copy pageset 1.");
24613191 10208+
4e97e4e9 10209+ if (test_result_state(TOI_ABORTED))
10210+ return 1;
24613191 10211+
4e97e4e9 10212+ toi_deactivate_storage(1);
e8d0ad9d 10213+
4e97e4e9 10214+ toi_prepare_status(DONT_CLEAR_BAR, "Doing atomic copy.");
e8d0ad9d 10215+
4e97e4e9 10216+ toi_in_hibernate = 1;
24613191 10217+
4e97e4e9 10218+ if (toi_go_atomic(PMSG_FREEZE, 1))
10219+ goto Failed;
24613191 10220+
4e97e4e9 10221+ temp_result = toi_hibernate();
10222+ if (!temp_result)
10223+ did_copy = 1;
24613191 10224+
4e97e4e9 10225+ /* We return here at resume time too! */
10226+ toi_end_atomic(ATOMIC_ALL_STEPS, toi_in_hibernate);
24613191 10227+
4e97e4e9 10228+Failed:
10229+ if (toi_activate_storage(1))
10230+ panic("Failed to reactivate our storage.");
24613191 10231+
4e97e4e9 10232+ /* Resume time? */
10233+ if (!toi_in_hibernate) {
10234+ copyback_post();
10235+ return 0;
10236+ }
24613191 10237+
4e97e4e9 10238+ /* Nope. Hibernating. So, see if we can save the image... */
24613191 10239+
4e97e4e9 10240+ if (temp_result || test_result_state(TOI_ABORTED)) {
10241+ if (did_copy)
10242+ goto abort_reloading_pagedir_two;
10243+ else
10244+ return 1;
24613191 10245+ }
10246+
4e97e4e9 10247+ toi_update_status(pagedir2.size,
10248+ pagedir1.size + pagedir2.size,
10249+ NULL);
e8d0ad9d 10250+
4e97e4e9 10251+ if (test_result_state(TOI_ABORTED))
10252+ goto abort_reloading_pagedir_two;
e8d0ad9d 10253+
4e97e4e9 10254+ toi_cond_pause(1, "About to write pageset1.");
e8d0ad9d 10255+
4e97e4e9 10256+ toi_message(TOI_ANY_SECTION, TOI_LOW, 1,
10257+ "-- Writing pageset1\n");
e8d0ad9d 10258+
4e97e4e9 10259+ temp_result = write_pageset(&pagedir1);
e8d0ad9d 10260+
4e97e4e9 10261+ /* We didn't overwrite any memory, so no reread needs to be done. */
10262+ if (test_action_state(TOI_TEST_FILTER_SPEED))
10263+ return 1;
e8d0ad9d 10264+
4e97e4e9 10265+ if (temp_result == 1 || test_result_state(TOI_ABORTED))
10266+ goto abort_reloading_pagedir_two;
e8d0ad9d 10267+
4e97e4e9 10268+ toi_cond_pause(1, "About to write header.");
e8d0ad9d 10269+
4e97e4e9 10270+ if (test_result_state(TOI_ABORTED))
10271+ goto abort_reloading_pagedir_two;
e8d0ad9d 10272+
4e97e4e9 10273+ temp_result = write_image_header();
e8d0ad9d 10274+
4e97e4e9 10275+ if (test_action_state(TOI_TEST_BIO))
10276+ return 1;
24613191 10277+
4e97e4e9 10278+ if (!temp_result && !test_result_state(TOI_ABORTED))
10279+ return 0;
24613191 10280+
4e97e4e9 10281+abort_reloading_pagedir_two:
10282+ temp_result = read_pageset2(1);
24613191 10283+
4e97e4e9 10284+ /* If that failed, we're sunk. Panic! */
10285+ if (temp_result)
10286+ panic("Attempt to reload pagedir 2 while aborting "
10287+ "a hibernate failed.");
10288+
10289+ return 1;
24613191 10290+}
10291+
4e97e4e9 10292+/**
10293+ * do_save_image: Save the image and handle the result.
e8d0ad9d 10294+ *
4e97e4e9 10295+ * Save the prepared image. If we fail or we're in the path returning
10296+ * from the atomic restore, cleanup.
e8d0ad9d 10297+ */
24613191 10298+
4e97e4e9 10299+static int do_save_image(void)
24613191 10300+{
4e97e4e9 10301+ int result = __save_image();
10302+ if (!toi_in_hibernate || result)
10303+ do_cleanup(1);
10304+ return result;
24613191 10305+}
10306+
24613191 10307+
4e97e4e9 10308+/**
10309+ * do_prepare_image: Try to prepare an image.
10310+ *
10311+ * Seek to initialise and prepare an image to be saved. On failure,
10312+ * cleanup.
10313+ */
24613191 10314+
4e97e4e9 10315+static int do_prepare_image(void)
10316+{
10317+ if (toi_activate_storage(0))
10318+ return 1;
24613191 10319+
4e97e4e9 10320+ /*
10321+ * If kept image and still keeping image and hibernating to RAM, we will
10322+ * return 1 after hibernating and resuming (provided the power doesn't
10323+	 * run out). In that case, we skip directly to cleaning up and exiting.
10324+ */
10325+
10326+ if (!can_hibernate() ||
10327+ (test_result_state(TOI_KEPT_IMAGE) &&
10328+ check_still_keeping_image()))
10329+ goto cleanup;
10330+
10331+ if (toi_init() && !toi_prepare_image() &&
10332+ !test_result_state(TOI_ABORTED))
10333+ return 0;
10334+
10335+cleanup:
10336+ do_cleanup(0);
10337+ return 1;
24613191 10338+}
10339+
4e97e4e9 10340+/**
10341+ * do_check_can_resume: Find out whether an image has been stored.
e8d0ad9d 10342+ *
4e97e4e9 10343+ * Read whether an image exists. We use the same routine as the
10344+ * image_exists sysfs entry, and just look to see whether the
10345+ * first character in the resulting buffer is a '1'.
e8d0ad9d 10346+ */
ad8f4a28 10347+int do_check_can_resume(void)
24613191 10348+{
4e97e4e9 10349+ char *buf = (char *) toi_get_zeroed_page(21, TOI_ATOMIC_GFP);
10350+ int result = 0;
10351+
10352+ if (!buf)
10353+ return 0;
10354+
10355+ /* Only interested in first byte, so throw away return code. */
10356+ image_exists_read(buf, PAGE_SIZE);
10357+
10358+ if (buf[0] == '1')
10359+ result = 1;
10360+
ad8f4a28 10361+ toi_free_page(21, (unsigned long) buf);
4e97e4e9 10362+ return result;
24613191 10363+}
10364+
4e97e4e9 10365+/**
10366+ * do_load_atomic_copy: Load the first part of an image, if it exists.
10367+ *
10368+ * Check whether we have an image. If one exists, do sanity checking
10369+ * (possibly invalidating the image or even rebooting if the user
10370+ * requests that) before loading it into memory in preparation for the
10371+ * atomic restore.
10372+ *
10373+ * If and only if we have an image loaded and ready to restore, we return 1.
10374+ */
10375+static int do_load_atomic_copy(void)
24613191 10376+{
4e97e4e9 10377+ int read_image_result = 0;
10378+
10379+ if (sizeof(swp_entry_t) != sizeof(long)) {
10380+ printk(KERN_WARNING "TuxOnIce: The size of swp_entry_t != size"
10381+ " of long. Please report this!\n");
10382+ return 1;
10383+ }
24613191 10384+
4e97e4e9 10385+ if (!resume_file[0])
10386+ printk(KERN_WARNING "TuxOnIce: "
10387+ "You need to use a resume= command line parameter to "
10388+ "tell TuxOnIce where to look for an image.\n");
e8d0ad9d 10389+
4e97e4e9 10390+ toi_activate_storage(0);
24613191 10391+
4e97e4e9 10392+ if (!(test_toi_state(TOI_RESUME_DEVICE_OK)) &&
10393+ !toi_attempt_to_parse_resume_device(0)) {
10394+ /*
10395+ * Without a usable storage device we can do nothing -
10396+ * even if noresume is given
10397+ */
24613191 10398+
4e97e4e9 10399+ if (!toiNumAllocators)
10400+ printk(KERN_ALERT "TuxOnIce: "
10401+ "No storage allocators have been registered.\n");
10402+ else
10403+ printk(KERN_ALERT "TuxOnIce: "
10404+ "Missing or invalid storage location "
10405+ "(resume= parameter). Please correct and "
10406+ "rerun lilo (or equivalent) before "
10407+ "hibernating.\n");
10408+ toi_deactivate_storage(0);
10409+ return 1;
10410+ }
24613191 10411+
4e97e4e9 10412+ read_image_result = read_pageset1(); /* non fatal error ignored */
24613191 10413+
4e97e4e9 10414+ if (test_toi_state(TOI_NORESUME_SPECIFIED))
10415+ clear_toi_state(TOI_NORESUME_SPECIFIED);
24613191 10416+
4e97e4e9 10417+ toi_deactivate_storage(0);
24613191 10418+
4e97e4e9 10419+ if (read_image_result)
10420+ return 1;
10421+
10422+ return 0;
24613191 10423+}
10424+
4e97e4e9 10425+/**
10426+ * prepare_restore_load_alt_image: Save & restore alt image variables.
24613191 10427+ *
4e97e4e9 10428+ * Save and restore the pageset1 maps, when loading an alternate image.
24613191 10429+ */
4e97e4e9 10430+static void prepare_restore_load_alt_image(int prepare)
24613191 10431+{
4e97e4e9 10432+ static struct dyn_pageflags pageset1_map_save, pageset1_copy_map_save;
24613191 10433+
4e97e4e9 10434+ if (prepare) {
10435+ memcpy(&pageset1_map_save, &pageset1_map,
10436+ sizeof(struct dyn_pageflags));
10437+ pageset1_map.bitmap = NULL;
10438+ pageset1_map.sparse = 0;
10439+ pageset1_map.initialised = 0;
10440+ memcpy(&pageset1_copy_map_save, &pageset1_copy_map,
10441+ sizeof(struct dyn_pageflags));
10442+ pageset1_copy_map.bitmap = NULL;
10443+ pageset1_copy_map.sparse = 0;
10444+ pageset1_copy_map.initialised = 0;
10445+ set_toi_state(TOI_LOADING_ALT_IMAGE);
10446+ toi_reset_alt_image_pageset2_pfn();
10447+ } else {
10448+ if (pageset1_map.bitmap)
10449+ free_dyn_pageflags(&pageset1_map);
10450+ memcpy(&pageset1_map, &pageset1_map_save,
10451+ sizeof(struct dyn_pageflags));
10452+ if (pageset1_copy_map.bitmap)
10453+ free_dyn_pageflags(&pageset1_copy_map);
10454+ memcpy(&pageset1_copy_map, &pageset1_copy_map_save,
10455+ sizeof(struct dyn_pageflags));
10456+ clear_toi_state(TOI_NOW_RESUMING);
10457+ clear_toi_state(TOI_LOADING_ALT_IMAGE);
10458+ }
10459+}
24613191 10460+
4e97e4e9 10461+/**
10462+ * pre_resume_freeze: Freeze the system, before doing an atomic restore.
10463+ *
10464+ * Hot unplug cpus (if we didn't do it early) and freeze processes, in
10465+ * preparation for doing an atomic restore.
10466+ */
10467+int pre_resume_freeze(void)
10468+{
10469+ if (!test_action_state(TOI_LATE_CPU_HOTPLUG)) {
ad8f4a28 10470+ toi_prepare_status(DONT_CLEAR_BAR, "Disable nonboot cpus.");
4e97e4e9 10471+ if (disable_nonboot_cpus()) {
10472+ set_abort_result(TOI_CPU_HOTPLUG_FAILED);
10473+ return 1;
10474+ }
10475+ }
24613191 10476+
4e97e4e9 10477+ toi_prepare_status(DONT_CLEAR_BAR, "Freeze processes.");
24613191 10478+
4e97e4e9 10479+ if (freeze_processes()) {
10480+ printk("Some processes failed to stop.\n");
10481+ return 1;
24613191 10482+ }
4e97e4e9 10483+
10484+ return 0;
24613191 10485+}
10486+
4e97e4e9 10487+/**
10488+ * do_toi_step: Perform a step in hibernating or resuming.
10489+ *
10490+ * Perform a step in hibernating or resuming an image. This abstraction
10491+ * is in preparation for implementing cluster support, and perhaps replacing
10492+ * uswsusp too (haven't looked whether that's possible yet).
24613191 10493+ */
4e97e4e9 10494+int do_toi_step(int step)
24613191 10495+{
4e97e4e9 10496+ switch (step) {
ad8f4a28 10497+	case STEP_HIBERNATE_PREPARE_IMAGE:
10498+ return do_prepare_image();
10499+ case STEP_HIBERNATE_SAVE_IMAGE:
10500+ return do_save_image();
10501+ case STEP_HIBERNATE_POWERDOWN:
10502+ return do_post_image_write();
10503+ case STEP_RESUME_CAN_RESUME:
10504+ return do_check_can_resume();
10505+ case STEP_RESUME_LOAD_PS1:
10506+ return do_load_atomic_copy();
10507+ case STEP_RESUME_DO_RESTORE:
10508+ /*
10509+ * If we succeed, this doesn't return.
10510+ * Instead, we return from do_save_image() in the
10511+ * hibernated kernel.
10512+ */
10513+ return toi_atomic_restore();
10514+ case STEP_RESUME_ALT_IMAGE:
10515+ printk(KERN_INFO "Trying to resume alternate image.\n");
10516+ toi_in_hibernate = 0;
10517+ save_restore_alt_param(SAVE, NOQUIET);
10518+ prepare_restore_load_alt_image(1);
10519+ if (!do_check_can_resume()) {
10520+ printk(KERN_INFO "Nothing to resume from.\n");
10521+ goto out;
10522+ }
10523+ if (!do_load_atomic_copy())
10524+ toi_atomic_restore();
10525+
10526+ printk(KERN_INFO "Failed to load image.\n");
4e97e4e9 10527+out:
ad8f4a28 10528+		prepare_restore_load_alt_image(0);
10529+ save_restore_alt_param(RESTORE, NOQUIET);
10530+ break;
10531+ case STEP_CLEANUP:
10532+ do_cleanup(1);
10533+ break;
10534+ case STEP_QUIET_CLEANUP:
10535+ do_cleanup(0);
10536+ break;
4e97e4e9 10537+ }
24613191 10538+
4e97e4e9 10539+ return 0;
24613191 10540+}
ad8f4a28 10541+EXPORT_SYMBOL_GPL(do_toi_step);
24613191 10542+
4e97e4e9 10543+/* -- Functions for kickstarting a hibernate or resume --- */
10544+
10545+/**
10546+ * __toi_try_resume: Try to do the steps in resuming.
24613191 10547+ *
4e97e4e9 10548+ * Check if we have an image and if so try to resume. Clear the status
10549+ * flags too.
24613191 10550+ */
4e97e4e9 10551+void __toi_try_resume(void)
24613191 10552+{
4e97e4e9 10553+ set_toi_state(TOI_TRYING_TO_RESUME);
10554+ resume_attempted = 1;
24613191 10555+
4e97e4e9 10556+ current->flags |= PF_MEMALLOC;
24613191 10557+
4e97e4e9 10558+ if (do_toi_step(STEP_RESUME_CAN_RESUME) &&
10559+ !do_toi_step(STEP_RESUME_LOAD_PS1))
10560+ do_toi_step(STEP_RESUME_DO_RESTORE);
24613191 10561+
4e97e4e9 10562+ do_cleanup(0);
24613191 10563+
4e97e4e9 10564+ current->flags &= ~PF_MEMALLOC;
24613191 10565+
4e97e4e9 10566+ clear_toi_state(TOI_IGNORE_LOGLEVEL);
10567+ clear_toi_state(TOI_TRYING_TO_RESUME);
10568+ clear_toi_state(TOI_NOW_RESUMING);
10569+}
24613191 10570+
4e97e4e9 10571+/**
10572+ * _toi_try_resume: Wrapper calling __toi_try_resume from do_mounts.
10573+ *
10574+ * Wrapper for when __toi_try_resume is called from init/do_mounts.c,
10575+ * rather than from echo > /sys/power/tuxonice/do_resume.
10576+ */
10577+void _toi_try_resume(void)
10578+{
10579+ resume_attempted = 1;
24613191 10580+
7f9d2ee0 10581+ /*
10582+ * There's a comment in kernel/power/disk.c that indicates
10583+ * we should be able to use mutex_lock_nested below. That
10584+ * doesn't seem to cut it, though, so let's just turn lockdep
10585+ * off for now.
10586+ */
10587+ lockdep_off();
10588+
4e97e4e9 10589+ if (toi_start_anything(SYSFS_RESUMING))
7f9d2ee0 10590+ goto out;
24613191 10591+
4e97e4e9 10592+ /* Unlock will be done in do_cleanup */
10593+ mutex_lock(&pm_mutex);
10594+ got_pmsem = 1;
24613191 10595+
4e97e4e9 10596+ __toi_try_resume();
24613191 10597+
4e97e4e9 10598+ /*
10599+ * For initramfs, we have to clear the boot time
10600+ * flag after trying to resume
10601+ */
10602+ clear_toi_state(TOI_BOOT_TIME);
7f9d2ee0 10603+
10604+out:
4e97e4e9 10605+ toi_finish_anything(SYSFS_RESUMING);
7f9d2ee0 10606+ lockdep_on();
10607+
4e97e4e9 10608+}
24613191 10609+
4e97e4e9 10610+/**
10611+ * _toi_try_hibernate: Try to start a hibernation cycle.
10612+ *
10613+ * have_pmsem: Whether the pm_sem is already taken.
10614+ *
10615+ * Start a hibernation cycle, coming in from either
10616+ * echo > /sys/power/tuxonice/do_suspend
10617+ *
10618+ * or
10619+ *
10620+ * echo disk > /sys/power/state
10621+ *
10622+ * In the latter case, we come in without pm_sem taken; in the
10623+ * former, it has been taken.
10624+ */
10625+int _toi_try_hibernate(int have_pmsem)
10626+{
10627+ int result = 0, sys_power_disk = 0;
24613191 10628+
4e97e4e9 10629+ if (!atomic_read(&actions_running)) {
10630+ /* Came in via /sys/power/disk */
10631+ if (toi_start_anything(SYSFS_HIBERNATING))
10632+ return -EBUSY;
10633+ sys_power_disk = 1;
10634+ }
24613191 10635+
4e97e4e9 10636+ get_pmsem = !have_pmsem;
10637+
10638+ if (strlen(alt_resume_param)) {
10639+ attempt_to_parse_alt_resume_param();
10640+
10641+ if (!strlen(alt_resume_param)) {
ad8f4a28 10642+			printk(KERN_INFO "Alternate resume parameter now "
10643+ "invalid. Aborting.\n");
4e97e4e9 10644+ goto out;
24613191 10645+ }
10646+ }
10647+
4e97e4e9 10648+ current->flags |= PF_MEMALLOC;
e8d0ad9d 10649+
ad8f4a28 10650+	if (test_toi_state(TOI_CLUSTER_MODE)) {
10651+ toi_initiate_cluster_hibernate();
10652+ goto out;
10653+ }
10654+
10655+ result = do_toi_step(STEP_HIBERNATE_PREPARE_IMAGE);
10656+ if (result)
4e97e4e9 10657+ goto out;
24613191 10658+
4e97e4e9 10659+ if (test_action_state(TOI_FREEZER_TEST)) {
10660+ do_cleanup(0);
10661+ goto out;
24613191 10662+ }
24613191 10663+
ad8f4a28 10664+	result = do_toi_step(STEP_HIBERNATE_SAVE_IMAGE);
10665+ if (result)
10666+ goto out;
10667+
10668+ /* This code runs at resume time too! */
10669+ if (toi_in_hibernate)
10670+ result = do_toi_step(STEP_HIBERNATE_POWERDOWN);
10671+out:
10672+ current->flags &= ~PF_MEMALLOC;
10673+
10674+ if (sys_power_disk)
10675+ toi_finish_anything(SYSFS_HIBERNATING);
10676+
10677+ return result;
10678+}
10679+
10680+/*
10681+ * channel_no: If !0, -c <channel_no> is added to args (userui).
10682+ */
10683+int toi_launch_userspace_program(char *command, int channel_no,
10684+ enum umh_wait wait)
10685+{
10686+ int retval;
10687+ static char *envp[] = {
10688+ "HOME=/",
10689+ "TERM=linux",
10690+ "PATH=/sbin:/usr/sbin:/bin:/usr/bin",
10691+ NULL };
10692+ static char *argv[] =
10693+ { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL };
10694+ char *channel = NULL;
10695+ int arg = 0, size;
10696+ char test_read[255];
10697+ char *orig_posn = command;
10698+
10699+ if (!strlen(orig_posn))
10700+ return 1;
10701+
10702+ if (channel_no) {
10703+ channel = toi_kzalloc(4, 6, GFP_KERNEL);
10704+ if (!channel) {
10705+ printk(KERN_INFO "Failed to allocate memory in "
10706+ "preparing to launch userspace program.\n");
10707+ return 1;
10708+ }
10709+ }
10710+
10711+ /* Up to 7 args supported */
10712+ while (arg < 7) {
10713+ sscanf(orig_posn, "%s", test_read);
10714+ size = strlen(test_read);
10715+ if (!(size))
10716+ break;
10717+ argv[arg] = toi_kzalloc(5, size + 1, TOI_ATOMIC_GFP);
10718+ strcpy(argv[arg], test_read);
10719+ orig_posn += size + 1;
10720+ *test_read = 0;
10721+ arg++;
10722+ }
10723+
10724+ if (channel_no) {
10725+ sprintf(channel, "-c%d", channel_no);
10726+ argv[arg] = channel;
10727+ } else
10728+ arg--;
10729+
10730+ retval = call_usermodehelper(argv[0], argv, envp, wait);
10731+
7f9d2ee0 10732+ /*
ad8f4a28 10733+	 * If the program reports an error, retval = 256. Don't complain
10734+ * about that here.
10735+ */
10736+ if (retval && retval != 256)
10737+ printk("Failed to launch userspace program '%s': Error %d\n",
10738+ command, retval);
10739+
10740+ {
10741+ int i;
10742+ for (i = 0; i < arg; i++)
10743+ if (argv[i] && argv[i] != channel)
10744+ toi_kfree(5, argv[i]);
10745+ }
10746+
10747+ toi_kfree(4, channel);
4e97e4e9 10748+
ad8f4a28 10749+ return retval;
24613191 10750+}
10751+
4e97e4e9 10752+/*
10753+ * This array contains entries that are automatically registered at
10754+ * boot. Modules and the console code register their own entries separately.
24613191 10755+ */
4e97e4e9 10756+static struct toi_sysfs_data sysfs_params[] = {
10757+ { TOI_ATTR("extra_pages_allowance", SYSFS_RW),
7f9d2ee0 10758+ SYSFS_LONG(&extra_pd1_pages_allowance, 0,
10759+ LONG_MAX, 0)
4e97e4e9 10760+ },
24613191 10761+
4e97e4e9 10762+ { TOI_ATTR("image_exists", SYSFS_RW),
10763+ SYSFS_CUSTOM(image_exists_read, image_exists_write,
10764+ SYSFS_NEEDS_SM_FOR_BOTH)
10765+ },
24613191 10766+
4e97e4e9 10767+ { TOI_ATTR("resume", SYSFS_RW),
10768+ SYSFS_STRING(resume_file, 255, SYSFS_NEEDS_SM_FOR_WRITE),
10769+ .write_side_effect = attempt_to_parse_resume_device2,
10770+ },
24613191 10771+
4e97e4e9 10772+ { TOI_ATTR("alt_resume_param", SYSFS_RW),
10773+ SYSFS_STRING(alt_resume_param, 255, SYSFS_NEEDS_SM_FOR_WRITE),
10774+ .write_side_effect = attempt_to_parse_alt_resume_param,
10775+ },
10776+ { TOI_ATTR("debug_info", SYSFS_READONLY),
10777+ SYSFS_CUSTOM(get_toi_debug_info, NULL, 0)
10778+ },
24613191 10779+
4e97e4e9 10780+ { TOI_ATTR("ignore_rootfs", SYSFS_RW),
ad8f4a28 10781+ SYSFS_BIT(&toi_bkd.toi_action, TOI_IGNORE_ROOTFS, 0)
4e97e4e9 10782+ },
24613191 10783+
4e97e4e9 10784+ { TOI_ATTR("image_size_limit", SYSFS_RW),
10785+ SYSFS_INT(&image_size_limit, -2, INT_MAX, 0)
10786+ },
24613191 10787+
4e97e4e9 10788+ { TOI_ATTR("last_result", SYSFS_RW),
10789+ SYSFS_UL(&toi_result, 0, 0, 0)
10790+ },
24613191 10791+
4e97e4e9 10792+ { TOI_ATTR("no_multithreaded_io", SYSFS_RW),
ad8f4a28 10793+ SYSFS_BIT(&toi_bkd.toi_action, TOI_NO_MULTITHREADED_IO, 0)
4e97e4e9 10794+ },
e8d0ad9d 10795+
7f9d2ee0 10796+ { TOI_ATTR("no_flusher_thread", SYSFS_RW),
10797+ SYSFS_BIT(&toi_bkd.toi_action, TOI_NO_FLUSHER_THREAD, 0)
10798+ },
10799+
4e97e4e9 10800+ { TOI_ATTR("full_pageset2", SYSFS_RW),
ad8f4a28 10801+ SYSFS_BIT(&toi_bkd.toi_action, TOI_PAGESET2_FULL, 0)
4e97e4e9 10802+ },
24613191 10803+
4e97e4e9 10804+ { TOI_ATTR("reboot", SYSFS_RW),
ad8f4a28 10805+ SYSFS_BIT(&toi_bkd.toi_action, TOI_REBOOT, 0)
4e97e4e9 10806+ },
24613191 10807+
4e97e4e9 10808+ { TOI_ATTR("replace_swsusp", SYSFS_RW),
ad8f4a28 10809+ SYSFS_BIT(&toi_bkd.toi_action, TOI_REPLACE_SWSUSP, 0)
4e97e4e9 10810+ },
24613191 10811+
4e97e4e9 10812+ { TOI_ATTR("resume_commandline", SYSFS_RW),
ad8f4a28 10813+ SYSFS_STRING(toi_bkd.toi_nosave_commandline, COMMAND_LINE_SIZE, 0)
4e97e4e9 10814+ },
24613191 10815+
4e97e4e9 10816+ { TOI_ATTR("version", SYSFS_READONLY),
10817+ SYSFS_STRING(TOI_CORE_VERSION, 0, 0)
10818+ },
24613191 10819+
4e97e4e9 10820+ { TOI_ATTR("no_load_direct", SYSFS_RW),
ad8f4a28 10821+ SYSFS_BIT(&toi_bkd.toi_action, TOI_NO_DIRECT_LOAD, 0)
4e97e4e9 10822+ },
24613191 10823+
4e97e4e9 10824+ { TOI_ATTR("freezer_test", SYSFS_RW),
ad8f4a28 10825+ SYSFS_BIT(&toi_bkd.toi_action, TOI_FREEZER_TEST, 0)
4e97e4e9 10826+ },
24613191 10827+
4e97e4e9 10828+ { TOI_ATTR("test_bio", SYSFS_RW),
ad8f4a28 10829+ SYSFS_BIT(&toi_bkd.toi_action, TOI_TEST_BIO, 0)
4e97e4e9 10830+ },
24613191 10831+
4e97e4e9 10832+ { TOI_ATTR("test_filter_speed", SYSFS_RW),
ad8f4a28 10833+ SYSFS_BIT(&toi_bkd.toi_action, TOI_TEST_FILTER_SPEED, 0)
4e97e4e9 10834+ },
24613191 10835+
4e97e4e9 10836+ { TOI_ATTR("no_pageset2", SYSFS_RW),
ad8f4a28 10837+ SYSFS_BIT(&toi_bkd.toi_action, TOI_NO_PAGESET2, 0)
4e97e4e9 10838+ },
24613191 10839+
4e97e4e9 10840+ { TOI_ATTR("late_cpu_hotplug", SYSFS_RW),
10841+ SYSFS_BIT(&toi_bkd.toi_action, TOI_LATE_CPU_HOTPLUG, 0)
10842+ },
10843+
10844+ { TOI_ATTR("pre_hibernate_command", SYSFS_RW),
10845+ SYSFS_STRING(pre_hibernate_command, 0, 255)
10846+ },
10847+
10848+ { TOI_ATTR("post_hibernate_command", SYSFS_RW),
10849+ SYSFS_STRING(post_hibernate_command, 0, 255)
4e97e4e9 10850+ },
24613191 10851+
4e97e4e9 10852+#ifdef CONFIG_TOI_KEEP_IMAGE
10853+ { TOI_ATTR("keep_image", SYSFS_RW),
ad8f4a28 10854+ SYSFS_BIT(&toi_bkd.toi_action, TOI_KEEP_IMAGE, 0)
4e97e4e9 10855+ },
10856+#endif
10857+};
e8d0ad9d 10858+
4e97e4e9 10859+struct toi_core_fns my_fns = {
10860+ .get_nonconflicting_page = __toi_get_nonconflicting_page,
10861+ .post_context_save = __toi_post_context_save,
10862+ .try_hibernate = _toi_try_hibernate,
10863+ .try_resume = _toi_try_resume,
10864+};
e8d0ad9d 10865+
4e97e4e9 10866+/**
10867+ * core_load: Initialisation of TuxOnIce core.
10868+ *
10869+ * Initialise the core, beginning with sysfs. Checksum and so on are part of
10870+ * the core, but have their own initialisation routines because they either
10871+ * aren't compiled in all the time or have their own subdirectories.
10872+ */
10873+static __init int core_load(void)
10874+{
10875+ int i,
10876+ numfiles = sizeof(sysfs_params) / sizeof(struct toi_sysfs_data);
24613191 10877+
10878+ strncpy(pre_hibernate_command, CONFIG_TOI_DEFAULT_PRE_HIBERNATE, 255);
10879+ strncpy(post_hibernate_command, CONFIG_TOI_DEFAULT_POST_HIBERNATE, 255);
10880+
4e97e4e9 10881+ if (toi_sysfs_init())
10882+ return 1;
24613191 10883+
ad8f4a28 10884+ for (i = 0; i < numfiles; i++)
7f9d2ee0 10885+ toi_register_sysfs_file(tuxonice_kobj, &sysfs_params[i]);
24613191 10886+
4e97e4e9 10887+ toi_core_fns = &my_fns;
24613191 10888+
10889+ if (toi_alloc_init())
10890+ return 1;
4e97e4e9 10891+ if (toi_checksum_init())
10892+ return 1;
10893+ if (toi_cluster_init())
10894+ return 1;
10895+ if (toi_usm_init())
10896+ return 1;
10897+ if (toi_ui_init())
10898+ return 1;
10899+ if (toi_poweroff_init())
10900+ return 1;
24613191 10901+
4e97e4e9 10902+ return 0;
10903+}
24613191 10904+
4e97e4e9 10905+#ifdef MODULE
10906+/**
10907+ * core_unload: Prepare to unload the core code.
10908+ */
10909+static __exit void core_unload(void)
10910+{
10911+ int i,
10912+ numfiles = sizeof(sysfs_params) / sizeof(struct toi_sysfs_data);
24613191 10913+
ad8f4a28 10914+ toi_alloc_exit();
4e97e4e9 10915+ toi_poweroff_exit();
10916+ toi_ui_exit();
10917+ toi_checksum_exit();
10918+ toi_cluster_exit();
10919+ toi_usm_exit();
24613191 10920+
ad8f4a28 10921+ for (i = 0; i < numfiles; i++)
7f9d2ee0 10922+ toi_unregister_sysfs_file(tuxonice_kobj, &sysfs_params[i]);
24613191 10923+
4e97e4e9 10924+ toi_core_fns = NULL;
24613191 10925+
4e97e4e9 10926+ toi_sysfs_exit();
24613191 10927+}
4e97e4e9 10928+MODULE_LICENSE("GPL");
10929+module_init(core_load);
10930+module_exit(core_unload);
10931+#else
10932+late_initcall(core_load);
10933+#endif
24613191 10934+
4e97e4e9 10935+#ifdef CONFIG_TOI_EXPORTS
7f9d2ee0 10936+EXPORT_SYMBOL_GPL(tuxonice_signature);
4e97e4e9 10937+EXPORT_SYMBOL_GPL(pagedir2);
10938+EXPORT_SYMBOL_GPL(toi_fail_num);
ad8f4a28 10939+EXPORT_SYMBOL_GPL(do_check_can_resume);
4e97e4e9 10940+#endif
10941diff --git a/kernel/power/tuxonice_io.c b/kernel/power/tuxonice_io.c
10942new file mode 100644
7f9d2ee0 10943index 0000000..2efd973
4e97e4e9 10944--- /dev/null
10945+++ b/kernel/power/tuxonice_io.c
7f9d2ee0 10946@@ -0,0 +1,1427 @@
4e97e4e9 10947+/*
10948+ * kernel/power/tuxonice_io.c
24613191 10949+ *
4e97e4e9 10950+ * Copyright (C) 1998-2001 Gabor Kuti <seasons@fornax.hu>
10951+ * Copyright (C) 1998,2001,2002 Pavel Machek <pavel@suse.cz>
10952+ * Copyright (C) 2002-2003 Florent Chabaud <fchabaud@free.fr>
10953+ * Copyright (C) 2002-2007 Nigel Cunningham (nigel at tuxonice net)
10954+ *
10955+ * This file is released under the GPLv2.
10956+ *
10957+ * It contains high level IO routines for hibernating.
24613191 10958+ *
24613191 10959+ */
24613191 10960+
4e97e4e9 10961+#include <linux/suspend.h>
10962+#include <linux/version.h>
10963+#include <linux/utsname.h>
10964+#include <linux/mount.h>
10965+#include <linux/highmem.h>
10966+#include <linux/module.h>
10967+#include <linux/kthread.h>
10968+#include <linux/dyn_pageflags.h>
10969+#include <asm/tlbflush.h>
24613191 10970+
4e97e4e9 10971+#include "tuxonice.h"
10972+#include "tuxonice_modules.h"
10973+#include "tuxonice_pageflags.h"
10974+#include "tuxonice_io.h"
10975+#include "tuxonice_ui.h"
10976+#include "tuxonice_storage.h"
10977+#include "tuxonice_prepare_image.h"
10978+#include "tuxonice_extent.h"
10979+#include "tuxonice_sysfs.h"
10980+#include "tuxonice_builtin.h"
10981+#include "tuxonice_checksum.h"
ad8f4a28 10982+#include "tuxonice_alloc.h"
4e97e4e9 10983+char alt_resume_param[256];
24613191 10984+
4e97e4e9 10985+/* Variables shared between threads and updated under the mutex */
10986+static int io_write, io_finish_at, io_base, io_barmax, io_pageset, io_result;
10987+static int io_index, io_nextupdate, io_pc, io_pc_step;
10988+static unsigned long pfn, other_pfn;
10989+static DEFINE_MUTEX(io_mutex);
10990+static DEFINE_PER_CPU(struct page *, last_sought);
10991+static DEFINE_PER_CPU(struct page *, last_high_page);
10992+static DEFINE_PER_CPU(char *, checksum_locn);
10993+static DEFINE_PER_CPU(struct pbe *, last_low_page);
4e97e4e9 10994+static atomic_t io_count;
7f9d2ee0 10995+atomic_t toi_io_workers;
10996+DECLARE_WAIT_QUEUE_HEAD(toi_io_queue_flusher);
10997+int toi_bio_queue_flusher_should_finish;
24613191 10998+
4e97e4e9 10999+/* toi_attempt_to_parse_resume_device
11000+ *
11001+ * Can we hibernate, using the current resume= parameter?
11002+ */
11003+int toi_attempt_to_parse_resume_device(int quiet)
11004+{
11005+ struct list_head *Allocator;
11006+ struct toi_module_ops *thisAllocator;
11007+ int result, returning = 0;
24613191 11008+
4e97e4e9 11009+ if (toi_activate_storage(0))
11010+ return 0;
11011+
11012+ toiActiveAllocator = NULL;
11013+ clear_toi_state(TOI_RESUME_DEVICE_OK);
11014+ clear_toi_state(TOI_CAN_RESUME);
11015+ clear_result_state(TOI_ABORTED);
11016+
11017+ if (!toiNumAllocators) {
11018+ if (!quiet)
11019+ printk(KERN_INFO "TuxOnIce: No storage allocators have "
11020+ "been registered. Hibernating will be "
11021+ "disabled.\n");
4e97e4e9 11022+ goto cleanup;
11023+ }
ad8f4a28 11024+
4e97e4e9 11025+ if (!resume_file[0]) {
11026+ if (!quiet)
11027+ printk("TuxOnIce: Resume= parameter is empty."
11028+ " Hibernating will be disabled.\n");
11029+ goto cleanup;
11030+ }
24613191 11031+
4e97e4e9 11032+ list_for_each(Allocator, &toiAllocators) {
11033+ thisAllocator = list_entry(Allocator, struct toi_module_ops,
11034+ type_list);
24613191 11035+
ad8f4a28 11036+ /*
4e97e4e9 11037+ * Not sure why you'd want to disable an allocator, but
11038+ * we should honour the flag if we're providing it
11039+ */
11040+ if (!thisAllocator->enabled)
11041+ continue;
24613191 11042+
4e97e4e9 11043+ result = thisAllocator->parse_sig_location(
11044+ resume_file, (toiNumAllocators == 1),
11045+ quiet);
24613191 11046+
4e97e4e9 11047+ switch (result) {
11048+ case -EINVAL:
11049+ /* For this allocator, but not a valid
11050+ * configuration. Error already printed. */
11051+ goto cleanup;
11052+
11053+ case 0:
11054+ /* For this allocator and valid. */
11055+ toiActiveAllocator = thisAllocator;
11056+
11057+ set_toi_state(TOI_RESUME_DEVICE_OK);
11058+ set_toi_state(TOI_CAN_RESUME);
11059+ returning = 1;
11060+ goto cleanup;
24613191 11061+ }
11062+ }
4e97e4e9 11063+ if (!quiet)
11064+ printk("TuxOnIce: No matching enabled allocator found. "
11065+ "Resuming disabled.\n");
11066+cleanup:
11067+ toi_deactivate_storage(0);
11068+ return returning;
24613191 11069+}
11070+
4e97e4e9 11071+void attempt_to_parse_resume_device2(void)
11072+{
11073+ toi_prepare_usm();
11074+ toi_attempt_to_parse_resume_device(0);
11075+ toi_cleanup_usm();
11076+}
24613191 11077+
4e97e4e9 11078+void save_restore_alt_param(int replace, int quiet)
11079+{
11080+ static char resume_param_save[255];
11081+ static unsigned long toi_state_save;
e8d0ad9d 11082+
4e97e4e9 11083+ if (replace) {
11084+ toi_state_save = toi_state;
11085+ strcpy(resume_param_save, resume_file);
ad8f4a28 11086+ strcpy(resume_file, alt_resume_param);
4e97e4e9 11087+ } else {
11088+ strcpy(resume_file, resume_param_save);
11089+ toi_state = toi_state_save;
11090+ }
11091+ toi_attempt_to_parse_resume_device(quiet);
11092+}
24613191 11093+
4e97e4e9 11094+void attempt_to_parse_alt_resume_param(void)
11095+{
11096+ int ok = 0;
24613191 11097+
4e97e4e9 11098+ /* Temporarily set resume_param to the poweroff value */
11099+ if (!strlen(alt_resume_param))
11100+ return;
24613191 11101+
4e97e4e9 11102+ printk("=== Trying Poweroff Resume2 ===\n");
11103+ save_restore_alt_param(SAVE, NOQUIET);
11104+ if (test_toi_state(TOI_CAN_RESUME))
11105+ ok = 1;
11106+
11107+ printk(KERN_INFO "=== Done ===\n");
4e97e4e9 11108+ save_restore_alt_param(RESTORE, QUIET);
ad8f4a28 11109+
4e97e4e9 11110+ /* If not ok, clear the string */
11111+ if (ok)
24613191 11112+ return;
11113+
11114+ printk(KERN_INFO "Can't resume from that location; clearing "
11115+ "alt_resume_param.\n");
4e97e4e9 11116+ alt_resume_param[0] = '\0';
11117+}
24613191 11118+
4e97e4e9 11119+/* noresume_reset_modules
11120+ *
11121+ * Description: When we read the start of an image, modules (and especially the
11122+ * active allocator) might need to reset data structures if we
11123+ * decide to remove the image rather than resuming from it.
11124+ */
24613191 11125+
4e97e4e9 11126+static void noresume_reset_modules(void)
24613191 11127+{
4e97e4e9 11128+ struct toi_module_ops *this_filter;
ad8f4a28 11129+
4e97e4e9 11130+ list_for_each_entry(this_filter, &toi_filters, type_list)
11131+ if (this_filter->noresume_reset)
11132+ this_filter->noresume_reset();
11133+
11134+ if (toiActiveAllocator && toiActiveAllocator->noresume_reset)
11135+ toiActiveAllocator->noresume_reset();
24613191 11136+}
11137+
4e97e4e9 11138+/* fill_toi_header()
ad8f4a28 11139+ *
4e97e4e9 11140+ * Description: Fill the hibernate header structure.
11141+ * Arguments: struct toi_header: Header data structure to be filled.
11142+ */
e8d0ad9d 11143+
ad8f4a28 11144+static int fill_toi_header(struct toi_header *sh)
4e97e4e9 11145+{
11146+ int i, error;
11147+
11148+ error = init_swsusp_header((struct swsusp_info *) sh);
11149+ if (error)
11150+ return error;
24613191 11151+
4e97e4e9 11152+ sh->pagedir = pagedir1;
11153+ sh->pageset_2_size = pagedir2.size;
11154+ sh->param0 = toi_result;
11155+ sh->param1 = toi_bkd.toi_action;
11156+ sh->param2 = toi_bkd.toi_debug_state;
11157+ sh->param3 = toi_bkd.toi_default_console_level;
7f9d2ee0 11158+ sh->root_fs = current->fs->root.mnt->mnt_sb->s_dev;
4e97e4e9 11159+ for (i = 0; i < 4; i++)
11160+ sh->io_time[i/2][i%2] = toi_bkd.toi_io_time[i/2][i%2];
11161+ sh->bkd = boot_kernel_data_buffer;
11162+ return 0;
4e97e4e9 11163+}
43540741 11164+
24613191 11165+/*
4e97e4e9 11166+ * rw_init_modules
24613191 11167+ *
4e97e4e9 11168+ * Iterate over modules, preparing the ones that will be used to read or write
11169+ * data.
24613191 11170+ */
4e97e4e9 11171+static int rw_init_modules(int rw, int which)
24613191 11172+{
4e97e4e9 11173+ struct toi_module_ops *this_module;
11174+ /* Initialise page transformers */
11175+ list_for_each_entry(this_module, &toi_filters, type_list) {
11176+ if (!this_module->enabled)
11177+ continue;
11178+ if (this_module->rw_init && this_module->rw_init(rw, which)) {
11179+ abort_hibernate(TOI_FAILED_MODULE_INIT,
11180+ "Failed to initialise the %s filter.",
11181+ this_module->name);
11182+ return 1;
11183+ }
11184+ }
24613191 11185+
4e97e4e9 11186+ /* Initialise allocator */
11187+ if (toiActiveAllocator->rw_init(rw, which)) {
11188+ abort_hibernate(TOI_FAILED_MODULE_INIT,
ad8f4a28 11189+ "Failed to initialise the allocator.");
4e97e4e9 11190+ return 1;
11191+ }
24613191 11192+
4e97e4e9 11193+ /* Initialise other modules */
11194+ list_for_each_entry(this_module, &toi_modules, module_list) {
11195+ if (!this_module->enabled ||
11196+ this_module->type == FILTER_MODULE ||
11197+ this_module->type == WRITER_MODULE)
11198+ continue;
11199+ if (this_module->rw_init && this_module->rw_init(rw, which)) {
11200+ set_abort_result(TOI_FAILED_MODULE_INIT);
11201+ printk(KERN_INFO "Setting aborted flag due to module "
11202+ "init failure.\n");
4e97e4e9 11203+ return 1;
11204+ }
24613191 11205+ }
11206+
4e97e4e9 11207+ return 0;
24613191 11208+}
11209+
4e97e4e9 11210+/*
11211+ * rw_cleanup_modules
11212+ *
11213+ * Cleanup components after reading or writing a set of pages.
11214+ * Only the allocator may fail.
11215+ */
11216+static int rw_cleanup_modules(int rw)
24613191 11217+{
4e97e4e9 11218+ struct toi_module_ops *this_module;
11219+ int result = 0;
24613191 11220+
4e97e4e9 11221+ /* Cleanup other modules */
11222+ list_for_each_entry(this_module, &toi_modules, module_list) {
11223+ if (!this_module->enabled ||
11224+ this_module->type == FILTER_MODULE ||
11225+ this_module->type == WRITER_MODULE)
11226+ continue;
11227+ if (this_module->rw_cleanup)
11228+ result |= this_module->rw_cleanup(rw);
11229+ }
24613191 11230+
4e97e4e9 11231+ /* Flush data and cleanup */
11232+ list_for_each_entry(this_module, &toi_filters, type_list) {
11233+ if (!this_module->enabled)
11234+ continue;
11235+ if (this_module->rw_cleanup)
11236+ result |= this_module->rw_cleanup(rw);
11237+ }
24613191 11238+
4e97e4e9 11239+ result |= toiActiveAllocator->rw_cleanup(rw);
24613191 11240+
4e97e4e9 11241+ return result;
11242+}
24613191 11243+
4e97e4e9 11244+static struct page *copy_page_from_orig_page(struct page *orig_page)
11245+{
11246+ int is_high = PageHighMem(orig_page), index, min, max;
11247+ struct page *high_page = NULL,
11248+ **my_last_high_page = &__get_cpu_var(last_high_page),
11249+ **my_last_sought = &__get_cpu_var(last_sought);
11250+ struct pbe *this, **my_last_low_page = &__get_cpu_var(last_low_page);
11251+ void *compare;
24613191 11252+
4e97e4e9 11253+ if (is_high) {
11254+ if (*my_last_sought && *my_last_high_page &&
11255+ *my_last_sought < orig_page)
4e97e4e9 11256+ high_page = *my_last_high_page;
11257+ else
11258+ high_page = (struct page *) restore_highmem_pblist;
11259+ this = (struct pbe *) kmap(high_page);
11260+ compare = orig_page;
11261+ } else {
11262+ if (*my_last_sought && *my_last_low_page &&
11263+ *my_last_sought < orig_page)
4e97e4e9 11264+ this = *my_last_low_page;
11265+ else
11266+ this = restore_pblist;
11267+ compare = page_address(orig_page);
11268+ }
24613191 11269+
4e97e4e9 11270+ *my_last_sought = orig_page;
24613191 11271+
4e97e4e9 11272+ /* Locate page containing pbe */
11273+ while (this[PBES_PER_PAGE - 1].next &&
11274+ this[PBES_PER_PAGE - 1].orig_address < compare) {
4e97e4e9 11275+ if (is_high) {
11276+ struct page *next_high_page = (struct page *)
11277+ this[PBES_PER_PAGE - 1].next;
11278+ kunmap(high_page);
11279+ this = kmap(next_high_page);
11280+ high_page = next_high_page;
11281+ } else
11282+ this = this[PBES_PER_PAGE - 1].next;
24613191 11283+ }
11284+
4e97e4e9 11285+ /* Do a binary search within the page */
11286+ min = 0;
11287+ max = PBES_PER_PAGE;
11288+ index = PBES_PER_PAGE / 2;
11289+ while (max - min) {
11290+ if (!this[index].orig_address ||
11291+ this[index].orig_address > compare)
11292+ max = index;
11293+ else if (this[index].orig_address == compare) {
11294+ if (is_high) {
11295+ struct page *page = this[index].address;
11296+ *my_last_high_page = high_page;
11297+ kunmap(high_page);
11298+ return page;
11299+ }
11300+ *my_last_low_page = this;
11301+ return virt_to_page(this[index].address);
11302+ } else
11303+ min = index;
11304+ index = ((max + min) / 2);
11305+ };
11306+
11307+ if (is_high)
11308+ kunmap(high_page);
11309+
11310+ abort_hibernate(TOI_FAILED_IO, "Failed to get destination page for"
11311+ " orig page %p. This[min].orig_address=%p.\n", orig_page,
11312+ this[index].orig_address);
11313+ return NULL;
24613191 11314+}
11315+
4e97e4e9 11316+/*
11317+ * do_rw_loop
11318+ *
11319+ * The main I/O loop for reading or writing pages.
11320+ */
11321+static int worker_rw_loop(void *data)
24613191 11322+{
4e97e4e9 11323+ unsigned long orig_pfn, write_pfn;
7f9d2ee0 11324+ int result, my_io_index = 0, temp;
4e97e4e9 11325+ struct toi_module_ops *first_filter = toi_get_next_filter(NULL);
ad8f4a28 11326+ struct page *buffer = toi_alloc_page(28, TOI_ATOMIC_GFP);
24613191 11327+
7f9d2ee0 11328+ atomic_inc(&toi_io_workers);
4e97e4e9 11329+ mutex_lock(&io_mutex);
24613191 11330+
4e97e4e9 11331+ do {
11332+ int buf_size;
24613191 11333+
4e97e4e9 11334+ /*
11335+ * What page to use? If reading, don't know yet which page's
11336+ * data will be read, so always use the buffer. If writing,
11337+ * use the copy (Pageset1) or original page (Pageset2), but
11338+ * always write the pfn of the original page.
11339+ */
11340+ if (io_write) {
11341+ struct page *page;
11342+ char **my_checksum_locn = &__get_cpu_var(checksum_locn);
24613191 11343+
4e97e4e9 11344+ pfn = get_next_bit_on(&io_map, pfn);
24613191 11345+
4e97e4e9 11346+ /* Another thread could have beaten us to it. */
11347+ if (pfn == max_pfn + 1) {
11348+ if (atomic_read(&io_count)) {
11349+ printk("Ran out of pfns but io_count "
11350+ "is still %d.\n",
11351+ atomic_read(&io_count));
4e97e4e9 11352+ BUG();
11353+ }
11354+ break;
11355+ }
24613191 11356+
7f9d2ee0 11357+ my_io_index = io_finish_at -
11358+ atomic_sub_return(1, &io_count);
24613191 11359+
4e97e4e9 11360+ orig_pfn = pfn;
11361+ write_pfn = pfn;
24613191 11362+
ad8f4a28 11363+ /*
4e97e4e9 11364+ * Other_pfn is updated by all threads, so we're not
11365+ * writing the same page multiple times.
11366+ */
11367+ clear_dynpageflag(&io_map, pfn_to_page(pfn));
11368+ if (io_pageset == 1) {
11369+ other_pfn = get_next_bit_on(&pageset1_map,
11370+ other_pfn);
4e97e4e9 11371+ write_pfn = other_pfn;
11372+ }
11373+ page = pfn_to_page(pfn);
24613191 11374+
4e97e4e9 11375+ if (io_pageset == 2)
11376+ *my_checksum_locn =
11377+ tuxonice_get_next_checksum();
24613191 11378+
4e97e4e9 11379+ mutex_unlock(&io_mutex);
24613191 11380+
4e97e4e9 11381+ if (io_pageset == 2 &&
11382+ tuxonice_calc_checksum(page, *my_checksum_locn))
11383+ return 1;
24613191 11384+
4e97e4e9 11385+ result = first_filter->write_page(write_pfn, page,
11386+ PAGE_SIZE);
11387+ } else {
7f9d2ee0 11388+ my_io_index = io_finish_at -
11389+ atomic_sub_return(1, &io_count);
4e97e4e9 11390+ mutex_unlock(&io_mutex);
24613191 11391+
ad8f4a28 11392+ /*
4e97e4e9 11393+ * Are we aborting? If so, don't submit any more I/O as
11394+ * resetting the resume_attempted flag (from ui.c) will
11395+ * clear the bdev flags, making this thread oops.
11396+ */
11397+ if (unlikely(test_toi_state(TOI_STOP_RESUME))) {
7f9d2ee0 11398+ atomic_dec(&toi_io_workers);
11399+ if (!atomic_read(&toi_io_workers))
4e97e4e9 11400+ set_toi_state(TOI_IO_STOPPED);
11401+ while (1)
11402+ schedule();
11403+ }
24613191 11404+
4e97e4e9 11405+ result = first_filter->read_page(&write_pfn, buffer,
11406+ &buf_size);
11407+ if (buf_size != PAGE_SIZE) {
11408+ abort_hibernate(TOI_FAILED_IO,
11409+ "I/O pipeline returned %d bytes instead"
11410+ " of %d.\n", buf_size, PAGE_SIZE);
4e97e4e9 11411+ mutex_lock(&io_mutex);
11412+ break;
11413+ }
11414+ }
24613191 11415+
4e97e4e9 11416+ if (result) {
11417+ io_result = result;
11418+ if (io_write) {
11419+ printk(KERN_INFO "Write chunk returned %d.\n",
11420+ result);
4e97e4e9 11421+ abort_hibernate(TOI_FAILED_IO,
11422+ "Failed to write a chunk of the "
11423+ "image.");
11424+ mutex_lock(&io_mutex);
11425+ break;
11426+ }
11427+ panic("Read chunk returned (%d)", result);
11428+ }
24613191 11429+
ad8f4a28 11430+ /*
4e97e4e9 11431+ * Discard reads of resaved pages while reading ps2
11432+ * and unwanted pages while rereading ps2 when aborting.
11433+ */
11434+ if (!io_write && !PageResave(pfn_to_page(write_pfn))) {
11435+ struct page *final_page = pfn_to_page(write_pfn),
11436+ *copy_page = final_page;
11437+ char *virt, *buffer_virt;
24613191 11438+
4e97e4e9 11439+ if (io_pageset == 1 && !load_direct(final_page)) {
11440+ copy_page =
11441+ copy_page_from_orig_page(final_page);
4e97e4e9 11442+ BUG_ON(!copy_page);
11443+ }
24613191 11444+
4e97e4e9 11445+ if (test_dynpageflag(&io_map, final_page)) {
11446+ virt = kmap(copy_page);
11447+ buffer_virt = kmap(buffer);
11448+ memcpy(virt, buffer_virt, PAGE_SIZE);
11449+ kunmap(copy_page);
11450+ kunmap(buffer);
11451+ clear_dynpageflag(&io_map, final_page);
4e97e4e9 11452+ } else {
11453+ mutex_lock(&io_mutex);
11454+ atomic_inc(&io_count);
11455+ mutex_unlock(&io_mutex);
11456+ }
11457+ }
24613191 11458+
7f9d2ee0 11459+ temp = my_io_index + io_base - io_nextupdate;
11460+
11461+ if (my_io_index + io_base == io_nextupdate)
4e97e4e9 11462+ io_nextupdate = toi_update_status(my_io_index +
11463+ io_base, io_barmax, " %d/%d MB ",
11464+ MB(io_base+my_io_index+1), MB(io_barmax));
24613191 11465+
7f9d2ee0 11466+ if (my_io_index == io_pc) {
11467+ printk("%s%d%%...", io_pc_step == 1 ? KERN_ERR : "",
ad8f4a28 11468+ 20 * io_pc_step);
4e97e4e9 11469+ io_pc_step++;
11470+ io_pc = io_finish_at * io_pc_step / 5;
11471+ }
ad8f4a28 11472+
4e97e4e9 11473+ toi_cond_pause(0, NULL);
24613191 11474+
ad8f4a28 11475+ /*
4e97e4e9 11476+ * Subtle: If there's less I/O still to be done than threads
11477+ * running, quit. This stops us doing I/O beyond the end of
11478+ * the image when reading.
11479+ *
11480+ * Possible race condition. Two threads could do the test at
11481+ * the same time; one should exit and one should continue.
11482+ * Therefore we take the mutex before comparing and exiting.
11483+ */
24613191 11484+
4e97e4e9 11485+ mutex_lock(&io_mutex);
24613191 11486+
7f9d2ee0 11487+ } while (atomic_read(&io_count) >= atomic_read(&toi_io_workers) &&
4e97e4e9 11488+ !(io_write && test_result_state(TOI_ABORTED)));
24613191 11489+
7f9d2ee0 11490+ if (atomic_dec_and_test(&toi_io_workers)) {
11491+ toi_bio_queue_flusher_should_finish = 1;
11492+ wake_up(&toi_io_queue_flusher);
11493+ }
4e97e4e9 11494+ mutex_unlock(&io_mutex);
24613191 11495+
ad8f4a28 11496+ toi__free_page(28, buffer);
24613191 11497+
4e97e4e9 11498+ return 0;
11499+}
24613191 11500+
7f9d2ee0 11501+int start_other_threads(void)
24613191 11502+{
7f9d2ee0 11503+ int cpu, num_started = 0;
4e97e4e9 11504+ struct task_struct *p;
24613191 11505+
4e97e4e9 11506+ for_each_online_cpu(cpu) {
11507+ if (cpu == smp_processor_id())
11508+ continue;
24613191 11509+
4e97e4e9 11510+ p = kthread_create(worker_rw_loop, NULL, "ks2io/%d", cpu);
11511+ if (IS_ERR(p)) {
11512+ printk("ks2io for %i failed\n", cpu);
11513+ continue;
11514+ }
11515+ kthread_bind(p, cpu);
ad8f4a28 11516+ p->flags |= PF_MEMALLOC;
4e97e4e9 11517+ wake_up_process(p);
7f9d2ee0 11518+ num_started++;
4e97e4e9 11519+ }
7f9d2ee0 11520+
11521+ return num_started;
24613191 11522+}
4e97e4e9 11523+
24613191 11524+/*
4e97e4e9 11525+ * do_rw_loop
24613191 11526+ *
4e97e4e9 11527+ * The main I/O loop for reading or writing pages.
24613191 11528+ */
4e97e4e9 11529+static int do_rw_loop(int write, int finish_at, struct dyn_pageflags *pageflags,
11530+ int base, int barmax, int pageset)
24613191 11531+{
7f9d2ee0 11532+ int index = 0, cpu, num_other_threads = 0;
24613191 11533+
4e97e4e9 11534+ if (!finish_at)
11535+ return 0;
24613191 11536+
4e97e4e9 11537+ io_write = write;
11538+ io_finish_at = finish_at;
11539+ io_base = base;
11540+ io_barmax = barmax;
11541+ io_pageset = pageset;
11542+ io_index = 0;
11543+ io_pc = io_finish_at / 5;
11544+ io_pc_step = 1;
11545+ io_result = 0;
7f9d2ee0 11546+ io_nextupdate = base + 1;
11547+ toi_bio_queue_flusher_should_finish = 0;
24613191 11548+
4e97e4e9 11549+ for_each_online_cpu(cpu) {
11550+ per_cpu(last_sought, cpu) = NULL;
11551+ per_cpu(last_low_page, cpu) = NULL;
11552+ per_cpu(last_high_page, cpu) = NULL;
11553+ }
24613191 11554+
4e97e4e9 11555+ /* Ensure all bits clear */
11556+ clear_dyn_pageflags(&io_map);
24613191 11557+
4e97e4e9 11558+ /* Set the bits for the pages to write */
11559+ pfn = get_next_bit_on(pageflags, max_pfn + 1);
24613191 11560+
4e97e4e9 11561+ while (pfn < max_pfn + 1 && index < finish_at) {
11562+ set_dynpageflag(&io_map, pfn_to_page(pfn));
11563+ pfn = get_next_bit_on(pageflags, pfn);
11564+ index++;
11565+ }
24613191 11566+
4e97e4e9 11567+ BUG_ON(index < finish_at);
24613191 11568+
4e97e4e9 11569+ atomic_set(&io_count, finish_at);
24613191 11570+
4e97e4e9 11571+ pfn = max_pfn + 1;
11572+ other_pfn = pfn;
24613191 11573+
4e97e4e9 11574+ clear_toi_state(TOI_IO_STOPPED);
24613191 11575+
4e97e4e9 11576+ if (!test_action_state(TOI_NO_MULTITHREADED_IO))
7f9d2ee0 11577+ num_other_threads = start_other_threads();
4e97e4e9 11578+
7f9d2ee0 11579+ if (!num_other_threads || !toiActiveAllocator->io_flusher ||
11580+ test_action_state(TOI_NO_FLUSHER_THREAD))
11581+ worker_rw_loop(NULL);
11582+ else
11583+ toiActiveAllocator->io_flusher(write);
11584+
11585+ while (atomic_read(&toi_io_workers))
4e97e4e9 11586+ schedule();
11587+
11588+ set_toi_state(TOI_IO_STOPPED);
11589+ if (unlikely(test_toi_state(TOI_STOP_RESUME))) {
11590+ while (1)
11591+ schedule();
11592+ }
11593+
11594+ if (!io_result) {
11595+ printk("done.\n");
11596+
11597+ toi_update_status(io_base + io_finish_at, io_barmax,
11598+ " %d/%d MB ",
4e97e4e9 11599+ MB(io_base + io_finish_at), MB(io_barmax));
11600+ }
11601+
11602+ if (io_write && test_result_state(TOI_ABORTED))
11603+ io_result = 1;
11604+ else { /* All I/O done? */
11605+ if (get_next_bit_on(&io_map, max_pfn + 1) != max_pfn + 1) {
11606+ printk(KERN_INFO "Finished I/O loop but still work to "
11607+ "do?\nFinish at = %d. io_count = %d.\n",
11608+ finish_at, atomic_read(&io_count));
4e97e4e9 11609+ BUG();
11610+ }
73c609d5 11611+ }
4e97e4e9 11612+
11613+ return io_result;
24613191 11614+}
11615+
4e97e4e9 11616+/* write_pageset()
11617+ *
11618+ * Description: Write a pageset to disk.
11619+ * Arguments: pagedir: Which pagedir to write.
11620+ * Returns: Zero on success or -1 on failure.
24613191 11621+ */
24613191 11622+
4e97e4e9 11623+int write_pageset(struct pagedir *pagedir)
11624+{
11625+ int finish_at, base = 0, start_time, end_time;
11626+ int barmax = pagedir1.size + pagedir2.size;
11627+ long error = 0;
11628+ struct dyn_pageflags *pageflags;
24613191 11629+
ad8f4a28 11630+ /*
4e97e4e9 11631+ * Even if there is nothing to read or write, the allocator
11632+ * may need the init/cleanup for its housekeeping. (e.g.:
11633+ * Pageset1 may start where pageset2 ends when writing).
11634+ */
11635+ finish_at = pagedir->size;
24613191 11636+
4e97e4e9 11637+ if (pagedir->id == 1) {
11638+ toi_prepare_status(DONT_CLEAR_BAR,
11639+ "Writing kernel & process data...");
11640+ base = pagedir2.size;
11641+ if (test_action_state(TOI_TEST_FILTER_SPEED) ||
11642+ test_action_state(TOI_TEST_BIO))
11643+ pageflags = &pageset1_map;
11644+ else
11645+ pageflags = &pageset1_copy_map;
11646+ } else {
11647+ toi_prepare_status(CLEAR_BAR, "Writing caches...");
11648+ pageflags = &pageset2_map;
11649+ }
11650+
4e97e4e9 11651+ start_time = jiffies;
24613191 11652+
4e97e4e9 11653+ if (rw_init_modules(1, pagedir->id)) {
11654+ abort_hibernate(TOI_FAILED_MODULE_INIT,
11655+ "Failed to initialise modules for writing.");
11656+ error = 1;
24613191 11657+ }
11658+
4e97e4e9 11659+ if (!error)
11660+ error = do_rw_loop(1, finish_at, pageflags, base, barmax,
11661+ pagedir->id);
24613191 11662+
4e97e4e9 11663+ if (rw_cleanup_modules(WRITE) && !error) {
11664+ abort_hibernate(TOI_FAILED_MODULE_CLEANUP,
11665+ "Failed to cleanup after writing.");
11666+ error = 1;
24613191 11667+ }
11668+
4e97e4e9 11669+ end_time = jiffies;
ad8f4a28 11670+
4e97e4e9 11671+ if ((end_time - start_time) && (!test_result_state(TOI_ABORTED))) {
11672+ toi_bkd.toi_io_time[0][0] += finish_at,
11673+ toi_bkd.toi_io_time[0][1] += (end_time - start_time);
4e97e4e9 11674+ }
24613191 11675+
4e97e4e9 11676+ return error;
24613191 11677+}
11678+
4e97e4e9 11679+/* read_pageset()
43540741 11680+ *
4e97e4e9 11681+ * Description: Read a pageset from disk.
11682+ * Arguments: whichtowrite: Controls what debugging output is printed.
11683+ * overwrittenpagesonly: Whether to read the whole pageset or
11684+ * only part.
11685+ * Returns: Zero on success or -1 on failure.
43540741 11686+ */
43540741 11687+
4e97e4e9 11688+static int read_pageset(struct pagedir *pagedir, int overwrittenpagesonly)
43540741 11689+{
4e97e4e9 11690+ int result = 0, base = 0, start_time, end_time;
11691+ int finish_at = pagedir->size;
11692+ int barmax = pagedir1.size + pagedir2.size;
11693+ struct dyn_pageflags *pageflags;
43540741 11694+
4e97e4e9 11695+ if (pagedir->id == 1) {
11696+ toi_prepare_status(CLEAR_BAR,
11697+ "Reading kernel & process data...");
11698+ pageflags = &pageset1_map;
11699+ } else {
11700+ toi_prepare_status(DONT_CLEAR_BAR, "Reading caches...");
11701+ if (overwrittenpagesonly)
ad8f4a28 11702+ barmax = finish_at = min(pagedir1.size,
4e97e4e9 11703+ pagedir2.size);
ad8f4a28 11704+ else
4e97e4e9 11705+ base = pagedir1.size;
4e97e4e9 11706+ pageflags = &pageset2_map;
11707+ }
11708+
4e97e4e9 11709+ start_time = jiffies;
43540741 11710+
4e97e4e9 11711+ if (rw_init_modules(0, pagedir->id)) {
11712+ toiActiveAllocator->remove_image();
11713+ result = 1;
11714+ } else
11715+ result = do_rw_loop(0, finish_at, pageflags, base, barmax,
11716+ pagedir->id);
11717+
11718+ if (rw_cleanup_modules(READ) && !result) {
11719+ abort_hibernate(TOI_FAILED_MODULE_CLEANUP,
11720+ "Failed to cleanup after reading.");
11721+ result = 1;
43540741 11722+ }
4e97e4e9 11723+
11724+ /* Statistics */
ad8f4a28 11725+ end_time = jiffies;
4e97e4e9 11726+
11727+ if ((end_time - start_time) && (!test_result_state(TOI_ABORTED))) {
 11728+		toi_bkd.toi_io_time[1][0] += finish_at;
11729+ toi_bkd.toi_io_time[1][1] += (end_time - start_time);
4e97e4e9 11730+ }
11731+
11732+ return result;
43540741 11733+}
11734+
4e97e4e9 11735+/* write_module_configs()
11736+ *
11737+ * Description: Store the configuration for each module in the image header.
11738+ * Returns: Int: Zero on success, Error value otherwise.
e8d0ad9d 11739+ */
4e97e4e9 11740+static int write_module_configs(void)
24613191 11741+{
4e97e4e9 11742+ struct toi_module_ops *this_module;
11743+ char *buffer = (char *) toi_get_zeroed_page(22, TOI_ATOMIC_GFP);
11744+ int len, index = 1;
11745+ struct toi_module_header toi_module_header;
24613191 11746+
4e97e4e9 11747+ if (!buffer) {
ad8f4a28 11748+ printk(KERN_INFO "Failed to allocate a buffer for saving "
4e97e4e9 11749+ "module configuration info.\n");
11750+ return -ENOMEM;
11751+ }
11752+
11753+ /*
4e97e4e9 11754+	 * We have to know which data goes with which module, so we
 11755+	 * write at least a length of zero for every module. Note that we are
11756+ * also assuming every module's config data takes <= PAGE_SIZE.
11757+ */
24613191 11758+
4e97e4e9 11759+ /* For each module (in registration order) */
11760+ list_for_each_entry(this_module, &toi_modules, module_list) {
11761+ if (!this_module->enabled || !this_module->storage_needed ||
11762+ (this_module->type == WRITER_MODULE &&
11763+ toiActiveAllocator != this_module))
11764+ continue;
24613191 11765+
4e97e4e9 11766+ /* Get the data from the module */
11767+ len = 0;
11768+ if (this_module->save_config_info)
11769+ len = this_module->save_config_info(buffer);
24613191 11770+
4e97e4e9 11771+ /* Save the details of the module */
11772+ toi_module_header.enabled = this_module->enabled;
11773+ toi_module_header.type = this_module->type;
11774+ toi_module_header.index = index++;
ad8f4a28 11775+ strncpy(toi_module_header.name, this_module->name,
4e97e4e9 11776+ sizeof(toi_module_header.name));
11777+ toiActiveAllocator->rw_header_chunk(WRITE,
11778+ this_module,
11779+ (char *) &toi_module_header,
11780+ sizeof(toi_module_header));
24613191 11781+
4e97e4e9 11782+ /* Save the size of the data and any data returned */
11783+ toiActiveAllocator->rw_header_chunk(WRITE,
11784+ this_module,
11785+ (char *) &len, sizeof(int));
11786+ if (len)
11787+ toiActiveAllocator->rw_header_chunk(
11788+ WRITE, this_module, buffer, len);
24613191 11789+ }
24613191 11790+
4e97e4e9 11791+ /* Write a blank header to terminate the list */
11792+ toi_module_header.name[0] = '\0';
11793+ toiActiveAllocator->rw_header_chunk(WRITE, NULL,
11794+ (char *) &toi_module_header, sizeof(toi_module_header));
4e97e4e9 11795+
ad8f4a28 11796+ toi_free_page(22, (unsigned long) buffer);
4e97e4e9 11797+ return 0;
24613191 11798+}
11799+
4e97e4e9 11800+/* read_module_configs()
11801+ *
11802+ * Description: Reload module configurations from the image header.
11803+ * Returns: Int. Zero on success, error value otherwise.
e8d0ad9d 11804+ */
11805+
4e97e4e9 11806+static int read_module_configs(void)
24613191 11807+{
4e97e4e9 11808+ struct toi_module_ops *this_module;
11809+ char *buffer = (char *) toi_get_zeroed_page(23, TOI_ATOMIC_GFP);
11810+ int len, result = 0;
11811+ struct toi_module_header toi_module_header;
e8d0ad9d 11812+
4e97e4e9 11813+ if (!buffer) {
11814+ printk("Failed to allocate a buffer for reloading module "
11815+ "configuration info.\n");
11816+ return -ENOMEM;
11817+ }
ad8f4a28 11818+
4e97e4e9 11819+ /* All modules are initially disabled. That way, if we have a module
11820+ * loaded now that wasn't loaded when we hibernated, it won't be used
11821+ * in trying to read the data.
11822+ */
11823+ list_for_each_entry(this_module, &toi_modules, module_list)
11824+ this_module->enabled = 0;
ad8f4a28 11825+
4e97e4e9 11826+ /* Get the first module header */
11827+ result = toiActiveAllocator->rw_header_chunk(READ, NULL,
11828+ (char *) &toi_module_header,
11829+ sizeof(toi_module_header));
11830+ if (result) {
11831+ printk("Failed to read the next module header.\n");
ad8f4a28 11832+ toi_free_page(23, (unsigned long) buffer);
4e97e4e9 11833+ return -EINVAL;
11834+ }
24613191 11835+
4e97e4e9 11836+ /* For each module (in registration order) */
11837+ while (toi_module_header.name[0]) {
24613191 11838+
4e97e4e9 11839+ /* Find the module */
11840+ this_module =
11841+ toi_find_module_given_name(toi_module_header.name);
24613191 11842+
4e97e4e9 11843+ if (!this_module) {
11844+ /*
11845+ * Is it used? Only need to worry about filters. The
11846+ * active allocator must be loaded!
4e97e4e9 11847+ */
11848+ if (toi_module_header.enabled) {
11849+ toi_early_boot_message(1, TOI_CONTINUE_REQ,
11850+ "It looks like we need module %s for "
11851+ "reading the image but it hasn't been "
11852+ "registered.\n",
11853+ toi_module_header.name);
11854+ if (!(test_toi_state(TOI_CONTINUE_REQ))) {
11855+ toi_free_page(23,
11856+ (unsigned long) buffer);
4e97e4e9 11857+ return -EINVAL;
11858+ }
11859+ } else
11860+ printk(KERN_INFO "Module %s configuration data "
11861+ "found, but the module hasn't "
11862+ "registered. Looks like it was "
 11863+				       "disabled, so we're ignoring its data.\n",
4e97e4e9 11864+ toi_module_header.name);
11865+ }
ad8f4a28 11866+
4e97e4e9 11867+ /* Get the length of the data (if any) */
11868+ result = toiActiveAllocator->rw_header_chunk(READ, NULL,
11869+ (char *) &len, sizeof(int));
11870+ if (result) {
11871+ printk("Failed to read the length of the module %s's"
11872+ " configuration data.\n",
11873+ toi_module_header.name);
ad8f4a28 11874+ toi_free_page(23, (unsigned long) buffer);
4e97e4e9 11875+ return -EINVAL;
11876+ }
24613191 11877+
4e97e4e9 11878+ /* Read any data and pass to the module (if we found one) */
11879+ if (len) {
11880+ toiActiveAllocator->rw_header_chunk(READ, NULL,
11881+ buffer, len);
11882+ if (this_module) {
 11883+				if (!this_module->load_config_info) {
11884+ printk("Huh? Module %s appears to have "
11885+ "a save_config_info, but not a "
11886+ "load_config_info function!\n",
11887+ this_module->name);
11888+ } else
11889+ this_module->load_config_info(buffer,
11890+ len);
4e97e4e9 11891+ }
11892+ }
24613191 11893+
4e97e4e9 11894+ if (this_module) {
11895+ /* Now move this module to the tail of its lists. This
11896+ * will put it in order. Any new modules will end up at
11897+ * the top of the lists. They should have been set to
11898+ * disabled when loaded (people will normally not edit
11899+ * an initrd to load a new module and then hibernate
11900+ * without using it!).
11901+ */
24613191 11902+
4e97e4e9 11903+ toi_move_module_tail(this_module);
24613191 11904+
ad8f4a28 11905+ /*
4e97e4e9 11906+ * We apply the disabled state; modules don't need to
11907+ * save whether they were disabled and if they do, we
11908+ * override them anyway.
11909+ */
11910+ this_module->enabled = toi_module_header.enabled;
11911+ }
24613191 11912+
4e97e4e9 11913+ /* Get the next module header */
11914+ result = toiActiveAllocator->rw_header_chunk(READ, NULL,
11915+ (char *) &toi_module_header,
11916+ sizeof(toi_module_header));
11917+
11918+ if (result) {
11919+ printk("Failed to read the next module header.\n");
ad8f4a28 11920+ toi_free_page(23, (unsigned long) buffer);
4e97e4e9 11921+ return -EINVAL;
11922+ }
24613191 11923+
24613191 11924+ }
11925+
ad8f4a28 11926+ toi_free_page(23, (unsigned long) buffer);
4e97e4e9 11927+ return 0;
11928+}
24613191 11929+
4e97e4e9 11930+/* write_image_header()
11931+ *
 11932+ * Description:	Write the image header after writing the image proper.
11933+ * Returns: Int. Zero on success or -1 on failure.
11934+ */
e8d0ad9d 11935+
4e97e4e9 11936+int write_image_header(void)
11937+{
11938+ int ret;
 11939+	int total = pagedir1.size + pagedir2.size + 2;
11940+ char *header_buffer = NULL;
24613191 11941+
4e97e4e9 11942+ /* Now prepare to write the header */
11943+ ret = toiActiveAllocator->write_header_init();
11944+ if (ret) {
4e97e4e9 11945+ abort_hibernate(TOI_FAILED_MODULE_INIT,
11946+ "Active allocator's write_header_init"
11947+ " function failed.");
11948+ goto write_image_header_abort;
e8d0ad9d 11949+ }
e8d0ad9d 11950+
4e97e4e9 11951+ /* Get a buffer */
11952+ header_buffer = (char *) toi_get_zeroed_page(24, TOI_ATOMIC_GFP);
11953+ if (!header_buffer) {
11954+ abort_hibernate(TOI_OUT_OF_MEMORY,
11955+ "Out of memory when trying to get page for header!");
11956+ goto write_image_header_abort;
e8d0ad9d 11957+ }
11958+
4e97e4e9 11959+ /* Write hibernate header */
11960+ if (fill_toi_header((struct toi_header *) header_buffer)) {
11961+ abort_hibernate(TOI_OUT_OF_MEMORY,
11962+ "Failure to fill header information!");
11963+ goto write_image_header_abort;
11964+ }
4e97e4e9 11965+ toiActiveAllocator->rw_header_chunk(WRITE, NULL,
11966+ header_buffer, sizeof(struct toi_header));
e8d0ad9d 11967+
ad8f4a28 11968+ toi_free_page(24, (unsigned long) header_buffer);
e8d0ad9d 11969+
4e97e4e9 11970+ /* Write module configurations */
11971+ ret = write_module_configs();
11972+ if (ret) {
4e97e4e9 11973+ abort_hibernate(TOI_FAILED_IO,
11974+ "Failed to write module configs.");
11975+ goto write_image_header_abort;
11976+ }
e8d0ad9d 11977+
4e97e4e9 11978+ save_dyn_pageflags(&pageset1_map);
43540741 11979+
4e97e4e9 11980+ /* Flush data and let allocator cleanup */
11981+ if (toiActiveAllocator->write_header_cleanup()) {
11982+ abort_hibernate(TOI_FAILED_IO,
11983+ "Failed to cleanup writing header.");
11984+ goto write_image_header_abort_no_cleanup;
11985+ }
e8d0ad9d 11986+
4e97e4e9 11987+ if (test_result_state(TOI_ABORTED))
11988+ goto write_image_header_abort_no_cleanup;
43540741 11989+
4e97e4e9 11990+ toi_update_status(total, total, NULL);
e8d0ad9d 11991+
e8d0ad9d 11992+ return 0;
4e97e4e9 11993+
11994+write_image_header_abort:
11995+ toiActiveAllocator->write_header_cleanup();
11996+write_image_header_abort_no_cleanup:
11997+ return -1;
e8d0ad9d 11998+}
11999+
4e97e4e9 12000+/* sanity_check()
12001+ *
12002+ * Description: Perform a few checks, seeking to ensure that the kernel being
12003+ * booted matches the one hibernated. They need to match so we can
12004+ * be _sure_ things will work. It is not absolutely impossible for
12005+ * resuming from a different kernel to work, just not assured.
12006+ * Arguments: Struct toi_header. The header which was saved at hibernate
12007+ * time.
12008+ */
12009+static char *sanity_check(struct toi_header *sh)
e8d0ad9d 12010+{
ad8f4a28 12011+ char *reason = check_swsusp_image_kernel((struct swsusp_info *) sh);
e8d0ad9d 12012+
12013+ if (reason)
12014+ return reason;
4e97e4e9 12015+
12016+ if (!test_action_state(TOI_IGNORE_ROOTFS)) {
12017+ const struct super_block *sb;
12018+ list_for_each_entry(sb, &super_blocks, s_list) {
12019+ if ((!(sb->s_flags & MS_RDONLY)) &&
12020+ (sb->s_type->fs_flags & FS_REQUIRES_DEV))
12021+ return "Device backed fs has been mounted "
12022+ "rw prior to resume or initrd/ramfs "
12023+ "is mounted rw.";
12024+ }
12025+ }
e8d0ad9d 12026+
e8d0ad9d 12027+	return NULL;
24613191 12028+}
12029+
4e97e4e9 12030+/* __read_pageset1
12031+ *
12032+ * Description: Test for the existence of an image and attempt to load it.
12033+ * Returns: Int. Zero if image found and pageset1 successfully loaded.
12034+ * Error if no image found or loaded.
24613191 12035+ */
4e97e4e9 12036+static int __read_pageset1(void)
ad8f4a28 12037+{
4e97e4e9 12038+ int i, result = 0;
12039+ char *header_buffer = (char *) toi_get_zeroed_page(25, TOI_ATOMIC_GFP),
12040+ *sanity_error = NULL;
12041+ struct toi_header *toi_header;
24613191 12042+
4e97e4e9 12043+ if (!header_buffer) {
 12044+		printk(KERN_INFO "Unable to allocate a page for reading the "
 12045+				"image header.\n");
4e97e4e9 12046+ return -ENOMEM;
12047+ }
ad8f4a28 12048+
4e97e4e9 12049+ /* Check for an image */
7f9d2ee0 12050+ result = toiActiveAllocator->image_exists(1);
ad8f4a28 12051+ if (!result) {
4e97e4e9 12052+ result = -ENODATA;
12053+ noresume_reset_modules();
ad8f4a28 12054+ printk(KERN_INFO "TuxOnIce: No image found.\n");
4e97e4e9 12055+ goto out;
12056+ }
24613191 12057+
7f9d2ee0 12058+ /*
 12059+	 * Prepare the active allocator for reading the image header. The
 12060+	 * active allocator might read its own configuration.
 12061+	 *
 12062+	 * NB: This call may never return: if we find a signature for a
 12063+	 * different image, we warn the user, who may choose to reboot
 12064+	 * rather than continue. (The device ids may look wrong, e.g.
 12065+	 * 2.4 vs 2.6, or the image may be unavailable because it was
 12066+	 * stored over a network connection.)
12067+ */
12068+
12069+ result = toiActiveAllocator->read_header_init();
12070+ if (result) {
 12071+		printk("TuxOnIce: Failed to initialise reading of the image "
12072+ "header.\n");
12073+ goto out_remove_image;
12074+ }
12075+
4e97e4e9 12076+ /* Check for noresume command line option */
12077+ if (test_toi_state(TOI_NORESUME_SPECIFIED)) {
12078+ printk(KERN_INFO "TuxOnIce: Noresume on command line. Removed "
12079+ "image.\n");
4e97e4e9 12080+ goto out_remove_image;
12081+ }
24613191 12082+
4e97e4e9 12083+ /* Check whether we've resumed before */
12084+ if (test_toi_state(TOI_RESUMED_BEFORE)) {
12085+ toi_early_boot_message(1, 0, NULL);
12086+ if (!(test_toi_state(TOI_CONTINUE_REQ))) {
ad8f4a28 12087+ printk(KERN_INFO "TuxOnIce: Tried to resume before: "
4e97e4e9 12088+ "Invalidated image.\n");
12089+ goto out_remove_image;
12090+ }
12091+ }
24613191 12092+
4e97e4e9 12093+ clear_toi_state(TOI_CONTINUE_REQ);
24613191 12094+
4e97e4e9 12095+ /* Read hibernate header */
12096+ result = toiActiveAllocator->rw_header_chunk(READ, NULL,
12097+ header_buffer, sizeof(struct toi_header));
12098+ if (result < 0) {
4e97e4e9 12099+		printk("TuxOnIce: Failed to read the image header.\n");
12100+ goto out_remove_image;
12101+ }
ad8f4a28 12102+
4e97e4e9 12103+ toi_header = (struct toi_header *) header_buffer;
24613191 12104+
4e97e4e9 12105+ /*
12106+ * NB: This call may also result in a reboot rather than returning.
12107+ */
e8d0ad9d 12108+
4e97e4e9 12109+ sanity_error = sanity_check(toi_header);
12110+ if (sanity_error) {
12111+ toi_early_boot_message(1, TOI_CONTINUE_REQ,
12112+ sanity_error);
ad8f4a28 12113+ printk(KERN_INFO "TuxOnIce: Sanity check failed.\n");
4e97e4e9 12114+ goto out_remove_image;
12115+ }
43540741 12116+
4e97e4e9 12117+ /*
12118+ * We have an image and it looks like it will load okay.
12119+ *
12120+ * Get metadata from header. Don't override commandline parameters.
12121+ *
12122+ * We don't need to save the image size limit because it's not used
12123+ * during resume and will be restored with the image anyway.
12124+ */
ad8f4a28 12125+
4e97e4e9 12126+ memcpy((char *) &pagedir1,
12127+ (char *) &toi_header->pagedir, sizeof(pagedir1));
12128+ toi_result = toi_header->param0;
12129+ toi_bkd.toi_action = toi_header->param1;
12130+ toi_bkd.toi_debug_state = toi_header->param2;
12131+ toi_bkd.toi_default_console_level = toi_header->param3;
4e97e4e9 12132+ clear_toi_state(TOI_IGNORE_LOGLEVEL);
12133+ pagedir2.size = toi_header->pageset_2_size;
12134+ for (i = 0; i < 4; i++)
ad8f4a28 12135+ toi_bkd.toi_io_time[i/2][i%2] =
4e97e4e9 12136+ toi_header->io_time[i/2][i%2];
ad8f4a28 12137+ boot_kernel_data_buffer = toi_header->bkd;
4e97e4e9 12138+
12139+ /* Read module configurations */
12140+ result = read_module_configs();
12141+ if (result) {
4e97e4e9 12142+ pagedir1.size = pagedir2.size = 0;
ad8f4a28 12143+ printk(KERN_INFO "TuxOnIce: Failed to read TuxOnIce module "
4e97e4e9 12144+ "configurations.\n");
12145+ clear_action_state(TOI_KEEP_IMAGE);
12146+ goto out_remove_image;
24613191 12147+ }
12148+
4e97e4e9 12149+ toi_prepare_console();
24613191 12150+
4e97e4e9 12151+ set_toi_state(TOI_NOW_RESUMING);
24613191 12152+
4e97e4e9 12153+ if (pre_resume_freeze())
12154+ goto out_reset_console;
24613191 12155+
4e97e4e9 12156+ toi_cond_pause(1, "About to read original pageset1 locations.");
24613191 12157+
4e97e4e9 12158+ /*
12159+ * Read original pageset1 locations. These are the addresses we can't
12160+ * use for the data to be restored.
12161+ */
24613191 12162+
4e97e4e9 12163+ if (allocate_dyn_pageflags(&pageset1_map, 0) ||
12164+ allocate_dyn_pageflags(&pageset1_copy_map, 0) ||
12165+ allocate_dyn_pageflags(&io_map, 0))
12166+ goto out_reset_console;
24613191 12167+
4e97e4e9 12168+ if (load_dyn_pageflags(&pageset1_map))
12169+ goto out_reset_console;
24613191 12170+
4e97e4e9 12171+ /* Clean up after reading the header */
12172+ result = toiActiveAllocator->read_header_cleanup();
12173+ if (result) {
4e97e4e9 12174+ printk("TuxOnIce: Failed to cleanup after reading the image "
12175+ "header.\n");
12176+ goto out_reset_console;
12177+ }
24613191 12178+
4e97e4e9 12179+ toi_cond_pause(1, "About to read pagedir.");
24613191 12180+
ad8f4a28 12181+ /*
4e97e4e9 12182+ * Get the addresses of pages into which we will load the kernel to
12183+ * be copied back
12184+ */
12185+ if (toi_get_pageset1_load_addresses()) {
12186+ printk(KERN_INFO "TuxOnIce: Failed to get load addresses for "
12187+ "pageset1.\n");
4e97e4e9 12188+ goto out_reset_console;
12189+ }
24613191 12190+
4e97e4e9 12191+ /* Read the original kernel back */
12192+ toi_cond_pause(1, "About to read pageset 1.");
24613191 12193+
4e97e4e9 12194+ if (read_pageset(&pagedir1, 0)) {
12195+ toi_prepare_status(CLEAR_BAR, "Failed to read pageset 1.");
12196+ result = -EIO;
ad8f4a28 12197+		printk(KERN_INFO "TuxOnIce: Failed to load pageset1.\n");
4e97e4e9 12198+ goto out_reset_console;
12199+ }
24613191 12200+
4e97e4e9 12201+ toi_cond_pause(1, "About to restore original kernel.");
12202+ result = 0;
24613191 12203+
4e97e4e9 12204+ if (!test_action_state(TOI_KEEP_IMAGE) &&
12205+ toiActiveAllocator->mark_resume_attempted)
12206+ toiActiveAllocator->mark_resume_attempted(1);
24613191 12207+
4e97e4e9 12208+out:
ad8f4a28 12209+ toi_free_page(25, (unsigned long) header_buffer);
4e97e4e9 12210+ return result;
12211+
12212+out_reset_console:
12213+ toi_cleanup_console();
12214+
12215+out_remove_image:
12216+ free_dyn_pageflags(&pageset1_map);
12217+ free_dyn_pageflags(&pageset1_copy_map);
12218+ free_dyn_pageflags(&io_map);
12219+ result = -EINVAL;
12220+ if (!test_action_state(TOI_KEEP_IMAGE))
12221+ toiActiveAllocator->remove_image();
12222+ toiActiveAllocator->read_header_cleanup();
12223+ noresume_reset_modules();
12224+ goto out;
24613191 12225+}
12226+
4e97e4e9 12227+/* read_pageset1()
24613191 12228+ *
4e97e4e9 12229+ * Description: Attempt to read the header and pageset1 of a hibernate image.
12230+ * Handle the outcome, complaining where appropriate.
24613191 12231+ */
12232+
4e97e4e9 12233+int read_pageset1(void)
24613191 12234+{
4e97e4e9 12235+ int error;
24613191 12236+
4e97e4e9 12237+ error = __read_pageset1();
24613191 12238+
12239+ if (error && error != -ENODATA && error != -EINVAL &&
12240+ !test_result_state(TOI_ABORTED))
12241+ abort_hibernate(TOI_IMAGE_ERROR,
12242+ "TuxOnIce: Error %d resuming\n", error);
24613191 12243+
4e97e4e9 12244+ return error;
12245+}
12246+
12247+/*
12248+ * get_have_image_data()
12249+ */
12250+static char *get_have_image_data(void)
24613191 12251+{
4e97e4e9 12252+ char *output_buffer = (char *) toi_get_zeroed_page(26, TOI_ATOMIC_GFP);
12253+ struct toi_header *toi_header;
24613191 12254+
4e97e4e9 12255+ if (!output_buffer) {
ad8f4a28 12256+ printk(KERN_INFO "Output buffer null.\n");
4e97e4e9 12257+ return NULL;
12258+ }
24613191 12259+
4e97e4e9 12260+ /* Check for an image */
7f9d2ee0 12261+ if (!toiActiveAllocator->image_exists(1) ||
4e97e4e9 12262+ toiActiveAllocator->read_header_init() ||
12263+ toiActiveAllocator->rw_header_chunk(READ, NULL,
12264+ output_buffer, sizeof(struct toi_header))) {
12265+ sprintf(output_buffer, "0\n");
12266+ /*
12267+ * From an initrd/ramfs, catting have_image and
12268+ * getting a result of 0 is sufficient.
12269+ */
12270+ clear_toi_state(TOI_BOOT_TIME);
4e97e4e9 12271+ goto out;
12272+ }
24613191 12273+
4e97e4e9 12274+ toi_header = (struct toi_header *) output_buffer;
24613191 12275+
4e97e4e9 12276+ sprintf(output_buffer, "1\n%s\n%s\n",
12277+ toi_header->uts.machine,
12278+ toi_header->uts.version);
12279+
12280+ /* Check whether we've resumed before */
12281+ if (test_toi_state(TOI_RESUMED_BEFORE))
12282+ strcat(output_buffer, "Resumed before.\n");
12283+
12284+out:
12285+ noresume_reset_modules();
12286+ return output_buffer;
24613191 12287+}
12288+
4e97e4e9 12289+/* read_pageset2()
12290+ *
12291+ * Description: Read in part or all of pageset2 of an image, depending upon
12292+ * whether we are hibernating and have only overwritten a portion
ad8f4a28 12293+ * with pageset1 pages, or are resuming and need to read them
4e97e4e9 12294+ * all.
12295+ * Arguments: Int. Boolean. Read only pages which would have been
12296+ * overwritten by pageset1?
12297+ * Returns: Int. Zero if no error, otherwise the error value.
12298+ */
12299+int read_pageset2(int overwrittenpagesonly)
24613191 12300+{
24613191 12301+ int result = 0;
12302+
4e97e4e9 12303+ if (!pagedir2.size)
24613191 12304+ return 0;
12305+
4e97e4e9 12306+ result = read_pageset(&pagedir2, overwrittenpagesonly);
24613191 12307+
4e97e4e9 12308+ toi_update_status(100, 100, NULL);
12309+ toi_cond_pause(1, "Pagedir 2 read.");
24613191 12310+
24613191 12311+ return result;
12312+}
12313+
4e97e4e9 12314+/* image_exists_read
ad8f4a28 12315+ *
4e97e4e9 12316+ * Return 0 or 1, depending on whether an image is found.
12317+ * Incoming buffer is PAGE_SIZE and result is guaranteed
12318+ * to be far less than that, so we don't worry about
12319+ * overflow.
12320+ */
12321+int image_exists_read(const char *page, int count)
12322+{
12323+ int len = 0;
12324+ char *result;
ad8f4a28 12325+
4e97e4e9 12326+ if (toi_activate_storage(0))
12327+ return count;
24613191 12328+
4e97e4e9 12329+ if (!test_toi_state(TOI_RESUME_DEVICE_OK))
12330+ toi_attempt_to_parse_resume_device(0);
24613191 12331+
4e97e4e9 12332+ if (!toiActiveAllocator) {
12333+ len = sprintf((char *) page, "-1\n");
12334+ } else {
12335+ result = get_have_image_data();
12336+ if (result) {
12337+ len = sprintf((char *) page, "%s", result);
ad8f4a28 12338+ toi_free_page(26, (unsigned long) result);
4e97e4e9 12339+ }
24613191 12340+ }
12341+
4e97e4e9 12342+ toi_deactivate_storage(0);
24613191 12343+
4e97e4e9 12344+ return len;
24613191 12345+}
12346+
4e97e4e9 12347+/* image_exists_write
ad8f4a28 12348+ *
4e97e4e9 12349+ * Invalidate an image if one exists.
12350+ */
12351+int image_exists_write(const char *buffer, int count)
24613191 12352+{
4e97e4e9 12353+ if (toi_activate_storage(0))
12354+ return count;
24613191 12355+
7f9d2ee0 12356+ if (toiActiveAllocator && toiActiveAllocator->image_exists(1))
4e97e4e9 12357+ toiActiveAllocator->remove_image();
12358+
12359+ toi_deactivate_storage(0);
12360+
12361+ clear_result_state(TOI_KEPT_IMAGE);
12362+
12363+ return count;
24613191 12364+}
12365+
4e97e4e9 12366+#ifdef CONFIG_TOI_EXPORTS
12367+EXPORT_SYMBOL_GPL(toi_attempt_to_parse_resume_device);
12368+EXPORT_SYMBOL_GPL(attempt_to_parse_resume_device2);
7f9d2ee0 12369+EXPORT_SYMBOL_GPL(toi_io_workers);
12370+EXPORT_SYMBOL_GPL(toi_io_queue_flusher);
12371+EXPORT_SYMBOL_GPL(toi_bio_queue_flusher_should_finish);
4e97e4e9 12372+#endif
e8d0ad9d 12373+
4e97e4e9 12374diff --git a/kernel/power/tuxonice_io.h b/kernel/power/tuxonice_io.h
12375new file mode 100644
7f9d2ee0 12376index 0000000..d4b470b
4e97e4e9 12377--- /dev/null
12378+++ b/kernel/power/tuxonice_io.h
7f9d2ee0 12379@@ -0,0 +1,71 @@
4e97e4e9 12380+/*
12381+ * kernel/power/tuxonice_io.h
12382+ *
12383+ * Copyright (C) 2005-2007 Nigel Cunningham (nigel at tuxonice net)
12384+ *
12385+ * This file is released under the GPLv2.
12386+ *
12387+ * It contains high level IO routines for hibernating.
12388+ *
12389+ */
e8d0ad9d 12390+
4e97e4e9 12391+#include <linux/utsname.h>
12392+#include "tuxonice_pagedir.h"
ad8f4a28 12393+#include "power.h"
e8d0ad9d 12394+
4e97e4e9 12395+/* Non-module data saved in our image header */
12396+struct toi_header {
12397+ /*
12398+ * Mirror struct swsusp_info, but without
12399+ * the page aligned attribute
12400+ */
12401+ struct new_utsname uts;
4e97e4e9 12402+ u32 version_code;
12403+ unsigned long num_physpages;
12404+ int cpus;
12405+ unsigned long image_pages;
12406+ unsigned long pages;
12407+ unsigned long size;
12408+
12409+ /* Our own data */
4e97e4e9 12410+ unsigned long orig_mem_free;
4e97e4e9 12411+ int page_size;
12412+ int pageset_2_size;
12413+ int param0;
12414+ int param1;
12415+ int param2;
12416+ int param3;
12417+ int progress0;
12418+ int progress1;
12419+ int progress2;
12420+ int progress3;
12421+ int io_time[2][2];
12422+ struct pagedir pagedir;
12423+ dev_t root_fs;
ad8f4a28 12424+ unsigned long bkd; /* Boot kernel data locn */
4e97e4e9 12425+};
e8d0ad9d 12426+
4e97e4e9 12427+extern int write_pageset(struct pagedir *pagedir);
12428+extern int write_image_header(void);
12429+extern int read_pageset1(void);
12430+extern int read_pageset2(int overwrittenpagesonly);
e8d0ad9d 12431+
4e97e4e9 12432+extern int toi_attempt_to_parse_resume_device(int quiet);
12433+extern void attempt_to_parse_resume_device2(void);
12434+extern void attempt_to_parse_alt_resume_param(void);
12435+int image_exists_read(const char *page, int count);
12436+int image_exists_write(const char *buffer, int count);
12437+extern void save_restore_alt_param(int replace, int quiet);
7f9d2ee0 12438+extern atomic_t toi_io_workers;
e8d0ad9d 12439+
4e97e4e9 12440+/* Args to save_restore_alt_param */
12441+#define RESTORE 0
12442+#define SAVE 1
24613191 12443+
4e97e4e9 12444+#define NOQUIET 0
12445+#define QUIET 1
24613191 12446+
4e97e4e9 12447+extern dev_t name_to_dev_t(char *line);
7f9d2ee0 12448+
12449+extern wait_queue_head_t toi_io_queue_flusher;
12450+extern int toi_bio_queue_flusher_should_finish;
4e97e4e9 12451diff --git a/kernel/power/tuxonice_modules.c b/kernel/power/tuxonice_modules.c
12452new file mode 100644
7f9d2ee0 12453index 0000000..95ce455
4e97e4e9 12454--- /dev/null
12455+++ b/kernel/power/tuxonice_modules.c
ad8f4a28 12456@@ -0,0 +1,461 @@
4e97e4e9 12457+/*
12458+ * kernel/power/tuxonice_modules.c
12459+ *
12460+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
12461+ *
12462+ */
24613191 12463+
4e97e4e9 12464+#include <linux/suspend.h>
12465+#include <linux/module.h>
12466+#include "tuxonice.h"
12467+#include "tuxonice_modules.h"
12468+#include "tuxonice_sysfs.h"
12469+#include "tuxonice_ui.h"
12470+
12471+LIST_HEAD(toi_filters);
12472+LIST_HEAD(toiAllocators);
12473+LIST_HEAD(toi_modules);
12474+
12475+struct toi_module_ops *toiActiveAllocator;
12476+int toi_num_filters;
12477+int toiNumAllocators, toi_num_modules;
ad8f4a28 12478+
24613191 12479+/*
4e97e4e9 12480+ * toi_header_storage_for_modules
12481+ *
12482+ * Returns the amount of space needed to store configuration
12483+ * data needed by the modules prior to copying back the original
12484+ * kernel. We can exclude data for pageset2 because it will be
12485+ * available anyway once the kernel is copied back.
24613191 12486+ */
7f9d2ee0 12487+long toi_header_storage_for_modules(void)
24613191 12488+{
4e97e4e9 12489+ struct toi_module_ops *this_module;
12490+ int bytes = 0;
ad8f4a28 12491+
4e97e4e9 12492+ list_for_each_entry(this_module, &toi_modules, module_list) {
12493+ if (!this_module->enabled ||
12494+ (this_module->type == WRITER_MODULE &&
12495+ toiActiveAllocator != this_module))
12496+ continue;
12497+ if (this_module->storage_needed) {
12498+ int this = this_module->storage_needed() +
12499+ sizeof(struct toi_module_header) +
12500+ sizeof(int);
12501+ this_module->header_requested = this;
12502+ bytes += this;
12503+ }
12504+ }
24613191 12505+
4e97e4e9 12506+ /* One more for the empty terminator */
12507+ return bytes + sizeof(struct toi_module_header);
24613191 12508+}
12509+
4e97e4e9 12510+/*
12511+ * toi_memory_for_modules
12512+ *
12513+ * Returns the amount of memory requested by modules for
12514+ * doing their work during the cycle.
12515+ */
24613191 12516+
7f9d2ee0 12517+long toi_memory_for_modules(int print_parts)
4e97e4e9 12518+{
7f9d2ee0 12519+ long bytes = 0, result;
4e97e4e9 12520+ struct toi_module_ops *this_module;
e8d0ad9d 12521+
12522+ if (print_parts)
12523+ printk(KERN_INFO "Memory for modules:\n===================\n");
4e97e4e9 12524+ list_for_each_entry(this_module, &toi_modules, module_list) {
ad8f4a28 12525+ int this;
4e97e4e9 12526+ if (!this_module->enabled)
12527+ continue;
12528+ if (this_module->memory_needed) {
12529+ this = this_module->memory_needed();
12530+ if (print_parts)
7f9d2ee0 12531+ printk(KERN_INFO "%10d bytes (%5ld pages) for "
12532+ "module '%s'.\n", this,
12533+ DIV_ROUND_UP(this, PAGE_SIZE),
12534+ this_module->name);
12535+ bytes += this;
12536+ }
4e97e4e9 12537+ }
24613191 12538+
7f9d2ee0 12539+ result = DIV_ROUND_UP(bytes, PAGE_SIZE);
ad8f4a28 12540+ if (print_parts)
7f9d2ee0 12541+ printk(KERN_INFO " => %ld bytes, %ld pages.\n", bytes, result);
12542+
12543+ return result;
24613191 12544+}
12545+
12546+/*
4e97e4e9 12547+ * toi_expected_compression_ratio
12548+ *
12549+ * Returns the compression ratio expected when saving the image.
24613191 12550+ */
4e97e4e9 12551+
12552+int toi_expected_compression_ratio(void)
24613191 12553+{
4e97e4e9 12554+ int ratio = 100;
12555+ struct toi_module_ops *this_module;
24613191 12556+
4e97e4e9 12557+ list_for_each_entry(this_module, &toi_modules, module_list) {
12558+ if (!this_module->enabled)
12559+ continue;
12560+ if (this_module->expected_compression)
12561+ ratio = ratio * this_module->expected_compression()
12562+ / 100;
24613191 12563+ }
12564+
4e97e4e9 12565+ return ratio;
12566+}
24613191 12567+
4e97e4e9 12568+/* toi_find_module_given_dir
12569+ * Functionality : Return a module (if found), given a pointer
12570+ * to its directory name
12571+ */
24613191 12572+
4e97e4e9 12573+static struct toi_module_ops *toi_find_module_given_dir(char *name)
12574+{
12575+ struct toi_module_ops *this_module, *found_module = NULL;
ad8f4a28 12576+
4e97e4e9 12577+ list_for_each_entry(this_module, &toi_modules, module_list) {
12578+ if (!strcmp(name, this_module->directory)) {
12579+ found_module = this_module;
12580+ break;
ad8f4a28 12581+ }
24613191 12582+ }
12583+
4e97e4e9 12584+ return found_module;
12585+}
24613191 12586+
4e97e4e9 12587+/* toi_find_module_given_name
12588+ * Functionality : Return a module (if found), given a pointer
12589+ * to its name
12590+ */
24613191 12591+
4e97e4e9 12592+struct toi_module_ops *toi_find_module_given_name(char *name)
12593+{
12594+ struct toi_module_ops *this_module, *found_module = NULL;
ad8f4a28 12595+
4e97e4e9 12596+ list_for_each_entry(this_module, &toi_modules, module_list) {
12597+ if (!strcmp(name, this_module->name)) {
12598+ found_module = this_module;
12599+ break;
ad8f4a28 12600+ }
4e97e4e9 12601+ }
24613191 12602+
4e97e4e9 12603+ return found_module;
24613191 12604+}
12605+
12606+/*
4e97e4e9 12607+ * toi_print_module_debug_info
12608+ * Functionality : Get debugging info from modules into a buffer.
24613191 12609+ */
4e97e4e9 12610+int toi_print_module_debug_info(char *buffer, int buffer_size)
12611+{
12612+ struct toi_module_ops *this_module;
12613+ int len = 0;
24613191 12614+
4e97e4e9 12615+ list_for_each_entry(this_module, &toi_modules, module_list) {
12616+ if (!this_module->enabled)
12617+ continue;
12618+ if (this_module->print_debug_info) {
12619+ int result;
ad8f4a28 12620+ result = this_module->print_debug_info(buffer + len,
4e97e4e9 12621+ buffer_size - len);
12622+ len += result;
12623+ }
12624+ }
24613191 12625+
4e97e4e9 12626+ /* Ensure null terminated */
12627+ buffer[buffer_size - 1] = 0;
24613191 12628+
4e97e4e9 12629+ return len;
12630+}
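The aggregation above appends each module's debug output into one shared buffer. A userspace sketch of the same pattern, with a hypothetical demo_debug_info() standing in for a module's print_debug_info() callback, clamping the length so the terminator always lands within bounds:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one module's ->print_debug_info() callback. */
static int demo_debug_info(char *buf, int size)
{
	return snprintf(buf, size, "demo: ok\n");
}

/* Accumulate callback output into a fixed buffer, clamping the running
 * length and NUL-terminating within bounds. */
static int collect_debug_info(char *buffer, int buffer_size)
{
	int len = 0;

	len += demo_debug_info(buffer + len, buffer_size - len);
	if (len >= buffer_size)
		len = buffer_size - 1;
	buffer[len] = '\0';
	return len;
}
```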
24613191 12631+
4e97e4e9 12632+/*
12633+ * toi_register_module
12634+ *
12635+ * Register a module.
12636+ */
12637+int toi_register_module(struct toi_module_ops *module)
24613191 12638+{
4e97e4e9 12639+ int i;
12640+ struct kobject *kobj;
24613191 12641+
4e97e4e9 12642+ module->enabled = 1;
ad8f4a28 12643+
4e97e4e9 12644+ if (toi_find_module_given_name(module->name)) {
ad8f4a28 12645+ printk(KERN_INFO "TuxOnIce: Trying to load module %s,"
4e97e4e9 12646+ " which is already registered.\n",
12647+ module->name);
12648+ return -EBUSY;
12649+ }
24613191 12650+
4e97e4e9 12651+ switch (module->type) {
12652+ case FILTER_MODULE:
12653+ list_add_tail(&module->type_list, &toi_filters);
12654+ toi_num_filters++;
12655+ break;
12656+ case WRITER_MODULE:
12657+ list_add_tail(&module->type_list, &toiAllocators);
12658+ toiNumAllocators++;
12659+ break;
12660+ case MISC_MODULE:
12661+ case MISC_HIDDEN_MODULE:
12662+ break;
12663+ default:
12664+ printk("Hmmm. Module '%s' has an invalid type."
12665+ " It has been ignored.\n", module->name);
12666+ return -EINVAL;
4e97e4e9 12667+ }
12668+ list_add_tail(&module->module_list, &toi_modules);
12669+ toi_num_modules++;
24613191 12670+
12671+ if (!module->directory && !module->shared_directory)
12672+ return 0;
12673+
12674+ /*
12675+ * Modules may share a directory, but those with shared_directory
12676+ * set must be loaded (via symbol dependencies) after parents
12677+ * and unloaded beforehand.
12678+ */
12679+ if (module->shared_directory) {
12680+ struct toi_module_ops *shared =
12681+ toi_find_module_given_dir(module->shared_directory);
12682+ if (!shared) {
12683+ printk("TuxOnIce: Module %s wants to share %s's "
12684+ "directory but %s isn't loaded.\n",
12685+ module->name, module->shared_directory,
12686+ module->shared_directory);
12687+ toi_unregister_module(module);
12688+ return -ENODEV;
4e97e4e9 12689+ }
12690+ kobj = shared->dir_kobj;
12691+ } else {
12692+ if (!strncmp(module->directory, "[ROOT]", 6))
7f9d2ee0 12693+ kobj = tuxonice_kobj;
12694+ else
12695+ kobj = make_toi_sysdir(module->directory);
12696+ }
12697+ module->dir_kobj = kobj;
12698+ for (i = 0; i < module->num_sysfs_entries; i++) {
12699+ int result = toi_register_sysfs_file(kobj,
12700+ &module->sysfs_data[i]);
12701+ if (result)
12702+ return result;
4e97e4e9 12703+ }
24613191 12704+ return 0;
12705+}
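The registration path above rejects duplicate names and appends the module to a global list. A userspace sketch of that core logic, with a hypothetical singly-linked list standing in for the kernel's list_head machinery:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal registry sketch: reject duplicate names, append at the tail. */
struct demo_module {
	const char *name;
	struct demo_module *next;
};

static struct demo_module *demo_modules;

static struct demo_module *demo_find(const char *name)
{
	struct demo_module *m;

	for (m = demo_modules; m; m = m->next)
		if (!strcmp(m->name, name))
			return m;
	return NULL;
}

static int demo_register(struct demo_module *mod)
{
	struct demo_module **tail;

	if (demo_find(mod->name))
		return -1;	/* -EBUSY in the kernel version */
	mod->next = NULL;
	for (tail = &demo_modules; *tail; tail = &(*tail)->next)
		;
	*tail = mod;		/* append at the tail, as list_add_tail does */
	return 0;
}
```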
12706+
4e97e4e9 12707+/*
12708+ * toi_unregister_module
12709+ *
12710+ * Remove a module.
12711+ */
12712+void toi_unregister_module(struct toi_module_ops *module)
24613191 12713+{
4e97e4e9 12714+ int i;
24613191 12715+
4e97e4e9 12716+ if (module->dir_kobj)
12717+ for (i = 0; i < module->num_sysfs_entries; i++)
12718+ toi_unregister_sysfs_file(module->dir_kobj,
12719+ &module->sysfs_data[i]);
24613191 12720+
4e97e4e9 12721+ if (!module->shared_directory && module->directory &&
12722+ strncmp(module->directory, "[ROOT]", 6))
12723+ remove_toi_sysdir(module->dir_kobj);
24613191 12724+
4e97e4e9 12725+ switch (module->type) {
12726+ case FILTER_MODULE:
12727+ list_del(&module->type_list);
12728+ toi_num_filters--;
12729+ break;
12730+ case WRITER_MODULE:
12731+ list_del(&module->type_list);
12732+ toiNumAllocators--;
12733+ if (toiActiveAllocator == module) {
12734+ toiActiveAllocator = NULL;
12735+ clear_toi_state(TOI_CAN_RESUME);
12736+ clear_toi_state(TOI_CAN_HIBERNATE);
12737+ }
12738+ break;
12739+ case MISC_MODULE:
12740+ case MISC_HIDDEN_MODULE:
12741+ break;
12742+ default:
12743+ printk("Hmmm. Module '%s' has an invalid type."
12744+ " It has been ignored.\n", module->name);
12745+ return;
4e97e4e9 12746+ }
12747+ list_del(&module->module_list);
12748+ toi_num_modules--;
24613191 12749+}
24613191 12750+
24613191 12751+/*
4e97e4e9 12752+ * toi_move_module_tail
24613191 12753+ *
4e97e4e9 12754+ * Rearrange modules when reloading the config.
24613191 12755+ */
4e97e4e9 12756+void toi_move_module_tail(struct toi_module_ops *module)
e8d0ad9d 12757+{
4e97e4e9 12758+ switch (module->type) {
12759+ case FILTER_MODULE:
12760+ if (toi_num_filters > 1)
12761+ list_move_tail(&module->type_list, &toi_filters);
12762+ break;
12763+ case WRITER_MODULE:
12764+ if (toiNumAllocators > 1)
12765+ list_move_tail(&module->type_list, &toiAllocators);
12766+ break;
12767+ case MISC_MODULE:
12768+ case MISC_HIDDEN_MODULE:
12769+ break;
12770+ default:
12771+ printk("Hmmm. Module '%s' has an invalid type."
12772+ " It has been ignored.\n", module->name);
12773+ return;
4e97e4e9 12774+ }
12775+ if ((toi_num_filters + toiNumAllocators) > 1)
12776+ list_move_tail(&module->module_list, &toi_modules);
e8d0ad9d 12777+}
12778+
e8d0ad9d 12779+/*
4e97e4e9 12780+ * toi_initialise_modules
e8d0ad9d 12781+ *
4e97e4e9 12782+ * Get ready to do some work!
e8d0ad9d 12783+ */
ad8f4a28 12784+int toi_initialise_modules(int starting_cycle, int early)
4e97e4e9 12785+{
12786+ struct toi_module_ops *this_module;
12787+ int result;
ad8f4a28 12788+
4e97e4e9 12789+ list_for_each_entry(this_module, &toi_modules, module_list) {
12790+ this_module->header_requested = 0;
12791+ this_module->header_used = 0;
12792+ if (!this_module->enabled)
12793+ continue;
12794+ if (this_module->early != early)
12795+ continue;
4e97e4e9 12796+ if (this_module->initialise) {
12797+ toi_message(TOI_MEMORY, TOI_MEDIUM, 1,
12798+ "Initialising module %s.\n",
12799+ this_module->name);
12800+ result = this_module->initialise(starting_cycle);
12801+ if (result)
4e97e4e9 12802+ return result;
4e97e4e9 12803+ }
12804+ }
43540741 12805+
4e97e4e9 12806+ return 0;
12807+}
43540741 12808+
ad8f4a28 12809+/*
4e97e4e9 12810+ * toi_cleanup_modules
12811+ *
12812+ * Tell modules the work is done.
12813+ */
12814+void toi_cleanup_modules(int finishing_cycle)
43540741 12815+{
4e97e4e9 12816+ struct toi_module_ops *this_module;
ad8f4a28 12817+
4e97e4e9 12818+ list_for_each_entry(this_module, &toi_modules, module_list) {
12819+ if (!this_module->enabled)
12820+ continue;
12821+ if (this_module->cleanup) {
12822+ toi_message(TOI_MEMORY, TOI_MEDIUM, 1,
12823+ "Cleaning up module %s.\n",
12824+ this_module->name);
12825+ this_module->cleanup(finishing_cycle);
12826+ }
43540741 12827+ }
43540741 12828+}
e8d0ad9d 12829+
4e97e4e9 12830+/*
12831+ * toi_get_next_filter
12832+ *
12833+ * Get the next filter in the pipeline.
12834+ */
12835+struct toi_module_ops *toi_get_next_filter(struct toi_module_ops *filter_sought)
12836+{
12837+ struct toi_module_ops *last_filter = NULL, *this_filter = NULL;
e8d0ad9d 12838+
4e97e4e9 12839+ list_for_each_entry(this_filter, &toi_filters, type_list) {
12840+ if (!this_filter->enabled)
12841+ continue;
12842+ if ((last_filter == filter_sought) || (!filter_sought))
12843+ return this_filter;
12844+ last_filter = this_filter;
12845+ }
e8d0ad9d 12846+
4e97e4e9 12847+ return toiActiveAllocator;
e8d0ad9d 12848+}
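toi_get_next_filter() walks the filter chain, skipping disabled entries and falling back to the active allocator at the end. A userspace sketch of the same traversal, using a fixed array and a sentinel allocator (both hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Skip disabled entries; return the first enabled filter after `sought`
 * (or the first enabled one when sought is NULL); fall back to the
 * allocator sentinel once the chain is exhausted. */
struct demo_filter { int enabled; };

#define DEMO_NFILTERS 3
static struct demo_filter demo_filters[DEMO_NFILTERS];
static struct demo_filter demo_allocator;

static struct demo_filter *demo_next_filter(struct demo_filter *sought)
{
	struct demo_filter *last = NULL;
	int i;

	for (i = 0; i < DEMO_NFILTERS; i++) {
		struct demo_filter *f = &demo_filters[i];

		if (!f->enabled)
			continue;
		if (last == sought || !sought)
			return f;
		last = f;
	}
	return &demo_allocator;
}
```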
12849+
4e97e4e9 12850+/**
12851+ * toi_print_modules: Printk what support is loaded.
12852+ */
12853+void toi_print_modules(void)
e8d0ad9d 12854+{
4e97e4e9 12855+ struct toi_module_ops *this_module;
12856+ int prev = 0;
e8d0ad9d 12857+
4e97e4e9 12858+ printk("TuxOnIce " TOI_CORE_VERSION ", with support for");
ad8f4a28 12859+
4e97e4e9 12860+ list_for_each_entry(this_module, &toi_modules, module_list) {
12861+ if (this_module->type == MISC_HIDDEN_MODULE)
12862+ continue;
12863+ printk("%s %s%s%s", prev ? "," : "",
12864+ this_module->enabled ? "" : "[",
12865+ this_module->name,
12866+ this_module->enabled ? "" : "]");
12867+ prev = 1;
12868+ }
e8d0ad9d 12869+
4e97e4e9 12870+ printk(".\n");
e8d0ad9d 12871+}
12872+
4e97e4e9 12873+/* toi_get_modules
ad8f4a28 12874+ *
4e97e4e9 12875+ * Take a reference to modules so they can't go away under us.
12876+ */
e8d0ad9d 12877+
4e97e4e9 12878+int toi_get_modules(void)
e8d0ad9d 12879+{
4e97e4e9 12880+ struct toi_module_ops *this_module;
ad8f4a28 12881+
4e97e4e9 12882+ list_for_each_entry(this_module, &toi_modules, module_list) {
12883+ struct toi_module_ops *this_module2;
12884+
12885+ if (try_module_get(this_module->module))
12886+ continue;
12887+
12888+ /* Failed! Reverse gets and return error */
12889+ list_for_each_entry(this_module2, &toi_modules,
12890+ module_list) {
12891+ if (this_module == this_module2)
12892+ return -EINVAL;
12893+ module_put(this_module2->module);
4e97e4e9 12894+ }
12895+ }
4e97e4e9 12896+ return 0;
e8d0ad9d 12897+}
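toi_get_modules() takes references all-or-nothing: on the first try_module_get() failure, it walks the list again and drops every reference taken so far, stopping at the module that failed. A userspace sketch of that roll-back loop, with counters and a failure knob standing in for try_module_get()/module_put():

```c
#include <assert.h>

#define DEMO_N 4
static int refs[DEMO_N];
static int fail_at = -1;	/* index whose try_get fails, -1 for none */

static int demo_try_get(int i)
{
	if (i == fail_at)
		return 0;
	refs[i]++;
	return 1;
}

/* Grab every reference, or none: on failure, reverse the gets already
 * taken and stop at the module that failed. */
static int demo_get_all(void)
{
	int i, j;

	for (i = 0; i < DEMO_N; i++) {
		if (demo_try_get(i))
			continue;
		for (j = 0; j < DEMO_N; j++) {
			if (j == i)
				return -1;
			refs[j]--;
		}
	}
	return 0;
}
```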
12898+
4e97e4e9 12899+/* toi_put_modules
12900+ *
12901+ * Release our references to modules we used.
12902+ */
12903+
12904+void toi_put_modules(void)
43540741 12905+{
4e97e4e9 12906+ struct toi_module_ops *this_module;
ad8f4a28 12907+
4e97e4e9 12908+ list_for_each_entry(this_module, &toi_modules, module_list)
12909+ module_put(this_module->module);
43540741 12910+}
4e97e4e9 12911+
12912+#ifdef CONFIG_TOI_EXPORTS
12913+EXPORT_SYMBOL_GPL(toi_register_module);
12914+EXPORT_SYMBOL_GPL(toi_unregister_module);
12915+EXPORT_SYMBOL_GPL(toi_get_next_filter);
12916+EXPORT_SYMBOL_GPL(toiActiveAllocator);
43540741 12917+#endif
4e97e4e9 12918diff --git a/kernel/power/tuxonice_modules.h b/kernel/power/tuxonice_modules.h
12919new file mode 100644
7f9d2ee0 12920index 0000000..8397ddf
4e97e4e9 12921--- /dev/null
12922+++ b/kernel/power/tuxonice_modules.h
7f9d2ee0 12923@@ -0,0 +1,176 @@
4e97e4e9 12924+/*
12925+ * kernel/power/tuxonice_modules.h
12926+ *
12927+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
12928+ *
12929+ * This file is released under the GPLv2.
12930+ *
12931+ * It contains declarations for modules. Modules are additions to
12932+ * TuxOnIce that provide facilities such as image compression or
12933+ * encryption, backends for storage of the image and user interfaces.
12934+ *
12935+ */
43540741 12936+
4e97e4e9 12937+#ifndef TOI_MODULES_H
12938+#define TOI_MODULES_H
e8d0ad9d 12939+
4e97e4e9 12940+/* This is the maximum size we store in the image header for a module name */
12941+#define TOI_MAX_MODULE_NAME_LENGTH 30
e8d0ad9d 12942+
4e97e4e9 12943+/* Per-module metadata */
12944+struct toi_module_header {
12945+ char name[TOI_MAX_MODULE_NAME_LENGTH];
12946+ int enabled;
12947+ int type;
12948+ int index;
12949+ int data_length;
12950+ unsigned long signature;
12951+};
e8d0ad9d 12952+
4e97e4e9 12953+enum {
12954+ FILTER_MODULE,
12955+ WRITER_MODULE,
12956+ MISC_MODULE, /* Block writer, e.g. */
12957+ MISC_HIDDEN_MODULE,
12958+};
e8d0ad9d 12959+
4e97e4e9 12960+enum {
12961+ TOI_ASYNC,
12962+ TOI_SYNC
12963+};
e8d0ad9d 12964+
4e97e4e9 12965+struct toi_module_ops {
12966+ /* Functions common to all modules */
12967+ int type;
12968+ char *name;
12969+ char *directory;
12970+ char *shared_directory;
12971+ struct kobject *dir_kobj;
12972+ struct module *module;
ad8f4a28 12973+ int enabled, early;
4e97e4e9 12974+ struct list_head module_list;
e8d0ad9d 12975+
4e97e4e9 12976+ /* List of filters or allocators */
12977+ struct list_head list, type_list;
e8d0ad9d 12978+
4e97e4e9 12979+ /*
12980+ * Requirements for memory and storage in
12981+ * the image header.
12982+ */
12983+ int (*memory_needed) (void);
12984+ int (*storage_needed) (void);
e8d0ad9d 12985+
4e97e4e9 12986+ int header_requested, header_used;
e8d0ad9d 12987+
4e97e4e9 12988+ int (*expected_compression) (void);
12989+
12990+ /*
4e97e4e9 12991+ * Debug info
12992+ */
12993+ int (*print_debug_info) (char *buffer, int size);
12994+ int (*save_config_info) (char *buffer);
12995+ void (*load_config_info) (char *buffer, int len);
e8d0ad9d 12996+
ad8f4a28 12997+ /*
4e97e4e9 12998+ * Initialise & cleanup - general routines called
12999+ * at the start and end of a cycle.
13000+ */
13001+ int (*initialise) (int starting_cycle);
13002+ void (*cleanup) (int finishing_cycle);
e8d0ad9d 13003+
ad8f4a28 13004+ /*
4e97e4e9 13005+ * Calls for allocating storage (allocators only).
13006+ *
13007+ * Header space is allocated separately. Note that allocation
ad8f4a28 13008+ * of space for the header might result in allocated space
4e97e4e9 13009+ * being stolen from the main pool if there is no unallocated
13010+ * space. We have to be able to allocate enough space for
13011+ * the header. We can eat memory to ensure there is enough
13012+ * for the main pool.
13013+ */
e8d0ad9d 13014+
4e97e4e9 13015+ int (*storage_available) (void);
7f9d2ee0 13016+ void (*reserve_header_space) (int space_requested);
4e97e4e9 13017+ int (*allocate_storage) (int space_requested);
13018+ int (*storage_allocated) (void);
13019+ int (*release_storage) (void);
ad8f4a28 13020+
4e97e4e9 13021+ /*
13022+ * Routines used in image I/O.
13023+ */
13024+ int (*rw_init) (int rw, int stream_number);
13025+ int (*rw_cleanup) (int rw);
13026+ int (*write_page) (unsigned long index, struct page *buffer_page,
13027+ unsigned int buf_size);
13028+ int (*read_page) (unsigned long *index, struct page *buffer_page,
13029+ unsigned int *buf_size);
7f9d2ee0 13030+ void (*io_flusher) (int rw);
e8d0ad9d 13031+
4e97e4e9 13032+ /* Reset module if image exists but reading aborted */
13033+ void (*noresume_reset) (void);
e8d0ad9d 13034+
ad8f4a28 13035+ /* Read and write the metadata */
4e97e4e9 13036+ int (*write_header_init) (void);
13037+ int (*write_header_cleanup) (void);
e8d0ad9d 13038+
4e97e4e9 13039+ int (*read_header_init) (void);
13040+ int (*read_header_cleanup) (void);
13041+
13042+ int (*rw_header_chunk) (int rw, struct toi_module_ops *owner,
13043+ char *buffer_start, int buffer_size);
ad8f4a28 13044+
7f9d2ee0 13045+ int (*rw_header_chunk_noreadahead) (int rw,
13046+ struct toi_module_ops *owner, char *buffer_start,
13047+ int buffer_size);
13048+
4e97e4e9 13049+ /* Attempt to parse an image location */
13050+ int (*parse_sig_location) (char *buffer, int only_writer, int quiet);
e8d0ad9d 13051+
4e97e4e9 13052+ /* Determine whether image exists that we can restore */
7f9d2ee0 13053+ int (*image_exists) (int quiet);
ad8f4a28 13054+
4e97e4e9 13055+ /* Mark the image as having tried to resume */
7f9d2ee0 13056+ int (*mark_resume_attempted) (int);
e8d0ad9d 13057+
4e97e4e9 13058+ /* Destroy image if one exists */
13059+ int (*remove_image) (void);
ad8f4a28 13060+
4e97e4e9 13061+ /* Sysfs Data */
13062+ struct toi_sysfs_data *sysfs_data;
13063+ int num_sysfs_entries;
13064+};
e8d0ad9d 13065+
4e97e4e9 13066+extern int toi_num_modules, toiNumAllocators;
e8d0ad9d 13067+
4e97e4e9 13068+extern struct toi_module_ops *toiActiveAllocator;
13069+extern struct list_head toi_filters, toiAllocators, toi_modules;
24613191 13070+
4e97e4e9 13071+extern void toi_prepare_console_modules(void);
13072+extern void toi_cleanup_console_modules(void);
24613191 13073+
4e97e4e9 13074+extern struct toi_module_ops *toi_find_module_given_name(char *name);
13075+extern struct toi_module_ops *toi_get_next_filter(struct toi_module_ops *);
24613191 13076+
4e97e4e9 13077+extern int toi_register_module(struct toi_module_ops *module);
13078+extern void toi_move_module_tail(struct toi_module_ops *module);
13079+
7f9d2ee0 13080+extern long toi_header_storage_for_modules(void);
13081+extern long toi_memory_for_modules(int print_parts);
4e97e4e9 13082+extern int toi_expected_compression_ratio(void);
13083+
13084+extern int toi_print_module_debug_info(char *buffer, int buffer_size);
13085+extern int toi_register_module(struct toi_module_ops *module);
13086+extern void toi_unregister_module(struct toi_module_ops *module);
e8d0ad9d 13087+
13088+extern int toi_initialise_modules(int starting_cycle, int early);
13089+#define toi_initialise_modules_early(starting) \
13090+ toi_initialise_modules(starting, 1)
13091+#define toi_initialise_modules_late(starting) \
13092+ toi_initialise_modules(starting, 0)
4e97e4e9 13093+extern void toi_cleanup_modules(int finishing_cycle);
13094+
13095+extern void toi_print_modules(void);
13096+
13097+int toi_get_modules(void);
13098+void toi_put_modules(void);
13099+#endif
13100diff --git a/kernel/power/tuxonice_netlink.c b/kernel/power/tuxonice_netlink.c
13101new file mode 100644
7f9d2ee0 13102index 0000000..f459a7f
4e97e4e9 13103--- /dev/null
13104+++ b/kernel/power/tuxonice_netlink.c
ad8f4a28 13105@@ -0,0 +1,323 @@
24613191 13106+/*
4e97e4e9 13107+ * kernel/power/tuxonice_netlink.c
24613191 13108+ *
4e97e4e9 13109+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 13110+ *
4e97e4e9 13111+ * This file is released under the GPLv2.
24613191 13112+ *
4e97e4e9 13113+ * Functions for communicating with a userspace helper via netlink.
24613191 13114+ */
13115+
4e97e4e9 13116+
24613191 13117+#include <linux/suspend.h>
4e97e4e9 13118+#include "tuxonice_netlink.h"
13119+#include "tuxonice.h"
13120+#include "tuxonice_modules.h"
ad8f4a28 13121+#include "tuxonice_alloc.h"
24613191 13122+
ad8f4a28 13123+struct user_helper_data *uhd_list;
24613191 13124+
13125+/*
13126+ * Refill our pool of SKBs for use in emergencies (eg, when eating memory and
13127+ * none can be allocated).
4e97e4e9 13128+ */
13129+static void toi_fill_skb_pool(struct user_helper_data *uhd)
13130+{
13131+ while (uhd->pool_level < uhd->pool_limit) {
13132+ struct sk_buff *new_skb =
13133+ alloc_skb(NLMSG_SPACE(uhd->skb_size), TOI_ATOMIC_GFP);
24613191 13134+
4e97e4e9 13135+ if (!new_skb)
13136+ break;
24613191 13137+
4e97e4e9 13138+ new_skb->next = uhd->emerg_skbs;
13139+ uhd->emerg_skbs = new_skb;
13140+ uhd->pool_level++;
13141+ }
13142+}
e8d0ad9d 13143+
ad8f4a28 13144+/*
4e97e4e9 13145+ * Try to allocate a single skb. If we can't get one, try to use one from
13146+ * our pool.
13147+ */
13148+static struct sk_buff *toi_get_skb(struct user_helper_data *uhd)
13149+{
13150+ struct sk_buff *skb =
13151+ alloc_skb(NLMSG_SPACE(uhd->skb_size), TOI_ATOMIC_GFP);
24613191 13152+
4e97e4e9 13153+ if (skb)
13154+ return skb;
13155+
13156+ skb = uhd->emerg_skbs;
13157+ if (skb) {
13158+ uhd->pool_level--;
13159+ uhd->emerg_skbs = skb->next;
13160+ skb->next = NULL;
13161+ }
13162+
13163+ return skb;
13164+}
13165+
13166+static void put_skb(struct user_helper_data *uhd, struct sk_buff *skb)
13167+{
13168+ if (uhd->pool_level < uhd->pool_limit) {
13169+ skb->next = uhd->emerg_skbs;
13170+ uhd->emerg_skbs = skb;
13171+ } else
13172+ kfree_skb(skb);
13173+}
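The pool above pre-allocates skbs so a message can still be sent when the allocator fails under memory pressure: get() prefers a fresh allocation and falls back to the pool, put() refills the pool up to its limit. A userspace sketch of that scheme (alloc_fails is a test knob standing in for allocation failure; all names are hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

struct demo_buf { struct demo_buf *next; };

static struct demo_buf *pool;
static int pool_level, pool_limit = 2;
static int alloc_fails;

/* Top up the emergency pool to its limit. */
static void demo_fill_pool(void)
{
	while (pool_level < pool_limit) {
		struct demo_buf *b = malloc(sizeof(*b));

		if (!b)
			break;
		b->next = pool;
		pool = b;
		pool_level++;
	}
}

/* Prefer a fresh allocation; fall back to the emergency pool. */
static struct demo_buf *demo_get(void)
{
	struct demo_buf *b = alloc_fails ? NULL : malloc(sizeof(*b));

	if (b)
		return b;
	b = pool;
	if (b) {
		pool = b->next;
		pool_level--;
		b->next = NULL;
	}
	return b;
}

/* Return a buffer to the pool while it is below its limit. */
static void demo_put(struct demo_buf *b)
{
	if (pool_level < pool_limit) {
		b->next = pool;
		pool = b;
		pool_level++;
	} else
		free(b);
}
```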
13174+
13175+void toi_send_netlink_message(struct user_helper_data *uhd,
ad8f4a28 13176+ int type, void *params, size_t len)
4e97e4e9 13177+{
13178+ struct sk_buff *skb;
13179+ struct nlmsghdr *nlh;
13180+ void *dest;
13181+ struct task_struct *t;
24613191 13182+
4e97e4e9 13183+ if (uhd->pid == -1)
13184+ return;
24613191 13185+
4e97e4e9 13186+ skb = toi_get_skb(uhd);
13187+ if (!skb) {
ad8f4a28 13188+ printk(KERN_INFO "toi_netlink: Can't allocate skb!\n");
4e97e4e9 13189+ return;
13190+ }
24613191 13191+
4e97e4e9 13192+ /* NLMSG_PUT contains a hidden goto nlmsg_failure */
13193+ nlh = NLMSG_PUT(skb, 0, uhd->sock_seq, type, len);
13194+ uhd->sock_seq++;
24613191 13195+
4e97e4e9 13196+ dest = NLMSG_DATA(nlh);
13197+ if (params && len > 0)
13198+ memcpy(dest, params, len);
24613191 13199+
4e97e4e9 13200+ netlink_unicast(uhd->nl, skb, uhd->pid, 0);
24613191 13201+
4e97e4e9 13202+ read_lock(&tasklist_lock);
13203+ t = find_task_by_pid(uhd->pid);
13204+ if (!t) {
4e97e4e9 13205+ read_unlock(&tasklist_lock);
13206+ if (uhd->pid > -1)
13207+ printk(KERN_INFO "Hmm. Can't find the userspace task"
13208+ " %d.\n", uhd->pid);
4e97e4e9 13209+ return;
13210+ }
13211+ wake_up_process(t);
13212+ read_unlock(&tasklist_lock);
24613191 13213+
4e97e4e9 13214+ yield();
24613191 13215+
4e97e4e9 13216+ return;
24613191 13217+
4e97e4e9 13218+nlmsg_failure:
13219+ if (skb)
13220+ put_skb(uhd, skb);
13221+}
ad8f4a28 13222+EXPORT_SYMBOL_GPL(toi_send_netlink_message);
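NLMSG_PUT and NLMSG_DATA above frame each message as a fixed 16-byte header followed by the payload, with the total length recorded in the header. A userspace sketch of that framing, with a demo_nlmsghdr mirroring the field layout of struct nlmsghdr:

```c
#include <assert.h>
#include <string.h>

/* Field layout mirrors struct nlmsghdr from <linux/netlink.h>. */
struct demo_nlmsghdr {
	unsigned int   nlmsg_len;
	unsigned short nlmsg_type;
	unsigned short nlmsg_flags;
	unsigned int   nlmsg_seq;
	unsigned int   nlmsg_pid;
};

/* Write one header + payload into buf; return the recorded length. */
static unsigned int demo_put_msg(char *buf, unsigned short type,
				 unsigned int seq, const void *payload,
				 unsigned int len)
{
	struct demo_nlmsghdr *nlh = (struct demo_nlmsghdr *)buf;

	nlh->nlmsg_len = sizeof(*nlh) + len;	/* NLMSG_LENGTH(len) */
	nlh->nlmsg_type = type;
	nlh->nlmsg_flags = 0;
	nlh->nlmsg_seq = seq;
	nlh->nlmsg_pid = 0;
	if (payload && len)
		memcpy(buf + sizeof(*nlh), payload, len); /* NLMSG_DATA() */
	return nlh->nlmsg_len;
}
```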
24613191 13223+
4e97e4e9 13224+static void send_whether_debugging(struct user_helper_data *uhd)
13225+{
13226+ static int is_debugging = 1;
24613191 13227+
4e97e4e9 13228+ toi_send_netlink_message(uhd, NETLINK_MSG_IS_DEBUGGING,
13229+ &is_debugging, sizeof(int));
13230+}
13231+
13232+/*
13233+ * Set the PF_NOFREEZE flag on the given process to ensure it can run whilst we
13234+ * are hibernating.
13235+ */
13236+static int nl_set_nofreeze(struct user_helper_data *uhd, int pid)
13237+{
13238+ struct task_struct *t;
24613191 13239+
4e97e4e9 13240+ read_lock(&tasklist_lock);
13241+ t = find_task_by_pid(pid);
13242+ if (!t) {
4e97e4e9 13243+ read_unlock(&tasklist_lock);
13244+ printk(KERN_INFO "Strange. Can't find the userspace task %d.\n",
13245+ pid);
4e97e4e9 13246+ return -EINVAL;
13247+ }
13248+
13249+ t->flags |= PF_NOFREEZE;
24613191 13250+
4e97e4e9 13251+ read_unlock(&tasklist_lock);
13252+ uhd->pid = pid;
24613191 13253+
4e97e4e9 13254+ toi_send_netlink_message(uhd, NETLINK_MSG_NOFREEZE_ACK, NULL, 0);
24613191 13255+
4e97e4e9 13256+ return 0;
13257+}
24613191 13258+
13259+/*
4e97e4e9 13260+ * Called when the userspace process has informed us that it's ready to roll.
24613191 13261+ */
4e97e4e9 13262+static int nl_ready(struct user_helper_data *uhd, int version)
24613191 13263+{
4e97e4e9 13264+ if (version != uhd->interface_version) {
13265+ printk(KERN_INFO "%s userspace process using invalid interface"
13266+ " version. Trying to continue without it.\n",
4e97e4e9 13267+ uhd->name);
13268+ if (uhd->not_ready)
13269+ uhd->not_ready();
ad8f4a28 13270+ return -EINVAL;
24613191 13271+ }
13272+
4e97e4e9 13273+ complete(&uhd->wait_for_process);
13274+
13275+ return 0;
24613191 13276+}
13277+
4e97e4e9 13278+void toi_netlink_close_complete(struct user_helper_data *uhd)
13279+{
13280+ if (uhd->nl) {
13281+ sock_release(uhd->nl->sk_socket);
13282+ uhd->nl = NULL;
13283+ }
13284+
13285+ while (uhd->emerg_skbs) {
13286+ struct sk_buff *next = uhd->emerg_skbs->next;
13287+ kfree_skb(uhd->emerg_skbs);
13288+ uhd->emerg_skbs = next;
13289+ }
24613191 13290+
4e97e4e9 13291+ uhd->pid = -1;
4e97e4e9 13292+}
13293+
13294+static int toi_nl_gen_rcv_msg(struct user_helper_data *uhd,
13295+ struct sk_buff *skb, struct nlmsghdr *nlh)
24613191 13296+{
4e97e4e9 13297+ int type;
13298+ int *data;
13299+ int err;
24613191 13300+
4e97e4e9 13301+ /* Let the more specific handler go first. It returns
13302+ * 1 for valid messages that it doesn't know. */
13303+ err = uhd->rcv_msg(skb, nlh);
13304+ if (err != 1)
4e97e4e9 13305+ return err;
ad8f4a28 13306+
4e97e4e9 13307+ type = nlh->nlmsg_type;
24613191 13308+
4e97e4e9 13309+ /* Only allow one task to receive NOFREEZE privileges */
13310+ if (type == NETLINK_MSG_NOFREEZE_ME && uhd->pid != -1) {
13311+ printk("Received extra nofreeze me requests.\n");
13312+ return -EBUSY;
24613191 13313+ }
13314+
13315+ data = (int *)NLMSG_DATA(nlh);
13316+
13317+ switch (type) {
13318+ case NETLINK_MSG_NOFREEZE_ME:
13319+ return nl_set_nofreeze(uhd, nlh->nlmsg_pid);
13320+ case NETLINK_MSG_GET_DEBUGGING:
13321+ send_whether_debugging(uhd);
13322+ return 0;
13323+ case NETLINK_MSG_READY:
13324+ if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int))) {
13325+ printk(KERN_INFO "Invalid ready message.\n");
13326+ return -EINVAL;
13327+ }
13328+ return nl_ready(uhd, *data);
13329+ case NETLINK_MSG_CLEANUP:
13330+ toi_netlink_close_complete(uhd);
13331+ return 0;
4e97e4e9 13332+ }
24613191 13333+
ad8f4a28 13334+ return -EINVAL;
4e97e4e9 13335+}
13336+
ad8f4a28 13337+static void toi_user_rcv_skb(struct sk_buff *skb)
24613191 13338+{
4e97e4e9 13339+ int err;
13340+ struct nlmsghdr *nlh;
13341+ struct user_helper_data *uhd = uhd_list;
13342+
13343+ while (uhd && uhd->netlink_id != skb->sk->sk_protocol)
13344+ uhd = uhd->next;
13345+
13346+ if (!uhd)
13347+ return;
24613191 13348+
4e97e4e9 13349+ while (skb->len >= NLMSG_SPACE(0)) {
13350+ u32 rlen;
24613191 13351+
4e97e4e9 13352+ nlh = (struct nlmsghdr *) skb->data;
13353+ if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
13354+ return;
24613191 13355+
4e97e4e9 13356+ rlen = NLMSG_ALIGN(nlh->nlmsg_len);
13357+ if (rlen > skb->len)
13358+ rlen = skb->len;
24613191 13359+
13360+ err = toi_nl_gen_rcv_msg(uhd, skb, nlh);
13361+ if (err)
4e97e4e9 13362+ netlink_ack(skb, nlh, err);
13363+ else if (nlh->nlmsg_flags & NLM_F_ACK)
13364+ netlink_ack(skb, nlh, 0);
13365+ skb_pull(skb, rlen);
24613191 13366+ }
24613191 13367+}
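The receive loop above walks an skb that may carry several netlink messages back to back, each padded to a 4-byte boundary before the next begins. A sketch of that alignment rule and the skb_pull() walk (NLMSG_ALIGNTO is 4 in <linux/netlink.h>; the length array is hypothetical test input):

```c
#include <assert.h>

#define DEMO_ALIGNTO 4U
#define DEMO_ALIGN(len) (((len) + DEMO_ALIGNTO - 1) & ~(DEMO_ALIGNTO - 1))

/* Count how many messages fit in a buffer of `total` bytes, given each
 * message's claimed length, mirroring the skb_pull() loop above. */
static int demo_count_msgs(const unsigned int *lens, int n, unsigned int total)
{
	int i, count = 0;

	for (i = 0; i < n && total >= lens[i]; i++) {
		unsigned int rlen = DEMO_ALIGN(lens[i]);

		if (rlen > total)
			rlen = total;
		total -= rlen;		/* skb_pull(skb, rlen) */
		count++;
	}
	return count;
}
```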
13368+
4e97e4e9 13369+static int netlink_prepare(struct user_helper_data *uhd)
24613191 13370+{
4e97e4e9 13371+ uhd->next = uhd_list;
13372+ uhd_list = uhd;
13373+
13374+ uhd->sock_seq = 0x42c0ffee;
13375+ uhd->nl = netlink_kernel_create(&init_net, uhd->netlink_id, 0,
13376+ toi_user_rcv_skb, NULL, THIS_MODULE);
4e97e4e9 13377+ if (!uhd->nl) {
ad8f4a28 13378+ printk(KERN_INFO "Failed to allocate netlink socket for %s.\n",
4e97e4e9 13379+ uhd->name);
13380+ return -ENOMEM;
13381+ }
13382+
13383+ toi_fill_skb_pool(uhd);
13384+
13385+ return 0;
24613191 13386+}
13387+
4e97e4e9 13388+void toi_netlink_close(struct user_helper_data *uhd)
24613191 13389+{
4e97e4e9 13390+ struct task_struct *t;
13391+
13392+ read_lock(&tasklist_lock);
13393+ t = find_task_by_pid(uhd->pid);
13394+ if (t)
4e97e4e9 13395+ t->flags &= ~PF_NOFREEZE;
13396+ read_unlock(&tasklist_lock);
24613191 13397+
4e97e4e9 13398+ toi_send_netlink_message(uhd, NETLINK_MSG_CLEANUP, NULL, 0);
24613191 13399+}
ad8f4a28 13400+EXPORT_SYMBOL_GPL(toi_netlink_close);
24613191 13401+
4e97e4e9 13402+int toi_netlink_setup(struct user_helper_data *uhd)
24613191 13403+{
4e97e4e9 13404+ if (netlink_prepare(uhd) < 0) {
ad8f4a28 13405+ printk(KERN_INFO "Netlink prepare failed.\n");
4e97e4e9 13406+ return 1;
13407+ }
24613191 13408+
ad8f4a28 13409+ if (toi_launch_userspace_program(uhd->program, uhd->netlink_id,
7f9d2ee0 13410+ UMH_WAIT_EXEC) < 0) {
ad8f4a28 13411+ printk(KERN_INFO "Launch userspace program failed.\n");
4e97e4e9 13412+ toi_netlink_close_complete(uhd);
13413+ return 1;
13414+ }
24613191 13415+
4e97e4e9 13416+ /* Wait 2 seconds for the userspace process to make contact */
13417+ wait_for_completion_timeout(&uhd->wait_for_process, 2*HZ);
24613191 13418+
4e97e4e9 13419+ if (uhd->pid == -1) {
ad8f4a28 13420+ printk(KERN_INFO "%s: Failed to contact userspace process.\n",
4e97e4e9 13421+ uhd->name);
13422+ toi_netlink_close_complete(uhd);
13423+ return 1;
13424+ }
24613191 13425+
24613191 13426+ return 0;
13427+}
4e97e4e9 13428+EXPORT_SYMBOL_GPL(toi_netlink_setup);
4e97e4e9 13429diff --git a/kernel/power/tuxonice_netlink.h b/kernel/power/tuxonice_netlink.h
13430new file mode 100644
ad8f4a28 13431index 0000000..721c222
4e97e4e9 13432--- /dev/null
13433+++ b/kernel/power/tuxonice_netlink.h
13434@@ -0,0 +1,58 @@
13435+/*
13436+ * kernel/power/tuxonice_netlink.h
24613191 13437+ *
4e97e4e9 13438+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 13439+ *
4e97e4e9 13440+ * This file is released under the GPLv2.
13441+ *
13442+ * Declarations for functions for communicating with a userspace helper
13443+ * via netlink.
24613191 13444+ */
13445+
4e97e4e9 13446+#include <linux/netlink.h>
13447+#include <net/sock.h>
24613191 13448+
4e97e4e9 13449+#define NETLINK_MSG_BASE 0x10
24613191 13450+
4e97e4e9 13451+#define NETLINK_MSG_READY 0x10
13452+#define NETLINK_MSG_NOFREEZE_ME 0x16
13453+#define NETLINK_MSG_GET_DEBUGGING 0x19
13454+#define NETLINK_MSG_CLEANUP 0x24
13455+#define NETLINK_MSG_NOFREEZE_ACK 0x27
13456+#define NETLINK_MSG_IS_DEBUGGING 0x28
24613191 13457+
4e97e4e9 13458+struct user_helper_data {
13459+ int (*rcv_msg) (struct sk_buff *skb, struct nlmsghdr *nlh);
ad8f4a28 13460+ void (*not_ready) (void);
4e97e4e9 13461+ struct sock *nl;
13462+ u32 sock_seq;
13463+ pid_t pid;
13464+ char *comm;
13465+ char program[256];
13466+ int pool_level;
13467+ int pool_limit;
13468+ struct sk_buff *emerg_skbs;
13469+ int skb_size;
ad8f4a28 13470+ int netlink_id;
4e97e4e9 13471+ char *name;
13472+ struct user_helper_data *next;
13473+ struct completion wait_for_process;
13474+ int interface_version;
13475+ int must_init;
13476+};
24613191 13477+
4e97e4e9 13478+#ifdef CONFIG_NET
13479+int toi_netlink_setup(struct user_helper_data *uhd);
13480+void toi_netlink_close(struct user_helper_data *uhd);
13481+void toi_send_netlink_message(struct user_helper_data *uhd,
ad8f4a28 13482+ int type, void *params, size_t len);
4e97e4e9 13483+#else
13484+static inline int toi_netlink_setup(struct user_helper_data *uhd)
13485+{
24613191 13486+ return 0;
13487+}
13488+
4e97e4e9 13489+static inline void toi_netlink_close(struct user_helper_data *uhd) { };
13490+static inline void toi_send_netlink_message(struct user_helper_data *uhd,
ad8f4a28 13491+ int type, void *params, size_t len) { };
4e97e4e9 13492+#endif
13493diff --git a/kernel/power/tuxonice_pagedir.c b/kernel/power/tuxonice_pagedir.c
13494new file mode 100644
ad8f4a28 13495index 0000000..93b65cd
4e97e4e9 13496--- /dev/null
13497+++ b/kernel/power/tuxonice_pagedir.c
ad8f4a28 13498@@ -0,0 +1,347 @@
4e97e4e9 13499+/*
13500+ * kernel/power/tuxonice_pagedir.c
13501+ *
13502+ * Copyright (C) 1998-2001 Gabor Kuti <seasons@fornax.hu>
13503+ * Copyright (C) 1998,2001,2002 Pavel Machek <pavel@suse.cz>
13504+ * Copyright (C) 2002-2003 Florent Chabaud <fchabaud@free.fr>
13505+ * Copyright (C) 2006-2007 Nigel Cunningham (nigel at tuxonice net)
13506+ *
13507+ * This file is released under the GPLv2.
13508+ *
13509+ * Routines for handling pagesets.
13510+ * Note that pbes aren't actually stored as such. They're stored as
13511+ * bitmaps and extents.
24613191 13512+ */
24613191 13513+
4e97e4e9 13514+#include <linux/suspend.h>
13515+#include <linux/highmem.h>
13516+#include <linux/bootmem.h>
13517+#include <linux/hardirq.h>
13518+#include <linux/sched.h>
13519+#include <asm/tlbflush.h>
24613191 13520+
4e97e4e9 13521+#include "tuxonice_pageflags.h"
13522+#include "tuxonice_ui.h"
13523+#include "tuxonice_pagedir.h"
13524+#include "tuxonice_prepare_image.h"
13525+#include "tuxonice.h"
13526+#include "power.h"
13527+#include "tuxonice_builtin.h"
ad8f4a28 13528+#include "tuxonice_alloc.h"
24613191 13529+
4e97e4e9 13530+static int ptoi_pfn;
13531+static struct pbe *this_low_pbe;
13532+static struct pbe **last_low_pbe_ptr;
24613191 13533+
4e97e4e9 13534+void toi_reset_alt_image_pageset2_pfn(void)
24613191 13535+{
4e97e4e9 13536+ ptoi_pfn = max_pfn + 1;
24613191 13537+}
13538+
4e97e4e9 13539+static struct page *first_conflicting_page;
13540+
24613191 13541+/*
4e97e4e9 13542+ * free_conflicting_pages
24613191 13543+ */
24613191 13544+
4e97e4e9 13545+void free_conflicting_pages(void)
24613191 13546+{
4e97e4e9 13547+ while (first_conflicting_page) {
13548+ struct page *next =
13549+ *((struct page **) kmap(first_conflicting_page));
4e97e4e9 13550+ kunmap(first_conflicting_page);
ad8f4a28 13551+ toi__free_page(29, first_conflicting_page);
4e97e4e9 13552+ first_conflicting_page = next;
13553+ }
13554+}
24613191 13555+
4e97e4e9 13556+/* __toi_get_nonconflicting_page
13557+ *
13558+ * Description: Gets order zero pages that won't be overwritten
13559+ * while copying the original pages.
13560+ */
24613191 13561+
ad8f4a28 13562+struct page *___toi_get_nonconflicting_page(int can_be_highmem)
4e97e4e9 13563+{
13564+ struct page *page;
13565+ int flags = TOI_ATOMIC_GFP;
13566+ if (can_be_highmem)
13567+ flags |= __GFP_HIGHMEM;
24613191 13568+
24613191 13569+
4e97e4e9 13570+ if (test_toi_state(TOI_LOADING_ALT_IMAGE) && pageset2_map.bitmap &&
13571+ (ptoi_pfn < (max_pfn + 2))) {
13572+ /*
13573+ * ptoi_pfn = max_pfn + 1 when yet to find first ps2 pfn that
13574+ * can be used.
4e97e4e9 13575+ * = 0..max_pfn when going through list.
13576+ * = max_pfn + 2 when gone through whole list.
13577+ */
13578+ do {
13579+ ptoi_pfn = get_next_bit_on(&pageset2_map, ptoi_pfn);
13580+ if (ptoi_pfn <= max_pfn) {
13581+ page = pfn_to_page(ptoi_pfn);
13582+ if (!PagePageset1(page) &&
13583+ (can_be_highmem || !PageHighMem(page)))
13584+ return page;
13585+ } else
13586+ ptoi_pfn++;
13587+ } while (ptoi_pfn < max_pfn);
24613191 13588+ }
13589+
4e97e4e9 13590+ do {
ad8f4a28 13591+ page = toi_alloc_page(29, flags);
4e97e4e9 13592+ if (!page) {
ad8f4a28 13593+			printk(KERN_INFO "Failed to get nonconflicting "
13594+ "page.\n");
4e97e4e9 13595+ return 0;
13596+ }
13597+ if (PagePageset1(page)) {
13598+ struct page **next = (struct page **) kmap(page);
13599+ *next = first_conflicting_page;
13600+ first_conflicting_page = page;
13601+ kunmap(page);
13602+ }
ad8f4a28 13603+ } while (PagePageset1(page));
24613191 13604+
4e97e4e9 13605+ return page;
24613191 13606+}
13607+
4e97e4e9 13608+unsigned long __toi_get_nonconflicting_page(void)
24613191 13609+{
4e97e4e9 13610+ struct page *page = ___toi_get_nonconflicting_page(0);
13611+ return page ? (unsigned long) page_address(page) : 0;
24613191 13612+}
13613+
ad8f4a28 13614+struct pbe *get_next_pbe(struct page **page_ptr, struct pbe *this_pbe,
13615+ int highmem)
24613191 13616+{
ad8f4a28 13617+ if (((((unsigned long) this_pbe) & (PAGE_SIZE - 1))
4e97e4e9 13618+ + 2 * sizeof(struct pbe)) > PAGE_SIZE) {
13619+ struct page *new_page =
13620+ ___toi_get_nonconflicting_page(highmem);
13621+ if (!new_page)
13622+ return ERR_PTR(-ENOMEM);
13623+ this_pbe = (struct pbe *) kmap(new_page);
13624+ memset(this_pbe, 0, PAGE_SIZE);
13625+ *page_ptr = new_page;
13626+ } else
13627+ this_pbe++;
24613191 13628+
4e97e4e9 13629+ return this_pbe;
24613191 13630+}
13631+
4e97e4e9 13632+/* get_pageset1_load_addresses
ad8f4a28 13633+ *
4e97e4e9 13634+ * Description: We check here that pagedir & pages it points to won't collide
13635+ * with pages where we're going to restore from the loaded pages
13636+ * later.
13637+ * Returns:	Zero on success, one if we couldn't find enough pages (shouldn't
13638+ * happen).
13639+ */
13640+
13641+int toi_get_pageset1_load_addresses(void)
73c609d5 13642+{
4e97e4e9 13643+ int pfn, highallocd = 0, lowallocd = 0;
13644+ int low_needed = pagedir1.size - get_highmem_size(pagedir1);
13645+ int high_needed = get_highmem_size(pagedir1);
13646+ int low_pages_for_highmem = 0;
13647+ unsigned long flags = GFP_ATOMIC | __GFP_NOWARN | __GFP_HIGHMEM;
13648+ struct page *page, *high_pbe_page = NULL, *last_high_pbe_page = NULL,
13649+ *low_pbe_page;
ad8f4a28 13650+	struct pbe **last_high_pbe_ptr = &restore_highmem_pblist,
13651+ *this_high_pbe = NULL;
4e97e4e9 13652+ int orig_low_pfn = max_pfn + 1, orig_high_pfn = max_pfn + 1;
ad8f4a28 13653+ int high_pbes_done = 0, low_pbes_done = 0;
4e97e4e9 13654+ int low_direct = 0, high_direct = 0;
13655+ int high_to_free, low_to_free;
73c609d5 13656+
ad8f4a28 13657+	last_low_pbe_ptr = &restore_pblist;
13658+
4e97e4e9 13659+ /* First, allocate pages for the start of our pbe lists. */
13660+ if (high_needed) {
13661+ high_pbe_page = ___toi_get_nonconflicting_page(1);
13662+ if (!high_pbe_page)
13663+ return 1;
13664+ this_high_pbe = (struct pbe *) kmap(high_pbe_page);
13665+ memset(this_high_pbe, 0, PAGE_SIZE);
13666+ }
73c609d5 13667+
4e97e4e9 13668+ low_pbe_page = ___toi_get_nonconflicting_page(0);
13669+ if (!low_pbe_page)
13670+ return 1;
13671+ this_low_pbe = (struct pbe *) page_address(low_pbe_page);
73c609d5 13672+
ad8f4a28 13673+ /*
4e97e4e9 13674+ * Next, allocate all possible memory to find where we can
13675+ * load data directly into destination pages. I'd like to do
13676+ * this in bigger chunks, but then we can't free pages
13677+ * individually later.
13678+ */
73c609d5 13679+
4e97e4e9 13680+ do {
ad8f4a28 13681+ page = toi_alloc_page(30, flags);
4e97e4e9 13682+ if (page)
13683+ SetPagePageset1Copy(page);
13684+ } while (page);
73c609d5 13685+
ad8f4a28 13686+ /*
4e97e4e9 13687+ * Find out how many high- and lowmem pages we allocated above,
13688+ * and how many pages we can reload directly to their original
13689+ * location.
13690+ */
13691+ BITMAP_FOR_EACH_SET(&pageset1_copy_map, pfn) {
13692+ int is_high;
13693+ page = pfn_to_page(pfn);
13694+ is_high = PageHighMem(page);
13695+
13696+ if (PagePageset1(page)) {
13697+ if (test_action_state(TOI_NO_DIRECT_LOAD)) {
13698+ ClearPagePageset1Copy(page);
ad8f4a28 13699+ toi__free_page(30, page);
4e97e4e9 13700+ continue;
13701+ } else {
13702+ if (is_high)
13703+ high_direct++;
13704+ else
13705+ low_direct++;
13706+ }
13707+ } else {
13708+ if (is_high)
13709+ highallocd++;
13710+ else
13711+ lowallocd++;
73c609d5 13712+ }
4e97e4e9 13713+ }
73c609d5 13714+
ad8f4a28 13715+	high_needed -= high_direct;
13716+ low_needed -= low_direct;
4e97e4e9 13717+
13718+ /*
13719+ * Do we need to use some lowmem pages for the copies of highmem
13720+ * pages?
13721+ */
13722+ if (high_needed > highallocd) {
13723+ low_pages_for_highmem = high_needed - highallocd;
13724+ high_needed -= low_pages_for_highmem;
13725+ low_needed += low_pages_for_highmem;
73c609d5 13726+ }
ad8f4a28 13727+
4e97e4e9 13728+ high_to_free = highallocd - high_needed;
13729+ low_to_free = lowallocd - low_needed;
73c609d5 13730+
4e97e4e9 13731+ /*
13732+	 * Now generate our pbes (which will be used for the atomic restore),
13733+ * and free unneeded pages.
13734+ */
13735+ BITMAP_FOR_EACH_SET(&pageset1_copy_map, pfn) {
13736+ int is_high;
13737+ page = pfn_to_page(pfn);
13738+ is_high = PageHighMem(page);
24613191 13739+
4e97e4e9 13740+ if (PagePageset1(page))
13741+ continue;
24613191 13742+
4e97e4e9 13743+ /* Free the page? */
13744+ if ((is_high && high_to_free) ||
13745+ (!is_high && low_to_free)) {
13746+ ClearPagePageset1Copy(page);
ad8f4a28 13747+ toi__free_page(30, page);
4e97e4e9 13748+ if (is_high)
13749+ high_to_free--;
13750+ else
13751+ low_to_free--;
13752+ continue;
13753+ }
24613191 13754+
4e97e4e9 13755+ /* Nope. We're going to use this page. Add a pbe. */
13756+ if (is_high || low_pages_for_highmem) {
13757+ struct page *orig_page;
13758+ high_pbes_done++;
13759+ if (!is_high)
13760+ low_pages_for_highmem--;
13761+ do {
13762+ orig_high_pfn = get_next_bit_on(&pageset1_map,
13763+ orig_high_pfn);
13764+ BUG_ON(orig_high_pfn > max_pfn);
13765+ orig_page = pfn_to_page(orig_high_pfn);
ad8f4a28 13766+			} while (!PageHighMem(orig_page) ||
13767+ load_direct(orig_page));
24613191 13768+
4e97e4e9 13769+ this_high_pbe->orig_address = orig_page;
13770+ this_high_pbe->address = page;
13771+ this_high_pbe->next = NULL;
13772+ if (last_high_pbe_page != high_pbe_page) {
ad8f4a28 13773+				*last_high_pbe_ptr =
13774+ (struct pbe *) high_pbe_page;
4e97e4e9 13775+ if (!last_high_pbe_page)
13776+ last_high_pbe_page = high_pbe_page;
13777+ } else
13778+ *last_high_pbe_ptr = this_high_pbe;
13779+ last_high_pbe_ptr = &this_high_pbe->next;
13780+ if (last_high_pbe_page != high_pbe_page) {
13781+ kunmap(last_high_pbe_page);
13782+ last_high_pbe_page = high_pbe_page;
13783+ }
ad8f4a28 13784+			this_high_pbe = get_next_pbe(&high_pbe_page,
13785+ this_high_pbe, 1);
4e97e4e9 13786+ if (IS_ERR(this_high_pbe)) {
ad8f4a28 13787+				printk(KERN_INFO
13788+ "This high pbe is an error.\n");
4e97e4e9 13789+ return -ENOMEM;
13790+ }
13791+ } else {
13792+ struct page *orig_page;
13793+ low_pbes_done++;
13794+ do {
13795+ orig_low_pfn = get_next_bit_on(&pageset1_map,
13796+ orig_low_pfn);
13797+ BUG_ON(orig_low_pfn > max_pfn);
13798+ orig_page = pfn_to_page(orig_low_pfn);
ad8f4a28 13799+			} while (PageHighMem(orig_page) ||
13800+ load_direct(orig_page));
24613191 13801+
4e97e4e9 13802+ this_low_pbe->orig_address = page_address(orig_page);
13803+ this_low_pbe->address = page_address(page);
13804+ this_low_pbe->next = NULL;
13805+ *last_low_pbe_ptr = this_low_pbe;
13806+ last_low_pbe_ptr = &this_low_pbe->next;
ad8f4a28 13807+			this_low_pbe = get_next_pbe(&low_pbe_page,
13808+ this_low_pbe, 0);
4e97e4e9 13809+ if (IS_ERR(this_low_pbe)) {
ad8f4a28 13810+ printk(KERN_INFO "this_low_pbe is an error.\n");
4e97e4e9 13811+ return -ENOMEM;
13812+ }
13813+ }
13814+ }
24613191 13815+
4e97e4e9 13816+ if (high_pbe_page)
13817+ kunmap(high_pbe_page);
24613191 13818+
4e97e4e9 13819+ if (last_high_pbe_page != high_pbe_page) {
13820+ if (last_high_pbe_page)
13821+ kunmap(last_high_pbe_page);
ad8f4a28 13822+ toi__free_page(29, high_pbe_page);
24613191 13823+ }
13824+
4e97e4e9 13825+ free_conflicting_pages();
24613191 13826+
4e97e4e9 13827+ return 0;
24613191 13828+}
ad8f4a28 13829+
13830+int add_boot_kernel_data_pbe(void)
13831+{
13832+ this_low_pbe->address = (char *) __toi_get_nonconflicting_page();
13833+ if (!this_low_pbe->address) {
13834+ printk(KERN_INFO "Failed to get bkd atomic restore buffer.");
13835+ return -ENOMEM;
13836+ }
13837+
13838+ toi_bkd.size = sizeof(toi_bkd);
13839+ memcpy(this_low_pbe->address, &toi_bkd, sizeof(toi_bkd));
13840+
13841+ *last_low_pbe_ptr = this_low_pbe;
13842+ this_low_pbe->orig_address = (char *) boot_kernel_data_buffer;
13843+ this_low_pbe->next = NULL;
13844+ return 0;
13845+}
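The chains of page backup entries built above (threaded through `last_low_pbe_ptr` and `last_high_pbe_ptr`, then walked at atomic restore to copy each relocated page back home) can be sketched in minimal userspace C. The struct layout and helper names below are simplified for illustration and are not the kernel's:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct pbe: one entry per
 * saved page that could not be loaded straight into its final
 * destination during resume. */
struct pbe {
	void *orig_address;	/* where the data must end up */
	void *address;		/* where it was actually loaded */
	struct pbe *next;
};

/* Append an entry via a tail pointer so the list is built in order,
 * mirroring how last_low_pbe_ptr is advanced in the code above. */
static void pbe_append(struct pbe ***tail, struct pbe *e)
{
	e->next = NULL;
	**tail = e;
	*tail = &e->next;
}

/* Atomic-restore walk: copy each relocated page back to its
 * original location. */
static void pbe_restore(struct pbe *list, size_t page_size)
{
	struct pbe *p;
	for (p = list; p; p = p->next)
		memcpy(p->orig_address, p->address, page_size);
}
```

The tail-pointer idiom is what lets `add_boot_kernel_data_pbe()` splice one more entry onto the end of the low-memory list without rescanning it.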
4e97e4e9 13846diff --git a/kernel/power/tuxonice_pagedir.h b/kernel/power/tuxonice_pagedir.h
13847new file mode 100644
7f9d2ee0 13848index 0000000..081e744
4e97e4e9 13849--- /dev/null
13850+++ b/kernel/power/tuxonice_pagedir.h
ad8f4a28 13851@@ -0,0 +1,50 @@
4e97e4e9 13852+/*
13853+ * kernel/power/tuxonice_pagedir.h
13854+ *
13855+ * Copyright (C) 2006-2007 Nigel Cunningham (nigel at tuxonice net)
13856+ *
13857+ * This file is released under the GPLv2.
13858+ *
13859+ * Declarations for routines for handling pagesets.
13860+ */
24613191 13861+
4e97e4e9 13862+#ifndef KERNEL_POWER_PAGEDIR_H
13863+#define KERNEL_POWER_PAGEDIR_H
24613191 13864+
4e97e4e9 13865+/* Pagedir
13866+ *
13867+ * Contains the metadata for a set of pages saved in the image.
13868+ */
24613191 13869+
4e97e4e9 13870+struct pagedir {
13871+ int id;
7f9d2ee0 13872+ long size;
4e97e4e9 13873+#ifdef CONFIG_HIGHMEM
7f9d2ee0 13874+ long size_high;
4e97e4e9 13875+#endif
13876+};
24613191 13877+
4e97e4e9 13878+#ifdef CONFIG_HIGHMEM
13879+#define get_highmem_size(pagedir) (pagedir.size_high)
ad8f4a28 13880+#define set_highmem_size(pagedir, sz) do { pagedir.size_high = sz; } while (0)
13881+#define inc_highmem_size(pagedir) do { pagedir.size_high++; } while (0)
4e97e4e9 13882+#define get_lowmem_size(pagedir) (pagedir.size - pagedir.size_high)
13883+#else
13884+#define get_highmem_size(pagedir) (0)
ad8f4a28 13885+#define set_highmem_size(pagedir, sz) do { } while (0)
13886+#define inc_highmem_size(pagedir) do { } while (0)
4e97e4e9 13887+#define get_lowmem_size(pagedir) (pagedir.size)
13888+#endif
24613191 13889+
4e97e4e9 13890+extern struct pagedir pagedir1, pagedir2;
24613191 13891+
4e97e4e9 13892+extern void toi_copy_pageset1(void);
24613191 13893+
4e97e4e9 13894+extern int toi_get_pageset1_load_addresses(void);
24613191 13895+
4e97e4e9 13896+extern unsigned long __toi_get_nonconflicting_page(void);
ad8f4a28 13897+struct page *___toi_get_nonconflicting_page(int can_be_highmem);
24613191 13898+
4e97e4e9 13899+extern void toi_reset_alt_image_pageset2_pfn(void);
ad8f4a28 13900+extern int add_boot_kernel_data_pbe(void);
4e97e4e9 13901+#endif
13902diff --git a/kernel/power/tuxonice_pageflags.c b/kernel/power/tuxonice_pageflags.c
13903new file mode 100644
ad8f4a28 13904index 0000000..574858c
4e97e4e9 13905--- /dev/null
13906+++ b/kernel/power/tuxonice_pageflags.c
ad8f4a28 13907@@ -0,0 +1,162 @@
4e97e4e9 13908+/*
13909+ * kernel/power/tuxonice_pageflags.c
13910+ *
13911+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
ad8f4a28 13912+ *
4e97e4e9 13913+ * This file is released under the GPLv2.
13914+ *
13915+ * Routines for serialising and relocating pageflags in which we
13916+ * store our image metadata.
13917+ */
24613191 13918+
4e97e4e9 13919+#include <linux/kernel.h>
13920+#include <linux/mm.h>
13921+#include <linux/module.h>
13922+#include <linux/bitops.h>
13923+#include <linux/list.h>
13924+#include <linux/suspend.h>
13925+#include "tuxonice_pageflags.h"
13926+#include "tuxonice_modules.h"
13927+#include "tuxonice_pagedir.h"
13928+#include "tuxonice.h"
24613191 13929+
4e97e4e9 13930+DECLARE_DYN_PAGEFLAGS(pageset2_map);
13931+DECLARE_DYN_PAGEFLAGS(page_resave_map);
13932+DECLARE_DYN_PAGEFLAGS(io_map);
13933+DECLARE_DYN_PAGEFLAGS(nosave_map);
13934+DECLARE_DYN_PAGEFLAGS(free_map);
24613191 13935+
4e97e4e9 13936+static int pages_for_zone(struct zone *zone)
24613191 13937+{
4e97e4e9 13938+ return DIV_ROUND_UP(zone->spanned_pages, (PAGE_SIZE << 3));
24613191 13939+}
13940+
4e97e4e9 13941+int toi_pageflags_space_needed(void)
24613191 13942+{
4e97e4e9 13943+ int total = 0;
13944+ struct zone *zone;
24613191 13945+
4e97e4e9 13946+ for_each_zone(zone)
13947+ if (populated_zone(zone))
ad8f4a28 13948+			total += sizeof(int) * 3 + pages_for_zone(zone) *
13949+ PAGE_SIZE;
24613191 13950+
4e97e4e9 13951+ total += sizeof(int);
24613191 13952+
4e97e4e9 13953+ return total;
24613191 13954+}
13955+
4e97e4e9 13956+/* save_dyn_pageflags
13957+ *
13958+ * Description: Save a set of pageflags.
13959+ * Arguments: struct dyn_pageflags *: Pointer to the bitmap being saved.
13960+ */
13961+
13962+void save_dyn_pageflags(struct dyn_pageflags *pagemap)
24613191 13963+{
4e97e4e9 13964+ int i, zone_idx, size, node = 0;
13965+ struct zone *zone;
13966+ struct pglist_data *pgdat;
24613191 13967+
4e97e4e9 13968+ if (!pagemap)
13969+ return;
24613191 13970+
4e97e4e9 13971+ for_each_online_pgdat(pgdat) {
13972+ for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) {
13973+ zone = &pgdat->node_zones[zone_idx];
13974+
13975+ if (!populated_zone(zone))
13976+ continue;
13977+
13978+ toiActiveAllocator->rw_header_chunk(WRITE, NULL,
13979+ (char *) &node, sizeof(int));
13980+ toiActiveAllocator->rw_header_chunk(WRITE, NULL,
13981+ (char *) &zone_idx, sizeof(int));
13982+ size = pages_for_zone(zone);
13983+ toiActiveAllocator->rw_header_chunk(WRITE, NULL,
13984+ (char *) &size, sizeof(int));
13985+
13986+ for (i = 0; i < size; i++) {
13987+ if (!pagemap->bitmap[node][zone_idx][i+2]) {
ad8f4a28 13988+ printk(KERN_INFO "Sparse pagemap?\n");
4e97e4e9 13989+ dump_pagemap(pagemap);
13990+ BUG();
13991+ }
13992+ toiActiveAllocator->rw_header_chunk(WRITE,
ad8f4a28 13993+					NULL, (char *) pagemap->bitmap[node]
13994+ [zone_idx][i+2],
4e97e4e9 13995+ PAGE_SIZE);
13996+ }
24613191 13997+ }
4e97e4e9 13998+ node++;
24613191 13999+ }
4e97e4e9 14000+ node = -1;
14001+ toiActiveAllocator->rw_header_chunk(WRITE, NULL,
14002+ (char *) &node, sizeof(int));
14003+}
24613191 14004+
4e97e4e9 14005+/* load_dyn_pageflags
14006+ *
14007+ * Description: Load a set of pageflags.
14008+ * Arguments: struct dyn_pageflags *: Pointer to the bitmap being loaded.
14009+ * (It must be allocated before calling this routine).
14010+ */
24613191 14011+
4e97e4e9 14012+int load_dyn_pageflags(struct dyn_pageflags *pagemap)
14013+{
14014+ int i, zone_idx, zone_check = 0, size, node = 0;
14015+ struct zone *zone;
14016+ struct pglist_data *pgdat;
24613191 14017+
4e97e4e9 14018+ if (!pagemap)
14019+ return 1;
24613191 14020+
4e97e4e9 14021+ for_each_online_pgdat(pgdat) {
14022+ for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) {
14023+ zone = &pgdat->node_zones[zone_idx];
24613191 14024+
4e97e4e9 14025+ if (!populated_zone(zone))
14026+ continue;
24613191 14027+
4e97e4e9 14028+ /* Same node? */
14029+ toiActiveAllocator->rw_header_chunk(READ, NULL,
14030+ (char *) &zone_check, sizeof(int));
14031+ if (zone_check != node) {
ad8f4a28 14032+				printk(KERN_INFO "Node read (%d) != node "
14033+ "(%d).\n",
4e97e4e9 14034+ zone_check, node);
14035+ return 1;
14036+ }
24613191 14037+
4e97e4e9 14038+ /* Same zone? */
14039+ toiActiveAllocator->rw_header_chunk(READ, NULL,
14040+ (char *) &zone_check, sizeof(int));
14041+ if (zone_check != zone_idx) {
ad8f4a28 14042+				printk(KERN_INFO "Zone read (%d) != node "
14043+ "(%d).\n",
4e97e4e9 14044+ zone_check, zone_idx);
14045+ return 1;
14046+ }
24613191 14047+
4e97e4e9 14048+
14049+ toiActiveAllocator->rw_header_chunk(READ, NULL,
14050+ (char *) &size, sizeof(int));
14051+
14052+ for (i = 0; i < size; i++)
14053+ toiActiveAllocator->rw_header_chunk(READ, NULL,
ad8f4a28 14054+					(char *) pagemap->bitmap[node][zone_idx]
14055+ [i+2],
4e97e4e9 14056+ PAGE_SIZE);
14057+ }
14058+ node++;
14059+ }
14060+ toiActiveAllocator->rw_header_chunk(READ, NULL, (char *) &zone_check,
14061+ sizeof(int));
14062+ if (zone_check != -1) {
ad8f4a28 14063+		printk(KERN_INFO "Didn't read end of dyn pageflag data marker."
14064+ "(%x)\n", zone_check);
4e97e4e9 14065+ return 1;
14066+ }
24613191 14067+
14068+ return 0;
14069+}
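The header framing used by `save_dyn_pageflags()`/`load_dyn_pageflags()` above — one `(node, zone_idx, size)` record per populated zone, a `-1` node as end marker, and the loader re-checking each field against what it expects — can be sketched over an in-memory stream. This is a simplified illustration (the real code also streams `size` bitmap pages per zone through `rw_header_chunk`); the stream type and helpers are invented for the sketch:

```c
#include <stddef.h>
#include <string.h>

/* Tiny in-memory stream standing in for rw_header_chunk(). */
struct stream { unsigned char buf[1024]; size_t pos; };

static void put_int(struct stream *s, int v)
{
	memcpy(s->buf + s->pos, &v, sizeof(v));
	s->pos += sizeof(v);
}

static int get_int(struct stream *s)
{
	int v;
	memcpy(&v, s->buf + s->pos, sizeof(v));
	s->pos += sizeof(v);
	return v;
}

/* Writer: a (node, zone, size) header per populated zone, then a
 * -1 end marker, as save_dyn_pageflags() does. */
static void save_headers(struct stream *s, const int zones[][2], int n)
{
	int i;
	for (i = 0; i < n; i++) {
		put_int(s, zones[i][0]);	/* node index */
		put_int(s, zones[i][1]);	/* zone index */
		put_int(s, 0);			/* bitmap pages (none here) */
	}
	put_int(s, -1);
}

/* Reader: verify node/zone fields and the end marker, returning
 * nonzero on mismatch, mirroring load_dyn_pageflags(). */
static int load_headers(struct stream *s, const int zones[][2], int n)
{
	int i;
	for (i = 0; i < n; i++) {
		if (get_int(s) != zones[i][0])
			return 1;
		if (get_int(s) != zones[i][1])
			return 1;
		get_int(s);	/* size, ignored in this sketch */
	}
	return get_int(s) != -1;
}
```

Writing the indices into the stream and re-reading them on load is what lets the loader detect a layout mismatch (different node/zone topology) instead of silently loading bitmap pages into the wrong zone.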
4e97e4e9 14070diff --git a/kernel/power/tuxonice_pageflags.h b/kernel/power/tuxonice_pageflags.h
14071new file mode 100644
ad8f4a28 14072index 0000000..f976b5c
4e97e4e9 14073--- /dev/null
14074+++ b/kernel/power/tuxonice_pageflags.h
ad8f4a28 14075@@ -0,0 +1,63 @@
24613191 14076+/*
4e97e4e9 14077+ * kernel/power/tuxonice_pageflags.h
24613191 14078+ *
4e97e4e9 14079+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
14080+ *
14081+ * This file is released under the GPLv2.
14082+ *
14083+ * TuxOnIce needs a few pageflags while working that aren't otherwise
14084+ * used. To save the struct page pageflags, we dynamically allocate
14085+ * a bitmap and use that. These are the only non order-0 allocations
14086+ * we do.
14087+ *
14088+ * NOTE!!!
14089+ * We assume that PAGE_SIZE - sizeof(void *) is a multiple of
14090+ * sizeof(unsigned long). Is this ever false?
24613191 14091+ */
14092+
4e97e4e9 14093+#include <linux/dyn_pageflags.h>
14094+#include <linux/suspend.h>
24613191 14095+
4e97e4e9 14096+extern struct dyn_pageflags pageset1_map;
14097+extern struct dyn_pageflags pageset1_copy_map;
14098+extern struct dyn_pageflags pageset2_map;
14099+extern struct dyn_pageflags page_resave_map;
14100+extern struct dyn_pageflags io_map;
14101+extern struct dyn_pageflags nosave_map;
14102+extern struct dyn_pageflags free_map;
24613191 14103+
4e97e4e9 14104+#define PagePageset1(page) (test_dynpageflag(&pageset1_map, page))
14105+#define SetPagePageset1(page) (set_dynpageflag(&pageset1_map, page))
14106+#define ClearPagePageset1(page) (clear_dynpageflag(&pageset1_map, page))
24613191 14107+
4e97e4e9 14108+#define PagePageset1Copy(page) (test_dynpageflag(&pageset1_copy_map, page))
14109+#define SetPagePageset1Copy(page) (set_dynpageflag(&pageset1_copy_map, page))
ad8f4a28 14110+#define ClearPagePageset1Copy(page) \
14111+ (clear_dynpageflag(&pageset1_copy_map, page))
24613191 14112+
4e97e4e9 14113+#define PagePageset2(page) (test_dynpageflag(&pageset2_map, page))
14114+#define SetPagePageset2(page) (set_dynpageflag(&pageset2_map, page))
14115+#define ClearPagePageset2(page) (clear_dynpageflag(&pageset2_map, page))
24613191 14116+
4e97e4e9 14117+#define PageWasRW(page) (test_dynpageflag(&pageset2_map, page))
14118+#define SetPageWasRW(page) (set_dynpageflag(&pageset2_map, page))
14119+#define ClearPageWasRW(page) (clear_dynpageflag(&pageset2_map, page))
24613191 14120+
ad8f4a28 14121+#define PageResave(page) (page_resave_map.bitmap ? \
14122+ test_dynpageflag(&page_resave_map, page) : 0)
4e97e4e9 14123+#define SetPageResave(page) (set_dynpageflag(&page_resave_map, page))
14124+#define ClearPageResave(page) (clear_dynpageflag(&page_resave_map, page))
14125+
ad8f4a28 14126+#define PageNosave(page) (nosave_map.bitmap ? \
14127+ test_dynpageflag(&nosave_map, page) : 0)
4e97e4e9 14128+#define SetPageNosave(page) (set_dynpageflag(&nosave_map, page))
14129+#define ClearPageNosave(page) (clear_dynpageflag(&nosave_map, page))
24613191 14130+
ad8f4a28 14131+#define PageNosaveFree(page) (free_map.bitmap ? \
14132+ test_dynpageflag(&free_map, page) : 0)
4e97e4e9 14133+#define SetPageNosaveFree(page) (set_dynpageflag(&free_map, page))
14134+#define ClearPageNosaveFree(page) (clear_dynpageflag(&free_map, page))
14135+
14136+extern void save_dyn_pageflags(struct dyn_pageflags *pagemap);
14137+extern int load_dyn_pageflags(struct dyn_pageflags *pagemap);
14138+extern int toi_pageflags_space_needed(void);
14139diff --git a/kernel/power/tuxonice_power_off.c b/kernel/power/tuxonice_power_off.c
14140new file mode 100644
7f9d2ee0 14141index 0000000..70168e6
4e97e4e9 14142--- /dev/null
14143+++ b/kernel/power/tuxonice_power_off.c
7f9d2ee0 14144@@ -0,0 +1,276 @@
24613191 14145+/*
4e97e4e9 14146+ * kernel/power/tuxonice_power_off.c
14147+ *
14148+ * Copyright (C) 2006-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 14149+ *
4e97e4e9 14150+ * This file is released under the GPLv2.
24613191 14151+ *
4e97e4e9 14152+ * Support for powering down.
24613191 14153+ */
14154+
4e97e4e9 14155+#include <linux/device.h>
14156+#include <linux/suspend.h>
14157+#include <linux/mm.h>
14158+#include <linux/pm.h>
14159+#include <linux/reboot.h>
14160+#include <linux/cpu.h>
14161+#include <linux/console.h>
14162+#include <linux/fs.h>
14163+#include "tuxonice.h"
14164+#include "tuxonice_ui.h"
14165+#include "tuxonice_power_off.h"
14166+#include "tuxonice_sysfs.h"
14167+#include "tuxonice_modules.h"
14168+
ad8f4a28 14169+unsigned long toi_poweroff_method; /* 0 - Kernel power off */
14170+EXPORT_SYMBOL_GPL(toi_poweroff_method);
14171+
14172+int wake_delay;
4e97e4e9 14173+static char lid_state_file[256], wake_alarm_dir[256];
14174+static struct file *lid_file, *alarm_file, *epoch_file;
14175+int post_wake_state = -1;
24613191 14176+
7f9d2ee0 14177+static int did_suspend_to_both;
14178+
24613191 14179+/*
4e97e4e9 14180+ * __toi_power_down
14181+ * Functionality : Powers down or reboots the computer once the image
14182+ * has been written to disk.
14183+ * Key Assumptions : Able to reboot/power down via code called or that
14184+ * the warning emitted if the calls fail will be visible
14185+ * to the user (ie printk resumes devices).
24613191 14186+ */
14187+
4e97e4e9 14188+static void __toi_power_down(int method)
24613191 14189+{
4e97e4e9 14190+ int error;
24613191 14191+
4e97e4e9 14192+ if (test_action_state(TOI_REBOOT)) {
7f9d2ee0 14193+ toi_cond_pause(1, "Ready to reboot.");
4e97e4e9 14194+ kernel_restart(NULL);
14195+ }
14196+
7f9d2ee0 14197+ toi_cond_pause(1, "Powering down.");
4e97e4e9 14198+
14199+ switch (method) {
ad8f4a28 14200+	case 0:
14201+ break;
14202+ case 3:
14203+ error = pm_notifier_call_chain(PM_SUSPEND_PREPARE);
14204+ if (!error)
14205+ error = suspend_devices_and_enter(PM_SUSPEND_MEM);
14206+ pm_notifier_call_chain(PM_POST_SUSPEND);
7f9d2ee0 14207+ if (!error) {
14208+ did_suspend_to_both = 1;
ad8f4a28 14209+ return;
7f9d2ee0 14210+ }
ad8f4a28 14211+		break;
14212+ case 4:
14213+ if (!hibernation_platform_enter())
14214+ return;
14215+ break;
14216+ case 5:
14217+ /* Historic entry only now */
14218+ break;
4e97e4e9 14219+ }
14220+
14221+ if (method && method != 5)
7f9d2ee0 14222+ toi_cond_pause(1,
14223+ "Falling back to alternate power off method.");
ad8f4a28 14224+
14225+ if (test_result_state(TOI_ABORTED))
14226+ return;
14227+
4e97e4e9 14228+ kernel_power_off();
14229+ kernel_halt();
7f9d2ee0 14230+ toi_cond_pause(1, "Powerdown failed.");
4e97e4e9 14231+ while (1)
14232+ cpu_relax();
14233+}
24613191 14234+
4e97e4e9 14235+#define CLOSE_FILE(file) \
ad8f4a28 14236+	if (file) { \
14237+ filp_close(file, NULL); file = NULL; \
14238+ }
24613191 14239+
7f9d2ee0 14240+static void powerdown_cleanup(int toi_or_resume)
4e97e4e9 14241+{
14242+ if (!toi_or_resume)
14243+ return;
24613191 14244+
4e97e4e9 14245+ CLOSE_FILE(lid_file);
14246+ CLOSE_FILE(alarm_file);
14247+ CLOSE_FILE(epoch_file);
24613191 14248+}
14249+
4e97e4e9 14250+static void open_file(char *format, char *arg, struct file **var, int mode,
14251+ char *desc)
24613191 14252+{
4e97e4e9 14253+ char buf[256];
14254+
14255+ if (strlen(arg)) {
14256+ sprintf(buf, format, arg);
14257+ *var = filp_open(buf, mode, 0);
14258+ if (IS_ERR(*var) || !*var) {
ad8f4a28 14259+ printk(KERN_INFO "Failed to open %s file '%s' (%p).\n",
4e97e4e9 14260+ desc, buf, *var);
14261+ *var = 0;
24613191 14262+ }
14263+ }
24613191 14264+}
14265+
7f9d2ee0 14266+static int powerdown_init(int toi_or_resume)
24613191 14267+{
4e97e4e9 14268+ if (!toi_or_resume)
14269+ return 0;
14270+
7f9d2ee0 14271+ did_suspend_to_both = 0;
14272+
ad8f4a28 14273+	open_file("/proc/acpi/button/%s/state", lid_state_file, &lid_file,
14274+ O_RDONLY, "lid");
4e97e4e9 14275+
14276+ if (strlen(wake_alarm_dir)) {
14277+ open_file("/sys/class/rtc/%s/wakealarm", wake_alarm_dir,
14278+ &alarm_file, O_WRONLY, "alarm");
14279+
14280+ open_file("/sys/class/rtc/%s/since_epoch", wake_alarm_dir,
14281+ &epoch_file, O_RDONLY, "epoch");
14282+ }
14283+
14284+ return 0;
24613191 14285+}
14286+
4e97e4e9 14287+static int lid_closed(void)
24613191 14288+{
4e97e4e9 14289+ char array[25];
14290+ ssize_t size;
14291+ loff_t pos = 0;
14292+
14293+ if (!lid_file)
14294+ return 0;
14295+
14296+ size = vfs_read(lid_file, (char __user *) array, 25, &pos);
14297+ if ((int) size < 1) {
ad8f4a28 14298+		printk(KERN_INFO "Failed to read lid state file (%d).\n",
14299+ (int) size);
4e97e4e9 14300+ return 0;
14301+ }
14302+
14303+ if (!strcmp(array, "state: closed\n"))
14304+ return 1;
14305+
14306+ return 0;
24613191 14307+}
14308+
4e97e4e9 14309+static void write_alarm_file(int value)
24613191 14310+{
4e97e4e9 14311+ ssize_t size;
14312+ char buf[40];
14313+ loff_t pos = 0;
14314+
14315+ if (!alarm_file)
14316+ return;
14317+
14318+ sprintf(buf, "%d\n", value);
14319+
14320+ size = vfs_write(alarm_file, (char __user *)buf, strlen(buf), &pos);
14321+
14322+ if (size < 0)
ad8f4a28 14323+		printk(KERN_INFO "Error %d writing alarm value %s.\n",
14324+ (int) size, buf);
24613191 14325+}
14326+
4e97e4e9 14327+/**
14328+ * toi_check_resleep: See whether to powerdown again after waking.
14329+ *
14330+ * After waking, check whether we should powerdown again in a (usually
14331+ * different) way. We only do this if the lid switch is still closed.
14332+ */
14333+void toi_check_resleep(void)
24613191 14334+{
4e97e4e9 14335+ /* We only return if we suspended to ram and woke. */
14336+ if (lid_closed() && post_wake_state >= 0)
14337+ __toi_power_down(post_wake_state);
24613191 14338+}
14339+
4e97e4e9 14340+void toi_power_down(void)
24613191 14341+{
4e97e4e9 14342+ if (alarm_file && wake_delay) {
14343+ char array[25];
14344+ loff_t pos = 0;
ad8f4a28 14345+		size_t size = vfs_read(epoch_file, (char __user *) array, 25,
14346+ &pos);
e8d0ad9d 14347+
4e97e4e9 14348+ if (((int) size) < 1)
ad8f4a28 14349+			printk(KERN_INFO "Failed to read epoch file (%d).\n",
14350+ (int) size);
4e97e4e9 14351+ else {
ad8f4a28 14352+			unsigned long since_epoch =
14353+ simple_strtol(array, NULL, 0);
24613191 14354+
4e97e4e9 14355+ /* Clear any wakeup time. */
14356+ write_alarm_file(0);
14357+
14358+ /* Set new wakeup time. */
14359+ write_alarm_file(since_epoch + wake_delay);
14360+ }
e8d0ad9d 14361+ }
4e97e4e9 14362+
14363+ __toi_power_down(toi_poweroff_method);
14364+
14365+ toi_check_resleep();
e8d0ad9d 14366+}
ad8f4a28 14367+EXPORT_SYMBOL_GPL(toi_power_down);
24613191 14368+
4e97e4e9 14369+static struct toi_sysfs_data sysfs_params[] = {
14370+#if defined(CONFIG_ACPI)
14371+ {
14372+ TOI_ATTR("lid_file", SYSFS_RW),
14373+ SYSFS_STRING(lid_state_file, 256, 0),
14374+ },
24613191 14375+
4e97e4e9 14376+ {
14377+ TOI_ATTR("wake_delay", SYSFS_RW),
14378+ SYSFS_INT(&wake_delay, 0, INT_MAX, 0)
e8d0ad9d 14379+ },
24613191 14380+
4e97e4e9 14381+ {
14382+ TOI_ATTR("wake_alarm_dir", SYSFS_RW),
14383+ SYSFS_STRING(wake_alarm_dir, 256, 0)
14384+ },
24613191 14385+
4e97e4e9 14386+ { TOI_ATTR("post_wake_state", SYSFS_RW),
14387+ SYSFS_INT(&post_wake_state, -1, 5, 0)
14388+ },
24613191 14389+
4e97e4e9 14390+ { TOI_ATTR("powerdown_method", SYSFS_RW),
14391+ SYSFS_UL(&toi_poweroff_method, 0, 5, 0)
14392+ },
7f9d2ee0 14393+
14394+ { TOI_ATTR("did_suspend_to_both", SYSFS_READONLY),
14395+ SYSFS_INT(&did_suspend_to_both, 0, 0, 0)
14396+ },
4e97e4e9 14397+#endif
14398+};
14399+
14400+static struct toi_module_ops powerdown_ops = {
14401+ .type = MISC_HIDDEN_MODULE,
14402+ .name = "poweroff",
7f9d2ee0 14403+ .initialise = powerdown_init,
14404+ .cleanup = powerdown_cleanup,
4e97e4e9 14405+ .directory = "[ROOT]",
14406+ .module = THIS_MODULE,
14407+ .sysfs_data = sysfs_params,
ad8f4a28 14408+	.num_sysfs_entries = sizeof(sysfs_params) /
14409+ sizeof(struct toi_sysfs_data),
e8d0ad9d 14410+};
24613191 14411+
4e97e4e9 14412+int toi_poweroff_init(void)
e8d0ad9d 14413+{
4e97e4e9 14414+ return toi_register_module(&powerdown_ops);
e8d0ad9d 14415+}
14416+
4e97e4e9 14417+void toi_poweroff_exit(void)
e8d0ad9d 14418+{
4e97e4e9 14419+ toi_unregister_module(&powerdown_ops);
e8d0ad9d 14420+}
4e97e4e9 14421diff --git a/kernel/power/tuxonice_power_off.h b/kernel/power/tuxonice_power_off.h
14422new file mode 100644
7f9d2ee0 14423index 0000000..540b081
4e97e4e9 14424--- /dev/null
14425+++ b/kernel/power/tuxonice_power_off.h
7f9d2ee0 14426@@ -0,0 +1,35 @@
24613191 14427+/*
4e97e4e9 14428+ * kernel/power/tuxonice_power_off.h
24613191 14429+ *
4e97e4e9 14430+ * Copyright (C) 2006-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 14431+ *
14432+ * This file is released under the GPLv2.
14433+ *
4e97e4e9 14434+ * Support for powering down.
14435+ */
14436+
14437+int toi_pm_state_finish(void);
14438+void toi_power_down(void);
14439+extern unsigned long toi_poweroff_method;
14440+extern int toi_platform_prepare(void);
7f9d2ee0 14441+extern void toi_platform_end(void);
4e97e4e9 14442+int toi_poweroff_init(void);
14443+void toi_poweroff_exit(void);
14444+void toi_check_resleep(void);
ad8f4a28 14445+
7f9d2ee0 14446+extern int platform_begin(int platform_mode);
ad8f4a28 14447+extern int platform_pre_snapshot(int platform_mode);
14448+extern int platform_leave(int platform_mode);
7f9d2ee0 14449+extern int platform_end(int platform_mode);
ad8f4a28 14450+extern int platform_finish(int platform_mode);
14451+extern int platform_pre_restore(int platform_mode);
14452+extern int platform_restore_cleanup(int platform_mode);
14453+
14454+#define platform_test() (toi_poweroff_method == 4)
7f9d2ee0 14455+#define toi_platform_begin() platform_begin(platform_test())
ad8f4a28 14456+#define toi_platform_pre_snapshot() platform_pre_snapshot(platform_test())
14457+#define toi_platform_leave() platform_leave(platform_test())
7f9d2ee0 14458+#define toi_platform_end() platform_end(platform_test())
ad8f4a28 14459+#define toi_platform_finish() platform_finish(platform_test())
14460+#define toi_platform_pre_restore() platform_pre_restore(platform_test())
14461+#define toi_platform_restore_cleanup() platform_restore_cleanup(platform_test())
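The method numbering that `__toi_power_down()` switches on and that `platform_test()` gates can be sketched as a small predicate. The enum names below are invented for readability; only the numeric values and their meanings come from the patch (0 = kernel power off, 3 = suspend-to-RAM after saving, 4 = ACPI platform mode, 5 = historic entry):

```c
/* Method numbers as used by toi_poweroff_method in the patch. */
enum {
	POWERDOWN_KERNEL   = 0,	/* plain kernel_power_off() */
	POWERDOWN_S2BOTH   = 3,	/* suspend to RAM after writing image */
	POWERDOWN_PLATFORM = 4,	/* ACPI platform (S4) entry */
	POWERDOWN_HISTORIC = 5	/* historic entry only */
};

/* platform_test() in the header is just this predicate: the
 * platform_* hooks take effect only when method 4 is selected. */
static int platform_mode(unsigned long method)
{
	return method == POWERDOWN_PLATFORM;
}
```

Passing the predicate's result into every `platform_*(int platform_mode)` helper lets the shared hibernation code treat "platform mode off" as a cheap no-op rather than branching at each call site.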
4e97e4e9 14462diff --git a/kernel/power/tuxonice_prepare_image.c b/kernel/power/tuxonice_prepare_image.c
14463new file mode 100644
7f9d2ee0 14464index 0000000..ec507e8
4e97e4e9 14465--- /dev/null
14466+++ b/kernel/power/tuxonice_prepare_image.c
7f9d2ee0 14467@@ -0,0 +1,1052 @@
4e97e4e9 14468+/*
14469+ * kernel/power/tuxonice_prepare_image.c
14470+ *
14471+ * Copyright (C) 2003-2007 Nigel Cunningham (nigel at tuxonice net)
14472+ *
14473+ * This file is released under the GPLv2.
14474+ *
14475+ * We need to eat memory until we can:
14476+ * 1. Perform the save without changing anything (RAM_NEEDED < #pages)
14477+ * 2. Fit it all in available space (toiActiveAllocator->available_space() >=
14478+ * main_storage_needed())
14479+ * 3. Reload the pagedir and pageset1 to places that don't collide with their
14480+ * final destinations, not knowing to what extent the resumed kernel will
14481+ * overlap with the one loaded at boot time. I think the resumed kernel
ad8f4a28 14482+ * should overlap completely, but I don't want to rely on this as it is
4e97e4e9 14483+ * an unproven assumption. We therefore assume there will be no overlap at
14484+ * all (worst case).
14485+ * 4. Meet the user's requested limit (if any) on the size of the image.
14486+ * The limit is in MB, so pages/256 (assuming 4K pages).
14487+ *
24613191 14488+ */
14489+
14490+#include <linux/module.h>
24613191 14491+#include <linux/highmem.h>
4e97e4e9 14492+#include <linux/freezer.h>
14493+#include <linux/hardirq.h>
14494+#include <linux/mmzone.h>
14495+#include <linux/console.h>
24613191 14496+
4e97e4e9 14497+#include "tuxonice_pageflags.h"
14498+#include "tuxonice_modules.h"
14499+#include "tuxonice_io.h"
14500+#include "tuxonice_ui.h"
14501+#include "tuxonice_extent.h"
14502+#include "tuxonice_prepare_image.h"
14503+#include "tuxonice_block_io.h"
14504+#include "tuxonice.h"
14505+#include "tuxonice_checksum.h"
14506+#include "tuxonice_sysfs.h"
ad8f4a28 14507+#include "tuxonice_alloc.h"
24613191 14508+
7f9d2ee0 14509+static long num_nosave, header_space_allocated, main_storage_allocated,
ad8f4a28 14510+ storage_available;
7f9d2ee0 14511+long extra_pd1_pages_allowance = DEFAULT_EXTRA_PAGES_ALLOWANCE;
ad8f4a28 14512+int image_size_limit;
24613191 14513+
4e97e4e9 14514+struct attention_list {
14515+ struct task_struct *task;
14516+ struct attention_list *next;
14517+};
24613191 14518+
ad8f4a28 14519+static struct attention_list *attention_list;
24613191 14520+
4e97e4e9 14521+#define PAGESET1 0
14522+#define PAGESET2 1
24613191 14523+
4e97e4e9 14524+void free_attention_list(void)
14525+{
14526+ struct attention_list *last = NULL;
24613191 14527+
4e97e4e9 14528+ while (attention_list) {
14529+ last = attention_list;
14530+ attention_list = attention_list->next;
ad8f4a28 14531+ toi_kfree(6, last);
4e97e4e9 14532+ }
14533+}
24613191 14534+
4e97e4e9 14535+static int build_attention_list(void)
24613191 14536+{
4e97e4e9 14537+ int i, task_count = 0;
14538+ struct task_struct *p;
14539+ struct attention_list *next;
24613191 14540+
ad8f4a28 14541+ /*
4e97e4e9 14542+ * Count all userspace process (with task->mm) marked PF_NOFREEZE.
14543+ */
14544+ read_lock(&tasklist_lock);
14545+ for_each_process(p)
14546+ if ((p->flags & PF_NOFREEZE) || p == current)
14547+ task_count++;
14548+ read_unlock(&tasklist_lock);
24613191 14549+
ad8f4a28 14550+ /*
4e97e4e9 14551+ * Allocate attention list structs.
14552+ */
14553+ for (i = 0; i < task_count; i++) {
14554+ struct attention_list *this =
ad8f4a28 14555+ toi_kzalloc(6, sizeof(struct attention_list),
4e97e4e9 14556+ TOI_WAIT_GFP);
14557+ if (!this) {
ad8f4a28 14558+			printk(KERN_INFO "Failed to allocate slab for "
14559+ "attention list.\n");
4e97e4e9 14560+ free_attention_list();
14561+ return 1;
24613191 14562+ }
4e97e4e9 14563+ this->next = NULL;
14564+ if (attention_list)
14565+ this->next = attention_list;
14566+ attention_list = this;
14567+ }
24613191 14568+
4e97e4e9 14569+ next = attention_list;
14570+ read_lock(&tasklist_lock);
14571+ for_each_process(p)
14572+ if ((p->flags & PF_NOFREEZE) || p == current) {
14573+ next->task = p;
14574+ next = next->next;
14575+ }
14576+ read_unlock(&tasklist_lock);
14577+ return 0;
14578+}
24613191 14579+
4e97e4e9 14580+static void pageset2_full(void)
14581+{
14582+ struct zone *zone;
14583+ unsigned long flags;
14584+
14585+ for_each_zone(zone) {
14586+ spin_lock_irqsave(&zone->lru_lock, flags);
14587+ if (zone_page_state(zone, NR_INACTIVE)) {
14588+ struct page *page;
14589+ list_for_each_entry(page, &zone->inactive_list, lru)
14590+ SetPagePageset2(page);
14591+ }
14592+ if (zone_page_state(zone, NR_ACTIVE)) {
14593+ struct page *page;
14594+ list_for_each_entry(page, &zone->active_list, lru)
14595+ SetPagePageset2(page);
14596+ }
14597+ spin_unlock_irqrestore(&zone->lru_lock, flags);
24613191 14598+ }
14599+}
14600+
4e97e4e9 14601+/*
14602+ * toi_mark_task_as_pageset
14603+ * Functionality : Marks all the saveable pages belonging to a given process
14604+ * as belonging to a particular pageset.
24613191 14605+ */
4e97e4e9 14606+
14607+static void toi_mark_task_as_pageset(struct task_struct *t, int pageset2)
24613191 14608+{
4e97e4e9 14609+ struct vm_area_struct *vma;
14610+ struct mm_struct *mm;
24613191 14611+
4e97e4e9 14612+ mm = t->active_mm;
24613191 14613+
ad8f4a28 14614+	if (!mm || !mm->mmap)
14615+ return;
24613191 14616+
4e97e4e9 14617+ if (!irqs_disabled())
14618+ down_read(&mm->mmap_sem);
ad8f4a28 14619+
4e97e4e9 14620+ for (vma = mm->mmap; vma; vma = vma->vm_next) {
14621+ unsigned long posn;
14622+
14623+ if (vma->vm_flags & (VM_PFNMAP | VM_IO | VM_RESERVED) ||
14624+ !vma->vm_start)
14625+ continue;
14626+
14627+ for (posn = vma->vm_start; posn < vma->vm_end;
14628+ posn += PAGE_SIZE) {
14629+ struct page *page = follow_page(vma, posn, 0);
14630+ if (!page)
14631+ continue;
14632+
14633+ if (pageset2)
14634+ SetPagePageset2(page);
14635+ else {
14636+ ClearPagePageset2(page);
14637+ SetPagePageset1(page);
14638+ }
24613191 14639+ }
14640+ }
14641+
4e97e4e9 14642+ if (!irqs_disabled())
14643+ up_read(&mm->mmap_sem);
24613191 14644+}
14645+
4e97e4e9 14646+/* mark_pages_for_pageset2
14647+ *
14648+ * Description: Mark unshared pages in processes not needed for hibernate as
14649+ * being able to be written out in a separate pagedir.
14650+ * HighMem pages are simply marked as pageset2. They won't be
14651+ * needed during hibernate.
24613191 14652+ */
14653+
4e97e4e9 14654+static void toi_mark_pages_for_pageset2(void)
24613191 14655+{
4e97e4e9 14656+ struct task_struct *p;
14657+ struct attention_list *this = attention_list;
73c609d5 14658+
4e97e4e9 14659+ if (test_action_state(TOI_NO_PAGESET2))
14660+ return;
24613191 14661+
4e97e4e9 14662+ clear_dyn_pageflags(&pageset2_map);
ad8f4a28 14663+
4e97e4e9 14664+ if (test_action_state(TOI_PAGESET2_FULL))
14665+ pageset2_full();
14666+ else {
14667+ read_lock(&tasklist_lock);
14668+ for_each_process(p) {
14669+ if (!p->mm || (p->flags & PF_BORROWED_MM))
14670+ continue;
24613191 14671+
4e97e4e9 14672+ toi_mark_task_as_pageset(p, PAGESET2);
14673+ }
14674+ read_unlock(&tasklist_lock);
24613191 14675+ }
14676+
ad8f4a28 14677+ /*
4e97e4e9 14678+ * Because the tasks in attention_list are ones related to hibernating,
14679+ * we know that they won't go away under us.
14680+ */
24613191 14681+
4e97e4e9 14682+ while (this) {
14683+ if (!test_result_state(TOI_ABORTED))
14684+ toi_mark_task_as_pageset(this->task, PAGESET1);
14685+ this = this->next;
14686+ }
24613191 14687+}
14688+
14689+/*
4e97e4e9 14690+ * The atomic copy of pageset1 is stored in pageset2 pages.
14691+ * But if pageset1 is larger (normally only just after boot),
14692+ * we need to allocate extra pages to store the atomic copy.
14693+ * The following data struct and functions are used to handle
14694+ * the allocation and freeing of that memory.
24613191 14695+ */
14696+
7f9d2ee0 14697+static long extra_pages_allocated;
24613191 14698+
4e97e4e9 14699+struct extras {
14700+ struct page *page;
14701+ int order;
14702+ struct extras *next;
14703+};
24613191 14704+
4e97e4e9 14705+static struct extras *extras_list;
14706+
14707+/* toi_free_extra_pagedir_memory
24613191 14708+ *
4e97e4e9 14709+ * Description: Free previously allocated extra pagedir memory.
24613191 14710+ */
4e97e4e9 14711+void toi_free_extra_pagedir_memory(void)
24613191 14712+{
4e97e4e9 14713+ /* Free allocated pages */
14714+ while (extras_list) {
14715+ struct extras *this = extras_list;
14716+ int i;
24613191 14717+
4e97e4e9 14718+ extras_list = this->next;
24613191 14719+
4e97e4e9 14720+ for (i = 0; i < (1 << this->order); i++)
14721+ ClearPageNosave(this->page + i);
24613191 14722+
ad8f4a28 14723+		toi_free_pages(9, this->page, this->order);
14724+ toi_kfree(7, this);
24613191 14725+ }
24613191 14726+
4e97e4e9 14727+ extra_pages_allocated = 0;
24613191 14728+}
14729+
4e97e4e9 14730+/* toi_allocate_extra_pagedir_memory
24613191 14731+ *
4e97e4e9 14732+ * Description: Allocate memory for making the atomic copy of pagedir1 in the
14733+ * case where it is bigger than pagedir2.
14734+ * Arguments: int num_to_alloc: Number of extra pages needed.
14735+ * Result: int. Number of extra pages we now have allocated.
24613191 14736+ */
4e97e4e9 14737+static int toi_allocate_extra_pagedir_memory(int extra_pages_needed)
24613191 14738+{
4e97e4e9 14739+ int j, order, num_to_alloc = extra_pages_needed - extra_pages_allocated;
14740+ unsigned long flags = TOI_ATOMIC_GFP;
24613191 14741+
4e97e4e9 14742+ if (num_to_alloc < 1)
14743+ return 0;
24613191 14744+
4e97e4e9 14745+ order = fls(num_to_alloc);
14746+ if (order >= MAX_ORDER)
14747+ order = MAX_ORDER - 1;
24613191 14748+
4e97e4e9 14749+ while (num_to_alloc) {
14750+ struct page *newpage;
14751+ unsigned long virt;
14752+ struct extras *extras_entry;
ad8f4a28 14753+
4e97e4e9 14754+ while ((1 << order) > num_to_alloc)
14755+ order--;
24613191 14756+
ad8f4a28 14757+ extras_entry = (struct extras *) toi_kzalloc(7,
4e97e4e9 14758+ sizeof(struct extras), TOI_ATOMIC_GFP);
24613191 14759+
4e97e4e9 14760+ if (!extras_entry)
14761+ return extra_pages_allocated;
24613191 14762+
4e97e4e9 14763+ virt = toi_get_free_pages(9, flags, order);
14764+ while (!virt && order) {
14765+ order--;
14766+ virt = toi_get_free_pages(9, flags, order);
14767+ }
24613191 14768+
4e97e4e9 14769+ if (!virt) {
ad8f4a28 14770+ toi_kfree(7, extras_entry);
4e97e4e9 14771+ return extra_pages_allocated;
14772+ }
24613191 14773+
4e97e4e9 14774+ newpage = virt_to_page(virt);
24613191 14775+
4e97e4e9 14776+ extras_entry->page = newpage;
14777+ extras_entry->order = order;
14778+ extras_entry->next = NULL;
24613191 14779+
4e97e4e9 14780+ if (extras_list)
14781+ extras_entry->next = extras_list;
24613191 14782+
4e97e4e9 14783+ extras_list = extras_entry;
24613191 14784+
4e97e4e9 14785+ for (j = 0; j < (1 << order); j++) {
14786+ SetPageNosave(newpage + j);
14787+ SetPagePageset1Copy(newpage + j);
14788+ }
24613191 14789+
4e97e4e9 14790+ extra_pages_allocated += (1 << order);
14791+ num_to_alloc -= (1 << order);
14792+ }
14793+
14794+ return extra_pages_allocated;
24613191 14795+}
14796+
4e97e4e9 14797+/*
 14798+ * real_nr_free_pages: Count free pages, including pcp pages, in the zones
 14799+ * selected by zone_idx_mask (all_zones_mask for all zones).
24613191 14800+ */
7f9d2ee0 14801+long real_nr_free_pages(unsigned long zone_idx_mask)
4e97e4e9 14802+{
14803+ struct zone *zone;
7f9d2ee0 14804+ int result = 0, cpu;
24613191 14805+
4e97e4e9 14806+ /* PCP lists */
14807+ for_each_zone(zone) {
14808+ if (!populated_zone(zone))
14809+ continue;
ad8f4a28 14810+
4e97e4e9 14811+ if (!(zone_idx_mask & (1 << zone_idx(zone))))
14812+ continue;
24613191 14813+
4e97e4e9 14814+ for_each_online_cpu(cpu) {
14815+ struct per_cpu_pageset *pset = zone_pcp(zone, cpu);
7f9d2ee0 14816+ struct per_cpu_pages *pcp = &pset->pcp;
14817+ result += pcp->count;
4e97e4e9 14818+ }
14819+
14820+ result += zone_page_state(zone, NR_FREE_PAGES);
24613191 14821+ }
4e97e4e9 14822+ return result;
14823+}
24613191 14824+
14825+/*
4e97e4e9 14826+ * Discover how much extra memory will be required by the drivers
14827+ * when they're asked to hibernate. We can then ensure that amount
14828+ * of memory is available when we really want it.
24613191 14829+ */
4e97e4e9 14830+static void get_extra_pd1_allowance(void)
14831+{
7f9d2ee0 14832+ long orig_num_free = real_nr_free_pages(all_zones_mask), final;
ad8f4a28 14833+
4e97e4e9 14834+ toi_prepare_status(CLEAR_BAR, "Finding allowance for drivers.");
24613191 14835+
4e97e4e9 14836+ suspend_console();
14837+ device_suspend(PMSG_FREEZE);
14838+ local_irq_disable(); /* irqs might have been re-enabled on us */
14839+ device_power_down(PMSG_FREEZE);
ad8f4a28 14840+
4e97e4e9 14841+ final = real_nr_free_pages(all_zones_mask);
24613191 14842+
4e97e4e9 14843+ device_power_up();
14844+ local_irq_enable();
14845+ device_resume();
14846+ resume_console();
24613191 14847+
4e97e4e9 14848+ extra_pd1_pages_allowance = max(
7f9d2ee0 14849+ orig_num_free - final + DEFAULT_EXTRA_PAGES_ALLOWANCE,
14850+ (long) DEFAULT_EXTRA_PAGES_ALLOWANCE);
4e97e4e9 14851+}
24613191 14852+
4e97e4e9 14853+/*
14854+ * Amount of storage needed, possibly taking into account the
14855+ * expected compression ratio and possibly also ignoring our
14856+ * allowance for extra pages.
14857+ */
7f9d2ee0 14858+static long main_storage_needed(int use_ecr,
4e97e4e9 14859+ int ignore_extra_pd1_allow)
24613191 14860+{
4e97e4e9 14861+ return ((pagedir1.size + pagedir2.size +
14862+ (ignore_extra_pd1_allow ? 0 : extra_pd1_pages_allowance)) *
14863+ (use_ecr ? toi_expected_compression_ratio() : 100) / 100);
24613191 14864+}
14865+
4e97e4e9 14866+/*
 14867+ * Storage needed for the image header. Calculated in bytes, returned in pages.
14868+ */
7f9d2ee0 14869+static long header_storage_needed(void)
24613191 14870+{
7f9d2ee0 14871+ long bytes = (int) sizeof(struct toi_header) +
4e97e4e9 14872+ toi_header_storage_for_modules() +
14873+ toi_pageflags_space_needed();
14874+
14875+ return DIV_ROUND_UP(bytes, PAGE_SIZE);
24613191 14876+}
14877+
24613191 14878+/*
4e97e4e9 14879+ * When freeing memory, pages from either pageset might be freed.
24613191 14880+ *
ad8f4a28 14881+ * When seeking to free memory to be able to hibernate, for every ps1 page
14882+ * freed, we need 2 less pages for the atomic copy because there is one less
14883+ * page to copy and one more page into which data can be copied.
24613191 14884+ *
4e97e4e9 14885+ * Freeing ps2 pages saves us nothing directly. No more memory is available
14886+ * for the atomic copy. Indirectly, a ps1 page might be freed (slab?), but
14887+ * that's too much work to figure out.
24613191 14888+ *
4e97e4e9 14889+ * => ps1_to_free functions
24613191 14890+ *
4e97e4e9 14891+ * Of course if we just want to reduce the image size, because of storage
 14892+ * limitations or an image size limit, either ps will do.
24613191 14893+ *
4e97e4e9 14894+ * => any_to_free function
24613191 14895+ */
14896+
7f9d2ee0 14897+static long highpages_ps1_to_free(void)
24613191 14898+{
7f9d2ee0 14899+ return max_t(long, 0, DIV_ROUND_UP(get_highmem_size(pagedir1) -
4e97e4e9 14900+ get_highmem_size(pagedir2), 2) - real_nr_free_high_pages());
24613191 14901+}
14902+
7f9d2ee0 14903+static long lowpages_ps1_to_free(void)
24613191 14904+{
7f9d2ee0 14905+ return max_t(long, 0, DIV_ROUND_UP(get_lowmem_size(pagedir1) +
4e97e4e9 14906+ extra_pd1_pages_allowance + MIN_FREE_RAM +
ad8f4a28 14907+ toi_memory_for_modules(0) - get_lowmem_size(pagedir2) -
4e97e4e9 14908+ real_nr_free_low_pages() - extra_pages_allocated, 2));
24613191 14909+}
14910+
7f9d2ee0 14911+static long current_image_size(void)
24613191 14912+{
4e97e4e9 14913+ return pagedir1.size + pagedir2.size + header_space_allocated;
14914+}
24613191 14915+
7f9d2ee0 14916+static long storage_still_required(void)
ad8f4a28 14917+{
7f9d2ee0 14918+ return max_t(long, 0, main_storage_needed(1, 1) - storage_available);
ad8f4a28 14919+}
14920+
7f9d2ee0 14921+static long ram_still_required(void)
ad8f4a28 14922+{
7f9d2ee0 14923+ return max_t(long, 0, MIN_FREE_RAM + toi_memory_for_modules(0) -
ad8f4a28 14924+		real_nr_free_low_pages() + 2 * extra_pd1_pages_allowance);
14925+}
14926+
7f9d2ee0 14927+static long any_to_free(int use_image_size_limit)
4e97e4e9 14928+{
7f9d2ee0 14929+ long user_limit = (use_image_size_limit && image_size_limit > 0) ?
14930+ max_t(long, 0, current_image_size() - (image_size_limit << 8))
4e97e4e9 14931+ : 0;
24613191 14932+
7f9d2ee0 14933+ long storage_limit = storage_still_required(),
ad8f4a28 14934+ ram_limit = ram_still_required();
24613191 14935+
ad8f4a28 14936+ return max(max(user_limit, storage_limit), ram_limit);
24613191 14937+}
14938+
4e97e4e9 14939+/* amount_needed
14940+ *
14941+ * Calculates the amount by which the image size needs to be reduced to meet
14942+ * our constraints.
14943+ */
7f9d2ee0 14944+static long amount_needed(int use_image_size_limit)
24613191 14945+{
4e97e4e9 14946+ return max(highpages_ps1_to_free() + lowpages_ps1_to_free(),
14947+ any_to_free(use_image_size_limit));
14948+}
24613191 14949+
7f9d2ee0 14950+static long image_not_ready(int use_image_size_limit)
4e97e4e9 14951+{
14952+ toi_message(TOI_EAT_MEMORY, TOI_LOW, 1,
7f9d2ee0 14953+ "Amount still needed (%ld) > 0:%d. Header: %ld < %ld: %d,"
14954+ " Storage allocd: %ld < %ld: %d.\n",
4e97e4e9 14955+ amount_needed(use_image_size_limit),
14956+ (amount_needed(use_image_size_limit) > 0),
14957+ header_space_allocated, header_storage_needed(),
14958+ header_space_allocated < header_storage_needed(),
ad8f4a28 14959+ main_storage_allocated,
4e97e4e9 14960+ main_storage_needed(1, 1),
14961+ main_storage_allocated < main_storage_needed(1, 1));
24613191 14962+
4e97e4e9 14963+ toi_cond_pause(0, NULL);
24613191 14964+
4e97e4e9 14965+ return ((amount_needed(use_image_size_limit) > 0) ||
14966+ header_space_allocated < header_storage_needed() ||
14967+ main_storage_allocated < main_storage_needed(1, 1));
24613191 14968+}
14969+
ad8f4a28 14970+static void display_failure_reason(int tries_exceeded)
14971+{
7f9d2ee0 14972+ long storage_required = storage_still_required(),
ad8f4a28 14973+		ram_required = ram_still_required(),
14974+ high_ps1 = highpages_ps1_to_free(),
14975+ low_ps1 = lowpages_ps1_to_free();
14976+
14977+ printk(KERN_INFO "Failed to prepare the image because...\n");
14978+
14979+ if (!storage_available) {
14980+ printk(KERN_INFO "- You need some storage available to be "
14981+ "able to hibernate.\n");
14982+ return;
14983+ }
14984+
14985+ if (tries_exceeded)
14986+ printk(KERN_INFO "- The maximum number of iterations was "
14987+ "reached without successfully preparing the "
14988+ "image.\n");
14989+
14990+ if (header_space_allocated < header_storage_needed()) {
14991+ printk(KERN_INFO "- Insufficient header storage allocated. "
7f9d2ee0 14992+ "Need %ld, have %ld.\n", header_storage_needed(),
ad8f4a28 14993+				header_space_allocated);
14994+ set_abort_result(TOI_INSUFFICIENT_STORAGE);
14995+ }
14996+
14997+ if (storage_required) {
7f9d2ee0 14998+ printk(KERN_INFO " - We need at least %ld pages of storage "
14999+ "(ignoring the header), but only have %ld.\n",
ad8f4a28 15000+				main_storage_needed(1, 1),
15001+ main_storage_allocated);
15002+ set_abort_result(TOI_INSUFFICIENT_STORAGE);
15003+ }
15004+
15005+ if (ram_required) {
7f9d2ee0 15006+ printk(KERN_INFO " - We need %ld more free pages of low "
ad8f4a28 15007+				"memory.\n", ram_required);
15008+ printk(KERN_INFO " Minimum free : %8d\n", MIN_FREE_RAM);
7f9d2ee0 15009+ printk(KERN_INFO " + Reqd. by modules : %8ld\n",
ad8f4a28 15010+ toi_memory_for_modules(0));
7f9d2ee0 15011+ printk(KERN_INFO " - Currently free : %8ld\n",
ad8f4a28 15012+ real_nr_free_low_pages());
7f9d2ee0 15013+ printk(KERN_INFO " + 2 * extra allow : %8ld\n",
ad8f4a28 15014+				2 * extra_pd1_pages_allowance);
15015+ printk(KERN_INFO " : ========\n");
7f9d2ee0 15016+ printk(KERN_INFO " Still needed : %8ld\n", ram_required);
ad8f4a28 15017+
15018+ /* Print breakdown of memory needed for modules */
15019+ toi_memory_for_modules(1);
15020+ set_abort_result(TOI_UNABLE_TO_FREE_ENOUGH_MEMORY);
15021+ }
15022+
15023+ if (high_ps1) {
7f9d2ee0 15024+ printk(KERN_INFO "- We need to free %ld highmem pageset 1 "
ad8f4a28
AM
15025+ "pages.\n", high_ps1);
15026+ set_abort_result(TOI_UNABLE_TO_FREE_ENOUGH_MEMORY);
15027+ }
15028+
15029+ if (low_ps1) {
7f9d2ee0 15030+ printk(KERN_INFO " - We need to free %ld lowmem pageset 1 "
ad8f4a28
AM
15031+ "pages.\n", low_ps1);
15032+ set_abort_result(TOI_UNABLE_TO_FREE_ENOUGH_MEMORY);
15033+ }
15034+}
15035+
4e97e4e9 15036+static void display_stats(int always, int sub_extra_pd1_allow)
ad8f4a28 15037+{
4e97e4e9 15038+ char buffer[255];
ad8f4a28 15039+ snprintf(buffer, 254,
7f9d2ee0 15040+ "Free:%ld(%ld). Sets:%ld(%ld),%ld(%ld). Header:%ld/%ld. "
15041+ "Nosave:%ld-%ld=%ld. Storage:%lu/%lu(%lu=>%lu). "
15042+ "Needed:%ld,%ld,%ld(%d,%ld,%ld,%ld)\n",
ad8f4a28 15043+
4e97e4e9 15044+ /* Free */
15045+ real_nr_free_pages(all_zones_mask),
15046+ real_nr_free_low_pages(),
ad8f4a28 15047+
4e97e4e9 15048+ /* Sets */
15049+ pagedir1.size, pagedir1.size - get_highmem_size(pagedir1),
15050+ pagedir2.size, pagedir2.size - get_highmem_size(pagedir2),
15051+
15052+ /* Header */
15053+ header_space_allocated, header_storage_needed(),
e8d0ad9d 15054+
4e97e4e9 15055+ /* Nosave */
15056+ num_nosave, extra_pages_allocated,
15057+ num_nosave - extra_pages_allocated,
24613191 15058+
4e97e4e9 15059+ /* Storage */
15060+ main_storage_allocated,
15061+ storage_available,
15062+ main_storage_needed(1, sub_extra_pd1_allow),
15063+ main_storage_needed(1, 1),
15064+
15065+ /* Needed */
15066+ lowpages_ps1_to_free(), highpages_ps1_to_free(),
15067+ any_to_free(1),
ad8f4a28 15068+ MIN_FREE_RAM, toi_memory_for_modules(0),
7f9d2ee0 15069+ extra_pd1_pages_allowance, ((long) image_size_limit) << 8);
24613191 15070+
4e97e4e9 15071+ if (always)
 15072+		printk("%s", buffer);
15073+ else
15074+ toi_message(TOI_EAT_MEMORY, TOI_MEDIUM, 1, buffer);
24613191 15075+}
15076+
4e97e4e9 15077+/* generate_free_page_map
15078+ *
15079+ * Description: This routine generates a bitmap of free pages from the
15080+ * lists used by the memory manager. We then use the bitmap
15081+ * to quickly calculate which pages to save and in which
15082+ * pagesets.
15083+ */
ad8f4a28 15084+static void generate_free_page_map(void)
24613191 15085+{
ad8f4a28 15086+ int order, pfn, cpu, t;
4e97e4e9 15087+ unsigned long flags, i;
15088+ struct zone *zone;
ad8f4a28 15089+ struct list_head *curr;
24613191 15090+
4e97e4e9 15091+ for_each_zone(zone) {
15092+ if (!populated_zone(zone))
e8d0ad9d 15093+ continue;
ad8f4a28 15094+
4e97e4e9 15095+ spin_lock_irqsave(&zone->lock, flags);
24613191 15096+
ad8f4a28 15097+ for (i = 0; i < zone->spanned_pages; i++)
4e97e4e9 15098+ ClearPageNosaveFree(pfn_to_page(
15099+ zone->zone_start_pfn + i));
24613191 15100+
ad8f4a28 15101+		for_each_migratetype_order(order, t) {
15102+ list_for_each(curr,
15103+ &zone->free_area[order].free_list[t]) {
15104+ unsigned long i;
15105+
15106+ pfn = page_to_pfn(list_entry(curr, struct page,
15107+ lru));
15108+ for (i = 0; i < (1UL << order); i++)
15109+ SetPageNosaveFree(pfn_to_page(pfn + i));
15110+ }
15111+ }
15112+
4e97e4e9 15113+ for_each_online_cpu(cpu) {
15114+ struct per_cpu_pageset *pset = zone_pcp(zone, cpu);
7f9d2ee0 15115+ struct per_cpu_pages *pcp = &pset->pcp;
15116+ struct page *page;
24613191 15117+
7f9d2ee0 15118+ list_for_each_entry(page, &pcp->list, lru)
15119+ SetPageNosaveFree(page);
24613191 15120+ }
ad8f4a28 15121+
4e97e4e9 15122+ spin_unlock_irqrestore(&zone->lock, flags);
24613191 15123+ }
15124+}
15125+
4e97e4e9 15126+/* size_of_free_region
ad8f4a28 15127+ *
15128+ * Description: Return the number of pages that are free, beginning with and
4e97e4e9 15129+ * including this one.
15130+ */
15131+static int size_of_free_region(struct page *page)
24613191 15132+{
4e97e4e9 15133+ struct zone *zone = page_zone(page);
7f9d2ee0 15134+ unsigned long this_pfn = page_to_pfn(page),
15135+ orig_pfn = this_pfn,
15136+ end_pfn = zone->zone_start_pfn + zone->spanned_pages - 1;
24613191 15137+
7f9d2ee0 15138+ while (this_pfn <= end_pfn && PageNosaveFree(pfn_to_page(this_pfn)))
15139+ this_pfn++;
15140+
15141+ return (this_pfn - orig_pfn);
24613191 15142+}
15143+
4e97e4e9 15144+/* flag_image_pages
24613191 15145+ *
4e97e4e9 15146+ * This routine generates our lists of pages to be stored in each
15147+ * pageset. Since we store the data using extents, and adding new
15148+ * extents might allocate a new extent page, this routine may well
15149+ * be called more than once.
24613191 15150+ */
4e97e4e9 15151+static void flag_image_pages(int atomic_copy)
24613191 15152+{
4e97e4e9 15153+ int num_free = 0;
15154+ unsigned long loop;
15155+ struct zone *zone;
24613191 15156+
4e97e4e9 15157+ pagedir1.size = 0;
15158+ pagedir2.size = 0;
24613191 15159+
4e97e4e9 15160+ set_highmem_size(pagedir1, 0);
15161+ set_highmem_size(pagedir2, 0);
24613191 15162+
4e97e4e9 15163+ num_nosave = 0;
24613191 15164+
4e97e4e9 15165+ clear_dyn_pageflags(&pageset1_map);
24613191 15166+
4e97e4e9 15167+ generate_free_page_map();
24613191 15168+
4e97e4e9 15169+ /*
ad8f4a28 15170+	 * Pages not to be saved are marked Nosave irrespective of being
15171+ * reserved.
4e97e4e9 15172+ */
15173+ for_each_zone(zone) {
15174+ int highmem = is_highmem(zone);
15175+
15176+ if (!populated_zone(zone))
15177+ continue;
15178+
15179+ for (loop = 0; loop < zone->spanned_pages; loop++) {
15180+ unsigned long pfn = zone->zone_start_pfn + loop;
15181+ struct page *page;
15182+ int chunk_size;
15183+
15184+ if (!pfn_valid(pfn))
15185+ continue;
15186+
15187+ page = pfn_to_page(pfn);
15188+
15189+ chunk_size = size_of_free_region(page);
15190+ if (chunk_size) {
15191+ num_free += chunk_size;
15192+ loop += chunk_size - 1;
15193+ continue;
15194+ }
15195+
15196+ if (highmem)
15197+ page = saveable_highmem_page(pfn);
e8d0ad9d 15198+ else
4e97e4e9 15199+ page = saveable_page(pfn);
e8d0ad9d 15200+
4e97e4e9 15201+ if (!page || PageNosave(page)) {
15202+ num_nosave++;
15203+ continue;
15204+ }
24613191 15205+
4e97e4e9 15206+ if (PagePageset2(page)) {
15207+ pagedir2.size++;
15208+ if (PageHighMem(page))
15209+ inc_highmem_size(pagedir2);
15210+ else
15211+ SetPagePageset1Copy(page);
15212+ if (PageResave(page)) {
15213+ SetPagePageset1(page);
15214+ ClearPagePageset1Copy(page);
15215+ pagedir1.size++;
15216+ if (PageHighMem(page))
15217+ inc_highmem_size(pagedir1);
15218+ }
15219+ } else {
15220+ pagedir1.size++;
15221+ SetPagePageset1(page);
15222+ if (PageHighMem(page))
15223+ inc_highmem_size(pagedir1);
15224+ }
15225+ }
24613191 15226+ }
15227+
4e97e4e9 15228+ if (atomic_copy)
15229+ return;
24613191 15230+
4e97e4e9 15231+ toi_message(TOI_EAT_MEMORY, TOI_MEDIUM, 0,
7f9d2ee0 15232+ "Count data pages: Set1 (%d) + Set2 (%d) + Nosave (%ld) + "
4e97e4e9 15233+ "NumFree (%d) = %d.\n",
15234+ pagedir1.size, pagedir2.size, num_nosave, num_free,
15235+ pagedir1.size + pagedir2.size + num_nosave + num_free);
15236+}
24613191 15237+
ad8f4a28 15238+void toi_recalculate_image_contents(int atomic_copy)
4e97e4e9 15239+{
15240+ clear_dyn_pageflags(&pageset1_map);
15241+ if (!atomic_copy) {
15242+ int pfn;
15243+ BITMAP_FOR_EACH_SET(&pageset2_map, pfn)
15244+ ClearPagePageset1Copy(pfn_to_page(pfn));
15245+ /* Need to call this before getting pageset1_size! */
15246+ toi_mark_pages_for_pageset2();
15247+ }
15248+ flag_image_pages(atomic_copy);
24613191 15249+
4e97e4e9 15250+ if (!atomic_copy) {
15251+ storage_available = toiActiveAllocator->storage_available();
15252+ display_stats(0, 0);
24613191 15253+ }
24613191 15254+}
15255+
4e97e4e9 15256+/* update_image
15257+ *
15258+ * Allocate [more] memory and storage for the image.
15259+ */
ad8f4a28 15260+static void update_image(void)
15261+{
7f9d2ee0 15262+ int wanted, got;
15263+ long seek;
24613191 15264+
4e97e4e9 15265+ toi_recalculate_image_contents(0);
24613191 15266+
4e97e4e9 15267+ /* Include allowance for growth in pagedir1 while writing pagedir 2 */
15268+ wanted = pagedir1.size + extra_pd1_pages_allowance -
15269+ get_lowmem_size(pagedir2);
15270+ if (wanted > extra_pages_allocated) {
15271+ got = toi_allocate_extra_pagedir_memory(wanted);
 15272+		if (got < wanted) {
15273+ toi_message(TOI_EAT_MEMORY, TOI_LOW, 1,
15274+ "Want %d extra pages for pageset1, got %d.\n",
15275+ wanted, got);
15276+ return;
15277+ }
15278+ }
24613191 15279+
4e97e4e9 15280+ thaw_kernel_threads();
24613191 15281+
ad8f4a28 15282+ /*
4e97e4e9 15283+ * Allocate remaining storage space, if possible, up to the
15284+ * maximum we know we'll need. It's okay to allocate the
15285+ * maximum if the writer is the swapwriter, but
15286+ * we don't want to grab all available space on an NFS share.
15287+ * We therefore ignore the expected compression ratio here,
15288+ * thereby trying to allocate the maximum image size we could
15289+ * need (assuming compression doesn't expand the image), but
15290+ * don't complain if we can't get the full amount we're after.
15291+ */
24613191 15292+
7f9d2ee0 15293+ storage_available = toiActiveAllocator->storage_available();
24613191 15294+
7f9d2ee0 15295+ header_space_allocated = header_storage_needed();
24613191 15296+
7f9d2ee0 15297+ toiActiveAllocator->reserve_header_space(header_space_allocated);
24613191 15298+
7f9d2ee0 15299+ seek = min(storage_available, main_storage_needed(0, 0));
4e97e4e9 15300+
7f9d2ee0 15301+ toiActiveAllocator->allocate_storage(seek);
15302+
15303+ main_storage_allocated = toiActiveAllocator->storage_allocated();
24613191 15304+
4e97e4e9 15305+ if (freeze_processes())
15306+ set_abort_result(TOI_FREEZING_FAILED);
24613191 15307+
4e97e4e9 15308+ toi_recalculate_image_contents(0);
24613191 15309+}
15310+
4e97e4e9 15311+/* attempt_to_freeze
ad8f4a28 15312+ *
4e97e4e9 15313+ * Try to freeze processes.
15314+ */
24613191 15315+
4e97e4e9 15316+static int attempt_to_freeze(void)
24613191 15317+{
4e97e4e9 15318+ int result;
ad8f4a28 15319+
4e97e4e9 15320+ /* Stop processes before checking again */
15321+ thaw_processes();
ad8f4a28 15322+	toi_prepare_status(CLEAR_BAR, "Freezing processes & syncing "
15323+ "filesystems.");
4e97e4e9 15324+ result = freeze_processes();
24613191 15325+
4e97e4e9 15326+ if (result)
15327+ set_abort_result(TOI_FREEZING_FAILED);
24613191 15328+
4e97e4e9 15329+ return result;
15330+}
24613191 15331+
4e97e4e9 15332+/* eat_memory
15333+ *
15334+ * Try to free some memory, either to meet hard or soft constraints on the image
15335+ * characteristics.
ad8f4a28 15336+ *
4e97e4e9 15337+ * Hard constraints:
15338+ * - Pageset1 must be < half of memory;
15339+ * - We must have enough memory free at resume time to have pageset1
15340+ * be able to be loaded in pages that don't conflict with where it has to
15341+ * be restored.
15342+ * Soft constraints
 15343+ * - User specified image size limit.
15344+ */
15345+static void eat_memory(void)
15346+{
7f9d2ee0 15347+ long amount_wanted = 0;
4e97e4e9 15348+ int did_eat_memory = 0;
ad8f4a28 15349+
4e97e4e9 15350+ /*
15351+ * Note that if we have enough storage space and enough free memory, we
15352+ * may exit without eating anything. We give up when the last 10
15353+ * iterations ate no extra pages because we're not going to get much
15354+ * more anyway, but the few pages we get will take a lot of time.
15355+ *
15356+ * We freeze processes before beginning, and then unfreeze them if we
15357+ * need to eat memory until we think we have enough. If our attempts
15358+ * to freeze fail, we give up and abort.
15359+ */
15360+
15361+ toi_recalculate_image_contents(0);
15362+ amount_wanted = amount_needed(1);
15363+
15364+ switch (image_size_limit) {
ad8f4a28 15365+	case -1: /* Don't eat any memory */
15366+ if (amount_wanted > 0) {
15367+ set_abort_result(TOI_WOULD_EAT_MEMORY);
15368+ return;
15369+ }
15370+ break;
15371+ case -2: /* Free caches only */
15372+ drop_pagecache();
15373+ toi_recalculate_image_contents(0);
15374+ amount_wanted = amount_needed(1);
15375+ did_eat_memory = 1;
15376+ break;
15377+ default:
15378+ break;
24613191 15379+ }
15380+
4e97e4e9 15381+ if (amount_wanted > 0 && !test_result_state(TOI_ABORTED) &&
15382+ image_size_limit != -1) {
15383+ struct zone *zone;
15384+ int zone_idx;
24613191 15385+
ad8f4a28 15386+ toi_prepare_status(CLEAR_BAR,
7f9d2ee0 15387+ "Seeking to free %ldMB of memory.",
ad8f4a28 15388+ MB(amount_wanted));
24613191 15389+
4e97e4e9 15390+ thaw_kernel_threads();
24613191 15391+
4e97e4e9 15392+ for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) {
7f9d2ee0 15393+ unsigned long zone_type_free = max_t(long,
ad8f4a28 15394+				(zone_idx == ZONE_HIGHMEM) ?
15395+ highpages_ps1_to_free() :
15396+ lowpages_ps1_to_free(), amount_wanted);
24613191 15397+
4e97e4e9 15398+ if (zone_type_free < 0)
15399+ break;
24613191 15400+
4e97e4e9 15401+ for_each_zone(zone) {
15402+ if (zone_idx(zone) != zone_idx)
15403+ continue;
24613191 15404+
4e97e4e9 15405+ shrink_one_zone(zone, zone_type_free, 3);
24613191 15406+
4e97e4e9 15407+ did_eat_memory = 1;
24613191 15408+
4e97e4e9 15409+ toi_recalculate_image_contents(0);
24613191 15410+
4e97e4e9 15411+ amount_wanted = amount_needed(1);
7f9d2ee0 15412+ zone_type_free = max_t(long,
ad8f4a28 15413+ (zone_idx == ZONE_HIGHMEM) ?
4e97e4e9 15414+ highpages_ps1_to_free() :
15415+ lowpages_ps1_to_free(), amount_wanted);
24613191 15416+
4e97e4e9 15417+ if (zone_type_free < 0)
15418+ break;
15419+ }
15420+ }
24613191 15421+
4e97e4e9 15422+ toi_cond_pause(0, NULL);
24613191 15423+
4e97e4e9 15424+ if (freeze_processes())
15425+ set_abort_result(TOI_FREEZING_FAILED);
15426+ }
ad8f4a28 15427+
4e97e4e9 15428+ if (did_eat_memory) {
15429+ unsigned long orig_state = get_toi_state();
15430+ /* Freeze_processes will call sys_sync too */
15431+ restore_toi_state(orig_state);
15432+ toi_recalculate_image_contents(0);
15433+ }
24613191 15434+
4e97e4e9 15435+ /* Blank out image size display */
15436+ toi_update_status(100, 100, NULL);
15437+}
15438+
15439+/* toi_prepare_image
15440+ *
15441+ * Entry point to the whole image preparation section.
15442+ *
15443+ * We do four things:
15444+ * - Freeze processes;
15445+ * - Ensure image size constraints are met;
15446+ * - Complete all the preparation for saving the image,
15447+ * including allocation of storage. The only memory
15448+ * that should be needed when we're finished is that
15449+ * for actually storing the image (and we know how
15450+ * much is needed for that because the modules tell
15451+ * us).
15452+ * - Make sure that all dirty buffers are written out.
15453+ */
15454+#define MAX_TRIES 2
15455+int toi_prepare_image(void)
15456+{
15457+ int result = 1, tries = 1;
24613191 15458+
4e97e4e9 15459+ header_space_allocated = 0;
15460+ main_storage_allocated = 0;
24613191 15461+
4e97e4e9 15462+ if (attempt_to_freeze())
15463+ return 1;
24613191 15464+
4e97e4e9 15465+ if (!extra_pd1_pages_allowance)
15466+ get_extra_pd1_allowance();
24613191 15467+
4e97e4e9 15468+ storage_available = toiActiveAllocator->storage_available();
24613191 15469+
4e97e4e9 15470+ if (!storage_available) {
7f9d2ee0 15471+ printk(KERN_INFO "No storage available. Didn't try to prepare "
15472+ "an image.\n");
ad8f4a28 15473+ display_failure_reason(0);
4e97e4e9 15474+ set_abort_result(TOI_NOSTORAGE_AVAILABLE);
15475+ return 1;
15476+ }
24613191 15477+
4e97e4e9 15478+ if (build_attention_list()) {
15479+ abort_hibernate(TOI_UNABLE_TO_PREPARE_IMAGE,
15480+ "Unable to successfully prepare the image.\n");
15481+ return 1;
15482+ }
24613191 15483+
4e97e4e9 15484+ do {
15485+ toi_prepare_status(CLEAR_BAR,
15486+ "Preparing Image. Try %d.", tries);
15487+
4e97e4e9 15488+ eat_memory();
24613191 15489+
4e97e4e9 15490+ if (test_result_state(TOI_ABORTED))
15491+ break;
24613191 15492+
4e97e4e9 15493+ update_image();
24613191 15494+
4e97e4e9 15495+ tries++;
24613191 15496+
4e97e4e9 15497+ } while (image_not_ready(1) && tries <= MAX_TRIES &&
15498+ !test_result_state(TOI_ABORTED));
24613191 15499+
4e97e4e9 15500+ result = image_not_ready(0);
24613191 15501+
4e97e4e9 15502+ if (!test_result_state(TOI_ABORTED)) {
15503+ if (result) {
15504+ display_stats(1, 0);
ad8f4a28 15505+ display_failure_reason(tries > MAX_TRIES);
4e97e4e9 15506+ abort_hibernate(TOI_UNABLE_TO_PREPARE_IMAGE,
15507+ "Unable to successfully prepare the image.\n");
15508+ } else {
15509+ unlink_lru_lists();
15510+ toi_cond_pause(1, "Image preparation complete.");
15511+ }
15512+ }
24613191 15513+
ad8f4a28 15514+ return result ? result : allocate_checksum_pages();
24613191 15515+}
24613191 15516+
4e97e4e9 15517+#ifdef CONFIG_TOI_EXPORTS
15518+EXPORT_SYMBOL_GPL(real_nr_free_pages);
15519+#endif
15520diff --git a/kernel/power/tuxonice_prepare_image.h b/kernel/power/tuxonice_prepare_image.h
15521new file mode 100644
7f9d2ee0 15522index 0000000..9604c79
4e97e4e9 15523--- /dev/null
15524+++ b/kernel/power/tuxonice_prepare_image.h
15525@@ -0,0 +1,35 @@
15526+/*
15527+ * kernel/power/tuxonice_prepare_image.h
15528+ *
15529+ * Copyright (C) 2003-2007 Nigel Cunningham (nigel at tuxonice net)
15530+ *
15531+ * This file is released under the GPLv2.
15532+ *
15533+ */
24613191 15534+
4e97e4e9 15535+#include <asm/sections.h>
24613191 15536+
4e97e4e9 15537+extern int toi_prepare_image(void);
15538+extern void toi_recalculate_image_contents(int storage_available);
7f9d2ee0 15539+extern long real_nr_free_pages(unsigned long zone_idx_mask);
4e97e4e9 15540+extern int image_size_limit;
15541+extern void toi_free_extra_pagedir_memory(void);
7f9d2ee0 15542+extern long extra_pd1_pages_allowance;
4e97e4e9 15543+extern void free_attention_list(void);
24613191 15544+
4e97e4e9 15545+#define MIN_FREE_RAM 100
7f9d2ee0 15546+#define DEFAULT_EXTRA_PAGES_ALLOWANCE 500
24613191 15547+
4e97e4e9 15548+#define all_zones_mask ((unsigned long) ((1 << MAX_NR_ZONES) - 1))
15549+#ifdef CONFIG_HIGHMEM
15550+#define real_nr_free_high_pages() (real_nr_free_pages(1 << ZONE_HIGHMEM))
15551+#define real_nr_free_low_pages() (real_nr_free_pages(all_zones_mask - \
15552+ (1 << ZONE_HIGHMEM)))
15553+#else
15554+#define real_nr_free_high_pages() (0)
15555+#define real_nr_free_low_pages() (real_nr_free_pages(all_zones_mask))
24613191 15556+
4e97e4e9 15557+/* For eat_memory function */
15558+#define ZONE_HIGHMEM (MAX_NR_ZONES + 1)
15559+#endif
24613191 15560+
4e97e4e9 15561diff --git a/kernel/power/tuxonice_storage.c b/kernel/power/tuxonice_storage.c
15562new file mode 100644
ad8f4a28 15563index 0000000..777ff3c
4e97e4e9 15564--- /dev/null
15565+++ b/kernel/power/tuxonice_storage.c
ad8f4a28 15566@@ -0,0 +1,292 @@
24613191 15567+/*
4e97e4e9 15568+ * kernel/power/tuxonice_storage.c
24613191 15569+ *
4e97e4e9 15570+ * Copyright (C) 2005-2007 Nigel Cunningham (nigel at tuxonice net)
15571+ *
15572+ * This file is released under the GPLv2.
15573+ *
15574+ * Routines for talking to a userspace program that manages storage.
15575+ *
15576+ * The kernel side:
15577+ * - starts the userspace program;
15578+ * - sends messages telling it when to open and close the connection;
15579+ * - tells it when to quit;
15580+ *
15581+ * The user space side:
15582+ * - passes messages regarding status;
24613191 15583+ *
24613191 15584+ */
15585+
4e97e4e9 15586+#include <linux/suspend.h>
15587+#include <linux/freezer.h>
ad8f4a28 15588+
4e97e4e9 15589+#include "tuxonice_sysfs.h"
15590+#include "tuxonice_modules.h"
15591+#include "tuxonice_netlink.h"
15592+#include "tuxonice_storage.h"
15593+#include "tuxonice_ui.h"
15594+
15595+static struct user_helper_data usm_helper_data;
15596+static struct toi_module_ops usm_ops;
15597+static int message_received, usm_prepare_count;
15598+static int storage_manager_last_action, storage_manager_action;
15599+
4e97e4e9 15600+static int usm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
24613191 15601+{
4e97e4e9 15602+ int type;
15603+ int *data;
24613191 15604+
4e97e4e9 15605+ type = nlh->nlmsg_type;
24613191 15606+
4e97e4e9 15607+ /* A control message: ignore them */
15608+ if (type < NETLINK_MSG_BASE)
15609+ return 0;
24613191 15610+
4e97e4e9 15611+ /* Unknown message: reply with EINVAL */
15612+ if (type >= USM_MSG_MAX)
15613+ return -EINVAL;
24613191 15614+
4e97e4e9 15615+ /* All operations require privileges, even GET */
15616+ if (security_netlink_recv(skb, CAP_NET_ADMIN))
15617+ return -EPERM;
24613191 15618+
4e97e4e9 15619+ /* Only allow one task to receive NOFREEZE privileges */
15620+ if (type == NETLINK_MSG_NOFREEZE_ME && usm_helper_data.pid != -1)
15621+ return -EBUSY;
24613191 15622+
ad8f4a28 15623+ data = (int *) NLMSG_DATA(nlh);
24613191 15624+
4e97e4e9 15625+ switch (type) {
15626+ case USM_MSG_SUCCESS:
15627+ case USM_MSG_FAILED:
15628+ message_received = type;
15629+ complete(&usm_helper_data.wait_for_process);
15630+ break;
15631+ default:
15632+ printk(KERN_INFO "Storage manager doesn't recognise "
15633+ "message %d.\n", type);
4e97e4e9 15634+ }
24613191 15635+
4e97e4e9 15636+ return 1;
24613191 15637+}
15638+
4e97e4e9 15639+#ifdef CONFIG_NET
ad8f4a28 15640+static int activations;
24613191 15641+
4e97e4e9 15642+int toi_activate_storage(int force)
24613191 15643+{
4e97e4e9 15644+ int tries = 1;
24613191 15645+
4e97e4e9 15646+ if (usm_helper_data.pid == -1 || !usm_ops.enabled)
15647+ return 0;
24613191 15648+
4e97e4e9 15649+ message_received = 0;
15650+ activations++;
24613191 15651+
4e97e4e9 15652+ if (activations > 1 && !force)
15653+ return 0;
24613191 15654+
15655+ while ((!message_received || message_received == USM_MSG_FAILED) &&
15656+ tries < 2) {
15657+ toi_prepare_status(DONT_CLEAR_BAR, "Activate storage attempt "
15658+ "%d.\n", tries);
24613191 15659+
4e97e4e9 15660+ init_completion(&usm_helper_data.wait_for_process);
24613191 15661+
4e97e4e9 15662+ toi_send_netlink_message(&usm_helper_data,
15663+ USM_MSG_CONNECT,
15664+ NULL, 0);
24613191 15665+
4e97e4e9 15666+ /* Wait 2 seconds for the userspace process to make contact */
15667+ wait_for_completion_timeout(&usm_helper_data.wait_for_process,
15668+ 2*HZ);
24613191 15669+
4e97e4e9 15670+ tries++;
24613191 15671+ }
15672+
4e97e4e9 15673+ return 0;
24613191 15674+}
15675+
4e97e4e9 15676+int toi_deactivate_storage(int force)
24613191 15677+{
4e97e4e9 15678+ if (usm_helper_data.pid == -1 || !usm_ops.enabled)
15679+ return 0;
ad8f4a28 15680+
4e97e4e9 15681+ message_received = 0;
15682+ activations--;
24613191 15683+
4e97e4e9 15684+ if (activations && !force)
15685+ return 0;
24613191 15686+
4e97e4e9 15687+ init_completion(&usm_helper_data.wait_for_process);
24613191 15688+
4e97e4e9 15689+ toi_send_netlink_message(&usm_helper_data,
15690+ USM_MSG_DISCONNECT,
15691+ NULL, 0);
24613191 15692+
4e97e4e9 15693+ wait_for_completion_timeout(&usm_helper_data.wait_for_process, 2*HZ);
24613191 15694+
4e97e4e9 15695+ if (!message_received || message_received == USM_MSG_FAILED) {
ad8f4a28 15696+ printk(KERN_INFO "Returning failure disconnecting storage.\n");
4e97e4e9 15697+ return 1;
15698+ }
24613191 15699+
4e97e4e9 15700+ return 0;
24613191 15701+}
4e97e4e9 15702+#endif
24613191 15703+
4e97e4e9 15704+static void storage_manager_simulate(void)
24613191 15705+{
ad8f4a28 15706+ printk(KERN_INFO "--- Storage manager simulate ---\n");
4e97e4e9 15707+ toi_prepare_usm();
15708+ schedule();
ad8f4a28 15709+ printk(KERN_INFO "--- Activate storage 1 ---\n");
4e97e4e9 15710+ toi_activate_storage(1);
15711+ schedule();
ad8f4a28 15712+ printk(KERN_INFO "--- Deactivate storage 1 ---\n");
4e97e4e9 15713+ toi_deactivate_storage(1);
15714+ schedule();
ad8f4a28 15715+ printk(KERN_INFO "--- Cleanup usm ---\n");
4e97e4e9 15716+ toi_cleanup_usm();
15717+ schedule();
ad8f4a28 15718+ printk(KERN_INFO "--- Storage manager simulate ends ---\n");
24613191 15719+}
15720+
4e97e4e9 15721+static int usm_storage_needed(void)
24613191 15722+{
4e97e4e9 15723+ return strlen(usm_helper_data.program);
24613191 15724+}
15725+
4e97e4e9 15726+static int usm_save_config_info(char *buf)
24613191 15727+{
4e97e4e9 15728+ int len = strlen(usm_helper_data.program);
15729+ memcpy(buf, usm_helper_data.program, len);
15730+ return len;
15731+}
24613191 15732+
4e97e4e9 15733+static void usm_load_config_info(char *buf, int size)
15734+{
15735+ /* Don't load the saved path if one has already been set */
15736+ if (usm_helper_data.program[0])
15737+ return;
24613191 15738+
4e97e4e9 15739+ memcpy(usm_helper_data.program, buf, size);
24613191 15740+}
15741+
4e97e4e9 15742+static int usm_memory_needed(void)
24613191 15743+{
4e97e4e9 15744+	/* ballpark figure of 32 pages */
15745+ return (32 * PAGE_SIZE);
24613191 15746+}
15747+
4e97e4e9 15748+/* toi_prepare_usm
24613191 15749+ */
4e97e4e9 15750+int toi_prepare_usm(void)
24613191 15751+{
4e97e4e9 15752+ usm_prepare_count++;
24613191 15753+
4e97e4e9 15754+ if (usm_prepare_count > 1 || !usm_ops.enabled)
24613191 15755+ return 0;
ad8f4a28 15756+
4e97e4e9 15757+ usm_helper_data.pid = -1;
24613191 15758+
4e97e4e9 15759+ if (!*usm_helper_data.program)
15760+ return 0;
24613191 15761+
4e97e4e9 15762+ toi_netlink_setup(&usm_helper_data);
24613191 15763+
4e97e4e9 15764+ if (usm_helper_data.pid == -1)
15765+ printk(KERN_INFO "TuxOnIce Storage Manager wanted, but couldn't"
15766+ " start it.\n");
24613191 15767+
4e97e4e9 15768+ toi_activate_storage(0);
24613191 15769+
4e97e4e9 15770+ return (usm_helper_data.pid != -1);
15771+}
24613191 15772+
4e97e4e9 15773+void toi_cleanup_usm(void)
15774+{
15775+ usm_prepare_count--;
15776+
15777+ if (usm_helper_data.pid > -1 && !usm_prepare_count) {
15778+ toi_deactivate_storage(0);
15779+ toi_netlink_close(&usm_helper_data);
24613191 15780+ }
4e97e4e9 15781+}
24613191 15782+
4e97e4e9 15783+static void storage_manager_activate(void)
15784+{
15785+ if (storage_manager_action == storage_manager_last_action)
15786+ return;
24613191 15787+
4e97e4e9 15788+ if (storage_manager_action)
15789+ toi_prepare_usm();
15790+ else
15791+ toi_cleanup_usm();
24613191 15792+
4e97e4e9 15793+ storage_manager_last_action = storage_manager_action;
15794+}
24613191 15795+
4e97e4e9 15796+/*
15797+ * User interface specific /sys/power/tuxonice entries.
15798+ */
24613191 15799+
4e97e4e9 15800+static struct toi_sysfs_data sysfs_params[] = {
15801+ { TOI_ATTR("simulate_atomic_copy", SYSFS_RW),
15802+ .type = TOI_SYSFS_DATA_NONE,
15803+ .write_side_effect = storage_manager_simulate,
15804+ },
24613191 15805+
4e97e4e9 15806+ { TOI_ATTR("enabled", SYSFS_RW),
15807+ SYSFS_INT(&usm_ops.enabled, 0, 1, 0)
15808+ },
24613191 15809+
4e97e4e9 15810+ { TOI_ATTR("program", SYSFS_RW),
15811+ SYSFS_STRING(usm_helper_data.program, 254, 0)
15812+ },
24613191 15813+
4e97e4e9 15814+ { TOI_ATTR("activate_storage", SYSFS_RW),
15815+ SYSFS_INT(&storage_manager_action, 0, 1, 0),
15816+ .write_side_effect = storage_manager_activate,
15817+ }
15818+};
24613191 15819+
4e97e4e9 15820+static struct toi_module_ops usm_ops = {
15821+ .type = MISC_MODULE,
15822+ .name = "usm",
15823+ .directory = "storage_manager",
15824+ .module = THIS_MODULE,
15825+ .storage_needed = usm_storage_needed,
15826+ .save_config_info = usm_save_config_info,
15827+ .load_config_info = usm_load_config_info,
15828+ .memory_needed = usm_memory_needed,
24613191 15829+
4e97e4e9 15830+ .sysfs_data = sysfs_params,
15831+ .num_sysfs_entries = sizeof(sysfs_params) /
15832+ sizeof(struct toi_sysfs_data),
4e97e4e9 15833+};
24613191 15834+
4e97e4e9 15835+/* toi_usm_sysfs_init
15836+ * Description: Boot time initialisation for user interface.
24613191 15837+ */
4e97e4e9 15838+int toi_usm_init(void)
24613191 15839+{
4e97e4e9 15840+ usm_helper_data.nl = NULL;
15841+ usm_helper_data.program[0] = '\0';
15842+ usm_helper_data.pid = -1;
15843+ usm_helper_data.skb_size = 0;
15844+ usm_helper_data.pool_limit = 6;
15845+ usm_helper_data.netlink_id = NETLINK_TOI_USM;
15846+ usm_helper_data.name = "userspace storage manager";
15847+ usm_helper_data.rcv_msg = usm_user_rcv_msg;
15848+ usm_helper_data.interface_version = 1;
15849+ usm_helper_data.must_init = 0;
15850+ init_completion(&usm_helper_data.wait_for_process);
15851+
15852+ return toi_register_module(&usm_ops);
24613191 15853+}
15854+
4e97e4e9 15855+void toi_usm_exit(void)
24613191 15856+{
4e97e4e9 15857+ toi_unregister_module(&usm_ops);
15858+}
15859diff --git a/kernel/power/tuxonice_storage.h b/kernel/power/tuxonice_storage.h
15860new file mode 100644
ad8f4a28 15861index 0000000..2f895bf
4e97e4e9 15862--- /dev/null
15863+++ b/kernel/power/tuxonice_storage.h
15864@@ -0,0 +1,53 @@
15865+/*
15866+ * kernel/power/tuxonice_storage.h
15867+ *
15868+ * Copyright (C) 2005-2007 Nigel Cunningham (nigel at tuxonice net)
15869+ *
15870+ * This file is released under the GPLv2.
15871+ */
24613191 15872+
4e97e4e9 15873+#ifdef CONFIG_NET
15874+int toi_prepare_usm(void);
15875+void toi_cleanup_usm(void);
24613191 15876+
4e97e4e9 15877+int toi_activate_storage(int force);
15878+int toi_deactivate_storage(int force);
15879+extern int toi_usm_init(void);
15880+extern void toi_usm_exit(void);
15881+#else
15882+static inline int toi_usm_init(void) { return 0; }
15883+static inline void toi_usm_exit(void) { }
24613191 15884+
4e97e4e9 15885+static inline int toi_activate_storage(int force)
15886+{
e8d0ad9d 15887+ return 0;
24613191 15888+}
15889+
4e97e4e9 15890+static inline int toi_deactivate_storage(int force)
15891+{
15892+ return 0;
15893+}
24613191 15894+
4e97e4e9 15895+static inline int toi_prepare_usm(void) { return 0; }
15896+static inline void toi_cleanup_usm(void) { }
15897+#endif
24613191 15898+
4e97e4e9 15899+enum {
15900+ USM_MSG_BASE = 0x10,
24613191 15901+
4e97e4e9 15902+ /* Kernel -> Userspace */
15903+ USM_MSG_CONNECT = 0x30,
15904+ USM_MSG_DISCONNECT = 0x31,
15905+ USM_MSG_SUCCESS = 0x40,
15906+ USM_MSG_FAILED = 0x41,
24613191 15907+
4e97e4e9 15908+ USM_MSG_MAX,
24613191 15909+};
15910+
4e97e4e9 15911+#ifdef CONFIG_NET
15912+extern __init int toi_usm_init(void);
15913+extern __exit void toi_usm_cleanup(void);
24613191 15914+#else
15915+#define toi_usm_init() do { } while (0)
15916+#define toi_usm_cleanup() do { } while (0)
24613191 15917+#endif
4e97e4e9 15918diff --git a/kernel/power/tuxonice_swap.c b/kernel/power/tuxonice_swap.c
15919new file mode 100644
7f9d2ee0 15920index 0000000..3cecaa0
4e97e4e9 15921--- /dev/null
15922+++ b/kernel/power/tuxonice_swap.c
7f9d2ee0 15923@@ -0,0 +1,1280 @@
24613191 15924+/*
4e97e4e9 15925+ * kernel/power/tuxonice_swap.c
24613191 15926+ *
4e97e4e9 15927+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 15928+ *
15929+ * Distributed under GPLv2.
ad8f4a28 15930+ *
24613191 15931+ * This file encapsulates functions for usage of swap space as a
15932+ * backing store.
15933+ */
15934+
15935+#include <linux/suspend.h>
15936+#include <linux/module.h>
15937+#include <linux/blkdev.h>
15938+#include <linux/swapops.h>
15939+#include <linux/swap.h>
15940+#include <linux/syscalls.h>
15941+
4e97e4e9 15942+#include "tuxonice.h"
15943+#include "tuxonice_sysfs.h"
15944+#include "tuxonice_modules.h"
15945+#include "tuxonice_io.h"
15946+#include "tuxonice_ui.h"
15947+#include "tuxonice_extent.h"
15948+#include "tuxonice_block_io.h"
ad8f4a28 15949+#include "tuxonice_alloc.h"
24613191 15950+
4e97e4e9 15951+static struct toi_module_ops toi_swapops;
24613191 15952+
24613191 15953+/* --- Struct of pages stored on disk */
15954+
7f9d2ee0 15955+struct tuxonice_sig_data {
15956+ dev_t device;
15957+ unsigned long sector;
15958+ int resume_attempted;
15959+ int orig_sig_type;
15960+};
15961+
24613191 15962+union diskpage {
15963+ union swap_header swh; /* swh.magic is the only member used */
7f9d2ee0 15964+ struct tuxonice_sig_data tuxonice_sig_data;
24613191 15965+};
15966+
15967+union p_diskpage {
15968+ union diskpage *pointer;
15969+ char *ptr;
ad8f4a28 15970+ unsigned long address;
24613191 15971+};
15972+
7f9d2ee0 15973+enum {
15974+ IMAGE_SIGNATURE,
15975+ NO_IMAGE_SIGNATURE,
15976+ TRIED_RESUME,
15977+ NO_TRIED_RESUME,
15978+};
15979+
15980+/*
15981+ * Both of these point to versions of the swap header page. original_sig points
15982+ * to the data we read from disk at the start of hibernating or checking whether
15983+ * to resume. no_image is the page stored in the image header, showing what the
15984+ * swap header page looked like at the start of hibernating.
15985+ */
15986+static char *current_signature_page;
15987+static char no_image_signature_contents[sizeof(struct tuxonice_sig_data)];
15988+
24613191 15989+/* Devices used for swap */
4e97e4e9 15990+static struct toi_bdev_info devinfo[MAX_SWAPFILES];
24613191 15991+
15992+/* Extent chains for swap & blocks */
15993+struct extent_chain swapextents;
15994+struct extent_chain block_chain[MAX_SWAPFILES];
15995+
15996+static dev_t header_dev_t;
15997+static struct block_device *header_block_device;
15998+static unsigned long headerblock;
15999+
16000+/* For swapfile automatically swapon/off'd. */
16001+static char swapfilename[32] = "";
4e97e4e9 16002+static int toi_swapon_status;
24613191 16003+
16004+/* Header Page Information */
7f9d2ee0 16005+static long header_pages_reserved;
24613191 16006+
16007+/* Swap Pages */
7f9d2ee0 16008+static long swap_pages_allocated;
24613191 16009+
16010+/* User Specified Parameters. */
16011+
16012+static unsigned long resume_firstblock;
24613191 16013+static dev_t resume_swap_dev_t;
16014+static struct block_device *resume_block_device;
16015+
16016+struct sysinfo swapinfo;
24613191 16017+
16018+/* Block devices open. */
ad8f4a28 16019+struct bdev_opened {
24613191 16020+ dev_t device;
16021+ struct block_device *bdev;
24613191 16022+};
16023+
ad8f4a28 16024+/*
24613191 16025+ * Entry MAX_SWAPFILES is the resume block device, which may
4e97e4e9 16026+ * be a swap device not enabled when we hibernate.
24613191 16027+ * Entry MAX_SWAPFILES + 1 is the header block device, which
16028+ * is needed before we find out which slot it occupies.
4e97e4e9 16029+ *
16030+ * We use a separate struct from devinfo so that we can track
16031+ * the bdevs we open, because if we need to abort resuming
16032+ * prior to the atomic restore, they need to be closed, but
16033+ * closing them after successfully resuming would be wrong.
24613191 16034+ */
4e97e4e9 16035+static struct bdev_opened *bdevs_opened[MAX_SWAPFILES + 2];
ad8f4a28 16036+
4e97e4e9 16037+/**
16038+ * close_bdev: Close a swap bdev.
16039+ *
16040+ * int: The swap entry number to close.
16041+ */
24613191 16042+static void close_bdev(int i)
16043+{
4e97e4e9 16044+ struct bdev_opened *this = bdevs_opened[i];
24613191 16045+
4e97e4e9 16046+ if (!this)
16047+ return;
24613191 16048+
24613191 16049+ blkdev_put(this->bdev);
ad8f4a28 16050+ toi_kfree(8, this);
4e97e4e9 16051+ bdevs_opened[i] = NULL;
24613191 16052+}
16053+
4e97e4e9 16054+/**
16055+ * close_bdevs: Close all bdevs we opened.
16056+ *
16057+ * Close all bdevs that we opened and reset the related vars.
16058+ */
24613191 16059+static void close_bdevs(void)
16060+{
16061+ int i;
16062+
4e97e4e9 16063+ for (i = 0; i < MAX_SWAPFILES + 2; i++)
16064+ close_bdev(i);
24613191 16065+
16066+ resume_block_device = header_block_device = NULL;
16067+}
16068+
4e97e4e9 16069+/**
16070+ * open_bdev: Open a bdev at resume time.
16071+ *
16072+ * index: The swap index. May be MAX_SWAPFILES for the resume_dev_t
16073+ * (the user can have resume= pointing at a swap partition/file that isn't
16074+ * swapon'd when they hibernate.) MAX_SWAPFILES+1 for the first page of the
16075+ * header. It will be from a swap partition that was enabled when we hibernated,
16076+ * but we don't know its real index until we read that first page.
16077+ * dev_t: The device major/minor.
16078+ * display_errs: Whether to try to do this quietly.
16079+ *
16080+ * We stored a dev_t in the image header. Open the matching device without
16081+ * requiring /dev/<whatever> in most cases and record the details needed
16082+ * to close it later and avoid duplicating work.
16083+ */
24613191 16084+static struct block_device *open_bdev(int index, dev_t device, int display_errs)
16085+{
16086+ struct bdev_opened *this;
16087+ struct block_device *bdev;
16088+
4e97e4e9 16089+ if (bdevs_opened[index]) {
16090+ if (bdevs_opened[index]->device == device)
16091+ return bdevs_opened[index]->bdev;
ad8f4a28 16092+
24613191 16093+ close_bdev(index);
4e97e4e9 16094+ }
24613191 16095+
ad8f4a28 16096+ bdev = toi_open_by_devnum(device, FMODE_READ);
24613191 16097+
16098+ if (IS_ERR(bdev) || !bdev) {
16099+ if (display_errs)
ad8f4a28 16100+ toi_early_boot_message(1, TOI_CONTINUE_REQ,
24613191 16101+ "Failed to get access to block device "
16102+ "\"%x\" (error %d).\n Maybe you need "
16103+ "to run mknod and/or lvmsetup in an "
16104+ "initrd/ramfs?", device, bdev);
16105+ return ERR_PTR(-EINVAL);
16106+ }
16107+
ad8f4a28 16108+ this = toi_kzalloc(8, sizeof(struct bdev_opened), GFP_KERNEL);
24613191 16109+ if (!this) {
4e97e4e9 16110+ printk(KERN_WARNING "TuxOnIce: Failed to allocate memory for "
24613191 16111+ "opening a bdev.");
ad8f4a28 16112+ blkdev_put(bdev);
24613191 16113+ return ERR_PTR(-ENOMEM);
16114+ }
16115+
4e97e4e9 16116+ bdevs_opened[index] = this;
24613191 16117+ this->device = device;
16118+ this->bdev = bdev;
16119+
24613191 16120+ return bdev;
16121+}
16122+
4e97e4e9 16123+/**
16124+ * enable_swapfile: Swapon the user specified swapfile prior to hibernating.
16125+ *
16126+ * Activate the given swapfile if it wasn't already enabled. Remember whether
16127+ * we really did swapon it for swapoffing later.
24613191 16128+ */
4e97e4e9 16129+static void enable_swapfile(void)
24613191 16130+{
16131+ int activateswapresult = -EINVAL;
16132+
24613191 16133+ if (swapfilename[0]) {
16134+ /* Attempt to swap on with maximum priority */
16135+ activateswapresult = sys_swapon(swapfilename, 0xFFFF);
4e97e4e9 16136+ if (activateswapresult && activateswapresult != -EBUSY)
16137+ printk("TuxOnIce: The swapfile/partition specified by "
16138+ "/sys/power/tuxonice/swap/swapfile "
24613191 16139+ "(%s) could not be turned on (error %d). "
16140+ "Attempting to continue.\n",
16141+ swapfilename, activateswapresult);
16142+ if (!activateswapresult)
4e97e4e9 16143+ toi_swapon_status = 1;
24613191 16144+ }
24613191 16145+}
16146+
4e97e4e9 16147+/**
16148+ * disable_swapfile: Swapoff any file swaponed at the start of the cycle.
16149+ *
16150+ * If we did successfully swapon a file at the start of the cycle, swapoff
16151+ * it now (finishing up).
16152+ */
16153+static void disable_swapfile(void)
24613191 16154+{
4e97e4e9 16155+ if (!toi_swapon_status)
16156+ return;
24613191 16157+
4e97e4e9 16158+ sys_swapoff(swapfilename);
16159+ toi_swapon_status = 0;
24613191 16160+}
16161+
4e97e4e9 16162+/**
16163+ * try_to_parse_resume_device: Try to parse resume=
16164+ *
16165+ * Any "swap:" has been stripped away and we just have the path to deal with.
16166+ * We attempt to do name_to_dev_t, open and stat the file. Having opened the
16167+ * file, get the struct block_device * to match.
16168+ */
24613191 16169+static int try_to_parse_resume_device(char *commandline, int quiet)
16170+{
16171+ struct kstat stat;
16172+ int error = 0;
16173+
16174+ resume_swap_dev_t = name_to_dev_t(commandline);
16175+
16176+ if (!resume_swap_dev_t) {
16177+ struct file *file = filp_open(commandline,
16178+ O_RDONLY|O_LARGEFILE, 0);
24613191 16179+
16180+ if (!IS_ERR(file) && file) {
16181+ vfs_getattr(file->f_vfsmnt, file->f_dentry, &stat);
16182+ filp_close(file, NULL);
16183+ } else
16184+ error = vfs_stat(commandline, &stat);
16185+ if (!error)
16186+ resume_swap_dev_t = stat.rdev;
16187+ }
16188+
16189+ if (!resume_swap_dev_t) {
16190+ if (quiet)
16191+ return 1;
16192+
4e97e4e9 16193+ if (test_toi_state(TOI_TRYING_TO_RESUME))
16194+ toi_early_boot_message(1, TOI_CONTINUE_REQ,
24613191 16195+ "Failed to translate \"%s\" into a device id.\n",
16196+ commandline);
16197+ else
4e97e4e9 16198+ printk("TuxOnIce: Can't translate \"%s\" into a device "
24613191 16199+ "id yet.\n", commandline);
16200+ return 1;
16201+ }
16202+
16203+ resume_block_device = open_bdev(MAX_SWAPFILES, resume_swap_dev_t, 0);
16204+ if (IS_ERR(resume_block_device)) {
16205+ if (!quiet)
4e97e4e9 16206+ toi_early_boot_message(1, TOI_CONTINUE_REQ,
24613191 16207+ "Failed to get access to \"%s\", where"
16208+ " the swap header should be found.",
16209+ commandline);
16210+ return 1;
16211+ }
ad8f4a28 16212+
24613191 16213+ return 0;
16214+}
16215+
ad8f4a28 16216+/*
24613191 16217+ * If we have read part of the image, we might have filled memory with
16218+ * data that should be zeroed out.
16219+ */
4e97e4e9 16220+static void toi_swap_noresume_reset(void)
24613191 16221+{
7f9d2ee0 16222+ toi_bio_ops.rw_cleanup(READ);
24613191 16223+ memset((char *) &devinfo, 0, sizeof(devinfo));
24613191 16224+}
16225+
7f9d2ee0 16226+static int get_current_signature(void)
24613191 16227+{
7f9d2ee0 16228+ int result;
24613191 16229+
7f9d2ee0 16230+ if (current_signature_page)
24613191 16231+ return 0;
24613191 16232+
7f9d2ee0 16233+ current_signature_page = (char *) toi_get_zeroed_page(38, TOI_ATOMIC_GFP);
16234+ if (!current_signature_page)
16235+ return -ENOMEM;
ad8f4a28 16236+
7f9d2ee0 16237+ result = toi_bio_ops.bdev_page_io(READ, resume_block_device,
16238+ resume_firstblock, virt_to_page(current_signature_page));
ad8f4a28 16239+
7f9d2ee0 16240+ return result;
16241+}
24613191 16242+
7f9d2ee0 16243+static int parse_signature(void)
16244+{
16245+ union p_diskpage swap_header_page;
16246+ struct tuxonice_sig_data *sig;
16247+ int type;
16248+ char *swap_header;
16249+ const char *sigs[] = {
16250+ "SWAP-SPACE", "SWAPSPACE2", "S1SUSP", "S2SUSP", "S1SUSPEND"
16251+ };
16252+
16253+ if (!current_signature_page) {
16254+ int result = get_current_signature();
16255+
16256+ if (result)
16257+ return result;
24613191 16258+ }
16259+
7f9d2ee0 16260+ swap_header_page = (union p_diskpage) current_signature_page;
16261+ sig = (struct tuxonice_sig_data *) current_signature_page;
16262+ swap_header = swap_header_page.pointer->swh.magic.magic;
16263+
16264+ for (type = 0; type < 5; type++)
16265+ if (!memcmp(sigs[type], swap_header, strlen(sigs[type])))
16266+ return type;
16267+
16268+ if (memcmp(tuxonice_signature, swap_header, 1))
16269+ return -1;
16270+
16271+ header_dev_t = sig->device;
16272+ clear_toi_state(TOI_RESUMED_BEFORE);
16273+ if (sig->resume_attempted)
16274+ set_toi_state(TOI_RESUMED_BEFORE);
16275+ headerblock = sig->sector;
16276+
16277+ return 10;
16278+}
16279+
16280+static void forget_signatures(void)
16281+{
16282+ if (current_signature_page) {
16283+ toi_free_page(38, (unsigned long) current_signature_page);
16284+ current_signature_page = NULL;
16285+ }
24613191 16286+}
16287+
16288+/*
7f9d2ee0 16289+ * write_modified_signature
16290+ *
16291+ * Write a (potentially) modified signature page without forgetting the
16292+ * original contents.
24613191 16293+ */
7f9d2ee0 16294+static int write_modified_signature(int modification)
24613191 16295+{
7f9d2ee0 16296+ union p_diskpage swap_header_page;
16297+ struct swap_info_struct *si;
16298+ int result;
16299+ char *orig_sig;
24613191 16300+
7f9d2ee0 16301+ /* In case we haven't already */
16302+ result = get_current_signature();
16303+
16304+ if (result)
16305+ return result;
24613191 16306+
7f9d2ee0 16307+ swap_header_page.address = toi_get_zeroed_page(38, TOI_ATOMIC_GFP);
24613191 16308+
7f9d2ee0 16309+ if (!swap_header_page.address)
16310+ return -ENOMEM;
16311+
16312+ memcpy(swap_header_page.ptr, current_signature_page, PAGE_SIZE);
16313+
16314+ switch (modification) {
16315+ case IMAGE_SIGNATURE:
16316+
16317+ memcpy(no_image_signature_contents, swap_header_page.ptr,
16318+ sizeof(no_image_signature_contents));
16319+
16320+ /* Get the details of the header first page. */
16321+ toi_extent_state_goto_start(&toi_writer_posn);
16322+ toi_bio_ops.forward_one_page(1);
16323+
16324+ si = get_swap_info_struct(toi_writer_posn.current_chain);
16325+
16326+ /* Prepare the signature */
16327+ swap_header_page.pointer->tuxonice_sig_data.device =
16328+ si->bdev->bd_dev;
16329+ swap_header_page.pointer->tuxonice_sig_data.sector =
16330+ toi_writer_posn.current_offset;
16331+ swap_header_page.pointer->tuxonice_sig_data.resume_attempted =0;
16332+ swap_header_page.pointer->tuxonice_sig_data.orig_sig_type =
16333+ parse_signature();
16334+
16335+ memcpy(swap_header_page.pointer->swh.magic.magic,
16336+ tuxonice_signature, sizeof(tuxonice_signature));
16337+
16338+ break;
16339+ case NO_IMAGE_SIGNATURE:
16340+ if (!swap_header_page.pointer->tuxonice_sig_data.orig_sig_type)
16341+ orig_sig = "SWAP-SPACE";
16342+ else
16343+ orig_sig = "SWAPSPACE2";
16344+
16345+ memcpy(swap_header_page.pointer->swh.magic.magic, orig_sig, 10);
16346+ memcpy(swap_header_page.ptr, no_image_signature_contents,
16347+ sizeof(no_image_signature_contents));
16348+ break;
16349+ case TRIED_RESUME:
16350+ swap_header_page.pointer->tuxonice_sig_data.resume_attempted =1;
16351+ break;
16352+ case NO_TRIED_RESUME:
16353+ swap_header_page.pointer->tuxonice_sig_data.resume_attempted =0;
16354+ break;
24613191 16355+ }
16356+
7f9d2ee0 16357+ result = toi_bio_ops.bdev_page_io(WRITE, resume_block_device,
16358+ resume_firstblock, virt_to_page(swap_header_page.address));
16359+
16360+ toi_free_page(38, swap_header_page.address);
24613191 16361+
7f9d2ee0 16362+ return result;
16363+}
24613191 16364+
7f9d2ee0 16365+static int apply_header_reservation(void)
24613191 16366+{
16367+ int i;
16368+
4e97e4e9 16369+ toi_extent_state_goto_start(&toi_writer_posn);
16370+ toi_bio_ops.forward_one_page(1); /* To first page */
16371+
7f9d2ee0 16372+ for (i = 0; i < header_pages_reserved; i++) {
16373+ if (toi_bio_ops.forward_one_page(1)) {
16374+ printk(KERN_INFO "Out of space while seeking to "
16375+			       "allocate header pages.\n");
24613191 16376+ return -ENOSPC;
16377+ }
24613191 16378+ }
16379+
24613191 16380+ /* The end of header pages will be the start of pageset 2;
16381+ * we are now sitting on the first pageset2 page. */
7f9d2ee0 16382+ toi_extent_state_save(&toi_writer_posn, &toi_writer_posn_save[2]);
24613191 16383+ return 0;
16384+}
16385+
7f9d2ee0 16386+static void toi_swap_reserve_header_space(int request)
16387+{
16388+ header_pages_reserved = (long) request;
16389+
16390+ /* If we've already allocated storage (hence ignoring return value): */
16391+ apply_header_reservation();
16392+}
16393+
4e97e4e9 16394+static void free_block_chains(void)
24613191 16395+{
4e97e4e9 16396+ int i;
24613191 16397+
16398+ for (i = 0; i < MAX_SWAPFILES; i++)
16399+ if (block_chain[i].first)
4e97e4e9 16400+ toi_put_extent_chain(&block_chain[i]);
16401+}
16402+
7f9d2ee0 16403+static int add_blocks_to_extent_chain(int chain, int minimum, int maximum)
16404+{
16405+ if (test_action_state(TOI_TEST_BIO))
16406+ printk(KERN_INFO "Adding extent chain %d %d-%d.\n", chain,
16407+ minimum << devinfo[chain].bmap_shift,
16408+ maximum << devinfo[chain].bmap_shift);
16409+
16410+ if (toi_add_to_extent_chain(&block_chain[chain], minimum, maximum)) {
16411+ free_block_chains();
16412+ return -ENOMEM;
16413+ }
16414+
16415+ return 0;
16416+}
16417+
16418+
4e97e4e9 16419+static int get_main_pool_phys_params(void)
16420+{
16421+ struct extent *extentpointer = NULL;
16422+ unsigned long address;
16423+ int extent_min = -1, extent_max = -1, last_chain = -1;
16424+
16425+ free_block_chains();
24613191 16426+
4e97e4e9 16427+ toi_extent_for_each(&swapextents, extentpointer, address) {
24613191 16428+ swp_entry_t swap_address = extent_val_to_swap_entry(address);
16429+ pgoff_t offset = swp_offset(swap_address);
16430+ unsigned swapfilenum = swp_type(swap_address);
ad8f4a28
AM
16431+ struct swap_info_struct *sis =
16432+ get_swap_info_struct(swapfilenum);
24613191 16433+ sector_t new_sector = map_swap_page(sis, offset);
16434+
16435+ if ((new_sector == extent_max + 1) &&
ad8f4a28 16436+ (last_chain == swapfilenum)) {
24613191 16437+ extent_max++;
16438+ continue;
16439+ }
16440+
7f9d2ee0 16441+ if (extent_min > -1 && add_blocks_to_extent_chain(last_chain,
16442+ extent_min, extent_max))
16443+ return -ENOMEM;
16444+
16445+ extent_min = extent_max = new_sector;
16446+ last_chain = swapfilenum;
24613191 16447+ }
16448+
7f9d2ee0 16449+ if (extent_min > -1 && add_blocks_to_extent_chain(last_chain,
16450+ extent_min, extent_max))
4e97e4e9 16451+ return -ENOMEM;
24613191 16452+
7f9d2ee0 16453+ return apply_header_reservation();
16454+}
16455+
16456+static long raw_to_real(long raw)
16457+{
16458+ long result;
16459+
16460+ result = raw - (raw * (sizeof(unsigned long) + sizeof(int)) +
16461+ (PAGE_SIZE + sizeof(unsigned long) + sizeof(int) + 1)) /
16462+ (PAGE_SIZE + sizeof(unsigned long) + sizeof(int));
16463+
16464+ return result < 0 ? 0 : result;
24613191 16465+}
16466+
4e97e4e9 16467+static int toi_swap_storage_allocated(void)
24613191 16468+{
7f9d2ee0 16469+ return (int) raw_to_real(swap_pages_allocated - header_pages_reserved);
24613191 16470+}
16471+
7f9d2ee0 16472+/*
16473+ * We can't just remember the value from allocation time, because other
16474+ * processes might have allocated swap in the meantime.
16475+ */
4e97e4e9 16476+static int toi_swap_storage_available(void)
24613191 16477+{
16478+ si_swapinfo(&swapinfo);
7f9d2ee0 16479+ return (int) raw_to_real((long) swapinfo.freeswap +
16480+ swap_pages_allocated - header_pages_reserved);
24613191 16481+}
16482+
4e97e4e9 16483+static int toi_swap_initialise(int starting_cycle)
24613191 16484+{
16485+ if (!starting_cycle)
16486+ return 0;
16487+
16488+ enable_swapfile();
16489+
16490+ if (resume_swap_dev_t && !resume_block_device &&
16491+ IS_ERR(resume_block_device =
ad8f4a28 16492+ open_bdev(MAX_SWAPFILES, resume_swap_dev_t, 1)))
24613191 16493+ return 1;
16494+
16495+ return 0;
16496+}
16497+
4e97e4e9 16498+static void toi_swap_cleanup(int ending_cycle)
24613191 16499+{
16500+ if (ending_cycle)
16501+ disable_swapfile();
ad8f4a28 16502+
24613191 16503+ close_bdevs();
7f9d2ee0 16504+
16505+ forget_signatures();
24613191 16506+}
16507+
4e97e4e9 16508+static int toi_swap_release_storage(void)
24613191 16509+{
4e97e4e9 16510+ if (test_action_state(TOI_KEEP_IMAGE) &&
16511+ test_toi_state(TOI_NOW_RESUMING))
24613191 16512+ return 0;
16513+
7f9d2ee0 16514+ header_pages_reserved = 0;
16515+ swap_pages_allocated = 0;
24613191 16516+
16517+ if (swapextents.first) {
16518+ /* Free swap entries */
16519+ struct extent *extentpointer;
16520+ unsigned long extentvalue;
ad8f4a28 16521+ toi_extent_for_each(&swapextents, extentpointer,
e8d0ad9d 16522+ extentvalue)
16523+ swap_free(extent_val_to_swap_entry(extentvalue));
24613191 16524+
4e97e4e9 16525+ toi_put_extent_chain(&swapextents);
24613191 16526+
4e97e4e9 16527+ free_block_chains();
24613191 16528+ }
16529+
16530+ return 0;
16531+}
16532+
e8d0ad9d 16533+static void free_swap_range(unsigned long min, unsigned long max)
16534+{
16535+ int j;
16536+
4e97e4e9 16537+ for (j = min; j <= max; j++)
e8d0ad9d 16538+ swap_free(extent_val_to_swap_entry(j));
16539+}
16540+
ad8f4a28 16541+/*
24613191 16542+ * Round robin allocation (where swap storage has the same priority)
16543+ * could make this very inefficient, so we track extents allocated on
7f9d2ee0 16544+ * a per-swapfile basis.
16545+ *
 16546+ * We ignore here the fact that the header pages don't have this
 16547+ * overhead; it will only rarely make a one-page difference.
24613191 16548+ */
7f9d2ee0 16549+static int toi_swap_allocate_storage(int request)
24613191 16550+{
4e97e4e9 16551+ int i, result = 0, to_add[MAX_SWAPFILES], pages_to_get, extra_pages,
16552+ gotten = 0;
24613191 16553+ unsigned long extent_min[MAX_SWAPFILES], extent_max[MAX_SWAPFILES];
16554+
7f9d2ee0 16555+ extra_pages = DIV_ROUND_UP(request * (sizeof(unsigned long)
24613191 16556+ + sizeof(int)), PAGE_SIZE);
7f9d2ee0 16557+ pages_to_get = request + extra_pages - swapextents.size;
24613191 16558+
16559+ if (pages_to_get < 1)
16560+ return 0;
16561+
ad8f4a28 16562+ for (i = 0; i < MAX_SWAPFILES; i++) {
24613191 16563+ struct swap_info_struct *si = get_swap_info_struct(i);
4e97e4e9 16564+ to_add[i] = 0;
16565+ if (!si->bdev)
16566+ continue;
16567+ devinfo[i].bdev = si->bdev;
16568+ devinfo[i].dev_t = si->bdev->bd_dev;
24613191 16569+ devinfo[i].bmap_shift = 3;
16570+ devinfo[i].blocks_per_page = 1;
24613191 16571+ }
16572+
ad8f4a28 16573+ for (i = 0; i < pages_to_get; i++) {
24613191 16574+ swp_entry_t entry;
16575+ unsigned long new_value;
16576+ unsigned swapfilenum;
16577+
16578+ entry = get_swap_page();
e8d0ad9d 16579+ if (!entry.val)
24613191 16580+ break;
24613191 16581+
16582+ swapfilenum = swp_type(entry);
16583+ new_value = swap_entry_to_extent_val(entry);
e8d0ad9d 16584+
4e97e4e9 16585+ if (!to_add[swapfilenum]) {
16586+ to_add[swapfilenum] = 1;
e8d0ad9d 16587+ extent_min[swapfilenum] = new_value;
16588+ extent_max[swapfilenum] = new_value;
16589+ gotten++;
16590+ continue;
16591+ }
16592+
16593+ if (new_value == extent_max[swapfilenum] + 1) {
16594+ extent_max[swapfilenum]++;
16595+ gotten++;
16596+ continue;
16597+ }
16598+
4e97e4e9 16599+ if (toi_add_to_extent_chain(&swapextents,
24613191 16600+ extent_min[swapfilenum],
e8d0ad9d 16601+ extent_max[swapfilenum])) {
16602+ printk(KERN_INFO "Failed to allocate extent for "
16603+ "%lu-%lu.\n", extent_min[swapfilenum],
16604+ extent_max[swapfilenum]);
e8d0ad9d 16605+ free_swap_range(extent_min[swapfilenum],
24613191 16606+ extent_max[swapfilenum]);
e8d0ad9d 16607+ swap_free(entry);
16608+ gotten -= (extent_max[swapfilenum] -
4e97e4e9 16609+ extent_min[swapfilenum] + 1);
16610+ /* Don't try to add again below */
16611+ to_add[swapfilenum] = 0;
e8d0ad9d 16612+ break;
16613+ } else {
16614+ extent_min[swapfilenum] = new_value;
16615+ extent_max[swapfilenum] = new_value;
16616+ gotten++;
24613191 16617+ }
16618+ }
16619+
4e97e4e9 16620+ for (i = 0; i < MAX_SWAPFILES; i++) {
16621+ if (!to_add[i] || !toi_add_to_extent_chain(&swapextents,
16622+ extent_min[i], extent_max[i]))
16623+ continue;
16624+
16625+ free_swap_range(extent_min[i], extent_max[i]);
16626+ gotten -= (extent_max[i] - extent_min[i] + 1);
16627+ break;
16628+ }
24613191 16629+
e8d0ad9d 16630+ if (gotten < pages_to_get)
16631+ result = -ENOSPC;
16632+
7f9d2ee0 16633+ swap_pages_allocated += (long) gotten;
4e97e4e9 16634+
16635+ return result ? result : get_main_pool_phys_params();
24613191 16636+}
16637+
4e97e4e9 16638+static int toi_swap_write_header_init(void)
24613191 16639+{
16640+ int i, result;
16641+ struct swap_info_struct *si;
16642+
7f9d2ee0 16643+ toi_bio_ops.rw_init(WRITE, 0);
4e97e4e9 16644+ toi_writer_buffer_posn = 0;
24613191 16645+
16646+ /* Info needed to bootstrap goes at the start of the header.
16647+ * First we save the positions and devinfo, including the number
16648+ * of header pages. Then we save the structs containing data needed
16649+ * for reading the header pages back.
16650+ * Note that even if header pages take more than one page, when we
16651+ * read back the info, we will have restored the location of the
16652+ * next header page by the time we go to use it.
16653+ */
16654+
7f9d2ee0 16655+ result = toi_bio_ops.rw_header_chunk(WRITE, &toi_swapops,
16656+ (char *) &no_image_signature_contents,
16657+ sizeof(struct tuxonice_sig_data));
16658+
16659+ if (result)
16660+ return result;
16661+
24613191 16662+ /* Forward one page will be done prior to the read */
16663+ for (i = 0; i < MAX_SWAPFILES; i++) {
16664+ si = get_swap_info_struct(i);
16665+ if (si->swap_file)
16666+ devinfo[i].dev_t = si->bdev->bd_dev;
16667+ else
16668+ devinfo[i].dev_t = (dev_t) 0;
16669+ }
16670+
16671+ result = toi_bio_ops.rw_header_chunk(WRITE, &toi_swapops,
16672+ (char *) &toi_writer_posn_save,
16673+ sizeof(toi_writer_posn_save));
16674+
16675+ if (result)
24613191 16676+ return result;
16677+
16678+ result = toi_bio_ops.rw_header_chunk(WRITE, &toi_swapops,
16679+ (char *) &devinfo, sizeof(devinfo));
16680+
16681+ if (result)
24613191 16682+ return result;
16683+
ad8f4a28 16684+ for (i = 0; i < MAX_SWAPFILES; i++)
4e97e4e9 16685+ toi_serialise_extent_chain(&toi_swapops, &block_chain[i]);
24613191 16686+
16687+ return 0;
16688+}
16689+
4e97e4e9 16690+static int toi_swap_write_header_cleanup(void)
24613191 16691+{
24613191 16692+ /* Write any unsaved data */
4e97e4e9 16693+ if (toi_writer_buffer_posn)
16694+ toi_bio_ops.write_header_chunk_finish();
24613191 16695+
4e97e4e9 16696+ toi_bio_ops.finish_all_io();
24613191 16697+
7f9d2ee0 16698+	/* Set signature to say we have an image */
16699+ return write_modified_signature(IMAGE_SIGNATURE);
24613191 16700+}
16701+
16702+/* ------------------------- HEADER READING ------------------------- */
16703+
16704+/*
16705+ * read_header_init()
ad8f4a28 16706+ *
24613191 16707+ * Description:
4e97e4e9 16708+ * 1. Attempt to read the device specified with resume=.
24613191 16709+ * 2. Check the contents of the swap header for our signature.
16710+ * 3. Warn, ignore, reset and/or continue as appropriate.
4e97e4e9 16711+ * 4. If continuing, read the toi_swap configuration section
24613191 16712+ * of the header and set up block device info so we can read
16713+ * the rest of the header & image.
16714+ *
16715+ * Returns:
 16716+ *	May not return if the user chooses to reboot at a warning.
16717+ * -EINVAL if cannot resume at this time. Booting should continue
16718+ * normally.
16719+ */
16720+
4e97e4e9 16721+static int toi_swap_read_header_init(void)
24613191 16722+{
16723+ int i, result = 0;
7f9d2ee0 16724+ toi_writer_buffer_posn = 0;
24613191 16725+
24613191 16726+ if (!header_dev_t) {
ad8f4a28 16727+ printk(KERN_INFO "read_header_init called when we haven't "
24613191 16728+ "verified there is an image!\n");
16729+ return -EINVAL;
16730+ }
16731+
16732+ /*
16733+ * If the header is not on the resume_swap_dev_t, get the resume device
16734+ * first.
24613191 16735+ */
16736+ if (header_dev_t != resume_swap_dev_t) {
16737+ header_block_device = open_bdev(MAX_SWAPFILES + 1,
16738+ header_dev_t, 1);
16739+
16740+ if (IS_ERR(header_block_device))
16741+ return PTR_ERR(header_block_device);
16742+ } else
16743+ header_block_device = resume_block_device;
16744+
7f9d2ee0 16745+ toi_bio_ops.read_header_init();
16746+
ad8f4a28 16747+ /*
4e97e4e9 16748+ * Read toi_swap configuration.
24613191 16749+ * Headerblock size taken into account already.
16750+ */
7f9d2ee0 16751+ result = toi_bio_ops.bdev_page_io(READ, header_block_device,
24613191 16752+ headerblock << 3,
4e97e4e9 16753+ virt_to_page((unsigned long) toi_writer_buffer));
7f9d2ee0 16754+ if (result)
16755+ return result;
16756+
16757+ memcpy(&no_image_signature_contents, toi_writer_buffer,
16758+ sizeof(no_image_signature_contents));
24613191 16759+
7f9d2ee0 16760+ toi_writer_buffer_posn = sizeof(no_image_signature_contents);
24613191 16761+
7f9d2ee0 16762+ memcpy(&toi_writer_posn_save, toi_writer_buffer + toi_writer_buffer_posn,
16763+ sizeof(toi_writer_posn_save));
16764+
16765+ toi_writer_buffer_posn += sizeof(toi_writer_posn_save);
24613191 16766+
16767+ memcpy(&devinfo, toi_writer_buffer + toi_writer_buffer_posn,
16768+ sizeof(devinfo));
24613191 16769+
4e97e4e9 16770+ toi_writer_buffer_posn += sizeof(devinfo);
24613191 16771+
16772+ /* Restore device info */
16773+ for (i = 0; i < MAX_SWAPFILES; i++) {
16774+ dev_t thisdevice = devinfo[i].dev_t;
16775+ struct block_device *result;
16776+
16777+ devinfo[i].bdev = NULL;
16778+
16779+ if (!thisdevice)
16780+ continue;
16781+
16782+ if (thisdevice == resume_swap_dev_t) {
16783+ devinfo[i].bdev = resume_block_device;
24613191 16784+ continue;
16785+ }
16786+
16787+ if (thisdevice == header_dev_t) {
16788+ devinfo[i].bdev = header_block_device;
24613191 16789+ continue;
16790+ }
16791+
16792+ result = open_bdev(i, thisdevice, 1);
e8d0ad9d 16793+ if (IS_ERR(result))
24613191 16794+ return PTR_ERR(result);
4e97e4e9 16795+ devinfo[i].bdev = bdevs_opened[i]->bdev;
24613191 16796+ }
16797+
4e97e4e9 16798+ toi_extent_state_goto_start(&toi_writer_posn);
16799+ toi_bio_ops.set_extra_page_forward();
24613191 16800+
16801+ for (i = 0; i < MAX_SWAPFILES && !result; i++)
4e97e4e9 16802+ result = toi_load_extent_chain(&block_chain[i]);
24613191 16803+
16804+ return result;
16805+}
16806+
4e97e4e9 16807+static int toi_swap_read_header_cleanup(void)
24613191 16808+{
4e97e4e9 16809+ toi_bio_ops.rw_cleanup(READ);
24613191 16810+ return 0;
16811+}
16812+
24613191 16813+/*
16814+ * workspace_size
16815+ *
16816+ * Description:
16817+ * Returns the number of bytes of RAM needed for this
16818+ * code to do its work. (Used when calculating whether
4e97e4e9 16819+ * we have enough memory to be able to hibernate & resume).
24613191 16820+ *
16821+ */
4e97e4e9 16822+static int toi_swap_memory_needed(void)
24613191 16823+{
16824+ return 1;
16825+}
16826+
16827+/*
16828+ * Print debug info
16829+ *
16830+ * Description:
16831+ */
4e97e4e9 16832+static int toi_swap_print_debug_stats(char *buffer, int size)
24613191 16833+{
16834+ int len = 0;
16835+ struct sysinfo sysinfo;
ad8f4a28 16836+
4e97e4e9 16837+ if (toiActiveAllocator != &toi_swapops) {
16838+ len = snprintf_used(buffer, size,
16839+ "- SwapAllocator inactive.\n");
24613191 16840+ return len;
16841+ }
16842+
16843+ len = snprintf_used(buffer, size, "- SwapAllocator active.\n");
16844+ if (swapfilename[0])
16845+ len += snprintf_used(buffer+len, size-len,
16846+ " Attempting to automatically swapon: %s.\n",
16847+ swapfilename);
24613191 16848+
16849+ si_swapinfo(&sysinfo);
16850+
16851+ len += snprintf_used(buffer+len, size-len,
16852+ " Swap available for image: %ld pages.\n",
4e97e4e9 16853+			(long) sysinfo.freeswap + toi_swap_storage_allocated());
24613191 16854+
16855+ return len;
16856+}
16857+
16858+/*
16859+ * Storage needed
16860+ *
16861+ * Returns amount of space in the swap header required
4e97e4e9 16862+ * for the toi_swap's data. This ignores the links between
24613191 16863+ * pages, which we factor in when allocating the space.
16864+ *
16865+ * We ensure the space is allocated, but actually save the
16866+ * data from write_header_init and therefore don't also define a
16867+ * save_config_info routine.
16868+ */
4e97e4e9 16869+static int toi_swap_storage_needed(void)
24613191 16870+{
16871+ int i, result;
4e97e4e9 16872+ result = sizeof(toi_writer_posn_save) + sizeof(devinfo);
24613191 16873+
16874+ for (i = 0; i < MAX_SWAPFILES; i++) {
16875+ result += 3 * sizeof(int);
ad8f4a28 16876+ result += (2 * sizeof(unsigned long) *
24613191 16877+ block_chain[i].num_extents);
16878+ }
16879+
16880+ return result;
16881+}
16882+
16883+/*
16884+ * Image_exists
7f9d2ee0 16885+ *
16886+ * Returns -1 if don't know, otherwise 0 (no) or 1 (yes).
24613191 16887+ */
7f9d2ee0 16888+static int toi_swap_image_exists(int quiet)
24613191 16889+{
16890+ int signature_found;
ad8f4a28 16891+
24613191 16892+ if (!resume_swap_dev_t) {
7f9d2ee0 16893+ if (!quiet)
16894+ printk(KERN_INFO "Not even trying to read header "
24613191 16895+ "because resume_swap_dev_t is not set.\n");
7f9d2ee0 16896+ return -1;
24613191 16897+ }
ad8f4a28 16898+
24613191 16899+ if (!resume_block_device &&
16900+ IS_ERR(resume_block_device =
16901+ open_bdev(MAX_SWAPFILES, resume_swap_dev_t, 1))) {
7f9d2ee0 16902+ if (!quiet)
16903+ printk(KERN_INFO "Failed to open resume dev_t (%x).\n",
ad8f4a28 16904+ resume_swap_dev_t);
7f9d2ee0 16905+ return -1;
24613191 16906+ }
16907+
7f9d2ee0 16908+ signature_found = parse_signature();
24613191 16909+
7f9d2ee0 16910+ switch (signature_found) {
16911+ case -ENOMEM:
16912+ return -1;
16913+ case -1:
16914+ if (!quiet)
16915+ printk(KERN_ERR "TuxOnIce: Unable to find a signature."
16916+ " Could you have moved a swap file?\n");
16917+ return -1;
16918+ case 0:
16919+ case 1:
16920+ if (!quiet)
16921+ printk(KERN_INFO "TuxOnIce: Normal swapspace found.\n");
24613191 16922+ return 0;
7f9d2ee0 16923+ case 2:
16924+ case 3:
16925+ case 4:
16926+ if (!quiet)
16927+ printk(KERN_INFO "TuxOnIce: Detected another "
16928+ "implementation's signature.\n");
24613191 16929+ return 0;
7f9d2ee0 16930+ case 10:
16931+ if (!quiet)
16932+ printk(KERN_INFO "TuxOnIce: Detected TuxOnIce binary "
16933+ "signature.\n");
16934+ return 1;
24613191 16935+ }
16936+
7f9d2ee0 16937+ BUG();
16938+ return 0;
16939+}
16940+
16941+/* toi_swap_remove_image
16942+ *
16943+ */
16944+static int toi_swap_remove_image(void)
16945+{
16946+ /*
16947+ * If nr_hibernates == 0, we must be booting, so no swap pages
16948+ * will be recorded as used yet.
16949+ */
16950+
16951+ if (nr_hibernates)
16952+ toi_swap_release_storage();
16953+
16954+ /*
16955+ * We don't do a sanity check here: we want to restore the swap
 16956+	 * whatever version of the kernel made the hibernate image.
16957+ *
16958+ * We need to write swap, but swap may not be enabled so
16959+ * we write the device directly
16960+ *
 16961+	 * If we don't have a current_signature_page, we didn't
16962+ * read an image header, so don't change anything.
16963+ */
16964+
16965+ return toi_swap_image_exists(1) ?
16966+ write_modified_signature(NO_IMAGE_SIGNATURE) : 0;
24613191 16967+}
16968+
16969+/*
16970+ * Mark resume attempted.
16971+ *
7f9d2ee0 16972+ * Record that we tried to resume from this image. We have already read the
16973+ * signature in. We just need to write the modified version.
24613191 16974+ */
7f9d2ee0 16975+static int toi_swap_mark_resume_attempted(int mark)
24613191 16976+{
24613191 16977+ if (!resume_swap_dev_t) {
ad8f4a28 16978+ printk(KERN_INFO "Not even trying to record attempt at resuming"
24613191 16979+ " because resume_swap_dev_t is not set.\n");
7f9d2ee0 16980+ return -ENODEV;
24613191 16981+ }
ad8f4a28 16982+
7f9d2ee0 16983+ return write_modified_signature(mark ? TRIED_RESUME : NO_TRIED_RESUME);
24613191 16984+}
16985+
16986+/*
16987+ * Parse Image Location
16988+ *
4e97e4e9 16989+ * Attempt to parse a resume= parameter.
24613191 16990+ * Swap Writer accepts:
4e97e4e9 16991+ * resume=swap:DEVNAME[:FIRSTBLOCK][@BLOCKSIZE]
24613191 16992+ *
16993+ * Where:
 16994+ *	DEVNAME is convertible to a dev_t by name_to_dev_t
 16995+ *	FIRSTBLOCK is the location of the first block in the swap file
 16996+ *	(specifying it for a swap partition is nonsensical but not prohibited).
16997+ * Data is validated by attempting to read a swap header from the
4e97e4e9 16998+ * location given. Failure will result in toi_swap refusing to
24613191 16999+ * save an image, and a reboot with correct parameters will be
17000+ * necessary.
17001+ */
4e97e4e9 17002+static int toi_swap_parse_sig_location(char *commandline,
24613191 17003+ int only_allocator, int quiet)
17004+{
4e97e4e9 17005+ char *thischar, *devstart, *colon = NULL;
24613191 17006+ int signature_found, result = -EINVAL, temp_result;
17007+
17008+ if (strncmp(commandline, "swap:", 5)) {
ad8f4a28 17009+ /*
24613191 17010+ * Failing swap:, we'll take a simple
4e97e4e9 17011+ * resume=/dev/hda2, but fall through to
24613191 17012+ * other allocators if /dev/ isn't matched.
17013+ */
17014+ if (strncmp(commandline, "/dev/", 5))
17015+ return 1;
17016+ } else
17017+ commandline += 5;
17018+
17019+ devstart = thischar = commandline;
17020+ while ((*thischar != ':') && (*thischar != '@') &&
17021+ ((thischar - commandline) < 250) && (*thischar))
17022+ thischar++;
17023+
17024+ if (*thischar == ':') {
17025+ colon = thischar;
17026+ *colon = 0;
17027+ thischar++;
17028+ }
17029+
4e97e4e9 17030+ while ((thischar - commandline) < 250 && *thischar)
24613191 17031+ thischar++;
17032+
24613191 17033+ if (colon)
17034+ resume_firstblock = (int) simple_strtoul(colon + 1, NULL, 0);
17035+ else
17036+ resume_firstblock = 0;
17037+
4e97e4e9 17038+ clear_toi_state(TOI_CAN_HIBERNATE);
17039+ clear_toi_state(TOI_CAN_RESUME);
ad8f4a28 17040+
24613191 17041+ temp_result = try_to_parse_resume_device(devstart, quiet);
17042+
17043+ if (colon)
17044+ *colon = ':';
24613191 17045+
17046+ if (temp_result)
17047+ return -EINVAL;
17048+
7f9d2ee0 17049+ signature_found = toi_swap_image_exists(quiet);
24613191 17050+
17051+ if (signature_found != -1) {
24613191 17052+ result = 0;
17053+
4e97e4e9 17054+ toi_bio_ops.set_devinfo(devinfo);
17055+ toi_writer_posn.chains = &block_chain[0];
17056+ toi_writer_posn.num_chains = MAX_SWAPFILES;
17057+ set_toi_state(TOI_CAN_HIBERNATE);
17058+ set_toi_state(TOI_CAN_RESUME);
24613191 17059+ } else
17060+ if (!quiet)
4e97e4e9 17061+ printk(KERN_ERR "TuxOnIce: SwapAllocator: No swap "
17062+ "signature found at %s.\n", devstart);
24613191 17063+ return result;
24613191 17064+}
17065+
17066+static int header_locations_read_sysfs(const char *page, int count)
17067+{
17068+ int i, printedpartitionsmessage = 0, len = 0, haveswap = 0;
 17069+	struct inode *swapf = NULL;
17070+ int zone;
4e97e4e9 17071+ char *path_page = (char *) toi_get_free_page(10, GFP_KERNEL);
24613191 17072+ char *path, *output = (char *) page;
17073+ int path_len;
ad8f4a28 17074+
24613191 17075+ if (!page)
17076+ return 0;
17077+
17078+ for (i = 0; i < MAX_SWAPFILES; i++) {
17079+ struct swap_info_struct *si = get_swap_info_struct(i);
17080+
17081+ if (!si->swap_file)
17082+ continue;
ad8f4a28 17083+
24613191 17084+ if (S_ISBLK(si->swap_file->f_mapping->host->i_mode)) {
17085+ haveswap = 1;
17086+ if (!printedpartitionsmessage) {
ad8f4a28 17087+ len += sprintf(output + len,
24613191 17088+ "For swap partitions, simply use the "
4e97e4e9 17089+ "format: resume=swap:/dev/hda1.\n");
24613191 17090+ printedpartitionsmessage = 1;
17091+ }
17092+ } else {
17093+ path_len = 0;
ad8f4a28 17094+
7f9d2ee0 17095+ path = d_path(&si->swap_file->f_path, path_page,
17096+ PAGE_SIZE);
24613191 17097+ path_len = snprintf(path_page, 31, "%s", path);
ad8f4a28 17098+
24613191 17099+ haveswap = 1;
17100+ swapf = si->swap_file->f_mapping->host;
17101+ zone = bmap(swapf, 0);
17102+ if (!zone) {
17103+ len += sprintf(output + len,
24613191 17104+				   "Swapfile %s has been corrupted. Re-run"
17105+ " mkswap on it and try again.\n",
17106+ path_page);
17107+ } else {
17108+ char name_buffer[255];
17109+ len += sprintf(output + len,
17110+ "For swapfile `%s`,"
4e97e4e9 17111+ " use resume=swap:/dev/%s:0x%x.\n",
24613191 17112+ path_page,
17113+ bdevname(si->bdev, name_buffer),
17114+ zone << (swapf->i_blkbits - 9));
4e97e4e9 17115+ }
4e97e4e9 17116+ }
17117+ }
ad8f4a28 17118+
4e97e4e9 17119+ if (!haveswap)
17120+ len = sprintf(output, "You need to turn on swap partitions "
17121+ "before examining this file.\n");
24613191 17122+
ad8f4a28 17123+ toi_free_page(10, (unsigned long) path_page);
4e97e4e9 17124+ return len;
73c609d5 17125+}
24613191 17126+
4e97e4e9 17127+static struct toi_sysfs_data sysfs_params[] = {
17128+ {
17129+ TOI_ATTR("swapfilename", SYSFS_RW),
17130+ SYSFS_STRING(swapfilename, 255, 0)
17131+ },
24613191 17132+
4e97e4e9 17133+ {
17134+ TOI_ATTR("headerlocations", SYSFS_READONLY),
17135+ SYSFS_CUSTOM(header_locations_read_sysfs, NULL, 0)
17136+ },
24613191 17137+
4e97e4e9 17138+ { TOI_ATTR("enabled", SYSFS_RW),
17139+ SYSFS_INT(&toi_swapops.enabled, 0, 1, 0),
17140+ .write_side_effect = attempt_to_parse_resume_device2,
24613191 17141+ }
4e97e4e9 17142+};
24613191 17143+
4e97e4e9 17144+static struct toi_module_ops toi_swapops = {
17145+ .type = WRITER_MODULE,
17146+ .name = "swap storage",
17147+ .directory = "swap",
17148+ .module = THIS_MODULE,
17149+ .memory_needed = toi_swap_memory_needed,
17150+ .print_debug_info = toi_swap_print_debug_stats,
17151+ .storage_needed = toi_swap_storage_needed,
17152+ .initialise = toi_swap_initialise,
17153+ .cleanup = toi_swap_cleanup,
17154+
17155+ .noresume_reset = toi_swap_noresume_reset,
17156+ .storage_available = toi_swap_storage_available,
17157+ .storage_allocated = toi_swap_storage_allocated,
17158+ .release_storage = toi_swap_release_storage,
7f9d2ee0 17159+ .reserve_header_space = toi_swap_reserve_header_space,
4e97e4e9 17160+ .allocate_storage = toi_swap_allocate_storage,
17161+ .image_exists = toi_swap_image_exists,
17162+ .mark_resume_attempted = toi_swap_mark_resume_attempted,
17163+ .write_header_init = toi_swap_write_header_init,
17164+ .write_header_cleanup = toi_swap_write_header_cleanup,
17165+ .read_header_init = toi_swap_read_header_init,
17166+ .read_header_cleanup = toi_swap_read_header_cleanup,
17167+ .remove_image = toi_swap_remove_image,
17168+ .parse_sig_location = toi_swap_parse_sig_location,
24613191 17169+
4e97e4e9 17170+ .sysfs_data = sysfs_params,
17171+ .num_sysfs_entries = sizeof(sysfs_params) /
17172+ sizeof(struct toi_sysfs_data),
4e97e4e9 17173+};
24613191 17174+
4e97e4e9 17175+/* ---- Registration ---- */
17176+static __init int toi_swap_load(void)
17177+{
17178+ toi_swapops.rw_init = toi_bio_ops.rw_init;
17179+ toi_swapops.rw_cleanup = toi_bio_ops.rw_cleanup;
17180+ toi_swapops.read_page = toi_bio_ops.read_page;
17181+ toi_swapops.write_page = toi_bio_ops.write_page;
17182+ toi_swapops.rw_header_chunk = toi_bio_ops.rw_header_chunk;
7f9d2ee0 17183+ toi_swapops.rw_header_chunk_noreadahead =
17184+ toi_bio_ops.rw_header_chunk_noreadahead;
17185+ toi_swapops.io_flusher = toi_bio_ops.io_flusher;
24613191 17186+
4e97e4e9 17187+ return toi_register_module(&toi_swapops);
17188+}
24613191 17189+
4e97e4e9 17190+#ifdef MODULE
17191+static __exit void toi_swap_unload(void)
17192+{
17193+ toi_unregister_module(&toi_swapops);
73c609d5 17194+}
24613191 17195+
4e97e4e9 17196+module_init(toi_swap_load);
17197+module_exit(toi_swap_unload);
17198+MODULE_LICENSE("GPL");
17199+MODULE_AUTHOR("Nigel Cunningham");
17200+MODULE_DESCRIPTION("TuxOnIce SwapAllocator");
17201+#else
17202+late_initcall(toi_swap_load);
17203+#endif
17204diff --git a/kernel/power/tuxonice_sysfs.c b/kernel/power/tuxonice_sysfs.c
17205new file mode 100644
7f9d2ee0 17206index 0000000..f5ab465
4e97e4e9 17207--- /dev/null
17208+++ b/kernel/power/tuxonice_sysfs.c
7f9d2ee0 17209@@ -0,0 +1,337 @@
4e97e4e9 17210+/*
17211+ * kernel/power/tuxonice_sysfs.c
17212+ *
17213+ * Copyright (C) 2002-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 17214+ *
4e97e4e9 17215+ * This file is released under the GPLv2.
24613191 17216+ *
4e97e4e9 17217+ * This file contains support for sysfs entries for tuning TuxOnIce.
73c609d5 17218+ *
4e97e4e9 17219+ * We have a generic handler that deals with the most common cases, and
17220+ * hooks for special handlers to use.
24613191 17221+ */
24613191 17222+
4e97e4e9 17223+#include <linux/suspend.h>
17224+#include <linux/module.h>
17225+#include <asm/uaccess.h>
24613191 17226+
4e97e4e9 17227+#include "tuxonice_sysfs.h"
17228+#include "tuxonice.h"
17229+#include "tuxonice_storage.h"
ad8f4a28 17230+#include "tuxonice_alloc.h"
24613191 17231+
ad8f4a28 17232+static int toi_sysfs_initialised;
24613191 17233+
4e97e4e9 17234+static void toi_initialise_sysfs(void);
24613191 17235+
4e97e4e9 17236+static struct toi_sysfs_data sysfs_params[];
24613191 17237+
4e97e4e9 17238+#define to_sysfs_data(_attr) container_of(_attr, struct toi_sysfs_data, attr)
24613191 17239+
4e97e4e9 17240+static void toi_main_wrapper(void)
24613191 17241+{
4e97e4e9 17242+ _toi_try_hibernate(0);
24613191 17243+}
17244+
4e97e4e9 17245+static ssize_t toi_attr_show(struct kobject *kobj, struct attribute *attr,
17246+ char *page)
24613191 17247+{
4e97e4e9 17248+ struct toi_sysfs_data *sysfs_data = to_sysfs_data(attr);
17249+ int len = 0;
24613191 17250+
4e97e4e9 17251+ if (toi_start_anything(0))
17252+ return -EBUSY;
17253+
17254+ if (sysfs_data->flags & SYSFS_NEEDS_SM_FOR_READ)
17255+ toi_prepare_usm();
ad8f4a28 17256+
4e97e4e9 17257+ switch (sysfs_data->type) {
17258+ case TOI_SYSFS_DATA_CUSTOM:
17259+ len = (sysfs_data->data.special.read_sysfs) ?
17260+ (sysfs_data->data.special.read_sysfs)(page, PAGE_SIZE)
17261+ : 0;
17262+ break;
17263+ case TOI_SYSFS_DATA_BIT:
17264+ len = sprintf(page, "%d\n",
17265+ -test_bit(sysfs_data->data.bit.bit,
17266+ sysfs_data->data.bit.bit_vector));
17267+ break;
17268+ case TOI_SYSFS_DATA_INTEGER:
17269+ len = sprintf(page, "%d\n",
17270+ *(sysfs_data->data.integer.variable));
17271+ break;
17272+ case TOI_SYSFS_DATA_LONG:
17273+ len = sprintf(page, "%ld\n",
17274+ *(sysfs_data->data.a_long.variable));
17275+ break;
17276+ case TOI_SYSFS_DATA_UL:
17277+ len = sprintf(page, "%lu\n",
17278+ *(sysfs_data->data.ul.variable));
17279+ break;
17280+ case TOI_SYSFS_DATA_STRING:
17281+ len = sprintf(page, "%s\n",
17282+ sysfs_data->data.string.variable);
17283+ break;
73c609d5 17284+ }
4e97e4e9 17285+ /* Side effect routine? */
17286+ if (sysfs_data->read_side_effect)
17287+ sysfs_data->read_side_effect();
24613191 17288+
4e97e4e9 17289+ if (sysfs_data->flags & SYSFS_NEEDS_SM_FOR_READ)
17290+ toi_cleanup_usm();
24613191 17291+
4e97e4e9 17292+ toi_finish_anything(0);
24613191 17293+
4e97e4e9 17294+ return len;
17295+}
24613191 17296+
4e97e4e9 17297+#define BOUND(_variable, _type) \
ad8f4a28 17298+ do { \
4e97e4e9 17299+ if (*_variable < sysfs_data->data._type.minimum) \
17300+ *_variable = sysfs_data->data._type.minimum; \
17301+ else if (*_variable > sysfs_data->data._type.maximum) \
17302+ *_variable = sysfs_data->data._type.maximum; \
17303+ } while (0)
24613191 17304+
4e97e4e9 17305+static ssize_t toi_attr_store(struct kobject *kobj, struct attribute *attr,
17306+ const char *my_buf, size_t count)
17307+{
17308+ int assigned_temp_buffer = 0, result = count;
17309+ struct toi_sysfs_data *sysfs_data = to_sysfs_data(attr);
17310+
17311+ if (toi_start_anything((sysfs_data->flags & SYSFS_HIBERNATE_OR_RESUME)))
17312+ return -EBUSY;
17313+
17314+ ((char *) my_buf)[count] = 0;
17315+
17316+ if (sysfs_data->flags & SYSFS_NEEDS_SM_FOR_WRITE)
17317+ toi_prepare_usm();
17318+
17319+ switch (sysfs_data->type) {
17320+ case TOI_SYSFS_DATA_CUSTOM:
17321+ if (sysfs_data->data.special.write_sysfs)
17322+ result = (sysfs_data->data.special.write_sysfs)
17323+ (my_buf, count);
17324+ break;
17325+ case TOI_SYSFS_DATA_BIT:
17326+ {
17327+ int value = simple_strtoul(my_buf, NULL, 0);
17328+ if (value)
17329+ set_bit(sysfs_data->data.bit.bit,
17330+ (sysfs_data->data.bit.bit_vector));
17331+ else
17332+ clear_bit(sysfs_data->data.bit.bit,
17333+ (sysfs_data->data.bit.bit_vector));
17334+ }
17335+ break;
17336+ case TOI_SYSFS_DATA_INTEGER:
17337+ {
17338+ int *variable =
17339+ sysfs_data->data.integer.variable;
17340+ *variable = simple_strtol(my_buf, NULL, 0);
17341+ BOUND(variable, integer);
73c609d5 17342+ break;
17343+ }
17344+ case TOI_SYSFS_DATA_LONG:
17345+ {
17346+ long *variable =
17347+ sysfs_data->data.a_long.variable;
17348+ *variable = simple_strtol(my_buf, NULL, 0);
17349+ BOUND(variable, a_long);
4e97e4e9 17350+ break;
17351+ }
17352+ case TOI_SYSFS_DATA_UL:
17353+ {
17354+ unsigned long *variable =
17355+ sysfs_data->data.ul.variable;
17356+ *variable = simple_strtoul(my_buf, NULL, 0);
17357+ BOUND(variable, ul);
4e97e4e9 17358+ break;
17359+ }
17360+ break;
17361+ case TOI_SYSFS_DATA_STRING:
17362+ {
17363+ int copy_len = count;
17364+ char *variable =
17365+ sysfs_data->data.string.variable;
17366+
17367+ if (sysfs_data->data.string.max_length &&
17368+ (copy_len > sysfs_data->data.string.max_length))
17369+ copy_len = sysfs_data->data.string.max_length;
17370+
17371+ if (!variable) {
17372+ variable = (char *) toi_get_zeroed_page(31,
17373+ TOI_ATOMIC_GFP);
17374+ sysfs_data->data.string.variable = variable;
17375+ assigned_temp_buffer = 1;
4e97e4e9 17376+ }
17377+ strncpy(variable, my_buf, copy_len);
17378+ if ((copy_len) &&
17379+ (my_buf[copy_len - 1] == '\n'))
17380+ variable[copy_len - 1] = 0;
17381+ variable[copy_len] = 0;
17382+ }
17383+ break;
73c609d5 17384+ }
24613191 17385+
4e97e4e9 17386+ /* Side effect routine? */
17387+ if (sysfs_data->write_side_effect)
17388+ sysfs_data->write_side_effect();
24613191 17389+
4e97e4e9 17390+ /* Free temporary buffers */
17391+ if (assigned_temp_buffer) {
17392+ toi_free_page(31,
17393+ (unsigned long) sysfs_data->data.string.variable);
4e97e4e9 17394+ sysfs_data->data.string.variable = NULL;
73c609d5 17395+ }
24613191 17396+
4e97e4e9 17397+ if (sysfs_data->flags & SYSFS_NEEDS_SM_FOR_WRITE)
17398+ toi_cleanup_usm();
24613191 17399+
4e97e4e9 17400+ toi_finish_anything(sysfs_data->flags & SYSFS_HIBERNATE_OR_RESUME);
24613191 17401+
4e97e4e9 17402+ return result;
24613191 17403+}
17404+
4e97e4e9 17405+static struct sysfs_ops toi_sysfs_ops = {
17406+ .show = &toi_attr_show,
17407+ .store = &toi_attr_store,
17408+};
17409+
17410+static struct kobj_type toi_ktype = {
17411+ .sysfs_ops = &toi_sysfs_ops,
17412+};
17413+
7f9d2ee0 17414+struct kobject *tuxonice_kobj;
4e97e4e9 17415+
17416+/* Non-module sysfs entries.
17417+ *
17418+ * This array contains entries that are automatically registered at
17419+ * boot. Modules and the console code register their own entries separately.
73c609d5 17420+ *
4e97e4e9 17421+ * NB: If you move do_hibernate, change toi_write_sysfs's test so that
17422+ * toi_start_anything still gets a 1 when the user echos > do_hibernate!
73c609d5 17423+ */
24613191 17424+
4e97e4e9 17425+static struct toi_sysfs_data sysfs_params[] = {
17426+ { TOI_ATTR("do_hibernate", SYSFS_WRITEONLY),
17427+ SYSFS_CUSTOM(NULL, NULL, SYSFS_HIBERNATING),
17428+ .write_side_effect = toi_main_wrapper
17429+ },
73c609d5 17430+
4e97e4e9 17431+ { TOI_ATTR("do_resume", SYSFS_WRITEONLY),
17432+ SYSFS_CUSTOM(NULL, NULL, SYSFS_RESUMING),
17433+ .write_side_effect = __toi_try_resume
17434+ },
73c609d5 17435+
4e97e4e9 17436+};
73c609d5 17437+
4e97e4e9 17438+void remove_toi_sysdir(struct kobject *kobj)
73c609d5 17439+{
4e97e4e9 17440+ if (!kobj)
73c609d5 17441+ return;
24613191 17442+
7f9d2ee0 17443+ kobject_put(kobj);
73c609d5 17444+}
24613191 17445+
4e97e4e9 17446+struct kobject *make_toi_sysdir(char *name)
73c609d5 17447+{
7f9d2ee0 17448+ struct kobject *kobj = kobject_create_and_add(name, tuxonice_kobj);
73c609d5 17449+
17450+ if (!kobj) {
17451+ printk(KERN_INFO "TuxOnIce: Can't allocate kobject for sysfs "
17452+ "dir!\n");
4e97e4e9 17453+ return NULL;
17454+ }
73c609d5 17455+
7f9d2ee0 17456+ kobj->ktype = &toi_ktype;
73c609d5 17457+
7f9d2ee0 17458+ return kobj;
73c609d5 17459+}
17460+
4e97e4e9 17461+/* toi_register_sysfs_file
17462+ *
17463+ * Helper for registering a new /sys/power/tuxonice entry.
73c609d5 17464+ */
17465+
4e97e4e9 17466+int toi_register_sysfs_file(
17467+ struct kobject *kobj,
17468+ struct toi_sysfs_data *toi_sysfs_data)
73c609d5 17469+{
4e97e4e9 17470+ int result;
17471+
17472+ if (!toi_sysfs_initialised)
17473+ toi_initialise_sysfs();
17474+
17475+ result = sysfs_create_file(kobj, &toi_sysfs_data->attr);
17476+ if (result)
17477+ printk(KERN_INFO "TuxOnIce: sysfs_create_file for %s "
17478+ "returned %d.\n",
4e97e4e9 17479+ toi_sysfs_data->attr.name, result);
7f9d2ee0 17480+ kobj->ktype = &toi_ktype;
4e97e4e9 17481+
17482+ return result;
73c609d5 17483+}
ad8f4a28 17484+EXPORT_SYMBOL_GPL(toi_register_sysfs_file);
73c609d5 17485+
4e97e4e9 17486+/* toi_unregister_sysfs_file
17487+ *
17488+ * Helper for removing unwanted /sys/power/tuxonice entries.
73c609d5 17489+ *
73c609d5 17490+ */
4e97e4e9 17491+void toi_unregister_sysfs_file(struct kobject *kobj,
17492+ struct toi_sysfs_data *toi_sysfs_data)
73c609d5 17493+{
4e97e4e9 17494+ sysfs_remove_file(kobj, &toi_sysfs_data->attr);
17495+}
ad8f4a28 17496+EXPORT_SYMBOL_GPL(toi_unregister_sysfs_file);
24613191 17497+
4e97e4e9 17498+void toi_cleanup_sysfs(void)
17499+{
17500+ int i,
17501+ numfiles = sizeof(sysfs_params) / sizeof(struct toi_sysfs_data);
24613191 17502+
4e97e4e9 17503+ if (!toi_sysfs_initialised)
73c609d5 17504+ return;
24613191 17505+
ad8f4a28 17506+ for (i = 0; i < numfiles; i++)
7f9d2ee0 17507+ toi_unregister_sysfs_file(tuxonice_kobj, &sysfs_params[i]);
24613191 17508+
7f9d2ee0 17509+ kobject_put(tuxonice_kobj);
4e97e4e9 17510+ toi_sysfs_initialised = 0;
73c609d5 17511+}
24613191 17512+
4e97e4e9 17513+/* toi_initialise_sysfs
73c609d5 17514+ *
4e97e4e9 17515+ * Initialise the /sys/power/tuxonice directory.
73c609d5 17516+ */
24613191 17517+
4e97e4e9 17518+static void toi_initialise_sysfs(void)
73c609d5 17519+{
7f9d2ee0 17520+ int i;
4e97e4e9 17521+ int numfiles = sizeof(sysfs_params) / sizeof(struct toi_sysfs_data);
ad8f4a28 17522+
4e97e4e9 17523+ if (toi_sysfs_initialised)
17524+ return;
24613191 17525+
4e97e4e9 17526+ /* Make our TuxOnIce directory a child of /sys/power */
7f9d2ee0 17527+ tuxonice_kobj = kobject_create_and_add("tuxonice", power_kobj);
17528+ if (!tuxonice_kobj)
4e97e4e9 17529+ return;
24613191 17530+
4e97e4e9 17531+ toi_sysfs_initialised = 1;
24613191 17532+
ad8f4a28 17533+ for (i = 0; i < numfiles; i++)
7f9d2ee0 17534+ toi_register_sysfs_file(tuxonice_kobj, &sysfs_params[i]);
4e97e4e9 17535+}
24613191 17536+
4e97e4e9 17537+int toi_sysfs_init(void)
17538+{
17539+ toi_initialise_sysfs();
17540+ return 0;
17541+}
73c609d5 17542+
4e97e4e9 17543+void toi_sysfs_exit(void)
17544+{
17545+ toi_cleanup_sysfs();
17546+}
17547diff --git a/kernel/power/tuxonice_sysfs.h b/kernel/power/tuxonice_sysfs.h
17548new file mode 100644
7f9d2ee0 17549index 0000000..bf56414
4e97e4e9 17550--- /dev/null
17551+++ b/kernel/power/tuxonice_sysfs.h
17552@@ -0,0 +1,127 @@
17553+/*
17554+ * kernel/power/tuxonice_sysfs.h
17555+ *
17556+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
17557+ *
17558+ * This file is released under the GPLv2.
17559+ */
73c609d5 17560+
4e97e4e9 17561+#include <linux/sysfs.h>
17562+#include "power.h"
73c609d5 17563+
4e97e4e9 17564+struct toi_sysfs_data {
17565+ struct attribute attr;
17566+ int type;
17567+ int flags;
17568+ union {
17569+ struct {
17570+ unsigned long *bit_vector;
17571+ int bit;
17572+ } bit;
17573+ struct {
17574+ int *variable;
17575+ int minimum;
17576+ int maximum;
17577+ } integer;
17578+ struct {
17579+ long *variable;
17580+ long minimum;
17581+ long maximum;
17582+ } a_long;
17583+ struct {
17584+ unsigned long *variable;
17585+ unsigned long minimum;
17586+ unsigned long maximum;
17587+ } ul;
17588+ struct {
17589+ char *variable;
17590+ int max_length;
17591+ } string;
17592+ struct {
17593+ int (*read_sysfs) (const char *buffer, int count);
17594+ int (*write_sysfs) (const char *buffer, int count);
17595+ void *data;
17596+ } special;
17597+ } data;
ad8f4a28 17598+
4e97e4e9 17599+ /* Side effects routines. Used, eg, for reparsing the
17600+ * resume= entry when it changes */
17601+ void (*read_side_effect) (void);
ad8f4a28 17602+ void (*write_side_effect) (void);
4e97e4e9 17603+ struct list_head sysfs_data_list;
73c609d5 17604+};
17605+
4e97e4e9 17606+enum {
17607+ TOI_SYSFS_DATA_NONE = 1,
17608+ TOI_SYSFS_DATA_CUSTOM,
17609+ TOI_SYSFS_DATA_BIT,
17610+ TOI_SYSFS_DATA_INTEGER,
17611+ TOI_SYSFS_DATA_UL,
17612+ TOI_SYSFS_DATA_LONG,
17613+ TOI_SYSFS_DATA_STRING
17614+};
73c609d5 17615+
4e97e4e9 17616+#define TOI_ATTR(_name, _mode) \
ad8f4a28 17617+ .attr = {.name = _name , .mode = _mode }
73c609d5 17618+
4e97e4e9 17619+#define SYSFS_BIT(_ul, _bit, _flags) \
17620+ .type = TOI_SYSFS_DATA_BIT, \
17621+ .flags = _flags, \
17622+ .data = { .bit = { .bit_vector = _ul, .bit = _bit } }
73c609d5 17623+
4e97e4e9 17624+#define SYSFS_INT(_int, _min, _max, _flags) \
17625+ .type = TOI_SYSFS_DATA_INTEGER, \
17626+ .flags = _flags, \
17627+ .data = { .integer = { .variable = _int, .minimum = _min, \
17628+ .maximum = _max } }
73c609d5 17629+
4e97e4e9 17630+#define SYSFS_UL(_ul, _min, _max, _flags) \
17631+ .type = TOI_SYSFS_DATA_UL, \
17632+ .flags = _flags, \
17633+ .data = { .ul = { .variable = _ul, .minimum = _min, \
17634+ .maximum = _max } }
73c609d5 17635+
4e97e4e9 17636+#define SYSFS_LONG(_long, _min, _max, _flags) \
17637+ .type = TOI_SYSFS_DATA_LONG, \
17638+ .flags = _flags, \
17639+ .data = { .a_long = { .variable = _long, .minimum = _min, \
17640+ .maximum = _max } }
17641+
17642+#define SYSFS_STRING(_string, _max_len, _flags) \
17643+ .type = TOI_SYSFS_DATA_STRING, \
17644+ .flags = _flags, \
17645+ .data = { .string = { .variable = _string, .max_length = _max_len } }
17646+
17647+#define SYSFS_CUSTOM(_read, _write, _flags) \
17648+ .type = TOI_SYSFS_DATA_CUSTOM, \
17649+ .flags = _flags, \
17650+ .data = { .special = { .read_sysfs = _read, .write_sysfs = _write } }
17651+
17652+#define SYSFS_WRITEONLY 0200
17653+#define SYSFS_READONLY 0444
17654+#define SYSFS_RW 0644
17655+
17656+/* Flags */
17657+#define SYSFS_NEEDS_SM_FOR_READ 1
17658+#define SYSFS_NEEDS_SM_FOR_WRITE 2
17659+#define SYSFS_HIBERNATE 4
17660+#define SYSFS_RESUME 8
17661+#define SYSFS_HIBERNATE_OR_RESUME (SYSFS_HIBERNATE | SYSFS_RESUME)
17662+#define SYSFS_HIBERNATING (SYSFS_HIBERNATE | SYSFS_NEEDS_SM_FOR_WRITE)
17663+#define SYSFS_RESUMING (SYSFS_RESUME | SYSFS_NEEDS_SM_FOR_WRITE)
17664+#define SYSFS_NEEDS_SM_FOR_BOTH \
17665+ (SYSFS_NEEDS_SM_FOR_READ | SYSFS_NEEDS_SM_FOR_WRITE)
17666+
17667+int toi_register_sysfs_file(struct kobject *kobj,
17668+ struct toi_sysfs_data *toi_sysfs_data);
17669+void toi_unregister_sysfs_file(struct kobject *kobj,
17670+ struct toi_sysfs_data *toi_sysfs_data);
17671+
7f9d2ee0 17672+extern struct kobject *tuxonice_kobj;
4e97e4e9 17673+
17674+struct kobject *make_toi_sysdir(char *name);
17675+void remove_toi_sysdir(struct kobject *obj);
17676+extern void toi_cleanup_sysfs(void);
17677+
17678+extern int toi_sysfs_init(void);
17679+extern void toi_sysfs_exit(void);
17680diff --git a/kernel/power/tuxonice_ui.c b/kernel/power/tuxonice_ui.c
17681new file mode 100644
7f9d2ee0 17682index 0000000..125f21e
4e97e4e9 17683--- /dev/null
17684+++ b/kernel/power/tuxonice_ui.c
ad8f4a28 17685@@ -0,0 +1,261 @@
24613191 17686+/*
4e97e4e9 17687+ * kernel/power/tuxonice_ui.c
24613191 17688+ *
4e97e4e9 17689+ * Copyright (C) 1998-2001 Gabor Kuti <seasons@fornax.hu>
17690+ * Copyright (C) 1998,2001,2002 Pavel Machek <pavel@suse.cz>
17691+ * Copyright (C) 2002-2003 Florent Chabaud <fchabaud@free.fr>
17692+ * Copyright (C) 2002-2007 Nigel Cunningham (nigel at tuxonice net)
24613191 17693+ *
17694+ * This file is released under the GPLv2.
17695+ *
4e97e4e9 17696+ * Routines for TuxOnIce's user interface.
17697+ *
17698+ * The user interface code talks to a userspace program via a
17699+ * netlink socket.
17700+ *
17701+ * The kernel side:
17702+ * - starts the userui program;
17703+ * - sends text messages and progress bar status;
17704+ *
17705+ * The user space side:
17706+ * - passes messages regarding user requests (abort, toggle reboot etc)
24613191 17707+ *
17708+ */
17709+
4e97e4e9 17710+#define __KERNEL_SYSCALLS__
17711+
4e97e4e9 17712+#include <linux/reboot.h>
ad8f4a28 17713+
4e97e4e9 17714+#include "tuxonice_sysfs.h"
17715+#include "tuxonice_modules.h"
17716+#include "tuxonice.h"
17717+#include "tuxonice_ui.h"
17718+#include "tuxonice_netlink.h"
17719+#include "tuxonice_power_off.h"
7f9d2ee0 17720+#include "tuxonice_builtin.h"
24613191 17721+
4e97e4e9 17722+static char local_printf_buf[1024]; /* Same as printk - should be safe */
4e97e4e9 17723+struct ui_ops *toi_current_ui;
24613191 17724+
4e97e4e9 17725+/**
17726+ * toi_wait_for_keypress - Wait for keypress via userui or /dev/console.
17727+ *
17728+ * @timeout: Maximum time to wait.
17729+ *
17730+ * Wait for a keypress, either from userui or /dev/console if userui isn't
17731+ * available. The non-userui path is particularly for boot time, prior
17732+ * to userui being started, when we have an important warning to give to
17733+ * the user.
17734+ */
17735+static char toi_wait_for_keypress(int timeout)
24613191 17736+{
17737+ if (toi_current_ui && toi_current_ui->wait_for_key(timeout))
17738+ return ' ';
24613191 17739+
ad8f4a28 17740+ return toi_wait_for_keypress_dev_console(timeout);
4e97e4e9 17741+}
24613191 17742+
4e97e4e9 17743+/* toi_early_boot_message()
17744+ * Description: Handle errors early in the process of booting.
17745+ * The user may press C to continue booting, perhaps
17746+ * invalidating the image, or space to reboot.
17747+ * This works from either the serial console or normally
4e97e4e9 17748+ * attached keyboard.
73c609d5 17749+ *
4e97e4e9 17750+ * Note that we come in here from init, while the kernel is
17751+ * locked. If we want to get events from the serial console,
17752+ * we need to temporarily unlock the kernel.
73c609d5 17753+ *
4e97e4e9 17754+ * toi_early_boot_message may also be called post-boot.
17755+ * In this case, it simply printks the message and returns.
17756+ *
17757+ * Arguments: int Whether we are able to erase the image.
17758+ * int default_answer. What to do when we timeout. This
17759+ * will normally be continue, but the user might
17760+ * provide command line options (__setup) to override
17761+ * particular cases.
17762+ * Char *. Pointer to a string explaining why we're moaning.
73c609d5 17763+ */
24613191 17764+
4e97e4e9 17765+#define say(message, a...) printk(KERN_EMERG message, ##a)
24613191 17766+
17767+void toi_early_boot_message(int message_detail, int default_answer,
17768+ char *warning_reason, ...)
24613191 17769+{
4e97e4e9 17770+ unsigned long orig_state = get_toi_state(), continue_req = 0;
17771+#if defined(CONFIG_VT) || defined(CONFIG_SERIAL_CONSOLE)
17772+ unsigned long orig_loglevel = console_loglevel;
17773+ int can_ask = 1;
17774+#else
17775+ int can_ask = 0;
17776+#endif
24613191 17777+
4e97e4e9 17778+ va_list args;
17779+ int printed_len;
73c609d5 17780+
4e97e4e9 17781+ if (!toi_wait) {
17782+ set_toi_state(TOI_CONTINUE_REQ);
17783+ can_ask = 0;
17784+ }
73c609d5 17785+
4e97e4e9 17786+ if (warning_reason) {
17787+ va_start(args, warning_reason);
17788+ printed_len = vsnprintf(local_printf_buf,
17789+ sizeof(local_printf_buf),
4e97e4e9 17790+ warning_reason,
17791+ args);
17792+ va_end(args);
24613191 17793+ }
17794+
4e97e4e9 17795+ if (!test_toi_state(TOI_BOOT_TIME)) {
17796+ printk("TuxOnIce: %s\n", local_printf_buf);
17797+ return;
17798+ }
24613191 17799+
4e97e4e9 17800+ if (!can_ask) {
17801+ continue_req = !!default_answer;
17802+ goto post_ask;
73c609d5 17803+ }
24613191 17804+
4e97e4e9 17805+#if defined(CONFIG_VT) || defined(CONFIG_SERIAL_CONSOLE)
17806+ console_loglevel = 7;
24613191 17807+
4e97e4e9 17808+ say("=== TuxOnIce ===\n\n");
17809+ if (warning_reason) {
17810+ say("BIG FAT WARNING!! %s\n\n", local_printf_buf);
17811+ switch (message_detail) {
17812+ case 0:
17813+ say("If you continue booting, note that any image WILL "
17814+ "NOT BE REMOVED.\nTuxOnIce is unable to do so "
17815+ "because the appropriate modules aren't\n"
17816+ "loaded. You should manually remove the image "
17817+ "to avoid any\npossibility of corrupting your "
17818+ "filesystem(s) later.\n");
4e97e4e9 17819+ break;
17820+ case 1:
17821+ say("If you want to use the current TuxOnIce image, "
17822+ "reboot and try\nagain with the same kernel "
17823+ "that you hibernated from. If you want\n"
17824+ "to forget that image, continue and the image "
17825+ "will be erased.\n");
4e97e4e9 17826+ break;
17827+ }
17828+ say("Press SPACE to reboot or C to continue booting with "
17829+ "this kernel\n\n");
4e97e4e9 17830+ if (toi_wait > 0)
17831+ say("Default action if you don't select one in %d "
17832+ "seconds is: %s.\n",
4e97e4e9 17833+ toi_wait,
17834+ default_answer == TOI_CONTINUE_REQ ?
17835+ "continue booting" : "reboot");
17836+ } else {
17837+ say("BIG FAT WARNING!!\n\n"
17838+ "You have tried to resume from this image before.\n"
17839+ "If it failed once, it may well fail again.\n"
17840+ "Would you like to remove the image and boot "
17841+ "normally?\nThis will be equivalent to entering "
17842+ "noresume on the\nkernel command line.\n\n"
17843+ "Press SPACE to remove the image or C to continue "
17844+ "resuming.\n\n");
4e97e4e9 17845+ if (toi_wait > 0)
17846+ say("Default action if you don't select one in %d "
17847+ "seconds is: %s.\n", toi_wait,
4e97e4e9 17848+ !!default_answer ?
17849+ "continue resuming" : "remove the image");
17850+ }
17851+ console_loglevel = orig_loglevel;
ad8f4a28 17852+
4e97e4e9 17853+ set_toi_state(TOI_SANITY_CHECK_PROMPT);
17854+ clear_toi_state(TOI_CONTINUE_REQ);
24613191 17855+
4e97e4e9 17856+ if (toi_wait_for_keypress(toi_wait) == 0) /* We timed out */
17857+ continue_req = !!default_answer;
17858+ else
17859+ continue_req = test_toi_state(TOI_CONTINUE_REQ);
24613191 17860+
4e97e4e9 17861+#endif /* CONFIG_VT or CONFIG_SERIAL_CONSOLE */
17862+
17863+post_ask:
17864+ if ((warning_reason) && (!continue_req))
17865+ machine_restart(NULL);
ad8f4a28 17866+
4e97e4e9 17867+ restore_toi_state(orig_state);
17868+ if (continue_req)
17869+ set_toi_state(TOI_CONTINUE_REQ);
24613191 17870+}
4e97e4e9 17871+#undef say
24613191 17872+
4e97e4e9 17873+/*
17874+ * User interface specific /sys/power/tuxonice entries.
24613191 17875+ */
24613191 17876+
4e97e4e9 17877+static struct toi_sysfs_data sysfs_params[] = {
17878+#if defined(CONFIG_NET) && defined(CONFIG_SYSFS)
17879+ { TOI_ATTR("default_console_level", SYSFS_RW),
ad8f4a28 17880+ SYSFS_INT(&toi_bkd.toi_default_console_level, 0, 7, 0)
4e97e4e9 17881+ },
24613191 17882+
4e97e4e9 17883+ { TOI_ATTR("debug_sections", SYSFS_RW),
ad8f4a28 17884+ SYSFS_UL(&toi_bkd.toi_debug_state, 0, 1 << 30, 0)
4e97e4e9 17885+ },
24613191 17886+
4e97e4e9 17887+ { TOI_ATTR("log_everything", SYSFS_RW),
ad8f4a28 17888+ SYSFS_BIT(&toi_bkd.toi_action, TOI_LOGALL, 0)
4e97e4e9 17889+ },
17890+#endif
17891+ { TOI_ATTR("pm_prepare_console", SYSFS_RW),
ad8f4a28 17892+ SYSFS_BIT(&toi_bkd.toi_action, TOI_PM_PREPARE_CONSOLE, 0)
4e97e4e9 17893+ }
17894+};
24613191 17895+
4e97e4e9 17896+static struct toi_module_ops userui_ops = {
17897+ .type = MISC_HIDDEN_MODULE,
17898+ .name = "printk ui",
17899+ .directory = "user_interface",
17900+ .module = THIS_MODULE,
17901+ .sysfs_data = sysfs_params,
17902+ .num_sysfs_entries = sizeof(sysfs_params) /
17903+ sizeof(struct toi_sysfs_data),
4e97e4e9 17904+};
24613191 17905+
4e97e4e9 17906+int toi_register_ui_ops(struct ui_ops *this_ui)
73c609d5 17907+{
4e97e4e9 17908+ if (toi_current_ui) {
17909+ printk(KERN_INFO "Only one TuxOnIce user interface module can "
17910+ "be loaded at a time.\n");
4e97e4e9 17911+ return -EBUSY;
17912+ }
17913+
17914+ toi_current_ui = this_ui;
17915+
17916+ return 0;
73c609d5 17917+}
24613191 17918+
4e97e4e9 17919+void toi_remove_ui_ops(struct ui_ops *this_ui)
24613191 17920+{
4e97e4e9 17921+ if (toi_current_ui != this_ui)
73c609d5 17922+ return;
24613191 17923+
4e97e4e9 17924+ toi_current_ui = NULL;
24613191 17925+}
17926+
4e97e4e9 17927+/* toi_ui_init
17928+ * Description: Boot time initialisation for user interface.
24613191 17929+ */
73c609d5 17930+
4e97e4e9 17931+int toi_ui_init(void)
24613191 17932+{
4e97e4e9 17933+ return toi_register_module(&userui_ops);
24613191 17934+}
17935+
4e97e4e9 17936+void toi_ui_exit(void)
24613191 17937+{
4e97e4e9 17938+ toi_unregister_module(&userui_ops);
73c609d5 17939+}
24613191 17940+
4e97e4e9 17941+#ifdef CONFIG_TOI_EXPORTS
17942+EXPORT_SYMBOL_GPL(toi_current_ui);
17943+EXPORT_SYMBOL_GPL(toi_early_boot_message);
17944+EXPORT_SYMBOL_GPL(toi_register_ui_ops);
17945+EXPORT_SYMBOL_GPL(toi_remove_ui_ops);
4e97e4e9 17946+#endif
17947diff --git a/kernel/power/tuxonice_ui.h b/kernel/power/tuxonice_ui.h
17948new file mode 100644
7f9d2ee0 17949index 0000000..7825ddf
4e97e4e9 17950--- /dev/null
17951+++ b/kernel/power/tuxonice_ui.h
ad8f4a28 17952@@ -0,0 +1,104 @@
73c609d5 17953+/*
4e97e4e9 17954+ * kernel/power/tuxonice_ui.h
73c609d5 17955+ *
4e97e4e9 17956+ * Copyright (C) 2004-2007 Nigel Cunningham (nigel at tuxonice net)
73c609d5 17957+ */
24613191 17958+
4e97e4e9 17959+enum {
17960+ DONT_CLEAR_BAR,
17961+ CLEAR_BAR
73c609d5 17962+};
24613191 17963+
73c609d5 17964+enum {
4e97e4e9 17965+ /* Userspace -> Kernel */
17966+ USERUI_MSG_ABORT = 0x11,
17967+ USERUI_MSG_SET_STATE = 0x12,
17968+ USERUI_MSG_GET_STATE = 0x13,
17969+ USERUI_MSG_GET_DEBUG_STATE = 0x14,
17970+ USERUI_MSG_SET_DEBUG_STATE = 0x15,
17971+ USERUI_MSG_SPACE = 0x18,
17972+ USERUI_MSG_GET_POWERDOWN_METHOD = 0x1A,
17973+ USERUI_MSG_SET_POWERDOWN_METHOD = 0x1B,
17974+ USERUI_MSG_GET_LOGLEVEL = 0x1C,
17975+ USERUI_MSG_SET_LOGLEVEL = 0x1D,
17976+ USERUI_MSG_PRINTK = 0x1E,
17977+
17978+ /* Kernel -> Userspace */
17979+ USERUI_MSG_MESSAGE = 0x21,
17980+ USERUI_MSG_PROGRESS = 0x22,
17981+ USERUI_MSG_POST_ATOMIC_RESTORE = 0x25,
17982+
17983+ USERUI_MSG_MAX,
73c609d5 17984+};
24613191 17985+
4e97e4e9 17986+struct userui_msg_params {
17987+ unsigned long a, b, c, d;
17988+ char text[255];
17989+};
24613191 17990+
4e97e4e9 17991+struct ui_ops {
17992+ char (*wait_for_key) (int timeout);
17993+ unsigned long (*update_status) (unsigned long value,
17994+ unsigned long maximum, const char *fmt, ...);
17995+ void (*prepare_status) (int clearbar, const char *fmt, ...);
17996+ void (*cond_pause) (int pause, char *message);
17997+ void (*abort)(int result_code, const char *fmt, ...);
17998+ void (*prepare)(void);
17999+ void (*cleanup)(void);
18000+ void (*post_atomic_restore)(void);
18001+ void (*message)(unsigned long section, unsigned long level,
18002+ int normally_logged, const char *fmt, ...);
18003+};
24613191 18004+
4e97e4e9 18005+extern struct ui_ops *toi_current_ui;
24613191 18006+
4e97e4e9 18007+#define toi_update_status(val, max, fmt, args...) \
18008+ (toi_current_ui ? (toi_current_ui->update_status) (val, max, fmt, ##args) : \
18009+ max)
24613191 18010+
4e97e4e9 18011+#define toi_ui_post_atomic_restore(void) \
18012+ do { if (toi_current_ui) \
18013+ (toi_current_ui->post_atomic_restore)(); \
ad8f4a28 18014+ } while (0)
24613191 18015+
4e97e4e9 18016+#define toi_prepare_console(void) \
18017+ do { if (toi_current_ui) \
18018+ (toi_current_ui->prepare)(); \
ad8f4a28 18019+ } while (0)
24613191 18020+
4e97e4e9 18021+#define toi_cleanup_console(void) \
18022+ do { if (toi_current_ui) \
18023+ (toi_current_ui->cleanup)(); \
ad8f4a28 18024+ } while (0)
24613191 18025+
4e97e4e9 18026+#define abort_hibernate(result, fmt, args...) \
18027+ do { if (toi_current_ui) \
18028+ (toi_current_ui->abort)(result, fmt, ##args); \
18029+ else { \
18030+ set_abort_result(result); \
18031+ } \
ad8f4a28 18032+ } while (0)
24613191 18033+
4e97e4e9 18034+#define toi_cond_pause(pause, message) \
18035+ do { if (toi_current_ui) \
18036+ (toi_current_ui->cond_pause)(pause, message); \
ad8f4a28 18037+ } while (0)
24613191 18038+
4e97e4e9 18039+#define toi_prepare_status(clear, fmt, args...) \
18040+ do { if (toi_current_ui) \
18041+ (toi_current_ui->prepare_status)(clear, fmt, ##args); \
18042+ else \
7f9d2ee0 18043+ printk(KERN_ERR fmt "%s", ##args, "\n"); \
ad8f4a28 18044+ } while (0)
24613191 18045+
4e97e4e9 18046+#define toi_message(sn, lev, log, fmt, a...) \
18047+do { \
18048+ if (toi_current_ui && (!sn || test_debug_state(sn))) \
18049+ toi_current_ui->message(sn, lev, log, fmt, ##a); \
ad8f4a28 18050+} while (0)
24613191 18051+
4e97e4e9 18052+__exit void toi_ui_cleanup(void);
18053+extern int toi_ui_init(void);
18054+extern void toi_ui_exit(void);
18055+extern int toi_register_ui_ops(struct ui_ops *this_ui);
18056+extern void toi_remove_ui_ops(struct ui_ops *this_ui);
18057diff --git a/kernel/power/tuxonice_userui.c b/kernel/power/tuxonice_userui.c
18058new file mode 100644
7f9d2ee0 18059index 0000000..634a801
4e97e4e9 18060--- /dev/null
18061+++ b/kernel/power/tuxonice_userui.c
ad8f4a28 18062@@ -0,0 +1,674 @@
73c609d5 18063+/*
4e97e4e9 18064+ * kernel/power/tuxonice_userui.c
73c609d5 18065+ *
4e97e4e9 18066+ * Copyright (C) 2005-2007 Bernard Blackham
18067+ * Copyright (C) 2002-2007 Nigel Cunningham (nigel at tuxonice net)
73c609d5 18068+ *
18069+ * This file is released under the GPLv2.
18070+ *
4e97e4e9 18071+ * Routines for TuxOnIce's user interface.
73c609d5 18072+ *
18073+ * The user interface code talks to a userspace program via a
18074+ * netlink socket.
18075+ *
18076+ * The kernel side:
18077+ * - starts the userui program;
18078+ * - sends text messages and progress bar status;
18079+ *
18080+ * The user space side:
18081+ * - passes messages regarding user requests (abort, toggle reboot etc)
24613191 18082+ *
24613191 18083+ */
18084+
73c609d5 18085+#define __KERNEL_SYSCALLS__
24613191 18086+
73c609d5 18087+#include <linux/suspend.h>
18088+#include <linux/freezer.h>
18089+#include <linux/console.h>
18090+#include <linux/ctype.h>
18091+#include <linux/tty.h>
18092+#include <linux/vt_kern.h>
18093+#include <linux/module.h>
18094+#include <linux/reboot.h>
18095+#include <linux/kmod.h>
18096+#include <linux/security.h>
18097+#include <linux/syscalls.h>
ad8f4a28 18098+
4e97e4e9 18099+#include "tuxonice_sysfs.h"
18100+#include "tuxonice_modules.h"
18101+#include "tuxonice.h"
18102+#include "tuxonice_ui.h"
18103+#include "tuxonice_netlink.h"
18104+#include "tuxonice_power_off.h"
24613191 18105+
73c609d5 18106+static char local_printf_buf[1024]; /* Same as printk - should be safe */
73c609d5 18107+
4e97e4e9 18108+static struct user_helper_data ui_helper_data;
18109+static struct toi_module_ops userui_ops;
18110+static int orig_kmsg;
24613191 18111+
4e97e4e9 18112+static char lastheader[512];
18113+static int lastheader_message_len;
18114+static int ui_helper_changed; /* Used at resume-time so don't overwrite value
18115+ set from initrd/ramfs. */
18116+
18117+/* Number of distinct progress amounts that userspace can display */
18118+static int progress_granularity = 30;
18119+
18120+static DECLARE_WAIT_QUEUE_HEAD(userui_wait_for_key);
18121+
18122+/**
18123+ * ui_nl_set_state - Update toi_action based on a message from userui.
24613191 18124+ *
4e97e4e9 18125+ * @n: The bit (1 << bit) to set.
18126+ */
18127+static void ui_nl_set_state(int n)
18128+{
18129+ /* Only let them change certain settings */
18130+ static const int toi_action_mask =
18131+ (1 << TOI_REBOOT) | (1 << TOI_PAUSE) |
7f9d2ee0 18132+ (1 << TOI_LOGALL) |
4e97e4e9 18133+ (1 << TOI_SINGLESTEP) |
18134+ (1 << TOI_PAUSE_NEAR_PAGESET_END);
18135+
ad8f4a28 18136+ toi_bkd.toi_action = (toi_bkd.toi_action & (~toi_action_mask)) |
4e97e4e9 18137+ (n & toi_action_mask);
18138+
18139+ if (!test_action_state(TOI_PAUSE) &&
18140+ !test_action_state(TOI_SINGLESTEP))
18141+ wake_up_interruptible(&userui_wait_for_key);
18142+}
18143+
18144+/**
18145+ * userui_post_atomic_restore - Tell userui that atomic restore just happened.
24613191 18146+ *
4e97e4e9 18147+ * Tell userui that atomic restore just occurred, so that it can do things like
18148+ * redrawing the screen, re-getting settings and so on.
18149+ */
18150+static void userui_post_atomic_restore(void)
18151+{
18152+ toi_send_netlink_message(&ui_helper_data,
18153+ USERUI_MSG_POST_ATOMIC_RESTORE, NULL, 0);
18154+}
18155+
18156+/**
18157+ * userui_storage_needed - Report how much memory in image header is needed.
18158+ */
18159+static int userui_storage_needed(void)
18160+{
18161+ return sizeof(ui_helper_data.program) + 1 + sizeof(int);
18162+}
18163+
18164+/**
18165+ * userui_save_config_info - Fill buffer with config info for image header.
24613191 18166+ *
4e97e4e9 18167+ * @buf: Buffer into which to put the config info we want to save.
18168+ */
18169+static int userui_save_config_info(char *buf)
18170+{
18171+ *((int *) buf) = progress_granularity;
18172+ memcpy(buf + sizeof(int), ui_helper_data.program,
18173+ sizeof(ui_helper_data.program));
4e97e4e9 18174+ return sizeof(ui_helper_data.program) + sizeof(int) + 1;
18175+}
18176+
18177+/**
18178+ * userui_load_config_info - Restore config info from buffer.
18179+ *
18180+ * @buf: Buffer containing header info loaded.
18181+ * @size: Size of data loaded for this module.
18182+ */
18183+static void userui_load_config_info(char *buf, int size)
18184+{
18185+ progress_granularity = *((int *) buf);
18186+ size -= sizeof(int);
18187+
18188+ /* Don't load the saved path if one has already been set */
18189+ if (ui_helper_changed)
18190+ return;
18191+
18192+ if (size > sizeof(ui_helper_data.program))
18193+ size = sizeof(ui_helper_data.program);
18194+
18195+ memcpy(ui_helper_data.program, buf + sizeof(int), size);
18196+ ui_helper_data.program[sizeof(ui_helper_data.program)-1] = '\0';
18197+}
18198+
18199+/**
18200+ * set_ui_program_set: Record that userui program was changed.
18201+ *
18202+ * Side effect routine for when the userui program is set. In an initrd or
18203+ * ramfs, the user may set a location for the userui program. If this happens,
18204+ * we don't want to reload the value that was saved in the image header. This
18205+ * routine allows us to flag that we shouldn't restore the program name from
18206+ * the image header.
18207+ */
18208+static void set_ui_program_set(void)
18209+{
18210+ ui_helper_changed = 1;
18211+}
18212+
18213+/**
18214+ * userui_memory_needed - Tell core how much memory to reserve for us.
18215+ */
18216+static int userui_memory_needed(void)
18217+{
18218+ /* ball park figure of 128 pages */
18219+ return (128 * PAGE_SIZE);
18220+}
18221+
18222+/**
18223+ * userui_update_status - Update the progress bar and (if on) in-bar message.
18224+ *
18225+ * @value: Current progress percentage numerator.
18226+ * @maximum: Current progress percentage denominator.
18227+ * @fmt: Message to be displayed in the middle of the progress bar.
18228+ *
18229+ * Note that a NULL message does not mean that any previous message is erased!
18230+ * For that, you need toi_prepare_status with clearbar on.
18231+ *
18232+ * Returns an unsigned long, being the next numerator (as determined by the
18233+ * maximum and progress granularity) where status needs to be updated.
18234+ * This is to reduce unnecessary calls to update_status.
18235+ */
18236+static unsigned long userui_update_status(unsigned long value,
18237+ unsigned long maximum, const char *fmt, ...)
18238+{
18239+ static int last_step = -1;
18240+ struct userui_msg_params msg;
18241+ int bitshift;
18242+ int this_step;
18243+ unsigned long next_update;
18244+
18245+ if (ui_helper_data.pid == -1)
18246+ return 0;
18247+
18248+ if ((!maximum) || (!progress_granularity))
18249+ return maximum;
18250+
18251+ if (value < 0)
18252+ value = 0;
18253+
18254+ if (value > maximum)
18255+ value = maximum;
18256+
18257+ /* Try to avoid math problems - we can't do 64 bit math here
18258+ * (and shouldn't need it - anyone got screen resolution
18259+ * of 65536 pixels or more?) */
18260+ bitshift = fls(maximum) - 16;
18261+ if (bitshift > 0) {
18262+ unsigned long temp_maximum = maximum >> bitshift;
18263+ unsigned long temp_value = value >> bitshift;
18264+ this_step = (int)
18265+ (temp_value * progress_granularity / temp_maximum);
18266+ next_update = (((this_step + 1) * temp_maximum /
18267+ progress_granularity) + 1) << bitshift;
18268+ } else {
18269+ this_step = (int) (value * progress_granularity / maximum);
18270+ next_update = ((this_step + 1) * maximum /
18271+ progress_granularity) + 1;
18272+ }
18273+
18274+ if (this_step == last_step)
18275+ return next_update;
18276+
18277+ memset(&msg, 0, sizeof(msg));
18278+
18279+ msg.a = this_step;
18280+ msg.b = progress_granularity;
18281+
18282+ if (fmt) {
18283+ va_list args;
18284+ va_start(args, fmt);
18285+ vsnprintf(msg.text, sizeof(msg.text), fmt, args);
18286+ va_end(args);
18287+ msg.text[sizeof(msg.text)-1] = '\0';
18288+ }
18289+
18290+ toi_send_netlink_message(&ui_helper_data, USERUI_MSG_PROGRESS,
18291+ &msg, sizeof(msg));
18292+ last_step = this_step;
18293+
18294+ return next_update;
18295+}
18296+
18297+/**
18298+ * userui_message - Display a message without necessarily logging it.
18299+ *
18300+ * @section: Type of message. Messages can be filtered by type.
18301+ * @level: Degree of importance of the message. Lower values = higher priority.
18302+ * @normally_logged: Whether logged even if log_everything is off.
18303+ * @fmt: Message (and parameters).
18304+ *
18305+ * This function is intended to do the same job as printk, but without normally
18306+ * logging what is printed. The point is to be able to get debugging info on
18307+ * screen without filling the logs with "1/534. ^M 2/534^M. 3/534^M"
18308+ *
18309+ * It may be called from an interrupt context - can't sleep!
18310+ */
18311+static void userui_message(unsigned long section, unsigned long level,
18312+ int normally_logged, const char *fmt, ...)
18313+{
18314+ struct userui_msg_params msg;
18315+
18316+ if ((level) && (level > console_loglevel))
18317+ return;
18318+
18319+ memset(&msg, 0, sizeof(msg));
18320+
18321+ msg.a = section;
18322+ msg.b = level;
18323+ msg.c = normally_logged;
18324+
18325+ if (fmt) {
18326+ va_list args;
18327+ va_start(args, fmt);
18328+ vsnprintf(msg.text, sizeof(msg.text), fmt, args);
18329+ va_end(args);
18330+ msg.text[sizeof(msg.text)-1] = '\0';
18331+ }
18332+
18333+ if (test_action_state(TOI_LOGALL))
ad8f4a28 18334+ printk(KERN_INFO "%s\n", msg.text);
4e97e4e9 18335+
18336+ toi_send_netlink_message(&ui_helper_data, USERUI_MSG_MESSAGE,
18337+ &msg, sizeof(msg));
18338+}
18339+
18340+/**
18341+ * wait_for_key_via_userui - Wait for userui to receive a keypress.
18342+ */
18343+static void wait_for_key_via_userui(void)
18344+{
18345+ DECLARE_WAITQUEUE(wait, current);
18346+
18347+ add_wait_queue(&userui_wait_for_key, &wait);
18348+ set_current_state(TASK_INTERRUPTIBLE);
18349+
18350+ interruptible_sleep_on(&userui_wait_for_key);
18351+
18352+ set_current_state(TASK_RUNNING);
18353+ remove_wait_queue(&userui_wait_for_key, &wait);
18354+}
18355+
18356+/**
18357+ * userui_prepare_status - Display high level messages.
18358+ *
18359+ * @clearbar: Whether to clear the progress bar.
18360+ * @fmt...: New message for the title.
18361+ *
18362+ * Prepare the 'nice display', drawing the header and version, along with the
18363+ * current action and perhaps also resetting the progress bar.
18364+ */
18365+static void userui_prepare_status(int clearbar, const char *fmt, ...)
18366+{
18367+ va_list args;
18368+
18369+ if (fmt) {
18370+ va_start(args, fmt);
18371+ lastheader_message_len = vsnprintf(lastheader, 512, fmt, args);
18372+ va_end(args);
18373+ }
18374+
18375+ if (clearbar)
18376+ toi_update_status(0, 1, NULL);
18377+
18378+ if (ui_helper_data.pid == -1)
18379+ printk(KERN_EMERG "%s\n", lastheader);
18380+ else
18381+ toi_message(0, TOI_STATUS, 1, lastheader, NULL);
18382+}
18383+
18384+/**
18385+ * toi_wait_for_keypress - Wait for keypress via userui.
18386+ *
18387+ * @timeout: Maximum time to wait.
18388+ *
18389+ * Wait for a keypress from userui.
18390+ *
18391+ * FIXME: Implement timeout?
24613191 18392+ */
4e97e4e9 18393+static char userui_wait_for_keypress(int timeout)
18394+{
18395+ char key = '\0';
24613191 18396+
4e97e4e9 18397+ if (ui_helper_data.pid != -1) {
18398+ wait_for_key_via_userui();
18399+ key = ' ';
18400+ }
ad8f4a28 18401+
4e97e4e9 18402+ return key;
18403+}
24613191 18404+
4e97e4e9 18405+/**
18406+ * userui_abort_hibernate - Abort a cycle & tell user if they didn't request it.
18407+ *
18408+ * @result_code: Reason why we're aborting (1 << bit).
18409+ * @fmt: Message to display if telling the user what's going on.
18410+ *
18411+ * Abort a cycle. If this wasn't at the user's request (and we're displaying
18412+ * output), tell the user why and wait for them to acknowledge the message.
18413+ */
18414+static void userui_abort_hibernate(int result_code, const char *fmt, ...)
24613191 18415+{
24613191 18416+ va_list args;
4e97e4e9 18417+ int printed_len = 0;
24613191 18418+
4e97e4e9 18419+ set_result_state(result_code);
24613191 18420+
4e97e4e9 18421+ if (test_result_state(TOI_ABORTED))
18422+ return;
24613191 18423+
4e97e4e9 18424+ set_result_state(TOI_ABORTED);
24613191 18425+
4e97e4e9 18426+ if (test_result_state(TOI_ABORT_REQUESTED))
18427+ return;
24613191 18428+
4e97e4e9 18429+ va_start(args, fmt);
18430+ printed_len = vsnprintf(local_printf_buf, sizeof(local_printf_buf),
18431+ fmt, args);
18432+ va_end(args);
18433+ if (ui_helper_data.pid != -1)
18434+ printed_len = sprintf(local_printf_buf + printed_len,
18435+ " (Press SPACE to continue)");
24613191 18436+
7f9d2ee0 18437+ toi_prepare_status(CLEAR_BAR, "%s", local_printf_buf);
24613191 18438+
4e97e4e9 18439+ if (ui_helper_data.pid != -1)
18440+ userui_wait_for_keypress(0);
18441+}
24613191 18442+
4e97e4e9 18443+/**
18444+ * request_abort_hibernate - Abort hibernating or resuming at user request.
18445+ *
18446+ * Handle the user requesting the cancellation of a hibernation or resume by
18447+ * pressing escape.
18448+ */
18449+static void request_abort_hibernate(void)
18450+{
18451+ if (test_result_state(TOI_ABORT_REQUESTED))
18452+ return;
18453+
18454+ if (test_toi_state(TOI_NOW_RESUMING)) {
18455+ toi_prepare_status(CLEAR_BAR, "Escape pressed. "
18456+ "Powering down again.");
18457+ set_toi_state(TOI_STOP_RESUME);
18458+ while (!test_toi_state(TOI_IO_STOPPED))
18459+ schedule();
18460+ if (toiActiveAllocator->mark_resume_attempted)
18461+ toiActiveAllocator->mark_resume_attempted(0);
18462+ toi_power_down();
18463+ }
18464+
18465+ toi_prepare_status(CLEAR_BAR, "--- ESCAPE PRESSED :"
18466+ " ABORTING HIBERNATION ---");
18467+ set_abort_result(TOI_ABORT_REQUESTED);
18468+ wake_up_interruptible(&userui_wait_for_key);
24613191 18469+}
24613191 18470+
4e97e4e9 18471+/**
18472+ * userui_user_rcv_msg - Receive a netlink message from userui.
18473+ *
18474+ * @skb: skb received.
18475+ * @nlh: Netlink header received.
24613191 18476+ */
4e97e4e9 18477+static int userui_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
18478+{
18479+ int type;
18480+ int *data;
24613191 18481+
4e97e4e9 18482+ type = nlh->nlmsg_type;
24613191 18483+
4e97e4e9 18484+ /* A control message: ignore them */
18485+ if (type < NETLINK_MSG_BASE)
18486+ return 0;
24613191 18487+
4e97e4e9 18488+ /* Unknown message: reply with EINVAL */
18489+ if (type >= USERUI_MSG_MAX)
18490+ return -EINVAL;
24613191 18491+
4e97e4e9 18492+ /* All operations require privileges, even GET */
18493+ if (security_netlink_recv(skb, CAP_NET_ADMIN))
18494+ return -EPERM;
24613191 18495+
4e97e4e9 18496+ /* Only allow one task to receive NOFREEZE privileges */
18497+ if (type == NETLINK_MSG_NOFREEZE_ME && ui_helper_data.pid != -1) {
ad8f4a28 18498+ printk(KERN_INFO "Got NOFREEZE_ME request when "
18499+ "ui_helper_data.pid is %d.\n", ui_helper_data.pid);
73c609d5 18500+ return -EBUSY;
18501+ }
18502+
ad8f4a28 18503+ data = (int *) NLMSG_DATA(nlh);
73c609d5 18504+
4e97e4e9 18505+ switch (type) {
ad8f4a28 18506+ case USERUI_MSG_ABORT:
18507+ request_abort_hibernate();
18508+ return 0;
18509+ case USERUI_MSG_GET_STATE:
18510+ toi_send_netlink_message(&ui_helper_data,
18511+ USERUI_MSG_GET_STATE, &toi_bkd.toi_action,
18512+ sizeof(toi_bkd.toi_action));
18513+ return 0;
18514+ case USERUI_MSG_GET_DEBUG_STATE:
18515+ toi_send_netlink_message(&ui_helper_data,
18516+ USERUI_MSG_GET_DEBUG_STATE,
18517+ &toi_bkd.toi_debug_state,
18518+ sizeof(toi_bkd.toi_debug_state));
18519+ return 0;
18520+ case USERUI_MSG_SET_STATE:
18521+ if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int)))
18522+ return -EINVAL;
18523+ ui_nl_set_state(*data);
18524+ return 0;
18525+ case USERUI_MSG_SET_DEBUG_STATE:
18526+ if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int)))
18527+ return -EINVAL;
18528+ toi_bkd.toi_debug_state = (*data);
18529+ return 0;
18530+ case USERUI_MSG_SPACE:
18531+ wake_up_interruptible(&userui_wait_for_key);
18532+ return 0;
18533+ case USERUI_MSG_GET_POWERDOWN_METHOD:
18534+ toi_send_netlink_message(&ui_helper_data,
18535+ USERUI_MSG_GET_POWERDOWN_METHOD,
18536+ &toi_poweroff_method,
18537+ sizeof(toi_poweroff_method));
18538+ return 0;
18539+ case USERUI_MSG_SET_POWERDOWN_METHOD:
18540+ if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int)))
18541+ return -EINVAL;
18542+ toi_poweroff_method = (*data);
18543+ return 0;
18544+ case USERUI_MSG_GET_LOGLEVEL:
18545+ toi_send_netlink_message(&ui_helper_data,
18546+ USERUI_MSG_GET_LOGLEVEL,
18547+ &toi_bkd.toi_default_console_level,
18548+ sizeof(toi_bkd.toi_default_console_level));
18549+ return 0;
18550+ case USERUI_MSG_SET_LOGLEVEL:
18551+ if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int)))
18552+ return -EINVAL;
18553+ toi_bkd.toi_default_console_level = (*data);
18554+ return 0;
18555+ case USERUI_MSG_PRINTK:
18556+ printk("%s", (char *) data);
18557+ return 0;
4e97e4e9 18558+ }
18559+
ad8f4a28 18560+ /* Unhandled here */
4e97e4e9 18561+ return 1;
18562+}
18563+
18564+/**
18565+ * userui_cond_pause - Possibly pause at user request.
18566+ *
18567+ * @pause: Whether to pause or just display the message.
18568+ * @message: Message to display at the start of pausing.
ad8f4a28 18569+ *
4e97e4e9 18570+ * Potentially pause and wait for the user to tell us to continue. We normally
18571+ * only pause when @pause is set. While paused, the user can do things like
18572+ * changing the loglevel, toggling the display of debugging sections, and
18573+ * the like.
18574+ */
18575+static void userui_cond_pause(int pause, char *message)
18576+{
18577+ int displayed_message = 0, last_key = 0;
ad8f4a28 18578+
4e97e4e9 18579+ while (last_key != 32 &&
18580+ ui_helper_data.pid != -1 &&
ad8f4a28 18581+ ((test_action_state(TOI_PAUSE) && pause) ||
4e97e4e9 18582+ (test_action_state(TOI_SINGLESTEP)))) {
18583+ if (!displayed_message) {
ad8f4a28 18584+ toi_prepare_status(DONT_CLEAR_BAR,
4e97e4e9 18585+ "%s Press SPACE to continue.%s",
18586+ message ? message : "",
ad8f4a28 18587+ (test_action_state(TOI_SINGLESTEP)) ?
4e97e4e9 18588+ " Single step on." : "");
18589+ displayed_message = 1;
18590+ }
18591+ last_key = userui_wait_for_keypress(0);
18592+ }
18593+ schedule();
73c609d5 18594+}
18595+
4e97e4e9 18596+/**
18597+ * userui_prepare_console - Prepare the console for use.
18598+ *
18599+ * Prepare a console for use, saving current kmsg settings and attempting to
18600+ * start userui. Console loglevel changes are handled by userui.
18601+ */
18602+static void userui_prepare_console(void)
73c609d5 18603+{
4e97e4e9 18604+ orig_kmsg = kmsg_redirect;
18605+ kmsg_redirect = fg_console + 1;
18606+
18607+ ui_helper_data.pid = -1;
18608+
18609+ if (!userui_ops.enabled) {
18610+ printk(KERN_INFO "TuxOnIce: Userui disabled.\n");
73c609d5 18611+ return;
4e97e4e9 18612+ }
73c609d5 18613+
4e97e4e9 18614+ if (*ui_helper_data.program)
18615+ toi_netlink_setup(&ui_helper_data);
18616+ else
ad8f4a28 18617+ printk(KERN_INFO "TuxOnIce: Userui program not configured.\n");
73c609d5 18618+}
18619+
4e97e4e9 18620+/**
18621+ * userui_cleanup_console - Cleanup after a cycle.
18622+ *
18623+ * Tell userui to cleanup, and restore kmsg_redirect to its original value.
24613191 18624+ */
18625+
4e97e4e9 18626+static void userui_cleanup_console(void)
24613191 18627+{
4e97e4e9 18628+ if (ui_helper_data.pid > -1)
18629+ toi_netlink_close(&ui_helper_data);
24613191 18630+
4e97e4e9 18631+ kmsg_redirect = orig_kmsg;
24613191 18632+}
18633+
24613191 18634+/*
4e97e4e9 18635+ * User interface specific /sys/power/tuxonice entries.
24613191 18636+ */
18637+
4e97e4e9 18638+static struct toi_sysfs_data sysfs_params[] = {
18639+#if defined(CONFIG_NET) && defined(CONFIG_SYSFS)
18640+ { TOI_ATTR("enable_escape", SYSFS_RW),
ad8f4a28 18641+ SYSFS_BIT(&toi_bkd.toi_action, TOI_CAN_CANCEL, 0)
4e97e4e9 18642+ },
24613191 18643+
4e97e4e9 18644+ { TOI_ATTR("pause_between_steps", SYSFS_RW),
ad8f4a28 18645+ SYSFS_BIT(&toi_bkd.toi_action, TOI_PAUSE, 0)
4e97e4e9 18646+ },
24613191 18647+
4e97e4e9 18648+ { TOI_ATTR("enabled", SYSFS_RW),
18649+ SYSFS_INT(&userui_ops.enabled, 0, 1, 0)
18650+ },
24613191 18651+
4e97e4e9 18652+ { TOI_ATTR("progress_granularity", SYSFS_RW),
18653+ SYSFS_INT(&progress_granularity, 1, 2048, 0)
18654+ },
24613191 18655+
4e97e4e9 18656+ { TOI_ATTR("program", SYSFS_RW),
18657+ SYSFS_STRING(ui_helper_data.program, 255, 0),
18658+ .write_side_effect = set_ui_program_set,
18659+ },
18660+#endif
73c609d5 18661+};
18662+
4e97e4e9 18663+static struct toi_module_ops userui_ops = {
18664+ .type = MISC_MODULE,
18665+ .name = "userui",
18666+ .shared_directory = "user_interface",
18667+ .module = THIS_MODULE,
18668+ .storage_needed = userui_storage_needed,
18669+ .save_config_info = userui_save_config_info,
18670+ .load_config_info = userui_load_config_info,
18671+ .memory_needed = userui_memory_needed,
18672+ .sysfs_data = sysfs_params,
ad8f4a28 18673+ .num_sysfs_entries = sizeof(sysfs_params) /
18674+ sizeof(struct toi_sysfs_data),
4e97e4e9 18675+};
73c609d5 18676+
4e97e4e9 18677+static struct ui_ops my_ui_ops = {
18678+ .post_atomic_restore = userui_post_atomic_restore,
18679+ .update_status = userui_update_status,
18680+ .message = userui_message,
18681+ .prepare_status = userui_prepare_status,
18682+ .abort = userui_abort_hibernate,
18683+ .cond_pause = userui_cond_pause,
18684+ .prepare = userui_prepare_console,
18685+ .cleanup = userui_cleanup_console,
18686+ .wait_for_key = userui_wait_for_keypress,
18687+};
73c609d5 18688+
4e97e4e9 18689+/**
18690+ * toi_user_ui_init - Boot time initialisation for user interface.
18691+ *
18692+ * Invoked from the core init routine.
18693+ */
18694+static __init int toi_user_ui_init(void)
18695+{
18696+ int result;
73c609d5 18697+
4e97e4e9 18698+ ui_helper_data.nl = NULL;
ad8f4a28 18699+ strncpy(ui_helper_data.program, CONFIG_TOI_USERUI_DEFAULT_PATH, 255);
4e97e4e9 18700+ ui_helper_data.pid = -1;
18701+ ui_helper_data.skb_size = sizeof(struct userui_msg_params);
18702+ ui_helper_data.pool_limit = 6;
18703+ ui_helper_data.netlink_id = NETLINK_TOI_USERUI;
18704+ ui_helper_data.name = "userspace ui";
18705+ ui_helper_data.rcv_msg = userui_user_rcv_msg;
18706+ ui_helper_data.interface_version = 7;
18707+ ui_helper_data.must_init = 0;
18708+ ui_helper_data.not_ready = userui_cleanup_console;
18709+ init_completion(&ui_helper_data.wait_for_process);
18710+ result = toi_register_module(&userui_ops);
18711+ if (!result)
18712+ result = toi_register_ui_ops(&my_ui_ops);
18713+ if (result)
18714+ toi_unregister_module(&userui_ops);
24613191 18715+
4e97e4e9 18716+ return result;
18717+}
24613191 18718+
4e97e4e9 18719+#ifdef MODULE
18720+/**
18721+ * toi_user_ui_exit - Cleanup code for when the core is unloaded.
18722+ */
18723+static __exit void toi_user_ui_exit(void)
18724+{
18725+ toi_remove_ui_ops(&my_ui_ops);
18726+ toi_unregister_module(&userui_ops);
18727+}
24613191 18728+
4e97e4e9 18729+module_init(toi_user_ui_init);
18730+module_exit(toi_user_ui_exit);
18731+MODULE_AUTHOR("Nigel Cunningham");
18732+MODULE_DESCRIPTION("TuxOnIce Userui Support");
18733+MODULE_LICENSE("GPL");
18734+#else
18735+late_initcall(toi_user_ui_init);
18736+#endif
18737diff --git a/kernel/printk.c b/kernel/printk.c
7f9d2ee0 18738index bdd4ea8..cb95b05 100644
4e97e4e9 18739--- a/kernel/printk.c
18740+++ b/kernel/printk.c
7f9d2ee0 18741@@ -32,6 +32,7 @@
18742 #include <linux/security.h>
24613191 18743 #include <linux/bootmem.h>
18744 #include <linux/syscalls.h>
24613191 18745+#include <linux/suspend.h>
18746
18747 #include <asm/uaccess.h>
18748
7f9d2ee0 18749@@ -99,9 +101,12 @@ static DEFINE_SPINLOCK(logbuf_lock);
24613191 18750 * The indices into log_buf are not constrained to log_buf_len - they
18751 * must be masked before subscripting
18752 */
7f9d2ee0 18753-static unsigned log_start; /* Index into log_buf: next char to be read by syslog() */
18754-static unsigned con_start; /* Index into log_buf: next char to be sent to consoles */
18755-static unsigned log_end; /* Index into log_buf: most-recently-written-char + 1 */
ad8f4a28 18756+/* Index into log_buf: next char to be read by syslog() */
7f9d2ee0 18757+static unsigned POSS_NOSAVE log_start;
ad8f4a28 18758+/* Index into log_buf: next char to be sent to consoles */
7f9d2ee0 18759+static unsigned POSS_NOSAVE con_start;
ad8f4a28 18760+/* Index into log_buf: most-recently-written-char + 1 */
7f9d2ee0 18761+static unsigned POSS_NOSAVE log_end;
24613191 18762
18763 /*
18764 * Array of consoles built from command line options (console=)
7f9d2ee0 18765@@ -124,10 +129,11 @@ static int console_may_schedule;
24613191 18766
18767 #ifdef CONFIG_PRINTK
18768
18769-static char __log_buf[__LOG_BUF_LEN];
18770-static char *log_buf = __log_buf;
18771-static int log_buf_len = __LOG_BUF_LEN;
7f9d2ee0 18772-static unsigned logged_chars; /* Number of chars produced since last read+clear operation */
ad8f4a28 18773+static POSS_NOSAVE char __log_buf[__LOG_BUF_LEN];
18774+static POSS_NOSAVE char *log_buf = __log_buf;
18775+static POSS_NOSAVE int log_buf_len = __LOG_BUF_LEN;
18776+/* Number of chars produced since last read+clear operation */
7f9d2ee0 18777+static POSS_NOSAVE unsigned logged_chars;
24613191 18778
18779 static int __init log_buf_len_setup(char *str)
18780 {
7f9d2ee0 18781@@ -927,6 +933,7 @@ void suspend_console(void)
e8d0ad9d 18782 acquire_console_sem();
18783 console_suspended = 1;
18784 }
18785+EXPORT_SYMBOL(suspend_console);
18786
18787 void resume_console(void)
18788 {
7f9d2ee0 18789@@ -935,6 +942,7 @@ void resume_console(void)
e8d0ad9d 18790 console_suspended = 0;
18791 release_console_sem();
18792 }
18793+EXPORT_SYMBOL(resume_console);
e8d0ad9d 18794
18795 /**
ad8f4a28 18796 * acquire_console_sem - lock the console system for exclusive use.
4e97e4e9 18797diff --git a/kernel/timer.c b/kernel/timer.c
7f9d2ee0 18798index b024106..0ff93c9 100644
4e97e4e9 18799--- a/kernel/timer.c
18800+++ b/kernel/timer.c
ad8f4a28 18801@@ -37,6 +37,8 @@
4e97e4e9 18802 #include <linux/delay.h>
18803 #include <linux/tick.h>
18804 #include <linux/kallsyms.h>
18805+#include <linux/notifier.h>
18806+#include <linux/suspend.h>
18807
18808 #include <asm/uaccess.h>
18809 #include <asm/unistd.h>
7f9d2ee0 18810@@ -876,6 +878,59 @@ unsigned long avenrun[3];
24613191 18811
e8d0ad9d 18812 EXPORT_SYMBOL(avenrun);
18813
4e97e4e9 18814+#ifdef CONFIG_PM
e8d0ad9d 18815+static unsigned long avenrun_save[3];
18816+/*
18817+ * save_avenrun - Record the values prior to starting a hibernation cycle.
18818+ * We do this to make the work done in hibernation invisible to userspace
18819+ * post-suspend. Some programs, including some MTAs, watch the load average
18820+ * and stop work until it lowers. Without this, they would stop working for
18821+ * a while post-resume, unnecessarily.
18822+ */
18823+
4e97e4e9 18824+static void save_avenrun(void)
e8d0ad9d 18825+{
18826+ avenrun_save[0] = avenrun[0];
18827+ avenrun_save[1] = avenrun[1];
18828+ avenrun_save[2] = avenrun[2];
18829+}
18830+
4e97e4e9 18831+static void restore_avenrun(void)
e8d0ad9d 18832+{
18833+ if (!avenrun_save[0])
18834+ return;
18835+
18836+ avenrun[0] = avenrun_save[0];
18837+ avenrun[1] = avenrun_save[1];
18838+ avenrun[2] = avenrun_save[2];
18839+
18840+ avenrun_save[0] = 0;
18841+}
18842+
4e97e4e9 18843+static int avenrun_pm_callback(struct notifier_block *nfb,
18844+ unsigned long action,
18845+ void *ignored)
18846+{
18847+ switch (action) {
18848+ case PM_HIBERNATION_PREPARE:
18849+ save_avenrun();
18850+ return NOTIFY_OK;
18851+ case PM_POST_HIBERNATION:
18852+ restore_avenrun();
18853+ return NOTIFY_OK;
18854+ }
18855+
18856+ return NOTIFY_DONE;
18857+}
18858+
18859+static void register_pm_notifier_callback(void)
18860+{
18861+ pm_notifier(avenrun_pm_callback, 0);
18862+}
18863+#else
18864+static inline void register_pm_notifier_callback(void) { }
18865+#endif
e8d0ad9d 18866+
18867 /*
18868 * calc_load - given tick count, update the avenrun load estimates.
18869 * This is called while holding a write_lock on xtime_lock.
7f9d2ee0 18870@@ -1374,6 +1429,7 @@ void __init init_timers(void)
4e97e4e9 18871 BUG_ON(err == NOTIFY_BAD);
18872 register_cpu_notifier(&timers_nb);
18873 open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
18874+ register_pm_notifier_callback();
18875 }
24613191 18876
4e97e4e9 18877 /**
18878diff --git a/lib/vsprintf.c b/lib/vsprintf.c
7f9d2ee0 18879index 6021757..457c957 100644
4e97e4e9 18880--- a/lib/vsprintf.c
18881+++ b/lib/vsprintf.c
7f9d2ee0 18882@@ -482,6 +482,29 @@ static char *number(char *buf, char *end, unsigned long long num, int base, int
4e97e4e9 18883 return buf;
18884 }
18885
18886+/*
18887+ * snprintf_used
18888+ *
ad8f4a28 18889+ * Functionality : Print a string with parameters to a buffer of a
4e97e4e9 18890+ * limited size. Unlike vsnprintf, we return the number
18891+ * of bytes actually put in the buffer, not the number
18892+ * that would have been put in if it was big enough.
18893+ */
18894+int snprintf_used(char *buffer, int buffer_size, const char *fmt, ...)
18895+{
18896+ int result;
18897+ va_list args;
18898+
18899+ if (!buffer_size)
18900+ return 0;
18901+
18902+ va_start(args, fmt);
18903+ result = vsnprintf(buffer, buffer_size, fmt, args);
18904+ va_end(args);
24613191 18905+
4e97e4e9 18906+ return result > buffer_size ? buffer_size : result;
18907+}
18908+
18909 /**
18910 * vsnprintf - Format a string and place it in a buffer
18911 * @buf: The buffer to place the result into
18912diff --git a/mm/Makefile b/mm/Makefile
7f9d2ee0 18913index a5b0dd9..ba0df45 100644
4e97e4e9 18914--- a/mm/Makefile
18915+++ b/mm/Makefile
18916@@ -11,7 +11,7 @@ obj-y := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
18917 page_alloc.o page-writeback.o pdflush.o \
18918 readahead.o swap.o truncate.o vmscan.o \
18919 prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
ad8f4a28 18920- page_isolation.o $(mmu-y)
18921+ dyn_pageflags.o page_isolation.o $(mmu-y)
24613191 18922
7f9d2ee0 18923 obj-$(CONFIG_PROC_PAGE_MONITOR) += pagewalk.o
4e97e4e9 18924 obj-$(CONFIG_BOUNCE) += bounce.o
4e97e4e9 18925diff --git a/mm/dyn_pageflags.c b/mm/dyn_pageflags.c
18926new file mode 100644
ad8f4a28 18927index 0000000..30d95f0
4e97e4e9 18928--- /dev/null
18929+++ b/mm/dyn_pageflags.c
ad8f4a28 18930@@ -0,0 +1,801 @@
4e97e4e9 18931+/*
18932+ * lib/dyn_pageflags.c
18933+ *
18934+ * Copyright (C) 2004-2007 Nigel Cunningham <nigel at tuxonice net>
ad8f4a28 18935+ *
4e97e4e9 18936+ * This file is released under the GPLv2.
18937+ *
18938+ * Routines for dynamically allocating and releasing bitmaps
18939+ * used as pseudo-pageflags.
18940+ *
18941+ * We use bitmaps, built out of order zero allocations and
18942+ * linked together by kzalloc'd arrays of pointers into
18943+ * an array that looks like...
18944+ *
18945+ * pageflags->bitmap[node][zone_id][page_num][ul]
18946+ *
18947+ * All of this is transparent to the caller, who just uses
18948+ * the allocate & free routines to create/destroy bitmaps,
18949+ * and get/set/clear to operate on individual flags.
18950+ *
18951+ * Bitmaps can be sparse, with the individual pages only being
18952+ * allocated when a bit is set in the page.
18953+ *
18954+ * Memory hotplugging support is work in progress. A zone's
18955+ * start_pfn may change. If it does, we need to reallocate
18956+ * the zone bitmap, adding additional pages to the front to
18957+ * cover the bitmap. For simplicity, we don't shift the
18958+ * contents of existing pages around. The lock is only used
18959+ * to avoid reentrancy when resizing zones. The replacement
18960+ * of old data with new is done atomically. If we try to test
18961+ * a bit in the new area before the update is completed, we
18962+ * know it's zero.
18963+ *
18964+ * TuxOnIce knows the structure of these pageflags, so that
18965+ * it can serialise them in the image header. TODO: Make
18966+ * that support more generic so that TuxOnIce doesn't need
18967+ * to know how dyn_pageflags are stored.
18968+ */
18969+
18970+/* Avoid warnings in include/linux/mm.h */
18971+struct page;
18972+struct dyn_pageflags;
18973+int test_dynpageflag(struct dyn_pageflags *bitmap, struct page *page);
18974+
18975+#include <linux/bootmem.h>
18976+#include <linux/dyn_pageflags.h>
18977+#include <linux/module.h>
18978+
18979+static LIST_HEAD(flags_list);
18980+static DEFINE_SPINLOCK(flags_list_lock);
18981+
18982+static void* (*dyn_allocator)(unsigned long size, unsigned long flags);
18983+
ad8f4a28 18984+static int dyn_pageflags_debug;
4e97e4e9 18985+
18986+#define PR_DEBUG(a, b...) \
ad8f4a28 18987+ do { if (dyn_pageflags_debug) printk(a, ##b); } while (0)
4e97e4e9 18988+#define DUMP_DEBUG(bitmap) \
ad8f4a28 18989+ do { if (dyn_pageflags_debug) dump_pagemap(bitmap); } while (0)
4e97e4e9 18990+
18991+#if BITS_PER_LONG == 32
18992+#define UL_SHIFT 5
ad8f4a28 18993+#else
4e97e4e9 18994+#if BITS_PER_LONG == 64
18995+#define UL_SHIFT 6
18996+#else
18997+#error Bits per long not 32 or 64?
18998+#endif
18999+#endif
19000+
19001+#define BIT_NUM_MASK ((sizeof(unsigned long) << 3) - 1)
19002+#define PAGE_NUM_MASK (~((1 << (PAGE_SHIFT + 3)) - 1))
19003+#define UL_NUM_MASK (~(BIT_NUM_MASK | PAGE_NUM_MASK))
19004+
19005+/*
19006+ * PAGENUMBER gives the index of the page within the zone.
19007+ * PAGEINDEX gives the index of the unsigned long within that page.
19008+ * PAGEBIT gives the index of the bit within the unsigned long.
19009+ */
19010+#define PAGENUMBER(zone_offset) ((int) (zone_offset >> (PAGE_SHIFT + 3)))
19011+#define PAGEINDEX(zone_offset) ((int) ((zone_offset & UL_NUM_MASK) >> UL_SHIFT))
19012+#define PAGEBIT(zone_offset) ((int) (zone_offset & BIT_NUM_MASK))
19013+
19014+#define PAGE_UL_PTR(bitmap, node, zone_num, zone_pfn) \
19015+ ((bitmap[node][zone_num][PAGENUMBER(zone_pfn)])+PAGEINDEX(zone_pfn))
19016+
19017+#define pages_for_zone(zone) \
19018+ (DIV_ROUND_UP((zone)->spanned_pages, (PAGE_SIZE << 3)))
19019+
19020+#define pages_for_span(span) \
19021+ (DIV_ROUND_UP(span, PAGE_SIZE << 3))
19022+
19023+/* __maybe_unused for testing functions below */
19024+#define GET_BIT_AND_UL(pageflags, page) \
19025+ struct zone *zone = page_zone(page); \
19026+ unsigned long pfn = page_to_pfn(page); \
19027+ unsigned long zone_pfn = pfn - zone->zone_start_pfn; \
19028+ int node = page_to_nid(page); \
19029+ int zone_num = zone_idx(zone); \
19030+ int pagenum = PAGENUMBER(zone_pfn) + 2; \
19031+ int page_offset = PAGEINDEX(zone_pfn); \
19032+ unsigned long **zone_array = ((pageflags)->bitmap && \
19033+ (pageflags)->bitmap[node] && \
19034+ (pageflags)->bitmap[node][zone_num]) ? \
19035+ (pageflags)->bitmap[node][zone_num] : NULL; \
19036+ unsigned long __maybe_unused *ul = (zone_array && \
19037+ (unsigned long) zone_array[0] <= pfn && \
19038+ (unsigned long) zone_array[1] >= (pagenum-2) && \
19039+ zone_array[pagenum]) ? zone_array[pagenum] + page_offset : \
19040+ NULL; \
19041+ int bit __maybe_unused = PAGEBIT(zone_pfn);
19042+
19043+#define for_each_online_pgdat_zone(pgdat, zone_nr) \
19044+ for_each_online_pgdat(pgdat) \
19045+ for (zone_nr = 0; zone_nr < MAX_NR_ZONES; zone_nr++)
19046+
19047+/**
19048+ * dump_pagemap - Display the contents of a bitmap for debugging purposes.
19049+ *
19050+ * @pagemap: The array to be dumped.
19051+ */
19052+void dump_pagemap(struct dyn_pageflags *pagemap)
19053+{
ad8f4a28 19054+ int i = 0;
4e97e4e9 19055+ struct pglist_data *pgdat;
ad8f4a28 19056+ unsigned long ****bitmap = pagemap->bitmap;
4e97e4e9 19057+
19058+ printk(" --- Dump bitmap %p ---\n", pagemap);
19059+
ad8f4a28 19060+ printk(KERN_INFO "%p: Sparse flag = %d\n",
19061+ &pagemap->sparse, pagemap->sparse);
19062+ printk(KERN_INFO "%p: Bitmap = %p\n",
19063+ &pagemap->bitmap, bitmap);
4e97e4e9 19064+
19065+ if (!bitmap)
19066+ goto out;
19067+
19068+ for_each_online_pgdat(pgdat) {
19069+ int node_id = pgdat->node_id, zone_nr;
ad8f4a28 19070+ printk(KERN_INFO "%p: Node %d => %p\n",
19071+ &bitmap[node_id], node_id,
4e97e4e9 19072+ bitmap[node_id]);
19073+ if (!bitmap[node_id])
19074+ continue;
19075+ for (zone_nr = 0; zone_nr < MAX_NR_ZONES; zone_nr++) {
ad8f4a28 19076+ printk(KERN_INFO "%p: Zone %d => %p%s\n",
4e97e4e9 19077+ &bitmap[node_id][zone_nr], zone_nr,
19078+ bitmap[node_id][zone_nr],
19079+ bitmap[node_id][zone_nr] ? "" :
19080+ " (empty)");
19081+ if (!bitmap[node_id][zone_nr])
19082+ continue;
19083+
ad8f4a28 19084+ printk(KERN_INFO "%p: Zone start pfn = %p\n",
4e97e4e9 19085+ &bitmap[node_id][zone_nr][0],
19086+ bitmap[node_id][zone_nr][0]);
ad8f4a28 19087+ printk(KERN_INFO "%p: Number of pages = %p\n",
4e97e4e9 19088+ &bitmap[node_id][zone_nr][1],
19089+ bitmap[node_id][zone_nr][1]);
ad8f4a28 19090+ for (i = 2; i < (unsigned long) bitmap[node_id]
19091+ [zone_nr][1] + 2; i++)
19092+ printk(KERN_INFO
19093+ "%p: Page %2d = %p\n",
4e97e4e9 19094+ &bitmap[node_id][zone_nr][i],
19095+ i - 2,
19096+ bitmap[node_id][zone_nr][i]);
19097+ }
19098+ }
19099+out:
ad8f4a28 19100+ printk(KERN_INFO " --- Dump of bitmap %p finishes\n", pagemap);
4e97e4e9 19101+}
4e97e4e9 19102+EXPORT_SYMBOL_GPL(dump_pagemap);
19103+
19104+/**
19105+ * clear_dyn_pageflags - Zero all pageflags in a bitmap.
19106+ *
19107+ * @pagemap: The array to be cleared.
19108+ *
19109+ * Clear an array used to store dynamically allocated pageflags.
19110+ */
19111+void clear_dyn_pageflags(struct dyn_pageflags *pagemap)
19112+{
ad8f4a28 19113+ int i = 0, zone_idx;
4e97e4e9 19114+ struct pglist_data *pgdat;
ad8f4a28 19115+ unsigned long ****bitmap = pagemap->bitmap;
4e97e4e9 19116+
19117+ for_each_online_pgdat_zone(pgdat, zone_idx) {
19118+ int node_id = pgdat->node_id;
19119+ struct zone *zone = &pgdat->node_zones[zone_idx];
19120+
19121+ if (!populated_zone(zone) ||
19122+ (!bitmap[node_id] || !bitmap[node_id][zone_idx]))
19123+ continue;
19124+
19125+ for (i = 2; i < pages_for_zone(zone) + 2; i++)
19126+ if (bitmap[node_id][zone_idx][i])
19127+ memset((bitmap[node_id][zone_idx][i]), 0,
19128+ PAGE_SIZE);
19129+ }
19130+}
4e97e4e9 19131+EXPORT_SYMBOL_GPL(clear_dyn_pageflags);
19132+
19133+/**
19134+ * Allocators.
19135+ *
19136+ * During boot time, we want to use alloc_bootmem_low. Afterwards, we want
19137+ * kzalloc. These routines let us do that without causing compile time warnings
19138+ * about mismatched sections, as would happen if we did a simple
19139+ * boot ? alloc_bootmem_low() : kzalloc() below.
19140+ */
19141+
19142+/**
19143+ * boot_time_allocator - Allocator used while booting.
19144+ *
19145+ * @size: Number of bytes wanted.
19146+ * @flags: Allocation flags (ignored here).
19147+ */
19148+static __init void *boot_time_allocator(unsigned long size, unsigned long flags)
19149+{
19150+ return alloc_bootmem_low(size);
19151+}
19152+
19153+/**
19154+ * normal_allocator - Allocator used post-boot.
19155+ *
19156+ * @size: Number of bytes wanted.
19157+ * @flags: Allocation flags.
19158+ *
19159+ * Allocate memory for our page flags.
19160+ */
19161+static void *normal_allocator(unsigned long size, unsigned long flags)
19162+{
19163+ if (size == PAGE_SIZE)
19164+ return (void *) get_zeroed_page(flags);
19165+ else
19166+ return kzalloc(size, flags);
19167+}
19168+
19169+/**
19170+ * dyn_pageflags_init - Do the earliest initialisation.
19171+ *
19172+ * Very early in the boot process, set our allocator (alloc_bootmem_low) and
19173+ * allocate bitmaps for slab and buddy pageflags.
19174+ */
19175+void __init dyn_pageflags_init(void)
19176+{
19177+ dyn_allocator = boot_time_allocator;
19178+}
19179+
19180+/**
19181+ * dyn_pageflags_use_kzalloc - Reset the allocator for normal use.
19182+ *
19183+ * Reset the allocator to our normal, post boot function.
19184+ */
19185+void __init dyn_pageflags_use_kzalloc(void)
19186+{
19187+ dyn_allocator = (void *) normal_allocator;
19188+}
19189+
19190+/**
19191+ * try_alloc_dyn_pageflag_part - Try to allocate a pointer array.
19192+ *
19193+ * Try to allocate a contiguous array of pointers.
19194+ */
19195+static int try_alloc_dyn_pageflag_part(int nr_ptrs, void **ptr)
19196+{
19197+ *ptr = (*dyn_allocator)(sizeof(void *) * nr_ptrs, GFP_ATOMIC);
19198+
19199+ if (*ptr)
19200+ return 0;
19201+
ad8f4a28 19202+	printk(KERN_INFO
19203+ "Error. Unable to allocate memory for dynamic pageflags.");
4e97e4e9 19204+ return -ENOMEM;
19205+}
19206+
19207+static int populate_bitmap_page(struct dyn_pageflags *pageflags, int take_lock,
19208+ unsigned long **page_ptr)
19209+{
19210+ void *address;
19211+ unsigned long flags = 0;
19212+
19213+ if (take_lock)
19214+ spin_lock_irqsave(&pageflags->struct_lock, flags);
19215+
19216+ /*
19217+ * The page may have been allocated while we waited.
19218+ */
19219+ if (*page_ptr)
19220+ goto out;
19221+
19222+ address = (*dyn_allocator)(PAGE_SIZE, GFP_ATOMIC);
19223+
19224+ if (!address) {
19225+ PR_DEBUG("Error. Unable to allocate memory for "
19226+ "dynamic pageflags page.");
 19227+		if (take_lock)
 19228+			spin_unlock_irqrestore(&pageflags->struct_lock, flags);
19229+ return -ENOMEM;
19230+ }
19231+
19232+ *page_ptr = address;
19233+out:
19234+ if (take_lock)
19235+ spin_unlock_irqrestore(&pageflags->struct_lock, flags);
19236+ return 0;
19237+}
19238+
19239+/**
19240+ * resize_zone_bitmap - Resize the array of pages for a bitmap.
19241+ *
19242+ * Shrink or extend a list of pages for a zone in a bitmap, preserving
19243+ * existing data.
19244+ */
19245+static int resize_zone_bitmap(struct dyn_pageflags *pagemap, struct zone *zone,
19246+ unsigned long old_pages, unsigned long new_pages,
19247+ unsigned long copy_offset, int take_lock)
19248+{
19249+ unsigned long **new_ptr = NULL, ****bitmap = pagemap->bitmap;
19250+ int node_id = zone_to_nid(zone), zone_idx = zone_idx(zone),
19251+ to_copy = min(old_pages, new_pages), result = 0;
19252+ unsigned long **old_ptr = bitmap[node_id][zone_idx], i;
19253+
19254+ if (new_pages) {
19255+ if (try_alloc_dyn_pageflag_part(new_pages + 2,
19256+ (void **) &new_ptr))
19257+ return -ENOMEM;
19258+
19259+ if (old_pages)
19260+ memcpy(new_ptr + 2 + copy_offset, old_ptr + 2,
19261+ sizeof(unsigned long) * to_copy);
19262+
19263+ new_ptr[0] = (void *) zone->zone_start_pfn;
19264+ new_ptr[1] = (void *) new_pages;
19265+ }
19266+
19267+ /* Free/alloc bitmap pages. */
19268+ if (old_pages > new_pages) {
19269+ for (i = new_pages + 2; i < old_pages + 2; i++)
19270+ if (old_ptr[i])
19271+ free_page((unsigned long) old_ptr[i]);
19272+ } else if (!pagemap->sparse) {
19273+ for (i = old_pages + 2; i < new_pages + 2; i++)
19274+ if (populate_bitmap_page(NULL, take_lock,
19275+ (unsigned long **) &new_ptr[i])) {
19276+ result = -ENOMEM;
19277+ break;
19278+ }
19279+ }
19280+
19281+ bitmap[node_id][zone_idx] = new_ptr;
ad8f4a28 19282+ kfree(old_ptr);
4e97e4e9 19283+ return result;
19284+}
24613191 19285+
4e97e4e9 19286+/**
19287+ * check_dyn_pageflag_range - Resize a section of a dyn_pageflag array.
24613191 19288+ *
4e97e4e9 19289+ * @pagemap: The array to be worked on.
19290+ * @zone: The zone to get in sync with reality.
24613191 19291+ *
4e97e4e9 19292+ * Check the pagemap has correct allocations for the zone. This can be
19293+ * invoked when allocating a new bitmap, or for hot[un]plug, and so
19294+ * must deal with any disparities between zone_start_pfn/spanned_pages
19295+ * and what we have allocated. In addition, we must deal with the possibility
19296+ * of zone_start_pfn having changed.
24613191 19297+ */
4e97e4e9 19298+int check_dyn_pageflag_zone(struct dyn_pageflags *pagemap, struct zone *zone,
19299+ int force_free_all, int take_lock)
19300+{
19301+ int node_id = zone_to_nid(zone), zone_idx = zone_idx(zone);
19302+ unsigned long copy_offset = 0, old_pages, new_pages;
19303+ unsigned long **old_ptr = pagemap->bitmap[node_id][zone_idx];
24613191 19304+
4e97e4e9 19305+ old_pages = old_ptr ? (unsigned long) old_ptr[1] : 0;
19306+ new_pages = force_free_all ? 0 : pages_for_span(zone->spanned_pages);
24613191 19307+
4e97e4e9 19308+ if (old_pages == new_pages &&
19309+ (!old_pages || (unsigned long) old_ptr[0] == zone->zone_start_pfn))
19310+ return 0;
24613191 19311+
4e97e4e9 19312+ if (old_pages &&
19313+ (unsigned long) old_ptr[0] != zone->zone_start_pfn)
19314+ copy_offset = pages_for_span((unsigned long) old_ptr[0] -
19315+ zone->zone_start_pfn);
24613191 19316+
4e97e4e9 19317+ /* New/expanded zone? */
19318+ return resize_zone_bitmap(pagemap, zone, old_pages, new_pages,
19319+ copy_offset, take_lock);
19320+}
19321+
19322+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
19323+/**
19324+ * dyn_pageflags_hotplug - Add pages to bitmaps for hotplugged memory.
24613191 19325+ *
4e97e4e9 19326+ * Seek to expand bitmaps for hotplugged memory. We ignore any failure.
19327+ * Since we handle sparse bitmaps anyway, they'll be automatically
19328+ * populated as needed.
24613191 19329+ */
4e97e4e9 19330+void dyn_pageflags_hotplug(struct zone *zone)
24613191 19331+{
4e97e4e9 19332+ struct dyn_pageflags *this;
24613191 19333+
4e97e4e9 19334+ list_for_each_entry(this, &flags_list, list)
19335+ check_dyn_pageflag_zone(this, zone, 0, 1);
24613191 19336+}
4e97e4e9 19337+#endif
24613191 19338+
4e97e4e9 19339+/**
19340+ * free_dyn_pageflags - Free an array of dynamically allocated pageflags.
24613191 19341+ *
4e97e4e9 19342+ * @pagemap: The array to be freed.
24613191 19343+ *
4e97e4e9 19344+ * Free a dynamically allocated pageflags bitmap.
24613191 19345+ */
4e97e4e9 19346+void free_dyn_pageflags(struct dyn_pageflags *pagemap)
24613191 19347+{
4e97e4e9 19348+ int zone_idx;
24613191 19349+ struct pglist_data *pgdat;
4e97e4e9 19350+ unsigned long flags;
24613191 19351+
4e97e4e9 19352+ DUMP_DEBUG(pagemap);
19353+
19354+ if (!pagemap->bitmap)
24613191 19355+ return;
ad8f4a28 19356+
4e97e4e9 19357+ for_each_online_pgdat_zone(pgdat, zone_idx)
19358+ check_dyn_pageflag_zone(pagemap,
19359+ &pgdat->node_zones[zone_idx], 1, 1);
24613191 19360+
19361+ for_each_online_pgdat(pgdat) {
4e97e4e9 19362+ int i = pgdat->node_id;
24613191 19363+
4e97e4e9 19364+ if (pagemap->bitmap[i])
19365+ kfree((pagemap->bitmap)[i]);
19366+ }
24613191 19367+
4e97e4e9 19368+ kfree(pagemap->bitmap);
19369+ pagemap->bitmap = NULL;
24613191 19370+
4e97e4e9 19371+ pagemap->initialised = 0;
24613191 19372+
4e97e4e9 19373+ if (!pagemap->sparse) {
19374+ spin_lock_irqsave(&flags_list_lock, flags);
19375+ list_del_init(&pagemap->list);
19376+ pagemap->sparse = 1;
19377+ spin_unlock_irqrestore(&flags_list_lock, flags);
24613191 19378+ }
24613191 19379+}
4e97e4e9 19380+EXPORT_SYMBOL_GPL(free_dyn_pageflags);
24613191 19381+
4e97e4e9 19382+/**
19383+ * allocate_dyn_pageflags - Allocate a bitmap.
19384+ *
19385+ * @pagemap: The bitmap we want to allocate.
19386+ * @sparse: Whether to make the array sparse.
24613191 19387+ *
4e97e4e9 19388+ * The array we're preparing. If sparse, we don't allocate the actual
19389+ * pages until they're needed. If not sparse, we add the bitmap to the
19390+ * list so that if we're supporting memory hotplugging, we can allocate
19391+ * new pages on hotplug events.
24613191 19392+ *
4e97e4e9 19393+ * This routine may be called directly, or indirectly when the first bit
19394+ * needs to be set on a previously unused bitmap.
24613191 19395+ */
4e97e4e9 19396+int allocate_dyn_pageflags(struct dyn_pageflags *pagemap, int sparse)
24613191 19397+{
4e97e4e9 19398+ int zone_idx, result = -ENOMEM;
24613191 19399+ struct zone *zone;
19400+ struct pglist_data *pgdat;
4e97e4e9 19401+ unsigned long flags;
24613191 19402+
4e97e4e9 19403+ if (!sparse && (pagemap->sparse || !pagemap->initialised)) {
19404+ spin_lock_irqsave(&flags_list_lock, flags);
19405+ list_add(&pagemap->list, &flags_list);
19406+ spin_unlock_irqrestore(&flags_list_lock, flags);
24613191 19407+ }
19408+
4e97e4e9 19409+ spin_lock_irqsave(&pagemap->struct_lock, flags);
24613191 19410+
4e97e4e9 19411+ pagemap->initialised = 1;
19412+ pagemap->sparse = sparse;
24613191 19413+
4e97e4e9 19414+ if (!pagemap->bitmap && try_alloc_dyn_pageflag_part((1 << NODES_WIDTH),
19415+ (void **) &pagemap->bitmap))
19416+ goto out;
24613191 19417+
19418+ for_each_online_pgdat(pgdat) {
4e97e4e9 19419+ int node_id = pgdat->node_id;
24613191 19420+
4e97e4e9 19421+ if (!pagemap->bitmap[node_id] &&
19422+ try_alloc_dyn_pageflag_part(MAX_NR_ZONES,
19423+ (void **) &(pagemap->bitmap)[node_id]))
19424+ goto out;
24613191 19425+
19426+ for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) {
24613191 19427+ zone = &pgdat->node_zones[zone_idx];
19428+
4e97e4e9 19429+ if (populated_zone(zone) &&
19430+ check_dyn_pageflag_zone(pagemap, zone, 0, 0))
19431+ goto out;
24613191 19432+ }
24613191 19433+ }
19434+
4e97e4e9 19435+ result = 0;
19436+
19437+out:
19438+ spin_unlock_irqrestore(&pagemap->struct_lock, flags);
19439+ return result;
24613191 19440+}
4e97e4e9 19441+EXPORT_SYMBOL_GPL(allocate_dyn_pageflags);
24613191 19442+
4e97e4e9 19443+/**
19444+ * test_dynpageflag - Test a page in a bitmap.
24613191 19445+ *
4e97e4e9 19446+ * @bitmap: The bitmap we're checking.
19447+ * @page: The page for which we want to test the matching bit.
24613191 19448+ *
4e97e4e9 19449+ * Test whether the bit is on in the array. The array may be sparse,
19450+ * in which case the result is zero.
24613191 19451+ */
4e97e4e9 19452+int test_dynpageflag(struct dyn_pageflags *bitmap, struct page *page)
24613191 19453+{
19454+ GET_BIT_AND_UL(bitmap, page);
4e97e4e9 19455+ return ul ? test_bit(bit, ul) : 0;
24613191 19456+}
4e97e4e9 19457+EXPORT_SYMBOL_GPL(test_dynpageflag);
19458+
19459+/**
19460+ * set_dynpageflag - Set a bit in a bitmap.
24613191 19461+ *
4e97e4e9 19462+ * @bitmap: The bitmap we're operating on.
19463+ * @page: The page for which we want to set the matching bit.
24613191 19464+ *
4e97e4e9 19465+ * Set the associated bit in the array. If the array is sparse, we
19466+ * seek to allocate the missing page.
24613191 19467+ */
4e97e4e9 19468+void set_dynpageflag(struct dyn_pageflags *pageflags, struct page *page)
19469+{
19470+ GET_BIT_AND_UL(pageflags, page);
19471+
ad8f4a28 19472+	if (!ul) {
19473+ /*
19474+ * Sparse, hotplugged or unprepared.
19475+ * Allocate / fill gaps in high levels
19476+ */
4e97e4e9 19477+ if (allocate_dyn_pageflags(pageflags, 1) ||
19478+ populate_bitmap_page(pageflags, 1, (unsigned long **)
19479+ &pageflags->bitmap[node][zone_num][pagenum])) {
19480+ printk(KERN_EMERG "Failed to allocate storage in a "
19481+ "sparse bitmap.\n");
19482+ dump_pagemap(pageflags);
19483+ BUG();
19484+ }
19485+ set_dynpageflag(pageflags, page);
19486+ } else
19487+ set_bit(bit, ul);
24613191 19488+}
4e97e4e9 19489+EXPORT_SYMBOL_GPL(set_dynpageflag);
19490+
19491+/**
19492+ * clear_dynpageflag - Clear a bit in a bitmap.
24613191 19493+ *
4e97e4e9 19494+ * @bitmap: The bitmap we're operating on.
19495+ * @page: The page for which we want to clear the matching bit.
24613191 19496+ *
4e97e4e9 19497+ * Clear the associated bit in the array. It is not an error to be asked
19498+ * to clear a bit on a page we haven't allocated.
24613191 19499+ */
4e97e4e9 19500+void clear_dynpageflag(struct dyn_pageflags *bitmap, struct page *page)
24613191 19501+{
19502+ GET_BIT_AND_UL(bitmap, page);
4e97e4e9 19503+ if (ul)
19504+ clear_bit(bit, ul);
24613191 19505+}
4e97e4e9 19506+EXPORT_SYMBOL_GPL(clear_dynpageflag);
19507+
19508+/**
19509+ * get_next_bit_on - Get the next bit in a bitmap.
24613191 19510+ *
4e97e4e9 19511+ * @pageflags: The bitmap we're searching.
19512+ * @counter: The previous pfn. We always return a value > this.
24613191 19513+ *
4e97e4e9 19514+ * Given a pfn (possibly max_pfn+1), find the next pfn in the bitmap that
19515+ * is set. If there are no more flags set, return max_pfn+1.
24613191 19516+ */
4e97e4e9 19517+unsigned long get_next_bit_on(struct dyn_pageflags *pageflags,
19518+ unsigned long counter)
24613191 19519+{
19520+ struct page *page;
19521+ struct zone *zone;
19522+ unsigned long *ul = NULL;
19523+ unsigned long zone_offset;
19524+ int pagebit, zone_num, first = (counter == (max_pfn + 1)), node;
19525+
19526+ if (first)
19527+ counter = first_online_pgdat()->node_zones->zone_start_pfn;
19528+
19529+ page = pfn_to_page(counter);
19530+ zone = page_zone(page);
19531+ node = zone->zone_pgdat->node_id;
19532+ zone_num = zone_idx(zone);
19533+ zone_offset = counter - zone->zone_start_pfn;
19534+
19535+ if (first)
19536+ goto test;
19537+
19538+ do {
19539+ zone_offset++;
ad8f4a28 19540+
24613191 19541+ if (zone_offset >= zone->spanned_pages) {
19542+ do {
19543+ zone = next_zone(zone);
19544+ if (!zone)
19545+ return max_pfn + 1;
ad8f4a28 19546+			} while (!zone->spanned_pages);
19547+
24613191 19548+ zone_num = zone_idx(zone);
19549+ node = zone->zone_pgdat->node_id;
19550+ zone_offset = 0;
19551+ }
19552+test:
19553+ pagebit = PAGEBIT(zone_offset);
19554+
4e97e4e9 19555+ if (!pagebit || !ul) {
ad8f4a28 19556+			ul = pageflags->bitmap[node][zone_num]
19557+ [PAGENUMBER(zone_offset)+2];
4e97e4e9 19558+ if (ul)
ad8f4a28 19559+ ul += PAGEINDEX(zone_offset);
4e97e4e9 19560+ else {
19561+ PR_DEBUG("Unallocated page. Skipping from zone"
19562+ " offset %lu to the start of the next "
19563+ "one.\n", zone_offset);
19564+ zone_offset = roundup(zone_offset + 1,
19565+ PAGE_SIZE << 3) - 1;
ad8f4a28 19566+				PR_DEBUG("New zone offset is %lu.\n",
19567+ zone_offset);
4e97e4e9 19568+ continue;
19569+ }
19570+ }
24613191 19571+
4e97e4e9 19572+ if (!ul || !(*ul & ~((1 << pagebit) - 1))) {
24613191 19573+ zone_offset += BITS_PER_LONG - pagebit - 1;
19574+ continue;
19575+ }
19576+
ad8f4a28 19577+ } while (!ul || !test_bit(pagebit, ul));
24613191 19578+
19579+ return zone->zone_start_pfn + zone_offset;
19580+}
4e97e4e9 19581+EXPORT_SYMBOL_GPL(get_next_bit_on);
19582+
19583+#ifdef SELF_TEST
19584+#include <linux/jiffies.h>
19585+
19586+static __init int dyn_pageflags_test(void)
24613191 19587+{
4e97e4e9 19588+ struct dyn_pageflags test_map;
19589+ struct page *test_page1 = pfn_to_page(1);
19590+ unsigned long pfn = 0, start, end;
19591+ int i, iterations;
24613191 19592+
4e97e4e9 19593+ memset(&test_map, 0, sizeof(test_map));
24613191 19594+
4e97e4e9 19595+ printk("Dynpageflags testing...\n");
24613191 19596+
ad8f4a28 19597+ printk(KERN_INFO "Set page 1...");
4e97e4e9 19598+ set_dynpageflag(&test_map, test_page1);
19599+ if (test_dynpageflag(&test_map, test_page1))
ad8f4a28 19600+ printk(KERN_INFO "Ok.\n");
4e97e4e9 19601+ else
ad8f4a28 19602+ printk(KERN_INFO "FAILED.\n");
4e97e4e9 19603+
ad8f4a28 19604+ printk(KERN_INFO "Test memory hotplugging #1 ...");
4e97e4e9 19605+ {
19606+ unsigned long orig_size;
19607+ GET_BIT_AND_UL(&test_map, test_page1);
19608+ orig_size = (unsigned long) test_map.bitmap[node][zone_num][1];
19609+ /*
19610+ * Use the code triggered when zone_start_pfn lowers,
19611+ * checking that our bit is then set in the third page.
19612+ */
ad8f4a28 19613+		resize_zone_bitmap(&test_map, zone, orig_size,
 19614+				orig_size + 2, 2, 0);
4e97e4e9 19615+ DUMP_DEBUG(&test_map);
ad8f4a28 19616+		if ((unsigned long) test_map.bitmap[node][zone_num]
19617+ [pagenum + 2] &&
19618+ (unsigned long) test_map.bitmap[node][zone_num]
19619+ [pagenum + 2][0] == 2UL)
19620+ printk(KERN_INFO "Ok.\n");
4e97e4e9 19621+ else
ad8f4a28 19622+ printk(KERN_INFO "FAILED.\n");
4e97e4e9 19623+ }
19624+
ad8f4a28 19625+ printk(KERN_INFO "Test memory hotplugging #2 ...");
4e97e4e9 19626+ {
19627+ /*
19628+ * Test expanding bitmap length.
19629+ */
19630+ unsigned long orig_size;
19631+ GET_BIT_AND_UL(&test_map, test_page1);
ad8f4a28 19632+		orig_size = (unsigned long) test_map.bitmap[node]
19633+ [zone_num][1];
 19634+		resize_zone_bitmap(&test_map, zone, orig_size,
 19635+				orig_size + 2, 0, 0);
4e97e4e9 19636+ DUMP_DEBUG(&test_map);
19637+ pagenum += 2; /* Offset for first test */
19638+ if (test_map.bitmap[node][zone_num][pagenum] &&
19639+ test_map.bitmap[node][zone_num][pagenum][0] == 2UL &&
19640+ (unsigned long) test_map.bitmap[node][zone_num][1] ==
ad8f4a28 19641+				orig_size + 2)
19642+ printk(KERN_INFO "Ok.\n");
4e97e4e9 19643+ else
ad8f4a28 19644+			printk(KERN_INFO "FAILED ([%d][%d][%d]: %p && %lu == "
19645+ "2UL && %p == %lu).\n",
19646+ node, zone_num, pagenum,
19647+ test_map.bitmap[node][zone_num][pagenum],
19648+ test_map.bitmap[node][zone_num][pagenum] ?
19649+ test_map.bitmap[node][zone_num][pagenum][0] : 0,
19650+ test_map.bitmap[node][zone_num][1],
19651+ orig_size + 2);
4e97e4e9 19652+ }
19653+
19654+ free_dyn_pageflags(&test_map);
19655+
19656+ allocate_dyn_pageflags(&test_map, 0);
19657+
19658+ start = jiffies;
19659+
19660+ iterations = 25000000 / max_pfn;
19661+
19662+ for (i = 0; i < iterations; i++) {
19663+ for (pfn = 0; pfn < max_pfn; pfn++)
19664+ set_dynpageflag(&test_map, pfn_to_page(pfn));
19665+ for (pfn = 0; pfn < max_pfn; pfn++)
19666+ clear_dynpageflag(&test_map, pfn_to_page(pfn));
19667+ }
19668+
19669+ end = jiffies;
19670+
19671+ free_dyn_pageflags(&test_map);
19672+
ad8f4a28 19673+	printk(KERN_INFO "Dyn: %d iterations of setting & clearing all %lu "
19674+ "flags took %lu jiffies.\n",
4e97e4e9 19675+ iterations, max_pfn, end - start);
19676+
19677+ start = jiffies;
19678+
19679+ for (i = 0; i < iterations; i++) {
19680+ for (pfn = 0; pfn < max_pfn; pfn++)
19681+ set_bit(7, &(pfn_to_page(pfn))->flags);
19682+ for (pfn = 0; pfn < max_pfn; pfn++)
19683+ clear_bit(7, &(pfn_to_page(pfn))->flags);
19684+ }
19685+
19686+ end = jiffies;
19687+
ad8f4a28 19688+	printk(KERN_INFO "Real flags: %d iterations of setting & clearing "
19689+ "all %lu flags took %lu jiffies.\n",
4e97e4e9 19690+ iterations, max_pfn, end - start);
19691+
19692+ iterations = 25000000;
19693+
19694+ start = jiffies;
19695+
19696+ for (i = 0; i < iterations; i++) {
19697+ set_dynpageflag(&test_map, pfn_to_page(1));
19698+ clear_dynpageflag(&test_map, pfn_to_page(1));
19699+ }
19700+
19701+ end = jiffies;
19702+
ad8f4a28 19703+	printk(KERN_INFO "Dyn: %d iterations of setting & clearing all one "
19704+ "flag took %lu jiffies.\n", iterations, end - start);
4e97e4e9 19705+
19706+ start = jiffies;
19707+
19708+ for (i = 0; i < iterations; i++) {
19709+ set_bit(7, &(pfn_to_page(1))->flags);
19710+ clear_bit(7, &(pfn_to_page(1))->flags);
19711+ }
19712+
19713+ end = jiffies;
19714+
ad8f4a28 19715+	printk(KERN_INFO "Real pageflag: %d iterations of setting & clearing "
19716+ "all one flag took %lu jiffies.\n",
4e97e4e9 19717+ iterations, end - start);
19718+ return 0;
24613191 19719+}
19720+
4e97e4e9 19721+late_initcall(dyn_pageflags_test);
19722+#endif
19723+
19724+static int __init dyn_pageflags_debug_setup(char *str)
19725+{
ad8f4a28 19726+ printk(KERN_INFO "Dynamic pageflags debugging enabled.\n");
4e97e4e9 19727+ dyn_pageflags_debug = 1;
19728+ return 1;
19729+}
19730+
19731+__setup("dyn_pageflags_debug", dyn_pageflags_debug_setup);
19732diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
7f9d2ee0 19733index 7469c50..280a8b8 100644
4e97e4e9 19734--- a/mm/memory_hotplug.c
19735+++ b/mm/memory_hotplug.c
ad8f4a28 19736@@ -77,6 +77,8 @@ static int __add_zone(struct zone *zone, unsigned long phys_start_pfn)
4e97e4e9 19737 }
19738 memmap_init_zone(nr_pages, nid, zone_type,
19739 phys_start_pfn, MEMMAP_HOTPLUG);
19740+
19741+ dyn_pageflags_hotplug(zone);
19742 return 0;
19743 }
19744
ad8f4a28 19745diff --git a/mm/page_alloc.c b/mm/page_alloc.c
7f9d2ee0 19746index 402a504..7dd7430 100644
ad8f4a28 19747--- a/mm/page_alloc.c
19748+++ b/mm/page_alloc.c
7f9d2ee0 19749@@ -1730,6 +1730,26 @@ static unsigned int nr_free_zone_pages(int offset)
ad8f4a28 19750 	return sum;
19751 }
19752
19753+static unsigned int nr_unallocated_zone_pages(int offset)
19754+{
19755+ /* Just pick one node, since fallback list is circular */
19756+ pg_data_t *pgdat = NODE_DATA(numa_node_id());
19757+ unsigned int sum = 0;
19758+
19759+ struct zonelist *zonelist = pgdat->node_zonelists + offset;
19760+ struct zone **zonep = zonelist->zones;
19761+ struct zone *zone;
19762+
19763+ for (zone = *zonep++; zone; zone = *zonep++) {
19764+ unsigned long high = zone->pages_high;
19765+ unsigned long left = zone_page_state(zone, NR_FREE_PAGES);
19766+ if (left > high)
19767+ sum += left - high;
19768+ }
19769+
19770+ return sum;
19771+}
19772+
19773 /*
19774 * Amount of free RAM allocatable within ZONE_DMA and ZONE_NORMAL
19775 */
7f9d2ee0 19776@@ -1740,6 +1760,15 @@ unsigned int nr_free_buffer_pages(void)
ad8f4a28 19777 EXPORT_SYMBOL_GPL(nr_free_buffer_pages);
19778
19779 /*
19780+ * Amount of free RAM allocatable within ZONE_DMA and ZONE_NORMAL
19781+ */
19782+unsigned int nr_unallocated_buffer_pages(void)
19783+{
19784+ return nr_unallocated_zone_pages(gfp_zone(GFP_USER));
19785+}
19786+EXPORT_SYMBOL_GPL(nr_unallocated_buffer_pages);
19787+
19788+/*
19789 * Amount of free RAM allocatable within all zones
19790 */
19791 unsigned int nr_free_pagecache_pages(void)
4e97e4e9 19792diff --git a/mm/vmscan.c b/mm/vmscan.c
7f9d2ee0 19793index 4046434..718226e 100644
4e97e4e9 19794--- a/mm/vmscan.c
19795+++ b/mm/vmscan.c
7f9d2ee0 19796@@ -776,6 +776,28 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
24613191 19797 return nr_taken;
19798 }
19799
19800+/* return_lru_pages puts a list of pages back on a zone's lru lists. */
19801+
19802+static void return_lru_pages(struct list_head *page_list, struct zone *zone,
19803+ struct pagevec *pvec)
19804+{
19805+ while (!list_empty(page_list)) {
19806+ struct page *page = lru_to_page(page_list);
19807+ VM_BUG_ON(PageLRU(page));
19808+ SetPageLRU(page);
19809+ list_del(&page->lru);
19810+ if (PageActive(page))
19811+ add_page_to_active_list(zone, page);
19812+ else
19813+ add_page_to_inactive_list(zone, page);
19814+ if (!pagevec_add(pvec, page)) {
19815+ spin_unlock_irq(&zone->lru_lock);
19816+ __pagevec_release(pvec);
19817+ spin_lock_irq(&zone->lru_lock);
19818+ }
19819+ }
19820+}
19821+
7f9d2ee0 19822 static unsigned long isolate_pages_global(unsigned long nr,
19823 struct list_head *dst,
19824 unsigned long *scanned, int order,
19825@@ -826,7 +848,6 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
24613191 19826 lru_add_drain();
19827 spin_lock_irq(&zone->lru_lock);
19828 do {
19829- struct page *page;
19830 unsigned long nr_taken;
19831 unsigned long nr_scan;
19832 unsigned long nr_freed;
7f9d2ee0 19833@@ -888,21 +909,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
24613191 19834 /*
19835 * Put back any unfreeable pages.
19836 */
19837- while (!list_empty(&page_list)) {
19838- page = lru_to_page(&page_list);
19839- VM_BUG_ON(PageLRU(page));
19840- SetPageLRU(page);
19841- list_del(&page->lru);
19842- if (PageActive(page))
19843- add_page_to_active_list(zone, page);
19844- else
19845- add_page_to_inactive_list(zone, page);
19846- if (!pagevec_add(&pvec, page)) {
19847- spin_unlock_irq(&zone->lru_lock);
19848- __pagevec_release(&pvec);
19849- spin_lock_irq(&zone->lru_lock);
19850- }
19851- }
19852+ return_lru_pages(&page_list, zone, &pvec);
19853 } while (nr_scanned < max_scan);
19854 spin_unlock(&zone->lru_lock);
19855 done:
7f9d2ee0 19856@@ -1625,6 +1632,72 @@ out:
24613191 19857 return nr_reclaimed;
19858 }
19859
19860+struct lru_save {
19861+ struct zone *zone;
19862+ struct list_head active_list;
19863+ struct list_head inactive_list;
19864+ struct lru_save *next;
19865+};
19866+
19867+struct lru_save *lru_save_list;
19868+
19869+void unlink_lru_lists(void)
19870+{
19871+ struct zone *zone;
19872+
19873+ for_each_zone(zone) {
19874+ struct lru_save *this;
19875+ unsigned long moved, scanned;
19876+
19877+ if (!zone->spanned_pages)
19878+ continue;
19879+
19880+ this = (struct lru_save *)
19881+ kzalloc(sizeof(struct lru_save), GFP_ATOMIC);
19882+
19883+ BUG_ON(!this);
19884+
19885+ this->next = lru_save_list;
19886+ lru_save_list = this;
ad8f4a28 19887+
24613191 19888+ this->zone = zone;
19889+
19890+ spin_lock_irq(&zone->lru_lock);
19891+ INIT_LIST_HEAD(&this->active_list);
19892+ INIT_LIST_HEAD(&this->inactive_list);
73c609d5 19893+ moved = isolate_lru_pages(zone_page_state(zone, NR_ACTIVE),
24613191 19894+ &zone->active_list, &this->active_list,
4e97e4e9 19895+ &scanned, 0, ISOLATE_BOTH);
73c609d5 19896+ __mod_zone_page_state(zone, NR_ACTIVE, -moved);
19897+ moved = isolate_lru_pages(zone_page_state(zone, NR_INACTIVE),
24613191 19898+ &zone->inactive_list, &this->inactive_list,
4e97e4e9 19899+ &scanned, 0, ISOLATE_BOTH);
73c609d5 19900+ __mod_zone_page_state(zone, NR_INACTIVE, -moved);
24613191 19901+ spin_unlock_irq(&zone->lru_lock);
19902+ }
19903+}
19904+
19905+void relink_lru_lists(void)
19906+{
ad8f4a28 19907+ while (lru_save_list) {
24613191 19908+ struct lru_save *this = lru_save_list;
19909+ struct zone *zone = this->zone;
19910+ struct pagevec pvec;
19911+
19912+ pagevec_init(&pvec, 1);
19913+
19914+ lru_save_list = this->next;
19915+
19916+ spin_lock_irq(&zone->lru_lock);
19917+ return_lru_pages(&this->active_list, zone, &pvec);
19918+ return_lru_pages(&this->inactive_list, zone, &pvec);
19919+ spin_unlock_irq(&zone->lru_lock);
19920+ pagevec_release(&pvec);
19921+
19922+ kfree(this);
19923+ }
19924+}
19925+
19926 /*
19927 * The background pageout daemon, started as a kernel thread
19928 * from the init process.
7f9d2ee0 19929@@ -1710,6 +1783,9 @@ void wakeup_kswapd(struct zone *zone, int order)
e8d0ad9d 19930 if (!populated_zone(zone))
19931 return;
19932
19933+ if (freezer_is_on())
19934+ return;
19935+
19936 pgdat = zone->zone_pgdat;
19937 if (zone_watermark_ok(zone, order, zone->pages_low, 0, 0))
19938 return;
7f9d2ee0 19939@@ -1723,6 +1799,109 @@ void wakeup_kswapd(struct zone *zone, int order)
24613191 19940 }
19941
19942 #ifdef CONFIG_PM
ad8f4a28 19943+static unsigned long shrink_ps1_zone(struct zone *zone,
19944+ unsigned long total_to_free, struct scan_control sc)
24613191 19945+{
4e97e4e9 19946+ unsigned long freed = 0;
24613191 19947+
4e97e4e9 19948+ while (total_to_free > freed) {
19949+ unsigned long nr_slab = global_page_state(NR_SLAB_RECLAIMABLE);
19950+ struct reclaim_state reclaim_state;
24613191 19951+
4e97e4e9 19952+ if (nr_slab > total_to_free)
19953+ nr_slab = total_to_free;
24613191 19954+
4e97e4e9 19955+ reclaim_state.reclaimed_slab = 0;
19956+ shrink_slab(nr_slab, sc.gfp_mask, nr_slab);
19957+ if (!reclaim_state.reclaimed_slab)
19958+ return freed;
19959+
19960+ freed += reclaim_state.reclaimed_slab;
19961+ }
19962+
19963+ return freed;
19964+}
19965+
19966+unsigned long shrink_ps2_zone(struct zone *zone, unsigned long total_to_free,
19967+ struct scan_control sc)
19968+{
19969+ int prio;
19970+ unsigned long freed = 0;
ad8f4a28 19971+ if (!populated_zone(zone) || zone_is_all_unreclaimable(zone))
4e97e4e9 19972+ return 0;
24613191 19973+
19974+ for (prio = DEF_PRIORITY; prio >= 0; prio--) {
73c609d5 19975+ unsigned long to_free, just_freed, orig_size;
19976+ unsigned long old_nr_active;
24613191 19977+
73c609d5 19978+ to_free = min(zone_page_state(zone, NR_ACTIVE) +
19979+ zone_page_state(zone, NR_INACTIVE),
4e97e4e9 19980+ total_to_free - freed);
24613191 19981+
19982+ if (to_free <= 0)
4e97e4e9 19983+ return freed;
24613191 19984+
73c609d5 19985+ sc.swap_cluster_max = to_free -
19986+ zone_page_state(zone, NR_INACTIVE);
24613191 19987+
73c609d5 19988+ do {
19989+ old_nr_active = zone_page_state(zone, NR_ACTIVE);
24613191 19990+ zone->nr_scan_active = sc.swap_cluster_max - 1;
19991+ shrink_active_list(sc.swap_cluster_max, zone, &sc,
19992+ prio);
19993+ zone->nr_scan_active = 0;
24613191 19994+
73c609d5 19995+ sc.swap_cluster_max = to_free - zone_page_state(zone,
19996+ NR_INACTIVE);
24613191 19997+
73c609d5 19998+ } while (sc.swap_cluster_max > 0 &&
19999+ zone_page_state(zone, NR_ACTIVE) > old_nr_active);
20000+
20001+ to_free = min(zone_page_state(zone, NR_ACTIVE) +
20002+ zone_page_state(zone, NR_INACTIVE),
4e97e4e9 20003+ total_to_free - freed);
ad8f4a28 20004+
73c609d5 20005+ do {
20006+ orig_size = zone_page_state(zone, NR_ACTIVE) +
20007+ zone_page_state(zone, NR_INACTIVE);
24613191 20008+ zone->nr_scan_inactive = to_free;
20009+ sc.swap_cluster_max = to_free;
73c609d5 20010+ shrink_inactive_list(to_free, zone, &sc);
20011+ just_freed = (orig_size -
20012+ (zone_page_state(zone, NR_ACTIVE) +
20013+ zone_page_state(zone, NR_INACTIVE)));
24613191 20014+ zone->nr_scan_inactive = 0;
4e97e4e9 20015+ freed += just_freed;
20016+ } while (just_freed > 0 && freed < total_to_free);
20017+ }
24613191 20018+
4e97e4e9 20019+ return freed;
20020+}
24613191 20021+
ad8f4a28 20022+void shrink_one_zone(struct zone *zone, unsigned long total_to_free,
20023+ int ps_wanted)
4e97e4e9 20024+{
20025+ unsigned long freed = 0;
20026+ struct scan_control sc = {
20027+ .gfp_mask = GFP_KERNEL,
20028+ .may_swap = 0,
20029+ .may_writepage = 1,
20030+ .swappiness = vm_swappiness,
7f9d2ee0 20031+ .isolate_pages = isolate_pages_global,
4e97e4e9 20032+ };
24613191 20033+
4e97e4e9 20034+ if (total_to_free <= 0)
20035+ return;
24613191 20036+
4e97e4e9 20037+ if (is_highmem(zone))
20038+ sc.gfp_mask |= __GFP_HIGHMEM;
24613191 20039+
4e97e4e9 20040+ if (ps_wanted & 2)
20041+ freed = shrink_ps2_zone(zone, total_to_free, sc);
20042+ if (ps_wanted & 1)
20043+ shrink_ps1_zone(zone, total_to_free - freed, sc);
24613191 20044+}
24613191 20045+
20046 /*
20047 * Helper function for shrink_all_memory(). Tries to reclaim 'nr_pages' pages
20048 * from LRU lists system-wide, for given pass and priority, and returns the