1 diff -urN linux-2.4.20/Documentation/sched-coding.txt linux-2.4.20-o1/Documentation/sched-coding.txt
2 --- linux-2.4.20/Documentation/sched-coding.txt Thu Jan 1 01:00:00 1970
3 +++ linux-2.4.20-o1/Documentation/sched-coding.txt Wed Mar 12 00:41:43 2003
5 + Reference for various scheduler-related methods in the O(1) scheduler
6 + Robert Love <rml@tech9.net>, MontaVista Software
9 +Note most of these methods are local to kernel/sched.c - this is by design.
10 +The scheduler is meant to be self-contained and abstracted away. This document
11 +is primarily for understanding the scheduler, not interfacing to it. Some of
12 +the discussed interfaces, however, are general process/scheduling methods.
13 +They are typically defined in include/linux/sched.h.
16 +Main Scheduling Methods
17 +-----------------------
19 +void load_balance(runqueue_t *this_rq, int idle)
20 + Attempts to pull tasks from one cpu to another to balance cpu usage,
21 + if needed. This method is called explicitly if the runqueues are
22 +	imbalanced or periodically by the timer tick. Prior to calling,
23 + the current runqueue must be locked and interrupts disabled.
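A minimal sketch of calling it from the timer tick path, under the locking
requirements just described (illustrative only):

	runqueue_t *rq = this_rq();

	spin_lock(&rq->lock);		/* interrupts already off in the tick */
	load_balance(rq, 0);		/* 0: this cpu is not idle */
	spin_unlock(&rq->lock);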
25 +void schedule()
26 +	The main scheduling function. Upon return, the highest priority
27 +	process will be active.
30 +Locking
31 +-------
33 +Each runqueue has its own lock, rq->lock. When multiple runqueues need
34 +to be locked, lock acquires must be ordered by ascending &runqueue value.
36 +A specific runqueue is locked via
38 +	task_rq_lock(task_t *p, unsigned long *flags)
40 +which disables preemption, disables interrupts, and locks the runqueue p is
41 +running on. Likewise,
43 +	task_rq_unlock(runqueue_t *rq, unsigned long *flags)
45 +unlocks the given runqueue, restores interrupts to their previous
46 +state, and reenables preemption.
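A typical usage pattern (a sketch, assuming task_rq_lock() returns the
locked runqueue, as it does in kernel/sched.c):

	unsigned long flags;
	runqueue_t *rq;

	rq = task_rq_lock(p, &flags);	/* lock the runqueue p runs on */
	/* ... examine or modify scheduler state for p ... */
	task_rq_unlock(rq, &flags);	/* unlock, restore irq state */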
48 +The routines
50 +	double_rq_lock(runqueue_t *rq1, runqueue_t *rq2)
52 +and
54 +	double_rq_unlock(runqueue_t *rq1, runqueue_t *rq2)
56 +safely lock and unlock, respectively, the two specified runqueues. They do
57 +not, however, disable and restore interrupts. Users are required to do so
58 +manually before and after calls.
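The ascending-&runqueue ordering rule above is what makes these helpers
deadlock-free; a sketch of the idea (the version in kernel/sched.c is
equivalent in spirit):

	static inline void double_rq_lock(runqueue_t *rq1, runqueue_t *rq2)
	{
		if (rq1 == rq2)
			spin_lock(&rq1->lock);
		else if (rq1 < rq2) {		/* ascending address order */
			spin_lock(&rq1->lock);
			spin_lock(&rq2->lock);
		} else {
			spin_lock(&rq2->lock);
			spin_lock(&rq1->lock);
		}
	}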
61 +Values
62 +------
64 +MAX_PRIO
65 +	The maximum priority of the system, stored in the task as task->prio.
66 +	Lower values mean higher priority. Normal (non-RT) priorities range
67 +	from MAX_RT_PRIO to (MAX_PRIO - 1).
68 +MAX_RT_PRIO
69 +	The maximum real-time priority of the system. Valid RT priorities
70 +	range from 0 to (MAX_RT_PRIO - 1).
71 +MAX_USER_RT_PRIO
72 +	The maximum real-time priority that is exported to user-space. Should
73 +	always be equal to or less than MAX_RT_PRIO. Setting it less allows
74 +	kernel threads to have higher priorities than any user-space task.
75 +MIN_TIMESLICE
76 +MAX_TIMESLICE
77 +	Respectively, the minimum and maximum timeslices (quanta) of a process.
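How a task's static priority maps into the [MIN_TIMESLICE, MAX_TIMESLICE]
range varies between revisions of the O(1) patch; a hypothetical
linear-scaling sketch (names are illustrative, not the patch's actual ones):

	static inline unsigned int base_timeslice(task_t *p)
	{
		int user_prio = p->static_prio - MAX_RT_PRIO;
		int max_user_prio = MAX_PRIO - MAX_RT_PRIO;

		/* higher priority (lower value) => longer timeslice */
		return MIN_TIMESLICE + (MAX_TIMESLICE - MIN_TIMESLICE) *
			(max_user_prio - 1 - user_prio) / (max_user_prio - 1);
	}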
80 +Data
81 +----
82 +runqueue_t
83 +	The main per-CPU runqueue data structure.
84 +task_t
85 +	The main per-process data structure.
88 +General Methods
89 +---------------
91 +cpu_rq(cpu)
92 +	Returns the runqueue of the specified cpu.
93 +this_rq()
94 +	Returns the runqueue of the current cpu.
95 +task_rq(pid)
96 +	Returns the runqueue which holds the specified pid.
97 +cpu_curr(cpu)
98 +	Returns the task currently running on the given cpu.
99 +rt_task(pid)
100 +	Returns true if pid is real-time, false if not.
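A short usage sketch of the accessors above (illustrative only):

	runqueue_t *rq = cpu_rq(cpu);	/* that cpu's runqueue */
	task_t *curr = cpu_curr(cpu);	/* task running there  */

	if (rt_task(curr))
		printk("cpu %d is running an RT task, prio %d\n",
			cpu, curr->prio);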
103 +Process Control Methods
104 +-----------------------
106 +void set_user_nice(task_t *p, long nice)
107 + Sets the "nice" value of task p to the given value.
108 +int setscheduler(pid_t pid, int policy, struct sched_param *param)
109 + Sets the scheduling policy and parameters for the given pid.
110 +void set_cpus_allowed(task_t *p, unsigned long new_mask)
111 + Sets a given task's CPU affinity and migrates it to a proper cpu.
112 +	Callers must have a valid reference to the task and ensure the
113 +	task does not exit prematurely. No locks can be held during the call.
114 +set_task_state(tsk, state_value)
115 + Sets the given task's state to the given value.
116 +set_current_state(state_value)
117 + Sets the current task's state to the given value.
118 +void set_tsk_need_resched(struct task_struct *tsk)
119 + Sets need_resched in the given task.
120 +void clear_tsk_need_resched(struct task_struct *tsk)
121 + Clears need_resched in the given task.
122 +void set_need_resched()
123 + Sets need_resched in the current task.
124 +void clear_need_resched()
125 + Clears need_resched in the current task.
126 +int need_resched()
127 +	Returns true if need_resched is set in the current task, false
128 +	otherwise.
129 +void yield()
130 +	Place the current process at the end of the runqueue and call schedule.
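The state helpers above are used in the kernel's classic sleeping pattern;
a minimal sketch:

	set_current_state(TASK_INTERRUPTIBLE);
	schedule_timeout(HZ);		/* sleep for about one second */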
131 diff -urN linux-2.4.20/Documentation/sched-design.txt linux-2.4.20-o1/Documentation/sched-design.txt
132 --- linux-2.4.20/Documentation/sched-design.txt Thu Jan 1 01:00:00 1970
133 +++ linux-2.4.20-o1/Documentation/sched-design.txt Wed Mar 12 00:41:43 2003
135 + Goals, Design and Implementation of the
136 + new ultra-scalable O(1) scheduler
139 + This is an edited version of an email Ingo Molnar sent to
140 + lkml on 4 Jan 2002. It describes the goals, design, and
141 + implementation of Ingo's new ultra-scalable O(1) scheduler.
142 + Last Updated: 18 April 2002.
145 +Goals
146 +=====
148 +The main goal of the new scheduler is to keep all the good things we know
149 +and love about the current Linux scheduler:
151 + - good interactive performance even during high load: if the user
152 + types or clicks then the system must react instantly and must execute
153 + the user tasks smoothly, even during considerable background load.
155 + - good scheduling/wakeup performance with 1-2 runnable processes.
157 + - fairness: no process should stay without any timeslice for any
158 + unreasonable amount of time. No process should get an unjustly high
159 + amount of CPU time.
161 + - priorities: less important tasks can be started with lower priority,
162 + more important tasks with higher priority.
164 + - SMP efficiency: no CPU should stay idle if there is work to do.
166 + - SMP affinity: processes which run on one CPU should stay affine to
167 + that CPU. Processes should not bounce between CPUs too frequently.
169 + - plus additional scheduler features: RT scheduling, CPU binding.
171 +and the goal is also to add a few new things:
173 + - fully O(1) scheduling. Are you tired of the recalculation loop
174 + blowing the L1 cache away every now and then? Do you think the goodness
175 + loop is taking a bit too long to finish if there are lots of runnable
176 + processes? This new scheduler takes no prisoners: wakeup(), schedule(),
177 + the timer interrupt are all O(1) algorithms. There is no recalculation
178 + loop. There is no goodness loop either.
180 + - 'perfect' SMP scalability. With the new scheduler there is no 'big'
181 + runqueue_lock anymore - it's all per-CPU runqueues and locks - two
182 + tasks on two separate CPUs can wake up, schedule and context-switch
183 + completely in parallel, without any interlocking. All
184 + scheduling-relevant data is structured for maximum scalability.
186 + - better SMP affinity. The old scheduler has a particular weakness that
187 +	causes random bouncing of tasks between CPUs whenever there are higher
188 +	priority or interactive tasks; this was observed and reported by many
189 + people. The reason is that the timeslice recalculation loop first needs
190 + every currently running task to consume its timeslice. But when this
191 +	happens on e.g. an 8-way system, this property starves an
192 + increasing number of CPUs from executing any process. Once the last
193 + task that has a timeslice left has finished using up that timeslice,
194 + the recalculation loop is triggered and other CPUs can start executing
195 + tasks again - after having idled around for a number of timer ticks.
196 + The more CPUs, the worse this effect.
198 + Furthermore, this same effect causes the bouncing effect as well:
199 + whenever there is such a 'timeslice squeeze' of the global runqueue,
200 + idle processors start executing tasks which are not affine to that CPU.
201 + (because the affine tasks have finished off their timeslices already.)
203 + The new scheduler solves this problem by distributing timeslices on a
204 +	per-CPU basis, without having any global synchronization or
205 +	recalculation.
207 + - batch scheduling. A significant proportion of computing-intensive tasks
208 + benefit from batch-scheduling, where timeslices are long and processes
209 +	are round-robin scheduled. The new scheduler does such batch-scheduling
210 + of the lowest priority tasks - so nice +19 jobs will get
211 + 'batch-scheduled' automatically. With this scheduler, nice +19 jobs are
212 + in essence SCHED_IDLE, from an interactiveness point of view.
214 +	- handle extreme loads more smoothly, without breakdown and scheduling
215 +	storms.
217 + - O(1) RT scheduling. For those RT folks who are paranoid about the
218 + O(nr_running) property of the goodness loop and the recalculation loop.
220 +	- run fork()ed children before the parent. Andrea pointed out the
221 + advantages of this a few months ago, but patches for this feature
222 + do not work with the old scheduler as well as they should,
223 + because idle processes often steal the new child before the fork()ing
224 + CPU gets to execute it.
227 +Design
228 +======
230 +the core of the new scheduler consists of the following mechanisms:
232 + - *two*, priority-ordered 'priority arrays' per CPU. There is an 'active'
233 + array and an 'expired' array. The active array contains all tasks that
234 + are affine to this CPU and have timeslices left. The expired array
235 + contains all tasks which have used up their timeslices - but this array
236 +	is kept sorted as well. The active and expired arrays are not accessed
237 +	directly; they are accessed through two pointers in the per-CPU runqueue
238 + structure. If all active tasks are used up then we 'switch' the two
239 + pointers and from now on the ready-to-go (former-) expired array is the
240 +	active array - and the empty active array serves as the new collector
241 +	for expired tasks.
243 + - there is a 64-bit bitmap cache for array indices. Finding the highest
244 + priority task is thus a matter of two x86 BSFL bit-search instructions.
246 +the split-array solution enables us to have an arbitrary number of active
247 +and expired tasks, and the recalculation of timeslices can be done
248 +immediately when the timeslice expires. Because the arrays are always
249 +accessed through the pointers in the runqueue, switching the two arrays can
250 +be done very quickly.
252 +this is a hybrid priority-list approach coupled with round-robin
253 +scheduling and the array-switch method of distributing timeslices.
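A condensed sketch of these two mechanisms, in the spirit of the patch's
kernel/sched.c (locking and bookkeeping omitted):

	prio_array_t *array, *tmp;
	struct list_head *queue;
	task_t *next;
	int idx;

	/* array switch, once the active array drains */
	if (unlikely(!rq->active->nr_active)) {
		tmp = rq->active;
		rq->active = rq->expired;
		rq->expired = tmp;
	}

	/* O(1) pick-next: highest-priority non-empty queue */
	array = rq->active;
	idx = sched_find_first_bit(array->bitmap);
	queue = array->queue + idx;
	next = list_entry(queue->next, task_t, run_list);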
255 + - there is a per-task 'load estimator'.
257 +one of the toughest things to get right is good interactive feel during
258 +heavy system load. While playing with various scheduler variants i found
259 +that the best interactive feel is achieved not by 'boosting' interactive
260 +tasks, but by 'punishing' tasks that want to use more CPU time than there
261 +is available. This method is also much easier to do in an O(1) fashion.
263 +to establish the actual 'load' the task contributes to the system, a
264 +complex-looking but pretty accurate method is used: there is a 4-entry
265 +'history' ringbuffer of the task's activities during the last 4 seconds.
266 +This ringbuffer is operated without much overhead. The entries tell the
267 +scheduler a pretty accurate load-history of the task: has it used up more
268 +CPU time or less during the past N seconds. [the size '4' and the interval
269 +of 4x 1 seconds were found by lots of experimentation - this part is
270 +flexible and can be changed in both directions.]
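A hypothetical sketch of such a history ringbuffer (field and function
names are illustrative, not the patch's actual ones):

	#define HISTORY_SLOTS	4	/* 4 entries, one second each */

	struct load_history {
		unsigned long used[HISTORY_SLOTS];	/* ticks used per slot */
		unsigned int head;
	};

	/* called from the timer tick while the task is running */
	static inline void history_tick(struct load_history *h)
	{
		h->used[h->head]++;
	}

	/* called once per second to recycle the oldest entry */
	static inline void history_advance(struct load_history *h)
	{
		h->head = (h->head + 1) % HISTORY_SLOTS;
		h->used[h->head] = 0;
	}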
272 +the penalty a task gets for generating more load than the CPU can handle
273 +is a priority decrease - there is a maximum amount to this penalty
274 +relative to its static priority, so even fully CPU-bound tasks will
275 +observe each other's priorities, and will share the CPU accordingly.
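The clamping can be sketched as follows (the bonus/penalty names are
illustrative; the patch's effective-priority calculation is equivalent in
spirit):

	int prio = p->static_prio - bonus + penalty;

	if (prio < MAX_RT_PRIO)
		prio = MAX_RT_PRIO;	/* never climbs into the RT range */
	if (prio > MAX_PRIO - 1)
		prio = MAX_PRIO - 1;	/* never sinks below the lowest prio */
	p->prio = prio;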
277 +the SMP load-balancer can be extended/switched with additional parallel
278 +computing and cache hierarchy concepts: NUMA scheduling and multi-core CPUs
279 +can be supported easily by changing the load-balancer. Right now it's
280 +tuned for my SMP systems.
282 +i skipped the prev->mm == next->mm advantage - no workload i know of shows
283 +any sensitivity to this. It can be added back by sacrificing O(1)
284 +schedule() [the current and one-lower priority list can be searched for a
285 +that->mm == current->mm condition], but it costs a fair number of cycles
286 +during a number of important workloads, so i wanted to avoid this as much
287 +as possible.
289 +- the SMP idle-task startup code was still racy and the new scheduler
290 +triggered this. So i streamlined the idle-setup code a bit. We do not call
291 +into schedule() before all processors have started up fully and all idle
292 +threads are in place.
294 +- the patch also cleans up a number of aspects of sched.c - moves code
295 +into other areas of the kernel where it's appropriate, and simplifies
296 +certain code paths and data constructs. As a result, the new scheduler's
297 +code is smaller than the old one.
300 diff -urN linux-2.4.20/arch/alpha/kernel/entry.S linux-2.4.20-o1/arch/alpha/kernel/entry.S
301 --- linux-2.4.20/arch/alpha/kernel/entry.S Sat Aug 3 02:39:42 2002
302 +++ linux-2.4.20-o1/arch/alpha/kernel/entry.S Wed Mar 12 00:41:43 2003
305 lda $26,ret_from_sys_call
308 jsr $31,schedule_tail
313 diff -urN linux-2.4.20/arch/alpha/kernel/process.c linux-2.4.20-o1/arch/alpha/kernel/process.c
314 --- linux-2.4.20/arch/alpha/kernel/process.c Sun Sep 30 21:26:08 2001
315 +++ linux-2.4.20-o1/arch/alpha/kernel/process.c Wed Mar 12 00:41:43 2003
319 /* An endless idle loop with no priority at all. */
320 - current->nice = 20;
321 - current->counter = -100;
324 /* FIXME -- EV6 and LCA45 know how to power down
326 diff -urN linux-2.4.20/arch/alpha/kernel/smp.c linux-2.4.20-o1/arch/alpha/kernel/smp.c
327 --- linux-2.4.20/arch/alpha/kernel/smp.c Sat Aug 3 02:39:42 2002
328 +++ linux-2.4.20-o1/arch/alpha/kernel/smp.c Wed Mar 12 00:41:43 2003
330 int smp_num_probed; /* Internal processor count */
331 int smp_num_cpus = 1; /* Number that came online. */
332 int smp_threads_ready; /* True once the per process idle is forked. */
333 +cycles_t cacheflush_time;
334 +unsigned long cache_decay_ticks;
336 int __cpu_number_map[NR_CPUS];
337 int __cpu_logical_map[NR_CPUS];
340 int cpuid = hard_smp_processor_id();
342 - if (current != init_tasks[cpu_number_map(cpuid)]) {
343 - printk("BUG: smp_calling: cpu %d current %p init_tasks[cpu_number_map(cpuid)] %p\n",
344 - cpuid, current, init_tasks[cpu_number_map(cpuid)]);
347 DBGS(("CALLIN %d state 0x%lx\n", cpuid, current->state));
349 /* Turn on machine checks. */
351 DBGS(("smp_callin: commencing CPU %d current %p\n",
354 - /* Setup the scheduler for this processor. */
357 /* ??? This should be in init_idle. */
358 atomic_inc(&init_mm.mm_count);
359 current->active_mm = &init_mm;
366 + * Rough estimation for SMP scheduling; this is the number of cycles it
367 + * takes for a fully memory-limited process to flush the SMP-local cache.
369 + * We are not told how much cache there is, so we have to guess.
372 +smp_tune_scheduling (int cpuid)
374 + struct percpu_struct *cpu;
375 + unsigned long on_chip_cache; /* kB */
376 + unsigned long freq; /* Hz */
377 + unsigned long bandwidth = 350; /* MB/s */
379 + cpu = (struct percpu_struct*)((char*)hwrpb + hwrpb->processor_offset
380 + + cpuid * hwrpb->processor_size);
384 + on_chip_cache = 16 + 16;
389 + on_chip_cache = 8 + 8 + 96;
393 + on_chip_cache = 16 + 8;
399 + on_chip_cache = 64 + 64;
403 + freq = hwrpb->cycle_freq ? : est_cycle_freq;
405 + cacheflush_time = (freq / 1000000) * (on_chip_cache << 10) / bandwidth;
406 + cache_decay_ticks = cacheflush_time / (freq / 1000) * HZ / 1000;
408 + printk("per-CPU timeslice cutoff: %ld.%02ld usecs.\n",
409 + cacheflush_time/(freq/1000000),
410 + (cacheflush_time*100/(freq/1000000)) % 100);
411 + printk("task migration cache decay timeout: %ld msecs.\n",
412 + (cache_decay_ticks + 1) * 1000 / HZ);
416 * Send a message to a secondary's console. "START" is one such
417 * interesting message. ;-)
418 @@ -505,14 +491,11 @@
419 if (idle == &init_task)
420 panic("idle process is init_task for CPU %d", cpuid);
422 - idle->processor = cpuid;
423 - idle->cpus_runnable = 1 << cpuid; /* we schedule the first task manually */
424 + init_idle(idle, cpuid);
425 + unhash_process(idle);
427 __cpu_logical_map[cpunum] = cpuid;
428 __cpu_number_map[cpuid] = cpunum;
430 - del_from_runqueue(idle);
431 - unhash_process(idle);
432 - init_tasks[cpunum] = idle;
434 DBGS(("smp_boot_one_cpu: CPU %d state 0x%lx flags 0x%lx\n",
435 cpuid, idle->state, idle->flags));
436 @@ -619,12 +602,10 @@
438 __cpu_number_map[boot_cpuid] = 0;
439 __cpu_logical_map[0] = boot_cpuid;
440 - current->processor = boot_cpuid;
442 smp_store_cpu_info(boot_cpuid);
443 + smp_tune_scheduling(boot_cpuid);
444 smp_setup_percpu_timer(boot_cpuid);
448 /* ??? This should be in init_idle. */
449 atomic_inc(&init_mm.mm_count);
450 diff -urN linux-2.4.20/arch/arm/kernel/process.c linux-2.4.20-o1/arch/arm/kernel/process.c
451 --- linux-2.4.20/arch/arm/kernel/process.c Sat Aug 3 02:39:42 2002
452 +++ linux-2.4.20-o1/arch/arm/kernel/process.c Wed Mar 12 00:41:43 2003
455 /* endless idle loop with no priority at all */
457 - current->nice = 20;
458 - current->counter = -100;
461 void (*idle)(void) = pm_idle;
462 diff -urN linux-2.4.20/arch/i386/kernel/entry.S linux-2.4.20-o1/arch/i386/kernel/entry.S
463 --- linux-2.4.20/arch/i386/kernel/entry.S Fri Nov 29 00:53:09 2002
464 +++ linux-2.4.20-o1/arch/i386/kernel/entry.S Wed Mar 12 00:41:43 2003
480 call SYMBOL_NAME(schedule_tail)
484 testb $0x02,tsk_ptrace(%ebx) # PT_TRACESYS
486 diff -urN linux-2.4.20/arch/i386/kernel/process.c linux-2.4.20-o1/arch/i386/kernel/process.c
487 --- linux-2.4.20/arch/i386/kernel/process.c Sat Aug 3 02:39:42 2002
488 +++ linux-2.4.20-o1/arch/i386/kernel/process.c Wed Mar 12 00:41:43 2003
491 if (current_cpu_data.hlt_works_ok && !hlt_counter) {
493 - if (!current->need_resched)
494 + if (!need_resched())
501 /* endless idle loop with no priority at all */
503 - current->nice = 20;
504 - current->counter = -100;
507 void (*idle)(void) = pm_idle;
508 @@ -697,15 +694,17 @@
509 asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));
512 - * Restore %fs and %gs.
513 + * Restore %fs and %gs if needed.
515 - loadsegment(fs, next->fs);
516 - loadsegment(gs, next->gs);
517 + if (unlikely(prev->fs | prev->gs | next->fs | next->gs)) {
518 + loadsegment(fs, next->fs);
519 + loadsegment(gs, next->gs);
523 * Now maybe reload the debug registers
525 - if (next->debugreg[7]){
526 + if (unlikely(next->debugreg[7])) {
534 - if (prev->ioperm || next->ioperm) {
535 + if (unlikely(prev->ioperm || next->ioperm)) {
538 * 4 cachelines copy ... not good, but not that
539 diff -urN linux-2.4.20/arch/i386/kernel/setup.c linux-2.4.20-o1/arch/i386/kernel/setup.c
540 --- linux-2.4.20/arch/i386/kernel/setup.c Fri Nov 29 00:53:09 2002
541 +++ linux-2.4.20-o1/arch/i386/kernel/setup.c Wed Mar 12 00:41:43 2003
542 @@ -3046,9 +3046,10 @@
547 - * Clear all 6 debug registers:
549 + /* Clear %fs and %gs. */
550 + asm volatile ("xorl %eax, %eax; movl %eax, %fs; movl %eax, %gs");
552 + /* Clear all 6 debug registers: */
554 #define CD(register) __asm__("movl %0,%%db" #register ::"r"(0) );
556 diff -urN linux-2.4.20/arch/i386/kernel/smp.c linux-2.4.20-o1/arch/i386/kernel/smp.c
557 --- linux-2.4.20/arch/i386/kernel/smp.c Fri Nov 29 00:53:09 2002
558 +++ linux-2.4.20-o1/arch/i386/kernel/smp.c Wed Mar 12 00:41:43 2003
559 @@ -493,10 +493,20 @@
560 * it goes straight through and wastes no time serializing
561 * anything. Worst case is that we lose a reschedule ...
564 void smp_send_reschedule(int cpu)
566 send_IPI_mask(1 << cpu, RESCHEDULE_VECTOR);
570 + * this function sends a reschedule IPI to all (other) CPUs.
571 + * This should only be used if some 'global' task became runnable,
572 + * such as an RT task that must be handled now. The first CPU
573 + * that manages to grab the task will run it.
575 +void smp_send_reschedule_all(void)
577 + send_IPI_allbutself(RESCHEDULE_VECTOR);
581 diff -urN linux-2.4.20/arch/i386/kernel/smpboot.c linux-2.4.20-o1/arch/i386/kernel/smpboot.c
582 --- linux-2.4.20/arch/i386/kernel/smpboot.c Fri Nov 29 00:53:09 2002
583 +++ linux-2.4.20-o1/arch/i386/kernel/smpboot.c Wed Mar 12 00:41:43 2003
584 @@ -308,14 +308,14 @@
585 if (tsc_values[i] < avg)
586 realdelta = -realdelta;
588 - printk("BIOS BUG: CPU#%d improperly initialized, has %ld usecs TSC skew! FIXED.\n",
590 + printk("BIOS BUG: CPU#%d improperly initialized, has %ld usecs TSC skew! FIXED.\n", i, realdelta);
600 static void __init synchronize_tsc_ap (void)
602 * (This works even if the APIC is not enabled.)
604 phys_id = GET_APIC_ID(apic_read(APIC_ID));
605 - cpuid = current->processor;
607 if (test_and_set_bit(cpuid, &cpu_online_map)) {
608 printk("huh, phys CPU#%d, CPU#%d already present??\n",
612 smp_store_cpu_info(cpuid);
614 + disable_APIC_timer();
616 * Allow the master to continue.
620 while (!atomic_read(&smp_commenced))
622 + enable_APIC_timer();
624 * low-memory mappings have been cleared, flush them from
625 * the local TLBs too.
626 @@ -803,16 +805,13 @@
628 panic("No idle process for CPU %d", cpu);
630 - idle->processor = cpu;
631 - idle->cpus_runnable = 1 << cpu; /* we schedule the first task manually */
632 + init_idle(idle, cpu);
634 map_cpu_to_boot_apicid(cpu, apicid);
636 idle->thread.eip = (unsigned long) start_secondary;
638 - del_from_runqueue(idle);
639 unhash_process(idle);
640 - init_tasks[cpu] = idle;
642 /* start_eip had better be page-aligned! */
643 start_eip = setup_trampoline();
647 cycles_t cacheflush_time;
648 +unsigned long cache_decay_ticks;
650 static void smp_tune_scheduling (void)
653 cacheflush_time = (cpu_khz>>10) * (cachesize<<10) / bandwidth;
656 + cache_decay_ticks = (long)cacheflush_time/cpu_khz * HZ / 1000;
658 printk("per-CPU timeslice cutoff: %ld.%02ld usecs.\n",
659 (long)cacheflush_time/(cpu_khz/1000),
660 ((long)cacheflush_time*100/(cpu_khz/1000)) % 100);
661 + printk("task migration cache decay timeout: %ld msecs.\n",
662 + (cache_decay_ticks + 1) * 1000 / HZ);
666 @@ -1023,8 +1027,7 @@
667 map_cpu_to_boot_apicid(0, boot_cpu_apicid);
669 global_irq_holder = 0;
670 - current->processor = 0;
673 smp_tune_scheduling();
676 diff -urN linux-2.4.20/arch/mips64/kernel/process.c linux-2.4.20-o1/arch/mips64/kernel/process.c
677 --- linux-2.4.20/arch/mips64/kernel/process.c Fri Nov 29 00:53:10 2002
678 +++ linux-2.4.20-o1/arch/mips64/kernel/process.c Wed Mar 12 00:41:43 2003
681 /* endless idle loop with no priority at all */
683 - current->nice = 20;
684 - current->counter = -100;
687 while (!current->need_resched)
689 diff -urN linux-2.4.20/arch/parisc/kernel/process.c linux-2.4.20-o1/arch/parisc/kernel/process.c
690 --- linux-2.4.20/arch/parisc/kernel/process.c Fri Nov 29 00:53:10 2002
691 +++ linux-2.4.20-o1/arch/parisc/kernel/process.c Wed Mar 12 00:41:43 2003
694 /* endless idle loop with no priority at all */
696 - current->nice = 20;
697 - current->counter = -100;
700 while (!current->need_resched) {
701 diff -urN linux-2.4.20/arch/ppc/8260_io/uart.c linux-2.4.20-o1/arch/ppc/8260_io/uart.c
702 --- linux-2.4.20/arch/ppc/8260_io/uart.c Sat Aug 3 02:39:43 2002
703 +++ linux-2.4.20-o1/arch/ppc/8260_io/uart.c Wed Mar 12 00:41:43 2003
704 @@ -1732,7 +1732,6 @@
705 printk("lsr = %d (jiff=%lu)...", lsr, jiffies);
707 current->state = TASK_INTERRUPTIBLE;
708 -/* current->counter = 0; make us low-priority */
709 schedule_timeout(char_time);
710 if (signal_pending(current))
712 diff -urN linux-2.4.20/arch/ppc/8xx_io/uart.c linux-2.4.20-o1/arch/ppc/8xx_io/uart.c
713 --- linux-2.4.20/arch/ppc/8xx_io/uart.c Sat Aug 3 02:39:43 2002
714 +++ linux-2.4.20-o1/arch/ppc/8xx_io/uart.c Wed Mar 12 00:41:43 2003
715 @@ -1796,7 +1796,6 @@
716 printk("lsr = %d (jiff=%lu)...", lsr, jiffies);
718 current->state = TASK_INTERRUPTIBLE;
719 -/* current->counter = 0; make us low-priority */
720 schedule_timeout(char_time);
721 if (signal_pending(current))
723 diff -urN linux-2.4.20/arch/ppc/kernel/entry.S linux-2.4.20-o1/arch/ppc/kernel/entry.S
724 --- linux-2.4.20/arch/ppc/kernel/entry.S Fri Nov 29 00:53:11 2002
725 +++ linux-2.4.20-o1/arch/ppc/kernel/entry.S Wed Mar 12 00:41:43 2003
733 lwz r0,TASK_PTRACE(r2)
734 andi. r0,r0,PT_TRACESYS
736 diff -urN linux-2.4.20/arch/ppc/kernel/idle.c linux-2.4.20-o1/arch/ppc/kernel/idle.c
737 --- linux-2.4.20/arch/ppc/kernel/idle.c Fri Nov 29 00:53:11 2002
738 +++ linux-2.4.20-o1/arch/ppc/kernel/idle.c Wed Mar 12 00:41:43 2003
742 /* endless loop with no priority at all */
743 - current->nice = 20;
744 - current->counter = -100;
749 if (!do_power_save) {
750 diff -urN linux-2.4.20/arch/ppc/kernel/mk_defs.c linux-2.4.20-o1/arch/ppc/kernel/mk_defs.c
751 --- linux-2.4.20/arch/ppc/kernel/mk_defs.c Tue Aug 28 15:58:33 2001
752 +++ linux-2.4.20-o1/arch/ppc/kernel/mk_defs.c Wed Mar 12 00:41:43 2003
754 /*DEFINE(KERNELBASE, KERNELBASE);*/
755 DEFINE(STATE, offsetof(struct task_struct, state));
756 DEFINE(NEXT_TASK, offsetof(struct task_struct, next_task));
757 - DEFINE(COUNTER, offsetof(struct task_struct, counter));
758 - DEFINE(PROCESSOR, offsetof(struct task_struct, processor));
759 + DEFINE(COUNTER, offsetof(struct task_struct, time_slice));
760 + DEFINE(PROCESSOR, offsetof(struct task_struct, cpu));
761 DEFINE(SIGPENDING, offsetof(struct task_struct, sigpending));
762 DEFINE(THREAD, offsetof(struct task_struct, thread));
763 DEFINE(MM, offsetof(struct task_struct, mm));
764 diff -urN linux-2.4.20/arch/ppc/kernel/process.c linux-2.4.20-o1/arch/ppc/kernel/process.c
765 --- linux-2.4.20/arch/ppc/kernel/process.c Mon Nov 26 14:29:17 2001
766 +++ linux-2.4.20-o1/arch/ppc/kernel/process.c Wed Mar 12 00:41:43 2003
771 - printk(" CPU: %d", current->processor);
772 + printk(" CPU: %d", current->cpu);
773 #endif /* CONFIG_SMP */
776 diff -urN linux-2.4.20/arch/ppc/kernel/smp.c linux-2.4.20-o1/arch/ppc/kernel/smp.c
777 --- linux-2.4.20/arch/ppc/kernel/smp.c Sat Aug 3 02:39:43 2002
778 +++ linux-2.4.20-o1/arch/ppc/kernel/smp.c Wed Mar 12 00:41:43 2003
780 unsigned long cpu_online_map;
781 int smp_hw_index[NR_CPUS];
782 static struct smp_ops_t *smp_ops;
783 +unsigned long cache_decay_ticks = HZ/100;
785 /* all cpu mappings are 1-1 -- Cort */
786 volatile unsigned long cpu_callin_map[NR_CPUS];
788 * cpu 0, the master -- Cort
790 cpu_callin_map[0] = 1;
791 - current->processor = 0;
796 for (i = 0; i < NR_CPUS; i++) {
799 p = init_task.prev_task;
801 panic("No idle task for CPU %d", i);
802 - del_from_runqueue(p);
808 - p->cpus_runnable = 1 << i; /* we schedule the first task manually */
814 void __init smp_callin(void)
816 - int cpu = current->processor;
817 + int cpu = current->cpu;
819 smp_store_cpu_info(cpu);
820 smp_ops->setup_cpu(cpu);
821 diff -urN linux-2.4.20/arch/ppc/lib/dec_and_lock.c linux-2.4.20-o1/arch/ppc/lib/dec_and_lock.c
822 --- linux-2.4.20/arch/ppc/lib/dec_and_lock.c Fri Nov 16 19:10:08 2001
823 +++ linux-2.4.20-o1/arch/ppc/lib/dec_and_lock.c Wed Mar 12 00:41:43 2003
825 #include <linux/module.h>
826 +#include <linux/sched.h>
827 #include <linux/spinlock.h>
828 #include <asm/atomic.h>
829 #include <asm/system.h>
830 diff -urN linux-2.4.20/arch/ppc/mm/init.c linux-2.4.20-o1/arch/ppc/mm/init.c
831 --- linux-2.4.20/arch/ppc/mm/init.c Sat Aug 3 02:39:43 2002
832 +++ linux-2.4.20-o1/arch/ppc/mm/init.c Wed Mar 12 00:41:43 2003
837 - printk("%3d ", p->processor);
838 - if ( (p->processor != NO_PROC_ID) &&
839 - (p == current_set[p->processor]) )
840 + printk("%3d ", p->cpu);
841 + if ( (p->cpu != NO_PROC_ID) &&
842 + (p == current_set[p->cpu]) )
846 diff -urN linux-2.4.20/arch/ppc64/kernel/entry.S linux-2.4.20-o1/arch/ppc64/kernel/entry.S
847 --- linux-2.4.20/arch/ppc64/kernel/entry.S Fri Nov 29 00:53:11 2002
848 +++ linux-2.4.20-o1/arch/ppc64/kernel/entry.S Wed Mar 12 00:41:43 2003
852 _GLOBAL(ret_from_fork)
856 ld r0,TASK_PTRACE(r13)
857 andi. r0,r0,PT_TRACESYS
858 beq+ .ret_from_except
859 diff -urN linux-2.4.20/arch/ppc64/kernel/idle.c linux-2.4.20-o1/arch/ppc64/kernel/idle.c
860 --- linux-2.4.20/arch/ppc64/kernel/idle.c Sat Aug 3 02:39:43 2002
861 +++ linux-2.4.20-o1/arch/ppc64/kernel/idle.c Wed Mar 12 00:41:43 2003
866 - /* endless loop with no priority at all */
867 - current->nice = 20;
868 - current->counter = -100;
869 #ifdef CONFIG_PPC_ISERIES
870 /* ensure iSeries run light will be out when idle */
871 current->thread.flags &= ~PPC_FLAG_RUN_LIGHT;
877 + /* endless loop with no priority at all */
881 diff -urN linux-2.4.20/arch/ppc64/kernel/process.c linux-2.4.20-o1/arch/ppc64/kernel/process.c
882 --- linux-2.4.20/arch/ppc64/kernel/process.c Fri Nov 29 00:53:11 2002
883 +++ linux-2.4.20-o1/arch/ppc64/kernel/process.c Wed Mar 12 00:41:43 2003
885 #ifdef SHOW_TASK_SWITCHES
886 printk("%s/%d -> %s/%d NIP %08lx cpu %d root %x/%x\n",
887 prev->comm,prev->pid,
888 - new->comm,new->pid,new->thread.regs->nip,new->processor,
889 + new->comm,new->pid,new->thread.regs->nip,new->cpu,
890 new->fs->root,prev->fs->root);
893 diff -urN linux-2.4.20/arch/ppc64/kernel/smp.c linux-2.4.20-o1/arch/ppc64/kernel/smp.c
894 --- linux-2.4.20/arch/ppc64/kernel/smp.c Fri Nov 29 00:53:11 2002
895 +++ linux-2.4.20-o1/arch/ppc64/kernel/smp.c Wed Mar 12 00:41:43 2003
897 extern atomic_t ipi_sent;
898 spinlock_t kernel_flag __cacheline_aligned = SPIN_LOCK_UNLOCKED;
899 cycles_t cacheflush_time;
900 +unsigned long cache_decay_ticks = HZ/100;
901 static int max_cpus __initdata = NR_CPUS;
903 unsigned long cpu_online_map;
905 * cpu 0, the master -- Cort
907 cpu_callin_map[0] = 1;
908 - current->processor = 0;
913 for (i = 0; i < NR_CPUS; i++) {
914 paca[i].prof_counter = 1;
917 PPCDBG(PPCDBG_SMP,"\tProcessor %d, task = 0x%lx\n", i, p);
919 - del_from_runqueue(p);
925 - p->cpus_runnable = 1 << i; /* we schedule the first task manually */
926 current_set[i].task = p;
927 sp = ((unsigned long)p) + sizeof(union task_union)
928 - STACK_FRAME_OVERHEAD;
931 void __init smp_callin(void)
933 - int cpu = current->processor;
934 + int cpu = current->cpu;
936 smp_store_cpu_info(cpu);
937 set_dec(paca[cpu].default_decr);
940 ppc_md.smp_setup_cpu(cpu);
944 set_bit(smp_processor_id(), &cpu_online_map);
946 while(!smp_commenced) {
951 - cpu = current->processor;
952 + cpu = current->cpu;
953 atomic_inc(&init_mm.mm_count);
954 current->active_mm = &init_mm;
956 diff -urN linux-2.4.20/arch/s390/kernel/process.c linux-2.4.20-o1/arch/s390/kernel/process.c
957 --- linux-2.4.20/arch/s390/kernel/process.c Sat Aug 3 02:39:43 2002
958 +++ linux-2.4.20-o1/arch/s390/kernel/process.c Wed Mar 12 00:41:43 2003
961 /* endless idle loop with no priority at all */
963 - current->nice = 20;
964 - current->counter = -100;
967 if (current->need_resched) {
969 diff -urN linux-2.4.20/arch/s390x/kernel/process.c linux-2.4.20-o1/arch/s390x/kernel/process.c
970 --- linux-2.4.20/arch/s390x/kernel/process.c Fri Nov 29 00:53:11 2002
971 +++ linux-2.4.20-o1/arch/s390x/kernel/process.c Wed Mar 12 00:41:43 2003
974 /* endless idle loop with no priority at all */
976 - current->nice = 20;
977 - current->counter = -100;
980 if (current->need_resched) {
982 diff -urN linux-2.4.20/arch/sh/kernel/process.c linux-2.4.20-o1/arch/sh/kernel/process.c
983 --- linux-2.4.20/arch/sh/kernel/process.c Mon Oct 15 22:36:48 2001
984 +++ linux-2.4.20-o1/arch/sh/kernel/process.c Wed Mar 12 00:41:43 2003
987 /* endless idle loop with no priority at all */
989 - current->nice = 20;
990 - current->counter = -100;
994 diff -urN linux-2.4.20/arch/sparc/kernel/entry.S linux-2.4.20-o1/arch/sparc/kernel/entry.S
995 --- linux-2.4.20/arch/sparc/kernel/entry.S Tue Nov 13 18:16:05 2001
996 +++ linux-2.4.20-o1/arch/sparc/kernel/entry.S Wed Mar 12 00:46:06 2003
997 @@ -1463,7 +1463,9 @@
999 .globl C_LABEL(ret_from_fork)
1000 C_LABEL(ret_from_fork):
1005 b C_LABEL(ret_sys_call)
1006 ld [%sp + REGWIN_SZ + PT_I0], %o0
1007 diff -urN linux-2.4.20/arch/sparc/kernel/process.c linux-2.4.20-o1/arch/sparc/kernel/process.c
1008 --- linux-2.4.20/arch/sparc/kernel/process.c Sat Aug 3 02:39:43 2002
1009 +++ linux-2.4.20-o1/arch/sparc/kernel/process.c Wed Mar 12 00:41:43 2003
1013 /* endless idle loop with no priority at all */
1014 - current->nice = 20;
1015 - current->counter = -100;
1019 if (ARCH_SUN4C_SUN4) {
1023 /* endless idle loop with no priority at all */
1024 - current->nice = 20;
1025 - current->counter = -100;
1029 if(current->need_resched) {
1030 diff -urN linux-2.4.20/arch/sparc/kernel/smp.c linux-2.4.20-o1/arch/sparc/kernel/smp.c
1031 --- linux-2.4.20/arch/sparc/kernel/smp.c Fri Dec 21 18:41:53 2001
1032 +++ linux-2.4.20-o1/arch/sparc/kernel/smp.c Wed Mar 12 00:41:43 2003
1034 volatile int __cpu_number_map[NR_CPUS];
1035 volatile int __cpu_logical_map[NR_CPUS];
1036 cycles_t cacheflush_time = 0; /* XXX */
1037 +unsigned long cache_decay_ticks = HZ/100; /* XXX */
1039 /* The only guaranteed locking primitive available on all Sparc
1040 * processors is 'ldstub [%reg + immediate], %dest_reg' which atomically
1041 diff -urN linux-2.4.20/arch/sparc/kernel/sun4d_smp.c linux-2.4.20-o1/arch/sparc/kernel/sun4d_smp.c
1042 --- linux-2.4.20/arch/sparc/kernel/sun4d_smp.c Sat Aug 3 02:39:43 2002
1043 +++ linux-2.4.20-o1/arch/sparc/kernel/sun4d_smp.c Wed Mar 12 00:41:43 2003
1045 * the SMP initialization the master will be just allowed
1046 * to call the scheduler code.
1050 /* Get our local ticker going. */
1051 smp_setup_percpu_timer();
1053 while((unsigned long)current_set[cpuid] < PAGE_OFFSET)
1056 - while(current_set[cpuid]->processor != cpuid)
1057 + while(current_set[cpuid]->cpu != cpuid)
1060 /* Fix idle thread fields. */
1061 @@ -197,10 +196,8 @@
1063 __cpu_number_map[boot_cpu_id] = 0;
1064 __cpu_logical_map[0] = boot_cpu_id;
1065 - current->processor = boot_cpu_id;
1066 smp_store_cpu_info(boot_cpu_id);
1067 smp_setup_percpu_timer();
1069 local_flush_cache_all();
1070 if(linux_num_cpus == 1)
1071 return; /* Not an MP box. */
1072 @@ -222,14 +219,10 @@
1075 p = init_task.prev_task;
1076 - init_tasks[i] = p;
1079 - p->cpus_runnable = 1 << i; /* we schedule the first task manually */
1083 - del_from_runqueue(p);
1087 for (no = 0; no < linux_num_cpus; no++)
1088 diff -urN linux-2.4.20/arch/sparc/kernel/sun4m_smp.c linux-2.4.20-o1/arch/sparc/kernel/sun4m_smp.c
1089 --- linux-2.4.20/arch/sparc/kernel/sun4m_smp.c Wed Nov 21 19:31:09 2001
1090 +++ linux-2.4.20-o1/arch/sparc/kernel/sun4m_smp.c Wed Mar 12 00:41:43 2003
1092 * the SMP initialization the master will be just allowed
1093 * to call the scheduler code.
1097 /* Allow master to continue. */
1098 swap((unsigned long *)&cpu_callin_map[cpuid], 1);
1099 @@ -170,12 +169,10 @@
1100 mid_xlate[boot_cpu_id] = (linux_cpus[boot_cpu_id].mid & ~8);
1101 __cpu_number_map[boot_cpu_id] = 0;
1102 __cpu_logical_map[0] = boot_cpu_id;
1103 - current->processor = boot_cpu_id;
1105 smp_store_cpu_info(boot_cpu_id);
1106 set_irq_udt(mid_xlate[boot_cpu_id]);
1107 smp_setup_percpu_timer();
1109 local_flush_cache_all();
1110 if(linux_num_cpus == 1)
1111 return; /* Not an MP box. */
1112 @@ -195,14 +192,10 @@
1115 p = init_task.prev_task;
1116 - init_tasks[i] = p;
1119 - p->cpus_runnable = 1 << i; /* we schedule the first task manually */
1123 - del_from_runqueue(p);
1127 /* See trampoline.S for details... */
1128 diff -urN linux-2.4.20/arch/sparc64/kernel/entry.S linux-2.4.20-o1/arch/sparc64/kernel/entry.S
1129 --- linux-2.4.20/arch/sparc64/kernel/entry.S Fri Nov 29 00:53:12 2002
1130 +++ linux-2.4.20-o1/arch/sparc64/kernel/entry.S Wed Mar 12 00:46:53 2003
1131 @@ -1619,7 +1619,9 @@
1133 andn %o7, SPARC_FLAG_NEWCHILD, %l0
1134 mov %g5, %o0 /* 'prev' */
1138 stb %l0, [%g6 + AOFF_task_thread + AOFF_thread_flags]
1139 andcc %l0, SPARC_FLAG_PERFCTR, %g0
1141 diff -urN linux-2.4.20/arch/sparc64/kernel/irq.c linux-2.4.20-o1/arch/sparc64/kernel/irq.c
1142 --- linux-2.4.20/arch/sparc64/kernel/irq.c Fri Nov 29 00:53:12 2002
1143 +++ linux-2.4.20-o1/arch/sparc64/kernel/irq.c Wed Mar 12 00:41:43 2003
1145 tid = ((tid & UPA_CONFIG_MID) << 9);
1146 tid &= IMAP_TID_UPA;
1148 - tid = (starfire_translate(imap, current->processor) << 26);
1149 + tid = (starfire_translate(imap, current->cpu) << 26);
1150 tid &= IMAP_TID_UPA;
1153 diff -urN linux-2.4.20/arch/sparc64/kernel/process.c linux-2.4.20-o1/arch/sparc64/kernel/process.c
1154 --- linux-2.4.20/arch/sparc64/kernel/process.c Fri Nov 29 00:53:12 2002
1155 +++ linux-2.4.20-o1/arch/sparc64/kernel/process.c Wed Mar 12 00:41:43 2003
1159 /* endless idle loop with no priority at all */
1160 - current->nice = 20;
1161 - current->counter = -100;
1165 /* If current->need_resched is zero we should really
1168 * the idle loop on a UltraMultiPenguin...
1170 -#define idle_me_harder() (cpu_data[current->processor].idle_volume += 1)
1171 -#define unidle_me() (cpu_data[current->processor].idle_volume = 0)
1172 +#define idle_me_harder() (cpu_data[current->cpu].idle_volume += 1)
1173 +#define unidle_me() (cpu_data[current->cpu].idle_volume = 0)
1176 - current->nice = 20;
1177 - current->counter = -100;
1181 if (current->need_resched != 0) {
1183 diff -urN linux-2.4.20/arch/sparc64/kernel/rtrap.S linux-2.4.20-o1/arch/sparc64/kernel/rtrap.S
1184 --- linux-2.4.20/arch/sparc64/kernel/rtrap.S 2003-08-16 04:07:49.000000000 +0200
1185 +++ linux-2.4.20-o1/arch/sparc64/kernel/rtrap.S 2003-08-16 04:08:38.000000000 +0200
1188 .globl rtrap_clr_l6, rtrap, irqsz_patchme, rtrap_xcall
1189 rtrap_clr_l6: clr %l6
1190 -rtrap: lduw [%g6 + AOFF_task_processor], %l0
1191 +rtrap: lduw [%g6 + AOFF_task_cpu], %l0
1192 sethi %hi(irq_stat), %l2 ! &softirq_active
1193 or %l2, %lo(irq_stat), %l2 ! &softirq_active
1194 irqsz_patchme: sllx %l0, 0, %l0
1195 diff -urN linux-2.4.20/arch/sparc64/kernel/smp.c linux-2.4.20-o1/arch/sparc64/kernel/smp.c
1196 --- linux-2.4.20/arch/sparc64/kernel/smp.c Fri Nov 29 00:53:12 2002
1197 +++ linux-2.4.20-o1/arch/sparc64/kernel/smp.c Wed Mar 12 00:41:43 2003
1199 printk("Entering UltraSMPenguin Mode...\n");
1201 smp_store_cpu_info(boot_cpu_id);
1202 + smp_tune_scheduling();
1205 if (linux_num_cpus == 1)
1207 @@ -282,12 +281,8 @@
1210 p = init_task.prev_task;
1211 - init_tasks[cpucount] = p;
1214 - p->cpus_runnable = 1UL << i; /* we schedule the first task manually */
1216 - del_from_runqueue(p);
1221 @@ -1154,8 +1149,94 @@
1222 __cpu_number_map[boot_cpu_id] = 0;
1223 prom_cpu_nodes[boot_cpu_id] = linux_cpus[0].prom_node;
1224 __cpu_logical_map[0] = boot_cpu_id;
1225 - current->processor = boot_cpu_id;
1226 prof_counter(boot_cpu_id) = prof_multiplier(boot_cpu_id) = 1;
1229 +cycles_t cacheflush_time;
1230 +unsigned long cache_decay_ticks;
1232 +extern unsigned long cheetah_tune_scheduling(void);
1234 +static void __init smp_tune_scheduling(void)
1236 + unsigned long orig_flush_base, flush_base, flags, *p;
1237 + unsigned int ecache_size, order;
1238 + cycles_t tick1, tick2, raw;
1240 + /* Approximate heuristic for SMP scheduling. It is an
1241 + * estimation of the time it takes to flush the L2 cache
1242 + * on the local processor.
1244 + * The ia32 chooses to use the L1 cache flush time instead,
1245 + * and I consider this complete nonsense. The Ultra can service
1246 + * a miss to the L1 with a hit to the L2 in 7 or 8 cycles, and
1247 + * L2 misses are what create extra bus traffic (i.e. the "cost"
1248 + * of moving a process from one cpu to another).
1250 + printk("SMP: Calibrating ecache flush... ");
1251 + if (tlb_type == cheetah || tlb_type == cheetah_plus) {
1252 + cacheflush_time = cheetah_tune_scheduling();
1256 + ecache_size = prom_getintdefault(linux_cpus[0].prom_node,
1257 + "ecache-size", (512 * 1024));
1258 + if (ecache_size > (4 * 1024 * 1024))
1259 + ecache_size = (4 * 1024 * 1024);
1260 + orig_flush_base = flush_base =
1261 + __get_free_pages(GFP_KERNEL, order = get_order(ecache_size));
1263 + if (flush_base != 0UL) {
1264 + local_irq_save(flags);
1266 + /* Scan twice the size once just to get the TLB entries
1267 + * loaded and make sure the second scan measures pure misses.
1269 + for (p = (unsigned long *)flush_base;
1270 + ((unsigned long)p) < (flush_base + (ecache_size<<1));
1271 + p += (64 / sizeof(unsigned long)))
1272 + *((volatile unsigned long *)p);
1274 + tick1 = tick_ops->get_tick();
1276 + __asm__ __volatile__("1:\n\t"
1277 + "ldx [%0 + 0x000], %%g1\n\t"
1278 + "ldx [%0 + 0x040], %%g2\n\t"
1279 + "ldx [%0 + 0x080], %%g3\n\t"
1280 + "ldx [%0 + 0x0c0], %%g5\n\t"
1281 + "add %0, 0x100, %0\n\t"
1283 + "bne,pt %%xcc, 1b\n\t"
1285 + : "=&r" (flush_base)
1286 + : "0" (flush_base),
1287 + "r" (flush_base + ecache_size)
1288 + : "g1", "g2", "g3", "g5");
1290 + tick2 = tick_ops->get_tick();
1292 + local_irq_restore(flags);
1294 + raw = (tick2 - tick1);
1296 + /* Dampen it a little, considering two processes
1297 + * sharing the cache and fitting.
1299 + cacheflush_time = (raw - (raw >> 2));
1301 + free_pages(orig_flush_base, order);
1303 + cacheflush_time = ((ecache_size << 2) +
1304 + (ecache_size << 1));
1307 + /* Convert ticks/sticks to jiffies. */
1308 + cache_decay_ticks = cacheflush_time / timer_tick_offset;
1309 + if (cache_decay_ticks < 1)
1310 + cache_decay_ticks = 1;
1312 + printk("Using heuristic of %ld cycles, %ld ticks.\n",
1313 + cacheflush_time, cache_decay_ticks);
1316 static inline unsigned long find_flush_base(unsigned long size)
1317 diff -urN linux-2.4.20/arch/sparc64/kernel/trampoline.S linux-2.4.20-o1/arch/sparc64/kernel/trampoline.S
1318 --- linux-2.4.20/arch/sparc64/kernel/trampoline.S 2003-08-16 04:07:57.000000000 +0200
1319 +++ linux-2.4.20-o1/arch/sparc64/kernel/trampoline.S 2003-08-16 04:08:56.000000000 +0200
1321 wrpr %o1, PSTATE_IG, %pstate
1323 /* Get our UPA MID. */
1324 - lduw [%o2 + AOFF_task_processor], %g1
1325 + lduw [%o2 + AOFF_task_cpu], %g1
1326 sethi %hi(cpu_data), %g5
1327 or %g5, %lo(cpu_data), %g5
1329 diff -urN linux-2.4.20/arch/sparc64/kernel/traps.c linux-2.4.20-o1/arch/sparc64/kernel/traps.c
1330 --- linux-2.4.20/arch/sparc64/kernel/traps.c Fri Nov 29 00:53:12 2002
1331 +++ linux-2.4.20-o1/arch/sparc64/kernel/traps.c Wed Mar 12 00:41:43 2003
1333 #include <linux/smp.h>
1334 #include <linux/smp_lock.h>
1335 #include <linux/mm.h>
1336 +#include <linux/init.h>
1338 #include <asm/delay.h>
1339 #include <asm/system.h>
1340 @@ -570,6 +570,48 @@
1341 "i" (ASI_PHYS_USE_EC));
1345 +unsigned long __init cheetah_tune_scheduling(void)
1347 + unsigned long tick1, tick2, raw;
1348 + unsigned long flush_base = ecache_flush_physbase;
1349 + unsigned long flush_linesize = ecache_flush_linesize;
1350 + unsigned long flush_size = ecache_flush_size;
1352 +	/* Run through the whole cache to guarantee the timed loop
1353 + * is really displacing cache lines.
1355 + __asm__ __volatile__("1: subcc %0, %4, %0\n\t"
1356 + " bne,pt %%xcc, 1b\n\t"
1357 + " ldxa [%2 + %0] %3, %%g0\n\t"
1358 + : "=&r" (flush_size)
1359 + : "0" (flush_size), "r" (flush_base),
1360 + "i" (ASI_PHYS_USE_EC), "r" (flush_linesize));
1362 +	/* The flush area is 2 X Ecache-size, so cut this in half for
1363 +	 * the timed loop.
1364 +	 */
1365 + flush_base = ecache_flush_physbase;
1366 + flush_linesize = ecache_flush_linesize;
1367 + flush_size = ecache_flush_size >> 1;
1369 + __asm__ __volatile__("rd %%tick, %0" : "=r" (tick1));
1371 + __asm__ __volatile__("1: subcc %0, %4, %0\n\t"
1372 + " bne,pt %%xcc, 1b\n\t"
1373 + " ldxa [%2 + %0] %3, %%g0\n\t"
1374 + : "=&r" (flush_size)
1375 + : "0" (flush_size), "r" (flush_base),
1376 + "i" (ASI_PHYS_USE_EC), "r" (flush_linesize));
1378 + __asm__ __volatile__("rd %%tick, %0" : "=r" (tick2));
1380 + raw = (tick2 - tick1);
1382 + return (raw - (raw >> 2));
1386 /* Unfortunately, the diagnostic access to the I-cache tags we need to
1387 * use to clear the thing interferes with I-cache coherency transactions.
1389 diff -urN linux-2.4.20/drivers/char/drm-4.0/tdfx_drv.c linux-2.4.20-o1/drivers/char/drm-4.0/tdfx_drv.c
1390 --- linux-2.4.20/drivers/char/drm-4.0/tdfx_drv.c Fri Nov 29 00:53:12 2002
1391 +++ linux-2.4.20-o1/drivers/char/drm-4.0/tdfx_drv.c Wed Mar 12 00:41:43 2003
1393 lock.context, current->pid, j,
1394 dev->lock.lock_time, jiffies);
1395 current->state = TASK_INTERRUPTIBLE;
1396 - current->policy |= SCHED_YIELD;
1397 schedule_timeout(DRM_LOCK_SLICE-j);
1398 DRM_DEBUG("jiffies=%d\n", jiffies);
1400 diff -urN linux-2.4.20/drivers/char/mwave/mwavedd.c linux-2.4.20-o1/drivers/char/mwave/mwavedd.c
1401 --- linux-2.4.20/drivers/char/mwave/mwavedd.c Mon Feb 25 20:37:57 2002
1402 +++ linux-2.4.20-o1/drivers/char/mwave/mwavedd.c Wed Mar 12 00:41:43 2003
1404 pDrvData->IPCs[ipcnum].bIsHere = FALSE;
1405 pDrvData->IPCs[ipcnum].bIsEnabled = TRUE;
1406 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,4,0)
1407 - current->nice = -20; /* boost to provide priority timing */
1409 current->priority = 0x28; /* boost to provide priority timing */
1411 diff -urN linux-2.4.20/drivers/char/serial_txx927.c linux-2.4.20-o1/drivers/char/serial_txx927.c
1412 --- linux-2.4.20/drivers/char/serial_txx927.c Sat Aug 3 02:39:43 2002
1413 +++ linux-2.4.20-o1/drivers/char/serial_txx927.c Wed Mar 12 00:41:43 2003
1414 @@ -1533,7 +1533,6 @@
1415 printk("cisr = %d (jiff=%lu)...", cisr, jiffies);
1417 current->state = TASK_INTERRUPTIBLE;
1418 - current->counter = 0; /* make us low-priority */
1419 schedule_timeout(char_time);
1420 if (signal_pending(current))
1422 diff -urN linux-2.4.20/drivers/md/md.c linux-2.4.20-o1/drivers/md/md.c
1423 --- linux-2.4.20/drivers/md/md.c Fri Nov 29 00:53:13 2002
1424 +++ linux-2.4.20-o1/drivers/md/md.c Wed Mar 12 00:41:43 2003
1425 @@ -2936,8 +2936,6 @@
1426 * bdflush, otherwise bdflush will deadlock if there are too
1427 * many dirty RAID5 blocks.
1429 - current->policy = SCHED_OTHER;
1430 - current->nice = -20;
1433 complete(thread->event);
1434 @@ -3391,11 +3389,6 @@
1435 "(but not more than %d KB/sec) for reconstruction.\n",
1436 sysctl_speed_limit_max);
1439 - * Resync has low priority.
1441 - current->nice = 19;
1443 is_mddev_idle(mddev); /* this also initializes IO event counters */
1444 for (m = 0; m < SYNC_MARKS; m++) {
1446 @@ -3473,16 +3466,13 @@
1447 currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1;
1449 if (currspeed > sysctl_speed_limit_min) {
1450 - current->nice = 19;
1452 if ((currspeed > sysctl_speed_limit_max) ||
1453 !is_mddev_idle(mddev)) {
1454 current->state = TASK_INTERRUPTIBLE;
1455 md_schedule_timeout(HZ/4);
1459 - current->nice = -20;
1462 printk(KERN_INFO "md: md%d: sync done.\n",mdidx(mddev));
1464 diff -urN linux-2.4.20/fs/binfmt_elf.c linux-2.4.20-o1/fs/binfmt_elf.c
1465 --- linux-2.4.20/fs/binfmt_elf.c Sat Aug 3 02:39:45 2002
1466 +++ linux-2.4.20-o1/fs/binfmt_elf.c Wed Mar 12 00:41:43 2003
1467 @@ -1143,7 +1143,7 @@
1468 psinfo.pr_state = i;
1469 psinfo.pr_sname = (i < 0 || i > 5) ? '.' : "RSDZTD"[i];
1470 psinfo.pr_zomb = psinfo.pr_sname == 'Z';
1471 - psinfo.pr_nice = current->nice;
1472 + psinfo.pr_nice = task_nice(current);
1473 psinfo.pr_flag = current->flags;
1474 psinfo.pr_uid = NEW_TO_OLD_UID(current->uid);
1475 psinfo.pr_gid = NEW_TO_OLD_GID(current->gid);
1476 diff -urN linux-2.4.20/fs/jffs2/background.c linux-2.4.20-o1/fs/jffs2/background.c
1477 --- linux-2.4.20/fs/jffs2/background.c Thu Oct 25 09:07:09 2001
1478 +++ linux-2.4.20-o1/fs/jffs2/background.c Wed Mar 12 00:41:43 2003
1481 sprintf(current->comm, "jffs2_gcd_mtd%d", c->mtd->index);
1483 - /* FIXME in the 2.2 backport */
1484 - current->nice = 10;
1487 spin_lock_irq(¤t->sigmask_lock);
1488 siginitsetinv (¤t->blocked, sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP) | sigmask(SIGCONT));
1489 diff -urN linux-2.4.20/fs/proc/array.c linux-2.4.20-o1/fs/proc/array.c
1490 --- linux-2.4.20/fs/proc/array.c Sat Aug 3 02:39:45 2002
1491 +++ linux-2.4.20-o1/fs/proc/array.c Wed Mar 12 00:41:43 2003
1494 /* scale priority and nice values from timeslices to -20..20 */
1495 /* to make it look like a "normal" Unix priority/nice value */
1496 - priority = task->counter;
1497 - priority = 20 - (priority * 10 + DEF_COUNTER / 2) / DEF_COUNTER;
1498 - nice = task->nice;
1499 + priority = task_prio(task);
1500 + nice = task_nice(task);
1502 read_lock(&tasklist_lock);
1503 ppid = task->pid ? task->p_opptr->pid : 0;
1513 diff -urN linux-2.4.20/fs/proc/proc_misc.c linux-2.4.20-o1/fs/proc/proc_misc.c
1514 --- linux-2.4.20/fs/proc/proc_misc.c Fri Nov 29 00:53:15 2002
1515 +++ linux-2.4.20-o1/fs/proc/proc_misc.c Wed Mar 12 00:41:43 2003
1516 @@ -106,11 +106,11 @@
1517 a = avenrun[0] + (FIXED_1/200);
1518 b = avenrun[1] + (FIXED_1/200);
1519 c = avenrun[2] + (FIXED_1/200);
1520 - len = sprintf(page,"%d.%02d %d.%02d %d.%02d %d/%d %d\n",
1521 + len = sprintf(page,"%d.%02d %d.%02d %d.%02d %ld/%d %d\n",
1522 LOAD_INT(a), LOAD_FRAC(a),
1523 LOAD_INT(b), LOAD_FRAC(b),
1524 LOAD_INT(c), LOAD_FRAC(c),
1525 - nr_running, nr_threads, last_pid);
1526 + nr_running(), nr_threads, last_pid);
1527 return proc_calc_metrics(page, start, off, count, eof, len);
1534 - idle = init_tasks[0]->times.tms_utime + init_tasks[0]->times.tms_stime;
1535 + idle = init_task.times.tms_utime + init_task.times.tms_stime;
1537 /* The formula for the fraction parts really is ((t * 100) / HZ) % 100, but
1538 that would overflow about every five days at HZ == 100.
1539 @@ -371,10 +371,10 @@
1542 proc_sprintf(page, &off, &len,
1547 - kstat.context_swtch,
1548 + nr_context_switches(),
1549 xtime.tv_sec - jif / HZ,
1552 diff -urN linux-2.4.20/fs/reiserfs/buffer2.c linux-2.4.20-o1/fs/reiserfs/buffer2.c
1553 --- linux-2.4.20/fs/reiserfs/buffer2.c Fri Nov 29 00:53:15 2002
1554 +++ linux-2.4.20-o1/fs/reiserfs/buffer2.c Wed Mar 12 00:41:43 2003
1556 struct buffer_head * reiserfs_bread (struct super_block *super, int n_block, int n_size)
1558 struct buffer_head *result;
1559 - PROC_EXP( unsigned int ctx_switches = kstat.context_swtch );
1560 + PROC_EXP( unsigned int ctx_switches = nr_context_switches(); );
1562 result = bread (super -> s_dev, n_block, n_size);
1563 PROC_INFO_INC( super, breads );
1564 - PROC_EXP( if( kstat.context_swtch != ctx_switches )
1565 + PROC_EXP( if( nr_context_switches() != ctx_switches )
1566 PROC_INFO_INC( super, bread_miss ) );
1569 diff -urN linux-2.4.20/include/asm-alpha/bitops.h linux-2.4.20-o1/include/asm-alpha/bitops.h
1570 --- linux-2.4.20/include/asm-alpha/bitops.h Sat Oct 13 00:35:54 2001
1571 +++ linux-2.4.20-o1/include/asm-alpha/bitops.h Wed Mar 12 00:41:43 2003
1574 #include <linux/config.h>
1575 #include <linux/kernel.h>
1576 +#include <asm/compiler.h>
1579 * Copyright 1994, Linus Torvalds.
1582 __asm__ __volatile__(
1591 :"=&r" (temp), "=m" (*m)
1592 - :"Ir" (~(1UL << (nr & 31))), "m" (*m));
1593 + :"Ir" (1UL << (nr & 31)), "m" (*m));
1597 * WARNING: non atomic version.
1599 static __inline__ void
1600 -__change_bit(unsigned long nr, volatile void * addr)
1601 +__clear_bit(unsigned long nr, volatile void * addr)
1603 int *m = ((int *) addr) + (nr >> 5);
1605 - *m ^= 1 << (nr & 31);
1606 + *m &= ~(1 << (nr & 31));
1611 :"Ir" (1UL << (nr & 31)), "m" (*m));
1615 + * WARNING: non atomic version.
1617 +static __inline__ void
1618 +__change_bit(unsigned long nr, volatile void * addr)
1620 + int *m = ((int *) addr) + (nr >> 5);
1622 + *m ^= 1 << (nr & 31);
1626 test_and_set_bit(unsigned long nr, volatile void *addr)
1628 @@ -181,20 +193,6 @@
1629 return (old & mask) != 0;
1633 - * WARNING: non atomic version.
1635 -static __inline__ int
1636 -__test_and_change_bit(unsigned long nr, volatile void * addr)
1638 - unsigned long mask = 1 << (nr & 0x1f);
1639 - int *m = ((int *) addr) + (nr >> 5);
1643 - return (old & mask) != 0;
1647 test_and_change_bit(unsigned long nr, volatile void * addr)
1649 @@ -220,6 +218,20 @@
1654 + * WARNING: non atomic version.
1656 +static __inline__ int
1657 +__test_and_change_bit(unsigned long nr, volatile void * addr)
1659 + unsigned long mask = 1 << (nr & 0x1f);
1660 + int *m = ((int *) addr) + (nr >> 5);
1664 + return (old & mask) != 0;
1668 test_bit(int nr, volatile void * addr)
1670 @@ -235,12 +247,15 @@
1672 static inline unsigned long ffz_b(unsigned long x)
1674 - unsigned long sum = 0;
1675 + unsigned long sum, x1, x2, x4;
1677 x = ~x & -~x; /* set first 0 bit, clear others */
1678 - if (x & 0xF0) sum += 4;
1679 - if (x & 0xCC) sum += 2;
1680 - if (x & 0xAA) sum += 1;
1685 + sum += (x4 != 0) * 4;
1690 @@ -257,24 +272,46 @@
1692 __asm__("cmpbge %1,%2,%0" : "=r"(bits) : "r"(word), "r"(~0UL));
1694 - __asm__("extbl %1,%2,%0" : "=r"(bits) : "r"(word), "r"(qofs));
1695 + bits = __kernel_extbl(word, qofs);
1698 return qofs*8 + bofs;
1703 + * __ffs = Find First set bit in word. Undefined if no set bit exists.
1705 +static inline unsigned long __ffs(unsigned long word)
1707 +#if defined(__alpha_cix__) && defined(__alpha_fix__)
1708 + /* Whee. EV67 can calculate it directly. */
1709 + unsigned long result;
1710 + __asm__("cttz %1,%0" : "=r"(result) : "r"(word));
1713 + unsigned long bits, qofs, bofs;
1715 + __asm__("cmpbge $31,%1,%0" : "=r"(bits) : "r"(word));
1716 + qofs = ffz_b(bits);
1717 + bits = __kernel_extbl(word, qofs);
1718 + bofs = ffz_b(~bits);
1720 + return qofs*8 + bofs;
1727 * ffs: find first bit set. This is defined the same way as
1728 * the libc and compiler builtin ffs routines, therefore
1729 - * differs in spirit from the above ffz (man ffs).
1730 + * differs in spirit from the above __ffs.
1733 static inline int ffs(int word)
1735 - int result = ffz(~word);
1736 + int result = __ffs(word);
1737 return word ? result+1 : 0;
1740 @@ -316,6 +353,14 @@
1741 #define hweight16(x) hweight64((x) & 0xfffful)
1742 #define hweight8(x) hweight64((x) & 0xfful)
1744 +static inline unsigned long hweight64(unsigned long w)
1746 + unsigned long result;
1747 + for (result = 0; w ; w >>= 1)
1748 + result += (w & 1);
1752 #define hweight32(x) generic_hweight32(x)
1753 #define hweight16(x) generic_hweight16(x)
1754 #define hweight8(x) generic_hweight8(x)
1755 @@ -365,12 +410,76 @@
1759 - * The optimizer actually does good code for this case..
1760 + * Find next one bit in a bitmap reasonably efficiently.
1762 +static inline unsigned long
1763 +find_next_bit(void * addr, unsigned long size, unsigned long offset)
1765 + unsigned long * p = ((unsigned long *) addr) + (offset >> 6);
1766 + unsigned long result = offset & ~63UL;
1767 + unsigned long tmp;
1769 + if (offset >= size)
1775 + tmp &= ~0UL << offset;
1779 + goto found_middle;
1783 + while (size & ~63UL) {
1784 + if ((tmp = *(p++)))
1785 + goto found_middle;
1793 + tmp &= ~0UL >> (64 - size);
1795 + return result + size;
1797 + return result + __ffs(tmp);
1801 + * The optimizer actually does good code for this case.
1803 #define find_first_zero_bit(addr, size) \
1804 find_next_zero_bit((addr), (size), 0)
1805 +#define find_first_bit(addr, size) \
1806 + find_next_bit((addr), (size), 0)
1811 + * Every architecture must define this function. It's the fastest
1812 + * way of searching a 140-bit bitmap where the first 100 bits are
1813 + * unlikely to be set. It's guaranteed that at least one of the 140
1814 + * bits is set.
1815 + */
1816 +static inline unsigned long
1817 +sched_find_first_bit(unsigned long b[3])
1819 + unsigned long b0 = b[0], b1 = b[1], b2 = b[2];
1820 + unsigned long ofs;
1822 + ofs = (b1 ? 64 : 128);
1823 + b1 = (b1 ? b1 : b2);
1824 + ofs = (b0 ? 0 : ofs);
1825 + b0 = (b0 ? b0 : b1);
1827 + return __ffs(b0) + ofs;
1831 #define ext2_set_bit __test_and_set_bit
1832 #define ext2_clear_bit __test_and_clear_bit
1833 diff -urN linux-2.4.20/include/asm-alpha/smp.h linux-2.4.20-o1/include/asm-alpha/smp.h
1834 --- linux-2.4.20/include/asm-alpha/smp.h Fri Sep 14 00:21:32 2001
1835 +++ linux-2.4.20-o1/include/asm-alpha/smp.h Wed Mar 12 00:41:43 2003
1837 #define cpu_logical_map(cpu) __cpu_logical_map[cpu]
1839 #define hard_smp_processor_id() __hard_smp_processor_id()
1840 -#define smp_processor_id() (current->processor)
1841 +#define smp_processor_id() (current->cpu)
1843 extern unsigned long cpu_present_mask;
1844 #define cpu_online_map cpu_present_mask
1845 diff -urN linux-2.4.20/include/asm-alpha/system.h linux-2.4.20-o1/include/asm-alpha/system.h
1846 --- linux-2.4.20/include/asm-alpha/system.h Fri Oct 5 03:47:08 2001
1847 +++ linux-2.4.20-o1/include/asm-alpha/system.h Wed Mar 12 00:41:43 2003
1849 extern void halt(void) __attribute__((noreturn));
1850 #define __halt() __asm__ __volatile__ ("call_pal %0 #halt" : : "i" (PAL_halt))
1852 -#define prepare_to_switch() do { } while(0)
1853 #define switch_to(prev,next,last) \
1855 unsigned long pcbb; \
1856 diff -urN linux-2.4.20/include/asm-arm/bitops.h linux-2.4.20-o1/include/asm-arm/bitops.h
1857 --- linux-2.4.20/include/asm-arm/bitops.h Sun Aug 12 20:14:00 2001
1858 +++ linux-2.4.20-o1/include/asm-arm/bitops.h Wed Mar 12 00:41:43 2003
1860 * Copyright 1995, Russell King.
1861 * Various bits and pieces copyrights include:
1862 * Linus Torvalds (test_bit).
1863 + * Big endian support: Copyright 2001, Nicolas Pitre
1864 + * reworked by rmk.
1866 * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
1868 @@ -17,81 +19,271 @@
1872 +#include <asm/system.h>
1874 #define smp_mb__before_clear_bit() do { } while (0)
1875 #define smp_mb__after_clear_bit() do { } while (0)
1878 - * Function prototypes to keep gcc -Wall happy.
1879 + * These functions are the basis of our bit ops.
1880 + * First, the atomic bitops.
1882 + * The endian issue for these functions is handled by the macros below.
1884 -extern void set_bit(int nr, volatile void * addr);
1886 +____atomic_set_bit_mask(unsigned int mask, volatile unsigned char *p)
1888 + unsigned long flags;
1890 + local_irq_save(flags);
1892 + local_irq_restore(flags);
1896 +____atomic_clear_bit_mask(unsigned int mask, volatile unsigned char *p)
1898 + unsigned long flags;
1900 + local_irq_save(flags);
1902 + local_irq_restore(flags);
1906 +____atomic_change_bit_mask(unsigned int mask, volatile unsigned char *p)
1908 + unsigned long flags;
1910 + local_irq_save(flags);
1912 + local_irq_restore(flags);
1915 -static inline void __set_bit(int nr, volatile void *addr)
1917 +____atomic_test_and_set_bit_mask(unsigned int mask, volatile unsigned char *p)
1919 - ((unsigned char *) addr)[nr >> 3] |= (1U << (nr & 7));
1920 + unsigned long flags;
1923 + local_irq_save(flags);
1926 + local_irq_restore(flags);
1928 + return res & mask;
1931 -extern void clear_bit(int nr, volatile void * addr);
1933 +____atomic_test_and_clear_bit_mask(unsigned int mask, volatile unsigned char *p)
1935 + unsigned long flags;
1938 + local_irq_save(flags);
1941 + local_irq_restore(flags);
1943 + return res & mask;
1946 -static inline void __clear_bit(int nr, volatile void *addr)
1948 +____atomic_test_and_change_bit_mask(unsigned int mask, volatile unsigned char *p)
1950 - ((unsigned char *) addr)[nr >> 3] &= ~(1U << (nr & 7));
1951 + unsigned long flags;
1954 + local_irq_save(flags);
1957 + local_irq_restore(flags);
1959 + return res & mask;
1962 -extern void change_bit(int nr, volatile void * addr);
1964 + * Now the non-atomic variants. We let the compiler handle all optimisations
1967 +static inline void ____nonatomic_set_bit(int nr, volatile void *p)
1969 + ((unsigned char *) p)[nr >> 3] |= (1U << (nr & 7));
1972 -static inline void __change_bit(int nr, volatile void *addr)
1973 +static inline void ____nonatomic_clear_bit(int nr, volatile void *p)
1975 - ((unsigned char *) addr)[nr >> 3] ^= (1U << (nr & 7));
1976 + ((unsigned char *) p)[nr >> 3] &= ~(1U << (nr & 7));
1979 -extern int test_and_set_bit(int nr, volatile void * addr);
1980 +static inline void ____nonatomic_change_bit(int nr, volatile void *p)
1982 + ((unsigned char *) p)[nr >> 3] ^= (1U << (nr & 7));
1985 -static inline int __test_and_set_bit(int nr, volatile void *addr)
1986 +static inline int ____nonatomic_test_and_set_bit(int nr, volatile void *p)
1988 unsigned int mask = 1 << (nr & 7);
1989 unsigned int oldval;
1991 - oldval = ((unsigned char *) addr)[nr >> 3];
1992 - ((unsigned char *) addr)[nr >> 3] = oldval | mask;
1993 + oldval = ((unsigned char *) p)[nr >> 3];
1994 + ((unsigned char *) p)[nr >> 3] = oldval | mask;
1995 return oldval & mask;
1998 -extern int test_and_clear_bit(int nr, volatile void * addr);
2000 -static inline int __test_and_clear_bit(int nr, volatile void *addr)
2001 +static inline int ____nonatomic_test_and_clear_bit(int nr, volatile void *p)
2003 unsigned int mask = 1 << (nr & 7);
2004 unsigned int oldval;
2006 - oldval = ((unsigned char *) addr)[nr >> 3];
2007 - ((unsigned char *) addr)[nr >> 3] = oldval & ~mask;
2008 + oldval = ((unsigned char *) p)[nr >> 3];
2009 + ((unsigned char *) p)[nr >> 3] = oldval & ~mask;
2010 return oldval & mask;
2013 -extern int test_and_change_bit(int nr, volatile void * addr);
2015 -static inline int __test_and_change_bit(int nr, volatile void *addr)
2016 +static inline int ____nonatomic_test_and_change_bit(int nr, volatile void *p)
2018 unsigned int mask = 1 << (nr & 7);
2019 unsigned int oldval;
2021 - oldval = ((unsigned char *) addr)[nr >> 3];
2022 - ((unsigned char *) addr)[nr >> 3] = oldval ^ mask;
2023 + oldval = ((unsigned char *) p)[nr >> 3];
2024 + ((unsigned char *) p)[nr >> 3] = oldval ^ mask;
2025 return oldval & mask;
2028 -extern int find_first_zero_bit(void * addr, unsigned size);
2029 -extern int find_next_zero_bit(void * addr, int size, int offset);
2032 * This routine doesn't need to be atomic.
2034 -static inline int test_bit(int nr, const void * addr)
2035 +static inline int ____test_bit(int nr, const void * p)
2037 - return (((unsigned char *) addr)[nr >> 3] >> (nr & 7)) & 1;
2038 + return (((volatile unsigned char *) p)[nr >> 3] >> (nr & 7)) & 1;
2042 + * A note about Endian-ness.
2043 + * -------------------------
2045 + * When the ARM is put into big endian mode via CR15, the processor
2046 + * merely swaps the order of bytes within words, thus:
2048 + * ------------ physical data bus bits -----------
2049 + * D31 ... D24 D23 ... D16 D15 ... D8 D7 ... D0
2050 + * little byte 3 byte 2 byte 1 byte 0
2051 + * big byte 0 byte 1 byte 2 byte 3
2053 + * This means that reading a 32-bit word at address 0 returns the same
2054 + * value irrespective of the endian mode bit.
2056 + * Peripheral devices should be connected with the data bus reversed in
2057 + * "Big Endian" mode. ARM Application Note 61 is applicable, and is
2058 + * available from http://www.arm.com/.
2060 + * The following assumes that the data bus connectivity for big endian
2061 + * mode has been followed.
2063 + * Note that bit 0 is defined to be 32-bit word bit 0, not byte 0 bit 0.
2067 + * Little endian assembly bitops. nr = 0 -> byte 0 bit 0.
2069 +extern void _set_bit_le(int nr, volatile void * p);
2070 +extern void _clear_bit_le(int nr, volatile void * p);
2071 +extern void _change_bit_le(int nr, volatile void * p);
2072 +extern int _test_and_set_bit_le(int nr, volatile void * p);
2073 +extern int _test_and_clear_bit_le(int nr, volatile void * p);
2074 +extern int _test_and_change_bit_le(int nr, volatile void * p);
2075 +extern int _find_first_zero_bit_le(void * p, unsigned size);
2076 +extern int _find_next_zero_bit_le(void * p, int size, int offset);
2079 + * Big endian assembly bitops. nr = 0 -> byte 3 bit 0.
2081 +extern void _set_bit_be(int nr, volatile void * p);
2082 +extern void _clear_bit_be(int nr, volatile void * p);
2083 +extern void _change_bit_be(int nr, volatile void * p);
2084 +extern int _test_and_set_bit_be(int nr, volatile void * p);
2085 +extern int _test_and_clear_bit_be(int nr, volatile void * p);
2086 +extern int _test_and_change_bit_be(int nr, volatile void * p);
2087 +extern int _find_first_zero_bit_be(void * p, unsigned size);
2088 +extern int _find_next_zero_bit_be(void * p, int size, int offset);
2092 + * The __* form of bitops are non-atomic and may be reordered.
2094 +#define ATOMIC_BITOP_LE(name,nr,p) \
2095 + (__builtin_constant_p(nr) ? \
2096 + ____atomic_##name##_mask(1 << ((nr) & 7), \
2097 + ((unsigned char *)(p)) + ((nr) >> 3)) : \
2098 + _##name##_le(nr,p))
2100 +#define ATOMIC_BITOP_BE(name,nr,p) \
2101 + (__builtin_constant_p(nr) ? \
2102 + ____atomic_##name##_mask(1 << ((nr) & 7), \
2103 + ((unsigned char *)(p)) + (((nr) >> 3) ^ 3)) : \
2104 + _##name##_be(nr,p))
2106 +#define NONATOMIC_BITOP_LE(name,nr,p) \
2107 + (____nonatomic_##name(nr, p))
2109 +#define NONATOMIC_BITOP_BE(name,nr,p) \
2110 + (____nonatomic_##name(nr ^ 0x18, p))
2114 + * These are the little endian, atomic definitions.
2116 +#define set_bit(nr,p) ATOMIC_BITOP_LE(set_bit,nr,p)
2117 +#define clear_bit(nr,p) ATOMIC_BITOP_LE(clear_bit,nr,p)
2118 +#define change_bit(nr,p) ATOMIC_BITOP_LE(change_bit,nr,p)
2119 +#define test_and_set_bit(nr,p) ATOMIC_BITOP_LE(test_and_set_bit,nr,p)
2120 +#define test_and_clear_bit(nr,p) ATOMIC_BITOP_LE(test_and_clear_bit,nr,p)
2121 +#define test_and_change_bit(nr,p) ATOMIC_BITOP_LE(test_and_change_bit,nr,p)
2122 +#define test_bit(nr,p) ____test_bit(nr,p)
2123 +#define find_first_zero_bit(p,sz) _find_first_zero_bit_le(p,sz)
2124 +#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_le(p,sz,off)
2127 + * These are the little endian, non-atomic definitions.
2129 +#define __set_bit(nr,p) NONATOMIC_BITOP_LE(set_bit,nr,p)
2130 +#define __clear_bit(nr,p) NONATOMIC_BITOP_LE(clear_bit,nr,p)
2131 +#define __change_bit(nr,p) NONATOMIC_BITOP_LE(change_bit,nr,p)
2132 +#define __test_and_set_bit(nr,p) NONATOMIC_BITOP_LE(test_and_set_bit,nr,p)
2133 +#define __test_and_clear_bit(nr,p) NONATOMIC_BITOP_LE(test_and_clear_bit,nr,p)
2134 +#define __test_and_change_bit(nr,p) NONATOMIC_BITOP_LE(test_and_change_bit,nr,p)
2135 +#define __test_bit(nr,p) ____test_bit(nr,p)
2140 + * These are the big endian, atomic definitions.
2142 +#define set_bit(nr,p) ATOMIC_BITOP_BE(set_bit,nr,p)
2143 +#define clear_bit(nr,p) ATOMIC_BITOP_BE(clear_bit,nr,p)
2144 +#define change_bit(nr,p) ATOMIC_BITOP_BE(change_bit,nr,p)
2145 +#define test_and_set_bit(nr,p) ATOMIC_BITOP_BE(test_and_set_bit,nr,p)
2146 +#define test_and_clear_bit(nr,p) ATOMIC_BITOP_BE(test_and_clear_bit,nr,p)
2147 +#define test_and_change_bit(nr,p) ATOMIC_BITOP_BE(test_and_change_bit,nr,p)
2148 +#define test_bit(nr,p) ____test_bit((nr) ^ 0x18, p)
2149 +#define find_first_zero_bit(p,sz) _find_first_zero_bit_be(p,sz)
2150 +#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_be(p,sz,off)
2153 + * These are the big endian, non-atomic definitions.
2155 +#define __set_bit(nr,p) NONATOMIC_BITOP_BE(set_bit,nr,p)
2156 +#define __clear_bit(nr,p) NONATOMIC_BITOP_BE(clear_bit,nr,p)
2157 +#define __change_bit(nr,p) NONATOMIC_BITOP_BE(change_bit,nr,p)
2158 +#define __test_and_set_bit(nr,p) NONATOMIC_BITOP_BE(test_and_set_bit,nr,p)
2159 +#define __test_and_clear_bit(nr,p) NONATOMIC_BITOP_BE(test_and_clear_bit,nr,p)
2160 +#define __test_and_change_bit(nr,p) NONATOMIC_BITOP_BE(test_and_change_bit,nr,p)
2161 +#define __test_bit(nr,p) ____test_bit((nr) ^ 0x18, p)
2166 * ffz = Find First Zero in word. Undefined if no zero exists,
2167 * so code should check against ~0UL first..
2169 @@ -110,6 +302,29 @@
2173 + * ffz = Find First Zero in word. Undefined if no zero exists,
2174 + * so code should check against ~0UL first..
2176 +static inline unsigned long __ffs(unsigned long word)
2181 + if (word & 0x0000ffff) { k -= 16; word <<= 16; }
2182 + if (word & 0x00ff0000) { k -= 8; word <<= 8; }
2183 + if (word & 0x0f000000) { k -= 4; word <<= 4; }
2184 + if (word & 0x30000000) { k -= 2; word <<= 2; }
2185 + if (word & 0x40000000) { k -= 1; }
2190 + * fls: find last bit set.
2193 +#define fls(x) generic_fls(x)
2196 * ffs: find first bit set. This is defined the same way as
2197 * the libc and compiler builtin ffs routines, therefore
2198 * differs in spirit from the above ffz (man ffs).
2199 @@ -118,6 +333,22 @@
2200 #define ffs(x) generic_ffs(x)
2203 + * Find first bit set in a 140-bit bitmap, where the first
2204 + * 100 bits are unlikely to be set.
2206 +static inline int sched_find_first_bit(unsigned long *b)
2211 + for (off = 0; v = b[off], off < 4; off++) {
2215 + return __ffs(v) + off * 32;
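
[Editorial note: the comma expression in the loop header loads b[off] before
off < 4 is tested, so when the loop falls through, v already holds b[4]; five
32-bit words are scanned under a four-iteration bound. A minimal host-side
model of that trick, assuming each array slot holds a 32-bit word value and
using __builtin_ctzl as a stand-in for __ffs (the patch wraps the inner test
in unlikely()):

    #include <assert.h>

    static int find_first_sketch(const unsigned long b[5])
    {
            unsigned long v;
            unsigned int off;

            /* b[off] is loaded before the off < 4 test, so v ends up
             * holding b[4] if no earlier word is nonzero. */
            for (off = 0; v = b[off], off < 4; off++)
                    if (v)
                            break;
            return __builtin_ctzl(v) + off * 32;
    }

    int main(void)
    {
            unsigned long b[5] = { 0, 0, 0, 0, 1UL << 11 };
            assert(find_first_sketch(b) == 139);   /* bit 11 of word 4 */
            return 0;
    }

End of note.]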
2219 * hweightN: returns the hamming weight (i.e. the number
2220 * of bits set) of a N-bit word
2222 @@ -126,18 +357,25 @@
2223 #define hweight16(x) generic_hweight16(x)
2224 #define hweight8(x) generic_hweight8(x)
2226 -#define ext2_set_bit test_and_set_bit
2227 -#define ext2_clear_bit test_and_clear_bit
2228 -#define ext2_test_bit test_bit
2229 -#define ext2_find_first_zero_bit find_first_zero_bit
2230 -#define ext2_find_next_zero_bit find_next_zero_bit
2232 -/* Bitmap functions for the minix filesystem. */
2233 -#define minix_test_and_set_bit(nr,addr) test_and_set_bit(nr,addr)
2234 -#define minix_set_bit(nr,addr) set_bit(nr,addr)
2235 -#define minix_test_and_clear_bit(nr,addr) test_and_clear_bit(nr,addr)
2236 -#define minix_test_bit(nr,addr) test_bit(nr,addr)
2237 -#define minix_find_first_zero_bit(addr,size) find_first_zero_bit(addr,size)
2239 + * Ext2 is defined to use little-endian byte ordering.
2240 + * These do not need to be atomic.
2242 +#define ext2_set_bit(nr,p) NONATOMIC_BITOP_LE(test_and_set_bit,nr,p)
2243 +#define ext2_clear_bit(nr,p) NONATOMIC_BITOP_LE(test_and_clear_bit,nr,p)
2244 +#define ext2_test_bit(nr,p) __test_bit(nr,p)
2245 +#define ext2_find_first_zero_bit(p,sz) _find_first_zero_bit_le(p,sz)
2246 +#define ext2_find_next_zero_bit(p,sz,off) _find_next_zero_bit_le(p,sz,off)
2249 + * Minix is defined to use little-endian byte ordering.
2250 + * These do not need to be atomic.
2252 +#define minix_set_bit(nr,p) NONATOMIC_BITOP_LE(set_bit,nr,p)
2253 +#define minix_test_bit(nr,p) __test_bit(nr,p)
2254 +#define minix_test_and_set_bit(nr,p) NONATOMIC_BITOP_LE(test_and_set_bit,nr,p)
2255 +#define minix_test_and_clear_bit(nr,p) NONATOMIC_BITOP_LE(test_and_clear_bit,nr,p)
2256 +#define minix_find_first_zero_bit(p,sz) _find_first_zero_bit_le(p,sz)
2258 #endif /* __KERNEL__ */
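
[Editorial note: the big-endian variants above remap bit numbers with
nr ^ 0x18 (and byte addresses with ((nr >> 3) ^ 3)). 0x18 is exactly the two
byte-select bits of a bit number, so the XOR swaps byte 0 with 3 and byte 1
with 2 inside each 32-bit word while leaving the bit-within-byte untouched.
A quick standalone check, purely illustrative:

    #include <assert.h>

    /* Flip the byte-select bits (bits 3 and 4) of a bit number. */
    static int be_bit_index(int nr)
    {
            return nr ^ 0x18;
    }

    int main(void)
    {
            assert(be_bit_index(0)  == 24);  /* byte 0 bit 0 -> byte 3 bit 0 */
            assert(be_bit_index(7)  == 31);
            assert(be_bit_index(8)  == 16);  /* byte 1 -> byte 2 */
            assert(be_bit_index(31) == 7);   /* byte 3 -> byte 0 */
            assert(be_bit_index(32) == 56);  /* same mapping, next word */
            return 0;
    }

End of note.]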
2260 diff -urN linux-2.4.20/include/asm-cris/bitops.h linux-2.4.20-o1/include/asm-cris/bitops.h
2261 --- linux-2.4.20/include/asm-cris/bitops.h Mon Feb 25 20:38:10 2002
2262 +++ linux-2.4.20-o1/include/asm-cris/bitops.h Wed Mar 12 00:41:43 2003
2264 /* We use generic_ffs so get it; include guards resolve the possible
2265 mutual inclusion. */
2266 #include <linux/bitops.h>
2267 +#include <linux/compiler.h>
2270 * Some hacks to defeat gcc over-optimizations..
2273 #define set_bit(nr, addr) (void)test_and_set_bit(nr, addr)
2275 +#define __set_bit(nr, addr) (void)__test_and_set_bit(nr, addr)
2278 * clear_bit - Clears a bit in memory
2282 #define clear_bit(nr, addr) (void)test_and_clear_bit(nr, addr)
2284 +#define __clear_bit(nr, addr) (void)__test_and_clear_bit(nr, addr)
2287 * change_bit - Toggle a bit in memory
2290 * It also implies a memory barrier.
2293 -extern __inline__ int test_and_set_bit(int nr, void *addr)
2294 +extern inline int test_and_set_bit(int nr, void *addr)
2296 unsigned int mask, retval;
2297 unsigned long flags;
2298 @@ -105,6 +110,18 @@
2302 +extern inline int __test_and_set_bit(int nr, void *addr)
2304 + unsigned int mask, retval;
2305 + unsigned int *adr = (unsigned int *)addr;
2308 + mask = 1 << (nr & 0x1f);
2309 + retval = (mask & *adr) != 0;
2315 * clear_bit() doesn't provide any barrier for the compiler.
2318 * It also implies a memory barrier.
2321 -extern __inline__ int test_and_clear_bit(int nr, void *addr)
2322 +extern inline int test_and_clear_bit(int nr, void *addr)
2324 unsigned int mask, retval;
2325 unsigned long flags;
2327 * but actually fail. You must protect multiple accesses with a lock.
2330 -extern __inline__ int __test_and_clear_bit(int nr, void *addr)
2331 +extern inline int __test_and_clear_bit(int nr, void *addr)
2333 unsigned int mask, retval;
2334 unsigned int *adr = (unsigned int *)addr;
2336 * It also implies a memory barrier.
2339 -extern __inline__ int test_and_change_bit(int nr, void *addr)
2340 +extern inline int test_and_change_bit(int nr, void *addr)
2342 unsigned int mask, retval;
2343 unsigned long flags;
2346 /* WARNING: non atomic and it can be reordered! */
2348 -extern __inline__ int __test_and_change_bit(int nr, void *addr)
2349 +extern inline int __test_and_change_bit(int nr, void *addr)
2351 unsigned int mask, retval;
2352 unsigned int *adr = (unsigned int *)addr;
2354 * This routine doesn't need to be atomic.
2357 -extern __inline__ int test_bit(int nr, const void *addr)
2358 +extern inline int test_bit(int nr, const void *addr)
2361 unsigned int *adr = (unsigned int *)addr;
2363 * number. They differ in that the first function also inverts all bits
2366 -extern __inline__ unsigned long cris_swapnwbrlz(unsigned long w)
2367 +extern inline unsigned long cris_swapnwbrlz(unsigned long w)
2369 /* Let's just say we return the result in the same register as the
2370 input. Saying we clobber the input but can return the result
2375 -extern __inline__ unsigned long cris_swapwbrlz(unsigned long w)
2376 +extern inline unsigned long cris_swapwbrlz(unsigned long w)
2379 __asm__ ("swapwbr %0 \n\t"
2381 * ffz = Find First Zero in word. Undefined if no zero exists,
2382 * so code should check against ~0UL first..
2384 -extern __inline__ unsigned long ffz(unsigned long w)
2385 +extern inline unsigned long ffz(unsigned long w)
2387 /* The generic_ffs function is used to avoid the asm when the
2388 argument is a constant. */
2390 * Somewhat like ffz but the equivalent of generic_ffs: in contrast to
2391 * ffz we return the first one-bit *plus one*.
2393 -extern __inline__ unsigned long kernel_ffs(unsigned long w)
2394 +extern inline unsigned long kernel_ffs(unsigned long w)
2396 /* The generic_ffs function is used to avoid the asm when the
2397 argument is a constant. */
2399 * @offset: The bitnumber to start searching at
2400 * @size: The maximum size to search
2402 -extern __inline__ int find_next_zero_bit (void * addr, int size, int offset)
2403 +extern inline int find_next_zero_bit (void * addr, int size, int offset)
2405 unsigned long *p = ((unsigned long *) addr) + (offset >> 5);
2406 unsigned long result = offset & ~31UL;
2407 @@ -354,7 +371,45 @@
2408 #define minix_test_bit(nr,addr) test_bit(nr,addr)
2409 #define minix_find_first_zero_bit(addr,size) find_first_zero_bit(addr,size)
2411 -#endif /* __KERNEL__ */
2413 +/* TODO: see below */
2414 +#define sched_find_first_zero_bit(addr) find_first_zero_bit(addr, 168)
2417 +/* TODO: left out pending where to put it.. (there are .h dependencies) */
2420 + * Every architecture must define this function. It's the fastest
2421 + * way of searching a 168-bit bitmap where the first 128 bits are
2422 + * unlikely to be set. It's guaranteed that at least one of the 168
2423 + * bits is cleared.
2426 +#if MAX_RT_PRIO != 128 || MAX_PRIO != 168
2427 +# error update this function.
2430 +#define MAX_RT_PRIO 128
2431 +#define MAX_PRIO 168
2434 +static inline int sched_find_first_zero_bit(char *bitmap)
2436 + unsigned int *b = (unsigned int *)bitmap;
2439 + rt = b[0] & b[1] & b[2] & b[3];
2440 + if (unlikely(rt != 0xffffffff))
2441 + return find_first_zero_bit(bitmap, MAX_RT_PRIO);
2444 + return ffz(b[4]) + MAX_RT_PRIO;
2445 + return ffz(b[5]) + 32 + MAX_RT_PRIO;
2451 +#endif /* __KERNEL__ */
2453 #endif /* _CRIS_BITOPS_H */
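
[Editorial note: the AND of the first four words in the CRIS version above is
a one-compare filter: b[0] & b[1] & b[2] & b[3] equals 0xffffffff exactly when
bits 0..127 contain no zero bit, which is the expected case for this zero-bit
search. A standalone sketch of that filter (names illustrative):

    #include <assert.h>

    static int first_zero_is_past_128(const unsigned int b[4])
    {
            /* Any zero bit in words 0..3 clears the corresponding
             * bit of the AND, making it differ from all-ones. */
            return (b[0] & b[1] & b[2] & b[3]) == 0xffffffffu;
    }

    int main(void)
    {
            unsigned int all_set[4]  = { ~0u, ~0u, ~0u, ~0u };
            unsigned int one_hole[4] = { ~0u, ~0u & ~(1u << 5), ~0u, ~0u };

            assert(first_zero_is_past_128(all_set));   /* go straight to b[4] */
            assert(!first_zero_is_past_128(one_hole)); /* zero bit at index 37 */
            return 0;
    }

End of note.]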
2454 diff -urN linux-2.4.20/include/asm-generic/bitops.h linux-2.4.20-o1/include/asm-generic/bitops.h
2455 --- linux-2.4.20/include/asm-generic/bitops.h Tue Nov 28 02:47:38 2000
2456 +++ linux-2.4.20-o1/include/asm-generic/bitops.h Wed Mar 12 00:41:43 2003
2458 return ((mask & *addr) != 0);
2462 + * fls: find last bit set.
2465 +#define fls(x) generic_fls(x)
2470 diff -urN linux-2.4.20/include/asm-i386/bitops.h linux-2.4.20-o1/include/asm-i386/bitops.h
2471 --- linux-2.4.20/include/asm-i386/bitops.h Fri Nov 29 00:53:15 2002
2472 +++ linux-2.4.20-o1/include/asm-i386/bitops.h Wed Mar 12 00:41:43 2003
2476 #include <linux/config.h>
2477 +#include <linux/compiler.h>
2480 * These have to be done with inline assembly: that way the bit-setting
2486 +static __inline__ void __clear_bit(int nr, volatile void * addr)
2488 + __asm__ __volatile__(
2493 #define smp_mb__before_clear_bit() barrier()
2494 #define smp_mb__after_clear_bit() barrier()
2496 @@ -284,6 +293,34 @@
2500 + * find_first_bit - find the first set bit in a memory region
2501 + * @addr: The address to start the search at
2502 + * @size: The maximum size to search
2504 + * Returns the bit-number of the first set bit, not the number of the byte
2505 + * containing a bit.
2507 +static __inline__ int find_first_bit(void * addr, unsigned size)
2512 + /* This looks at memory. Mark it volatile to tell gcc not to move it around */
2513 + __asm__ __volatile__(
2514 + "xorl %%eax,%%eax\n\t"
2517 + "leal -4(%%edi),%%edi\n\t"
2518 + "bsfl (%%edi),%%eax\n"
2519 + "1:\tsubl %%ebx,%%edi\n\t"
2520 + "shll $3,%%edi\n\t"
2521 + "addl %%edi,%%eax"
2522 + :"=a" (res), "=&c" (d0), "=&D" (d1)
2523 + :"1" ((size + 31) >> 5), "2" (addr), "b" (addr));
2528 * find_next_zero_bit - find the first zero bit in a memory region
2529 * @addr: The address to base the search on
2530 * @offset: The bitnumber to start searching at
2535 - * Look for zero in first byte
2536 + * Look for zero in the first 32 bits.
2538 __asm__("bsfl %1,%0\n\t"
2540 @@ -317,6 +354,39 @@
2544 + * find_next_bit - find the first set bit in a memory region
2545 + * @addr: The address to base the search on
2546 + * @offset: The bitnumber to start searching at
2547 + * @size: The maximum size to search
2549 +static __inline__ int find_next_bit (void * addr, int size, int offset)
2551 + unsigned long * p = ((unsigned long *) addr) + (offset >> 5);
2552 + int set = 0, bit = offset & 31, res;
2556 + * Look for nonzero in the first 32 bits:
2558 + __asm__("bsfl %1,%0\n\t"
2563 + : "r" (*p >> bit));
2564 + if (set < (32 - bit))
2565 + return set + offset;
2570 + * No set bit yet, search remaining full words for a bit
2572 + res = find_first_bit (p, size - 32 * (p - (unsigned long *) addr));
2573 + return (offset + set + res);
2577 * ffz - find first zero in word.
2578 * @word: The word to search
2580 @@ -330,7 +400,40 @@
2585 + * __ffs - find first bit in word.
2586 + * @word: The word to search
2588 + * Undefined if no bit exists, so code should check against 0 first.
2590 +static __inline__ unsigned long __ffs(unsigned long word)
2592 + __asm__("bsfl %1,%0"
2601 + * Every architecture must define this function. It's the fastest
2602 + * way of searching a 140-bit bitmap where the first 100 bits are
2603 + * unlikely to be set. It's guaranteed that at least one of the 140
2604 + * bits is set.
2606 +static inline int sched_find_first_bit(unsigned long *b)
2608 + if (unlikely(b[0]))
2609 + return __ffs(b[0]);
2610 + if (unlikely(b[1]))
2611 + return __ffs(b[1]) + 32;
2612 + if (unlikely(b[2]))
2613 + return __ffs(b[2]) + 64;
2615 + return __ffs(b[3]) + 96;
2616 + return __ffs(b[4]) + 128;
2620 * ffs - find first bit set
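
[Editorial note: the i386 sched_find_first_bit() above is a straight cascade:
five 32-bit words cover the 140-bit priority bitmap, the first four tests are
annotated unlikely(), and __ffs() is a single bsfl. A portable model of the
cascade, with __builtin_ctzl as an illustrative substitute for bsfl:

    #include <assert.h>

    /* Stand-in for the bsfl-based __ffs(); undefined for 0. */
    static unsigned long my_ffs32(unsigned long w)
    {
            return (unsigned long) __builtin_ctzl(w);
    }

    /* Five word tests cover 140 bits; for normal (non-RT) loads the
     * first four are expected to fail, so the common path is short. */
    static int find_first_140(const unsigned long b[5])
    {
            if (b[0]) return my_ffs32(b[0]);
            if (b[1]) return my_ffs32(b[1]) + 32;
            if (b[2]) return my_ffs32(b[2]) + 64;
            if (b[3]) return my_ffs32(b[3]) + 96;
            return my_ffs32(b[4]) + 128; /* caller guarantees a set bit */
    }

    int main(void)
    {
            unsigned long b[5] = { 0, 0, 0, 0, 1UL << 10 };
            assert(find_first_140(b) == 138);   /* priority 138 */
            return 0;
    }

End of note.]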
2621 diff -urN linux-2.4.20/include/asm-i386/mmu_context.h linux-2.4.20-o1/include/asm-i386/mmu_context.h
2622 --- linux-2.4.20/include/asm-i386/mmu_context.h Sat Aug 3 02:39:45 2002
2623 +++ linux-2.4.20-o1/include/asm-i386/mmu_context.h Wed Mar 12 00:41:43 2003
2626 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk, unsigned cpu)
2628 - if (prev != next) {
2629 + if (likely(prev != next)) {
2630 /* stop flush ipis for the previous mm */
2631 clear_bit(cpu, &prev->cpu_vm_mask);
2633 * Re-load LDT if necessary
2635 - if (prev->context.segments != next->context.segments)
2636 + if (unlikely(prev->context.segments != next->context.segments))
2639 cpu_tlbstate[cpu].state = TLBSTATE_OK;
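
[Editorial note: the likely()/unlikely() annotations used above come from
<linux/compiler.h>; their definitions fall in a hunk outside this excerpt,
but they reduce to __builtin_expect, which tells gcc which arm to lay out as
the fall-through. A reduced model of the idea:

    #include <assert.h>

    #define my_likely(x)    __builtin_expect(!!(x), 1)
    #define my_unlikely(x)  __builtin_expect(!!(x), 0)

    static int mm_switch_needed(void *prev, void *next)
    {
            if (my_likely(prev != next))
                    return 1;   /* hot path emitted as the fall-through */
            return 0;
    }

    int main(void)
    {
            int a, b;
            assert(mm_switch_needed(&a, &b) == 1);
            assert(mm_switch_needed(&a, &a) == 0);
            return 0;
    }

End of note.]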
2640 diff -urN linux-2.4.20/include/asm-i386/processor.h linux-2.4.20-o1/include/asm-i386/processor.h
2641 --- linux-2.4.20/include/asm-i386/processor.h Sat Aug 3 02:39:45 2002
2642 +++ linux-2.4.20-o1/include/asm-i386/processor.h Wed Mar 12 00:41:43 2003
2645 #define cpu_relax() rep_nop()
2647 +#define ARCH_HAS_SMP_BALANCE
2649 /* Prefetch instructions for Pentium III and AMD Athlon */
2650 #ifdef CONFIG_MPENTIUMIII
2652 diff -urN linux-2.4.20/include/asm-i386/smp.h linux-2.4.20-o1/include/asm-i386/smp.h
2653 --- linux-2.4.20/include/asm-i386/smp.h Fri Nov 29 00:53:15 2002
2654 +++ linux-2.4.20-o1/include/asm-i386/smp.h Wed Mar 12 00:41:43 2003
2656 extern void smp_flush_tlb(void);
2657 extern void smp_message_irq(int cpl, void *dev_id, struct pt_regs *regs);
2658 extern void smp_send_reschedule(int cpu);
2659 +extern void smp_send_reschedule_all(void);
2660 extern void smp_invalidate_rcv(void); /* Process an NMI */
2661 extern void (*mtrr_hook) (void);
2662 extern void zap_low_mappings (void);
2664 * so this is correct in the x86 case.
2667 -#define smp_processor_id() (current->processor)
2668 +#define smp_processor_id() (current->cpu)
2670 static __inline int hard_smp_processor_id(void)
2673 #endif /* !__ASSEMBLY__ */
2675 #define NO_PROC_ID 0xFF /* No processor magic marker */
2678 - * This magic constant controls our willingness to transfer
2679 - * a process across CPUs. Such a transfer incurs misses on the L1
2680 - * cache, and on a P6 or P5 with multiple L2 caches L2 hits. My
2681 - * gut feeling is this will vary by board in value. For a board
2682 - * with separate L2 cache it probably depends also on the RSS, and
2683 - * for a board with shared L2 cache it ought to decay fast as other
2684 - * processes are run.
2687 -#define PROC_CHANGE_PENALTY 15 /* Schedule penalty */
2691 diff -urN linux-2.4.20/include/asm-i386/smp_balance.h linux-2.4.20-o1/include/asm-i386/smp_balance.h
2692 --- linux-2.4.20/include/asm-i386/smp_balance.h Thu Jan 1 01:00:00 1970
2693 +++ linux-2.4.20-o1/include/asm-i386/smp_balance.h Wed Mar 12 00:41:43 2003
2695 +#ifndef _ASM_SMP_BALANCE_H
2696 +#define _ASM_SMP_BALANCE_H
2699 + * We have an architecture-specific SMP load balancer to improve
2700 + * scheduling behavior on hyperthreaded CPUs. Since only P4s have
2701 + * HT, maybe this should be conditional on CONFIG_MPENTIUM4...
2706 + * Find any idle processor package (i.e. both virtual processors are idle)
2708 +static inline int find_idle_package(int this_cpu)
2712 + this_cpu = cpu_number_map(this_cpu);
2714 + for (i = (this_cpu + 1) % smp_num_cpus;
2716 + i = (i + 1) % smp_num_cpus) {
2717 + int physical = cpu_logical_map(i);
2718 + int sibling = cpu_sibling_map[physical];
2720 + if (idle_cpu(physical) && idle_cpu(sibling))
2723 + return -1; /* not found */
2726 +static inline int arch_reschedule_idle_override(task_t * p, int idle)
2728 + if (unlikely(smp_num_siblings > 1) && !idle_cpu(cpu_sibling_map[idle])) {
2729 + int true_idle = find_idle_package(idle);
2730 + if (true_idle >= 0) {
2731 + if (likely(p->cpus_allowed & (1UL << true_idle)))
2734 + true_idle = cpu_sibling_map[true_idle];
2735 + if (p->cpus_allowed & (1UL << true_idle))
2744 +static inline int arch_load_balance(int this_cpu, int idle)
2746 + /* Special hack for hyperthreading */
2747 + if (unlikely(smp_num_siblings > 1 && idle == 2 && !idle_cpu(cpu_sibling_map[this_cpu]))) {
2749 + struct runqueue *rq_target;
2751 + if ((found = find_idle_package(this_cpu)) >= 0) {
2752 + rq_target = cpu_rq(found);
2753 + resched_task(rq_target->idle);
2760 +#endif /* _ASM_SMP_BALANCE_H */
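
[Editorial note: the two hooks above encode one policy: never leave a wakeup
on a package whose other hyperthread is busy while some package is wholly
idle. A reduced, runnable model of that decision; idle[], sibling[] and
pick_wakeup_cpu() are stand-ins for idle_cpu(), cpu_sibling_map[] and
arch_reschedule_idle_override():

    #include <assert.h>

    #define NCPUS 4                         /* two packages x two siblings */
    static const int sibling[NCPUS] = { 1, 0, 3, 2 };
    static int idle[NCPUS];

    static int pick_wakeup_cpu(int candidate, unsigned long cpus_allowed)
    {
            int cpu;

            if (idle[sibling[candidate]])
                    return candidate;       /* whole package already idle */

            for (cpu = 0; cpu < NCPUS; cpu++)
                    if (idle[cpu] && idle[sibling[cpu]] &&
                        (cpus_allowed & (1UL << cpu)))
                            return cpu;     /* fully idle package found */

            return candidate;               /* no better package; keep pick */
    }

    int main(void)
    {
            /* CPU 0 idle but sibling 1 busy; package {2,3} fully idle. */
            idle[0] = 1; idle[1] = 0; idle[2] = 1; idle[3] = 1;
            assert(pick_wakeup_cpu(0, ~0UL) == 2);
            return 0;
    }

End of note.]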
2761 diff -urN linux-2.4.20/include/asm-i386/system.h linux-2.4.20-o1/include/asm-i386/system.h
2762 --- linux-2.4.20/include/asm-i386/system.h Fri Nov 29 00:53:15 2002
2763 +++ linux-2.4.20-o1/include/asm-i386/system.h Wed Mar 12 00:41:43 2003
2765 struct task_struct; /* one of the stranger aspects of C forward declarations.. */
2766 extern void FASTCALL(__switch_to(struct task_struct *prev, struct task_struct *next));
2768 -#define prepare_to_switch() do { } while(0)
2769 #define switch_to(prev,next,last) do { \
2770 asm volatile("pushl %%esi\n\t" \
2773 "movl %%esp,%0\n\t" /* save ESP */ \
2774 - "movl %3,%%esp\n\t" /* restore ESP */ \
2775 + "movl %2,%%esp\n\t" /* restore ESP */ \
2776 "movl $1f,%1\n\t" /* save EIP */ \
2777 - "pushl %4\n\t" /* restore EIP */ \
2778 + "pushl %3\n\t" /* restore EIP */ \
2779 "jmp __switch_to\n" \
2784 - :"=m" (prev->thread.esp),"=m" (prev->thread.eip), \
2786 + :"=m" (prev->thread.esp),"=m" (prev->thread.eip) \
2787 :"m" (next->thread.esp),"m" (next->thread.eip), \
2788 - "a" (prev), "d" (next), \
2790 + "a" (prev), "d" (next)); \
2793 #define _set_base(addr,base) do { unsigned long __pr; \
2794 diff -urN linux-2.4.20/include/asm-ia64/bitops.h linux-2.4.20-o1/include/asm-ia64/bitops.h
2795 --- linux-2.4.20/include/asm-ia64/bitops.h Fri Nov 29 00:53:15 2002
2796 +++ linux-2.4.20-o1/include/asm-ia64/bitops.h Wed Mar 12 00:41:43 2003
2799 * Copyright (C) 1998-2002 Hewlett-Packard Co
2800 * David Mosberger-Tang <davidm@hpl.hp.com>
2802 + * 02/06/02 find_next_bit() and find_first_bit() added from Erich Focht's ia64 O(1)
2806 #include <linux/types.h>
2811 + * __clear_bit - Clears a bit in memory (non-atomic version)
2813 +static __inline__ void
2814 +__clear_bit (int nr, volatile void *addr)
2816 + volatile __u32 *p = (__u32 *) addr + (nr >> 5);
2817 + __u32 m = 1 << (nr & 31);
2822 * change_bit - Toggle a bit in memory
2824 * @addr: Address to start counting from
2825 @@ -264,12 +280,11 @@
2829 - * ffz - find the first zero bit in a memory region
2830 - * @x: The address to start the search at
2831 + * ffz - find the first zero bit in a long word
2832 + * @x: The long word to find the bit in
2834 - * Returns the bit-number (0..63) of the first (least significant) zero bit, not
2835 - * the number of the byte containing a bit. Undefined if no zero exists, so
2836 - * code should check against ~0UL first...
2837 + * Returns the bit-number (0..63) of the first (least significant) zero bit. Undefined if
2838 + * no zero exists, so code should check against ~0UL first...
2840 static inline unsigned long
2841 ffz (unsigned long x)
2842 @@ -280,6 +295,21 @@
2847 + * __ffs - find first bit in word.
2848 + * @x: The word to search
2850 + * Undefined if no bit exists, so code should check against 0 first.
2852 +static __inline__ unsigned long
2853 +__ffs (unsigned long x)
2855 + unsigned long result;
2857 + __asm__ ("popcnt %0=%1" : "=r" (result) : "r" ((x - 1) & ~x));
2864 @@ -296,6 +326,12 @@
2865 return exp - 0xffff;
2871 + return ia64_fls((unsigned int) x);
2875 * ffs: find first bit set. This is defined the same way as the libc and compiler builtin
2876 * ffs routines, therefore differs in spirit from the above ffz (man ffs): it operates on
2877 @@ -368,8 +404,53 @@
2879 #define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0)
2882 + * Find the next set bit in a bitmap reasonably efficiently.
2885 +find_next_bit (void *addr, unsigned long size, unsigned long offset)
2887 + unsigned long *p = ((unsigned long *) addr) + (offset >> 6);
2888 + unsigned long result = offset & ~63UL;
2889 + unsigned long tmp;
2891 + if (offset >= size)
2897 + tmp &= ~0UL << offset;
2901 + goto found_middle;
2905 + while (size & ~63UL) {
2906 + if ((tmp = *(p++)))
2907 + goto found_middle;
2915 + tmp &= ~0UL >> (64-size);
2916 + if (tmp == 0UL) /* Are any bits set? */
2917 + return result + size; /* Nope. */
2919 + return result + __ffs(tmp);
2922 +#define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
2926 +#define __clear_bit(nr, addr) clear_bit(nr, addr)
2928 #define ext2_set_bit test_and_set_bit
2929 #define ext2_clear_bit test_and_clear_bit
2930 #define ext2_test_bit test_bit
2931 @@ -382,6 +463,16 @@
2932 #define minix_test_and_clear_bit(nr,addr) test_and_clear_bit(nr,addr)
2933 #define minix_test_bit(nr,addr) test_bit(nr,addr)
2934 #define minix_find_first_zero_bit(addr,size) find_first_zero_bit(addr,size)
2937 +sched_find_first_bit (unsigned long *b)
2939 + if (unlikely(b[0]))
2940 + return __ffs(b[0]);
2941 + if (unlikely(b[1]))
2942 + return 64 + __ffs(b[1]);
2943 + return __ffs(b[2]) + 128;
2946 #endif /* __KERNEL__ */
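
[Editorial note: the ia64 __ffs() above leans on popcnt: for nonzero x,
(x - 1) & ~x is a mask of exactly the trailing zero bits, so its population
count equals the index of the lowest set bit. A portable sketch, with
__builtin_popcountl standing in for the popcnt instruction:

    #include <assert.h>

    static unsigned long ffs_via_popcount(unsigned long x) /* x != 0 */
    {
            /* (x - 1) turns the trailing zeros into ones and clears the
             * lowest set bit; & ~x keeps only that trailing-zero mask. */
            return (unsigned long) __builtin_popcountl((x - 1) & ~x);
    }

    int main(void)
    {
            assert(ffs_via_popcount(1UL) == 0);
            assert(ffs_via_popcount(0x80UL) == 7);
            assert(ffs_via_popcount(0xf000UL) == 12);
            return 0;
    }

End of note.]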
2948 diff -urN linux-2.4.20/include/asm-m68k/bitops.h linux-2.4.20-o1/include/asm-m68k/bitops.h
2949 --- linux-2.4.20/include/asm-m68k/bitops.h Thu Oct 25 22:53:55 2001
2950 +++ linux-2.4.20-o1/include/asm-m68k/bitops.h Wed Mar 12 00:41:43 2003
2952 (__builtin_constant_p(nr) ? \
2953 __constant_clear_bit(nr, vaddr) : \
2954 __generic_clear_bit(nr, vaddr))
2955 +#define __clear_bit(nr,vaddr) clear_bit(nr,vaddr)
2957 extern __inline__ void __constant_clear_bit(int nr, volatile void * vaddr)
2959 @@ -239,6 +240,28 @@
2963 +#define __ffs(x) (ffs(x) - 1)
2967 + * Every architecture must define this function. It's the fastest
2968 + * way of searching a 140-bit bitmap where the first 100 bits are
2969 + * unlikely to be set. It's guaranteed that at least one of the 140
2970 + * bits is set.
2972 +static inline int sched_find_first_bit(unsigned long *b)
2974 + if (unlikely(b[0]))
2975 + return __ffs(b[0]);
2976 + if (unlikely(b[1]))
2977 + return __ffs(b[1]) + 32;
2978 + if (unlikely(b[2]))
2979 + return __ffs(b[2]) + 64;
2981 + return __ffs(b[3]) + 96;
2982 + return __ffs(b[4]) + 128;
2987 * hweightN: returns the hamming weight (i.e. the number
2988 diff -urN linux-2.4.20/include/asm-mips/bitops.h linux-2.4.20-o1/include/asm-mips/bitops.h
2989 --- linux-2.4.20/include/asm-mips/bitops.h Fri Nov 29 00:53:15 2002
2990 +++ linux-2.4.20-o1/include/asm-mips/bitops.h Wed Mar 12 00:41:43 2003
2993 #ifdef CONFIG_CPU_HAS_LLSC
2995 +#include <asm/mipsregs.h>
2998 * These functions for MIPS ISA > 1 are interrupt and SMP proof and
2999 * interrupt friendly
3000 @@ -684,20 +688,29 @@
3002 * Undefined if no zero exists, so code should check against ~0UL first.
3004 -static __inline__ unsigned long ffz(unsigned long word)
3005 +extern __inline__ unsigned long ffz(unsigned long word)
3008 + unsigned int __res;
3009 + unsigned int mask = 1;
3012 - s = 16; if (word << 16 != 0) s = 0; b += s; word >>= s;
3013 - s = 8; if (word << 24 != 0) s = 0; b += s; word >>= s;
3014 - s = 4; if (word << 28 != 0) s = 0; b += s; word >>= s;
3015 - s = 2; if (word << 30 != 0) s = 0; b += s; word >>= s;
3016 - s = 1; if (word << 31 != 0) s = 0; b += s;
3018 + ".set\tnoreorder\n\t"
3021 + "1:\tand\t$1,%2,%1\n\t"
3029 + : "=&r" (__res), "=r" (mask)
3030 + : "r" (word), "1" (mask)
3040 diff -urN linux-2.4.20/include/asm-mips64/bitops.h linux-2.4.20-o1/include/asm-mips64/bitops.h
3041 --- linux-2.4.20/include/asm-mips64/bitops.h Fri Nov 29 00:53:15 2002
3042 +++ linux-2.4.20-o1/include/asm-mips64/bitops.h Wed Mar 12 00:41:43 2003
3045 #include <asm/system.h>
3046 #include <asm/sgidefs.h>
3047 +#include <asm/mipsregs.h>
3050 * set_bit - Atomically set a bit in memory
3052 * Note that @nr may be almost arbitrarily large; this function is not
3053 * restricted to acting on a single-word quantity.
3055 -static inline void set_bit(unsigned long nr, volatile void *addr)
3056 +extern __inline__ void
3057 +set_bit(unsigned long nr, volatile void *addr)
3059 unsigned long *m = ((unsigned long *) addr) + (nr >> 6);
3062 * If it's called on the same region of memory simultaneously, the effect
3063 * may be that only one operation succeeds.
3065 -static inline void __set_bit(int nr, volatile void * addr)
3066 +extern __inline__ void __set_bit(int nr, volatile void * addr)
3068 unsigned long * m = ((unsigned long *) addr) + (nr >> 6);
3071 * you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit()
3072 * in order to ensure changes are visible on other processors.
3074 -static inline void clear_bit(unsigned long nr, volatile void *addr)
3075 +extern __inline__ void
3076 +clear_bit(unsigned long nr, volatile void *addr)
3078 unsigned long *m = ((unsigned long *) addr) + (nr >> 6);
3081 * Note that @nr may be almost arbitrarily large; this function is not
3082 * restricted to acting on a single-word quantity.
3084 -static inline void change_bit(unsigned long nr, volatile void *addr)
3085 +extern __inline__ void
3086 +change_bit(unsigned long nr, volatile void *addr)
3088 unsigned long *m = ((unsigned long *) addr) + (nr >> 6);
3091 * If it's called on the same region of memory simultaneously, the effect
3092 * may be that only one operation succeeds.
3094 -static inline void __change_bit(int nr, volatile void * addr)
3095 +extern __inline__ void __change_bit(int nr, volatile void * addr)
3097 unsigned long * m = ((unsigned long *) addr) + (nr >> 6);
3100 * This operation is atomic and cannot be reordered.
3101 * It also implies a memory barrier.
3103 -static inline unsigned long test_and_set_bit(unsigned long nr,
3104 - volatile void *addr)
3105 +extern __inline__ unsigned long
3106 +test_and_set_bit(unsigned long nr, volatile void *addr)
3108 unsigned long *m = ((unsigned long *) addr) + (nr >> 6);
3109 unsigned long temp, res;
3111 * If two examples of this operation race, one can appear to succeed
3112 * but actually fail. You must protect multiple accesses with a lock.
3114 -static inline int __test_and_set_bit(int nr, volatile void *addr)
3115 +extern __inline__ int
3116 +__test_and_set_bit(int nr, volatile void * addr)
3118 unsigned long mask, retval;
3119 long *a = (unsigned long *) addr;
3121 * This operation is atomic and cannot be reordered.
3122 * It also implies a memory barrier.
3124 -static inline unsigned long test_and_clear_bit(unsigned long nr,
3125 - volatile void *addr)
3126 +extern __inline__ unsigned long
3127 +test_and_clear_bit(unsigned long nr, volatile void *addr)
3129 unsigned long *m = ((unsigned long *) addr) + (nr >> 6);
3130 unsigned long temp, res;
3132 * If two examples of this operation race, one can appear to succeed
3133 * but actually fail. You must protect multiple accesses with a lock.
3135 -static inline int __test_and_clear_bit(int nr, volatile void * addr)
3136 +extern __inline__ int
3137 +__test_and_clear_bit(int nr, volatile void * addr)
3139 unsigned long mask, retval;
3140 unsigned long *a = (unsigned long *) addr;
3142 * This operation is atomic and cannot be reordered.
3143 * It also implies a memory barrier.
3145 -static inline unsigned long test_and_change_bit(unsigned long nr,
3146 - volatile void *addr)
3147 +extern __inline__ unsigned long
3148 +test_and_change_bit(unsigned long nr, volatile void *addr)
3150 unsigned long *m = ((unsigned long *) addr) + (nr >> 6);
3151 unsigned long temp, res;
3153 * If two examples of this operation race, one can appear to succeed
3154 * but actually fail. You must protect multiple accesses with a lock.
3156 -static inline int __test_and_change_bit(int nr, volatile void *addr)
3157 +extern __inline__ int
3158 +__test_and_change_bit(int nr, volatile void * addr)
3160 unsigned long mask, retval;
3161 unsigned long *a = (unsigned long *) addr;
3163 * @nr: bit number to test
3164 * @addr: Address to start counting from
3166 -static inline int test_bit(int nr, volatile void * addr)
3167 +extern __inline__ unsigned long
3168 +test_bit(int nr, volatile void * addr)
3170 return 1UL & (((const volatile unsigned long *) addr)[nr >> SZLONG_LOG] >> (nr & SZLONG_MASK));
3172 @@ -400,19 +412,20 @@
3174 * Undefined if no zero exists, so code should check against ~0UL first.
3176 -static __inline__ unsigned long ffz(unsigned long word)
3177 +extern __inline__ unsigned long ffz(unsigned long word)
3183 - s = 32; if (word << 32 != 0) s = 0; b += s; word >>= s;
3184 - s = 16; if (word << 48 != 0) s = 0; b += s; word >>= s;
3185 - s = 8; if (word << 56 != 0) s = 0; b += s; word >>= s;
3186 - s = 4; if (word << 60 != 0) s = 0; b += s; word >>= s;
3187 - s = 2; if (word << 62 != 0) s = 0; b += s; word >>= s;
3188 - s = 1; if (word << 63 != 0) s = 0; b += s;
3190 + if (word & 0x00000000ffffffffUL) { k -= 32; word <<= 32; }
3191 + if (word & 0x0000ffff00000000UL) { k -= 16; word <<= 16; }
3192 + if (word & 0x00ff000000000000UL) { k -= 8; word <<= 8; }
3193 + if (word & 0x0f00000000000000UL) { k -= 4; word <<= 4; }
3194 + if (word & 0x3000000000000000UL) { k -= 2; word <<= 2; }
3195 + if (word & 0x4000000000000000UL) { k -= 1; }
3203 * @offset: The bitnumber to start searching at
3204 * @size: The maximum size to search
3206 -static inline unsigned long find_next_zero_bit(void *addr, unsigned long size,
3207 - unsigned long offset)
3208 +extern __inline__ unsigned long
3209 +find_next_zero_bit(void *addr, unsigned long size, unsigned long offset)
3211 unsigned long *p = ((unsigned long *) addr) + (offset >> 6);
3212 unsigned long result = offset & ~63UL;
3217 -static inline int __test_and_set_le_bit(unsigned long nr, void * addr)
3219 +__test_and_set_le_bit(unsigned long nr, void * addr)
3221 int mask, retval, flags;
3222 unsigned char *ADDR = (unsigned char *) addr;
3227 -static inline int __test_and_clear_le_bit(unsigned long nr, void * addr)
3229 +__test_and_clear_le_bit(unsigned long nr, void * addr)
3231 int mask, retval, flags;
3232 unsigned char *ADDR = (unsigned char *) addr;
3237 -static inline int test_le_bit(unsigned long nr, const void * addr)
3239 +test_le_bit(unsigned long nr, const void * addr)
3242 const unsigned char *ADDR = (const unsigned char *) addr;
3244 #define ext2_find_first_zero_bit(addr, size) \
3245 ext2_find_next_zero_bit((addr), (size), 0)
3247 -static inline unsigned long find_next_zero_le_bit(void *addr,
3248 - unsigned long size, unsigned long offset)
3249 +extern inline unsigned long find_next_zero_le_bit(void *addr,
3250 + unsigned long size, unsigned long offset)
3252 unsigned int *p = ((unsigned int *) addr) + (offset >> 5);
3253 unsigned int result = offset & ~31UL;
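
[Editorial note: the rewritten 64-bit ffz() above (its word = ~word line falls
in an elided hunk) swaps the old shift-and-test bookkeeping for a plain binary
reduction: each test keeps the half that contains a set bit and shifts it up,
narrowing from 64 candidates to 1 in six steps. The same reduction in
standalone form:

    #include <assert.h>

    static unsigned long lowest_set_bit64(unsigned long long word)
    {
            unsigned long k = 63;

            if (word & 0x00000000ffffffffULL) { k -= 32; word <<= 32; }
            if (word & 0x0000ffff00000000ULL) { k -= 16; word <<= 16; }
            if (word & 0x00ff000000000000ULL) { k -=  8; word <<=  8; }
            if (word & 0x0f00000000000000ULL) { k -=  4; word <<=  4; }
            if (word & 0x3000000000000000ULL) { k -=  2; word <<=  2; }
            if (word & 0x4000000000000000ULL) { k -=  1; }
            return k;
    }

    int main(void)
    {
            assert(lowest_set_bit64(1ULL) == 0);
            assert(lowest_set_bit64(1ULL << 40) == 40);
            /* ffz(w) is then lowest_set_bit64(~w): */
            assert(lowest_set_bit64(~0x0fULL) == 4);
            return 0;
    }

End of note.]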
3254 diff -urN linux-2.4.20/include/asm-ppc/bitops.h linux-2.4.20-o1/include/asm-ppc/bitops.h
3255 --- linux-2.4.20/include/asm-ppc/bitops.h Tue Jun 12 04:15:27 2001
3256 +++ linux-2.4.20-o1/include/asm-ppc/bitops.h Wed Mar 12 00:41:43 2003
3258 #define _PPC_BITOPS_H
3260 #include <linux/config.h>
3261 +#include <linux/compiler.h>
3262 #include <asm/byteorder.h>
3263 #include <asm/atomic.h>
3266 * These used to be if'd out here because using : "cc" as a constraint
3267 * resulted in errors from egcs. Things appear to be OK with gcc-2.95.
3269 -static __inline__ void set_bit(int nr, volatile void * addr)
3270 +static __inline__ void set_bit(int nr, volatile unsigned long * addr)
3273 unsigned long mask = 1 << (nr & 0x1f);
3276 * non-atomic version
3278 -static __inline__ void __set_bit(int nr, volatile void *addr)
3279 +static __inline__ void __set_bit(int nr, volatile unsigned long *addr)
3281 unsigned long mask = 1 << (nr & 0x1f);
3282 unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
3284 #define smp_mb__before_clear_bit() smp_mb()
3285 #define smp_mb__after_clear_bit() smp_mb()
3287 -static __inline__ void clear_bit(int nr, volatile void *addr)
3288 +static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
3291 unsigned long mask = 1 << (nr & 0x1f);
3294 * non-atomic version
3296 -static __inline__ void __clear_bit(int nr, volatile void *addr)
3297 +static __inline__ void __clear_bit(int nr, volatile unsigned long *addr)
3299 unsigned long mask = 1 << (nr & 0x1f);
3300 unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
3305 -static __inline__ void change_bit(int nr, volatile void *addr)
3306 +static __inline__ void change_bit(int nr, volatile unsigned long *addr)
3309 unsigned long mask = 1 << (nr & 0x1f);
3312 * non-atomic version
3314 -static __inline__ void __change_bit(int nr, volatile void *addr)
3315 +static __inline__ void __change_bit(int nr, volatile unsigned long *addr)
3317 unsigned long mask = 1 << (nr & 0x1f);
3318 unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
3321 * test_and_*_bit do imply a memory barrier (?)
3323 -static __inline__ int test_and_set_bit(int nr, volatile void *addr)
3324 +static __inline__ int test_and_set_bit(int nr, volatile unsigned long *addr)
3326 unsigned int old, t;
3327 unsigned int mask = 1 << (nr & 0x1f);
3330 * non-atomic version
3332 -static __inline__ int __test_and_set_bit(int nr, volatile void *addr)
3333 +static __inline__ int __test_and_set_bit(int nr, volatile unsigned long *addr)
3335 unsigned long mask = 1 << (nr & 0x1f);
3336 unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
3338 return (old & mask) != 0;
3341 -static __inline__ int test_and_clear_bit(int nr, volatile void *addr)
3342 +static __inline__ int test_and_clear_bit(int nr, volatile unsigned long *addr)
3344 unsigned int old, t;
3345 unsigned int mask = 1 << (nr & 0x1f);
3348 * non-atomic version
3350 -static __inline__ int __test_and_clear_bit(int nr, volatile void *addr)
3351 +static __inline__ int __test_and_clear_bit(int nr, volatile unsigned long *addr)
3353 unsigned long mask = 1 << (nr & 0x1f);
3354 unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
3356 return (old & mask) != 0;
3359 -static __inline__ int test_and_change_bit(int nr, volatile void *addr)
3360 +static __inline__ int test_and_change_bit(int nr, volatile unsigned long *addr)
3362 unsigned int old, t;
3363 unsigned int mask = 1 << (nr & 0x1f);
3366 * non-atomic version
3368 -static __inline__ int __test_and_change_bit(int nr, volatile void *addr)
3369 +static __inline__ int __test_and_change_bit(int nr, volatile unsigned long *addr)
3371 unsigned long mask = 1 << (nr & 0x1f);
3372 unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
3374 return (old & mask) != 0;
3377 -static __inline__ int test_bit(int nr, __const__ volatile void *addr)
3378 +static __inline__ int test_bit(int nr, __const__ volatile unsigned long *addr)
3380 __const__ unsigned int *p = (__const__ unsigned int *) addr;
3385 /* Return the bit position of the most significant 1 bit in a word */
3386 -static __inline__ int __ilog2(unsigned int x)
3387 +static __inline__ int __ilog2(unsigned long x)
3395 -static __inline__ int ffz(unsigned int x)
3396 +static __inline__ int ffz(unsigned long x)
3400 @@ -239,6 +247,11 @@
3404 +static inline int __ffs(unsigned long x)
3406 + return __ilog2(x & -x);
3410 * ffs: find first bit set. This is defined the same way as
3411 * the libc and compiler builtin ffs routines, therefore
3412 @@ -250,6 +263,18 @@
3416 + * fls: find last (most-significant) bit set.
3417 + * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
3419 +static __inline__ int fls(unsigned int x)
3423 + asm ("cntlzw %0,%1" : "=r" (lz) : "r" (x));
3428 * hweightN: returns the hamming weight (i.e. the number
3429 * of bits set) of a N-bit word
3431 @@ -261,13 +286,86 @@
3432 #endif /* __KERNEL__ */
3435 + * Find the first bit set in a 140-bit bitmap.
3436 + * The first 100 bits are unlikely to be set.
3438 +static inline int sched_find_first_bit(unsigned long *b)
3440 + if (unlikely(b[0]))
3441 + return __ffs(b[0]);
3442 + if (unlikely(b[1]))
3443 + return __ffs(b[1]) + 32;
3444 + if (unlikely(b[2]))
3445 + return __ffs(b[2]) + 64;
3447 + return __ffs(b[3]) + 96;
3448 + return __ffs(b[4]) + 128;
3452 + * find_next_bit - find the next set bit in a memory region
3453 + * @addr: The address to base the search on
3454 + * @offset: The bitnumber to start searching at
3455 + * @size: The maximum size to search
3457 +static __inline__ unsigned long find_next_bit(unsigned long *addr,
3458 + unsigned long size, unsigned long offset)
3460 + unsigned int *p = ((unsigned int *) addr) + (offset >> 5);
3461 + unsigned int result = offset & ~31UL;
3464 + if (offset >= size)
3470 + tmp &= ~0UL << offset;
3474 + goto found_middle;
3478 + while (size >= 32) {
3479 + if ((tmp = *p++) != 0)
3480 + goto found_middle;
3489 + tmp &= ~0UL >> (32 - size);
3490 + if (tmp == 0UL) /* Are any bits set? */
3491 + return result + size; /* Nope. */
3493 + return result + __ffs(tmp);
3497 + * find_first_bit - find the first set bit in a memory region
3498 + * @addr: The address to start the search at
3499 + * @size: The maximum size to search
3501 + * Returns the bit-number of the first set bit, not the number of the byte
3502 + * containing a bit.
3504 +#define find_first_bit(addr, size) \
3505 + find_next_bit((addr), (size), 0)
3508 * This implementation of find_{first,next}_zero_bit was stolen from
3509 * Linus' asm-alpha/bitops.h.
3511 #define find_first_zero_bit(addr, size) \
3512 find_next_zero_bit((addr), (size), 0)
3514 -static __inline__ unsigned long find_next_zero_bit(void * addr,
3515 +static __inline__ unsigned long find_next_zero_bit(unsigned long * addr,
3516 unsigned long size, unsigned long offset)
3518 unsigned int * p = ((unsigned int *) addr) + (offset >> 5);
3523 -#define ext2_set_bit(nr, addr) __test_and_set_bit((nr) ^ 0x18, addr)
3524 -#define ext2_clear_bit(nr, addr) __test_and_clear_bit((nr) ^ 0x18, addr)
3525 +#define ext2_set_bit(nr, addr) __test_and_set_bit((nr) ^ 0x18, (unsigned long *)(addr))
3526 +#define ext2_clear_bit(nr, addr) __test_and_clear_bit((nr) ^ 0x18, (unsigned long *)(addr))
3528 static __inline__ int ext2_test_bit(int nr, __const__ void * addr)
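
[Editorial note: two small identities drive the ppc additions above: x & -x
isolates the lowest set bit, so __ffs(x) == __ilog2(x & -x); and counting
leading zeros gives fls(x) == 32 - cntlzw(x), with fls(0) == 0. A portable
check of both, with GCC builtins standing in for the instructions:

    #include <assert.h>

    static int my_ffs(unsigned int x)       /* x != 0 */
    {
            /* x & -x leaves only the lowest set bit; its log2 is
             * 31 minus the leading-zero count of that one-bit value. */
            return 31 - __builtin_clz(x & -x);
    }

    static int my_fls(unsigned int x)
    {
            return x ? 32 - __builtin_clz(x) : 0;
    }

    int main(void)
    {
            assert(my_ffs(0x28) == 3);      /* lowest set bit of 0b101000 */
            assert(my_fls(1) == 1);
            assert(my_fls(0x80000000u) == 32);
            assert(my_fls(0) == 0);
            return 0;
    }

End of note.]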
3530 diff -urN linux-2.4.20/include/asm-ppc/smp.h linux-2.4.20-o1/include/asm-ppc/smp.h
3531 --- linux-2.4.20/include/asm-ppc/smp.h Sat Aug 3 02:39:45 2002
3532 +++ linux-2.4.20-o1/include/asm-ppc/smp.h Wed Mar 12 14:29:05 2003
3534 #define cpu_logical_map(cpu) (cpu)
3535 #define cpu_number_map(x) (x)
3537 -#define smp_processor_id() (current->processor)
3538 +#define smp_processor_id() (current->cpu)
3540 extern int smp_hw_index[NR_CPUS];
3541 #define hard_smp_processor_id() (smp_hw_index[smp_processor_id()])
3542 diff -urN linux-2.4.20/include/asm-ppc64/bitops.h linux-2.4.20-o1/include/asm-ppc64/bitops.h
3543 --- linux-2.4.20/include/asm-ppc64/bitops.h Sat Aug 3 02:39:45 2002
3544 +++ linux-2.4.20-o1/include/asm-ppc64/bitops.h Wed Mar 12 00:41:43 2003
3546 #define smp_mb__before_clear_bit() smp_mb()
3547 #define smp_mb__after_clear_bit() smp_mb()
3549 -static __inline__ int test_bit(unsigned long nr, __const__ volatile void *addr)
3550 +static __inline__ int test_bit(unsigned long nr, __const__ volatile unsigned long *addr)
3552 return (1UL & (((__const__ long *) addr)[nr >> 6] >> (nr & 63)));
3555 -static __inline__ void set_bit(unsigned long nr, volatile void *addr)
3556 +static __inline__ void set_bit(unsigned long nr, volatile unsigned long *addr)
3559 unsigned long mask = 1UL << (nr & 0x3f);
3564 -static __inline__ void clear_bit(unsigned long nr, volatile void *addr)
3565 +static __inline__ void clear_bit(unsigned long nr, volatile unsigned long *addr)
3568 unsigned long mask = 1UL << (nr & 0x3f);
3573 -static __inline__ void change_bit(unsigned long nr, volatile void *addr)
3574 +static __inline__ void change_bit(unsigned long nr, volatile unsigned long *addr)
3577 unsigned long mask = 1UL << (nr & 0x3f);
3582 -static __inline__ int test_and_set_bit(unsigned long nr, volatile void *addr)
3583 +static __inline__ int test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
3585 unsigned long old, t;
3586 unsigned long mask = 1UL << (nr & 0x3f);
3588 return (old & mask) != 0;
3591 -static __inline__ int test_and_clear_bit(unsigned long nr, volatile void *addr)
3592 +static __inline__ int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
3594 unsigned long old, t;
3595 unsigned long mask = 1UL << (nr & 0x3f);
3597 return (old & mask) != 0;
3600 -static __inline__ int test_and_change_bit(unsigned long nr, volatile void *addr)
3601 +static __inline__ int test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
3603 unsigned long old, t;
3604 unsigned long mask = 1UL << (nr & 0x3f);
3607 * non-atomic versions
3609 -static __inline__ void __set_bit(unsigned long nr, volatile void *addr)
3610 +static __inline__ void __set_bit(unsigned long nr, volatile unsigned long *addr)
3612 unsigned long mask = 1UL << (nr & 0x3f);
3613 unsigned long *p = ((unsigned long *)addr) + (nr >> 6);
3618 -static __inline__ void __clear_bit(unsigned long nr, volatile void *addr)
3619 +static __inline__ void __clear_bit(unsigned long nr, volatile unsigned long *addr)
3621 unsigned long mask = 1UL << (nr & 0x3f);
3622 unsigned long *p = ((unsigned long *)addr) + (nr >> 6);
3627 -static __inline__ void __change_bit(unsigned long nr, volatile void *addr)
3628 +static __inline__ void __change_bit(unsigned long nr, volatile unsigned long *addr)
3630 unsigned long mask = 1UL << (nr & 0x3f);
3631 unsigned long *p = ((unsigned long *)addr) + (nr >> 6);
3636 -static __inline__ int __test_and_set_bit(unsigned long nr, volatile void *addr)
3637 +static __inline__ int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
3639 unsigned long mask = 1UL << (nr & 0x3f);
3640 unsigned long *p = ((unsigned long *)addr) + (nr >> 6);
3642 return (old & mask) != 0;
3645 -static __inline__ int __test_and_clear_bit(unsigned long nr, volatile void *addr)
3646 +static __inline__ int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
3648 unsigned long mask = 1UL << (nr & 0x3f);
3649 unsigned long *p = ((unsigned long *)addr) + (nr >> 6);
3651 return (old & mask) != 0;
3654 -static __inline__ int __test_and_change_bit(unsigned long nr, volatile void *addr)
3655 +static __inline__ int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
3657 unsigned long mask = 1UL << (nr & 0x3f);
3658 unsigned long *p = ((unsigned long *)addr) + (nr >> 6);
3659 diff -urN linux-2.4.20/include/asm-s390/bitops.h linux-2.4.20-o1/include/asm-s390/bitops.h
3660 --- linux-2.4.20/include/asm-s390/bitops.h Sat Aug 3 02:39:45 2002
3661 +++ linux-2.4.20-o1/include/asm-s390/bitops.h Wed Mar 12 00:41:43 2003
3662 @@ -47,272 +47,217 @@
3663 extern const char _oi_bitmap[];
3664 extern const char _ni_bitmap[];
3665 extern const char _zb_findmap[];
3666 +extern const char _sb_findmap[];
3670 * SMP safe set_bit routine based on compare and swap (CS)
3672 -static __inline__ void set_bit_cs(int nr, volatile void * addr)
3673 +static inline void set_bit_cs(int nr, volatile void *ptr)
3675 - unsigned long bits, mask;
3676 - __asm__ __volatile__(
3677 + unsigned long addr, old, new, mask;
3679 + addr = (unsigned long) ptr;
3681 - " lhi %2,3\n" /* CS must be aligned on 4 byte b. */
3682 - " nr %2,%1\n" /* isolate last 2 bits of address */
3683 - " xr %1,%2\n" /* make addr % 4 == 0 */
3685 - " ar %0,%2\n" /* add alignement to bitnr */
3686 + addr ^= addr & 3; /* align address to 4 */
3687 + nr += (addr & 3) << 3; /* add alignment to bit number */
3690 - " nr %2,%0\n" /* make shift value */
3694 - " la %1,0(%0,%1)\n" /* calc. address for CS */
3695 - " sll %3,0(%2)\n" /* make OR mask */
3697 - "0: lr %2,%0\n" /* CS loop starts here */
3698 - " or %2,%3\n" /* set bit */
3699 - " cs %0,%2,0(%1)\n"
3701 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
3702 - : "cc", "memory" );
3703 + addr += (nr ^ (nr & 31)) >> 3; /* calculate address for CS */
3704 + mask = 1UL << (nr & 31); /* make OR mask */
3709 + " cs %0,%1,0(%4)\n"
3711 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned int *) addr)
3712 + : "d" (mask), "a" (addr)
3717 * SMP safe clear_bit routine based on compare and swap (CS)
3719 -static __inline__ void clear_bit_cs(int nr, volatile void * addr)
3720 +static inline void clear_bit_cs(int nr, volatile void *ptr)
3722 - static const int minusone = -1;
3723 - unsigned long bits, mask;
3724 - __asm__ __volatile__(
3725 + unsigned long addr, old, new, mask;
3727 + addr = (unsigned long) ptr;
3729 - " lhi %2,3\n" /* CS must be aligned on 4 byte b. */
3730 - " nr %2,%1\n" /* isolate last 2 bits of address */
3731 - " xr %1,%2\n" /* make addr % 4 == 0 */
3733 - " ar %0,%2\n" /* add alignement to bitnr */
3734 + addr ^= addr & 3; /* align address to 4 */
3735 + nr += (addr & 3) << 3; /* add alignment to bit number */
3738 - " nr %2,%0\n" /* make shift value */
3742 - " la %1,0(%0,%1)\n" /* calc. address for CS */
3744 - " x %3,%4\n" /* make AND mask */
3746 - "0: lr %2,%0\n" /* CS loop starts here */
3747 - " nr %2,%3\n" /* clear bit */
3748 - " cs %0,%2,0(%1)\n"
3750 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask)
3751 - : "m" (minusone) : "cc", "memory" );
3752 + addr += (nr ^ (nr & 31)) >> 3; /* calculate address for CS */
3753 + mask = ~(1UL << (nr & 31)); /* make AND mask */
3758 + " cs %0,%1,0(%4)\n"
3760 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned int *) addr)
3761 + : "d" (mask), "a" (addr)
3766 * SMP safe change_bit routine based on compare and swap (CS)
3768 -static __inline__ void change_bit_cs(int nr, volatile void * addr)
3769 +static inline void change_bit_cs(int nr, volatile void *ptr)
3771 - unsigned long bits, mask;
3772 - __asm__ __volatile__(
3773 + unsigned long addr, old, new, mask;
3775 + addr = (unsigned long) ptr;
3777 - " lhi %2,3\n" /* CS must be aligned on 4 byte b. */
3778 - " nr %2,%1\n" /* isolate last 2 bits of address */
3779 - " xr %1,%2\n" /* make addr % 4 == 0 */
3781 - " ar %0,%2\n" /* add alignement to bitnr */
3782 + addr ^= addr & 3; /* align address to 4 */
3783 + nr += (addr & 3) << 3; /* add alignment to bit number */
3786 - " nr %2,%0\n" /* make shift value */
3790 - " la %1,0(%0,%1)\n" /* calc. address for CS */
3791 - " sll %3,0(%2)\n" /* make XR mask */
3793 - "0: lr %2,%0\n" /* CS loop starts here */
3794 - " xr %2,%3\n" /* change bit */
3795 - " cs %0,%2,0(%1)\n"
3797 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
3798 - : "cc", "memory" );
3799 + addr += (nr ^ (nr & 31)) >> 3; /* calculate address for CS */
3800 + mask = 1UL << (nr & 31); /* make XOR mask */
3805 + " cs %0,%1,0(%4)\n"
3807 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned int *) addr)
3808 + : "d" (mask), "a" (addr)
3813 * SMP safe test_and_set_bit routine based on compare and swap (CS)
3815 -static __inline__ int test_and_set_bit_cs(int nr, volatile void * addr)
3816 +static inline int test_and_set_bit_cs(int nr, volatile void *ptr)
3818 - unsigned long bits, mask;
3819 - __asm__ __volatile__(
3820 + unsigned long addr, old, new, mask;
3822 + addr = (unsigned long) ptr;
3824 - " lhi %2,3\n" /* CS must be aligned on 4 byte b. */
3825 - " nr %2,%1\n" /* isolate last 2 bits of address */
3826 - " xr %1,%2\n" /* make addr % 4 == 0 */
3828 - " ar %0,%2\n" /* add alignement to bitnr */
3829 + addr ^= addr & 3; /* align address to 4 */
3830 + nr += (addr & 3) << 3; /* add alignment to bit number */
3833 - " nr %2,%0\n" /* make shift value */
3837 - " la %1,0(%0,%1)\n" /* calc. address for CS */
3838 - " sll %3,0(%2)\n" /* make OR mask */
3840 - "0: lr %2,%0\n" /* CS loop starts here */
3841 - " or %2,%3\n" /* set bit */
3842 - " cs %0,%2,0(%1)\n"
3844 - " nr %0,%3\n" /* isolate old bit */
3845 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
3846 - : "cc", "memory" );
3848 + addr += (nr ^ (nr & 31)) >> 3; /* calculate address for CS */
3849 + mask = 1UL << (nr & 31); /* make OR/test mask */
3854 + " cs %0,%1,0(%4)\n"
3856 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned int *) addr)
3857 + : "d" (mask), "a" (addr)
3859 + return (old & mask) != 0;
3863 * SMP safe test_and_clear_bit routine based on compare and swap (CS)
3865 -static __inline__ int test_and_clear_bit_cs(int nr, volatile void * addr)
3866 +static inline int test_and_clear_bit_cs(int nr, volatile void *ptr)
3868 - static const int minusone = -1;
3869 - unsigned long bits, mask;
3870 - __asm__ __volatile__(
3871 + unsigned long addr, old, new, mask;
3873 + addr = (unsigned long) ptr;
3875 - " lhi %2,3\n" /* CS must be aligned on 4 byte b. */
3876 - " nr %2,%1\n" /* isolate last 2 bits of address */
3877 - " xr %1,%2\n" /* make addr % 4 == 0 */
3879 - " ar %0,%2\n" /* add alignement to bitnr */
3880 + nr += (addr & 3) << 3; /* add alignment to bit number */
3881 + addr ^= addr & 3; /* align address to 4 */
3884 - " nr %2,%0\n" /* make shift value */
3888 - " la %1,0(%0,%1)\n" /* calc. address for CS */
3891 - " x %3,%4\n" /* make AND mask */
3892 - "0: lr %2,%0\n" /* CS loop starts here */
3893 - " nr %2,%3\n" /* clear bit */
3894 - " cs %0,%2,0(%1)\n"
3897 - " nr %0,%3\n" /* isolate old bit */
3898 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask)
3899 - : "m" (minusone) : "cc", "memory" );
3901 + addr += (nr ^ (nr & 31)) >> 3; /* calculate address for CS */
3902 + mask = ~(1UL << (nr & 31)); /* make AND mask */
3907 + " cs %0,%1,0(%4)\n"
3909 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned int *) addr)
3910 + : "d" (mask), "a" (addr)
3912 + return (old ^ new) != 0;
3916 * SMP safe test_and_change_bit routine based on compare and swap (CS)
3918 -static __inline__ int test_and_change_bit_cs(int nr, volatile void * addr)
3919 +static inline int test_and_change_bit_cs(int nr, volatile void *ptr)
3921 - unsigned long bits, mask;
3922 - __asm__ __volatile__(
3923 + unsigned long addr, old, new, mask;
3925 + addr = (unsigned long) ptr;
3927 - " lhi %2,3\n" /* CS must be aligned on 4 byte b. */
3928 - " nr %2,%1\n" /* isolate last 2 bits of address */
3929 - " xr %1,%2\n" /* make addr % 4 == 0 */
3931 - " ar %0,%2\n" /* add alignement to bitnr */
3932 + nr += (addr & 3) << 3; /* add alignment to bit number */
3933 + addr ^= addr & 3; /* align address to 4 */
3936 - " nr %2,%0\n" /* make shift value */
3940 - " la %1,0(%0,%1)\n" /* calc. address for CS */
3941 - " sll %3,0(%2)\n" /* make OR mask */
3943 - "0: lr %2,%0\n" /* CS loop starts here */
3944 - " xr %2,%3\n" /* change bit */
3945 - " cs %0,%2,0(%1)\n"
3947 - " nr %0,%3\n" /* isolate old bit */
3948 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
3949 - : "cc", "memory" );
3951 + addr += (nr ^ (nr & 31)) >> 3; /* calculate address for CS */
3952 + mask = 1UL << (nr & 31); /* make XOR mask */
3957 + " cs %0,%1,0(%4)\n"
3959 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned int *) addr)
3960 + : "d" (mask), "a" (addr)
3962 + return (old & mask) != 0;
3964 #endif /* CONFIG_SMP */
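Every CS-based routine above follows the same recipe: fold the pointer's
misalignment into the bit number, align the pointer, compute the target word
and mask in C, and leave only the compare-and-swap retry to assembly. A
portable C model of set_bit_cs(), with GCC's __sync_val_compare_and_swap
builtin standing in for the CS instruction (a sketch for illustration, not
patch text):

	static inline void set_bit_cs_model(int nr, volatile void *ptr)
	{
		unsigned long addr = (unsigned long) ptr;
		volatile unsigned int *word;
		unsigned int old, new, mask;

		nr += (addr & 3) << 3;	/* fold misalignment into bit number */
		addr ^= addr & 3;	/* align address to 4 bytes */
		word = (volatile unsigned int *)
			(addr + ((nr ^ (nr & 31)) >> 3));	/* word holding the bit */
		mask = 1U << (nr & 31);	/* OR mask */
		do {			/* CS retry loop */
			old = *word;
			new = old | mask;
		} while (__sync_val_compare_and_swap(word, old, new) != old);
	}

clear_bit_cs() and change_bit_cs() differ only in computing new as
old & ~mask or old ^ mask, and the test_and_* variants additionally report
the old value of the bit.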
3967 * fast, non-SMP set_bit routine
3969 -static __inline__ void __set_bit(int nr, volatile void * addr)
3970 +static inline void __set_bit(int nr, volatile void *ptr)
3972 - unsigned long reg1, reg2;
3973 - __asm__ __volatile__(
3979 - " la %1,0(%1,%3)\n"
3980 - " la %0,0(%0,%4)\n"
3981 - " oc 0(1,%1),0(%0)"
3982 - : "=&a" (reg1), "=&a" (reg2)
3983 - : "r" (nr), "a" (addr), "a" (&_oi_bitmap) : "cc", "memory" );
3986 -static __inline__ void
3987 -__constant_set_bit(const int nr, volatile void * addr)
3991 - __asm__ __volatile__ ("la 1,%0\n\t"
3993 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
3994 - : : "1", "cc", "memory");
3997 - __asm__ __volatile__ ("la 1,%0\n\t"
3999 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4000 - : : "1", "cc", "memory" );
4003 - __asm__ __volatile__ ("la 1,%0\n\t"
4005 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4006 - : : "1", "cc", "memory" );
4009 - __asm__ __volatile__ ("la 1,%0\n\t"
4011 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4012 - : : "1", "cc", "memory" );
4015 - __asm__ __volatile__ ("la 1,%0\n\t"
4017 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4018 - : : "1", "cc", "memory" );
4021 - __asm__ __volatile__ ("la 1,%0\n\t"
4023 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4024 - : : "1", "cc", "memory" );
4027 - __asm__ __volatile__ ("la 1,%0\n\t"
4029 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4030 - : : "1", "cc", "memory" );
4033 - __asm__ __volatile__ ("la 1,%0\n\t"
4035 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4036 - : : "1", "cc", "memory" );
4039 + unsigned long addr;
4041 + addr = (unsigned long) ptr + ((nr ^ 24) >> 3);
4042 + asm volatile("oc 0(1,%1),0(%2)"
4043 + : "+m" (*(char *) addr)
4044 + : "a" (addr), "a" (_oi_bitmap + (nr & 7))
4048 +static inline void
4049 +__constant_set_bit(const int nr, volatile void *ptr)
4051 + unsigned long addr;
4053 + addr = ((unsigned long) ptr) + ((nr >> 3) ^ 3);
4056 + asm volatile ("oi 0(%1),0x01"
4057 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4060 + asm volatile ("oi 0(%1),0x02"
4061 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4064 + asm volatile ("oi 0(%1),0x04"
4065 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4068 + asm volatile ("oi 0(%1),0x08"
4069 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4072 + asm volatile ("oi 0(%1),0x10"
4073 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4076 + asm volatile ("oi 0(%1),0x20"
4077 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4080 + asm volatile ("oi 0(%1),0x40"
4081 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4084 + asm volatile ("oi 0(%1),0x80"
4085 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4090 #define set_bit_simple(nr,addr) \
4091 @@ -323,76 +268,58 @@
4093 * fast, non-SMP clear_bit routine
4095 -static __inline__ void
4096 -__clear_bit(int nr, volatile void * addr)
4097 +static inline void
4098 +__clear_bit(int nr, volatile void *ptr)
4100 - unsigned long reg1, reg2;
4101 - __asm__ __volatile__(
4107 - " la %1,0(%1,%3)\n"
4108 - " la %0,0(%0,%4)\n"
4109 - " nc 0(1,%1),0(%0)"
4110 - : "=&a" (reg1), "=&a" (reg2)
4111 - : "r" (nr), "a" (addr), "a" (&_ni_bitmap) : "cc", "memory" );
4114 -static __inline__ void
4115 -__constant_clear_bit(const int nr, volatile void * addr)
4119 - __asm__ __volatile__ ("la 1,%0\n\t"
4121 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4122 - : : "1", "cc", "memory" );
4125 - __asm__ __volatile__ ("la 1,%0\n\t"
4127 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4128 - : : "1", "cc", "memory" );
4131 - __asm__ __volatile__ ("la 1,%0\n\t"
4133 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4134 - : : "1", "cc", "memory" );
4137 - __asm__ __volatile__ ("la 1,%0\n\t"
4139 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4140 - : : "1", "cc", "memory" );
4143 - __asm__ __volatile__ ("la 1,%0\n\t"
4145 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4146 - : : "cc", "memory" );
4149 - __asm__ __volatile__ ("la 1,%0\n\t"
4151 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4152 - : : "1", "cc", "memory" );
4155 - __asm__ __volatile__ ("la 1,%0\n\t"
4157 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4158 - : : "1", "cc", "memory" );
4161 - __asm__ __volatile__ ("la 1,%0\n\t"
4163 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4164 - : : "1", "cc", "memory" );
4167 + unsigned long addr;
4169 + addr = (unsigned long) ptr + ((nr ^ 24) >> 3);
4170 + asm volatile("nc 0(1,%1),0(%2)"
4171 + : "+m" (*(char *) addr)
4172 + : "a" (addr), "a" (_ni_bitmap + (nr & 7))
4176 +static inline void
4177 +__constant_clear_bit(const int nr, volatile void *ptr)
4179 + unsigned long addr;
4181 + addr = ((unsigned long) ptr) + ((nr >> 3) ^ 3);
4184 + asm volatile ("ni 0(%1),0xFE"
4185 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4188 + asm volatile ("ni 0(%1),0xFD"
4189 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4192 + asm volatile ("ni 0(%1),0xFB"
4193 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4196 + asm volatile ("ni 0(%1),0xF7"
4197 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4200 + asm volatile ("ni 0(%1),0xEF"
4201 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4204 + asm volatile ("ni 0(%1),0xDF"
4205 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4208 + asm volatile ("ni 0(%1),0xBF"
4209 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4212 + asm volatile ("ni 0(%1),0x7F"
4213 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4218 #define clear_bit_simple(nr,addr) \
4219 @@ -403,75 +330,57 @@
4221 * fast, non-SMP change_bit routine
4223 -static __inline__ void __change_bit(int nr, volatile void * addr)
4224 +static inline void __change_bit(int nr, volatile void *ptr)
4226 - unsigned long reg1, reg2;
4227 - __asm__ __volatile__(
4233 - " la %1,0(%1,%3)\n"
4234 - " la %0,0(%0,%4)\n"
4235 - " xc 0(1,%1),0(%0)"
4236 - : "=&a" (reg1), "=&a" (reg2)
4237 - : "r" (nr), "a" (addr), "a" (&_oi_bitmap) : "cc", "memory" );
4240 -static __inline__ void
4241 -__constant_change_bit(const int nr, volatile void * addr)
4245 - __asm__ __volatile__ ("la 1,%0\n\t"
4247 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4248 - : : "cc", "memory" );
4251 - __asm__ __volatile__ ("la 1,%0\n\t"
4253 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4254 - : : "cc", "memory" );
4257 - __asm__ __volatile__ ("la 1,%0\n\t"
4259 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4260 - : : "cc", "memory" );
4263 - __asm__ __volatile__ ("la 1,%0\n\t"
4265 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4266 - : : "cc", "memory" );
4269 - __asm__ __volatile__ ("la 1,%0\n\t"
4271 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4272 - : : "cc", "memory" );
4275 - __asm__ __volatile__ ("la 1,%0\n\t"
4277 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4278 - : : "1", "cc", "memory" );
4281 - __asm__ __volatile__ ("la 1,%0\n\t"
4283 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4284 - : : "1", "cc", "memory" );
4287 - __asm__ __volatile__ ("la 1,%0\n\t"
4289 - : "=m" (*((volatile char *) addr + ((nr>>3)^3)))
4290 - : : "1", "cc", "memory" );
4293 + unsigned long addr;
4295 + addr = (unsigned long) ptr + ((nr ^ 24) >> 3);
4296 + asm volatile("xc 0(1,%1),0(%2)"
4297 + : "+m" (*(char *) addr)
4298 + : "a" (addr), "a" (_oi_bitmap + (nr & 7))
4302 +static inline void
4303 +__constant_change_bit(const int nr, volatile void *ptr)
4305 + unsigned long addr;
4307 + addr = ((unsigned long) ptr) + ((nr >> 3) ^ 3);
4310 + asm volatile ("xi 0(%1),0x01"
4311 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4314 + asm volatile ("xi 0(%1),0x02"
4315 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4318 + asm volatile ("xi 0(%1),0x04"
4319 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4322 + asm volatile ("xi 0(%1),0x08"
4323 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4326 + asm volatile ("xi 0(%1),0x10"
4327 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4330 + asm volatile ("xi 0(%1),0x20"
4331 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4334 + asm volatile ("xi 0(%1),0x40"
4335 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4338 + asm volatile ("xi 0(%1),0x80"
4339 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
4344 #define change_bit_simple(nr,addr) \
4345 @@ -482,74 +391,54 @@
4347 * fast, non-SMP test_and_set_bit routine
4349 -static __inline__ int test_and_set_bit_simple(int nr, volatile void * addr)
4350 +static inline int test_and_set_bit_simple(int nr, volatile void *ptr)
4352 - unsigned long reg1, reg2;
4354 - __asm__ __volatile__(
4360 - " la %1,0(%1,%4)\n"
4363 - " la %2,0(%2,%5)\n"
4364 - " oc 0(1,%1),0(%2)"
4365 - : "=d&" (oldbit), "=&a" (reg1), "=&a" (reg2)
4366 - : "r" (nr), "a" (addr), "a" (&_oi_bitmap) : "cc", "memory" );
4367 - return oldbit & 1;
4368 + unsigned long addr;
4369 + unsigned char ch;
4371 + addr = (unsigned long) ptr + ((nr ^ 24) >> 3);
4372 + ch = *(unsigned char *) addr;
4373 + asm volatile("oc 0(1,%1),0(%2)"
4374 + : "+m" (*(char *) addr)
4375 + : "a" (addr), "a" (_oi_bitmap + (nr & 7))
4377 + return (ch >> (nr & 7)) & 1;
4379 #define __test_and_set_bit(X,Y) test_and_set_bit_simple(X,Y)
4382 * fast, non-SMP test_and_clear_bit routine
4384 -static __inline__ int test_and_clear_bit_simple(int nr, volatile void * addr)
4385 +static inline int test_and_clear_bit_simple(int nr, volatile void *ptr)
4387 - unsigned long reg1, reg2;
4389 + unsigned long addr;
4390 + unsigned char ch;
4392 - __asm__ __volatile__(
4398 - " la %1,0(%1,%4)\n"
4401 - " la %2,0(%2,%5)\n"
4402 - " nc 0(1,%1),0(%2)"
4403 - : "=d&" (oldbit), "=&a" (reg1), "=&a" (reg2)
4404 - : "r" (nr), "a" (addr), "a" (&_ni_bitmap) : "cc", "memory" );
4405 - return oldbit & 1;
4406 + addr = (unsigned long) ptr + ((nr ^ 24) >> 3);
4407 + ch = *(unsigned char *) addr;
4408 + asm volatile("nc 0(1,%1),0(%2)"
4409 + : "+m" (*(char *) addr)
4410 + : "a" (addr), "a" (_ni_bitmap + (nr & 7))
4412 + return (ch >> (nr & 7)) & 1;
4414 #define __test_and_clear_bit(X,Y) test_and_clear_bit_simple(X,Y)
4417 * fast, non-SMP test_and_change_bit routine
4419 -static __inline__ int test_and_change_bit_simple(int nr, volatile void * addr)
4420 +static inline int test_and_change_bit_simple(int nr, volatile void *ptr)
4422 - unsigned long reg1, reg2;
4424 + unsigned long addr;
4425 + unsigned char ch;
4427 - __asm__ __volatile__(
4433 - " la %1,0(%1,%4)\n"
4436 - " la %2,0(%2,%5)\n"
4437 - " xc 0(1,%1),0(%2)"
4438 - : "=d&" (oldbit), "=&a" (reg1), "=&a" (reg2)
4439 - : "r" (nr), "a" (addr), "a" (&_oi_bitmap) : "cc", "memory" );
4440 - return oldbit & 1;
4441 + addr = (unsigned long) ptr + ((nr ^ 24) >> 3);
4442 + ch = *(unsigned char *) addr;
4443 + asm volatile("xc 0(1,%1),0(%2)"
4444 + : "+m" (*(char *) addr)
4445 + : "a" (addr), "a" (_oi_bitmap + (nr & 7))
4447 + return (ch >> (nr & 7)) & 1;
4449 #define __test_and_change_bit(X,Y) test_and_change_bit_simple(X,Y)
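The non-SMP "simple" routines all reduce to one byte operation: locate the
big-endian byte that holds bit nr, snapshot it for the return value, then
apply OC/NC/XC with a one-byte mask taken from _oi_bitmap or _ni_bitmap
(assumed to hold the masks 1 << 0 through 1 << 7). A C model of
test_and_set_bit_simple():

	/* Bit nr of a big-endian 32-bit word lives in byte ((nr ^ 24) >> 3). */
	static inline int test_and_set_bit_model(int nr, void *ptr)
	{
		unsigned char *byte = (unsigned char *) ptr + ((nr ^ 24) >> 3);
		unsigned char old = *byte;

		*byte = (unsigned char) (old | (1U << (nr & 7)));
		return (old >> (nr & 7)) & 1;
	}

The clear and change variants swap the OR for an AND-with-complement or an
XOR; nothing else differs.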
4451 @@ -574,25 +463,17 @@
4452 * This routine doesn't need to be atomic.
4455 -static __inline__ int __test_bit(int nr, volatile void * addr)
4456 +static inline int __test_bit(int nr, volatile void *ptr)
4458 - unsigned long reg1, reg2;
4460 + unsigned long addr;
4461 + unsigned char ch;
4463 - __asm__ __volatile__(
4469 - " ic %0,0(%2,%4)\n"
4471 - : "=d&" (oldbit), "=&a" (reg1), "=&a" (reg2)
4472 - : "r" (nr), "a" (addr) : "cc" );
4473 - return oldbit & 1;
4474 + addr = (unsigned long) ptr + ((nr ^ 24) >> 3);
4475 + ch = *(unsigned char *) addr;
4476 + return (ch >> (nr & 7)) & 1;
4479 -static __inline__ int __constant_test_bit(int nr, volatile void * addr) {
4480 +static inline int __constant_test_bit(int nr, volatile void * addr) {
4481 return (((volatile char *) addr)[(nr>>3)^3] & (1<<(nr&7))) != 0;
4486 * Find-bit routines..
4488 -static __inline__ int find_first_zero_bit(void * addr, unsigned size)
4489 +static inline int find_first_zero_bit(void * addr, unsigned size)
4491 unsigned long cmp, count;
4493 @@ -642,7 +523,45 @@
4494 return (res < size) ? res : size;
4497 -static __inline__ int find_next_zero_bit (void * addr, int size, int offset)
4498 +static inline int find_first_bit(void * addr, unsigned size)
4500 + unsigned long cmp, count;
4501 + int res;
4505 + __asm__(" slr %1,%1\n"
4510 + "0: c %1,0(%0,%4)\n"
4516 + "1: l %2,0(%0,%4)\n"
4519 + " tml %2,0xffff\n"
4523 + "2: tml %2,0x00ff\n"
4528 + " ic %2,0(%2,%5)\n"
4531 + : "=&a" (res), "=&d" (cmp), "=&a" (count)
4532 + : "a" (size), "a" (addr), "a" (&_sb_findmap) : "cc" );
4533 + return (res < size) ? res : size;
4536 +static inline int find_next_zero_bit (void * addr, int size, int offset)
4538 unsigned long * p = ((unsigned long *) addr) + (offset >> 5);
4539 unsigned long bitvec, reg;
4540 @@ -680,11 +599,49 @@
4541 return (offset + res);
4544 +static inline int find_next_bit (void * addr, int size, int offset)
4546 + unsigned long * p = ((unsigned long *) addr) + (offset >> 5);
4547 + unsigned long bitvec, reg;
4548 + int set, bit = offset & 31, res;
4552 + * Look for set bit in first word
4554 + bitvec = (*p) >> bit;
4555 + __asm__(" slr %0,%0\n"
4557 + " tml %1,0xffff\n"
4561 + "0: tml %1,0x00ff\n"
4566 + " ic %1,0(%1,%3)\n"
4568 + : "=&d" (set), "+a" (bitvec), "=&d" (reg)
4569 + : "a" (&_sb_findmap) : "cc" );
4570 + if (set < (32 - bit))
4571 + return set + offset;
4572 + offset += 32 - bit;
4576 + * No set bit yet, search remaining full words for a bit
4578 + res = find_first_bit (p, size - 32 * (p - (unsigned long *) addr));
4579 + return (offset + res);
4583 * ffz = Find First Zero in word. Undefined if no zero exists,
4584 * so code should check against ~0UL first..
4586 -static __inline__ unsigned long ffz(unsigned long word)
4587 +static inline unsigned long ffz(unsigned long word)
4591 @@ -708,40 +665,109 @@
4595 + * __ffs = find first bit in word. Undefined if no bit exists,
4596 + * so code should check against 0UL first..
4598 +static inline unsigned long __ffs(unsigned long word)
4600 + unsigned long reg, result;
4602 + __asm__(" slr %0,%0\n"
4604 + " tml %1,0xffff\n"
4608 + "0: tml %1,0x00ff\n"
4613 + " ic %1,0(%1,%3)\n"
4615 + : "=&d" (result), "+a" (word), "=&d" (reg)
4616 + : "a" (&_sb_findmap) : "cc" );
4617 + return result;
4621 + * Every architecture must define this function. It's the fastest
4622 + * way of searching a 140-bit bitmap where the first 100 bits are
4623 + * unlikely to be set. It's guaranteed that at least one of the 140
4624 + * bits is set.
4626 +static inline int sched_find_first_bit(unsigned long *b)
4628 + return find_first_bit(b, 140);
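sched_find_first_bit() serves the O(1) scheduler's priority arrays: MAX_PRIO
is 140, so the runnable-priority bitmap is 140 bits, and the always-runnable
idle task guarantees a set bit. A generic 32-bit fallback equivalent to the
version above, with a hypothetical portable __ffs_model() standing in for
the asm __ffs():

	static inline unsigned long __ffs_model(unsigned long word)
	{
		unsigned long bit = 0;

		while (!(word & 1UL)) {	/* caller guarantees word != 0 */
			word >>= 1;
			bit++;
		}
		return bit;
	}

	static inline int sched_find_first_bit_model(unsigned long *b)
	{
		int i;

		for (i = 0; i < 5; i++)	/* 5 x 32 bits covers 140 */
			if (b[i])
				return i * 32 + __ffs_model(b[i]);
		return 140;	/* not reached: one bit is always set */
	}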
4632 * ffs: find first bit set. This is defined the same way as
4633 * the libc and compiler builtin ffs routines, therefore
4634 * differs in spirit from the above ffz (man ffs).
4637 -extern int __inline__ ffs (int x)
4638 +extern int inline ffs (int x)
4645 - __asm__(" slr %0,%0\n"
4646 - " tml %1,0xffff\n"
4648 + __asm__(" tml %1,0xffff\n"
4653 "0: tml %1,0x00ff\n"
4658 "1: tml %1,0x000f\n"
4663 "2: tml %1,0x0003\n"
4668 "3: tml %1,0x0001\n"
4672 : "=&d" (r), "+d" (x) : : "cc" );
4678 + * fls: find last bit set.
4680 +extern __inline__ int fls(int x)
4686 + __asm__(" tmh %1,0xffff\n"
4690 + "0: tmh %1,0xff00\n"
4694 + "1: tmh %1,0xf000\n"
4698 + "2: tmh %1,0xc000\n"
4702 + "3: tmh %1,0x8000\n"
4706 + : "+d" (r), "+d" (x) : : "cc" );
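fls() is the mirror image of ffs(), walking down from the high halfword with
tmh instead of up with tml. For reference, a portable C version with the same
contract (1-based index of the most significant set bit, 0 for x == 0) -- a
sketch, not patch text:

	static inline int fls_model(int x)
	{
		unsigned int v = (unsigned int) x;
		int r = 32;

		if (!v)
			return 0;
		if (!(v & 0xffff0000u)) { v <<= 16; r -= 16; }
		if (!(v & 0xff000000u)) { v <<= 8; r -= 8; }
		if (!(v & 0xf0000000u)) { v <<= 4; r -= 4; }
		if (!(v & 0xc0000000u)) { v <<= 2; r -= 2; }
		if (!(v & 0x80000000u)) r -= 1;
		return r;
	}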
4712 #define ext2_set_bit(nr, addr) test_and_set_bit((nr)^24, addr)
4713 #define ext2_clear_bit(nr, addr) test_and_clear_bit((nr)^24, addr)
4714 #define ext2_test_bit(nr, addr) test_bit((nr)^24, addr)
4715 -static __inline__ int ext2_find_first_zero_bit(void *vaddr, unsigned size)
4716 +static inline int ext2_find_first_zero_bit(void *vaddr, unsigned size)
4718 unsigned long cmp, count;
4721 return (res < size) ? res : size;
4724 -static __inline__ int
4725 +static inline int
4726 ext2_find_next_zero_bit(void *vaddr, unsigned size, unsigned offset)
4728 unsigned long *addr = vaddr;
4729 diff -urN linux-2.4.20/include/asm-s390x/bitops.h linux-2.4.20-o1/include/asm-s390x/bitops.h
4730 --- linux-2.4.20/include/asm-s390x/bitops.h Sat Aug 3 02:39:45 2002
4731 +++ linux-2.4.20-o1/include/asm-s390x/bitops.h Wed Mar 12 00:41:43 2003
4732 @@ -51,271 +51,220 @@
4733 extern const char _oi_bitmap[];
4734 extern const char _ni_bitmap[];
4735 extern const char _zb_findmap[];
4736 +extern const char _sb_findmap[];
4740 * SMP safe set_bit routine based on compare and swap (CS)
4742 -static __inline__ void set_bit_cs(unsigned long nr, volatile void * addr)
4743 +static inline void set_bit_cs(unsigned long nr, volatile void *ptr)
4745 - unsigned long bits, mask;
4746 - __asm__ __volatile__(
4747 + unsigned long addr, old, new, mask;
4749 + addr = (unsigned long) ptr;
4751 - " lghi %2,7\n" /* CS must be aligned on 4 byte b. */
4752 - " ngr %2,%1\n" /* isolate last 2 bits of address */
4753 - " xgr %1,%2\n" /* make addr % 4 == 0 */
4755 - " agr %0,%2\n" /* add alignement to bitnr */
4756 + nr += (addr & 7) << 3; /* add alignment to bit number */
4757 + addr ^= addr & 7; /* align address to 8 */
4760 - " nr %2,%0\n" /* make shift value */
4764 - " la %1,0(%0,%1)\n" /* calc. address for CS */
4765 - " sllg %3,%3,0(%2)\n" /* make OR mask */
4767 - "0: lgr %2,%0\n" /* CS loop starts here */
4768 - " ogr %2,%3\n" /* set bit */
4769 - " csg %0,%2,0(%1)\n"
4771 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
4772 - : "cc", "memory" );
4773 + addr += (nr ^ (nr & 63)) >> 3; /* calculate address for CS */
4774 + mask = 1UL << (nr & 63); /* make OR mask */
4779 + " csg %0,%1,0(%4)\n"
4781 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned long *) addr)
4782 + : "d" (mask), "a" (addr)
4787 * SMP safe clear_bit routine based on compare and swap (CS)
4789 -static __inline__ void clear_bit_cs(unsigned long nr, volatile void * addr)
4790 +static inline void clear_bit_cs(unsigned long nr, volatile void *ptr)
4792 - unsigned long bits, mask;
4793 - __asm__ __volatile__(
4794 + unsigned long addr, old, new, mask;
4796 + addr = (unsigned long) ptr;
4798 - " lghi %2,7\n" /* CS must be aligned on 4 byte b. */
4799 - " ngr %2,%1\n" /* isolate last 2 bits of address */
4800 - " xgr %1,%2\n" /* make addr % 4 == 0 */
4802 - " agr %0,%2\n" /* add alignement to bitnr */
4803 + nr += (addr & 7) << 3; /* add alignment to bit number */
4804 + addr ^= addr & 7; /* align address to 8 */
4807 - " nr %2,%0\n" /* make shift value */
4811 - " la %1,0(%0,%1)\n" /* calc. address for CS */
4813 - " rllg %3,%3,0(%2)\n" /* make AND mask */
4815 - "0: lgr %2,%0\n" /* CS loop starts here */
4816 - " ngr %2,%3\n" /* clear bit */
4817 - " csg %0,%2,0(%1)\n"
4819 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
4820 - : "cc", "memory" );
4821 + addr += (nr ^ (nr & 63)) >> 3; /* calculate address for CS */
4822 + mask = ~(1UL << (nr & 63)); /* make AND mask */
4827 + " csg %0,%1,0(%4)\n"
4829 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned long *) addr)
4830 + : "d" (mask), "a" (addr)
4835 * SMP safe change_bit routine based on compare and swap (CS)
4837 -static __inline__ void change_bit_cs(unsigned long nr, volatile void * addr)
4838 +static inline void change_bit_cs(unsigned long nr, volatile void *ptr)
4840 - unsigned long bits, mask;
4841 - __asm__ __volatile__(
4842 + unsigned long addr, old, new, mask;
4844 + addr = (unsigned long) ptr;
4846 - " lghi %2,7\n" /* CS must be aligned on 4 byte b. */
4847 - " ngr %2,%1\n" /* isolate last 2 bits of address */
4848 - " xgr %1,%2\n" /* make addr % 4 == 0 */
4850 - " agr %0,%2\n" /* add alignement to bitnr */
4851 + nr += (addr & 7) << 3; /* add alignment to bit number */
4852 + addr ^= addr & 7; /* align address to 8 */
4855 - " nr %2,%0\n" /* make shift value */
4859 - " la %1,0(%0,%1)\n" /* calc. address for CS */
4860 - " sllg %3,%3,0(%2)\n" /* make XR mask */
4862 - "0: lgr %2,%0\n" /* CS loop starts here */
4863 - " xgr %2,%3\n" /* change bit */
4864 - " csg %0,%2,0(%1)\n"
4866 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
4867 - : "cc", "memory" );
4868 + addr += (nr ^ (nr & 63)) >> 3; /* calculate address for CS */
4869 + mask = 1UL << (nr & 63); /* make XOR mask */
4874 + " csg %0,%1,0(%4)\n"
4876 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned long *) addr)
4877 + : "d" (mask), "a" (addr)
4882 * SMP safe test_and_set_bit routine based on compare and swap (CS)
4884 -static __inline__ int
4885 -test_and_set_bit_cs(unsigned long nr, volatile void * addr)
4886 +static inline int
4887 +test_and_set_bit_cs(unsigned long nr, volatile void *ptr)
4889 - unsigned long bits, mask;
4890 - __asm__ __volatile__(
4891 + unsigned long addr, old, new, mask;
4893 + addr = (unsigned long) ptr;
4895 - " lghi %2,7\n" /* CS must be aligned on 4 byte b. */
4896 - " ngr %2,%1\n" /* isolate last 2 bits of address */
4897 - " xgr %1,%2\n" /* make addr % 4 == 0 */
4899 - " agr %0,%2\n" /* add alignement to bitnr */
4900 + nr += (addr & 7) << 3; /* add alignment to bit number */
4901 + addr ^= addr & 7; /* align address to 8 */
4904 - " nr %2,%0\n" /* make shift value */
4908 - " la %1,0(%0,%1)\n" /* calc. address for CS */
4909 - " sllg %3,%3,0(%2)\n" /* make OR mask */
4911 - "0: lgr %2,%0\n" /* CS loop starts here */
4912 - " ogr %2,%3\n" /* set bit */
4913 - " csg %0,%2,0(%1)\n"
4915 - " ngr %0,%3\n" /* isolate old bit */
4916 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
4917 - : "cc", "memory" );
4919 + addr += (nr ^ (nr & 63)) >> 3; /* calculate address for CS */
4920 + mask = 1UL << (nr & 63); /* make OR/test mask */
4925 + " csg %0,%1,0(%4)\n"
4927 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned long *) addr)
4928 + : "d" (mask), "a" (addr)
4930 + return (old & mask) != 0;
4934 * SMP safe test_and_clear_bit routine based on compare and swap (CS)
4936 -static __inline__ int
4937 -test_and_clear_bit_cs(unsigned long nr, volatile void * addr)
4938 +static inline int
4939 +test_and_clear_bit_cs(unsigned long nr, volatile void *ptr)
4941 - unsigned long bits, mask;
4942 - __asm__ __volatile__(
4943 + unsigned long addr, old, new, mask;
4945 + addr = (unsigned long) ptr;
4947 - " lghi %2,7\n" /* CS must be aligned on 4 byte b. */
4948 - " ngr %2,%1\n" /* isolate last 2 bits of address */
4949 - " xgr %1,%2\n" /* make addr % 4 == 0 */
4951 - " agr %0,%2\n" /* add alignement to bitnr */
4952 + nr += (addr & 7) << 3; /* add alignment to bit number */
4953 + addr ^= addr & 7; /* align address to 8 */
4956 - " nr %2,%0\n" /* make shift value */
4960 - " la %1,0(%0,%1)\n" /* calc. address for CS */
4961 - " rllg %3,%3,0(%2)\n" /* make AND mask */
4963 - "0: lgr %2,%0\n" /* CS loop starts here */
4964 - " ngr %2,%3\n" /* clear bit */
4965 - " csg %0,%2,0(%1)\n"
4967 - " xgr %0,%2\n" /* isolate old bit */
4968 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
4969 - : "cc", "memory" );
4971 + addr += (nr ^ (nr & 63)) >> 3; /* calculate address for CS */
4972 + mask = ~(1UL << (nr & 63)); /* make AND mask */
4977 + " csg %0,%1,0(%4)\n"
4979 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned long *) addr)
4980 + : "d" (mask), "a" (addr)
4982 + return (old ^ new) != 0;
4986 * SMP safe test_and_change_bit routine based on compare and swap (CS)
4988 -static __inline__ int
4989 -test_and_change_bit_cs(unsigned long nr, volatile void * addr)
4990 +static inline int
4991 +test_and_change_bit_cs(unsigned long nr, volatile void *ptr)
4993 - unsigned long bits, mask;
4994 - __asm__ __volatile__(
4995 + unsigned long addr, old, new, mask;
4997 + addr = (unsigned long) ptr;
4999 - " lghi %2,7\n" /* CS must be aligned on 4 byte b. */
5000 - " ngr %2,%1\n" /* isolate last 2 bits of address */
5001 - " xgr %1,%2\n" /* make addr % 4 == 0 */
5003 - " agr %0,%2\n" /* add alignement to bitnr */
5004 + nr += (addr & 7) << 3; /* add alignment to bit number */
5005 + addr ^= addr & 7; /* align address to 8 */
5008 - " nr %2,%0\n" /* make shift value */
5012 - " la %1,0(%0,%1)\n" /* calc. address for CS */
5013 - " sllg %3,%3,0(%2)\n" /* make OR mask */
5015 - "0: lgr %2,%0\n" /* CS loop starts here */
5016 - " xgr %2,%3\n" /* change bit */
5017 - " csg %0,%2,0(%1)\n"
5019 - " ngr %0,%3\n" /* isolate old bit */
5020 - : "+a" (nr), "+a" (addr), "=&a" (bits), "=&d" (mask) :
5021 - : "cc", "memory" );
5023 + addr += (nr ^ (nr & 63)) >> 3; /* calculate address for CS */
5024 + mask = 1UL << (nr & 63); /* make XOR mask */
5029 + " csg %0,%1,0(%4)\n"
5031 + : "=&d" (old), "=&d" (new), "+m" (*(unsigned long *) addr)
5032 + : "d" (mask), "a" (addr)
5034 + return (old & mask) != 0;
5036 #endif /* CONFIG_SMP */
5039 * fast, non-SMP set_bit routine
5041 -static __inline__ void __set_bit(unsigned long nr, volatile void * addr)
5042 +static inline void __set_bit(unsigned long nr, volatile void *ptr)
5044 - unsigned long reg1, reg2;
5045 - __asm__ __volatile__(
5051 - " la %1,0(%1,%3)\n"
5052 - " la %0,0(%0,%4)\n"
5053 - " oc 0(1,%1),0(%0)"
5054 - : "=&a" (reg1), "=&a" (reg2)
5055 - : "a" (nr), "a" (addr), "a" (&_oi_bitmap) : "cc", "memory" );
5058 -static __inline__ void
5059 -__constant_set_bit(const unsigned long nr, volatile void * addr)
5063 - __asm__ __volatile__ ("la 1,%0\n\t"
5065 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5066 - : : "1", "cc", "memory");
5069 - __asm__ __volatile__ ("la 1,%0\n\t"
5071 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5072 - : : "1", "cc", "memory" );
5075 - __asm__ __volatile__ ("la 1,%0\n\t"
5077 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5078 - : : "1", "cc", "memory" );
5081 - __asm__ __volatile__ ("la 1,%0\n\t"
5083 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5084 - : : "1", "cc", "memory" );
5087 - __asm__ __volatile__ ("la 1,%0\n\t"
5089 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5090 - : : "1", "cc", "memory" );
5093 - __asm__ __volatile__ ("la 1,%0\n\t"
5095 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5096 - : : "1", "cc", "memory" );
5099 - __asm__ __volatile__ ("la 1,%0\n\t"
5101 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5102 - : : "1", "cc", "memory" );
5105 - __asm__ __volatile__ ("la 1,%0\n\t"
5107 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5108 - : : "1", "cc", "memory" );
5111 + unsigned long addr;
5113 + addr = (unsigned long) ptr + ((nr ^ 56) >> 3);
5114 + asm volatile("oc 0(1,%1),0(%2)"
5115 + : "+m" (*(char *) addr)
5116 + : "a" (addr), "a" (_oi_bitmap + (nr & 7))
5120 +static inline void
5121 +__constant_set_bit(const unsigned long nr, volatile void *ptr)
5123 + unsigned long addr;
5125 + addr = ((unsigned long) ptr) + ((nr >> 3) ^ 7);
5128 + asm volatile ("oi 0(%1),0x01"
5129 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5132 + asm volatile ("oi 0(%1),0x02"
5133 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5136 + asm volatile ("oi 0(%1),0x04"
5137 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5140 + asm volatile ("oi 0(%1),0x08"
5141 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5144 + asm volatile ("oi 0(%1),0x10"
5145 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5148 + asm volatile ("oi 0(%1),0x20"
5149 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5152 + asm volatile ("oi 0(%1),0x40"
5153 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5156 + asm volatile ("oi 0(%1),0x80"
5157 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5162 #define set_bit_simple(nr,addr) \
5163 @@ -326,76 +275,58 @@
5165 * fast, non-SMP clear_bit routine
5167 -static __inline__ void
5168 -__clear_bit(unsigned long nr, volatile void * addr)
5169 +static inline void
5170 +__clear_bit(unsigned long nr, volatile void *ptr)
5172 - unsigned long reg1, reg2;
5173 - __asm__ __volatile__(
5179 - " la %1,0(%1,%3)\n"
5180 - " la %0,0(%0,%4)\n"
5181 - " nc 0(1,%1),0(%0)"
5182 - : "=&a" (reg1), "=&a" (reg2)
5183 - : "d" (nr), "a" (addr), "a" (&_ni_bitmap) : "cc", "memory" );
5186 -static __inline__ void
5187 -__constant_clear_bit(const unsigned long nr, volatile void * addr)
5191 - __asm__ __volatile__ ("la 1,%0\n\t"
5193 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5194 - : : "1", "cc", "memory" );
5197 - __asm__ __volatile__ ("la 1,%0\n\t"
5199 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5200 - : : "1", "cc", "memory" );
5203 - __asm__ __volatile__ ("la 1,%0\n\t"
5205 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5206 - : : "1", "cc", "memory" );
5209 - __asm__ __volatile__ ("la 1,%0\n\t"
5211 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5212 - : : "1", "cc", "memory" );
5215 - __asm__ __volatile__ ("la 1,%0\n\t"
5217 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5218 - : : "cc", "memory" );
5221 - __asm__ __volatile__ ("la 1,%0\n\t"
5223 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5224 - : : "1", "cc", "memory" );
5227 - __asm__ __volatile__ ("la 1,%0\n\t"
5229 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5230 - : : "1", "cc", "memory" );
5233 - __asm__ __volatile__ ("la 1,%0\n\t"
5235 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5236 - : : "1", "cc", "memory" );
5239 + unsigned long addr;
5241 + addr = (unsigned long) ptr + ((nr ^ 56) >> 3);
5242 + asm volatile("nc 0(1,%1),0(%2)"
5243 + : "+m" (*(char *) addr)
5244 + : "a" (addr), "a" (_ni_bitmap + (nr & 7))
5248 +static inline void
5249 +__constant_clear_bit(const unsigned long nr, volatile void *ptr)
5251 + unsigned long addr;
5253 + addr = ((unsigned long) ptr) + ((nr >> 3) ^ 7);
5256 + asm volatile ("ni 0(%1),0xFE"
5257 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5260 + asm volatile ("ni 0(%1),0xFD"
5261 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5264 + asm volatile ("ni 0(%1),0xFB"
5265 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5268 + asm volatile ("ni 0(%1),0xF7"
5269 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5272 + asm volatile ("ni 0(%1),0xEF"
5273 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5276 + asm volatile ("ni 0(%1),0xDF"
5277 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5280 + asm volatile ("ni 0(%1),0xBF"
5281 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5284 + asm volatile ("ni 0(%1),0x7F"
5285 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5290 #define clear_bit_simple(nr,addr) \
5291 @@ -406,75 +337,57 @@
5293 * fast, non-SMP change_bit routine
5295 -static __inline__ void __change_bit(unsigned long nr, volatile void * addr)
5296 +static inline void __change_bit(unsigned long nr, volatile void *ptr)
5298 - unsigned long reg1, reg2;
5299 - __asm__ __volatile__(
5305 - " la %1,0(%1,%3)\n"
5306 - " la %0,0(%0,%4)\n"
5307 - " xc 0(1,%1),0(%0)"
5308 - : "=&a" (reg1), "=&a" (reg2)
5309 - : "d" (nr), "a" (addr), "a" (&_oi_bitmap) : "cc", "memory" );
5312 -static __inline__ void
5313 -__constant_change_bit(const unsigned long nr, volatile void * addr)
5317 - __asm__ __volatile__ ("la 1,%0\n\t"
5319 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5320 - : : "cc", "memory" );
5323 - __asm__ __volatile__ ("la 1,%0\n\t"
5325 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5326 - : : "cc", "memory" );
5329 - __asm__ __volatile__ ("la 1,%0\n\t"
5331 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5332 - : : "cc", "memory" );
5335 - __asm__ __volatile__ ("la 1,%0\n\t"
5337 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5338 - : : "cc", "memory" );
5341 - __asm__ __volatile__ ("la 1,%0\n\t"
5343 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5344 - : : "cc", "memory" );
5347 - __asm__ __volatile__ ("la 1,%0\n\t"
5349 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5350 - : : "1", "cc", "memory" );
5353 - __asm__ __volatile__ ("la 1,%0\n\t"
5355 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5356 - : : "1", "cc", "memory" );
5359 - __asm__ __volatile__ ("la 1,%0\n\t"
5361 - : "=m" (*((volatile char *) addr + ((nr>>3)^7)))
5362 - : : "1", "cc", "memory" );
5365 + unsigned long addr;
5367 + addr = (unsigned long) ptr + ((nr ^ 56) >> 3);
5368 + asm volatile("xc 0(1,%1),0(%2)"
5369 + : "+m" (*(char *) addr)
5370 + : "a" (addr), "a" (_oi_bitmap + (nr & 7))
5374 +static inline void
5375 +__constant_change_bit(const unsigned long nr, volatile void *ptr)
5377 + unsigned long addr;
5379 + addr = ((unsigned long) ptr) + ((nr >> 3) ^ 7);
5382 + asm volatile ("xi 0(%1),0x01"
5383 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5386 + asm volatile ("xi 0(%1),0x02"
5387 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5390 + asm volatile ("xi 0(%1),0x04"
5391 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5394 + asm volatile ("xi 0(%1),0x08"
5395 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5398 + asm volatile ("xi 0(%1),0x10"
5399 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5402 + asm volatile ("xi 0(%1),0x20"
5403 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5406 + asm volatile ("xi 0(%1),0x40"
5407 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5410 + asm volatile ("xi 0(%1),0x80"
5411 + : "+m" (*(char *) addr) : "a" (addr) : "cc" );
5416 #define change_bit_simple(nr,addr) \
5417 @@ -485,77 +398,57 @@
5419 * fast, non-SMP test_and_set_bit routine
5421 -static __inline__ int
5422 -test_and_set_bit_simple(unsigned long nr, volatile void * addr)
5423 +static inline int
5424 +test_and_set_bit_simple(unsigned long nr, volatile void *ptr)
5426 - unsigned long reg1, reg2;
5428 - __asm__ __volatile__(
5434 - " la %1,0(%1,%4)\n"
5437 - " la %2,0(%2,%5)\n"
5438 - " oc 0(1,%1),0(%2)"
5439 - : "=&d" (oldbit), "=&a" (reg1), "=&a" (reg2)
5440 - : "d" (nr), "a" (addr), "a" (&_oi_bitmap) : "cc", "memory" );
5441 - return oldbit & 1;
5442 + unsigned long addr;
5443 + unsigned char ch;
5445 + addr = (unsigned long) ptr + ((nr ^ 56) >> 3);
5446 + ch = *(unsigned char *) addr;
5447 + asm volatile("oc 0(1,%1),0(%2)"
5448 + : "+m" (*(char *) addr)
5449 + : "a" (addr), "a" (_oi_bitmap + (nr & 7))
5451 + return (ch >> (nr & 7)) & 1;
5453 #define __test_and_set_bit(X,Y) test_and_set_bit_simple(X,Y)
5456 * fast, non-SMP test_and_clear_bit routine
5458 -static __inline__ int
5459 -test_and_clear_bit_simple(unsigned long nr, volatile void * addr)
5460 +static inline int
5461 +test_and_clear_bit_simple(unsigned long nr, volatile void *ptr)
5463 - unsigned long reg1, reg2;
5465 + unsigned long addr;
5466 + unsigned char ch;
5468 - __asm__ __volatile__(
5474 - " la %1,0(%1,%4)\n"
5477 - " la %2,0(%2,%5)\n"
5478 - " nc 0(1,%1),0(%2)"
5479 - : "=&d" (oldbit), "=&a" (reg1), "=&a" (reg2)
5480 - : "d" (nr), "a" (addr), "a" (&_ni_bitmap) : "cc", "memory" );
5481 - return oldbit & 1;
5482 + addr = (unsigned long) ptr + ((nr ^ 56) >> 3);
5483 + ch = *(unsigned char *) addr;
5484 + asm volatile("nc 0(1,%1),0(%2)"
5485 + : "+m" (*(char *) addr)
5486 + : "a" (addr), "a" (_ni_bitmap + (nr & 7))
5488 + return (ch >> (nr & 7)) & 1;
5490 #define __test_and_clear_bit(X,Y) test_and_clear_bit_simple(X,Y)
5493 * fast, non-SMP test_and_change_bit routine
5495 -static __inline__ int
5496 -test_and_change_bit_simple(unsigned long nr, volatile void * addr)
5497 +static inline int
5498 +test_and_change_bit_simple(unsigned long nr, volatile void *ptr)
5500 - unsigned long reg1, reg2;
5502 + unsigned long addr;
5503 + unsigned char ch;
5505 - __asm__ __volatile__(
5511 - " la %1,0(%1,%4)\n"
5514 - " la %2,0(%2,%5)\n"
5515 - " xc 0(1,%1),0(%2)"
5516 - : "=&d" (oldbit), "=&a" (reg1), "=&a" (reg2)
5517 - : "d" (nr), "a" (addr), "a" (&_oi_bitmap) : "cc", "memory" );
5518 - return oldbit & 1;
5519 + addr = (unsigned long) ptr + ((nr ^ 56) >> 3);
5520 + ch = *(unsigned char *) addr;
5521 + asm volatile("xc 0(1,%1),0(%2)"
5522 + : "+m" (*(char *) addr)
5523 + : "a" (addr), "a" (_oi_bitmap + (nr & 7))
5525 + return (ch >> (nr & 7)) & 1;
5527 #define __test_and_change_bit(X,Y) test_and_change_bit_simple(X,Y)
5529 @@ -580,26 +473,18 @@
5530 * This routine doesn't need to be atomic.
5533 -static __inline__ int __test_bit(unsigned long nr, volatile void * addr)
5534 +static inline int __test_bit(unsigned long nr, volatile void *ptr)
5536 - unsigned long reg1, reg2;
5538 + unsigned long addr;
5539 + unsigned char ch;
5541 - __asm__ __volatile__(
5547 - " ic %0,0(%2,%4)\n"
5549 - : "=&d" (oldbit), "=&a" (reg1), "=&a" (reg2)
5550 - : "d" (nr), "a" (addr) : "cc" );
5551 - return oldbit & 1;
5552 + addr = (unsigned long) ptr + ((nr ^ 56) >> 3);
5553 + ch = *(unsigned char *) addr;
5554 + return (ch >> (nr & 7)) & 1;
5557 -static __inline__ int
5558 -__constant_test_bit(unsigned long nr, volatile void * addr) {
5559 +static inline int
5560 +__constant_test_bit(unsigned long nr, volatile void *addr) {
5561 return (((volatile char *) addr)[(nr>>3)^7] & (1<<(nr&7))) != 0;
5566 * Find-bit routines..
5568 -static __inline__ unsigned long
5569 +static inline unsigned long
5570 find_first_zero_bit(void * addr, unsigned long size)
5572 unsigned long res, cmp, count;
5573 @@ -653,7 +538,49 @@
5574 return (res < size) ? res : size;
5577 -static __inline__ unsigned long
5578 +static inline unsigned long
5579 +find_first_bit(void * addr, unsigned long size)
5581 + unsigned long res, cmp, count;
5585 + __asm__(" slgr %1,%1\n"
5590 + "0: cg %1,0(%0,%4)\n"
5596 + "1: lg %2,0(%0,%4)\n"
5601 + " srlg %2,%2,32\n"
5602 + "2: lghi %1,0xff\n"
5603 + " tmll %2,0xffff\n"
5607 + "3: tmll %2,0x00ff\n"
5612 + " ic %2,0(%2,%5)\n"
5615 + : "=&a" (res), "=&d" (cmp), "=&a" (count)
5616 + : "a" (size), "a" (addr), "a" (&_sb_findmap) : "cc" );
5617 + return (res < size) ? res : size;
5620 +static inline unsigned long
5621 find_next_zero_bit (void * addr, unsigned long size, unsigned long offset)
5623 unsigned long * p = ((unsigned long *) addr) + (offset >> 6);
5624 @@ -697,14 +624,56 @@
5625 return (offset + res);
5628 +static inline unsigned long
5629 +find_next_bit (void * addr, unsigned long size, unsigned long offset)
5631 + unsigned long * p = ((unsigned long *) addr) + (offset >> 6);
5632 + unsigned long bitvec, reg;
5633 + unsigned long set, bit = offset & 63, res;
5637 + * Look for set bit in first word
5639 + bitvec = (*p) >> bit;
5640 + __asm__(" slgr %0,%0\n"
5644 + " srlg %1,%1,32\n"
5645 + "0: lghi %2,0xff\n"
5646 + " tmll %1,0xffff\n"
5649 + " srlg %1,%1,16\n"
5650 + "1: tmll %1,0x00ff\n"
5655 + " ic %1,0(%1,%3)\n"
5657 + : "=&d" (set), "+a" (bitvec), "=&d" (reg)
5658 + : "a" (&_sb_findmap) : "cc" );
5659 + if (set < (64 - bit))
5660 + return set + offset;
5661 + offset += 64 - bit;
5665 + * No set bit yet, search remaining full words for a bit
5667 + res = find_first_bit (p, size - 64 * (p - (unsigned long *) addr));
5668 + return (offset + res);
5672 * ffz = Find First Zero in word. Undefined if no zero exists,
5673 * so code should check against ~0UL first..
5675 -static __inline__ unsigned long ffz(unsigned long word)
5676 +static inline unsigned long ffz(unsigned long word)
5678 - unsigned long reg;
5680 + unsigned long reg, result;
5682 __asm__(" lhi %2,-1\n"
5684 @@ -730,40 +699,112 @@
5688 + * __ffs = find first bit in word. Undefined if no bit exists,
5689 + * so code should check against 0UL first..
5691 +static inline unsigned long __ffs (unsigned long word)
5693 + unsigned long reg, result;
5695 + __asm__(" slgr %0,%0\n"
5699 + " srlg %1,%1,32\n"
5700 + "0: lghi %2,0xff\n"
5701 + " tmll %1,0xffff\n"
5704 + " srlg %1,%1,16\n"
5705 + "1: tmll %1,0x00ff\n"
5710 + " ic %1,0(%1,%3)\n"
5712 + : "=&d" (result), "+a" (word), "=&d" (reg)
5713 + : "a" (&_sb_findmap) : "cc" );
5714 + return result;
5718 + * Every architecture must define this function. It's the fastest
5719 + * way of searching a 140-bit bitmap where the first 100 bits are
5720 + * unlikely to be set. It's guaranteed that at least one of the 140
5721 + * bits is set.
5723 +static inline int sched_find_first_bit(unsigned long *b)
5725 + return find_first_bit(b, 140);
5729 * ffs: find first bit set. This is defined the same way as
5730 * the libc and compiler builtin ffs routines, therefore
5731 * differs in spirit from the above ffz (man ffs).
5734 -extern int __inline__ ffs (int x)
5735 +extern int inline ffs (int x)
5742 - __asm__(" slr %0,%0\n"
5743 - " tml %1,0xffff\n"
5745 + __asm__(" tml %1,0xffff\n"
5750 "0: tml %1,0x00ff\n"
5755 "1: tml %1,0x000f\n"
5760 "2: tml %1,0x0003\n"
5765 "3: tml %1,0x0001\n"
5769 : "=&d" (r), "+d" (x) : : "cc" );
5775 + * fls: find last bit set.
5777 +extern __inline__ int fls(int x)
5783 + __asm__(" tmh %1,0xffff\n"
5787 + "0: tmh %1,0xff00\n"
5791 + "1: tmh %1,0xf000\n"
5795 + "2: tmh %1,0xc000\n"
5799 + "3: tmh %1,0x8000\n"
5803 + : "+d" (r), "+d" (x) : : "cc" );
5809 #define ext2_set_bit(nr, addr) test_and_set_bit((nr)^56, addr)
5810 #define ext2_clear_bit(nr, addr) test_and_clear_bit((nr)^56, addr)
5811 #define ext2_test_bit(nr, addr) test_bit((nr)^56, addr)
5812 -static __inline__ unsigned long
5813 +static inline unsigned long
5814 ext2_find_first_zero_bit(void *vaddr, unsigned long size)
5816 unsigned long res, cmp, count;
5818 return (res < size) ? res : size;
5821 -static __inline__ unsigned long
5822 +static inline unsigned long
5823 ext2_find_next_zero_bit(void *vaddr, unsigned long size, unsigned long offset)
5825 unsigned long *addr = vaddr;
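The 64-bit file repeats the 31-bit changes with 8-byte alignment, csg in
place of cs, and 63-bit masks. The new find_first_bit()/find_next_bit() pair
decomposes as in this C model (clamping of a final partial word is omitted
for brevity; __builtin_ctzl stands in for __ffs()):

	static inline unsigned long
	find_next_bit_model(unsigned long *addr, unsigned long size,
			    unsigned long offset)
	{
		unsigned long *p = addr + (offset >> 6);
		unsigned long bit = offset & 63, word;

		word = *p++ >> bit;	/* look for a set bit in the first word */
		if (word)
			return offset + __builtin_ctzl(word);
		offset += 64 - bit;
		while (offset < size) {	/* then scan the remaining full words */
			if ((word = *p++))
				return offset + __builtin_ctzl(word);
			offset += 64;
		}
		return size;
	}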
5826 diff -urN linux-2.4.20/include/asm-sparc/bitops.h linux-2.4.20-o1/include/asm-sparc/bitops.h
5827 --- linux-2.4.20/include/asm-sparc/bitops.h Fri Dec 21 18:42:03 2001
5828 +++ linux-2.4.20-o1/include/asm-sparc/bitops.h Wed Mar 12 00:44:05 2003
5829 @@ -207,6 +207,57 @@
5834 + * __ffs - find first bit in word.
5835 + * @word: The word to search
5837 + * Undefined if no bit exists, so code should check against 0 first.
5839 +static __inline__ int __ffs(unsigned long word)
5841 + int num = 0;
5843 + if ((word & 0xffff) == 0) {
5844 + num += 16;
5845 + word >>= 16;
5846 + }
5847 + if ((word & 0xff) == 0) {
5848 + num += 8;
5849 + word >>= 8;
5850 + }
5851 + if ((word & 0xf) == 0) {
5852 + num += 4;
5853 + word >>= 4;
5854 + }
5855 + if ((word & 0x3) == 0) {
5856 + num += 2;
5857 + word >>= 2;
5858 + }
5859 + if ((word & 0x1) == 0)
5860 + num += 1;
5862 + return num;
5865 + * Every architecture must define this function. It's the fastest
5866 + * way of searching a 140-bit bitmap where the first 100 bits are
5867 + * unlikely to be set. It's guaranteed that at least one of the 140
5868 + * bits is set.
5870 +static __inline__ int sched_find_first_bit(unsigned long *b)
5873 + if (unlikely(b[0]))
5874 + return __ffs(b[0]);
5875 + if (unlikely(b[1]))
5876 + return __ffs(b[1]) + 32;
5877 + if (unlikely(b[2]))
5878 + return __ffs(b[2]) + 64;
5879 + if (b[3])
5880 + return __ffs(b[3]) + 96;
5881 + return __ffs(b[4]) + 128;
5885 * ffs: find first bit set. This is defined the same way as
5886 * the libc and compiler builtin ffs routines, therefore
5887 @@ -323,6 +323,32 @@
5888 #define find_first_zero_bit(addr, size) \
5889 find_next_zero_bit((addr), (size), 0)
5892 + * find_next_bit - find the first set bit in a memory region
5893 + * @addr: The address to base the search on
5894 + * @offset: The bitnumber to start searching at
5895 + * @size: The maximum size to search
5897 + * Scheduler-induced bitop, do not use elsewhere.
5899 +static __inline__ int find_next_bit(unsigned long *addr, int size, int offset)
5901 + unsigned long *p = addr + (offset >> 5);
5902 + int num = offset & ~0x1f;
5903 + unsigned long word;
5905 + word = *p++;
5906 + word &= ~((1 << (offset & 0x1f)) - 1);
5907 + while (num < size) {
5908 + if (word != 0)
5909 + return __ffs(word) + num;
5910 + word = *p++;
5911 + num += 0x20;
5912 + }
5913 + return num;
5917 static __inline__ int test_le_bit(int nr, __const__ void * addr)
5919 __const__ unsigned char *ADDR = (__const__ unsigned char *) addr;
5920 diff -urN linux-2.4.20/include/asm-sparc/system.h linux-2.4.20-o1-preempt/include/asm-sparc/system.h
5921 --- linux-2.4.20/include/asm-sparc/system.h Wed Oct 31 00:08:11 2001
5922 +++ linux-2.4.20-o1-preempt/include/asm-sparc/system.h Tue Feb 18 03:51:30 2003
5925 * SWITCH_ENTER and SWITCH_DO_LAZY_FPU do not work yet (e.g. SMP does not work)
5927 -#define prepare_to_switch() do { \
5928 +#define prepare_arch_switch(rq, next) do { \
5929 __asm__ __volatile__( \
5930 ".globl\tflush_patch_switch\nflush_patch_switch:\n\t" \
5931 "save %sp, -0x40, %sp; save %sp, -0x40, %sp; save %sp, -0x40, %sp\n\t" \
5933 "save %sp, -0x40, %sp\n\t" \
5934 "restore; restore; restore; restore; restore; restore; restore"); \
5936 +#define finish_arch_switch(rq, next) do{ }while(0)
5937 +#define task_running(rq, p) ((rq)->curr == (p))
5939 /* Much care has gone into this code, do not touch it.
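prepare_to_switch() splits into a hook pair that sees both the runqueue and
the task, plus a task_running() predicate the O(1) scheduler uses to ask
whether a task still occupies a CPU. Architectures that need nothing special
get defaults along these lines (a sketch of the generic definitions in
kernel/sched.c; the exact spelling there may differ):

	#define prepare_arch_schedule(prev)	do { } while (0)
	#define finish_arch_schedule(prev)	do { } while (0)
	#define prepare_arch_switch(rq, next)	do { } while (0)
	#define finish_arch_switch(rq, prev)	spin_unlock_irq(&(rq)->lock)
	#define task_running(rq, p)		((rq)->curr == (p))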
5941 diff -urN linux-2.4.20/include/asm-sparc64/bitops.h linux-2.4.20-o1/include/asm-sparc64/bitops.h
5942 --- linux-2.4.20/include/asm-sparc64/bitops.h Fri Dec 21 18:42:03 2001
5943 +++ linux-2.4.20-o1/include/asm-sparc64/bitops.h Wed Mar 12 00:41:43 2003
5947 * bitops.h: Bit string operations on the V9.
5949 * Copyright 1996, 1997 David S. Miller (davem@caip.rutgers.edu)
5951 #ifndef _SPARC64_BITOPS_H
5952 #define _SPARC64_BITOPS_H
5954 +#include <linux/compiler.h>
5955 #include <asm/byteorder.h>
5957 -extern long ___test_and_set_bit(unsigned long nr, volatile void *addr);
5958 -extern long ___test_and_clear_bit(unsigned long nr, volatile void *addr);
5959 -extern long ___test_and_change_bit(unsigned long nr, volatile void *addr);
5960 +extern long ___test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
5961 +extern long ___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
5962 +extern long ___test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
5964 #define test_and_set_bit(nr,addr) ({___test_and_set_bit(nr,addr)!=0;})
5965 #define test_and_clear_bit(nr,addr) ({___test_and_clear_bit(nr,addr)!=0;})
5966 @@ -21,109 +22,132 @@
5967 #define change_bit(nr,addr) ((void)___test_and_change_bit(nr,addr))
5969 /* "non-atomic" versions... */
5970 -#define __set_bit(X,Y) \
5971 -do { unsigned long __nr = (X); \
5972 - long *__m = ((long *) (Y)) + (__nr >> 6); \
5973 - *__m |= (1UL << (__nr & 63)); \
5975 -#define __clear_bit(X,Y) \
5976 -do { unsigned long __nr = (X); \
5977 - long *__m = ((long *) (Y)) + (__nr >> 6); \
5978 - *__m &= ~(1UL << (__nr & 63)); \
5980 -#define __change_bit(X,Y) \
5981 -do { unsigned long __nr = (X); \
5982 - long *__m = ((long *) (Y)) + (__nr >> 6); \
5983 - *__m ^= (1UL << (__nr & 63)); \
5985 -#define __test_and_set_bit(X,Y) \
5986 -({ unsigned long __nr = (X); \
5987 - long *__m = ((long *) (Y)) + (__nr >> 6); \
5988 - long __old = *__m; \
5989 - long __mask = (1UL << (__nr & 63)); \
5990 - *__m = (__old | __mask); \
5991 - ((__old & __mask) != 0); \
5993 -#define __test_and_clear_bit(X,Y) \
5994 -({ unsigned long __nr = (X); \
5995 - long *__m = ((long *) (Y)) + (__nr >> 6); \
5996 - long __old = *__m; \
5997 - long __mask = (1UL << (__nr & 63)); \
5998 - *__m = (__old & ~__mask); \
5999 - ((__old & __mask) != 0); \
6001 -#define __test_and_change_bit(X,Y) \
6002 -({ unsigned long __nr = (X); \
6003 - long *__m = ((long *) (Y)) + (__nr >> 6); \
6004 - long __old = *__m; \
6005 - long __mask = (1UL << (__nr & 63)); \
6006 - *__m = (__old ^ __mask); \
6007 - ((__old & __mask) != 0); \
6010 +static __inline__ void __set_bit(int nr, volatile unsigned long *addr)
6012 + volatile unsigned long *m = addr + (nr >> 6);
6014 + *m |= (1UL << (nr & 63));
6017 +static __inline__ void __clear_bit(int nr, volatile unsigned long *addr)
6019 + volatile unsigned long *m = addr + (nr >> 6);
6021 + *m &= ~(1UL << (nr & 63));
6024 +static __inline__ void __change_bit(int nr, volatile unsigned long *addr)
6026 + volatile unsigned long *m = addr + (nr >> 6);
6028 + *m ^= (1UL << (nr & 63));
6031 +static __inline__ int __test_and_set_bit(int nr, volatile unsigned long *addr)
6033 + volatile unsigned long *m = addr + (nr >> 6);
6034 + long old = *m;
6035 + long mask = (1UL << (nr & 63));
6037 + *m = (old | mask);
6038 + return ((old & mask) != 0);
6041 +static __inline__ int __test_and_clear_bit(int nr, volatile unsigned long *addr)
6043 + volatile unsigned long *m = addr + (nr >> 6);
6044 + long old = *m;
6045 + long mask = (1UL << (nr & 63));
6047 + *m = (old & ~mask);
6048 + return ((old & mask) != 0);
6051 +static __inline__ int __test_and_change_bit(int nr, volatile unsigned long *addr)
6053 + volatile unsigned long *m = addr + (nr >> 6);
6054 + long old = *m;
6055 + long mask = (1UL << (nr & 63));
6057 + *m = (old ^ mask);
6058 + return ((old & mask) != 0);
6061 #define smp_mb__before_clear_bit() do { } while(0)
6062 #define smp_mb__after_clear_bit() do { } while(0)
6064 -extern __inline__ int test_bit(int nr, __const__ void *addr)
6065 +static __inline__ int test_bit(int nr, __const__ volatile unsigned long *addr)
6067 - return (1UL & (((__const__ long *) addr)[nr >> 6] >> (nr & 63))) != 0UL;
6068 + return (1UL & ((addr)[nr >> 6] >> (nr & 63))) != 0UL;
6071 /* The easy/cheese version for now. */
6072 -extern __inline__ unsigned long ffz(unsigned long word)
6073 +static __inline__ unsigned long ffz(unsigned long word)
6075 unsigned long result;
6077 -#ifdef ULTRA_HAS_POPULATION_COUNT /* Thanks for nothing Sun... */
6078 - __asm__ __volatile__(
6081 -" xnor %0, %%g1, %%g2\n"
6083 -"1: " : "=&r" (result)
6087 -#if 1 /* def EASY_CHEESE_VERSION */
6094 - unsigned long tmp;
6099 - tmp = ~word & -~word;
6100 - if (!(unsigned)tmp) {
6104 - if (!(unsigned short)tmp) {
6108 - if (!(unsigned char)tmp) {
6112 + * __ffs - find first bit in word.
6113 + * @word: The word to search
6115 + * Undefined if no bit exists, so code should check against 0 first.
6117 +static __inline__ unsigned long __ffs(unsigned long word)
6119 + unsigned long result = 0;
6121 + while (!(word & 1UL)) {
6122 + result++;
6123 + word >>= 1;
6124 + }
6125 - if (tmp & 0xf0) result += 4;
6126 - if (tmp & 0xcc) result += 2;
6127 - if (tmp & 0xaa) result ++;
6134 + * fls: find last bit set.
6137 +#define fls(x) generic_fls(x)
6142 + * Every architecture must define this function. It's the fastest
6143 + * way of searching a 140-bit bitmap where the first 100 bits are
6144 + * unlikely to be set. It's guaranteed that at least one of the 140
6145 + * bits is set.
6147 +static inline int sched_find_first_bit(unsigned long *b)
6149 + if (unlikely(b[0]))
6150 + return __ffs(b[0]);
6151 + if (unlikely(((unsigned int)b[1])))
6152 + return __ffs(b[1]) + 64;
6153 + if (b[1] >> 32)
6154 + return __ffs(b[1] >> 32) + 96;
6155 + return __ffs(b[2]) + 128;
6159 * ffs: find first bit set. This is defined the same way as
6160 * the libc and compiler builtin ffs routines, therefore
6161 * differs in spirit from the above ffz (man ffs).
6164 -#define ffs(x) generic_ffs(x)
6165 +static __inline__ int ffs(int x)
6169 + return __ffs((unsigned long)x);
6173 * hweightN: returns the hamming weight (i.e. the number
6176 #ifdef ULTRA_HAS_POPULATION_COUNT
6178 -extern __inline__ unsigned int hweight32(unsigned int w)
6179 +static __inline__ unsigned int hweight32(unsigned int w)
6187 -extern __inline__ unsigned int hweight16(unsigned int w)
6188 +static __inline__ unsigned int hweight16(unsigned int w)
6196 -extern __inline__ unsigned int hweight8(unsigned int w)
6197 +static __inline__ unsigned int hweight8(unsigned int w)
6201 @@ -165,14 +189,69 @@
6203 #endif /* __KERNEL__ */
6206 + * find_next_bit - find the next set bit in a memory region
6207 + * @addr: The address to base the search on
6208 + * @offset: The bitnumber to start searching at
6209 + * @size: The maximum size to search
6211 +static __inline__ unsigned long find_next_bit(unsigned long *addr, unsigned long size, unsigned long offset)
6213 + unsigned long *p = addr + (offset >> 6);
6214 + unsigned long result = offset & ~63UL;
6215 + unsigned long tmp;
6217 + if (offset >= size)
6223 + tmp &= (~0UL << offset);
6227 + goto found_middle;
6231 + while (size & ~63UL) {
6232 + if ((tmp = *(p++)))
6233 + goto found_middle;
6242 + tmp &= (~0UL >> (64 - size));
6243 + if (tmp == 0UL) /* Are any bits set? */
6244 + return result + size; /* Nope. */
6246 + return result + __ffs(tmp);
6250 + * find_first_bit - find the first set bit in a memory region
6251 + * @addr: The address to start the search at
6252 + * @size: The maximum size to search
6254 + * Returns the bit-number of the first set bit, not the number of the byte
6255 + * containing a bit.
6257 +#define find_first_bit(addr, size) \
6258 + find_next_bit((addr), (size), 0)
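Typical use of the new pair is a scan over every set bit, e.g. walking a
priority bitmap -- a usage sketch, assuming standard kernel headers:

	unsigned long map[3] = { 0x11UL, 0UL, 1UL << 11 };
	unsigned long bit;

	for (bit = find_first_bit(map, 140); bit < 140;
	     bit = find_next_bit(map, 140, bit + 1))
		printk("bit %lu is set\n", bit);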
6260 /* find_next_zero_bit() finds the first zero bit in a bit string of length
6261 * 'size' bits, starting the search at bit 'offset'. This is largely based
6262 * on Linus's ALPHA routines, which are pretty portable BTW.
6265 -extern __inline__ unsigned long find_next_zero_bit(void *addr, unsigned long size, unsigned long offset)
6266 +static __inline__ unsigned long find_next_zero_bit(unsigned long *addr, unsigned long size, unsigned long offset)
6268 - unsigned long *p = ((unsigned long *) addr) + (offset >> 6);
6269 + unsigned long *p = addr + (offset >> 6);
6270 unsigned long result = offset & ~63UL;
6273 @@ -211,15 +290,15 @@
6274 #define find_first_zero_bit(addr, size) \
6275 find_next_zero_bit((addr), (size), 0)
6277 -extern long ___test_and_set_le_bit(int nr, volatile void *addr);
6278 -extern long ___test_and_clear_le_bit(int nr, volatile void *addr);
6279 +extern long ___test_and_set_le_bit(int nr, volatile unsigned long *addr);
6280 +extern long ___test_and_clear_le_bit(int nr, volatile unsigned long *addr);
6282 #define test_and_set_le_bit(nr,addr) ({___test_and_set_le_bit(nr,addr)!=0;})
6283 #define test_and_clear_le_bit(nr,addr) ({___test_and_clear_le_bit(nr,addr)!=0;})
6284 #define set_le_bit(nr,addr) ((void)___test_and_set_le_bit(nr,addr))
6285 #define clear_le_bit(nr,addr) ((void)___test_and_clear_le_bit(nr,addr))
6287 -extern __inline__ int test_le_bit(int nr, __const__ void * addr)
6288 +static __inline__ int test_le_bit(int nr, __const__ unsigned long * addr)
6291 __const__ unsigned char *ADDR = (__const__ unsigned char *) addr;
6293 #define find_first_zero_le_bit(addr, size) \
6294 find_next_zero_le_bit((addr), (size), 0)
6296 -extern __inline__ unsigned long find_next_zero_le_bit(void *addr, unsigned long size, unsigned long offset)
6297 +static __inline__ unsigned long find_next_zero_le_bit(unsigned long *addr, unsigned long size, unsigned long offset)
6299 - unsigned long *p = ((unsigned long *) addr) + (offset >> 6);
6300 + unsigned long *p = addr + (offset >> 6);
6301 unsigned long result = offset & ~63UL;
6304 @@ -271,18 +350,22 @@
6308 -#define ext2_set_bit test_and_set_le_bit
6309 -#define ext2_clear_bit test_and_clear_le_bit
6310 -#define ext2_test_bit test_le_bit
6311 -#define ext2_find_first_zero_bit find_first_zero_le_bit
6312 -#define ext2_find_next_zero_bit find_next_zero_le_bit
6313 +#define ext2_set_bit(nr,addr) test_and_set_le_bit((nr),(unsigned long *)(addr))
6314 +#define ext2_clear_bit(nr,addr) test_and_clear_le_bit((nr),(unsigned long *)(addr))
6315 +#define ext2_test_bit(nr,addr) test_le_bit((nr),(unsigned long *)(addr))
6316 +#define ext2_find_first_zero_bit(addr, size) \
6317 + find_first_zero_le_bit((unsigned long *)(addr), (size))
6318 +#define ext2_find_next_zero_bit(addr, size, off) \
6319 + find_next_zero_le_bit((unsigned long *)(addr), (size), (off))
6321 /* Bitmap functions for the minix filesystem. */
6322 -#define minix_test_and_set_bit(nr,addr) test_and_set_bit(nr,addr)
6323 -#define minix_set_bit(nr,addr) set_bit(nr,addr)
6324 -#define minix_test_and_clear_bit(nr,addr) test_and_clear_bit(nr,addr)
6325 -#define minix_test_bit(nr,addr) test_bit(nr,addr)
6326 -#define minix_find_first_zero_bit(addr,size) find_first_zero_bit(addr,size)
6327 +#define minix_test_and_set_bit(nr,addr) test_and_set_bit((nr),(unsigned long *)(addr))
6328 +#define minix_set_bit(nr,addr) set_bit((nr),(unsigned long *)(addr))
6329 +#define minix_test_and_clear_bit(nr,addr) \
6330 + test_and_clear_bit((nr),(unsigned long *)(addr))
6331 +#define minix_test_bit(nr,addr) test_bit((nr),(unsigned long *)(addr))
6332 +#define minix_find_first_zero_bit(addr,size) \
6333 + find_first_zero_bit((unsigned long *)(addr),(size))
6335 #endif /* __KERNEL__ */
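(Aside: the hunks above retype the sparc64 bit-search helpers to take
unsigned long * and make them static. The word-at-a-time logic they share
is easy to see in isolation: find_first_bit() is just find_next_bit() with
offset 0, and "offset >> 6" selects the 64-bit word. Below is a minimal
user-space sketch of that logic, not the patch's code; it assumes 64-bit
longs and uses __builtin_ctzl() in place of the kernel's __ffs().)

#include <stdio.h>

static unsigned long my_find_next_bit(const unsigned long *addr,
                                      unsigned long size,
                                      unsigned long offset)
{
        const unsigned long *p = addr + (offset >> 6);
        unsigned long result = offset & ~63UL;
        unsigned long tmp;

        if (offset >= size)
                return size;
        size -= result;
        tmp = *p++ & (~0UL << (offset & 63));   /* drop bits below offset */
        while (size > 64) {
                if (tmp)
                        goto found;
                tmp = *p++;
                result += 64;
                size -= 64;
        }
        tmp &= ~0UL >> (64 - size);             /* drop bits past size */
        if (!tmp)
                return result + size;           /* no set bit found */
found:
        return result + __builtin_ctzl(tmp);    /* __ffs() equivalent */
}

int main(void)
{
        unsigned long map[2] = { 0UL, 1UL << 5 };       /* bit 69 set */
        printf("%lu\n", my_find_next_bit(map, 128, 0)); /* prints 69 */
        return 0;
}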
6337 diff -urN linux-2.4.20/include/asm-sparc64/smp.h linux-2.4.20-o1/include/asm-sparc64/smp.h
6338 --- linux-2.4.20/include/asm-sparc64/smp.h Fri Nov 29 00:53:15 2002
6339 +++ linux-2.4.20-o1/include/asm-sparc64/smp.h Wed Mar 12 00:41:43 2003
6344 -#define smp_processor_id() (current->processor)
6345 +#define smp_processor_id() (current->cpu)
6347 /* This needn't do anything as we do not sleep the cpu
6348 * inside of the idler task, so an interrupt is not needed
6349 diff -urN linux-2.4.20/include/asm-sparc64/system.h linux-2.4.20-o1/include/asm-sparc64/system.h
6350 --- linux-2.4.20/include/asm-sparc64/system.h Sat Aug 3 02:39:45 2002
6351 +++ linux-2.4.20-o1/include/asm-sparc64/system.h Wed Mar 12 00:41:43 2003
6352 @@ -143,7 +143,18 @@
6354 #define flush_user_windows flushw_user
6355 #define flush_register_windows flushw_all
6356 -#define prepare_to_switch flushw_all
6358 +#define prepare_arch_schedule(prev) task_lock(prev)
6359 +#define finish_arch_schedule(prev) task_unlock(prev)
6360 +#define prepare_arch_switch(rq, next) \
6361 +do { spin_lock(&(next)->switch_lock); \
6362 + spin_unlock(&(rq)->lock); \
6366 +#define finish_arch_switch(rq, prev) \
6367 +do { spin_unlock_irq(&(prev)->switch_lock); \
6370 #ifndef CONFIG_DEBUG_SPINLOCK
6371 #define CHECK_LOCKS(PREV) do { } while(0)
6372 diff -urN linux-2.4.20/include/linux/kernel_stat.h linux-2.4.20-o1/include/linux/kernel_stat.h
6373 --- linux-2.4.20/include/linux/kernel_stat.h Fri Nov 29 00:53:15 2002
6374 +++ linux-2.4.20-o1/include/linux/kernel_stat.h Wed Mar 12 00:41:43 2003
6376 #elif !defined(CONFIG_ARCH_S390)
6377 unsigned int irqs[NR_CPUS][NR_IRQS];
6379 - unsigned int context_swtch;
6382 extern struct kernel_stat kstat;
6383 diff -urN linux-2.4.20/include/linux/sched.h linux-2.4.20-o1/include/linux/sched.h
6384 --- linux-2.4.20/include/linux/sched.h Fri Nov 29 00:53:15 2002
6385 +++ linux-2.4.20-o1/include/linux/sched.h Wed Mar 12 00:41:43 2003
6387 extern unsigned long event;
6389 #include <linux/config.h>
6390 +#include <linux/compiler.h>
6391 #include <linux/binfmts.h>
6392 #include <linux/threads.h>
6393 #include <linux/kernel.h>
6395 #include <asm/mmu.h>
6397 #include <linux/smp.h>
6398 -#include <linux/tty.h>
6399 +//#include <linux/tty.h>
6400 #include <linux/sem.h>
6401 #include <linux/signal.h>
6402 #include <linux/securebits.h>
6404 #define CT_TO_SECS(x) ((x) / HZ)
6405 #define CT_TO_USECS(x) (((x) % HZ) * 1000000/HZ)
6407 -extern int nr_running, nr_threads;
6408 +extern int nr_threads;
6409 extern int last_pid;
6410 +extern unsigned long nr_running(void);
6411 +extern unsigned long nr_uninterruptible(void);
6413 -#include <linux/fs.h>
6414 +//#include <linux/fs.h>
6415 #include <linux/time.h>
6416 #include <linux/param.h>
6417 #include <linux/resource.h>
6418 @@ -119,12 +122,6 @@
6419 #define SCHED_FIFO 1
6423 - * This is an additional bit set when we want to
6424 - * yield the CPU for one re-schedule..
6426 -#define SCHED_YIELD 0x10
6428 struct sched_param {
6431 @@ -142,17 +139,21 @@
6434 extern rwlock_t tasklist_lock;
6435 -extern spinlock_t runqueue_lock;
6436 extern spinlock_t mmlist_lock;
6438 +typedef struct task_struct task_t;
6440 extern void sched_init(void);
6441 -extern void init_idle(void);
6442 +extern void init_idle(task_t *idle, int cpu);
6443 extern void show_state(void);
6444 extern void cpu_init (void);
6445 extern void trap_init(void);
6446 extern void update_process_times(int user);
6447 -extern void update_one_process(struct task_struct *p, unsigned long user,
6448 +extern void update_one_process(task_t *p, unsigned long user,
6449 unsigned long system, int cpu);
6450 +extern void scheduler_tick(int user_tick, int system);
6451 +extern void migration_init(void);
6452 +extern unsigned long cache_decay_ticks;
6454 #define MAX_SCHEDULE_TIMEOUT LONG_MAX
6455 extern signed long FASTCALL(schedule_timeout(signed long timeout));
6456 @@ -162,6 +163,28 @@
6457 extern void flush_scheduled_tasks(void);
6458 extern int start_context_thread(void);
6459 extern int current_is_keventd(void);
6460 +extern void FASTCALL(sched_exit(task_t * p));
6461 +extern int FASTCALL(idle_cpu(int cpu));
6464 + * Priority of a process goes from 0..MAX_PRIO-1, valid RT
6465 + * priority is 0..MAX_RT_PRIO-1, and SCHED_OTHER tasks are
6466 + * in the range MAX_RT_PRIO..MAX_PRIO-1. Priority values
6467 + * are inverted: lower p->prio value means higher priority.
6469 + * The MAX_RT_USER_PRIO value allows the actual maximum
6470 + * RT priority to be separate from the value exported to
6471 + * user-space. This allows kernel threads to set their
6472 + * priority to a value higher than any user task. Note:
6473 + * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
6475 + * Both values are configurable at compile-time.
6478 +#define MAX_USER_RT_PRIO 100
6479 +#define MAX_RT_PRIO MAX_USER_RT_PRIO
6481 +#define MAX_PRIO (MAX_RT_PRIO + 40)
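(Aside: with the defaults above, the prio space splits into a real-time band
and a SCHED_OTHER band, exactly as the comment describes. A trivial
user-space sketch, assuming the default values from this hunk:)

#include <stdio.h>

/* Values from the hunk above (compile-time configurable in the patch). */
#define MAX_USER_RT_PRIO 100
#define MAX_RT_PRIO      MAX_USER_RT_PRIO
#define MAX_PRIO         (MAX_RT_PRIO + 40)

int main(void)
{
        printf("RT priorities:  0 .. %d\n", MAX_RT_PRIO - 1);            /* 0..99 */
        printf("SCHED_OTHER:  %d .. %d\n", MAX_RT_PRIO, MAX_PRIO - 1);   /* 100..139 */
        return 0;
}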
6484 * The default fd array needs to be at least BITS_PER_LONG,
6486 extern struct user_struct root_user;
6487 #define INIT_USER (&root_user)
6489 +typedef struct prio_array prio_array_t;
6491 struct task_struct {
6493 * offsets of these are hardcoded elsewhere - touch with care
6494 @@ -301,35 +326,26 @@
6496 int lock_depth; /* Lock depth */
6499 - * offset 32 begins here on 32-bit platforms. We keep
6500 - * all fields in a single cacheline that are needed for
6501 - * the goodness() loop in schedule().
6505 - unsigned long policy;
6506 - struct mm_struct *mm;
6509 - * cpus_runnable is ~0 if the process is not running on any
6510 - * CPU. It's (1 << cpu) if it's running on a CPU. This mask
6511 - * is updated under the runqueue lock.
6513 - * To determine whether a process might run on a CPU, this
6514 - * mask is AND-ed with cpus_allowed.
6515 + * offset 32 begins here on 32-bit platforms.
6517 - unsigned long cpus_runnable, cpus_allowed;
6519 - * (only the 'next' pointer fits into the cacheline, but
6520 - * that's just fine.)
6522 - struct list_head run_list;
6523 - unsigned long sleep_time;
6525 + int prio, static_prio;
6527 + prio_array_t *array;
6529 - struct task_struct *next_task, *prev_task;
6530 - struct mm_struct *active_mm;
6531 + unsigned long sleep_avg;
6532 + unsigned long sleep_timestamp;
6534 + unsigned long policy;
6535 + unsigned long cpus_allowed;
6536 + unsigned int time_slice, first_time_slice;
6538 + task_t *next_task, *prev_task;
6540 + struct mm_struct *mm, *active_mm;
6541 struct list_head local_pages;
6543 unsigned int allocation_order, nr_local_pages;
6546 @@ -351,12 +367,12 @@
6547 * older sibling, respectively. (p->father can be replaced with
6550 - struct task_struct *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr;
6551 + task_t *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr;
6552 struct list_head thread_group;
6554 /* PID hash table linkage. */
6555 - struct task_struct *pidhash_next;
6556 - struct task_struct **pidhash_pprev;
6557 + task_t *pidhash_next;
6558 + task_t **pidhash_pprev;
6560 wait_queue_head_t wait_chldexit; /* for wait4() */
6561 struct completion *vfork_done; /* for vfork() */
6564 /* Protection of (de-)allocation: mm, files, fs, tty */
6565 spinlock_t alloc_lock;
6566 +/* context-switch lock */
6567 + spinlock_t switch_lock;
6569 /* journalling filesystem info */
6571 @@ -454,9 +472,15 @@
6573 #define _STK_LIM (8*1024*1024)
6575 -#define DEF_COUNTER (10*HZ/100) /* 100 ms time slice */
6576 -#define MAX_COUNTER (20*HZ/100)
6577 -#define DEF_NICE (0)
6579 +extern void set_cpus_allowed(task_t *p, unsigned long new_mask);
6581 +#define set_cpus_allowed(p, new_mask) do { } while (0)
6584 +extern void set_user_nice(task_t *p, long nice);
6585 +extern int task_prio(task_t *p);
6586 +extern int task_nice(task_t *p);
6588 extern void yield(void);
6590 @@ -477,14 +501,14 @@
6591 addr_limit: KERNEL_DS, \
6592 exec_domain: &default_exec_domain, \
6594 - counter: DEF_COUNTER, \
6596 + prio: MAX_PRIO-20, \
6597 + static_prio: MAX_PRIO-20, \
6598 policy: SCHED_OTHER, \
6599 + cpus_allowed: ~0UL, \
6601 active_mm: &init_mm, \
6602 - cpus_runnable: ~0UL, \
6603 - cpus_allowed: ~0UL, \
6604 run_list: LIST_HEAD_INIT(tsk.run_list), \
6610 pending: { NULL, &tsk.pending.head, {{0}}}, \
6612 alloc_lock: SPIN_LOCK_UNLOCKED, \
6613 + switch_lock: SPIN_LOCK_UNLOCKED, \
6614 journal_info: NULL, \
6617 @@ -518,24 +543,23 @@
6621 - struct task_struct task;
6623 unsigned long stack[INIT_TASK_SIZE/sizeof(long)];
6626 extern union task_union init_task_union;
6628 extern struct mm_struct init_mm;
6629 -extern struct task_struct *init_tasks[NR_CPUS];
6631 /* PID hashing. (shouldnt this be dynamic?) */
6632 #define PIDHASH_SZ (4096 >> 2)
6633 -extern struct task_struct *pidhash[PIDHASH_SZ];
6634 +extern task_t *pidhash[PIDHASH_SZ];
6636 #define pid_hashfn(x) ((((x) >> 8) ^ (x)) & (PIDHASH_SZ - 1))
6638 -static inline void hash_pid(struct task_struct *p)
6639 +static inline void hash_pid(task_t *p)
6641 - struct task_struct **htable = &pidhash[pid_hashfn(p->pid)];
6642 + task_t **htable = &pidhash[pid_hashfn(p->pid)];
6644 if((p->pidhash_next = *htable) != NULL)
6645 (*htable)->pidhash_pprev = &p->pidhash_next;
6646 @@ -543,16 +567,16 @@
6647 p->pidhash_pprev = htable;
6650 -static inline void unhash_pid(struct task_struct *p)
6651 +static inline void unhash_pid(task_t *p)
6654 p->pidhash_next->pidhash_pprev = p->pidhash_pprev;
6655 *p->pidhash_pprev = p->pidhash_next;
6658 -static inline struct task_struct *find_task_by_pid(int pid)
6659 +static inline task_t *find_task_by_pid(int pid)
6661 - struct task_struct *p, **htable = &pidhash[pid_hashfn(pid)];
6662 + task_t *p, **htable = &pidhash[pid_hashfn(pid)];
6664 for(p = *htable; p && p->pid != pid; p = p->pidhash_next)
6666 @@ -560,19 +584,6 @@
6670 -#define task_has_cpu(tsk) ((tsk)->cpus_runnable != ~0UL)
6672 -static inline void task_set_cpu(struct task_struct *tsk, unsigned int cpu)
6674 - tsk->processor = cpu;
6675 - tsk->cpus_runnable = 1UL << cpu;
6678 -static inline void task_release_cpu(struct task_struct *tsk)
6680 - tsk->cpus_runnable = ~0UL;
6683 /* per-UID process charging. */
6684 extern struct user_struct * alloc_uid(uid_t);
6685 extern void free_uid(struct user_struct *);
6686 @@ -599,47 +610,50 @@
6687 extern void FASTCALL(interruptible_sleep_on(wait_queue_head_t *q));
6688 extern long FASTCALL(interruptible_sleep_on_timeout(wait_queue_head_t *q,
6689 signed long timeout));
6690 -extern int FASTCALL(wake_up_process(struct task_struct * tsk));
6691 +extern int FASTCALL(wake_up_process(task_t * p));
6692 +extern void FASTCALL(wake_up_forked_process(task_t * p));
6694 #define wake_up(x) __wake_up((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1)
6695 #define wake_up_nr(x, nr) __wake_up((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, nr)
6696 #define wake_up_all(x) __wake_up((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 0)
6697 -#define wake_up_sync(x) __wake_up_sync((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1)
6698 -#define wake_up_sync_nr(x, nr) __wake_up_sync((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, nr)
6699 #define wake_up_interruptible(x) __wake_up((x),TASK_INTERRUPTIBLE, 1)
6700 #define wake_up_interruptible_nr(x, nr) __wake_up((x),TASK_INTERRUPTIBLE, nr)
6701 #define wake_up_interruptible_all(x) __wake_up((x),TASK_INTERRUPTIBLE, 0)
6702 -#define wake_up_interruptible_sync(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, 1)
6703 -#define wake_up_interruptible_sync_nr(x, nr) __wake_up_sync((x),TASK_INTERRUPTIBLE, nr)
6705 +#define wake_up_interruptible_sync(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, 1)
6707 +#define wake_up_interruptible_sync(x) __wake_up((x),TASK_INTERRUPTIBLE, 1)
6710 asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru);
6712 extern int in_group_p(gid_t);
6713 extern int in_egroup_p(gid_t);
6715 extern void proc_caches_init(void);
6716 -extern void flush_signals(struct task_struct *);
6717 -extern void flush_signal_handlers(struct task_struct *);
6718 +extern void flush_signals(task_t *);
6719 +extern void flush_signal_handlers(task_t *);
6720 extern void sig_exit(int, int, struct siginfo *);
6721 extern int dequeue_signal(sigset_t *, siginfo_t *);
6722 extern void block_all_signals(int (*notifier)(void *priv), void *priv,
6724 extern void unblock_all_signals(void);
6725 -extern int send_sig_info(int, struct siginfo *, struct task_struct *);
6726 -extern int force_sig_info(int, struct siginfo *, struct task_struct *);
6727 +extern int send_sig_info(int, struct siginfo *, task_t *);
6728 +extern int force_sig_info(int, struct siginfo *, task_t *);
6729 extern int kill_pg_info(int, struct siginfo *, pid_t);
6730 extern int kill_sl_info(int, struct siginfo *, pid_t);
6731 extern int kill_proc_info(int, struct siginfo *, pid_t);
6732 -extern void notify_parent(struct task_struct *, int);
6733 -extern void do_notify_parent(struct task_struct *, int);
6734 -extern void force_sig(int, struct task_struct *);
6735 -extern int send_sig(int, struct task_struct *, int);
6736 +extern void notify_parent(task_t *, int);
6737 +extern void do_notify_parent(task_t *, int);
6738 +extern void force_sig(int, task_t *);
6739 +extern int send_sig(int, task_t *, int);
6740 extern int kill_pg(pid_t, int, int);
6741 extern int kill_sl(pid_t, int, int);
6742 extern int kill_proc(pid_t, int, int);
6743 extern int do_sigaction(int, const struct k_sigaction *, struct k_sigaction *);
6744 extern int do_sigaltstack(const stack_t *, stack_t *, unsigned long);
6746 -static inline int signal_pending(struct task_struct *p)
6747 +static inline int signal_pending(task_t *p)
6749 return (p->sigpending != 0);
6752 This is required every time the blocked sigset_t changes.
6753 All callers should have t->sigmask_lock. */
6755 -static inline void recalc_sigpending(struct task_struct *t)
6756 +static inline void recalc_sigpending(task_t *t)
6758 t->sigpending = has_pending_signals(&t->pending.signal, &t->blocked);
6760 @@ -785,16 +799,17 @@
6761 extern int expand_fdset(struct files_struct *, int nr);
6762 extern void free_fdset(fd_set *, int);
6764 -extern int copy_thread(int, unsigned long, unsigned long, unsigned long, struct task_struct *, struct pt_regs *);
6765 +extern int copy_thread(int, unsigned long, unsigned long, unsigned long, task_t *, struct pt_regs *);
6766 extern void flush_thread(void);
6767 extern void exit_thread(void);
6769 -extern void exit_mm(struct task_struct *);
6770 -extern void exit_files(struct task_struct *);
6771 -extern void exit_sighand(struct task_struct *);
6772 +extern void exit_mm(task_t *);
6773 +extern void exit_files(task_t *);
6774 +extern void exit_sighand(task_t *);
6776 extern void reparent_to_init(void);
6777 extern void daemonize(void);
6778 +extern task_t *child_reaper;
6780 extern int do_execve(char *, char **, char **, struct pt_regs *);
6781 extern int do_fork(unsigned long, unsigned long, struct pt_regs *, unsigned long);
6783 extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
6784 extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
6786 +extern void wait_task_inactive(task_t * p);
6787 +extern void kick_if_running(task_t * p);
6789 #define __wait_event(wq, condition) \
6791 wait_queue_t __wait; \
6792 @@ -884,27 +902,12 @@
6793 for (task = next_thread(current) ; task != current ; task = next_thread(task))
6795 #define next_thread(p) \
6796 - list_entry((p)->thread_group.next, struct task_struct, thread_group)
6797 + list_entry((p)->thread_group.next, task_t, thread_group)
6799 #define thread_group_leader(p) (p->pid == p->tgid)
6801 -static inline void del_from_runqueue(struct task_struct * p)
6802 +static inline void unhash_process(task_t *p)
6805 - p->sleep_time = jiffies;
6806 - list_del(&p->run_list);
6807 - p->run_list.next = NULL;
6810 -static inline int task_on_runqueue(struct task_struct *p)
6812 - return (p->run_list.next != NULL);
6815 -static inline void unhash_process(struct task_struct *p)
6817 - if (task_on_runqueue(p))
6818 - out_of_line_bug();
6819 write_lock_irq(&tasklist_lock);
6822 @@ -914,12 +917,12 @@
6825 /* Protects ->fs, ->files, ->mm, and synchronises with wait4(). Nests inside tasklist_lock */
6826 -static inline void task_lock(struct task_struct *p)
6827 +static inline void task_lock(task_t *p)
6829 spin_lock(&p->alloc_lock);
6832 -static inline void task_unlock(struct task_struct *p)
6833 +static inline void task_unlock(task_t *p)
6835 spin_unlock(&p->alloc_lock);
6837 @@ -943,6 +946,26 @@
6841 +static inline void set_need_resched(void)
6843 + current->need_resched = 1;
6846 +static inline void clear_need_resched(void)
6848 + current->need_resched = 0;
6851 +static inline void set_tsk_need_resched(task_t *tsk)
6853 + tsk->need_resched = 1;
6856 +static inline void clear_tsk_need_resched(task_t *tsk)
6858 + tsk->need_resched = 0;
6861 static inline int need_resched(void)
6863 return (unlikely(current->need_resched));
6867 #endif /* __KERNEL__ */
6870 diff -urN linux-2.4.20/include/linux/smp.h linux-2.4.20-o1/include/linux/smp.h
6871 --- linux-2.4.20/include/linux/smp.h Thu Nov 22 20:46:19 2001
6872 +++ linux-2.4.20-o1/include/linux/smp.h Wed Mar 12 00:41:43 2003
6874 #define cpu_number_map(cpu) 0
6875 #define smp_call_function(func,info,retry,wait) ({ 0; })
6876 #define cpu_online_map 1
6877 +static inline void smp_send_reschedule(int cpu) { }
6878 +static inline void smp_send_reschedule_all(void) { }
6883 + * Common definitions:
6885 +#define cpu() smp_processor_id()
6888 diff -urN linux-2.4.20/include/linux/smp_balance.h linux-2.4.20-o1/include/linux/smp_balance.h
6889 --- linux-2.4.20/include/linux/smp_balance.h Thu Jan 1 01:00:00 1970
6890 +++ linux-2.4.20-o1/include/linux/smp_balance.h Wed Mar 12 00:41:43 2003
6892 +#ifndef _LINUX_SMP_BALANCE_H
6893 +#define _LINUX_SMP_BALANCE_H
6896 + * per-architecture load balancing logic, e.g. for hyperthreading
6899 +#ifdef ARCH_HAS_SMP_BALANCE
6900 +#include <asm/smp_balance.h>
6902 +#define arch_load_balance(x, y) (0)
6903 +#define arch_reschedule_idle_override(x, idle) (idle)
6906 +#endif /* _LINUX_SMP_BALANCE_H */
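(Aside: an architecture opts in by defining ARCH_HAS_SMP_BALANCE and
providing <asm/smp_balance.h>; otherwise the no-op defaults above apply.
Purely as a hypothetical illustration -- no such arch header is part of
this patch -- an <asm/smp_balance.h> would supply the two hooks along
these lines:)

/* Hypothetical asm/smp_balance.h sketch -- illustrative only. */
#ifndef _ASM_SMP_BALANCE_H
#define _ASM_SMP_BALANCE_H

/*
 * Return nonzero if arch-specific balancing (e.g. moving work between
 * hyperthreaded siblings) already took care of this runqueue, so the
 * generic balancer can be skipped.
 */
static inline int arch_load_balance(int this_cpu, int idle)
{
        return 0;       /* fall through to the generic balancer */
}

/*
 * Let the architecture veto or force idle-CPU rescheduling; the
 * default just echoes the generic decision back.
 */
static inline int arch_reschedule_idle_override(task_t *p, int idle)
{
        return idle;
}

#endif /* _ASM_SMP_BALANCE_H */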
6907 diff -urN linux-2.4.20/include/linux/wait.h linux-2.4.20-o1/include/linux/wait.h
6908 --- linux-2.4.20/include/linux/wait.h Thu Nov 22 20:46:19 2001
6909 +++ linux-2.4.20-o1/include/linux/wait.h Wed Mar 12 00:41:43 2003
6911 # define wq_write_lock_irq write_lock_irq
6912 # define wq_write_lock_irqsave write_lock_irqsave
6913 # define wq_write_unlock_irqrestore write_unlock_irqrestore
6914 +# define wq_write_unlock_irq write_unlock_irq
6915 # define wq_write_unlock write_unlock
6917 # define wq_lock_t spinlock_t
6919 # define wq_write_lock_irq spin_lock_irq
6920 # define wq_write_lock_irqsave spin_lock_irqsave
6921 # define wq_write_unlock_irqrestore spin_unlock_irqrestore
6922 +# define wq_write_unlock_irq spin_unlock_irq
6923 # define wq_write_unlock spin_unlock
6926 diff -urN linux-2.4.20/init/main.c linux-2.4.20-o1/init/main.c
6927 --- linux-2.4.20/init/main.c Sat Aug 3 02:39:46 2002
6928 +++ linux-2.4.20-o1/init/main.c Wed Mar 12 00:41:43 2003
6930 extern void setup_arch(char **);
6931 extern void cpu_idle(void);
6933 -unsigned long wait_init_idle;
6937 #ifdef CONFIG_X86_LOCAL_APIC
6938 @@ -298,34 +296,24 @@
6939 APIC_init_uniprocessor();
6942 -#define smp_init() do { } while (0)
6943 +#define smp_init() do { } while (0)
6949 /* Called by boot processor to activate the rest. */
6950 static void __init smp_init(void)
6952 /* Get other processors into their bootup holding patterns. */
6954 - wait_init_idle = cpu_online_map;
6955 - clear_bit(current->processor, &wait_init_idle); /* Don't wait on me! */
6957 smp_threads_ready=1;
6960 - /* Wait for the other cpus to set up their idle processes */
6961 - printk("Waiting on wait_init_idle (map = 0x%lx)\n", wait_init_idle);
6962 - while (wait_init_idle) {
6966 - printk("All processors have done init_idle\n");
6973 * We need to finalize in a non-__init function or else race conditions
6974 * between the root thread and the init thread may cause start_kernel to
6977 kernel_thread(init, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
6979 - current->need_resched = 1;
6986 * Activate the first processor.
6987 @@ -424,14 +411,18 @@
6992 printk("POSIX conformance testing by UNIFIX\n");
6995 - * We count on the initial thread going ok
6996 - * Like idlers init is an unlocked kernel thread, which will
6997 - * make syscalls (and thus be locked).
6998 + init_idle(current, smp_processor_id());
7000 + * We count on the initial thread going ok
7001 + * Like idlers init is an unlocked kernel thread, which will
7002 + * make syscalls (and thus be locked).
7006 + /* Do the rest non-__init'ed, we're now alive */
7010 @@ -460,6 +451,10 @@
7012 static void __init do_basic_setup(void)
7014 + /* Start the per-CPU migration threads */
7020 * Tell the world that we're going to be the grim
7021 diff -urN linux-2.4.20/kernel/capability.c linux-2.4.20-o1/kernel/capability.c
7022 --- linux-2.4.20/kernel/capability.c Sat Jun 24 06:06:37 2000
7023 +++ linux-2.4.20-o1/kernel/capability.c Wed Mar 12 00:41:43 2003
7025 #include <linux/mm.h>
7026 #include <asm/uaccess.h>
7028 +unsigned securebits = SECUREBITS_DEFAULT; /* systemwide security settings */
7030 kernel_cap_t cap_bset = CAP_INIT_EFF_SET;
7032 /* Note: never hold tasklist_lock while spinning for this one */
7033 diff -urN linux-2.4.20/kernel/exit.c linux-2.4.20-o1/kernel/exit.c
7034 --- linux-2.4.20/kernel/exit.c Fri Nov 29 00:53:15 2002
7035 +++ linux-2.4.20-o1/kernel/exit.c Wed Mar 12 00:41:43 2003
7038 static void release_task(struct task_struct * p)
7040 - if (p != current) {
7045 - * Wait to make sure the process isn't on the
7046 - * runqueue (active on some other CPU still)
7050 - if (!task_has_cpu(p))
7056 - } while (task_has_cpu(p));
7059 + wait_task_inactive(p);
7061 - atomic_dec(&p->user->processes);
7062 - free_uid(p->user);
7063 - unhash_process(p);
7065 - release_thread(p);
7066 - current->cmin_flt += p->min_flt + p->cmin_flt;
7067 - current->cmaj_flt += p->maj_flt + p->cmaj_flt;
7068 - current->cnswap += p->nswap + p->cnswap;
7070 - * Potentially available timeslices are retrieved
7071 - * here - this way the parent does not get penalized
7072 - * for creating too many processes.
7074 - * (this cannot be used to artificially 'generate'
7075 - * timeslices, because any timeslice recovered here
7076 - * was given away by the parent in the first place.)
7078 - current->counter += p->counter;
7079 - if (current->counter >= MAX_COUNTER)
7080 - current->counter = MAX_COUNTER;
7082 - free_task_struct(p);
7084 - printk("task releasing itself\n");
7086 + atomic_dec(&p->user->processes);
7087 + free_uid(p->user);
7088 + unhash_process(p);
7090 + release_thread(p);
7091 + current->cmin_flt += p->min_flt + p->cmin_flt;
7092 + current->cmaj_flt += p->maj_flt + p->cmaj_flt;
7093 + current->cnswap += p->nswap + p->cnswap;
7096 + free_task_struct(p);
7100 @@ -150,6 +123,79 @@
7105 + * reparent_to_init() - Reparent the calling kernel thread to the init task.
7107 + * If a kernel thread is launched as a result of a system call, or if
7108 + * it ever exits, it should generally reparent itself to init so that
7109 + * it is correctly cleaned up on exit.
7111 + * Task state such as the scheduling policy and priority may have
7112 + * been inherited from a user process, so we reset it to sane values here.
7114 + * NOTE that reparent_to_init() gives the caller full capabilities.
7116 +void reparent_to_init(void)
7118 + write_lock_irq(&tasklist_lock);
7120 + /* Reparent to init */
7121 + REMOVE_LINKS(current);
7122 + current->p_pptr = child_reaper;
7123 + current->p_opptr = child_reaper;
7124 + SET_LINKS(current);
7126 + /* Set the exit signal to SIGCHLD so we signal init on exit */
7127 + current->exit_signal = SIGCHLD;
7129 + current->ptrace = 0;
7130 + if ((current->policy == SCHED_OTHER) && (task_nice(current) < 0))
7131 + set_user_nice(current, 0);
7132 + /* cpus_allowed? */
7133 + /* rt_priority? */
7135 + current->cap_effective = CAP_INIT_EFF_SET;
7136 + current->cap_inheritable = CAP_INIT_INH_SET;
7137 + current->cap_permitted = CAP_FULL_SET;
7138 + current->keep_capabilities = 0;
7139 + memcpy(current->rlim, init_task.rlim, sizeof(*(current->rlim)));
7140 + current->user = INIT_USER;
7142 + write_unlock_irq(&tasklist_lock);
7146 + * Put all the gunge required to become a kernel thread without
7147 + * attached user resources in one place where it belongs.
7150 +void daemonize(void)
7152 + struct fs_struct *fs;
7156 + * If we were started as result of loading a module, close all of the
7157 + * user space pages. We don't need them, and if we didn't close them
7158 + * they would be locked into memory.
7162 + current->session = 1;
7163 + current->pgrp = 1;
7164 + current->tty = NULL;
7166 + /* Become as one with the init task */
7168 + exit_fs(current); /* current->fs->count--; */
7169 + fs = init_task.fs;
7171 + atomic_inc(&fs->count);
7172 + exit_files(current);
7173 + current->files = init_task.files;
7174 + atomic_inc(&current->files->count);
7178 * When we die, we re-parent all our children.
7179 * Try to give them to another thread in our thread
7181 /* Make sure we're not reparenting to ourselves */
7182 p->p_opptr = child_reaper;
7184 + p->first_time_slice = 0;
7185 if (p->pdeath_signal) send_sig(p->pdeath_signal, p, 0);
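(Aside: daemonize() above is the standard 2.4 way for a kernel thread to
shed inherited user-space state. A typical caller looks roughly like the
sketch below; the thread body, wait queue, and name are invented for
illustration and are not part of the patch:)

static DECLARE_WAIT_QUEUE_HEAD(my_wait);        /* hypothetical wait queue */

static int my_kthread(void *unused)
{
        daemonize();                            /* drop mm, files, tty */
        strcpy(current->comm, "mykthread");     /* name shown by ps */

        while (!signal_pending(current)) {
                /* ... do the thread's periodic work ... */
                interruptible_sleep_on_timeout(&my_wait, HZ);
        }
        return 0;
}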
7188 diff -urN linux-2.4.20/kernel/fork.c linux-2.4.20-o1/kernel/fork.c
7189 --- linux-2.4.20/kernel/fork.c Fri Nov 29 00:53:15 2002
7190 +++ linux-2.4.20-o1/kernel/fork.c Wed Mar 12 00:41:43 2003
7193 /* The idle threads do not count.. */
7198 unsigned long total_forks; /* Handle normal Linux uptimes. */
7201 struct task_struct *pidhash[PIDHASH_SZ];
7203 +rwlock_t tasklist_lock __cacheline_aligned = RW_LOCK_UNLOCKED; /* outer */
7205 void add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait)
7207 unsigned long flags;
7209 if (p->pid == 0 && current->pid != 0)
7210 goto bad_fork_cleanup;
7212 - p->run_list.next = NULL;
7213 - p->run_list.prev = NULL;
7216 init_waitqueue_head(&p->wait_chldexit);
7217 p->vfork_done = NULL;
7219 init_completion(&vfork);
7221 spin_lock_init(&p->alloc_lock);
7222 + spin_lock_init(&p->switch_lock);
7225 init_sigpending(&p->pending);
7226 @@ -665,11 +664,11 @@
7230 - p->cpus_runnable = ~0UL;
7231 - p->processor = current->processor;
7233 /* ?? should we just memset this ?? */
7234 for(i = 0; i < smp_num_cpus; i++)
7235 - p->per_cpu_utime[i] = p->per_cpu_stime[i] = 0;
7236 + p->per_cpu_utime[cpu_logical_map(i)] =
7237 + p->per_cpu_stime[cpu_logical_map(i)] = 0;
7238 spin_lock_init(&p->sigmask_lock);
7241 @@ -706,15 +705,27 @@
7242 p->pdeath_signal = 0;
7245 - * "share" dynamic priority between parent and child, thus the
7246 - * total amount of dynamic priorities in the system doesn't change,
7247 - * more scheduling fairness. This is only important in the first
7248 - * timeslice, on the long run the scheduling behaviour is unchanged.
7250 - p->counter = (current->counter + 1) >> 1;
7251 - current->counter >>= 1;
7252 - if (!current->counter)
7253 - current->need_resched = 1;
7254 + * Share the timeslice between parent and child, thus the
7255 + * total amount of pending timeslices in the system doesn't change,
7256 + * resulting in more scheduling fairness.
7259 + if (!current->time_slice)
7261 + p->time_slice = (current->time_slice + 1) >> 1;
7262 + current->time_slice >>= 1;
7263 + p->first_time_slice = 1;
7264 + if (!current->time_slice) {
7266 + * This case is rare, it happens when the parent has only
7267 + * a single jiffy left from its timeslice. Taking the
7268 + * runqueue lock is not a problem.
7270 + current->time_slice = 1;
7271 + scheduler_tick(0,0);
7273 + p->sleep_timestamp = jiffies;
7277 * Ok, add it to the run-queues and make it
7278 @@ -750,11 +761,16 @@
7280 if (p->ptrace & PT_PTRACED)
7281 send_sig(SIGSTOP, p, 1);
7283 - wake_up_process(p); /* do this last */
7284 + wake_up_forked_process(p); /* do this last */
7286 if (clone_flags & CLONE_VFORK)
7287 wait_for_completion(&vfork);
7290 + * Let the child process run first, to avoid most of the
7291 + * COW overhead when the child exec()s afterwards.
7293 + current->need_resched = 1;
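(Aside: the timeslice split a few lines up conserves time: the child gets
the rounded-up half and the parent keeps the rounded-down half, so forking
cannot manufacture extra timeslice. A quick user-space check of the
arithmetic:)

#include <stdio.h>

int main(void)
{
        unsigned int parent = 15;                /* e.g. a nice-0 slice, in ticks */
        unsigned int child  = (parent + 1) >> 1; /* child: rounded-up half */

        parent >>= 1;                            /* parent: rounded-down half */
        printf("child=%u parent=%u total=%u\n",  /* 8 + 7 = 15, conserved */
               child, parent, child + parent);
        return 0;
}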
7297 diff -urN linux-2.4.20/kernel/ksyms.c linux-2.4.20-o1/kernel/ksyms.c
7298 --- linux-2.4.20/kernel/ksyms.c Fri Nov 29 00:53:15 2002
7299 +++ linux-2.4.20-o1/kernel/ksyms.c Wed Mar 12 00:41:43 2003
7301 /* process management */
7302 EXPORT_SYMBOL(complete_and_exit);
7303 EXPORT_SYMBOL(__wake_up);
7304 -EXPORT_SYMBOL(__wake_up_sync);
7305 EXPORT_SYMBOL(wake_up_process);
7306 EXPORT_SYMBOL(sleep_on);
7307 EXPORT_SYMBOL(sleep_on_timeout);
7309 EXPORT_SYMBOL(schedule_timeout);
7310 EXPORT_SYMBOL(yield);
7311 EXPORT_SYMBOL(__cond_resched);
7312 +EXPORT_SYMBOL(set_user_nice);
7313 +EXPORT_SYMBOL(nr_context_switches);
7314 EXPORT_SYMBOL(jiffies);
7315 EXPORT_SYMBOL(xtime);
7316 EXPORT_SYMBOL(do_gettimeofday);
7320 EXPORT_SYMBOL(kstat);
7321 -EXPORT_SYMBOL(nr_running);
7324 EXPORT_SYMBOL(panic);
7325 diff -urN linux-2.4.20/kernel/printk.c linux-2.4.20-o1/kernel/printk.c
7326 --- linux-2.4.20/kernel/printk.c Sat Aug 3 02:39:46 2002
7327 +++ linux-2.4.20-o1/kernel/printk.c Wed Mar 12 00:41:43 2003
7329 #include <linux/module.h>
7330 #include <linux/interrupt.h> /* For in_interrupt() */
7331 #include <linux/config.h>
7332 +#include <linux/delay.h>
7334 #include <asm/uaccess.h>
7336 diff -urN linux-2.4.20/kernel/ptrace.c linux-2.4.20-o1/kernel/ptrace.c
7337 --- linux-2.4.20/kernel/ptrace.c Sat Aug 3 02:39:46 2002
7338 +++ linux-2.4.20-o1/kernel/ptrace.c Wed Mar 12 00:41:43 2003
7340 if (child->state != TASK_STOPPED)
7343 - /* Make sure the child gets off its CPU.. */
7346 - if (!task_has_cpu(child))
7348 - task_unlock(child);
7350 - if (child->state != TASK_STOPPED)
7354 - } while (task_has_cpu(child));
7356 - task_unlock(child);
7357 + wait_task_inactive(child);
7361 diff -urN linux-2.4.20/kernel/sched.c linux-2.4.20-o1/kernel/sched.c
7362 --- linux-2.4.20/kernel/sched.c Fri Nov 29 00:53:15 2002
7363 +++ linux-2.4.20-o1/kernel/sched.c Wed Mar 12 00:41:43 2003
7366 * Kernel scheduler and related syscalls
7368 - * Copyright (C) 1991, 1992 Linus Torvalds
7369 + * Copyright (C) 1991-2002 Linus Torvalds
7371 * 1996-12-23 Modified by Dave Grothe to fix bugs in semaphores and
7372 * make semaphores SMP safe
7373 * 1998-11-19 Implemented schedule_timeout() and related stuff
7374 * by Andrea Arcangeli
7375 - * 1998-12-28 Implemented better SMP scheduling by Ingo Molnar
7376 + * 2002-01-04 New ultra-scalable O(1) scheduler by Ingo Molnar:
7377 + * hybrid priority-list and round-robin design with
7378 + * an array-switch method of distributing timeslices
7379 + * and per-CPU runqueues. Additional code by Davide
7380 + * Libenzi, Robert Love, and Rusty Russell.
7384 - * 'sched.c' is the main kernel file. It contains scheduling primitives
7385 - * (sleep_on, wakeup, schedule etc) as well as a number of simple system
7386 - * call functions (type getpid()), which just extract a field from
7390 -#include <linux/config.h>
7391 #include <linux/mm.h>
7392 -#include <linux/init.h>
7393 -#include <linux/smp_lock.h>
7394 #include <linux/nmi.h>
7395 #include <linux/interrupt.h>
7396 -#include <linux/kernel_stat.h>
7397 -#include <linux/completion.h>
7398 -#include <linux/prefetch.h>
7399 -#include <linux/compiler.h>
7401 +#include <linux/init.h>
7402 #include <asm/uaccess.h>
7403 +#include <linux/smp_lock.h>
7404 #include <asm/mmu_context.h>
7406 -extern void timer_bh(void);
7407 -extern void tqueue_bh(void);
7408 -extern void immediate_bh(void);
7409 +#include <linux/kernel_stat.h>
7410 +#include <linux/completion.h>
7413 - * scheduler variables
7414 + * Convert user-nice values [ -20 ... 0 ... 19 ]
7415 + * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
7418 +#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)
7419 +#define PRIO_TO_NICE(prio) ((prio) - MAX_RT_PRIO - 20)
7420 +#define TASK_NICE(p) PRIO_TO_NICE((p)->static_prio)
7422 -unsigned securebits = SECUREBITS_DEFAULT; /* systemwide security settings */
7424 -extern void mem_use(void);
7426 + * 'User priority' is the nice value converted to something we
7427 + * can work with better when scaling various scheduler parameters;
7428 + * it's a [ 0 ... 39 ] range.
7430 +#define USER_PRIO(p) ((p)-MAX_RT_PRIO)
7431 +#define TASK_USER_PRIO(p) USER_PRIO((p)->static_prio)
7432 +#define MAX_USER_PRIO (USER_PRIO(MAX_PRIO))
7435 - * Scheduling quanta.
7436 + * These are the 'tuning knobs' of the scheduler:
7438 - * NOTE! The unix "nice" value influences how long a process
7439 - * gets. The nice value ranges from -20 to +19, where a -20
7440 - * is a "high-priority" task, and a "+10" is a low-priority
7443 - * We want the time-slice to be around 50ms or so, so this
7444 - * calculation depends on the value of HZ.
7445 + * Minimum timeslice is 10 msecs, default timeslice is 150 msecs,
7446 + * maximum timeslice is 300 msecs. Timeslices get refilled after
7450 -#define TICK_SCALE(x) ((x) >> 2)
7452 -#define TICK_SCALE(x) ((x) >> 1)
7454 -#define TICK_SCALE(x) (x)
7456 -#define TICK_SCALE(x) ((x) << 1)
7458 -#define TICK_SCALE(x) ((x) << 2)
7461 -#define NICE_TO_TICKS(nice) (TICK_SCALE(20-(nice))+1)
7463 +#define MIN_TIMESLICE ( 10 * HZ / 1000)
7464 +#define MAX_TIMESLICE (300 * HZ / 1000)
7465 +#define CHILD_PENALTY 50
7466 +#define PARENT_PENALTY 100
7467 +#define PRIO_BONUS_RATIO 25
7468 +#define INTERACTIVE_DELTA 2
7469 +#define MAX_SLEEP_AVG (2*HZ)
7470 +#define STARVATION_LIMIT (2*HZ)
7473 - * Init task must be ok at boot for the ix86 as we will check its signals
7474 - * via the SMP irq return path.
7475 + * If a task is 'interactive' then we reinsert it in the active
7476 + * array after it has expired its current timeslice. (it will not
7477 + * continue to run immediately; it will still round-robin with
7478 + * other interactive tasks.)
7480 + * This part scales the interactivity limit depending on niceness.
7482 + * We scale it linearly, offset by the INTERACTIVE_DELTA delta.
7483 + * Here are a few examples of different nice levels:
7485 + * TASK_INTERACTIVE(-20): [1,1,1,1,1,1,1,1,1,0,0]
7486 + * TASK_INTERACTIVE(-10): [1,1,1,1,1,1,1,0,0,0,0]
7487 + * TASK_INTERACTIVE( 0): [1,1,1,1,0,0,0,0,0,0,0]
7488 + * TASK_INTERACTIVE( 10): [1,1,0,0,0,0,0,0,0,0,0]
7489 + * TASK_INTERACTIVE( 19): [0,0,0,0,0,0,0,0,0,0,0]
7491 + * (the X axis represents the possible -5 ... 0 ... +5 dynamic
7492 + * priority range a task can explore, a value of '1' means the
7493 + * task is rated interactive.)
7495 + * I.e. nice +19 tasks can never get 'interactive' enough to be
7496 + * reinserted into the active array, and only heavily CPU-bound nice -20
7497 + * tasks will be expired. Default nice 0 tasks are somewhere in between:
7498 + * it takes some effort for them to get interactive, but it's not
7502 -struct task_struct * init_tasks[NR_CPUS] = {&init_task, };
7504 +#define SCALE(v1,v1_max,v2_max) \
7505 + (v1) * (v2_max) / (v1_max)
7508 + (SCALE(TASK_NICE(p), 40, MAX_USER_PRIO*PRIO_BONUS_RATIO/100) + \
7509 + INTERACTIVE_DELTA)
7511 +#define TASK_INTERACTIVE(p) \
7512 + ((p)->prio <= (p)->static_prio - DELTA(p))
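(Aside: plugging the default constants into DELTA() reproduces the table in
the comment above. A user-space sketch, assuming the default values; a task
sits at dynamic offset d = prio - static_prio in -5 ... +5, so the
TASK_INTERACTIVE test "prio <= static_prio - DELTA" becomes "d <= -DELTA":)

#include <stdio.h>

/* Default constants from the hunks above. */
#define MAX_RT_PRIO        100
#define MAX_PRIO           (MAX_RT_PRIO + 40)
#define MAX_USER_PRIO      (MAX_PRIO - MAX_RT_PRIO)     /* 40 */
#define PRIO_BONUS_RATIO   25
#define INTERACTIVE_DELTA  2

#define SCALE(v1,v1_max,v2_max) ((v1) * (v2_max) / (v1_max))
#define DELTA(nice) \
        (SCALE((nice), 40, MAX_USER_PRIO*PRIO_BONUS_RATIO/100) + INTERACTIVE_DELTA)

int main(void)
{
        int nices[] = { -20, 0, 19 }, i, d;

        for (i = 0; i < 3; i++) {
                printf("TASK_INTERACTIVE(%3d): [", nices[i]);
                for (d = -5; d <= 5; d++)       /* 1 = rated interactive */
                        printf("%d%s", d <= -DELTA(nices[i]), d < 5 ? "," : "");
                printf("]\n");
        }
        return 0;
}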
7515 - * The tasklist_lock protects the linked list of processes.
7517 - * The runqueue_lock locks the parts that actually access
7518 - * and change the run-queues, and have to be interrupt-safe.
7520 - * If both locks are to be concurrently held, the runqueue_lock
7521 - * nests inside the tasklist_lock.
7522 + * TASK_TIMESLICE scales user-nice values [ -20 ... 19 ]
7523 + * to time slice values.
7525 - * task->alloc_lock nests inside tasklist_lock.
7526 + * The higher a process's priority, the bigger the timeslice
7527 + * it gets during one round of execution. But even the lowest
7528 + * priority process gets MIN_TIMESLICE worth of execution time.
7530 -spinlock_t runqueue_lock __cacheline_aligned = SPIN_LOCK_UNLOCKED; /* inner */
7531 -rwlock_t tasklist_lock __cacheline_aligned = RW_LOCK_UNLOCKED; /* outer */
7533 -static LIST_HEAD(runqueue_head);
7534 +#define TASK_TIMESLICE(p) (MIN_TIMESLICE + \
7535 + ((MAX_TIMESLICE - MIN_TIMESLICE) * (MAX_PRIO-1-(p)->static_prio)/39))
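(Aside: with HZ=100 -- an assumption, HZ is arch-dependent -- TASK_TIMESLICE
yields the 10/150/300 msec figures quoted in the comment further up. A
user-space sketch, using the NICE_TO_PRIO mapping from this diff:)

#include <stdio.h>

#define HZ            100                       /* assumed */
#define MAX_RT_PRIO   100
#define MAX_PRIO      (MAX_RT_PRIO + 40)
#define MIN_TIMESLICE ( 10 * HZ / 1000)         /* 1 tick   =  10 msecs */
#define MAX_TIMESLICE (300 * HZ / 1000)         /* 30 ticks = 300 msecs */
#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)

#define TASK_TIMESLICE(sp) (MIN_TIMESLICE + \
        ((MAX_TIMESLICE - MIN_TIMESLICE) * (MAX_PRIO-1-(sp))/39))

int main(void)
{
        int nices[] = { -20, 0, 19 }, i;

        for (i = 0; i < 3; i++) {
                int sp = NICE_TO_PRIO(nices[i]);
                printf("nice %3d -> %2d ticks (%d msecs)\n", nices[i],
                       TASK_TIMESLICE(sp), TASK_TIMESLICE(sp) * 1000 / HZ);
        }
        return 0;       /* prints 300, 150 and 10 msecs respectively */
}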
7538 - * We align per-CPU scheduling data on cacheline boundaries,
7539 - * to prevent cacheline ping-pong.
7540 + * These are the runqueue data structures:
7543 - struct schedule_data {
7544 - struct task_struct * curr;
7545 - cycles_t last_schedule;
7547 - char __pad [SMP_CACHE_BYTES];
7548 -} aligned_data [NR_CPUS] __cacheline_aligned = { {{&init_task,0}}};
7550 -#define cpu_curr(cpu) aligned_data[(cpu)].schedule_data.curr
7551 -#define last_schedule(cpu) aligned_data[(cpu)].schedule_data.last_schedule
7552 +#define BITMAP_SIZE ((((MAX_PRIO+1+7)/8)+sizeof(long)-1)/sizeof(long))
7554 -struct kernel_stat kstat;
7555 -extern struct task_struct *child_reaper;
7556 +typedef struct runqueue runqueue_t;
7559 +struct prio_array {
7561 + unsigned long bitmap[BITMAP_SIZE];
7562 + list_t queue[MAX_PRIO];
7565 -#define idle_task(cpu) (init_tasks[cpu_number_map(cpu)])
7566 -#define can_schedule(p,cpu) \
7567 - ((p)->cpus_runnable & (p)->cpus_allowed & (1UL << cpu))
7569 + * This is the main, per-CPU runqueue data structure.
7571 + * Locking rule: code that needs to lock multiple runqueues
7572 + * (such as the load balancing or the process migration code)
7573 + * must acquire the locks in ascending &runqueue order.
7577 + unsigned long nr_running, nr_switches, expired_timestamp;
7578 + task_t *curr, *idle;
7579 + prio_array_t *active, *expired, arrays[2];
7580 + long nr_uninterruptible;
7583 + int prev_nr_running[NR_CPUS];
7584 + task_t *migration_thread;
7585 + list_t migration_queue;
7587 +} ____cacheline_aligned;
7590 +static struct runqueue runqueues[NR_CPUS] __cacheline_aligned;
7592 -#define idle_task(cpu) (&init_task)
7593 -#define can_schedule(p,cpu) (1)
7594 +#define cpu_rq(cpu) (runqueues + (cpu))
7595 +#define this_rq() cpu_rq(smp_processor_id())
7596 +#define task_rq(p) cpu_rq((p)->cpu)
7597 +#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
7598 +#define rt_task(p) ((p)->prio < MAX_RT_PRIO)
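(Aside: the payoff of the prio_array layout is that picking the next task
costs O(1): find the first set bit of the bitmap, then take the head of
that priority's queue. A simplified, runnable user-space sketch -- the
kernel keeps list heads rather than the toy counters used here, and uses
an optimized first-set-bit primitive:)

#include <stdio.h>

#define MAX_PRIO    140
#define BITS        (sizeof(unsigned long) * 8)
#define BITMAP_LEN  ((MAX_PRIO + BITS - 1) / BITS)

/* Toy prio_array: one ready-count per priority instead of a list head. */
struct toy_prio_array {
        unsigned long bitmap[BITMAP_LEN];
        int queue_len[MAX_PRIO];
};

static void toy_enqueue(struct toy_prio_array *a, int prio)
{
        a->queue_len[prio]++;
        a->bitmap[prio / BITS] |= 1UL << (prio % BITS);
}

/* O(1): scan a fixed, tiny number of words, then one find-first-set. */
static int toy_pick_next(struct toy_prio_array *a)
{
        unsigned int w;

        for (w = 0; w < BITMAP_LEN; w++)
                if (a->bitmap[w])
                        return w * BITS + __builtin_ctzl(a->bitmap[w]);
        return -1;      /* empty: the idle task would run */
}

int main(void)
{
        struct toy_prio_array a = { { 0 }, { 0 } };

        toy_enqueue(&a, 120);   /* a nice-0 task */
        toy_enqueue(&a, 50);    /* an RT task    */
        printf("next prio: %d\n", toy_pick_next(&a));   /* 50: lower wins */
        return 0;
}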
7601 + * Default context-switch locking:
7603 +#ifndef prepare_arch_switch
7604 +# define prepare_arch_switch(rq, next) do { } while(0)
7605 +# define finish_arch_switch(rq, prev) spin_unlock_irq(&(rq)->lock)
7608 -void scheduling_functions_start_here(void) { }
7611 - * This is the function that decides how desirable a process is..
7612 - * You can weigh different processes against each other depending
7613 - * on what CPU they've run on lately etc to try to handle cache
7614 - * and TLB miss penalties.
7617 - * -1000: never select this
7618 - * 0: out of time, recalculate counters (but it might still be
7620 - * +ve: "goodness" value (the larger, the better)
7621 - * +1000: realtime process, select this.
7622 + * task_rq_lock - lock the runqueue a given task resides on and disable
7623 + * interrupts. Note the ordering: we can safely lookup the task_rq without
7624 + * explicitly disabling preemption.
7627 -static inline int goodness(struct task_struct * p, int this_cpu, struct mm_struct *this_mm)
7628 +static inline runqueue_t *task_rq_lock(task_t *p, unsigned long *flags)
7633 - * select the current process after every other
7634 - * runnable process, but before the idle thread.
7635 - * Also, dont trigger a counter recalculation.
7638 - if (p->policy & SCHED_YIELD)
7642 - * Non-RT process - normal case first.
7644 - if (p->policy == SCHED_OTHER) {
7646 - * Give the process a first-approximation goodness value
7647 - * according to the number of clock-ticks it has left.
7649 - * Don't do any other calculations if the time slice is
7652 - weight = p->counter;
7657 - /* Give a largish advantage to the same processor... */
7658 - /* (this is equivalent to penalizing other processors) */
7659 - if (p->processor == this_cpu)
7660 - weight += PROC_CHANGE_PENALTY;
7662 + struct runqueue *rq;
7664 - /* .. and a slight advantage to the current MM */
7665 - if (p->mm == this_mm || !p->mm)
7667 - weight += 20 - p->nice;
7671 + spin_lock_irqsave(&rq->lock, *flags);
7672 + if (unlikely(rq != task_rq(p))) {
7673 + spin_unlock_irqrestore(&rq->lock, *flags);
7674 + goto repeat_lock_task;
7680 - * Realtime process, select the first one on the
7681 - * runqueue (taking priorities within processes
7684 - weight = 1000 + p->rt_priority;
7687 +static inline void task_rq_unlock(runqueue_t *rq, unsigned long *flags)
7689 + spin_unlock_irqrestore(&rq->lock, *flags);
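(Aside: the retry loop above exists because a task can migrate between the
task_rq() lookup and taking the lock, so the runqueue is revalidated after
locking. A typical caller pairs the two helpers as in this sketch;
frob_task() is invented for illustration and is not in the patch:)

static void frob_task(task_t *p)
{
        unsigned long flags;
        runqueue_t *rq;

        rq = task_rq_lock(p, &flags);   /* p cannot change CPU while held */
        /* ... examine or requeue p under rq->lock ... */
        task_rq_unlock(rq, &flags);
}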
7693 - * the 'goodness value' of replacing a process on a given CPU.
7694 - * positive value means 'replace', zero or negative means 'dont'.
7695 + * Adding/removing a task to/from a priority array:
7697 -static inline int preemption_goodness(struct task_struct * prev, struct task_struct * p, int cpu)
7698 +static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
7700 - return goodness(p, cpu, prev->active_mm) - goodness(prev, cpu, prev->active_mm);
7701 + array->nr_active--;
7702 + list_del(&p->run_list);
7703 + if (list_empty(array->queue + p->prio))
7704 + __clear_bit(p->prio, array->bitmap);
7708 - * This is ugly, but reschedule_idle() is very timing-critical.
7709 - * We are called with the runqueue spinlock held and we must
7710 - * not claim the tasklist_lock.
7712 -static FASTCALL(void reschedule_idle(struct task_struct * p));
7713 +#define enqueue_task(p, array) __enqueue_task(p, array, NULL)
7714 +static inline void __enqueue_task(struct task_struct *p, prio_array_t *array, task_t * parent)
7717 + list_add_tail(&p->run_list, array->queue + p->prio);
7718 + __set_bit(p->prio, array->bitmap);
7721 + list_add_tail(&p->run_list, &parent->run_list);
7722 + array = p->array = parent->array;
7724 + array->nr_active++;
7727 -static void reschedule_idle(struct task_struct * p)
7728 +static inline int effective_prio(task_t *p)
7731 - int this_cpu = smp_processor_id();
7732 - struct task_struct *tsk, *target_tsk;
7733 - int cpu, best_cpu, i, max_prio;
7734 - cycles_t oldest_idle;
7738 - * shortcut if the woken up task's last CPU is
7740 + * Here we scale the actual sleep average [0 .... MAX_SLEEP_AVG]
7741 + * into the -5 ... 0 ... +5 bonus/penalty range.
7743 + * We use 25% of the full 0...39 priority range so that:
7745 + * 1) nice +19 interactive tasks do not preempt nice 0 CPU hogs.
7746 + * 2) nice -20 CPU hogs do not get preempted by nice 0 tasks.
7748 + * Both properties are important to certain workloads.
7750 - best_cpu = p->processor;
7751 - if (can_schedule(p, best_cpu)) {
7752 - tsk = idle_task(best_cpu);
7753 - if (cpu_curr(best_cpu) == tsk) {
7757 - * If need_resched == -1 then we can skip sending
7758 - * the IPI altogether, tsk->need_resched is
7759 - * actively watched by the idle thread.
7761 - need_resched = tsk->need_resched;
7762 - tsk->need_resched = 1;
7763 - if ((best_cpu != this_cpu) && !need_resched)
7764 - smp_send_reschedule(best_cpu);
7768 + bonus = MAX_USER_PRIO*PRIO_BONUS_RATIO*p->sleep_avg/MAX_SLEEP_AVG/100 -
7769 + MAX_USER_PRIO*PRIO_BONUS_RATIO/100/2;
7772 - * We know that the preferred CPU has a cache-affine current
7773 - * process, lets try to find a new idle CPU for the woken-up
7774 - * process. Select the least recently active idle CPU. (that
7775 - * one will have the least active cache context.) Also find
7776 - * the executing process which has the least priority.
7778 - oldest_idle = (cycles_t) -1;
7779 - target_tsk = NULL;
7781 + prio = p->static_prio - bonus;
7782 + if (prio < MAX_RT_PRIO)
7783 + prio = MAX_RT_PRIO;
7784 + if (prio > MAX_PRIO-1)
7785 + prio = MAX_PRIO-1;
7789 - for (i = 0; i < smp_num_cpus; i++) {
7790 - cpu = cpu_logical_map(i);
7791 - if (!can_schedule(p, cpu))
7793 - tsk = cpu_curr(cpu);
7794 +#define activate_task(p, rq) __activate_task(p, rq, NULL)
7795 +static inline void __activate_task(task_t *p, runqueue_t *rq, task_t * parent)
7797 + unsigned long sleep_time = jiffies - p->sleep_timestamp;
7798 + prio_array_t *array = rq->active;
7800 + if (!parent && !rt_task(p) && sleep_time) {
7802 - * We use the first available idle CPU. This creates
7803 - * a priority list between idle CPUs, but this is not
7805 + * This code gives a bonus to interactive tasks. We update
7806 + * an 'average sleep time' value here, based on
7807 + * sleep_timestamp. The more time a task spends sleeping,
7808 + * the higher the average gets - and the higher the priority
7809 + * boost gets as well.
7811 - if (tsk == idle_task(cpu)) {
7812 -#if defined(__i386__) && defined(CONFIG_SMP)
7814 - * Check if two siblings are idle in the same
7815 - * physical package. Use them if found.
7817 - if (smp_num_siblings == 2) {
7818 - if (cpu_curr(cpu_sibling_map[cpu]) ==
7819 - idle_task(cpu_sibling_map[cpu])) {
7820 - oldest_idle = last_schedule(cpu);
7827 - if (last_schedule(cpu) < oldest_idle) {
7828 - oldest_idle = last_schedule(cpu);
7832 - if (oldest_idle == (cycles_t)-1) {
7833 - int prio = preemption_goodness(tsk, p, cpu);
7835 - if (prio > max_prio) {
7844 - if (oldest_idle != (cycles_t)-1) {
7845 - best_cpu = tsk->processor;
7846 - goto send_now_idle;
7848 - tsk->need_resched = 1;
7849 - if (tsk->processor != this_cpu)
7850 - smp_send_reschedule(tsk->processor);
7851 + p->sleep_timestamp = jiffies;
7852 + p->sleep_avg += sleep_time;
7853 + if (p->sleep_avg > MAX_SLEEP_AVG)
7854 + p->sleep_avg = MAX_SLEEP_AVG;
7855 + p->prio = effective_prio(p);
7859 + __enqueue_task(p, array, parent);
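(Aside: with the default constants, the bonus term in effective_prio()
collapses to 10*sleep_avg/MAX_SLEEP_AVG - 5, so a task that never sleeps
gets -5 and a task that always sleeps gets +5. A user-space sketch of the
mapping and clamping, default values assumed:)

#include <stdio.h>

#define HZ               100    /* assumed */
#define MAX_RT_PRIO      100
#define MAX_PRIO         (MAX_RT_PRIO + 40)
#define MAX_USER_PRIO    (MAX_PRIO - MAX_RT_PRIO)
#define PRIO_BONUS_RATIO 25
#define MAX_SLEEP_AVG    (2*HZ)

static int toy_effective_prio(int static_prio, int sleep_avg)
{
        int bonus, prio;

        bonus = MAX_USER_PRIO*PRIO_BONUS_RATIO*sleep_avg/MAX_SLEEP_AVG/100 -
                MAX_USER_PRIO*PRIO_BONUS_RATIO/100/2;

        prio = static_prio - bonus;
        if (prio < MAX_RT_PRIO)
                prio = MAX_RT_PRIO;
        if (prio > MAX_PRIO-1)
                prio = MAX_PRIO-1;
        return prio;
}

int main(void)
{
        /* nice-0 task (static_prio 120): CPU hog vs. heavy sleeper. */
        printf("hog:     %d\n", toy_effective_prio(120, 0));             /* 125 */
        printf("sleeper: %d\n", toy_effective_prio(120, MAX_SLEEP_AVG)); /* 115 */
        return 0;
}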
7863 +static inline void deactivate_task(struct task_struct *p, runqueue_t *rq)
7866 + if (p->state == TASK_UNINTERRUPTIBLE)
7867 + rq->nr_uninterruptible++;
7868 + dequeue_task(p, p->array);
7873 - int this_cpu = smp_processor_id();
7874 - struct task_struct *tsk;
7875 +static inline void resched_task(task_t *p)
7880 - tsk = cpu_curr(this_cpu);
7881 - if (preemption_goodness(tsk, p, this_cpu) > 0)
7882 - tsk->need_resched = 1;
7883 + need_resched = p->need_resched;
7884 + set_tsk_need_resched(p);
7885 + if (!need_resched && (p->cpu != smp_processor_id()))
7886 + smp_send_reschedule(p->cpu);
7888 + set_tsk_need_resched(p);
7897 - * This has to add the process to the _end_ of the
7898 - * run-queue, not the beginning. The goodness value will
7899 - * determine whether this process will run next. This is
7900 - * important to get SCHED_FIFO and SCHED_RR right, where
7901 - * a process that is either pre-empted or its time slice
7902 - * has expired, should be moved to the tail of the run
7903 - * queue for its priority - Bhavesh Davda
7904 + * Wait for a process to unschedule. This is used by the exit() and
7907 -static inline void add_to_runqueue(struct task_struct * p)
7908 +void wait_task_inactive(task_t * p)
7910 - list_add_tail(&p->run_list, &runqueue_head);
7912 + unsigned long flags;
7917 + if (unlikely(rq->curr == p)) {
7922 + rq = task_rq_lock(p, &flags);
7923 + if (unlikely(rq->curr == p)) {
7924 + task_rq_unlock(rq, &flags);
7927 + task_rq_unlock(rq, &flags);
7930 -static inline void move_last_runqueue(struct task_struct * p)
7932 + * Kick the remote CPU if the task is currently running;
7933 + * this code is used by the signal code to signal tasks
7934 + * which are in user-mode as quickly as possible.
7936 + * (Note that we do this lockless - if the task does anything
7937 + * while the message is in flight then it will notice the
7938 + * sigpending condition anyway.)
7940 +void kick_if_running(task_t * p)
7942 - list_del(&p->run_list);
7943 - list_add_tail(&p->run_list, &runqueue_head);
7944 + if (p == task_rq(p)->curr && p->cpu != smp_processor_id())
7950 +static int FASTCALL(reschedule_idle(task_t * p));
7951 +static void FASTCALL(load_balance(runqueue_t *this_rq, int idle));
7956 * Wake up a process. Put it on the run-queue if it's not
7957 @@ -345,429 +338,721 @@
7958 * progress), and as such you're allowed to do the simpler
7959 * "current->state = TASK_RUNNING" to mark yourself runnable
7960 * without the overhead of this.
7962 + * returns failure only if the task is already active.
7964 -static inline int try_to_wake_up(struct task_struct * p, int synchronous)
7965 +static int try_to_wake_up(task_t * p, int sync)
7967 unsigned long flags;
7972 + int migrated_to_idle = 0;
7978 + rq = task_rq_lock(p, &flags);
7979 + old_state = p->state;
7982 + if (likely(rq->curr != p)) {
7984 + if (unlikely(sync)) {
7985 + if (p->cpu != smp_processor_id() &&
7986 + p->cpus_allowed & (1UL << smp_processor_id())) {
7987 + p->cpu = smp_processor_id();
7988 + goto migrated_task;
7991 + if (reschedule_idle(p))
7992 + goto migrated_task;
7996 + if (old_state == TASK_UNINTERRUPTIBLE)
7997 + rq->nr_uninterruptible--;
7998 + activate_task(p, rq);
7999 + if (p->prio < rq->curr->prio)
8000 + resched_task(rq->curr);
8003 + p->state = TASK_RUNNING;
8007 - * We want the common case fall through straight, thus the goto.
8008 + * Subtle: we can load_balance only here (before unlock)
8009 + * because it can internally drop the lock. Claim
8010 + * that the cpu is running so it will be a light rebalance;
8011 + * if this cpu goes idle soon, schedule() will trigger the
8012 + * idle rebalancing by itself.
8014 - spin_lock_irqsave(&runqueue_lock, flags);
8015 - p->state = TASK_RUNNING;
8016 - if (task_on_runqueue(p))
8018 - add_to_runqueue(p);
8019 - if (!synchronous || !(p->cpus_allowed & (1UL << smp_processor_id())))
8020 - reschedule_idle(p);
8023 - spin_unlock_irqrestore(&runqueue_lock, flags);
8024 + if (success && migrated_to_idle)
8025 + load_balance(rq, 0);
8028 + task_rq_unlock(rq, &flags);
8034 + task_rq_unlock(rq, &flags);
8035 + migrated_to_idle = 1;
8036 + goto repeat_lock_task;
8040 -inline int wake_up_process(struct task_struct * p)
8041 +int wake_up_process(task_t * p)
8043 return try_to_wake_up(p, 0);
8046 -static void process_timeout(unsigned long __data)
8047 +void wake_up_forked_process(task_t * p)
8049 - struct task_struct * p = (struct task_struct *) __data;
8051 + task_t * parent = current;
8053 - wake_up_process(p);
8056 + spin_lock_irq(&rq->lock);
8059 - * schedule_timeout - sleep until timeout
8060 - * @timeout: timeout value in jiffies
8062 - * Make the current task sleep until @timeout jiffies have
8063 - * elapsed. The routine will return immediately unless
8064 - * the current task state has been set (see set_current_state()).
8066 - * You can set the task state as follows -
8068 - * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
8069 - * pass before the routine returns. The routine will return 0
8071 - * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
8072 - * delivered to the current task. In this case the remaining time
8073 - * in jiffies will be returned, or 0 if the timer expired in time
8075 - * The current task state is guaranteed to be TASK_RUNNING when this
8076 - * routine returns.
8078 - * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
8079 - * the CPU away without a bound on the timeout. In this case the return
8080 - * value will be %MAX_SCHEDULE_TIMEOUT.
8082 - * In all cases the return value is guaranteed to be non-negative.
8084 -signed long schedule_timeout(signed long timeout)
8086 - struct timer_list timer;
8087 - unsigned long expire;
8088 + p->state = TASK_RUNNING;
8089 + if (likely(!rt_task(p) && parent->array)) {
8091 + * We decrease the sleep average of forked
8092 + * children, to keep max-interactive tasks
8093 + * from forking tasks that are max-interactive.
8094 + * CHILD_PENALTY is set to 50% since we have
8095 + * no clue if this is still an interactive
8096 + * task like the parent or if this will be a
8097 + * cpu bound task. The parent isn't touched
8098 + * as we don't make assumptions about the parent
8099 + * changing behaviour after the child is forked.
8101 + parent->sleep_avg = parent->sleep_avg * PARENT_PENALTY / 100;
8102 + p->sleep_avg = p->sleep_avg * CHILD_PENALTY / 100;
8106 - case MAX_SCHEDULE_TIMEOUT:
8108 - * These two special cases are useful to be comfortable
8109 - * in the caller. Nothing more. We could take
8110 - * MAX_SCHEDULE_TIMEOUT from one of the negative value
8111 - * but I' d like to return a valid offset (>=0) to allow
8112 - * the caller to do everything it want with the retval.
8113 + * For its first schedule keep the child at the same
8114 + * priority (i.e. in the same list) as the parent;
8115 + * activate_forked_task() will take care to put the
8116 + * child in front of the parent (lifo) to guarantee a
8117 + * schedule-child-first behaviour after fork.
8122 + p->prio = parent->prio;
8125 - * Another bit of PARANOID. Note that the retval will be
8126 - * 0 since no piece of kernel is supposed to do a check
8127 - * for a negative retval of schedule_timeout() (since it
8128 - * should never happens anyway). You just have the printk()
8129 - * that will tell you if something is gone wrong and where.
8130 + * Take the usual wakeup path if it's RT or if
8131 + * it's a child of the first idle task (during boot
8136 - printk(KERN_ERR "schedule_timeout: wrong timeout "
8137 - "value %lx from %p\n", timeout,
8138 - __builtin_return_address(0));
8139 - current->state = TASK_RUNNING;
8142 + p->prio = effective_prio(p);
8146 - expire = timeout + jiffies;
8147 + p->cpu = smp_processor_id();
8148 + __activate_task(p, rq, parent);
8149 + spin_unlock_irq(&rq->lock);
8152 - init_timer(&timer);
8153 - timer.expires = expire;
8154 - timer.data = (unsigned long) current;
8155 - timer.function = process_timeout;
8157 + * Potentially available exiting-child timeslices are
8158 + * retrieved here - this way the parent does not get
8159 + * penalized for creating too many processes.
8161 + * (this cannot be used to 'generate' timeslices
8162 + * artificially, because any timeslice recovered here
8163 + * was given away by the parent in the first place.)
8165 +void sched_exit(task_t * p)
8168 + if (p->first_time_slice) {
8169 + current->time_slice += p->time_slice;
8170 + if (unlikely(current->time_slice > MAX_TIMESLICE))
8171 + current->time_slice = MAX_TIMESLICE;
8176 - add_timer(&timer);
8178 - del_timer_sync(&timer);
8180 +asmlinkage void schedule_tail(task_t *prev)
8182 + finish_arch_switch(this_rq(), prev);
8186 +static inline task_t * context_switch(task_t *prev, task_t *next)
8188 + struct mm_struct *mm = next->mm;
8189 + struct mm_struct *oldmm = prev->active_mm;
8191 + if (unlikely(!mm)) {
8192 + next->active_mm = oldmm;
8193 + atomic_inc(&oldmm->mm_count);
8194 + enter_lazy_tlb(oldmm, next, smp_processor_id());
8196 + switch_mm(oldmm, mm, next, smp_processor_id());
8198 + if (unlikely(!prev->mm)) {
8199 + prev->active_mm = NULL;
8203 - timeout = expire - jiffies;
8204 + /* Here we just switch the register state and the stack. */
8205 + switch_to(prev, next, prev);
8208 - return timeout < 0 ? 0 : timeout;
8213 - * schedule_tail() is getting called from the fork return path. This
8214 - * cleans up all remaining scheduler things, without impacting the
8217 -static inline void __schedule_tail(struct task_struct *prev)
8218 +unsigned long nr_running(void)
8222 + unsigned long i, sum = 0;
8225 - * prev->policy can be written from here only before `prev'
8226 - * can be scheduled (before setting prev->cpus_runnable to ~0UL).
8227 - * Of course it must also be read before allowing prev
8228 - * to be rescheduled, but since the write depends on the read
8229 - * to complete, wmb() is enough. (the spin_lock() acquired
8230 - * before setting cpus_runnable is not enough because the spin_lock()
8231 - * common code semantics allows code outside the critical section
8232 - * to enter inside the critical section)
8234 - policy = prev->policy;
8235 - prev->policy = policy & ~SCHED_YIELD;
8237 + for (i = 0; i < smp_num_cpus; i++)
8238 + sum += cpu_rq(cpu_logical_map(i))->nr_running;
8241 - * fast path falls through. We have to clear cpus_runnable before
8242 - * checking prev->state to avoid a wakeup race. Protect against
8243 - * the task exiting early.
8246 - task_release_cpu(prev);
8248 - if (prev->state == TASK_RUNNING)
8249 - goto needs_resched;
8254 - task_unlock(prev); /* Synchronise here with release_task() if prev is TASK_ZOMBIE */
8256 +/* Note: the per-cpu information is useful only to get the cumulative result */
8257 +unsigned long nr_uninterruptible(void)
8259 + unsigned long i, sum = 0;
8262 - * Slow path - we 'push' the previous process and
8263 - * reschedule_idle() will attempt to find a new
8264 - * processor for it. (but it might preempt the
8265 - * current process as well.) We must take the runqueue
8266 - * lock and re-check prev->state to be correct. It might
8267 - * still happen that this process has a preemption
8268 - * 'in progress' already - but this is not a problem and
8269 - * might happen in other circumstances as well.
8273 - unsigned long flags;
8274 + for (i = 0; i < smp_num_cpus; i++)
8275 + sum += cpu_rq(cpu_logical_map(i))->nr_uninterruptible;
8278 - * Avoid taking the runqueue lock in cases where
8279 - * no preemption-check is necessery:
8281 - if ((prev == idle_task(smp_processor_id())) ||
8282 - (policy & SCHED_YIELD))
8287 - spin_lock_irqsave(&runqueue_lock, flags);
8288 - if ((prev->state == TASK_RUNNING) && !task_has_cpu(prev))
8289 - reschedule_idle(prev);
8290 - spin_unlock_irqrestore(&runqueue_lock, flags);
8294 - prev->policy &= ~SCHED_YIELD;
8295 -#endif /* CONFIG_SMP */
8296 +unsigned long nr_context_switches(void)
8298 + unsigned long i, sum = 0;
8300 + for (i = 0; i < smp_num_cpus; i++)
8301 + sum += cpu_rq(cpu_logical_map(i))->nr_switches;
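The statistics functions above all follow the same pattern: each runqueue keeps its own counter and the global figure exists only as a sum over CPUs, as the note before nr_uninterruptible() says. A minimal user-space sketch of the pattern, purely for illustration (the CPU count and names are made up, not the kernel's):

        #include <stdio.h>

        #define NCPUS 4                         /* illustrative */

        /* Per-CPU counters, summed on demand: each writer touches only
         * its own slot; readers tolerate a slightly stale total. */
        static unsigned long nr_running_cpu[NCPUS];

        static unsigned long total_nr_running(void)
        {
                unsigned long i, sum = 0;

                for (i = 0; i < NCPUS; i++)
                        sum += nr_running_cpu[i];
                return sum;
        }

        int main(void)
        {
                nr_running_cpu[0] = 2;
                nr_running_cpu[3] = 1;
                printf("%lu\n", total_nr_running());    /* prints 3 */
                return 0;
        }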
8306 -asmlinkage void schedule_tail(struct task_struct *prev)
8307 +inline int idle_cpu(int cpu)
8309 - __schedule_tail(prev);
8310 + return cpu_curr(cpu) == cpu_rq(cpu)->idle;
8315 - * 'schedule()' is the scheduler function. It's a very simple and nice
8316 - * scheduler: it's not perfect, but certainly works for most things.
8318 - * The goto is "interesting".
8320 - * NOTE!! Task 0 is the 'idle' task, which gets called when no other
8321 - * tasks can run. It can not be killed, and it cannot sleep. The 'state'
8322 - * information in task[0] is never used.
8323 + * Lock the busiest runqueue as well; this_rq is locked already.
8324 + * Recalculate nr_running if we have to drop the runqueue lock.
8326 -asmlinkage void schedule(void)
8327 +static inline unsigned int double_lock_balance(runqueue_t *this_rq,
8328 + runqueue_t *busiest, int this_cpu, int idle, unsigned int nr_running)
8330 - struct schedule_data * sched_data;
8331 - struct task_struct *prev, *next, *p;
8332 - struct list_head *tmp;
8334 + if (unlikely(!spin_trylock(&busiest->lock))) {
8335 + if (busiest < this_rq) {
8336 + spin_unlock(&this_rq->lock);
8337 + spin_lock(&busiest->lock);
8338 + spin_lock(&this_rq->lock);
8339 + /* Need to recalculate nr_running */
8340 + if (idle || (this_rq->nr_running > this_rq->prev_nr_running[this_cpu]))
8341 + nr_running = this_rq->nr_running;
8343 + nr_running = this_rq->prev_nr_running[this_cpu];
8345 + spin_lock(&busiest->lock);
8347 + return nr_running;
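double_lock_balance() above relies on the global lock-ordering rule (runqueue locks are taken in ascending address order), dropping and re-taking this_rq->lock when the trylock fails against a lower-addressed runqueue. A pthread sketch of the same ordering idiom, for illustration only (the kernel uses spinlocks, not mutexes):

        #include <pthread.h>

        /* Take two locks in ascending address order so that two threads
         * locking the same pair from opposite ends cannot deadlock. */
        static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
        {
                if (a < b) {
                        pthread_mutex_lock(a);
                        pthread_mutex_lock(b);
                } else {
                        pthread_mutex_lock(b);
                        pthread_mutex_lock(a);
                }
        }

        static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
        {
                pthread_mutex_unlock(a);
                pthread_mutex_unlock(b);
        }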
8351 + * Move a task from a remote runqueue to the local runqueue.
8352 + * Both runqueues must be locked.
8354 +static inline int pull_task(runqueue_t *src_rq, prio_array_t *src_array, task_t *p, runqueue_t *this_rq, int this_cpu)
8358 - spin_lock_prefetch(&runqueue_lock);
8359 + dequeue_task(p, src_array);
8360 + src_rq->nr_running--;
8361 + p->cpu = this_cpu;
8362 + this_rq->nr_running++;
8363 + enqueue_task(p, this_rq->active);
8365 + * Note that idle threads have a prio of MAX_PRIO, so this test
8366 + * is always true for them.
8368 + if (p->prio < this_rq->curr->prio)
8371 - BUG_ON(!current->active_mm);
8374 - this_cpu = prev->processor;
8378 - if (unlikely(in_interrupt())) {
8379 - printk("Scheduling in interrupt\n");
8381 +static inline int idle_cpu_reschedule(task_t * p, int cpu)
8383 + if (unlikely(!(p->cpus_allowed & (1UL << cpu))))
8385 + return idle_cpu(cpu);
8388 +#include <linux/smp_balance.h>
8390 +static int reschedule_idle(task_t * p)
8392 + int p_cpu = p->cpu, i;
8394 + if (idle_cpu(p_cpu))
8397 + p_cpu = cpu_number_map(p_cpu);
8399 + for (i = (p_cpu + 1) % smp_num_cpus;
8401 + i = (i + 1) % smp_num_cpus) {
8402 + int physical = cpu_logical_map(i);
8404 + if (idle_cpu_reschedule(p, physical)) {
8405 + physical = arch_reschedule_idle_override(p, physical);
8406 + p->cpu = physical;
8411 - release_kernel_lock(prev, this_cpu);
8416 + * Current runqueue is empty, or rebalance tick: if there is an
8417 + * imbalance (current runqueue is too short) then pull from
8418 + * the busiest runqueue(s).
8420 + * We call this with the current runqueue locked,
8423 +static void load_balance(runqueue_t *this_rq, int idle)
8425 + int imbalance, nr_running, load, max_load,
8426 + idx, i, this_cpu = this_rq - runqueues;
8428 + runqueue_t *busiest, *rq_src;
8429 + prio_array_t *array;
8430 + list_t *head, *curr;
8434 - * 'sched_data' is protected by the fact that we can run
8435 - * only one process per CPU.
8436 + * Handle architecture-specific balancing, such as hyperthreading.
8438 - sched_data = & aligned_data[this_cpu].schedule_data;
8439 + if (arch_load_balance(this_cpu, idle))
8442 - spin_lock_irq(&runqueue_lock);
8445 + * We search all runqueues to find the most busy one.
8446 + * We do this lockless to reduce cache-bouncing overhead;
8447 + * we re-check the 'best' source CPU later on, with
8450 + * We fend off statistical fluctuations in runqueue lengths by
8451 + * saving the runqueue length during the previous load-balancing
8452 + * operation and using the smaller of the current and saved lengths.
8453 + * If a runqueue remains long over a sustained period of time then
8454 + * we recognize it and pull tasks from it.
8456 + * The 'current runqueue length' is a statistical maximum variable,
8457 + * for that one we take the longer one - to avoid fluctuations in
8458 + * the other direction. So for a load-balance to happen there must be
8459 + * a stably long runqueue on the busiest CPU and a stably short
8460 + * runqueue locally.
8462 + * We make an exception if this CPU is about to become idle - in
8463 + * that case we are less picky about moving a task across CPUs and
8464 + * take what can be taken.
8466 + if (idle || (this_rq->nr_running > this_rq->prev_nr_running[this_cpu]))
8467 + nr_running = this_rq->nr_running;
8469 + nr_running = this_rq->prev_nr_running[this_cpu];
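In other words, two biased estimators are in play: the local queue length is biased upwards (the 'statistical maximum') and every remote length downwards, so only a persistently long remote queue ever looks longer than the local one. A compact restatement of the two selections as a sketch (plain C, illustrative only):

        /* Local length: prefer the larger of the current and previous
         * samples; remote length: prefer the smaller. An idle CPU skips
         * the smoothing and takes the raw values. */
        static unsigned int local_len(unsigned int cur, unsigned int prev, int idle)
        {
                return (idle || cur > prev) ? cur : prev;
        }

        static unsigned int remote_len(unsigned int cur, unsigned int prev, int idle)
        {
                return (idle || cur < prev) ? cur : prev;
        }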
8471 - /* move an exhausted RR process to be last.. */
8472 - if (unlikely(prev->policy == SCHED_RR))
8473 - if (!prev->counter) {
8474 - prev->counter = NICE_TO_TICKS(prev->nice);
8475 - move_last_runqueue(prev);
8479 + for (i = 0; i < smp_num_cpus; i++) {
8480 + int logical = cpu_logical_map(i);
8482 - switch (prev->state) {
8483 - case TASK_INTERRUPTIBLE:
8484 - if (signal_pending(prev)) {
8485 - prev->state = TASK_RUNNING;
8489 - del_from_runqueue(prev);
8490 - case TASK_RUNNING:;
8491 + rq_src = cpu_rq(logical);
8492 + if (idle || (rq_src->nr_running < this_rq->prev_nr_running[logical]))
8493 + load = rq_src->nr_running;
8495 + load = this_rq->prev_nr_running[logical];
8496 + this_rq->prev_nr_running[logical] = rq_src->nr_running;
8498 + if ((load > max_load) && (rq_src != this_rq)) {
8503 - prev->need_resched = 0;
8505 + if (likely(!busiest))
8508 + imbalance = (max_load - nr_running) / 2;
8510 + /* An imbalance of at least ~25% is needed to trigger balancing. */
8511 + if (!idle && (imbalance < (max_load + 3)/4))
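A worked instance of this trigger (numbers illustrative): with nr_running = 5 locally and max_load = 9 on the busiest CPU, imbalance = (9 - 5)/2 = 2 while the threshold is (9 + 3)/4 = 3, so a busy CPU does not balance yet; at max_load = 11 the imbalance of 3 meets the (11 + 3)/4 = 3 threshold and tasks get pulled.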
8515 - * this is the scheduler proper:
8516 + * Make sure nothing significant changed since we checked the
8517 + * runqueue length.
8519 + if (double_lock_balance(this_rq, busiest, this_cpu, idle, nr_running) > nr_running ||
8520 + busiest->nr_running < max_load)
8521 + goto out_unlock_retry;
8525 - * Default process to select..
8526 + * We first consider expired tasks. Those will likely not be
8527 + * executed in the near future, and they are most likely to
8528 + * be cache-cold, thus switching CPUs has the least effect
8531 - next = idle_task(this_cpu);
8533 - list_for_each(tmp, &runqueue_head) {
8534 - p = list_entry(tmp, struct task_struct, run_list);
8535 - if (can_schedule(p, this_cpu)) {
8536 - int weight = goodness(p, this_cpu, prev->active_mm);
8538 - c = weight, next = p;
8539 + if (busiest->expired->nr_active)
8540 + array = busiest->expired;
8542 + array = busiest->active;
8546 + /* Start searching at priority 0: */
8550 + idx = sched_find_first_bit(array->bitmap);
8552 + idx = find_next_bit(array->bitmap, MAX_PRIO, idx);
8553 + if (idx == MAX_PRIO) {
8554 + if (array == busiest->expired) {
8555 + array = busiest->active;
8561 - /* Do we need to re-calculate counters? */
8562 - if (unlikely(!c)) {
8563 - struct task_struct *p;
8565 - spin_unlock_irq(&runqueue_lock);
8566 - read_lock(&tasklist_lock);
8568 - p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
8569 - read_unlock(&tasklist_lock);
8570 - spin_lock_irq(&runqueue_lock);
8571 - goto repeat_schedule;
8572 + head = array->queue + idx;
8573 + curr = head->prev;
8575 + tmp = list_entry(curr, task_t, run_list);
8578 + * We do not migrate tasks that:
8579 + * 1) are running (obviously), or
8580 + * 2) cannot be migrated to this CPU due to cpus_allowed, or
8581 + * 3) are cache-hot on their current CPU.
8584 +#define CAN_MIGRATE_TASK(p,rq,this_cpu) \
8585 + ((jiffies - (p)->sleep_timestamp > cache_decay_ticks) && \
8586 + ((p) != (rq)->curr) && \
8587 + ((p)->cpus_allowed & (1UL << (this_cpu))))
8589 + curr = curr->prev;
8591 + if (!CAN_MIGRATE_TASK(tmp, busiest, this_cpu)) {
8597 + resched |= pull_task(busiest, array, tmp, this_rq, this_cpu);
8598 + if (--imbalance > 0) {
8605 + spin_unlock(&busiest->lock);
8607 + resched_task(this_rq->curr);
8610 + spin_unlock(&busiest->lock);
8615 - * from this point on nothing can prevent us from
8616 - * switching to the next task, save this fact in
8619 - sched_data->curr = next;
8620 - task_set_cpu(next, this_cpu);
8621 - spin_unlock_irq(&runqueue_lock);
8623 - if (unlikely(prev == next)) {
8624 - /* We won't go through the normal tail, so do this by hand */
8625 - prev->policy &= ~SCHED_YIELD;
8626 - goto same_process;
8628 + * Either the idle_cpu_tick() or the busy_cpu_tick() function
8629 + * gets called every timer tick, on every CPU. Our balancing action
8630 + * frequency and balancing aggressiveness depend on whether the CPU is
8633 + * busy-rebalance every 250 msec, idle-rebalance every 100 msec.
8635 +#define BUSY_REBALANCE_TICK (HZ/4 ?: 1)
8636 +#define IDLE_REBALANCE_TICK (HZ/10 ?: 1)
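For example, with HZ = 100 these come out to 25 ticks (250 msec) and 10 ticks (100 msec) respectively; the GNU `?: 1` fallback only matters when HZ is so low that the division truncates to zero, guaranteeing a rebalance interval of at least one tick.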
8638 +static inline void idle_tick(void)
8640 + if (unlikely(time_before_eq(this_rq()->last_jiffy + IDLE_REBALANCE_TICK, jiffies))) {
8641 + spin_lock(&this_rq()->lock);
8642 + load_balance(this_rq(), 1);
8643 + spin_unlock(&this_rq()->lock);
8644 + this_rq()->last_jiffy = jiffies;
8650 - * maintain the per-process 'last schedule' value.
8651 - * (this has to be recalculated even if we reschedule to
8652 - * the same process) Currently this is only used on SMP,
8653 - * and it's approximate, so we do not have to maintain
8654 - * it while holding the runqueue spinlock.
8656 - sched_data->last_schedule = get_cycles();
8660 - * We drop the scheduler lock early (it's a global spinlock),
8661 - * thus we have to lock the previous process from getting
8662 - * rescheduled during switch_to().
8665 + * We place interactive tasks back into the active array, if possible.
8667 + * To guarantee that this does not starve expired tasks we ignore the
8668 + * interactivity of a task if the first expired task had to wait more
8669 + * than a 'reasonable' amount of time. This deadline timeout is
8670 + * load-dependent, as the frequency of array switches decreases with
8671 + * an increasing number of running tasks:
8673 +#define EXPIRED_STARVING(rq) \
8674 + ((rq)->expired_timestamp && \
8675 + (jiffies - (rq)->expired_timestamp >= \
8676 + STARVATION_LIMIT * ((rq)->nr_running) + 1))
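As a worked instance (STARVATION_LIMIT's value is defined elsewhere in the patch; assume HZ ticks purely for illustration): with HZ = 100 and 4 runnable tasks the deadline is 100 * 4 + 1 = 401 ticks, roughly four seconds, after which interactive tasks stop being re-inserted into the active array.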
8678 -#endif /* CONFIG_SMP */
8680 + * This function gets called by the timer code, with HZ frequency.
8681 + * We call it with interrupts disabled.
8683 +void scheduler_tick(int user_tick, int system)
8685 + int cpu = smp_processor_id();
8686 + runqueue_t *rq = this_rq();
8687 + task_t *p = current;
8689 - kstat.context_swtch++;
8691 - * there are 3 processes which are affected by a context switch:
8693 - * prev == .... ==> (last => next)
8695 - * It's the 'much more previous' 'prev' that is on next's stack,
8696 - * but prev is set to (the just run) 'last' process by switch_to().
8697 - * This might sound slightly confusing but makes tons of sense.
8699 - prepare_to_switch();
8701 - struct mm_struct *mm = next->mm;
8702 - struct mm_struct *oldmm = prev->active_mm;
8704 - BUG_ON(next->active_mm);
8705 - next->active_mm = oldmm;
8706 - atomic_inc(&oldmm->mm_count);
8707 - enter_lazy_tlb(oldmm, next, this_cpu);
8709 - BUG_ON(next->active_mm != mm);
8710 - switch_mm(oldmm, mm, next, this_cpu);
8711 + if (p == rq->idle) {
8712 + if (local_bh_count(cpu) || local_irq_count(cpu) > 1)
8713 + kstat.per_cpu_system[cpu] += system;
8719 + if (TASK_NICE(p) > 0)
8720 + kstat.per_cpu_nice[cpu] += user_tick;
8722 + kstat.per_cpu_user[cpu] += user_tick;
8723 + kstat.per_cpu_system[cpu] += system;
8725 + /* Task might have expired already, but not scheduled off yet */
8726 + if (p->array != rq->active) {
8727 + set_tsk_need_resched(p);
8730 + spin_lock(&rq->lock);
8731 + if (unlikely(rt_task(p))) {
8733 + * RR tasks need a special form of timeslice management.
8734 + * FIFO tasks have no timeslices.
8736 + if ((p->policy == SCHED_RR) && !--p->time_slice) {
8737 + p->time_slice = TASK_TIMESLICE(p);
8738 + p->first_time_slice = 0;
8739 + set_tsk_need_resched(p);
8741 + /* put it at the end of the queue: */
8742 + dequeue_task(p, rq->active);
8743 + enqueue_task(p, rq->active);
8748 + * The task was running during this tick - update the
8749 + * time slice counter and the sleep average. Note: we
8750 + * do not update a process's priority until it either
8751 + * goes to sleep or uses up its timeslice. This makes
8752 + * it possible for interactive tasks to use up their
8753 + * timeslices at their highest priority levels.
8757 + if (!--p->time_slice) {
8758 + dequeue_task(p, rq->active);
8759 + set_tsk_need_resched(p);
8760 + p->prio = effective_prio(p);
8761 + p->time_slice = TASK_TIMESLICE(p);
8762 + p->first_time_slice = 0;
8764 + if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
8765 + if (!rq->expired_timestamp)
8766 + rq->expired_timestamp = jiffies;
8767 + enqueue_task(p, rq->expired);
8769 + enqueue_task(p, rq->active);
8773 + if (unlikely(time_before_eq(this_rq()->last_jiffy + BUSY_REBALANCE_TICK, jiffies))) {
8774 + load_balance(rq, 0);
8775 + rq->last_jiffy = jiffies;
8778 + spin_unlock(&rq->lock);
8781 +void scheduling_functions_start_here(void) { }
8784 + * 'schedule()' is the main scheduler function.
8786 +asmlinkage void schedule(void)
8788 + task_t *prev, *next;
8790 + prio_array_t *array;
8794 + if (unlikely(in_interrupt()))
8798 - prev->active_mm = NULL;
8804 + release_kernel_lock(prev, smp_processor_id());
8805 + prev->sleep_timestamp = jiffies;
8806 + spin_lock_irq(&rq->lock);
8808 + switch (prev->state) {
8809 + case TASK_INTERRUPTIBLE:
8810 + if (unlikely(signal_pending(prev))) {
8811 + prev->state = TASK_RUNNING;
8815 + deactivate_task(prev, rq);
8816 + case TASK_RUNNING:
8822 + if (unlikely(!rq->nr_running)) {
8824 + load_balance(rq, 2);
8825 + rq->last_jiffy = jiffies;
8826 + if (rq->nr_running)
8827 + goto pick_next_task;
8830 + rq->expired_timestamp = 0;
8831 + goto switch_tasks;
8835 - * This just switches the register state and the
8838 - switch_to(prev, next, prev);
8839 - __schedule_tail(prev);
8840 + array = rq->active;
8841 + if (unlikely(!array->nr_active)) {
8843 + * Switch the active and expired arrays.
8845 + rq->active = rq->expired;
8846 + rq->expired = array;
8847 + array = rq->active;
8848 + rq->expired_timestamp = 0;
8851 + idx = sched_find_first_bit(array->bitmap);
8852 + queue = array->queue + idx;
8853 + next = list_entry(queue->next, task_t, run_list);
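These two lines are the O(1) pick itself: one bit search over the priority bitmap, then the head of that priority's list - cost independent of the number of runnable tasks. A self-contained sketch of the technique (plain C; the sizes and helper are illustrative stand-ins for sched_find_first_bit() and the kernel's per-priority list heads):

        #include <stdio.h>

        #define NPRIO 140                       /* MAX_PRIO in the patch */
        #define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
        #define NWORDS ((NPRIO + BITS_PER_LONG - 1) / BITS_PER_LONG)

        static unsigned long bitmap[NWORDS];    /* bit i set => prio i non-empty */

        /* Scan a fixed, small number of words - bounded-constant time,
         * which is what makes the pick O(1). */
        static int find_first_prio(void)
        {
                int w, b;

                for (w = 0; w < NWORDS; w++)
                        if (bitmap[w])
                                for (b = 0; b < BITS_PER_LONG; b++)
                                        if (bitmap[w] & (1UL << b))
                                                return w * BITS_PER_LONG + b;
                return NPRIO;                   /* all queues empty */
        }

        int main(void)
        {
                bitmap[115 / BITS_PER_LONG] |= 1UL << (115 % BITS_PER_LONG);
                printf("next prio: %d\n", find_first_prio());   /* 115 */
                return 0;
        }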
8857 + clear_tsk_need_resched(prev);
8859 + if (likely(prev != next)) {
8860 + rq->nr_switches++;
8863 + prepare_arch_switch(rq, next);
8864 + prev = context_switch(prev, next);
8867 + finish_arch_switch(rq, prev);
8869 + spin_unlock_irq(&rq->lock);
8872 reacquire_kernel_lock(current);
8873 - if (current->need_resched)
8874 - goto need_resched_back;
8876 + if (need_resched())
8877 + goto need_resched;
8881 - * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just wake everything
8882 - * up. If it's an exclusive wakeup (nr_exclusive == small +ve number) then we wake all the
8883 - * non-exclusive tasks and one exclusive task.
8884 + * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
8885 + * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
8886 + * number) then we wake all the non-exclusive tasks and one exclusive task.
8888 * There are circumstances in which we can try to wake a task which has already
8889 - * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns zero
8890 - * in this (rare) case, and we handle it by contonuing to scan the queue.
8891 + * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
8892 + * zero in this (rare) case, and we handle it by continuing to scan the queue.
8894 -static inline void __wake_up_common (wait_queue_head_t *q, unsigned int mode,
8895 - int nr_exclusive, const int sync)
8896 +static inline void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, int sync)
8898 struct list_head *tmp;
8899 - struct task_struct *p;
8901 - CHECK_MAGIC_WQHEAD(q);
8902 - WQ_CHECK_LIST_HEAD(&q->task_list);
8904 - list_for_each(tmp,&q->task_list) {
8905 - unsigned int state;
8906 - wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
8907 + unsigned int state;
8908 + wait_queue_t *curr;
8911 - CHECK_MAGIC(curr->__magic);
8912 + list_for_each(tmp, &q->task_list) {
8913 + curr = list_entry(tmp, wait_queue_t, task_list);
8916 - if (state & mode) {
8917 - WQ_NOTE_WAKER(curr);
8918 - if (try_to_wake_up(p, sync) && (curr->flags&WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
8919 + if ((state & mode) && try_to_wake_up(p, sync) &&
8920 + ((curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive))
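A worked instance of the counting above: if a queue holds three non-exclusive waiters and two exclusive ones (exclusive waiters are queued at the tail), a wakeup with nr_exclusive == 1 wakes all three non-exclusive tasks plus exactly one exclusive task - the classic guard against thundering herds of accept()-style waiters - while nr_exclusive == 0 never trips the decrement-and-break and so wakes all five.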
8926 -void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr)
8927 +void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
8930 - unsigned long flags;
8931 - wq_read_lock_irqsave(&q->lock, flags);
8932 - __wake_up_common(q, mode, nr, 0);
8933 - wq_read_unlock_irqrestore(&q->lock, flags);
8935 + unsigned long flags;
8940 + wq_read_lock_irqsave(&q->lock, flags);
8941 + __wake_up_common(q, mode, nr_exclusive, 0);
8942 + wq_read_unlock_irqrestore(&q->lock, flags);
8945 -void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr)
8948 +void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
8951 - unsigned long flags;
8952 - wq_read_lock_irqsave(&q->lock, flags);
8953 - __wake_up_common(q, mode, nr, 1);
8954 - wq_read_unlock_irqrestore(&q->lock, flags);
8956 + unsigned long flags;
8961 + wq_read_lock_irqsave(&q->lock, flags);
8962 + if (likely(nr_exclusive))
8963 + __wake_up_common(q, mode, nr_exclusive, 1);
8965 + __wake_up_common(q, mode, nr_exclusive, 0);
8966 + wq_read_unlock_irqrestore(&q->lock, flags);
8971 void complete(struct completion *x)
8973 unsigned long flags;
8975 - spin_lock_irqsave(&x->wait.lock, flags);
8976 + wq_write_lock_irqsave(&x->wait.lock, flags);
8978 __wake_up_common(&x->wait, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, 0);
8979 - spin_unlock_irqrestore(&x->wait.lock, flags);
8980 + wq_write_unlock_irqrestore(&x->wait.lock, flags);
8983 void wait_for_completion(struct completion *x)
8985 - spin_lock_irq(&x->wait.lock);
8986 + wq_write_lock_irq(&x->wait.lock);
8988 DECLARE_WAITQUEUE(wait, current);
8990 @@ -775,14 +1060,14 @@
8991 __add_wait_queue_tail(&x->wait, &wait);
8993 __set_current_state(TASK_UNINTERRUPTIBLE);
8994 - spin_unlock_irq(&x->wait.lock);
8995 + wq_write_unlock_irq(&x->wait.lock);
8997 - spin_lock_irq(&x->wait.lock);
8998 + wq_write_lock_irq(&x->wait.lock);
9000 __remove_wait_queue(&x->wait, &wait);
9003 - spin_unlock_irq(&x->wait.lock);
9004 + wq_write_unlock_irq(&x->wait.lock);
9007 #define SLEEP_ON_VAR \
9008 @@ -850,44 +1135,41 @@
9010 void scheduling_functions_end_here(void) { }
9014 - * set_cpus_allowed() - change a given task's processor affinity
9015 - * @p: task to bind
9016 - * @new_mask: bitmask of allowed processors
9018 - * Upon return, the task is running on a legal processor. Note the caller
9019 - * must have a valid reference to the task: it must not exit() prematurely.
9020 - * This call can sleep; do not hold locks on call.
9022 -void set_cpus_allowed(struct task_struct *p, unsigned long new_mask)
9024 - new_mask &= cpu_online_map;
9025 - BUG_ON(!new_mask);
9027 - p->cpus_allowed = new_mask;
9030 - * If the task is on a no-longer-allowed processor, we need to move
9031 - * it. If the task is not current, then set need_resched and send
9032 - * its processor an IPI to reschedule.
9034 - if (!(p->cpus_runnable & p->cpus_allowed)) {
9035 - if (p != current) {
9036 - p->need_resched = 1;
9037 - smp_send_reschedule(p->processor);
9040 - * Wait until we are on a legal processor. If the task is
9041 - * current, then we should be on a legal processor the next
9042 - * time we reschedule. Otherwise, we need to wait for the IPI.
9044 - while (!(p->cpus_runnable & p->cpus_allowed))
9048 -#endif /* CONFIG_SMP */
9050 +void set_user_nice(task_t *p, long nice)
9052 + unsigned long flags;
9053 + prio_array_t *array;
9056 + if (TASK_NICE(p) == nice || nice < -20 || nice > 19)
9059 + * We have to be careful, if called from sys_setpriority(),
9060 + * the task might be in the middle of scheduling on another CPU.
9062 + rq = task_rq_lock(p, &flags);
9064 + p->static_prio = NICE_TO_PRIO(nice);
9069 + dequeue_task(p, array);
9070 + p->static_prio = NICE_TO_PRIO(nice);
9071 + p->prio = NICE_TO_PRIO(nice);
9073 + enqueue_task(p, array);
9075 + * If the task is running and lowered its priority,
9076 + * or increased its priority then reschedule its CPU:
9078 + if (p == rq->curr)
9079 + resched_task(rq->curr);
9082 + task_rq_unlock(rq, &flags);
9088 @@ -860,7 +1180,7 @@
9090 asmlinkage long sys_nice(int increment)
9096 * Setpriority might change our priority at the same moment.
9097 @@ -876,32 +1196,46 @@
9101 - newprio = current->nice + increment;
9102 - if (newprio < -20)
9106 - current->nice = newprio;
9107 + nice = PRIO_TO_NICE(current->static_prio) + increment;
9112 + set_user_nice(current, nice);
9118 -static inline struct task_struct *find_process_by_pid(pid_t pid)
9120 + * This is the priority value as seen by users in /proc
9122 + * RT tasks are offset by -200. Normal tasks are centered
9123 + * around 0, value goes from -16 to +15.
9125 +int task_prio(task_t *p)
9127 - struct task_struct *tsk = current;
9128 + return p->prio - MAX_USER_RT_PRIO;
9132 - tsk = find_task_by_pid(pid);
9134 +int task_nice(task_t *p)
9136 + return TASK_NICE(p);
9139 -static int setscheduler(pid_t pid, int policy,
9140 - struct sched_param *param)
9141 +static inline task_t *find_process_by_pid(pid_t pid)
9143 + return pid ? find_task_by_pid(pid) : current;
9146 +static int setscheduler(pid_t pid, int policy, struct sched_param *param)
9148 struct sched_param lp;
9149 - struct task_struct *p;
9150 + prio_array_t *array;
9151 + unsigned long flags;
9157 if (!param || pid < 0)
9158 @@ -915,14 +1249,19 @@
9159 * We play safe to avoid deadlocks.
9161 read_lock_irq(&tasklist_lock);
9162 - spin_lock(&runqueue_lock);
9164 p = find_process_by_pid(pid);
9170 + goto out_unlock_tasklist;
9173 + * To be able to change p->policy safely, the appropriate
9174 + * runqueue lock must be held.
9176 + rq = task_rq_lock(p, &flags);
9181 @@ -931,40 +1270,48 @@
9182 policy != SCHED_OTHER)
9188 - * Valid priorities for SCHED_FIFO and SCHED_RR are 1..99, valid
9189 - * priority for SCHED_OTHER is 0.
9190 + * Valid priorities for SCHED_FIFO and SCHED_RR are
9191 + * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_OTHER is 0.
9194 - if (lp.sched_priority < 0 || lp.sched_priority > 99)
9195 + if (lp.sched_priority < 0 || lp.sched_priority > MAX_USER_RT_PRIO-1)
9197 if ((policy == SCHED_OTHER) != (lp.sched_priority == 0))
9201 - if ((policy == SCHED_FIFO || policy == SCHED_RR) &&
9202 + if ((policy == SCHED_FIFO || policy == SCHED_RR) &&
9203 !capable(CAP_SYS_NICE))
9205 if ((current->euid != p->euid) && (current->euid != p->uid) &&
9206 !capable(CAP_SYS_NICE))
9211 + deactivate_task(p, task_rq(p));
9214 p->rt_priority = lp.sched_priority;
9216 - current->need_resched = 1;
9217 + if (policy != SCHED_OTHER)
9218 + p->prio = MAX_USER_RT_PRIO-1 - p->rt_priority;
9220 + p->prio = p->static_prio;
9222 + activate_task(p, task_rq(p));
9225 - spin_unlock(&runqueue_lock);
9226 + task_rq_unlock(rq, &flags);
9227 +out_unlock_tasklist:
9228 read_unlock_irq(&tasklist_lock);
9234 -asmlinkage long sys_sched_setscheduler(pid_t pid, int policy,
9235 +asmlinkage long sys_sched_setscheduler(pid_t pid, int policy,
9236 struct sched_param *param)
9238 return setscheduler(pid, policy, param);
9239 @@ -977,7 +1324,7 @@
9241 asmlinkage long sys_sched_getscheduler(pid_t pid)
9243 - struct task_struct *p;
9248 @@ -988,7 +1335,7 @@
9249 read_lock(&tasklist_lock);
9250 p = find_process_by_pid(pid);
9252 - retval = p->policy & ~SCHED_YIELD;
9253 + retval = p->policy;
9254 read_unlock(&tasklist_lock);
9257 @@ -997,7 +1344,7 @@
9259 asmlinkage long sys_sched_getparam(pid_t pid, struct sched_param *param)
9261 - struct task_struct *p;
9263 struct sched_param lp;
9266 @@ -1028,42 +1375,64 @@
9268 asmlinkage long sys_sched_yield(void)
9271 - * Trick. sched_yield() first counts the number of truly
9272 - * 'pending' runnable processes, then returns if it's
9273 - * only the current processes. (This test does not have
9274 - * to be atomic.) In threaded applications this optimization
9275 - * gets triggered quite often.
9277 + runqueue_t *rq = this_rq();
9278 + prio_array_t *array;
9281 - int nr_pending = nr_running;
9282 + spin_lock_irq(&rq->lock);
9284 + if (unlikely(rq->nr_running == 1)) {
9285 + spin_unlock_irq(&rq->lock);
9291 + array = current->array;
9292 + if (unlikely(rt_task(current))) {
9293 + list_del(&current->run_list);
9294 + list_add_tail(&current->run_list, array->queue + current->prio);
9298 - // Subtract non-idle processes running on other CPUs.
9299 - for (i = 0; i < smp_num_cpus; i++) {
9300 - int cpu = cpu_logical_map(i);
9301 - if (aligned_data[cpu].schedule_data.curr != idle_task(cpu))
9303 + if (unlikely(array == rq->expired) && rq->active->nr_active)
9306 + list_del(&current->run_list);
9307 + if (!list_empty(array->queue + current->prio)) {
9308 + list_add(&current->run_list, array->queue[current->prio].next);
9312 - // on UP this process is on the runqueue as well
9317 + __clear_bit(current->prio, array->bitmap);
9318 + if (likely(array == rq->active) && array->nr_active == 1) {
9320 - * This process can only be rescheduled by us,
9321 - * so this is safe without any locking.
9322 + * We're the last task in the active queue so
9323 + * we must move ourself to the expired array
9324 + * to avoid running again immediately.
9326 - if (current->policy == SCHED_OTHER)
9327 - current->policy |= SCHED_YIELD;
9328 - current->need_resched = 1;
9330 - spin_lock_irq(&runqueue_lock);
9331 - move_last_runqueue(current);
9332 - spin_unlock_irq(&runqueue_lock);
9333 + array->nr_active--;
9334 + array = rq->expired;
9335 + array->nr_active++;
9338 + i = sched_find_first_bit(array->bitmap);
9340 + BUG_ON(i == MAX_PRIO);
9341 + BUG_ON(i == current->prio && array == current->array);
9343 + if (array == current->array && i < current->prio)
9344 + i = current->prio;
9346 + current->array = array;
9347 + current->prio = i;
9349 + list_add(&current->run_list, array->queue[i].next);
9350 + __set_bit(i, array->bitmap);
9353 + spin_unlock_irq(&rq->lock);
9360 @@ -1075,14 +1444,13 @@
9364 - set_current_state(TASK_RUNNING);
9365 + __set_current_state(TASK_RUNNING);
9370 void __cond_resched(void)
9372 - set_current_state(TASK_RUNNING);
9373 + __set_current_state(TASK_RUNNING);
9377 @@ -1093,7 +1461,7 @@
9382 + ret = MAX_USER_RT_PRIO-1;
9386 @@ -1120,7 +1488,7 @@
9387 asmlinkage long sys_sched_rr_get_interval(pid_t pid, struct timespec *interval)
9390 - struct task_struct *p;
9392 int retval = -EINVAL;
9395 @@ -1130,8 +1498,8 @@
9396 read_lock(&tasklist_lock);
9397 p = find_process_by_pid(pid);
9399 - jiffies_to_timespec(p->policy & SCHED_FIFO ? 0 : NICE_TO_TICKS(p->nice),
9401 + jiffies_to_timespec(p->policy & SCHED_FIFO ?
9402 + 0 : TASK_TIMESLICE(p), &t);
9403 read_unlock(&tasklist_lock);
9405 retval = copy_to_user(interval, &t, sizeof(t)) ? -EFAULT : 0;
9406 @@ -1139,14 +1507,14 @@
9410 -static void show_task(struct task_struct * p)
9411 +static void show_task(task_t * p)
9413 unsigned long free = 0;
9415 static const char * stat_nam[] = { "R", "S", "D", "Z", "T", "W" };
9417 printk("%-13.13s ", p->comm);
9418 - state = p->state ? ffz(~p->state) + 1 : 0;
9419 + state = p->state ? __ffs(p->state) + 1 : 0;
9420 if (((unsigned) state) < sizeof(stat_nam)/sizeof(char *))
9421 printk(stat_nam[state]);
9423 @@ -1187,7 +1555,7 @@
9424 printk(" (NOTLB)\n");
9427 - extern void show_trace_task(struct task_struct *tsk);
9428 + extern void show_trace_task(task_t *tsk);
9432 @@ -1209,7 +1577,7 @@
9434 void show_state(void)
9436 - struct task_struct *p;
9439 #if (BITS_PER_LONG == 32)
9441 @@ -1232,128 +1600,280 @@
9442 read_unlock(&tasklist_lock);
9446 - * reparent_to_init() - Reparent the calling kernel thread to the init task.
9448 - * If a kernel thread is launched as a result of a system call, or if
9449 - * it ever exits, it should generally reparent itself to init so that
9450 - * it is correctly cleaned up on exit.
9452 + * double_rq_lock - safely lock two runqueues
9454 - * The various task state such as scheduling policy and priority may have
9455 - * been inherited fro a user process, so we reset them to sane values here.
9456 + * Note this does not disable interrupts like task_rq_lock,
9457 + * you need to do so manually before calling.
9459 +static inline void double_rq_lock(runqueue_t *rq1, runqueue_t *rq2)
9462 + spin_lock(&rq1->lock);
9465 + spin_lock(&rq1->lock);
9466 + spin_lock(&rq2->lock);
9468 + spin_lock(&rq2->lock);
9469 + spin_lock(&rq1->lock);
9475 + * double_rq_unlock - safely unlock two runqueues
9477 - * NOTE that reparent_to_init() gives the caller full capabilities.
9478 + * Note this does not restore interrupts like task_rq_unlock,
9479 + * you need to do so manually after calling.
9481 -void reparent_to_init(void)
9482 +static inline void double_rq_unlock(runqueue_t *rq1, runqueue_t *rq2)
9484 - struct task_struct *this_task = current;
9485 + spin_unlock(&rq1->lock);
9487 + spin_unlock(&rq2->lock);
9490 - write_lock_irq(&tasklist_lock);
9491 +void __init init_idle(task_t *idle, int cpu)
9493 + runqueue_t *idle_rq = cpu_rq(cpu), *rq = cpu_rq(idle->cpu);
9494 + unsigned long flags;
9496 - /* Reparent to init */
9497 - REMOVE_LINKS(this_task);
9498 - this_task->p_pptr = child_reaper;
9499 - this_task->p_opptr = child_reaper;
9500 - SET_LINKS(this_task);
9501 + __save_flags(flags);
9503 + double_rq_lock(idle_rq, rq);
9505 + idle_rq->curr = idle_rq->idle = idle;
9506 + deactivate_task(idle, rq);
9507 + idle->array = NULL;
9508 + idle->prio = MAX_PRIO;
9509 + idle->state = TASK_RUNNING;
9511 + double_rq_unlock(idle_rq, rq);
9512 + set_tsk_need_resched(idle);
9513 + __restore_flags(flags);
9516 - /* Set the exit signal to SIGCHLD so we signal init on exit */
9517 - this_task->exit_signal = SIGCHLD;
9518 +extern void init_timervecs(void);
9519 +extern void timer_bh(void);
9520 +extern void tqueue_bh(void);
9521 +extern void immediate_bh(void);
9523 - /* We also take the runqueue_lock while altering task fields
9524 - * which affect scheduling decisions */
9525 - spin_lock(&runqueue_lock);
9526 +void __init sched_init(void)
9531 + for (i = 0; i < NR_CPUS; i++) {
9532 + prio_array_t *array;
9534 - this_task->ptrace = 0;
9535 - this_task->nice = DEF_NICE;
9536 - this_task->policy = SCHED_OTHER;
9537 - /* cpus_allowed? */
9538 - /* rt_priority? */
9540 - this_task->cap_effective = CAP_INIT_EFF_SET;
9541 - this_task->cap_inheritable = CAP_INIT_INH_SET;
9542 - this_task->cap_permitted = CAP_FULL_SET;
9543 - this_task->keep_capabilities = 0;
9544 - memcpy(this_task->rlim, init_task.rlim, sizeof(*(this_task->rlim)));
9545 - this_task->user = INIT_USER;
9547 + rq->active = rq->arrays;
9548 + rq->expired = rq->arrays + 1;
9549 + spin_lock_init(&rq->lock);
9551 + INIT_LIST_HEAD(&rq->migration_queue);
9554 - spin_unlock(&runqueue_lock);
9555 - write_unlock_irq(&tasklist_lock);
9556 + for (j = 0; j < 2; j++) {
9557 + array = rq->arrays + j;
9558 + for (k = 0; k < MAX_PRIO; k++) {
9559 + INIT_LIST_HEAD(array->queue + k);
9560 + __clear_bit(k, array->bitmap);
9562 + // delimiter for bitsearch
9563 + __set_bit(MAX_PRIO, array->bitmap);
9567 + * We have to do a little magic to get the first
9568 + * process right in SMP mode.
9571 + rq->curr = current;
9572 + rq->idle = current;
9573 + current->cpu = smp_processor_id();
9574 + wake_up_process(current);
9577 + init_bh(TIMER_BH, timer_bh);
9578 + init_bh(TQUEUE_BH, tqueue_bh);
9579 + init_bh(IMMEDIATE_BH, immediate_bh);
9582 + * The boot idle thread does lazy MMU switching as well:
9584 + atomic_inc(&init_mm.mm_count);
9585 + enter_lazy_tlb(&init_mm, current, smp_processor_id());
9591 - * Put all the gunge required to become a kernel thread without
9592 - * attached user resources in one place where it belongs.
9593 + * This is how migration works:
9595 + * 1) we queue a migration_req_t structure in the source CPU's
9596 + * runqueue and wake up that CPU's migration thread.
9597 + * 2) we wait on the request's completion => thread blocks.
9598 + * 3) migration thread wakes up (implicitly it forces the migrated
9599 + * thread off the CPU)
9600 + * 4) it gets the migration request and checks whether the migrated
9601 + * task is still in the wrong runqueue.
9602 + * 5) if it's in the wrong runqueue then the migration thread removes
9603 + * it and puts it into the right queue.
9604 + * 6) the migration thread completes the request.
9605 + * 7) we wake up and the migration is done.
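A user-space analogue of this handshake, for illustration only (pthreads stand in for the runqueue lock, wake_up_process() and the kernel's completion; every name below is made up):

        #include <pthread.h>

        /* The requester queues a request, wakes the worker, then blocks
         * on the request's completion; the worker does the migration and
         * signals completion - steps 1), 2), 6) and 7) above. */
        struct completion {
                pthread_mutex_t lock;
                pthread_cond_t cond;
                int done;
        };

        static struct completion req_done = {
                PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
        };

        static void complete(struct completion *c)
        {
                pthread_mutex_lock(&c->lock);
                c->done = 1;
                pthread_cond_signal(&c->cond);
                pthread_mutex_unlock(&c->lock);
        }

        static void wait_for_completion(struct completion *c)
        {
                pthread_mutex_lock(&c->lock);
                while (!c->done)
                        pthread_cond_wait(&c->cond, &c->lock);
                pthread_mutex_unlock(&c->lock);
        }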
9608 -void daemonize(void)
9612 + struct completion done;
9616 + * Change a given task's CPU affinity. Migrate the process to a
9617 + * proper CPU and schedule it away if the CPU it's executing on
9618 + * is removed from the allowed bitmask.
9620 + * NOTE: the caller must have a valid reference to the task, the
9621 + * task must not exit() & deallocate itself prematurely. The
9622 + * call is not atomic; no spinlocks may be held.
9624 +void set_cpus_allowed(task_t *p, unsigned long new_mask)
9626 - struct fs_struct *fs;
9627 + unsigned long flags;
9628 + migration_req_t req;
9631 + new_mask &= cpu_online_map;
9635 + rq = task_rq_lock(p, &flags);
9636 + p->cpus_allowed = new_mask;
9638 - * If we were started as result of loading a module, close all of the
9639 - * user space pages. We don't need them, and if we didn't close them
9640 - * they would be locked into memory.
9641 + * Can the task run on the task's current CPU? If not then
9642 + * migrate the process off to a proper CPU.
9645 + if (new_mask & (1UL << p->cpu)) {
9646 + task_rq_unlock(rq, &flags);
9650 - current->session = 1;
9651 - current->pgrp = 1;
9652 - current->tty = NULL;
9654 + * If the task is not on a runqueue, then it is safe to
9655 + * simply update the task's cpu field.
9657 + if (!p->array && (p != rq->curr)) {
9658 + p->cpu = __ffs(p->cpus_allowed);
9659 + task_rq_unlock(rq, &flags);
9663 - /* Become as one with the init task */
9664 + init_completion(&req.done);
9666 + list_add(&req.list, &rq->migration_queue);
9667 + task_rq_unlock(rq, &flags);
9668 + wake_up_process(rq->migration_thread);
9670 - exit_fs(current); /* current->fs->count--; */
9671 - fs = init_task.fs;
9673 - atomic_inc(&fs->count);
9674 - exit_files(current);
9675 - current->files = init_task.files;
9676 - atomic_inc(&current->files->count);
9677 + wait_for_completion(&req.done);
9680 -extern unsigned long wait_init_idle;
9681 +static __initdata int master_migration_thread;
9683 -void __init init_idle(void)
9684 +static int migration_thread(void * bind_cpu)
9686 - struct schedule_data * sched_data;
9687 - sched_data = &aligned_data[smp_processor_id()].schedule_data;
9688 + int cpu = cpu_logical_map((int) (long) bind_cpu);
9689 + struct sched_param param = { sched_priority: MAX_RT_PRIO-1 };
9693 - if (current != &init_task && task_on_runqueue(current)) {
9694 - printk("UGH! (%d:%d) was on the runqueue, removing.\n",
9695 - smp_processor_id(), current->pid);
9696 - del_from_runqueue(current);
9698 + sigfillset(&current->blocked);
9699 + set_fs(KERNEL_DS);
9701 + * The first migration thread is started on the boot CPU, it
9702 + * migrates the other migration threads to their destination CPUs.
9704 + if (cpu != master_migration_thread) {
9705 + while (!cpu_rq(master_migration_thread)->migration_thread)
9707 + set_cpus_allowed(current, 1UL << cpu);
9709 - sched_data->curr = current;
9710 - sched_data->last_schedule = get_cycles();
9711 - clear_bit(current->processor, &wait_init_idle);
9713 + printk("migration_task %d on cpu=%d\n", cpu, smp_processor_id());
9714 + ret = setscheduler(0, SCHED_FIFO, ¶m);
9716 -extern void init_timervecs (void);
9718 + rq->migration_thread = current;
9720 -void __init sched_init(void)
9723 - * We have to do a little magic to get the first
9724 - * process right in SMP mode.
9726 - int cpu = smp_processor_id();
9728 + sprintf(current->comm, "migration_CPU%d", smp_processor_id());
9730 - init_task.processor = cpu;
9732 + runqueue_t *rq_src, *rq_dest;
9733 + struct list_head *head;
9734 + int cpu_src, cpu_dest;
9735 + migration_req_t *req;
9736 + unsigned long flags;
9739 - for(nr = 0; nr < PIDHASH_SZ; nr++)
9740 - pidhash[nr] = NULL;
9741 + spin_lock_irqsave(&rq->lock, flags);
9742 + head = &rq->migration_queue;
9743 + current->state = TASK_INTERRUPTIBLE;
9744 + if (list_empty(head)) {
9745 + spin_unlock_irqrestore(&rq->lock, flags);
9749 + req = list_entry(head->next, migration_req_t, list);
9750 + list_del_init(head->next);
9751 + spin_unlock_irqrestore(&rq->lock, flags);
9754 + cpu_dest = __ffs(p->cpus_allowed);
9755 + rq_dest = cpu_rq(cpu_dest);
9758 + rq_src = cpu_rq(cpu_src);
9760 + local_irq_save(flags);
9761 + double_rq_lock(rq_src, rq_dest);
9762 + if (p->cpu != cpu_src) {
9763 + double_rq_unlock(rq_src, rq_dest);
9764 + local_irq_restore(flags);
9767 + if (rq_src == rq) {
9768 + p->cpu = cpu_dest;
9770 + deactivate_task(p, rq_src);
9771 + activate_task(p, rq_dest);
9774 + double_rq_unlock(rq_src, rq_dest);
9775 + local_irq_restore(flags);
9778 + complete(&req->done);
9782 - init_bh(TIMER_BH, timer_bh);
9783 - init_bh(TQUEUE_BH, tqueue_bh);
9784 - init_bh(IMMEDIATE_BH, immediate_bh);
9785 +void __init migration_init(void)
9790 - * The boot idle thread does lazy MMU switching as well:
9792 - atomic_inc(&init_mm.mm_count);
9793 - enter_lazy_tlb(&init_mm, current, cpu);
9794 + master_migration_thread = smp_processor_id();
9795 + current->cpus_allowed = 1UL << master_migration_thread;
9797 + for (cpu = 0; cpu < smp_num_cpus; cpu++) {
9798 + if (kernel_thread(migration_thread, (void *) (long) cpu,
9799 + CLONE_FS | CLONE_FILES | CLONE_SIGNAL) < 0)
9802 + current->cpus_allowed = -1L;
9804 + for (cpu = 0; cpu < smp_num_cpus; cpu++)
9805 + while (!cpu_rq(cpu_logical_map(cpu))->migration_thread)
9806 + schedule_timeout(2);
9809 +#endif /* CONFIG_SMP */
9810 diff -urN linux-2.4.20/kernel/signal.c linux-2.4.20-o1/kernel/signal.c
9811 --- linux-2.4.20/kernel/signal.c Fri Nov 29 00:53:15 2002
9812 +++ linux-2.4.20-o1/kernel/signal.c Wed Mar 12 00:41:43 2003
9813 @@ -490,12 +490,9 @@
9814 * process of changing - but no harm is done by that
9815 * other than doing an extra (lightweight) IPI interrupt.
9817 - spin_lock(&runqueue_lock);
9818 - if (task_has_cpu(t) && t->processor != smp_processor_id())
9819 - smp_send_reschedule(t->processor);
9820 - spin_unlock(&runqueue_lock);
9821 -#endif /* CONFIG_SMP */
9823 + if ((t->state == TASK_RUNNING) && (t->cpu != cpu()))
9824 + kick_if_running(t);
9826 if (t->state & TASK_INTERRUPTIBLE) {
9829 diff -urN linux-2.4.20/kernel/softirq.c linux-2.4.20-o1/kernel/softirq.c
9830 --- linux-2.4.20/kernel/softirq.c Fri Nov 29 00:53:15 2002
9831 +++ linux-2.4.20-o1/kernel/softirq.c Wed Mar 12 00:41:43 2003
9832 @@ -364,13 +364,13 @@
9833 int cpu = cpu_logical_map(bind_cpu);
9836 - current->nice = 19;
9837 + set_user_nice(current, 19);
9838 sigfillset(&current->blocked);
9840 /* Migrate to the right CPU */
9841 - current->cpus_allowed = 1UL << cpu;
9842 - while (smp_processor_id() != cpu)
9844 + set_cpus_allowed(current, 1UL << cpu);
9848 sprintf(current->comm, "ksoftirqd_CPU%d", bind_cpu);
9854 -static __init int spawn_ksoftirqd(void)
9855 +__init int spawn_ksoftirqd(void)
9859 diff -urN linux-2.4.20/kernel/sys.c linux-2.4.20-o1/kernel/sys.c
9860 --- linux-2.4.20/kernel/sys.c Sat Aug 3 02:39:46 2002
9861 +++ linux-2.4.20-o1/kernel/sys.c Wed Mar 12 00:41:43 2003
9862 @@ -220,10 +220,10 @@
9864 if (error == -ESRCH)
9866 - if (niceval < p->nice && !capable(CAP_SYS_NICE))
9867 + if (niceval < task_nice(p) && !capable(CAP_SYS_NICE))
9870 - p->nice = niceval;
9871 + set_user_nice(p, niceval);
9873 read_unlock(&tasklist_lock);
9877 if (!proc_sel(p, which, who))
9879 - niceval = 20 - p->nice;
9880 + niceval = 20 - task_nice(p);
9881 if (niceval > retval)
9884 diff -urN linux-2.4.20/kernel/timer.c linux-2.4.20-o1/kernel/timer.c
9885 --- linux-2.4.20/kernel/timer.c Fri Nov 29 00:53:15 2002
9886 +++ linux-2.4.20-o1/kernel/timer.c Wed Mar 12 00:41:43 2003
9889 #include <asm/uaccess.h>
9891 +struct kernel_stat kstat;
9894 * Timekeeping variables
9896 @@ -598,25 +600,7 @@
9897 int cpu = smp_processor_id(), system = user_tick ^ 1;
9899 update_one_process(p, user_tick, system, cpu);
9901 - if (--p->counter <= 0) {
9904 - * SCHED_FIFO is priority preemption, so this is
9905 - * not the place to decide whether to reschedule a
9906 - * SCHED_FIFO task or not - Bhavesh Davda
9908 - if (p->policy != SCHED_FIFO) {
9909 - p->need_resched = 1;
9913 - kstat.per_cpu_nice[cpu] += user_tick;
9915 - kstat.per_cpu_user[cpu] += user_tick;
9916 - kstat.per_cpu_system[cpu] += system;
9917 - } else if (local_bh_count(cpu) || local_irq_count(cpu) > 1)
9918 - kstat.per_cpu_system[cpu] += system;
9919 + scheduler_tick(user_tick, system);
9923 @@ -624,17 +608,7 @@
9925 static unsigned long count_active_tasks(void)
9927 - struct task_struct *p;
9928 - unsigned long nr = 0;
9930 - read_lock(&tasklist_lock);
9931 - for_each_task(p) {
9932 - if ((p->state == TASK_RUNNING ||
9933 - (p->state & TASK_UNINTERRUPTIBLE)))
9936 - read_unlock(&tasklist_lock);
9938 + return (nr_running() + nr_uninterruptible()) * FIXED_1;
9942 @@ -827,6 +801,89 @@
9946 +static void process_timeout(unsigned long __data)
9948 + wake_up_process((task_t *)__data);
9952 + * schedule_timeout - sleep until timeout
9953 + * @timeout: timeout value in jiffies
9955 + * Make the current task sleep until @timeout jiffies have
9956 + * elapsed. The routine will return immediately unless
9957 + * the current task state has been set (see set_current_state()).
9959 + * You can set the task state as follows -
9961 + * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
9962 + * pass before the routine returns. The routine will return 0
9964 + * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
9965 + * delivered to the current task. In this case the remaining time
9966 + * in jiffies will be returned, or 0 if the timer expired in time
9968 + * The current task state is guaranteed to be TASK_RUNNING when this
9969 + * routine returns.
9971 + * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
9972 + * the CPU away without a bound on the timeout. In this case the return
9973 + * value will be %MAX_SCHEDULE_TIMEOUT.
9975 + * In all cases the return value is guaranteed to be non-negative.
9977 +signed long schedule_timeout(signed long timeout)
9979 + struct timer_list timer;
9980 + unsigned long expire;
9984 + case MAX_SCHEDULE_TIMEOUT:
9986 + * These two special cases are useful for the caller's
9987 + * convenience. Nothing more. We could take
9988 + * MAX_SCHEDULE_TIMEOUT from one of the negative values
9989 + * but I'd like to return a valid offset (>=0) to allow
9990 + * the caller to do everything it wants with the retval.
9996 + * Another bit of paranoia. Note that the retval will be
9997 + * 0 since no piece of the kernel is supposed to check
9998 + * for a negative retval of schedule_timeout() (since it
9999 + * should never happen anyway). You just have the printk()
10000 + * that will tell you if something has gone wrong and where.
10004 + printk(KERN_ERR "schedule_timeout: wrong timeout "
10005 + "value %lx from %p\n", timeout,
10006 + __builtin_return_address(0));
10007 + current->state = TASK_RUNNING;
10012 + expire = timeout + jiffies;
10014 + init_timer(&timer);
10015 + timer.expires = expire;
10016 + timer.data = (unsigned long) current;
10017 + timer.function = process_timeout;
10019 + add_timer(&timer);
10021 + del_timer_sync(&timer);
10023 + timeout = expire - jiffies;
10026 + return timeout < 0 ? 0 : timeout;
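A minimal usage sketch of the interface documented above (kernel context assumed; the printk is illustrative):

        signed long remaining;

        set_current_state(TASK_INTERRUPTIBLE);
        remaining = schedule_timeout(HZ);       /* sleep up to one second */
        if (remaining)
                printk("woken %ld jiffies early\n", remaining);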
10029 /* Thread ID - the internal kernel "pid" */
10030 asmlinkage long sys_gettid(void)
10032 @@ -873,4 +930,3 @@
10037 diff -urN linux-2.4.20/mm/oom_kill.c linux-2.4.20-o1/mm/oom_kill.c
10038 --- linux-2.4.20/mm/oom_kill.c Fri Nov 29 00:53:15 2002
10039 +++ linux-2.4.20-o1/mm/oom_kill.c Wed Mar 12 00:41:43 2003
10041 * Niced processes are most likely less important, so double
10042 * their badness points.
10045 + if (task_nice(p) > 0)
10049 @@ -146,7 +146,7 @@
10050 * all the memory it needs. That way it should be able to
10051 * exit() and clear out its resources quickly...
10053 - p->counter = 5 * HZ;
10054 + p->time_slice = HZ;
10055 p->flags |= PF_MEMALLOC | PF_MEMDIE;
10057 /* This process has hardware access, be more careful. */
10058 diff -urN linux-2.4.20/net/bluetooth/bnep/core.c linux-2.4.20-o1/net/bluetooth/bnep/core.c
10059 --- linux-2.4.20/net/bluetooth/bnep/core.c Fri Nov 29 00:53:15 2002
10060 +++ linux-2.4.20-o1/net/bluetooth/bnep/core.c Wed Mar 12 00:41:43 2003
10061 @@ -458,7 +458,7 @@
10062 sigfillset(&current->blocked);
10063 flush_signals(current);
10065 - current->nice = -15;
10066 + set_user_nice(current, -15);
10070 --- linux-2.4.22-smp/net/bluetooth/cmtp/core.c.orig Sat Sep 20 22:21:20 2003
10071 +++ linux-2.4.22-smp/net/bluetooth/cmtp/core.c Sat Sep 20 22:22:04 2003
10072 @@ -298,7 +298,7 @@
10073 sigfillset(¤t->blocked);
10074 flush_signals(current);
10076 - current->nice = -15;
10077 + set_user_nice(current, -15);