Linux Kernel Scheduler

The point of this article is to give you a brief on how the Linux scheduler works.

The Linux kernel scheduler may seem complex initially, partly due to its modular nature. It allows multiple scheduling algorithms to coexist to manage different classes of processes, known as scheduler classes. In this article, we'll focus on two key schedulers: the Completely Fair Scheduler (CFS) and the Real-Time Scheduler. We'll also explore a potential replacement for the CFS scheduler after its 15-year reign.

Completely Fair Scheduler (CFS)

The CFS aims to ensure that every process in the system receives a fair share of CPU time. It applies to processes in the SCHED_NORMAL, SCHED_BATCH, and SCHED_IDLE classes. Here's an overview of how CFS works:

CFS uses a Red-Black (RB) tree where each node represents a task ready to run.
The leftmost node in the RB tree represents the task that has received the least CPU time.
The kernel selects the leftmost task and runs it for a predefined time slice.
If the task isn't completed within this time slice, it's reinserted into the RB tree; otherwise, it's discarded or exited.
The RB tree rebalances itself to find the next task that has gotten the least CPU time.
The complexity of this algorithm is O(log n) due to the nature of RB trees.

Real-Time Scheduler

Tasks in the Real-Time Scheduler have potential weights that determine their priority. The scheduler operates in two classes: SCHED_RR and SCHED_FIFO.

SCHED_FIFO tasks run until they complete or yield.
SCHED_RR tasks are run in a round-robin fashion for a maximum slice time.

SCHED_RR and SCHED_FIFO tasks have higher priority than SCHED_NORMAL and SCHED_BATCH tasks, and they can preempt lower-priority tasks.

Global Scheduling Policy and Multi-core Systems

Almost all chips today are multicore. In a multi-core system, we also have a global scheduling policy that determines how tasks are distributed among the available CPUs to make the most efficient use of the system's resources.

Tasks are assigned to CPUs based on load balancing algorithms that aim to distribute tasks evenly across all available CPUs. The CFS constantly monitors the CPU usage of each task and the overall system load to make decisions about task placement. When a new task is created or an existing task needs to be rescheduled, the CFS uses its load balancing algorithm to determine which CPU should run the task. This algorithm considers factors such as the current CPU load, the CPU affinity of the task (if specified), and the historical CPU usage of the task.

By distributing tasks evenly across all CPUs, the global scheduling policy helps to maximize the overall throughput of the system and ensure that no CPU is over or under-utilized. This approach improves the overall performance and responsiveness of the system, especially in multi-core systems where efficient task distribution is critical.

Scheduler Implementation

These schedulers operate per CPU, with each CPU managing processes under various algorithms. The “structure rq” structure contains data structures for managing these, such as struct rt_rq and struct cfs_rq. You can find these data structures in the linux/sched.h header file.

Potential Replacement: Earliest Eligible Virtual Deadline First (EEVDF) Scheduler

After 15 years, there's a potential replacement for the CFS scheduler in the form of the EEVDF scheduler. This new scheduler aims to further improve upon the fairness and efficiency of task scheduling in the Linux kernel. It promises better performance and fairness by leveraging virtual deadlines to manage task scheduling more effectively.

This revised version includes the necessary corrections and additional details for clarity and completeness.