Per backing device dirty data writeback
Per backing device dirty data writeback replaces pdflush driven writeback in an attempt to speed up this operation.
The current 2.6 kernels use a pdflush driven approach to writing out dirty data. pdflush is a thread pool implementation that by default has anywhere from 2 to 8 threads running in the system. Each pdflush thread can work on a number of devices; however, only one pdflush thread can be active against a specific device at a time.
This approach worked well and was a big step up from the 2.4 days, where we had a single bdflush working all devices. There are, however, some problems with our current setup. pdflush has to work in non-blocking mode since it handles multiple devices, which can cause request starvation against a particular device since it cannot afford to wait for request allocation. There have also been reports of really fast devices outrunning pdflush: more than one thread would be needed to keep them running at full speed, which is not possible with the current design.
I have implemented a new design for flushing dirty data, in which dirty inodes and flushing are tracked on a per-backing device basis. This both reduces locking scope and improves locality by keeping the flushing local to one thread (or a set of threads). Keeping the dirty inode list local to the device instead of per-superblock also reduces the amount of scanning we have to do. No request starvation can happen with this design, as local threads are allowed to block on a single device.
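To make the per-device tracking concrete, here is a minimal userspace sketch in Python. It is a toy model, not kernel code: the `BackingDevice` class, its `mark_dirty` and `flush` methods, and the integer inode numbers are all hypothetical names chosen for illustration. The point it shows is that flushing one device only ever walks that device's own list and never touches another device's dirty inodes.

```python
from collections import deque


class BackingDevice:
    """Toy stand-in for a backing device that owns its dirty inode list.

    Hypothetical model for illustration only; not the kernel's data structure.
    """

    def __init__(self, name):
        self.name = name
        self.dirty = deque()  # dirty inodes queued for this device only

    def mark_dirty(self, inode):
        # Queue the inode once; re-dirtying an already queued inode is a no-op.
        if inode not in self.dirty:
            self.dirty.append(inode)

    def flush(self):
        # Write back everything queued on this device, oldest first.
        # No other device's list is scanned or locked.
        written = list(self.dirty)
        self.dirty.clear()
        return written


sda = BackingDevice("sda")
sdb = BackingDevice("sdb")
sda.mark_dirty(11)
sda.mark_dirty(12)
sdb.mark_dirty(21)

flushed = sda.flush()  # only sda's inodes are visited
```

In this model a flusher for `sda` could block on `sda` without delaying `sdb`, which is the property the per-device design relies on to avoid request starvation.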
An experimental feature of this patch set is the ability to have multiple threads per backing device, with the file system directing the placement of inodes. Involvement from parties that suffer from pdflush being too slow for a single device will be required to finalize this.
And finally, the new approach is a lot more flexible. Threads are created lazily and exit if no work has happened for a period of time. So it should adapt better at both ends of the spectrum, having zero threads active on a writeback-idle system and scaling to more threads than pdflush should that be needed.
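The lazy thread lifetime described above can be sketched in userspace Python as follows. This is an illustrative model under stated assumptions, not the patch set's implementation: the `LazyFlusher` class, its `idle_timeout` parameter, and the string work items are hypothetical. A worker thread is started only when work first arrives, and it exits on its own once the queue has stayed empty for the timeout.

```python
import queue
import threading


class LazyFlusher:
    """Toy model of a flusher thread that is created on demand and exits when idle.

    Hypothetical names throughout; a sketch of the idea, not kernel code.
    """

    def __init__(self, idle_timeout=0.2):
        self.idle_timeout = idle_timeout  # seconds of idleness before the thread exits
        self.work = queue.Queue()
        self.lock = threading.Lock()      # guards thread creation and exit
        self.thread = None
        self.done = []                    # items "written back", in order

    def submit(self, item):
        # Queue the work and start a thread only if none is running.
        with self.lock:
            self.work.put(item)
            if self.thread is None:
                self.thread = threading.Thread(target=self._run, daemon=True)
                self.thread.start()

    def _run(self):
        while True:
            try:
                item = self.work.get(timeout=self.idle_timeout)
            except queue.Empty:
                with self.lock:
                    # Re-check under the lock so work submitted during the
                    # timeout is not stranded without a thread.
                    if self.work.empty():
                        self.thread = None
                        return
                continue
            self.done.append(item)  # stand-in for the actual writeback
            self.work.task_done()

    def active(self):
        with self.lock:
            return self.thread is not None
```

An idle `LazyFlusher` holds no thread at all; calling `submit()` after a quiet period simply starts a fresh one. Scaling to several threads per device would need one such object per thread, which this sketch does not attempt.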
storage, linux, kernel, writeback
Linux kernel developer working for Oracle. I work mostly on block IO and storage related things. I’m the maintainer of the Linux block IO layer.