4K Sectors and ZFS

Tags: Delphix Engineering

For over 30 years, hard drives have designated the smallest storage location as 512 bytes. In January 2011, all major hard drive manufacturers began shipping their hard drive platforms using a new standard called Advanced Format, which uses 4K physical sectors. To aid in the transition, these new hard drives provide a 512-byte emulation mode, known as Advanced Format 512e, that allows the drives to advertise themselves as 512-byte addressable devices. This can severely impact write performance, because any misaligned or partial write forces the drive to perform a read-modify-write operation.
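To make the alignment problem concrete, here is a minimal sketch of the arithmetic involved. The function name needs_rmw is made up for illustration, and the 4096-byte default is an assumption about the drive's real sector size. A write avoids read-modify-write only when both its starting offset and its length are multiples of the physical sector.

```shell
# Report whether a write would need read-modify-write on a drive whose
# physical sector is larger than the 512-byte sector it advertises.
# usage: needs_rmw OFFSET_BYTES LENGTH_BYTES [PHYSICAL_SECTOR_BYTES]
# (needs_rmw is a hypothetical helper, not part of any illumos tool)
needs_rmw() {
    offset=$1
    length=$2
    phys=${3:-4096}
    if [ $((offset % phys)) -ne 0 ] || [ $((length % phys)) -ne 0 ]; then
        echo yes
    else
        echo no
    fi
}

needs_rmw 0 4096      # aligned start, whole physical sector
needs_rmw 512 4096    # misaligned start, straddles two physical sectors
needs_rmw 0 512       # partial write, one eighth of a physical sector
```

Only the first call reports "no"; the other two are exactly the misaligned and partial writes that a 512e drive has to absorb internally.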

The problem is not limited to physical hardware. Other storage platforms may also provide LUNs (logical unit numbers) that present themselves as 512-byte addressable devices when, in fact, they use a 4K sector size internally. Although ZFS has built-in support for 4K sectors, it has no automatic way of dealing with the lies that storage devices tell. In October 2012, I had the opportunity to present at the first ever ZFS Day and talk about the challenges that have affected ZFS, especially when devices lie (slides and video).

I wanted to go into more detail about what Illumos has done to work around the storage lies. I should mention that hard drive manufacturers are finally getting it right. Newer versions of the AF 512e drives advertise their physical sector size correctly, but there are still drives and storage platforms that continue to lie about their physical characteristics.

Illumos has had a way to work around this for some time. Users of OpenSolaris and Illumos-based distros had to solve this problem long before I came along, and to their credit their solution paved the way to "Ease the Suffering" of early AF 512e adoption. That solution, however, had challenges (as I think all solutions do). The initial solution allowed you to create a ZFS pool and override the ashift property, forcing ZFS to use 4K sector sizes. One of the challenges with previous solutions was that you could not add multiple devices with mixed sector sizes (e.g. zpool add tank <512 byte disk> <4K disk> ...).
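For readers who have not met ashift before: it is the base-2 logarithm of the sector size ZFS uses for a vdev, so 512-byte sectors give 9 and 4K sectors give 12. A small sketch of that relationship follows; the helper name ashift_for is hypothetical and exists only for this illustration.

```shell
# Print the ashift (log2 of the sector size) for a power-of-two sector size.
# ashift_for is a made-up helper for illustration, not a ZFS command.
ashift_for() {
    size=$1
    shift_bits=0
    # Reject zero, negative values and anything that is not a power of two.
    if [ "$size" -le 0 ] || [ $((size & (size - 1))) -ne 0 ]; then
        echo "not a power of two: $size" >&2
        return 1
    fi
    # Count how many times the size can be halved before reaching 1.
    while [ "$size" -gt 1 ]; do
        size=$((size >> 1))
        shift_bits=$((shift_bits + 1))
    done
    echo "$shift_bits"
}

ashift_for 512     # prints 9
ashift_for 4096    # prints 12
```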

To address this shortcoming I took a different approach that allows the administrator to define the product and vendor IDs for drives they know are not properly advertising their physical sector size. A great write-up on its usage can be found here. One challenge with all of these approaches is dealing with the initial installation of the operating system. You could build a custom distro that ships with one of these override methods on the ISO image. But if you don't want to get into the distro business, here's an alternative approach.

First, make sure you're using a distro that contains the fix for '2665 sd.conf should be able to override physical-block-size':

commit 2384d9f8fcca0a7ef8b3ae674d94df82832c0fce
Author: George Wilson <gwilson@delphix.com>
Date:   Thu May 3 05:49:19 2012 -0700

    2665 sd.conf should be able to override physical-block-size
    2671 zpool import should not fail if vdev ashift has increased
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Reviewed by: Eric Schrock <eric.schrock@delphix.com>
    Reviewed by: Richard Elling <richard.elling@richardelling.com>
    Reviewed by: Gordon Ross <gwr@nexenta.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>

I'm using OpenIndiana 151a5 Desktop Edition for this example. The first thing you'll need to do is determine the Vendor and Product ID for the device you're installing onto. A simple way to do this is to run 'format', which lists each disk along with its vendor and product strings.

From here we can use the Vendor and Product information to populate the /kernel/drv/sd.conf file. Here's what mine looks like:

sd-config-list =
 "VMware, VMware Virtual S", "physical-block-size:4096";

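If you would rather generate the entry than type it by hand, here is a rough sketch. It assumes the vendor ID is left-justified and space-padded to eight characters (the width of the SCSI INQUIRY vendor field) before the product ID is appended, which is consistent with the entry shown above. The function name sd_config_entry is hypothetical.

```shell
# Build an sd-config-list entry from a disk's vendor and product strings.
# Assumption: the vendor ID is space-padded to 8 characters and the
# product ID follows immediately. sd_config_entry is a made-up helper.
sd_config_entry() {
    vendor=$1
    product=$2
    block_size=${3:-4096}
    printf 'sd-config-list =\n "%-8s%s", "physical-block-size:%s";\n' \
        "$vendor" "$product" "$block_size"
}

# Reproduces the entry used in this post.
sd_config_entry "VMware," "VMware Virtual S" 4096
```

The padding matters for short vendor strings: a three-character vendor ID needs five trailing spaces before the product ID begins.
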
Now we must update the running kernel with the changes we just made:

# update_drv -vf sd
Cannot unload module: sd
Will be unloaded upon reboot.
Forcing update of sd.conf
sd.conf updated in the kernel.

Don't worry about the errors; we only care that sd.conf was re-read by the kernel. Unfortunately, this new setting will not take effect as long as the current disk is attached, so we must force it to detach and reattach.
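A hedged sketch of that detach and reattach step is below. The attachment point c2::dsk/c2t0d0 is purely hypothetical; take the real one from the output of 'cfgadm -al' on your system. The wrapper name reattach_disk and the DRY_RUN switch are also my own inventions so the commands can be previewed before anything is touched.

```shell
# Detach and re-attach a disk so the sd driver re-reads its properties.
# AP_ID comes from `cfgadm -al`; the one used below is hypothetical.
# Set DRY_RUN=1 to print the commands instead of running them.
reattach_disk() {
    ap_id=$1
    run=
    [ "${DRY_RUN:-0}" = 1 ] && run=echo
    # Only reconfigure if the unconfigure step succeeded.
    $run cfgadm -c unconfigure "$ap_id" &&
    $run cfgadm -c configure "$ap_id"
}

# Preview the two commands for a made-up attachment point.
DRY_RUN=1 reattach_disk c2::dsk/c2t0d0
```

Run it without DRY_RUN only from the installer environment, where the target disk is not yet in use.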

The trick here is to unconfigure the device using 'cfgadm' and then reconfigure it. This forces the sd driver to re-attach the device with the new physical-block-size. Once the install has started, you can check that the pool has the correct ashift property:
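One way to do that check, sketched under a couple of assumptions: that the installer names the pool rpool, and that zdb can see the pool's configuration at that point. The small filter below (ashift_from_config is a made-up name) pulls the ashift values out of 'zdb -C' style text. The sample line fed to it is illustrative input, not captured output.

```shell
# Extract ashift values from `zdb -C`-style pool configuration on stdin.
# ashift_from_config is a hypothetical helper for this post.
ashift_from_config() {
    awk '$1 == "ashift:" { print $2 }'
}

# On the system being installed (pool name rpool is an assumption):
#   zdb -C rpool | ashift_from_config
# Illustrative input only:
printf '            ashift: 12\n' | ashift_from_config
```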

That's it! This shows that the disk correctly advertised itself to ZFS as a 4K (2^12) sector device. We can just let the installation continue and we're done.