Oracle 10g Read Performance
When running Oracle 10g databases on Red Hat Linux 5.5 with datafiles mounted over NFSv3, we saw suboptimal read performance: db file scattered read requests were showing high I/O latency. The latency at the disk was under 1 ms, and the NFS server showed similar latency; the NFS client saw that same latency plus the network round trip. Yet Oracle was seeing latency many times what the NFS client was serving, often 20+ ms. Oracle relies on two types of operations for reading large chunks of data: Direct Path Reads (DPR) and Scattered Reads (SCAT).
Scattered read requests are issued by Oracle to bring large chunks of data into the buffer cache, on the assumption that the data will be re-used or is not large enough to justify bypassing the cache. Direct path reads do not bring data into the cache and are designed for data that Oracle believes will not be re-used. If you observe performance issues on your Oracle 10g databases caused specifically by db file scattered read requests showing high I/O latency, you could be affected by this bug. The size of these reads is determined by the following two parameters: db_block_size and db_file_multiblock_read_count.
The number of blocks requested for each multi-block read is determined by db_file_multiblock_read_count, and each block is db_block_size bytes. So, for example, with a db_block_size of 8 KB and a db_file_multiblock_read_count of 16, multi-block read requests will be 128 KB. The maximum value allowed for db_file_multiblock_read_count is 128 blocks, so for a typical Oracle installation with an 8 KB block size, multi-block reads can be as large as 1 MB.
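The arithmetic can be sketched as an init.ora fragment (the values shown are illustrative, not taken from any particular database):

```
# init.ora fragment -- illustrative values only
db_block_size                 = 8192   # 8 KB per block
db_file_multiblock_read_count = 16     # blocks per multi-block read
# effective multi-block read size = 8192 x 16 = 131072 bytes (128 KB)
```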
We noticed that, for Oracle 10g databases running on Red Hat Linux 5.5 with datafiles mounted over NFS, a multi-block SCAT read request issued by Oracle is broken into individual block reads by the time it reaches the NFS server. Oracle uses the readv(2) system call with a vector of blocks to be fetched into the SGA. The blocks are typically contiguous on disk but non-contiguous in memory, so readv(2) is designed to perform a scatter-gather operation and fetch all the blocks in a single read request.
In Red Hat Linux 5.5, we observed that the readv(2) system call issues an individual request for each entry in the block vector, so the break-up of the single large read into multiple small block reads was occurring in the Linux OS. Even when the NFS server latency for the individual 8 KB reads is low, the network and RPC latency for each request accumulates, making the total response time seen by Oracle significantly higher than the actual disk read latency.
For example, in our lab tests, even though NFS server responses never exceeded 1 ms, the Oracle latency for SCAT requests reached 20+ ms. The following Linux bug reports discuss this issue in Red Hat Linux 5.5 and the fixes made earlier this year.
We verified that this issue is caused by Linux, not by the NFS client or Oracle 10g. Running the same environment with an Oracle 10g database on Red Hat Linux 6.4, we observed that 128 KB SCAT requests were issued as-is by the NFS client to the NFS server. We saw an almost 20x improvement in overall latency, as the 128 KB reads were serviced by the NFS server within 1 ms.
So the problem is in Linux's implementation of the readv(2) system call, which appears to be fixed in Red Hat Linux 6.4. If you are observing high I/O waits on Oracle caused by the latency of db file scattered read requests, and you are running Oracle 10g on Red Hat Linux 5.5, you may be running into this issue. Note that direct path reads are not affected by this bug.
There are a few workarounds, listed in order of preference. First, upgrade to Red Hat Linux 6.4 or newer; we confirmed that the issue is resolved in that release. Second, force multi-block reads to use direct path rather than scattered reads by setting "_serial_direct_read" to TRUE. This forces all serial scans that exceed the small table threshold to use direct path reads into the PGA, bypassing the buffer cache and avoiding the readv(2) issue with scattered reads.
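A sketch of that parameter change, assuming it is made persistently via the spfile (underscore parameters are hidden; change them only with guidance from Oracle Support):

```
-- force serial scans above the small table threshold to direct path reads
ALTER SYSTEM SET "_serial_direct_read" = TRUE SCOPE=SPFILE;
-- restart the instance for the change to take effect
```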
Parallel reads are already issued as direct path reads, and this is the default behavior in Oracle 11g, so this workaround makes 10g databases use the 11g semantics. Third, change the filesystemio_options init parameter to "ASYNCH". The issue manifests only when DIRECT_IO is used on the NFS mounts; Oracle normally recommends "SETALL" to take advantage of both asynchronous I/O and DIRECT_IO, so giving up DIRECT_IO mitigates the problem.
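A sketch of the third workaround, again assuming the change is made via the spfile (filesystemio_options is a static parameter, so an instance restart is required):

```
-- keep asynchronous I/O but give up direct I/O on the NFS mounts
ALTER SYSTEM SET filesystemio_options = ASYNCH SCOPE=SPFILE;
```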
The downside of not using DIRECT_IO is caching at the NFS client: data may be double buffered, both in the OS buffers and in Oracle's SGA, which costs additional host CPU to copy data between the buffers. On the other hand, if the Linux host running Oracle has sufficient CPU and memory outside the SGA, this extra OS-level cache can actually improve performance, since it is greedily allocated to the database's working set.
In summary, we discovered a bug in older versions of Linux (Red Hat Linux 5.5) that was causing read performance issues for Oracle 10g databases under certain conditions. We presented three possible workarounds to mitigate the issue, each with its own merits; users should evaluate the options and choose the one best suited to their environment.