Comparative tests to determine the effect of fragmentation on extent access

Premise:

To determine what degree of effect discontiguous extents (fragmented segments) have on full table scan performance. To do so, I established a fresh, unused 1G tablespace and created in it a 500M table in different configurations. This table was populated with 5,810,041 rows.

Basic configuration:

SPARCStation 5 170 MHz HyperSPARC, 128M memory 512 swap, Solaris 2.7, Oracle 7.3.4 EE with no other significant processes running. Disk is Seagate Barracuda 8G 7200rpm with Oracle software on it and the entire database on one disk. This configuration was chosen because it is

  1. slow enough to magnify significant differences in system performance,
  2. isolated from other activities that might influence the test, and
  3. available.

The database is a pretty normal one, created for testing but with a new catalogue. Block size is 2048. For the painfully curious, these are the full params.

Equi-spaced Fragmented segment configuration:

7,490 extents of 70k each with each extent positioned 70k away from any other extent by interleafing a second dummy table with alternating extent allocation. This is maximal fragmentation.

Randomly-spaced Fragmented segment configuration:

7,490 extents of 70k each with each extent positioned random(0 .. 6635) blocks away from its logical neighbour. This was produced by alternately allocating an extent for the test table and allocating an extent of some random size for the filler table, then deallocating the filler table extents and coalescing the tablespace. Repeat 4790 times.

The distribution acheived is about that of a badly managed table, where about half of the extents are contiguous, and the other half are widely distributed. A breakdown of the extent offsets is available here.

Contiguous segment configuration:

7,490 extents of 70k with each extent adjacent to its logically adjacent extent.

Single-extent configuration:

Single extent of 524300k.

In all configurations the pctfree nd all other parameters were left to default.

In each case, SQL*Loader was used to load directly from a perl script which generated rows like this:

"$n++","aaaaaaaaaaaaaaaaaaaaaa","12042001","01012001","12","34","Why am I doing this?"

where col 1 is a sequence number, 2 is a varchar2, 3 and 4 are dates, 5 and 6 are numbers, and 7 is a varchar2.

I then changed the setting of db_file_multiblock_read_count initially to 35 (because that would be a whole extent read and therefore maximal performance on calls to pread) then to the more conventional settings of 32, 16, and deafult 8.

After each change of db_file_multiblock_read_count, the database was restarted and then the test began.

Each test consisted of six executions of the following query [1]:

select count(*) from multi_frag 

The first result was discarded and the remaining five executions were timed using the Time::HiRes perl module to get good millisecond accuracy. All six tests were performed with no activity in between, and within the same session.

The results

Times in seconds. mbrc is db_file_multiblock_read_count.
mbrc Equi-
Fragmented
Random-
Fragmented
Contiguous Single-extent
50 Max 113.0560150 145.8112080
Avg 111.9193334 145.6063335
Min 111.0810420 145.4509680
48 Max 114.9446550 147.5079680
Avg 113.6529900 146.6654857
Min 112.8780010 145.7271010
35 Max 118.6068630 113.6115420 145.9530180
Avg 117.0498126 112.1446948 145.5263648
Min 115.5066920 109.3606230 145.0670800
32 Max 130.7510110 116.4618670 145.3982130 145.1209010
Avg 120.0842376 115.8266760 105.2245420 141.3603106
Min 115.9592350 114.7451980 88.4458040 136.7923820
24 116.3248760 135.5023560
116.1398850 134.7885750
115.8248950 134.0072550
16 Max 328.7927530 116.4866980 87.5720500 88.8285000
Avg 239.4676784 114.8729186 86.7090202 87.4721010
Min 118.1331340 113.8363470 85.6402580 86.9118490
8 Max 343.6603230 117.6407090 88.0483660 90.3987690
Avg 247.4077822 116.7668960 87.6826790 89.3174622
Min 159.6123270 115.7156940 87.2905530 86.0655490

During the initial discarded run of each bank of test, I ran truss against the dedicated server. In each of the fragmented and contiguous tests, a number of pread calls were followed by a semctl call presumably to another bg process (pmon?). This semctl call coincides with the beginning of a new extent being read. In the case of Single-extent, no calls were made other than preads and a single write (pipe return?).

I won't make any detailed comment on the coniguous vs. single-extent results here. I will though point out that the 'single-extent is the true path' brigade have something to think about when at 7,490 contiguous extents beats single-extent in every execution except one.

The fascinating part to me is the relatively good performance of the outrageously fragmented configuration. At best it is faster than contiguous. But when mbrc is lowered, fragmented falls off terribly while contiguous gets consistently better to the apex (16 blocks = 32k, the best read for my setup).

More work required?

In due course I will run more trials with a better analysis of the underlying system calls, and update the experiment to 8.1.7 as a server comes available.

Finally

If you are mad enough or bored enough to want to replicate this test, I have written a couple of fairly straight-forward tools to make it easier, particularly loading 5.8 million rows of fake data and creating fragmented tables (I'm not sure how much use that is). I may upload them here, or I may not. If you want them though, merely email and ask for them.

In future, all these technical disucssions are to be moved to SQL.org.uk where there are free discussion lists as well.

Paul Miller



Footnotes:

1.For those of you that have asked, yes this does a full table scan. There are no indexes on the table, and Oracle can only full scan it to count it. back

© 2001 Carib Data Limited.