Sunday, 6 October 2013

Basic Compression makes null column consideration irrelevant?

Basic Compression could negate the effects of frequently null columns scattered in a row.

OakTable World 2013 started with Tim Gorman emphasising one of the fundamental aspects of table design: If you've got columns that frequently contain null values, simply put them towards the end of the row.

But I did an experiment which seems to suggest that although the theory above would manifest itself perfectly on non-compressed tables, on tables with Basic Compression the situation is different.

The test was done on a single node 11g (, with 32K block size; one heap table, not partitioned/clustered.

The disputed table had 190 columns (telecomm data), 119 of them are completely populated. From the rest, 38 columns between 46-99 percent null, and 33 columns are completely null.

Initially (no compression) the table occupied 7,340,032 bytes (Based on USER_SEGMENTS) on 224 blocks.

When the less frequently populated columns were moved to the end of the row (and stats gathered), the size of the segment dropped to 4,194,304 bytes on 128 blocks; The reduction in size was expected, but the extent of reduction was surprising for me.

But the more surprising result appeared when both tables were compressed; I still expected that the sizes to be different, but both tables had 3,145,728 Bytes on 96 Blocks!

I'd investigate the reason for this as soon as possible, but I'd really appreciate it if you'd enlighten me, if you know the reason. :)

p.s.: the code for the operation is not in the post because: 1) I was trying as hard as I can to have the first post on the blog, and 2) The operation was not extremely complicated. Still, I'd do my best to attach a dump of the data and the code.

No comments:

Post a comment