Wednesday 25 December 2013

IMPDP/EXPDP, Virtual Columns, and user-defined functions

If you have virtual columns based on user-defined functions in a schema, you cannot use IMPDP/EXPDP for that schema; not that easily, at least.

In this post I lay out a test scenario and propose a successfully tested work-around.

Required material for the scenario:

- A schema named VCOL_TEST, containing:
  - A function named VCOL_FUNCTION
  - A table named VCOL_TABLE

The script below can be used to create such a schema:

create user vcol_test identified by vcol_test;

grant   create session,
        select any dictionary,
        create procedure,
        create table,
        debug connect session,
        debug any procedure
to vcol_test;

conn vcol_test/vcol_test;

create or replace function vcol_function(p_parameter number)
return number
deterministic
is
begin
  if p_parameter > 10 then
    return 10;
  else
    return 1;
  end if;
end vcol_function;
/

show errors;

create table vcol_table(id number, vcol as(vcol_function(id)));

exit;
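
For completeness, the export and the remapped import were done along these lines (the directory object, dump file name and credentials are assumptions for the example, not the exact commands I used):

expdp system directory=DP_DIR dumpfile=vcol_test.dmp logfile=vcol_test_exp.log schemas=VCOL_TEST

impdp system directory=DP_DIR dumpfile=vcol_test.dmp logfile=vcol_test_imp.log remap_schema=VCOL_TEST:VCOL_TEST2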

What's the problem?

If you try to use IMPDP to create a "remapped" schema (e.g. VCOL_TEST2) on a different server, you would come across this:

Processing object type SCHEMA_EXPORT/TABLE/TABLE
ORA-39083: Object type TABLE:"VCOL_TEST2"."VCOL_TABLE" failed to create with error:
ORA-00904: "VCOL_TEST"."VCOL_FUNCTION": invalid identifier
Failing sql is:
CREATE TABLE "VCOL_TEST2"."VCOL_TABLE" ("ID" NUMBER, "VCOL" NUMBER GENERATED ALWAYS AS ("VCOL_TEST"."VCOL_FUNCTION"("ID")) VIRTUAL VISIBLE ) SEGMENT CREATION DEFERRED PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING TABLESPACE "USERS"

Processing object type SCHEMA_EXPORT/FUNCTION/ALTER_FUNCTION
Job "SYSTEM"."SYS_IMPORT_SCHEMA_01" completed with 1 error(s)
What's happening here?

Checking the "Failing sql" would manifest the bug: IMPDP is trying to create table VCOL_TEST2.VCOL_TABLE referring to function VCOL_TEST.VCOL_FUNCTION (as opposed to VCOL_TEST2.VCOL_FUNCTION).

i.e. IMPDP is not able to remap the owner of the function used for the virtual column to the new schema name.

What does Oracle say about this?

Officially, nothing.

Although a consulting member of technical staff at Oracle was kind enough to send the following response:

"In releases prior to 11.2.0.4, a table with a virtual column based on the result of a PL/SQL function would not be recreated successfully on Import, even when REMAP_SCHEMA was not used. This is because the necessary PL/SQL function would not have been created before the table was created.

The problem was addressed by modifying Import to create the table with a temporary definition for virtual columns, then creating the function, then modifying the definition of the virtual columns to use the correct function.

The initial fix was in BUG#10186633 and this was later extended in BUG#14506804. The extended fix is also included in certain patch set updates and one-off patches for 11.2.0.2 and 11.2.0.3.


Unfortunately this fix does not address the problem of REMAP_SCHEMA. Even in 11.2.0.4.0 or 12.1.0.1.0, it is still possible to encounter a failure during Import, or a failure during runtime, depending on the privileges of the owner of the target schema and the existence of the source schema.

The work-around

Since re-importing and remapping existing test schemas is part of our daily tasks, I have developed a bash script which follows these steps:

In Export:
1. Create a list of user-defined functions that are used in virtual columns and, in dependency order, use DBMS_METADATA to build a SQL script that can recreate the functions with no reference to the schema name (see the sketch after this list).

2. Create a list of tables whose virtual columns rely on the exported functions, and (using DBMS_METADATA, again) build a schema-independent SQL script to recreate them.

3. You will also need a script that grants the new user/schema quotas on the tablespaces holding the tables in question.
4. Export as usual, using EXPDP.
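
For step 1, the core of the extraction looks something like this (a minimal sketch; the REPLACE of the quoted owner prefix is my simplistic way of making the DDL schema-independent, and the real script also orders the functions by their dependencies):

set long 100000 longchunksize 100000 pagesize 0 linesize 200
begin
  dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'SQLTERMINATOR', true);
end;
/
-- strip the '"VCOL_TEST".' prefix so the script can be replayed under any schema
select replace(dbms_metadata.get_ddl('FUNCTION', 'VCOL_FUNCTION', 'VCOL_TEST'),
               '"VCOL_TEST".') ddl
from dual;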

In Import:
1. Create the new schema, grant all the privileges and quotas according to the source schema.
2. Use the script created in step 1, above, to create the functions.
3. Use the script created in step 2, above, to create the concerning tables.
4. IMPDP everything EXCLUDING the tables mentioned in the previous step. Ignore the "already exists" messages during the process.
5. IMPDP DATA_ONLY for the tables mentioned in the previous step (a sketch of both passes follows below).
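
A sketch of the two Data Pump passes, expressed as parameter files (the directory object and file names are assumptions; EXCLUDE, CONTENT and TABLE_EXISTS_ACTION are standard impdp parameters):

pass1.par (everything except the pre-created tables):

directory=DP_DIR
dumpfile=vcol_test.dmp
remap_schema=VCOL_TEST:VCOL_TEST2
exclude=TABLE:"IN ('VCOL_TABLE')"

pass2.par (rows only, loaded into the pre-created tables):

directory=DP_DIR
dumpfile=vcol_test.dmp
remap_schema=VCOL_TEST:VCOL_TEST2
tables=VCOL_TEST.VCOL_TABLE
content=DATA_ONLY
table_exists_action=APPEND

Each pass is then run with something like "impdp system parfile=pass1.par" (and pass2.par respectively).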

IMPDP is done in two passes because the tables pre-created in step 3 of the import would otherwise raise "already exists" errors, which would prevent their data from being imported.

Note: The method above only works if -like us- the tables in question do not hold any referential constraints to other tables (our "solution" is a data warehouse and these are "fact" tables whose integrity constraints -if any- are enforced by the application).

If they did, you might have to make sure that their "child" tables -if any- are imported after they are.

Final Note

The same problem could occur for function-based indexes as well, and it does not seem to be an issue that Oracle DBAs can overlook.

Virtual columns are extremely helpful, but as Matthias Rogel shows in an example, their implementation has not been without issues.

Fortunately this time, according to an Oracle insider, the Data Pump team is aware of the bug.

There's still hope!

Thursday 7 November 2013

DBA_SEGMENTS Vs. DBA_EXTENTS for calculating data size

If you are dealing with a single-datafile tablespace and you need to know the size of a specific chunk of data, use DBA_SEGMENTS!

Here's a simple experiment on tablespaces with 32K Blocks, on ASM and 11.2.0.3, and PARALLEL_DEGREE_POLICY set to MANUAL:

SELECT /*+ PARALLEL(DE,4) */ SUM(BYTES)
FROM DBA_EXTENTS DE
WHERE TABLESPACE_NAME='ACTIX_DATAPART_0001';
-- Result: 8346468352 IN 162.67 SECs

SELECT /*+ PARALLEL(DS,4) */ SUM(BYTES)
FROM DBA_SEGMENTS DS
WHERE TABLESPACE_NAME='ACTIX_DATAPART_0001';
-- Result: 8346468352 IN 3.687 SECs
The SQL statements were run in exactly the order displayed but, to make sure that caching of the underlying objects (such as the SYS_OBJECTS view or any other SYS schema object) was not contributing to the timings, the shared pool was flushed and the statements were run again in reverse order. Same results.

The tablespace used here has two datafiles, but the same statements were also tried on single-file tablespaces, with the same results.

Obviously DBA_EXTENTS allows you to query by FILE_ID, which -unfortunately- is crucial when you need to know about segments/extents in one particular file. In my case, I need it for a "datafile defragmentation" procedure, which requires the BLOCK_ID of the extents; hence DBA_EXTENTS.
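
As an illustration of why the slower view is still needed, this is roughly the kind of query the defragmentation procedure relies on (a sketch; the file number is a bind variable I've made up for the example):

-- which extents sit highest in a given datafile (they decide how far it can shrink)
SELECT *
FROM  (SELECT OWNER, SEGMENT_NAME, BLOCK_ID, BLOCKS
       FROM   DBA_EXTENTS
       WHERE  FILE_ID = :file_id
       ORDER  BY BLOCK_ID + BLOCKS DESC)
WHERE ROWNUM <= 5;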

Side note: SUM(BYTES) returns the total size of the blocks allocated to the segments, not the actual size of the data; although I'm not sure whether that matters or not!

Post-publish-newbie-Observation: Just noticed that DBA_SEGMENTS is entirely based on SYS.SYS_DBA_SEGS, whose HEADER_FILE column represents FILE_ID.

Also that the "Bytes" column in DBA_EXTENTS/DBA_SEGMENTS is calculated using SYS_DBA_SEGS based in the following (ugly looking, but time saving!) formula:

decode(bitand(segment_flags, 131072), 131072, blocks,
       (decode(bitand(segment_flags, 1), 1,
               dbms_space_admin.segment_number_blocks(tablespace_id, relative_fno,
                   header_block, segment_type_id, buffer_pool_id,
                   segment_flags, segment_objd, blocks),
               blocks))) * blocksize

Regarding the EXTENT_ID: what you need is in sys.x$ktfbue.ktfbuebno; that is, if you need a precise BLOCK_ID. But if -like me, with a 5MB margin for resizing datafiles- a "more or less" precise BLOCK_ID does the job, then your answer lies in SYS.SYS_DBA_SEGS.HEADER_BLOCK.

It is either the same as the BLOCK_ID you'd get from sys.x$ktfbue.ktfbuebno and sys.uet$.block# (that is DBA_EXTENTS.BLOCK_ID) or it's only a mere 2 blocks higher!
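
A quick sanity check (a sketch, run against one of your own test schemas) to see how close HEADER_BLOCK gets to the real first-extent BLOCK_ID:

-- compare the segment header block with the BLOCK_ID of extent 0
-- (non-partitioned segments; good enough for a sanity check)
SELECT S.SEGMENT_NAME, S.HEADER_FILE, S.HEADER_BLOCK, E.BLOCK_ID FIRST_EXTENT_BLOCK
FROM   DBA_SEGMENTS S, DBA_EXTENTS E
WHERE  E.OWNER = S.OWNER
AND    E.SEGMENT_NAME = S.SEGMENT_NAME
AND    E.EXTENT_ID = 0
AND    S.OWNER = USER;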

The king is dead! All hail the king!

Sunday 3 November 2013

Oracle and releasing documentation: Do it, or don't!

Something is fundamentally wrong here!

I was watching a great presentation by Carlos Sierra on Adaptive Cursor Sharing, similar to the one he gave at OakTable 2013, which I attended (yes, I chose Carlos' talk over Jonathan Lewis' on 12c histograms! What are you gonna do about it?!), just to remind myself of the critical points made about the subject.

If you have seen the presentation (around 01:05:45+) or attended his talk, you'll know that there is a certain point in the presentation where EVERYBODY seems to be confused.

The confusion is about how the ranges of row counts that define the three buckets are calculated. Carlos has explicitly acknowledged the issue in his article: "Observations on a test case built on 11.2.0.3 seem to indicate that boundaries for these 3 buckets are: less than 1,000; between 1,000 and 1,000,000; and more than 1,000,000."

Carlos rightly praises OakTable members using a definition that the members -rightly, again- attribute to themselves: Oracle Scientists. But isn't it a bit strange to call people who study the behaviour of a man-made entity scientists?! Would you call the greatest mechanics you know (for example) Ford Scientists?!

And that brings me to my point: I was privileged to be among the members of this group in the summer of 2013, and I know that "scientist" is only one of many titles you could choose to describe them with. But the absurdity of the title arises from the unprofessional behaviour of Oracle itself.

I have absolutely no problem with Oracle keeping its documentation secret; after all, they've spent millions of man-hours developing a capable piece of software. But you have to make a choice here: either -for whatever reason you choose to do so- you release ALL your documentation (the likes of which you can find here), or you don't release any of it.

The fact that professionals like OakTable members have to do research to find out about the behaviour of a man-made product (as opposed to a natural phenomenon) whose creator already possesses the information about its inner workings can only make an independent observer more inclined to judge the creator as "yet another corporation" or -in the best-case scenario- as "not able to provide an association between the size of the company and the quality of its product," when it could instead be regarded as an actual player in the field of data science.

Ambiguities such as the ones around ACS only bolster the position of those who might ask: is Oracle releasing its documentation -the way it has- merely as a PR move?

Sunday 20 October 2013

How to start a career in Oracle at the age of 38

Getting back into IT could be a "bad decision" for me.

After all, I could have kept deceiving myself with the illusion of "being an established journalist in a globally known media organisation," the same as many of my previous colleagues did and still do.

But you cannot be dishonest with yourself; being "naturally" good at something is not the best one can be. There has to be an element of "effort to fulfil a potential" involved.

So with the help of a true friend, I left it all at the age of 38 and started at a telecoms software company as an Oracle Developer/Admin ... well, an "admin-to-be," to be precise!

I have to say that it didn't start out of the blue; I used to be an Oracle developer (8i) just before 2000 and hold an MSc in Database Systems (2009), so it wasn't a field I was completely out of touch with.

But in one sentence: It was extremely hard; so much so that it dragged depression's foot into my life and started to threaten my personal life. And that was despite being blessed with great colleagues and understanding managers.

What made it hard?

1. Myself! Obviously I was trying too hard to make up the lost ground, so I felt obliged to pay attention to EVERY single thing being said around me. Also, coming from the public sector, I was wrongly convinced that I'd be kicked out ASAP if I under-performed.

2. Having extremely talented and intelligent colleagues, and the fact that I wanted to perform like them there and then, even though I had been away from Oracle for a decade.

3. Not getting enough clear communication from my managers about what was expected of me, mainly due to the pressure of -always imminent- release deadlines. I could also blame the first point (above) for not being able to fully take in the communication I did get from my manager, which generally conveyed a message of sympathy.

4. Oracle is not the most straightforward software to deal with. A HUGE number of factors influence its performance, and the presence of mind to pinpoint "the" main one in any given circumstance stems mainly from experience rather than intelligence.

How can I survive?

1. Refrain from setting vague, all-encompassing goals (such as "being a great DBA") for yourself! Start with small, clear goals, coordinated with your manager or senior members of your team.

2. Serialize your tasks. What you need most at the start of this career is focus, and trying to do more than one thing at a time will deprive you of that. Just remember that being focused is not generally easy when you're 38 anyway!

3. Communicate! Communicate! Communicate! Try to clarify what's being asked of you, and find out whether the company has a "preferred" way of doing such a task. Establish the appropriate/expected deadlines explicitly, and finally, ask for feedback, whether it's task by task, day by day, or week by week.

4. As long as you didn't claim in your job interview that you've got "10 years of Oracle DBA experience" when all you've done is install an Oracle database once or twice, you do not need to compete with your more experienced colleagues. Truly experienced Oracle people tend to be humble, decent personalities (I had a great experience at OakTable World 2013 - thanks to Kyle Hailey! - and I shouldn't forget my own colleagues). Ask them to guide you, and they will. In general, it is extremely rare to come across genuinely knowledgeable people who are jerks.

5. Do not expect yourself to know everything about Oracle; in fact, the existence of ORA-07445s tells you that even Oracle doesn't know everything about Oracle! Looking at the manners of the big names in the Oracle world will teach you that even they (some with more than three decades of experience) still regard themselves as students of Oracle.

Be prepared to be a scientist; not a technician!

p.s. Do not forget the great Oracle community. What inspired me to revisit my decision to change career path was a tweet by Jonathan Lewis, who asked:


Those are the questions that keep your brain alive, and remind you that you actually enjoy dealing with this thing the world calls "Oracle."

Thanks Jonathan Lewis.

Wednesday 16 October 2013

How to recreate ASM spfile after dropping the associated diskgroup

Disclaimer: This post lacks any scientific value whatsoever, and no one can hold the writer accountable for that! It merely came to existence just to serve as a warning.

I do feel like a 5-year-old kid; it seems the most efficient way of learning for me has always been making mistakes and correcting them afterwards!

After installing Clusterware for a single-node instance, I noticed I had not created the diskgroups I needed; so out goes the default "DATA" diskgroup (despite Oracle begging me to reconsider, but as usual the FORCE option is ever so tempting) and in come two new diskgroups.

The diskgroups were meant to be used for a duplicate database, so -obviously- when I started the active duplication process, failures began to show.

I noticed that on the recently created ASM instance, "sho parameter dummy" returned a bold, scary "TRUE".

If you're as naive as I am, the rest is for you:

1. Shutdown the "Dummy" ASM instance.
2. Create a pfile, in which you'd explicitly name your existing Diskgroups. (e.g. "*.asm_diskgroups='BACKUP_01','DATA_01'")

The next two steps are here for my own amazement, really, and for you to truly understand that you need to think twice before making silly snap decisions.
3. Start up the ASM instance using the pfile you've just created.
4. Create an spfile from the available pfile.
create spfile from pfile='/home/oracle/initASM.ora';
You should get something like this:

ERROR at line 1:
ORA-00349: failure obtaining block size for
'+DATA/asm/asmparameterfile/registry.253.828460173'

ORA-15001: diskgroup "DATA" does not exist or is not mounted
4. (Again, and using a more logical approach) Create an spfile, in one of the existing diskgroups, from the available pfile.
create spfile='+DATA_01' from pfile='/home/oracle/initASM.ora';
And this time:
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file +DATA_01
ORA-15221: ASM operation requires compatible.asm of 11.2.0.0.0 or higher
I hope you're filled with regret and remorse over your actions of the night before! If not, do refrain from reading the rest of the post! So we go back a step:
3.9. Change the "compatible.asm" attribute -obviously!
alter diskgroup DATA_01 set attribute 'compatible.asm'='11.2.0.0.0';
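If you'd like to see what the attribute is before and after the ALTER, a quick check (nothing more than a sketch) against V$ASM_ATTRIBUTE:
select g.name diskgroup, a.value "compatible.asm"
from   v$asm_diskgroup g, v$asm_attribute a
where  a.group_number = g.group_number
and    a.name = 'compatible.asm';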
4. Create an spfile, in one of the existing diskgroups, from the available pfile.
create spfile='+DATA_01' from pfile='/home/oracle/initASM.ora';
Which would result in:

File created.
You can restart the ASM instance like a normal and decent human being now!

Final Thoughts: My dear God! Why?! Why did you put an ape's brain in my skull?! Do I have to make an attempt to put EVERY rectangular object into round holes to understand that the two shapes are different?! WHY?!

Sunday 13 October 2013

Everything about Hakan Factor

Yes, EVERYTHING! At least that's the aim, prompted by the lack of official Oracle documentation and, in effect, by not having one single document to refer to.

There are, in total, 28 documents on Oracle Support (as of the date this post is published) that reference the Hakan factor, but not once is an all-encompassing definition of this attribute and its effect on Oracle's inner workings provided!

What is Hakan Factor?

Oracle provides its best explanation in Metalink 1389444.1: "The Hakan factor is used to unique map rows in a Bitmap Index to the rows in the base table. This factor is related to the max number of rows that can be stored in the a single block of a table;" although I've noticed that Richard Foote firmly believes the Hakan factor is, in fact, the "maximum number of possible rows that could potentially fit in a block."

Oracle states (same Metalink): "Several factors like the column type and not null constraints influence the Hakan factor;" but those "several factors" are not explicitly mentioned.

When does it come into play, and what's the prognosis?

If you need to exchange partitions of a table that has bitmap indexes defined on it, the Hakan factor for the partitioned table AND the non-partitioned table must be the same; otherwise the exchange operation will fail.

Oracle checks the Hakan factor on both sides "to prevent corruptions. The check is introduced in the fix for bug 2114490." (Ref: Bug ID 7352308)

If the Hakan factors do not match, the prognosis is one of the error messages below:

ORA-14642: Bitmap index mismatch for tables in ALTER TABLE EXCHANGE PARTITION
OR
ORA-14643: Hakan factor mismatch for tables in ALTER TABLE EXCHANGE PARTITION
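
A minimal sketch of how the mismatch can be provoked (table and index names are made up; expecting ORA-14643 is my reading of the error documentation, not a guaranteed outcome on every version):

create table part_t (id number, flag char(1))
partition by range (id) (partition p1 values less than (1000));
create bitmap index part_t_bix on part_t(flag) local;

create table ex_t (id number, flag char(1));
insert into ex_t select rownum, 'Y' from dual connect by level <= 500;
commit;
-- shrink the allowed rows-per-block on the standalone table only
alter table ex_t minimize records_per_block;
create bitmap index ex_t_bix on ex_t(flag);

-- the Hakan factors now differ, so the exchange should fail
alter table part_t exchange partition p1 with table ex_t including indexes;
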
How can you retrieve its current value?

Oracle provides the following query (Metalink 248643.1) for this purpose:

SQL> select a.object_name,b.spare1
     from dba_objects a, tab$ b
     where a.object_id=b.obj#
     and a.object_name in ('<Partitioned_table>','<Non-Partitioned-Table>')
     and a.object_type='TABLE';
The query should be run under SYS user.

Can Hakan Factor change?

Yes. According to Metalink 397827.1, "The HAKAN-factor (records_per_block) is re-calculated when an ALTER TABLE is executed," which is only logical. The case that note refers to occurs in "versions 9.2.0.7 to 10.2.0.5" when unused columns are dropped before a partition exchange, which leads to a change in the RECORDS_PER_BLOCK value.

Oracle warns about the hidden impact of changing "null" constraints on the Hakan factor as well; a very interesting scenario is mentioned in Bug ID 6026618. I ran this bit of code on Oracle 11.2.0.3 with 32K blocks:

create table hk(c char(1) not null, d char(100) not null);
create bitmap index bi on hk(c);
-- Hakan Factor is 311

alter table hk modify (d null);
-- Hakan Factor is still 311
create table hk2 as select * from hk;

select o.name, mod(t.spare1, 32768) hakan
from tab$ t, obj$ o
where t.obj# = o.obj#
and name in ('HK', 'HK2');
-- Hakan factor for "hk2" is 2971
Obviously, changing the constraint from "not null" to "null" affects the potential row length, so RECORDS_PER_BLOCK changes too; but the change is not visible until an attempt is made to create a replica (CTAS) of the original table.

In Metalink 1389444.1 Oracle clarifies that "The factor will also be recalculated when a table is modified until there is the first bitmap Index is created," and in Bug ID 8407142 more detailed test scenarios (compared to the one in 6026618) can be found that demonstrate the issue on compressed tables.

However, if you add columns to a non-compressed table after the first bitmap index is created, the Hakan factor still changes, at least on 11.2.0.3; so the statement above only seems to hold for compressed tables.

Is there a way to set the value of the Hakan Factor?

No, or at least not directly; the assumption is that a proficient DBA has a full grasp of the row sizes, so he/she can control the number of rows in a block by (for example) manipulating the free space in the blocks.

But you can directly reset the Hakan factor's value. According to Bug ID 6026618, "Altering the table for instance with nominimize records_per_block resets the factor," although "The nominimize can only be done if all bitmapped indexes are dropped first."

In fact, you do not need to drop the indexes to have the Hakan factor recalculated; the indexes can be set to unusable (a quick sketch follows below), but some might argue that rebuilding the index afterwards costs the same as creating a new one. I'm not sure how dropping an index compares to setting it unusable.
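
Sticking with the HK table from the earlier test, the idea looks like this (a sketch based on my observation above, not on any Oracle documentation; behaviour may vary by version):

alter index bi unusable;                      -- no need to drop the bitmap index outright
alter table hk nominimize records_per_block;  -- the Hakan factor is reset/recalculated
alter index bi rebuild;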

Is there a way to avoid the consequences of having different Hakan Factors?

Oracle generally provides solutions that seem to be devised by people totally uninterested in real-life scenarios, specifically scalability! 

Solution 1 (Metalink 248643.1):

"Issue the exchange command with EXCLUDING INDEXES, then rebuild partitioned tables' unusable indexes."

Solution 2 (Metalink 248643.1):

"Drop Bitmap indexes of tables, perform the exchange command with INCLUDE INDEXES for the rest of the indexes, then recreate Bitmap indexes."

Solution 3 (Metalink 1389444.1):

"Create the exchange table with Create Table As Select (CTAS) while the event 14529  is set, then EXCHANGE PARTITION will go through. The event should only be set for CTAS."

Solution 4:

The proposed action for ORA-14643 is clear enough:
"Action:If records_per_block has been minimized for one of the tables, 
 but not the other, either perform alter table with the NOMINIMIZE 
 RECORDS_PER_BLOCK option for both tables, or perform alter table 
 with the MINIMIZE RECORDS_PER_BLOCK for both tables. If the hakan 
 factors do not match perform alter table with the NOMINIMIZE 
 RECORDS_PER_BLOCK option for both tables."

How to make sure that differences in Hakan Factor do not happen?

Two different situations have to be considered: 

1. When both tables involved in partition exchange do not have any Bitmap indexes:
ALTER TABLE X NOMINIMIZE RECORDS_PER_BLOCK;
This statement basically recalculates RECORDS_PER_BLOCK, which for two identical tables should come out the same; unless your tables are located in tablespaces with different block sizes, or they're not identical in the first place!

2. When bitmap indexes exist: Metalink 1389444.1 says: "The Hakan factor has to be protected for the existing bitmap indexes. If a new table created to exchange data with the partitioned table, with a table layout that include columns added after the bitmap index creation on the partitioned table, will most likely result in a different Hakan factor."

i.e. do not ALTER the tables involved in the partition exchange after bitmap indexes have been created on them!

Just for clarification, the above Metalink note refers to situations where the non-partitioned table is created AFTER the changes were made to the partitioned table, so both tables look identical in terms of column names and data types.

Final thoughts:

[Considering my previous post as well,] I only wonder why Oracle has failed to provide a detailed description of the requirements of the partition exchange operation. What [explicitly and specifically] constitutes "identical" tables, and why is a full list of the logical and physical attributes involved not published?

Wednesday 9 October 2013

Note to self: How (not) to exchange partitions

Disclaimer: This is a "note to self;" not a guide, and by no means anything close to a "complete" guide! These are just the points that I can remember RIGHT NOW! :)

I spent a whole day diagnosing a failing exchange-subpartition operation; hence this post.

I was particularly disappointed (because of the OakTable accreditations) to notice that "Expert Oracle Practices" (Apress - OakTable, p353) describes the required conditions for a successful partition exchange in only one -occasionally vague- sentence: "[both segments should] have the same logical shape (that is, the same columns, data types, scalability [DATA_SCALE, I assume], precision, data-integrity constraints such as NULL/NOT-NULL, and so forth)." It has to be said, though, that this book has far more content than any other book I've recently read, and it specifically details the workflow for exchanging partitions on page 356, under "Loading Millions of Rows."

Pro Oracle 11g Administration sheds a little bit more light (p295) by saying: "[both segments] must have the same primary-key structure defined."

The Oracle Database 11g Handbook merely says (p266) that both segments should have "the same column structure" and reminds you that, in terms of the workflow, you should:
  1. Load data into the stand alone table
  2. Analyze the table
  3. Build the indexes "to match the partitioned table’s indexes," but it doesn't elaborate on what this "matching" entails. It also adds: "the indexes [on the partitioned table, obv!] must be local, and not global, indexes."
Based on my experience (11.2.0.3) you actually CAN exchange partitions without gathering stats on the standalone table, but I'm not sure how that would benefit anyone, mainly because it can lead to stats anomalies on the partitioned table and severely affect execution plans; unless -for whatever reason and through whatever means- you've made sure you're not gathering any kind of stats on the partitioned table.

Also, the indexes on both sides do not need to have the same names; all that matters is that both sides have the same number of indexes, and that the indexes match in terms of type (bitmap, B-tree) and the columns they are built on.

Two other points on the indexes:
  1. If you are using bitmap indexes you HAVE TO make sure the Hakan factor on both segments involved in the exchange is the same (details: Metalink 1389444.1, although I'm more in favour of doing an "ALTER TABLE xxx NOMINIMIZE RECORDS_PER_BLOCK" on both tables while they're empty).
  2. If there are indexes on virtual columns, the definition of the virtual columns must match as well.
On the nature of columns:

Apart from the name, type, and size/precision of the columns on both sides, the order of the columns should match too. The presence of virtual columns in between non-virtual ones will not cause the partition exchange to fail, as long as the order of the non-virtual columns is the same (see the sketch below).
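
To tie these points together, here is a small sketch (made-up names, 11.2 syntax) of an exchange that satisfies the conditions: same columns in the same order, same index shape, and stats gathered on the standalone table.

create table sales_p (sale_id number, region varchar2(10), amount number)
partition by range (sale_id)
(partition p1 values less than (1000),
 partition p2 values less than (2000));

create index sales_p_ix on sales_p(region) local;

-- standalone table: same columns, same order, same data types
create table sales_stage (sale_id number, region varchar2(10), amount number);
create index sales_stage_ix on sales_stage(region);  -- name differs; shape matches

insert into sales_stage select rownum, 'EMEA', rownum * 10 from dual connect by level < 1000;
commit;
exec dbms_stats.gather_table_stats(user, 'SALES_STAGE')

alter table sales_p exchange partition p1 with table sales_stage
including indexes without validation;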

Notes:
  • I have not tested exchanging partitions that contain Spatial indexes.
  • This post will be updated, as soon as I remember/come across other points
  • Or if you'll kindly let me know, if you know of other missing bits of info :)

Sunday 6 October 2013

Basic Compression makes null column consideration irrelevant?

Basic Compression could negate the effects of frequently null columns scattered in a row.

OakTable World 2013 started with Tim Gorman emphasising one of the fundamental aspects of table design: If you've got columns that frequently contain null values, simply put them towards the end of the row.

But I did an experiment which seems to suggest that although the theory above holds perfectly for non-compressed tables, the situation is different for tables with basic compression.

The test was done on a single node 11g (11.2.0.3), with 32K block size; one heap table, not partitioned/clustered.

The table in question had 190 columns (telecoms data); 119 of them are completely populated. Of the rest, 38 columns are between 46 and 99 percent null, and 33 columns are completely null.

Initially (no compression) the table occupied 7,340,032 bytes (based on USER_SEGMENTS) in 224 blocks.

When the less frequently populated columns were moved to the end of the row (and stats gathered), the size of the segment dropped to 4,194,304 bytes in 128 blocks. The reduction in size was expected, but the extent of it surprised me.

But the more surprising result appeared when both tables were compressed; I still expected the sizes to be different, but both tables came out at 3,145,728 bytes in 96 blocks!
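
Since the original code isn't attached yet, here is a much smaller, self-contained sketch of the comparison (not the original test; the 190-column table obviously can't be reproduced here, and COMPRESS BASIC is 11.2 syntax):

-- two layouts of the same data: nulls at the front vs. nulls at the end of the row
create table t_nulls_first (c1 varchar2(100), c2 varchar2(100), filler varchar2(100));
insert into t_nulls_first
select null, null, rpad('x', 100, 'x') from dual connect by level <= 100000;
commit;

create table t_nulls_last as select filler, c2, c1 from t_nulls_first;

-- compress both layouts, then compare their footprints
alter table t_nulls_first move compress basic;
alter table t_nulls_last  move compress basic;

select segment_name, bytes, blocks
from   user_segments
where  segment_name in ('T_NULLS_FIRST', 'T_NULLS_LAST');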

I'll investigate the reason for this as soon as possible, but I'd really appreciate it if you'd enlighten me if you already know it. :)

p.s.: the code for the operation is not in the post because 1) I was trying as hard as I could to get the first post out on the blog, and 2) the operation was not terribly complicated. Still, I'll do my best to attach a dump of the data and the code.