Second Half: Oracle

Friday, 1 December 2017

Cast your NULLs in case you use CASE!

You might tolerate a programming environment which is unable to cast (for example) numbers in string "containers" to numbers, but not being able to cast a NULL value from one datatype to another could drive any Buddhist monk crazy!

So lock your Buddhist monks away, because that can happen in [Oracle] SQL!

However the issue could manifest itself in situations not worthy of being known as "best practice!"

(Wow! Three exclamation marks already! Well, five including this sentence. This post better be good.)

You stored your Dates as WHAT?!

You might remember Richard Foote's articles (1, 2, and 3) on the consequences of storing Date values as Numbers or Varchars.

This is another example of the consequences of such bad practices. I recently came across this example, which made this post happen.

SQL>WITH RND AS
(
    SELECT
        CASE
            WHEN DBMS_RANDOM.VALUE(1, 100) <= 50
                THEN 0
            ELSE
                20171009
        END NUM
    FROM DUAL
    CONNECT BY LEVEL < 10
)
SELECT
    CASE
        WHEN NUM = 0 THEN DECODE(NUM,0,NULL,TO_DATE(TO_CHAR(NUM), 'YYYYMMDD'))
        ELSE
            SYSDATE
    END N
FROM RND;

The "RND" factored subquery represents a horizontal-and-vertical slice of a staging table. Dates were stored as Numbers and if not present, stored as 0!

During the ETL the "date" value was merged into a table considering an existing "Date" value from another table (presented by SYSDATE), using a CASE expression. The main SELECT statement represents an equivalent of that MERGE.

Note: Do not take the "WHEN NUM = 0" literally. It's just a simple/silly replacement for a meaningful predicate.

This is the result:

SQL Error: ORA-00932: inconsistent datatypes: expected CHAR got DATE
00932. 00000 - "inconsistent datatypes: expected %s got %s"

It's too confusing for Oracle, too!

It takes a couple of re-writes to understand that the "DECODE" is not the culprit here.

SQL>WITH RND AS
(
    SELECT
        CASE
            WHEN DBMS_RANDOM.VALUE(1, 100) <= 50
                THEN 0
            ELSE
                20171009
        END NUM
    FROM DUAL
    CONNECT BY LEVEL < 10
)
SELECT
    DECODE(NUM,0,NULL,TO_DATE(TO_CHAR(NUM), 'YYYYMMDD')) AS N
FROM RND;

N
------------------
09-OCT-17

09-OCT-17

09-OCT-17
09-OCT-17
09-OCT-17

09-OCT-17

9 rows selected.

As you can see the DECODE function has no problems handling the NULL value alongside a value of type Date.

"In case you use CASE"

You don't need to be Bryn Llewelyn to infer that the CASE expression seems to be trying to make sure that the result of all of its conditions have matching data types.

In this case, it's really easy for Oracle to find a reference point. SYSDATE is easily revealing the expected datatype, but if "NUM = 0" the output of the DECODE function would be NULL.

What is the intrinsic data type of NULL?

Just to expose mysel to humour, I'd say: NULL!

My completely uneducated (and somewhat silly) guess is that at some point during interpretation of the CASE expression Oracle must be doing a comparison between the output datatypes of different conditions in the CASE expression and (if you're living in fantasy-land and imagine that Oracle will still be using SQL for that!) it will run a statement like this:

SQL>SELECT
        CASE
            WHEN 'DATE' = NULL THEN 'DATE'
            ELSE 'NOT DATE'
        END
    "DATE = NULL?"
FROM
    dual;

DATE = N?
--------
NOT DATE

Why is DECODE not guilty?

Because of the ridiculous "TO_DATE(NULL)" in the statement below:

SQL>WITH RND AS
(
    SELECT
        CASE
            WHEN DBMS_RANDOM.VALUE(1, 100) <= 50
                THEN 0
            ELSE
                20171009
        END NUM
    FROM DUAL
    CONNECT BY LEVEL < 10
)
SELECT
    CASE
        WHEN NUM = 0 THEN DECODE(NUM,0,TO_DATE(NULL),TO_DATE(TO_CHAR(NUM), 'YYYYMMDD'))
        ELSE
            SYSDATE
    END AS N
FROM RND;

N
---------
30-NOV-17
30-NOV-17
30-NOV-17

30-NOV-17
30-NOV-17

9 rows selected.

DECODE is not the same as CASE

This is one of the two points which makes me not fully agree with a statement Steven Feuerstein made in one of his recent articles (go to "Hey, What about DECODE?" at the bottom).

DECODE behaves differently than CASE expression, specifically when it comes to dealing with NULL values.

SQL>WITH test_values AS
(
    SELECT A, b
    FROM
    (
        SELECT 1 A FROM dual
        UNION
        SELECT NULL A FROM dual
    )
    CROSS JOIN
    (
        SELECT 1 b FROM dual
        UNION
        SELECT NULL b FROM dual
    )
)
SELECT
    A,
    b,
        CASE
            WHEN A = b THEN 'EQUAL'
            ELSE 'NOT EQUAL'
        END AS "CASE equality test",
       DECODE(A,b,'EQUAL','NOT EQUAL') AS "DECODE equality test"
FROM
    test_values;

A     B     CASE equality test    DECODE equality test
----- ----- --------------------- --------------------
1     1     EQUAL                 EQUAL
1     null NOT EQUAL             NOT EQUAL
null 1     NOT EQUAL             NOT EQUAL
null null NOT EQUAL             EQUAL

What we saw in this post had to do with casting data types, but another thing which makes me (sometimes - especially when the number of conditions are one or two) choose the geeky, yet concise DECODE function over the human-readable CASE expression is the fact that DECODE is capable of comparing NULL values (and not only doing an equality test) and inferring TRUE.

Friday, 22 September 2017

On the burden of unjustifiable human interfaces

"If something doesn't have an API, it doesn't exist!"

For me, that quote form Mitchell Hashimoto represents one idea: If you don't automate access to resources, you'll be extinct.

It's not about cloud, containers, microservices, serverless technologies, or any other 2017 buzzword that you can come up with.

Those are not the aim. Those are simply tools to help you free up access to resources, to help things move faster.

Below is simply an example (rather specific, albeit) of how inaccessibility of resources can reduce productivity.

1. The problem

1.1. The context

A "Trade" data warehouse, running on Oracle 11g database. Moved to Exadata recently.

Multiple types of financial instruments, multiple types of trades, years of data [sub]partitioned daily.

Example of oddities: One of the (badly designed, in my opinion) "dimension" tables holds tens of millions of rows!

Countless number of hints in every SQL statement to make hundreds (if not thousands) of queries per hour work on the legacy (i.e. non-Exadata) system.

Prospects of changing the data model: almost zero.

State of Preprod/UAT DB: much less volume of data, not receiving as much data as the Prod by a large margin.

State of Dev DB: Almost non-existent.

The "DBA" sits in a completely different environment. Might vaguely know about the data model. It's -supposedly- a big corporation.

1.2. The daily workflow

Customer complains about queries running slow. The complaint is sent to Dev.

Dev are the people who have written the SQL statements. They know about the data model, with good knowledge of Oracle internals. Dev has no access to Cloud Control or DBA workflow; for example no idea when the backups run last, or if it was running at the time of an incident.

DBA has no visibility over the complaints or DB performance. And there's no point in sending the complaint to the DBA: He can't qualify a SQL statement as "problematic" or "OK," if the query is taking too long to complete, because he doesn't know the data model.

1.3. The example scenario

An issue is raised. Dev checks the SQL statement.

As a matter of principle/process runs the statement on PreProd. PreProd doesn't have the same characteristics of the Prod, so the issue doesn't always reappear.

Dev is aware of the qualities of the PreProd DB. Also the customer's not delusional; something IS running slow.

Just imagine that when it comes to executing a join on (e.g.) 10 different row sources, CBO had (twice) chosen to do Nested Loops and tens of thousands of Range Index Scans on one of the massive tables because of incorrect stats on the driving table. Can't get clearer than that, can we?

Trying to establish whether the behavior can re-occur (are we actually dealing with a systematic issue? or just a one-off anomaly?) Dev re-runs the statement on Prod. CBO chooses a "bad" plan, query runs like a dog, can't cancel the query on SQL Developer.

Obviously something's wrong. Any "fix" should be re-examined.

Including the re-examination (if successful, that is) the query has run 3 times on Prod.

SQL Developer is still hanging.

Just a reminder that breaching SLAs in financial sector can lead to fines of millions of pounds by the regulator, at least!

This has to be fixed quickly.

Dev (the same person who understands the data model, knows what's running on his SQL Developer session, and most probably knows why it's taking too long) asks the DBA to kill the session.

The DBA (not familiar with the data model, unaware of the activity): "Could you raise a ticket, please?"

Tick, Tock, Tick, Tock, Tick, Tock...

2. Red Lines and questions

2.1. Prod != PreProd

In my opinion that's the root cause of the whole issue.

If you're big enough to be in a sector with SLAs which puts you in danger of loosing millions of pounds in fines, you should be big enough to have the EXACT SAME environment in PreProd, as in Prod. If you don't, you're embarrassing yourself.

If you're not a software vendor and your organisation effectively owns the production data, perhaps you should use Data Virtualisation tools such as Delphix.

However, this factor is simply an easy target to criticize; especially if you don't have to deal with it from a pragmatic angel (because you'll not suffer the consequences of missing the SLA, for starters).

If you're not an architect of this system, or you're brought it long after the date that system has become operational, you have to make lemonade using the lemons that life has given you.

2.2. Auditing for the sake of auditing

Why are "kill this session, please" tickets raised? Are we simply trying to serialize DBA's workload and avoid him getting overwhelmed by a flood of requests every minute? (If yes, perhaps he's controlling too many resources.)

Has a "regulator" asked you to do this? If not, is there a clear process of review?

What is the value in raising those tickets?
What's the action plan for reviewing those ticket?
Can you set up a down-stream automated process which processes the raised tickets for later analysis?
Is the underlying data in your ticketing system accessible in a programmable way?
Would you be able to chart/categorize your tickets easily?
Can you use your ticketing system to produce quantifiable indexes to measure trends in your organization?

How long do you keep those tickets in your system? Why?
How do you ensure that such ticket won't have the same fate as billions of tourist photos which are taken every day which won't be EVER revisited again?

2.3. Role of DBA

What kind of DBA deals with such tickets? Is he/she a DR/HA DBA? If they're not application DBAs (who's familiar with data/business model), why not?

What is the process of communicating with DBA? How many DBs he's responsible for? How many teams/products/services is he responsible for? Does Dev know about his responsibilities? Is he part of the team?

How does he contribute to [increasing] productivity? Has he turned to a bottleneck? If yes, why? Too much responsibilities? Is he not eager to be part of the product?

2.4. The value of involving DBA in menial tasks

Why should the DBA deal with such tickets? What's the difference, if Dev does it and it's logged?

If you are a DBA who knows that menial tasks can be automated AND logged in a way that satisfies "process" requirements and you still choose to receive tickets, perhaps you SHOULD feel threatened by the notion of "autonomous DBs"...

The world has changed, everybody is automating menial tasks, anyway!

2.5. The value of being involved in "non-menial" tasks

If you're the only person who's privileged to (e.g.) deploy a package, would you be held responsible, if the package causes data loss? If Dev will eventually be held responsible, then why Dev is not deploying the code?

Do you -the DBA- know how use tools such as Jenkins to enable the Dev do the deployments in a safe manner? Do you want to enable Dev to move things faster in a safe manner? or you feel too proud of being the only person who does the deployments?

Do you -the DBA- do code reviews? Do you clearly know what does the code that you're deploying do? If not, then why are you deploying it, anyway?!

Conclusion

I'm sure all of us can see the problems hidden in the answers to some of the questions above. Obviously the elephant in the room is the segregation of DBA and the development process.

But if you can't embed the DBA in your development team, my solution is "Provide an API for your resources!"

I'm not sure how many of us can justify the delays caused by having a human interface for a resource which doesn't need one. In my opinion the notion of "raising a ticket will only take two minutes" is simply not true, at least because if the DBA is not busy to be able deal with his incoming notifications immediately (is he/she EVER?!) it would require another two minutes to deal with the issue, and another two minutes to reflect the outcome in the ticket, which is subject to all the questions I mentioned in 2.2.

This means your Dev has had an opportunity to distract himself either with another task, or Instagram! Context/task switching in humans is an extremely costly factor in production, and it's by no means as efficient as context switching in CPUs.

That applies to DBAs too (surprise!) since they're not sent from above. When DBA receives tens of notifications every minute for requests which can be automated, he/she have to deal with context switching as well.

Are your reasons for keeping the human interfaces to your resources justified? When was the last time you did a soul searching and self examination to re-examine your reasons?

Sunday, 13 December 2015

Dude, Where's My Datafile?

disclaimer: here

ASM can be too transparent at times.

I found that out when I was investigating read errors (ORA-19501 & ORA-15081 - which caused RMAN backup to fail) on a datafile.

For a reason that will be explained below, it was necessary to find out which ASM disk the datafile resides on, but you can't retrieve that information from V$ASM_ views, because they do not reveal the association between datafiles (i.e. the ASM Aliases) and the ASM disk itself.

Environment: "External" redundancy across ASM diskgroup, which is shared between six RAC DBs. GI and Databse are both 11.2.0.4, but the solutions applies to 12.1.0.2 as well.

unrecoverable error ORA-15188 raised in ASM I/O path

Considering the chances of logical corruption, and using the block number reported in ORA-19501, it is possible to find the affected segment (see Stefan Knecht's article) and take an action to prevent the same error from happening again during RMAN backup; such as deleting the segment.

But after deleting the segment, although I could run the backup successfully, validation of the datafile in RMAN failed, citing the exact same errors, on the exact same block.

This is due to the difference in the mechanism of backup compared to validation's: "When validating whole files, RMAN checks every block of the input file" while RMAN backup touches blocks that contain data, and skips the blocks that do not contain data.

Incidentally checking the v$ASM_disk.READ_ERRS revealed that one of the ASM disks has experienced read errors. Interestingly the errors could also be traced in ASM instance's alert.log, as well, citing information about the location of the issue:

WARNING: Read Failed. group:2 disk:2 AU:235530 offset:786432 size:131072
ERROR: unrecoverable error ORA-15188 raised in ASM I/O path; terminating process 28997

Enter The Headless Chicken!

That's me: totally ignoring the fact that although I don't know the solution, I SHOULD know what's NOT the solution!

Indications of the issue in ASM instance's alert.log (warnings similar to the one in the previous lines) dated back to the last time an ASM disk was added to the diskgroup, and rebalancing was triggered.

Rebalancing had made ASM go through a series of continuous start/failure/stop of ARB0 background process: The initial attempt for rebalancing (after adding a new ASM disk to the diskgroup) triggered the start of ARB0, failed with ORA-15118, which lead to ARB0 stopping, and restarting the whole process over and over again.

Yet, the headless chicken decided to drop the ASM disk from the diskgroup to rectify the problem!

Eventually the futility of the process dawned on me and I executed the following commands:

alter diskgroup DATA_01 undrop disks;
alter diskgroup DATA_01 rebalance power 0;

The solution

The key lies in the fact that both RMAN validation and ASM rebalacing deal with the same thing: the whole datafile.

In this case, the ORA-15118s were an indication that ASM was unable to access some non-data-holding parts of the datafile on the storage, so the solution was shrinking the datafile to a size as close as possible to the volume of the data; because RMAN backup is carried out with no errors.

But how can we make sure that datafiles of other DBs which are stored on the same diskgroup, are not affected? First we need to find out which datafiles from which DBs have bits and pieces on the same ASM disk, and then validate each one using RMAN.

Which gets us to the main point of this post (finally!): the missing logical link between data files and ASM disks can be found in X$KFFXP fixed table.

I found out about it from "Secrets of Oracle Database" (Chapter 12, Norbert Debes, Apress, 2009) but there are other sources which discuss the fixed tables in ASM instance, such as "Pro Oracle Database 10g RAC on Linux" and Luca Canali's article here.

The following query can be used for this purpose:

select *
from v$asm_alias
where SYSTEM_CREATED = 'N'
and (FILE_NUMBER, FILE_INCARNATION) in
(
select distinct NUMBER_KFFXP, INCARN_KFFXP
from x$kffxp
where DISK_KFFXP = disk_number according to alert.log
and LXN_KFFXP = 0
and GROUP_KFFXP= group_number according to alert.log
);

The join columns reveal the purpose of NUMBER_KFFXP, and INCARN_KFFXP columns, GROUP_KFFXP and DISK_KFFXP are equal to v$asm_disk.GROUP_NUMBER and v$asm_disk.DISK_NUMBER, and LXN_KFFXP is the Logical Extent Number (0 = primary, 1 = mirrored copy).

Note for the proponents of set theory: Yes, I know that this query doesn't manifest an understanding of your beloved theory (and mine too, believe me!) but I found this syntax fast enough without requiring any hints to make sure the right join method is used.

Finally

The query revealed all the files (from different DBs) that were located on that specific ASM disk. Fortunately all of them were validated successfully by RMAN.

Eventually the segments of the problematic datafile were moved to a new tablespace, the data file was shrunk with no issues (and validated successfully by RMAN), and afterwards I was able to drop the ASM disk from the group.

Apparently some people have been living happily ever after.

p.s. I like to thank Martin Widlake and Neil Chandler for offering me advice for this post. However, I'm responsible for statements made here, so please do let me know if you notice any errors.

Sunday, 17 May 2015

Nobody reads your blog!

Yes, nobody!

I mean: REALLY read it; going through all the words and sentences and paragraphs and get through to the end.

During the time that I was working for BBC World Service (2002-2012) one of the more important guidelines for online content producers was: "All you have to play with is the first page, and in fact, first paragraph of your piece. You have to grab your audience's attention there, and say whatever you want to say in the top part of the page, before the user needs to scroll down, because they rarely do!"

There has been plenty of evidence supporting this theory (see likes of which in here, here and here) and as far as I know, this is still what's being preached in BBC, and many prominent international media organizations.

So what should we do?

Your blog post is a product


Your post should be a matchbox: simple, small, to the point

And considering the evidence (read their conclusion as: "global short attention-span syndrome") you need to come up with a product like a matchbox, rather than Swiss army knife. Simple, Small, to the point.

As a product you need to think about two main aspects:

a. Content (the idea itself, what is it that your "product" supposed to do)
b. Form (packaging, how can you make sure that the "product" is bought, and also used)

a. Content

You are a technical blogger (an assumed audience of this post) and have something to write about. I believe you need to answer these questions, before putting pen on paper (or fingertip on keyboard... whatever)

1. What is the purpose of your post? Are you trying to spread the knowledge? educating people? or just blowing off steam about some bug you've come across? Do you actually want your post to be read by as many people as possible? or you're happy as long as one or two like-minded people read it?

1.5. Consistency always raises your profile. Are you happy with being a part-time educator, part-time bug nagger? Or you actually prefer to be known by only one of them?

2. Have you thought about your audience? Who are they? Are they "Oracle Ace directors?" Are they supposed to be technically as able as you are? How you considered the fact that even among Oracle Ace Directors people have different abilities? Not everybody is an "Optimizer guru," "Recovery master," or "Replication genius", and they all might still be Oracle Ace Directors. Do you care if someone reads your post and doesn't understand it?

3. Are you true to yourself regarding your own -possible- attempt at a shot at fame? If you want to be known as THE educator, or THE expert in a subject, it needs a lot of effort, not only to be that person, but also to make people notice you (i.e. how to package/sell your "product"). It needs commitment to presentation/marketing fundamentals, and awareness of the market characteristics and trends.

I found Tim Hall's advice very insightful on many other matters concerning management of this creative process.

b. Form

You can find plenty of style-guides on the internet (here and here, for example). They seem to have a number of key elements in common:

A Bad blog post!

Choose intriguing yet short titles.

Most of the blog posts are advertised via Twitter. So more compact title provides you with more room for hashtags to advertise your Tweet (and therefore your blog post) properly.

Do consider concepts such as "impact" and "controversy" when you come up with a title. Avoid questions in titles; that's generally frowned upon in the world of "proper" journalism.
Start with an abstract/introduction.

Don't waste your readers' time by not stating the aim of your post, because if you make them go through it and they find out that it's not what they were looking for, they will hesitate to come back to your blog.

Also people usually don't scroll down! If you don't present your idea in an exciting manner in the first two or three sentences, they'll simply leave your post!
ALWAYS write short sentences, and short paragraphs.

BBC World Service's guideline was "one sentence is enough in a paragraph," and if you read news on BBC's website, you can see how it looks.

Nobody likes going through tens of uninterrupted lines of writing; especially on technical blogs which are usually wider than normal to avoid line-wraps when including programming code (or "execution plans" among Oracle bloggers).
Break your post into little pieces.

I advise using sub-headers religiously. It can make the scanning eyes of your readers stop, and pay attention to what you want them to pay attention to.

Also think about the flow of your post. If you can use "ordered lists" as your sub-headers, do it.
Use any tool that's available at your disposal to make your post less boring: images, links, unordered lists, Alex Gorbatchev's SyntaxHighlighter.

You might think that your audience are too sophisticated for such "marketing tricks," but unfortunately we all (at least the more fortunate ones) can only read with our eyes. If the eyes don't like something they see, they try not to see it!
A "Conclusion" part is as important as the "abstract/introduction."
Audience is ALWAYS thinking: "I don't care who you think you are; I decide who you are!"

So don't think they'll keep reading your posts no matter how long it is. They simply won't. Keep it short! (Unlike what I've done here, in my post!)
Remove distractions from your blog post, as much as possible.

So you are an OCA and an OCP and an OCM and an Oracle Certified Expert on something and you get tons of traffic that tens of companies want to advertise through your website? Sure, but every time someone comes into your page and the content of your post isn't the first thing that their eyes meet, you've effectively lost a reader.
Making your blog Mobile-Aware increases your exposure

Make your post as accessible/visible as you can.

Use meta tags or anything else that can help your post appear in search engines when looking for a related word/phrase.

Also I need to echo Julian Dyke's advice (thanks to Martin Widlake for reminding me of it): For better exposure you need to make your blog mobile-aware. Google announced on 26th February, 2015 how they're changing their search engine's criteria to give higher rankings to Mobile-aware blogs and sites. Also guidelines to test, and the definition of "mobile-awareness" is included in the statement.

Conclusion

Technical bloggers seem to read blogs in two circumstances, generally:

When they're looking for a solution to their own problem (i.e. when they read a blog post they keep saying in their head: "cut the bullshit!")
At night, after they're finished with day to day stuff. (i.e. their eyes and brains are too tired to spend a lot of energy to go through your "jewel of a post" line by line, and word by word.)

If you don't engage with them as soon as they start reading the title of your post, and don't make the whole experience of reading your post as comforting (and informative) at the same time, You have effectively wasted your time!

Thursday, 14 May 2015

How to keep track of execution in multi-file/part Oracle scripts

disclaimer: here

Installation scripts can be a pain.

With multiple anonymous PL/SQL blocks here and there, and scripts calling other scripts and bind variables being initiated through running some other scripts and.... You get the image.

So if -God forbid- we want to check the result of an activity which might have happened in the earlier parts of this "spaghetti of codes," we need to kill a bunch of those precious gray cells and find a way!

Must of the time decisions can be based on "footprints in the data;" e.g. something is supposed to happen later in the script based on whether we have deleted a row or not. But If the script is supposed to be re-runnable (such as an installation/upgrade/patch script) how can you judge whether THAT rows has been dropped in this run, or a previous one?

Well, there is a solution after all.

Conditional Compilation Flags to the rescue!

As long as your script is running on an 10gR2 or later released instance, and your whole script runs in one session, you can use the session variables, aka Conditional Compilation Flags.

Their biggest advantage is that unlike the bind variables (set by DEFINE command), you can set/unset your session variables inside OR outside a PL/SQL block.

Either

ALTER SESSION SET PLSQL_CCFLAGS = 'flagName:value[,flag2Name:value2, ...]';

or inside a block:

EXECUTE IMMEDIATE 'ALTER SESSION SET PLSQL_CCFLAGS = ''flagName:value[,flag2Name:value2, ...]''';

Example Scenario

Imagine the following "install.sql" script (positioned in the same directory as "one.sql" and "two.sql"

CONN / AS SYSDBA
@@one.sql
@@two.sql
EXIT

"one.sql"

This sample script has two parts. The first anonymous block simulates an activity that whose result affect the operation (i.e. code-path) in the second script in the same file, and consequently the operation in a second file.

The first script populates a flag named "activityResult" (Note: Conditional Compilation Flags are case-sensitive) with either 0 or 1, depending on the results of the simulated operation.

The second script shows how can you check the value of the flag and create alternative code-paths.

$IF ... $THEN $ELSIF $ELSE $END is THE construct when it comes to Conditional Compilation.

The flags themselves are referred to as $$flagName.

SET SERVEROUTPUT ON
DECLARE
        v_RandomActivityResult NUMBER := CEIL(DBMS_RANDOM.VALUE(0,6));
BEGIN
        dbms_output.put_line('script one - first block');
        dbms_output.put_line('Random Activity Result:' || to_char(v_RandomActivityResult));

        IF v_RandomActivityResult < 3 THEN
                -- Failure
                EXECUTE IMMEDIATE 'alter session set PLSQL_CCFLAGS = ''activityResult:0''';
        ELSE
                -- Success
                EXECUTE IMMEDIATE 'alter session set PLSQL_CCFLAGS = ''activityResult:1''';
        END IF;
END;
/

BEGIN
        dbms_output.put_line('script one - second block');
$IF $$activityResult = 1 $THEN
        dbms_output.put_line('The activity in the first block was successful!');
        dbms_output.put_line('Carrying out related activity...');
        EXECUTE IMMEDIATE 'alter session set PLSQL_CCFLAGS = ''relatedActivity:1''';
$ELSE
        dbms_output.put_line('The activity in the first block failed!');
        EXECUTE IMMEDIATE 'alter session set PLSQL_CCFLAGS = ''relatedActivity:0''';
$END
END;
/

"two.sql"

The "one.sql" showed that it is possible to access a session flag inside an anonymous block which hasn't populated/initiated the flag. The following script simply provides an example to clarify what does "session" mean!

SET SERVEROUTPUT ON
BEGIN
        dbms_output.put_line('script two');
$IF $$relatedActivity = 1 $THEN
        dbms_output.put_line('Related activity was carried out.');
$ELSE
        dbms_output.put_line('Related activity was not carried out.');
$END
END;
/

A sample output:

"two.sql" is merely another file which (as can be seen above) is run as part of "install.sql" so it's still in the same session, and can access the "relatedActivity" flag that was initiated in the second block inside "one.sql."

$ sqlplus /nolog @install

SQL*Plus: Release 11.2.0.4.0 Production on Thu May 14 14:20:00 2015

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected.

Session altered.

script one - first block
Random Activity Result:1

PL/SQL procedure successfully completed.

script one - second block
The activity in the first block failed!

PL/SQL procedure successfully completed.

script two
Related activity was not carried out.

PL/SQL procedure successfully completed.

Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
$

Conclusion

This post was an extremely simple implementation of what's being offered by Oracle through the concept of Conditional Compilation, and the directives are not limited to what were used in this post.

You can find Oracle's reference on the subject here.

Saturday, 2 May 2015

DBMS_XPLAN can lie!

disclaimer: here

"Don't trust what you see!"

A cliche in spy movies, this sentence can also have a less impressive interpretation: "Don't be over-excited by any finding!"

I came to believe in this interpretation when I was studying for Oracle SQL Tuning Expert exam.

Going through experiments on using different DOP for INSERT...SELECT statement (a more comprehensive post coming up soon), I came across something which looked like a behavior change between 11.2.0.4 and 12.1.0.2 and -more importantly- seemed to contradict the documentation.

All I heard in my brain was this: "Shit! This is going to put me on the map! OakTable! Here I come!"

I got in touch with Martin Widlake, who was kind enough to go through the results of my experiment, and advised me to publish the results -which I'm extremely grateful for it.

DOP in INSERT..SELECT

Just for clarification, I'd like to rewrite the rule for calculation of DOP in INSERT..SELECT as below:

After the decision to parallelize the SELECT or INSERT operation is made, one parallel directive is picked for deciding the DOP of the whole statement, using the following precedence rule:

1. Insert hint directive

2. then Session

3. then Parallel declaration specification of the inserting table

4. and then Maximum query directive.

Oracle states: "In this context, maximum query directive means that among multiple tables and indexes, the table or index that has the maximum DOP determines the parallelism for the query operation."

(You can find 11gR2's original documentation here and 12.1.0.2's here.)

But the results of the experiment below in 12.1.0.2 appears to suggest that rule 4 overrides the rule 3, in this version.

Tables:

SQL> CREATE TABLE SOURCE_DATA NOLOGGING PARALLEL 4 AS
SELECT ROWNUM ID , MOD(ROWNUM,5) PART_ID, ROUND(DBMS_RANDOM.VALUE(1,ROWNUM)*5,5) VALUE
FROM DUAL
CONNECT BY LEVEL < 1000001;

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'SOURCE_DATA');

SQL> CREATE TABLE DESTINATION_DATA AS
SELECT * FROM SOURCE_DATA WHERE 1=2;

Session:

SQL> ALTER SESSION ENABLE PARALLEL DML;

The experiment

SQL> ALTER TABLE DESTINATION_DATA PARALLEL 3;

As you can see below, there are no hints on the statement, and Parallel DML is merely enabled on he session. We are doing a simple direct-path insert into a table with parallel declaration of 3, and selecting from a table which has a parallel declaration of 4. So based on Oracle documentation we expect the DOP to be 3 (i.e. "Parallel declaration specification of the inserting table") despite the higher DOP on the source table.

Using DBMS_XPLAN.DISPLAY_CURSOR, we get:

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
SQL_ID cwrqam2gbx7ds, child number 0
-------------------------------------
INSERT /*+ append TEST3 */ INTO DESTINATION_DATA SELECT * FROM
SOURCE_DATA

Plan hash value: 2115782672

-----------------------------------------------------------------------------------------------
| Id | Operation                          | Name        | Rows |    TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                   |             |       |        |      |            |
|   1 | PX COORDINATOR                    |             |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10000    | 1000K| Q1,00 | P->S | QC (RAND) |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)|             |       | Q1,00 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |             | 1000K| Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR             |             | 1000K| Q1,00 | PCWC |            |
|* 6 |       TABLE ACCESS FULL            | SOURCE_DATA | 1000K| Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - Degree of Parallelism is 4 because of table property

28 rows selected

WHAT?! DOP is 4?!

Not so fast though ... #sigh

During the comparison of the behavior between 11g and 12c, I kept checking the V$SQL.PX_SERVERS_EXECUTIONS, mainly because the DBMS_XPLAN.DISPLAY_CURSOR was not giving any hints on the employed DOP.

I hadn't done it for 12.1.0.2 because the output seemed to have enough information in that regard. But Martin Widlake suggested I do the same, to be consistent.

SQL> SELECT sql_id, child_number, PX_SERVERS_EXECUTIONS
FROM v$sql
WHERE sql_text LIKE '%TEST3%';

SQL_ID CHILD_NUMBER PX_SERVERS_EXECUTIONS
------------- ------------ ---------------------
cwrqam2gbx7ds 0 3

There you go! Oracle's documentation was not wrong after all. The DOP is in fact equal to the parallel declaration on the inserting table (i.e. 3).

One last blow came from OEM. You can clearly see the parallel servers involved in the INSERT...SELECT.

Just a note that seeing only one set of parallel servers proves Oracle's statement that when the decision is made to do both INSERT & SELECT in parallel, ONE parallel directive is used to carry out the operation; i.e. both operation use the same DOP.

Conclusion

Don't be deceived by the glamor of the extra information which is presented in 12.1.0.2. DBMS_XPLAN can lie.

Here I have to mention that I had also sought Jonathan Lewis' opinion about the post.

I'd like to quote this from Jonathan's response, when I got back to him to say that I was wrong:

"There's always one more thing to check."

Disclaimer!

Dear Reader!

I hereby declare that the material published in this blog have no other purpose, but to remind fellow colleagues about some aspects of Oracle Database's behavior that 1st) I have come across, and 2nd) I've felt that they're not some of the commonly known aspects of Oracle's behavior.

Also I hope that the posts could help finding problems and [possible] solutions easier on the web, by simply spreading the word, and increasing the exposure of posts' subjects to search engines such as Google.

Therefore I would be grateful if the reader could remind me of a neglected [perhaps earlier] reference to the subject, by leaving a comment.

Most importantly, the posts should by no means be regarded as a manifestation of knowledge, nor experience. They are merely results of experiments that I have been forced to carry out, mainly by chance, and influenced by the projects that I'm involved in, exams I've taken, or curiosity.

It's up to you, the reader, to re-establish what's being presented in this blog as facts, and I humbly ask you to leave a comment, if you find an incorrect or incomplete piece of information on a post.

Regards,
Babak Tourani