Jonathan Lewis

Just another Oracle weblog

Interval Partition Problem

Wed, 2018-02-21 02:40

Assume you’ve got a huge temporary tablespace, there’s plenty of space in your favourite tablespace, you’ve got a very boring, simple table you want to copy and partition, and no-one and nothing is using the system. Would you really expect a (fairly) ordinary “create table t2 as select * from t1” to end with an Oracle error “ORA-1652: unable to extend temp segment by 128 in tablespace TEMP” ? That’s the temporary tablespace that’s out of space, not the target tablespace for the copy.

Here’s a sample data set (tested on 11.2.0.4 and 12.1.0.2) to demonstrate the surprise – you’ll need about 900MB of space by the time the entire model has run to completion:

rem
rem     Script:         pt_interval_threat_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

column tomorrow new_value m_tomorrow
select to_char(sysdate,'dd-mon-yyyy') tomorrow from dual;

create table t1
as
with g as (
        select rownum id
        from dual
        connect by level <= 2e3
)
select
        rownum id,
        trunc(sysdate) + g2.id  created,
        rpad('x',50)            padding
from
        g g1,
        g g2
where
        rownum <= 4e6  -- > comment to avoid WordPress format mess
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

I’ve created a table of 4 million rows, covering 2,000 dates out into the future starting from sysdate+1 (tomorrow). As you can see there’s nothing in the slightest bit interesting, unusual, or exciting about the data types and content of the table.

I said my “create table as select” was fairly ordinary – but it’s actually a little bit out of the way because it’s going to create a partitioned copy of this table.


execute snap_my_stats.start_snap

create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start       values less than (to_date('&m_tomorrow','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
as
select
        *
from
        t1
;

set serveroutput on
execute snap_my_stats.end_snap

I’ve created the table as a range-partitioned table with an interval() declared. Conveniently I need only mention the partitioning column by name in the declaration, rather than listing all the columns with their types, and I’ve only specified a single starting partition. Since the interval is 7 days and the data spans 2,000 days I’m going to end up with nearly 290 partitions added.

There’s no guarantee that you will see the ORA-01652 error when you run this test – the data size is rather small and your machine may have sufficient other resources to hide the problem even when you’re looking for it – but the person who reported the problem on the OTN/ODC database forum was copying a table of 2.5 billion rows using about 200 GB of storage, so size is probably important, hence the 4 million rows as a starting point on my small system.

Of course, hitting an ORA-01652 on TEMP when doing a simple “create as select” is such an unlikely sounding error that you don’t necessarily have to see it actually happen; all you need to see (at least as a starting point in a small model) is TEMP being used unexpectedly so, for my first test (on 11.2.0.4), I’ve included some code to calculate and report changes in the session stats – that’s the calls to the package snap_my_stats. Here are some of the more interesting results:


---------------------------------
Session stats - 20-Feb 16:58:24
Interval:-  14 seconds
---------------------------------
Name                                                                     Value
----                                                                     -----
table scan rows gotten                                               4,000,004
table scan blocks gotten                                                38,741

session pga memory max                                             181,338,112

sorts (rows)                                                         2,238,833

physical reads direct temporary tablespace                              23,313
physical writes direct temporary tablespace                             23,313

The first couple of numbers show the 4,000,000 rows being scanned from 38,741 table blocks – and that’s not a surprise. But for a simple copy the 181MB of PGA memory we’ve acquired is a little surprising, though less so when we see that we’ve sorted 2.2M rows, and then ended up spilling 23,313 blocks to the temporary tablespace. But why are we sorting anything – what are those rows ?

My first thought was that there was a bug in some recursive SQL that was trying to define or identify dynamically created partitions, or maybe something in the space management code trying to find free space, so the obvious step was to enable extended tracing and look for any recursive statements that were running a large number of times or doing a lot of work. There weren’t any – and the trace file (particularly the detailed wait events) suggested the problem really was purely to do with the CTAS itself; so I ran the code again enabling events 10032 and 10033 (the sort traces) and found the following:


---- Sort Statistics ------------------------------
Initial runs                              1
Input records                             2140000
Output records                            2140000
Disk blocks 1st pass                      22292
Total disk blocks used                    22294
Total number of comparisons performed     0
Temp segments allocated                   1
Extents allocated                         175
Uses version 1 sort
Uses asynchronous IO

One single operation had resulted in Oracle sorting 2.14 million rows (but not making any comparisons!) – and the only table in the entire system with enough rows to do that was my source table! Oracle seems to be sorting a large fraction of the data for no obvious reason before inserting it.

  • Why, and why only 2.14M out of 4M ?
  • Does it do the same on 12.1.0.2 (yes), what about 12.2.0.1 (no – hurrah: unless it just needs a larger data set!).
  • Is there any clue about this on MoS ? (Yes – Bug 17655392, though that one is erroneously, I think, flagged as “closed, not a bug”)
  • Is there a workaround ? (Yes – I think so).

Playing around and trying to work out what was happening, the obvious pointers were the large memory allocation and the “incomplete” spill to disc – so what would happen if I fiddled around with workarea sizing: switching it to manual, say, or setting the pga_aggregate_target to a low value? At one point I got results showing 19M rows (that’s not a typo, it really was close to 5 times the number of rows in the table) sorted with a couple of hundred thousand blocks of TEMP used – the 10033 trace showed 9 consecutive passes (that I can’t explain) as the code executed, from which I’ve extracted the row counts, temp blocks used, and number of comparisons made:


Input records                             3988000
Total disk blocks used                    41544
Total number of comparisons performed     0

Input records                             3554000
Total disk blocks used                    37023
Total number of comparisons performed     0

Input records                             3120000
Total disk blocks used                    32502
Total number of comparisons performed     0

Input records                             2672000
Total disk blocks used                    27836
Total number of comparisons performed     0

Input records                             2224000
Total disk blocks used                    23169
Total number of comparisons performed     0

Input records                             1762000
Total disk blocks used                    18357
Total number of comparisons performed     0

Input records                             1300000
Total disk blocks used                    13544
Total number of comparisons performed     0

Input records                             838000
Total disk blocks used                    8732
Total number of comparisons performed     0

Input records                             376000
Total disk blocks used                    3919
Total number of comparisons performed     0

There really doesn’t seem to be any good reason why Oracle should do any sorting of the data (and maybe it wasn’t given the total number of comparisons performed in this case) – except, perhaps, to allow it to do bulk inserts into each partition in turn or, possibly, to avoid creating an entire new partition at exactly the moment it finds just the first row that needs to go into a new partition. Thinking along these lines I decided to pre-create all the necessary partitions just in case this made any difference – the code is at the end of the blog note. Another idea was to create the table empty (with, and without, pre-created partitions), then do an “insert /*+ append */” of the data.

Nothing changed (much – though the number of rows sorted kept varying).

And then — it all started working perfectly with virtually no rows reported sorted and no I/O to the temporary tablespace !

Fortunately I thought of looking at v$memory_resize_ops and found that the automatic memory management had switched a lot of memory to the PGA, allowing Oracle to do whatever it needed to do completely in memory without reporting any sorting. A quick re-start of the instance fixed that “workaround”.
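
If you want to check for that effect yourself, a quick query against v$memory_resize_ops will show any recent automatic resize operations – a minimal sketch (the view has more columns than I’ve listed here):

select
        component, oper_type, initial_size, target_size, status, start_time
from
        v$memory_resize_ops
where
        lower(component) like '%pga%'
order by
        start_time
;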

Still struggling with finding a reasonable workaround I decided to see if the same anomaly would appear if the table were range partitioned but didn’t have an interval clause. This meant I had to precreate all the necessary partitions, of course – which I did by starting with an interval partitioned table, letting Oracle figure out which partitions to create, then disabling the interval feature – again, see the code at the end of this note.

The results: no rows sorted on the insert, no writes to temp. Unless it’s just a question of needing even more data to reproduce the problem with simple range partitioned tables, it looks as if there’s a problem somewhere in the code for interval partitioned tables and all you have to do to work around it is precreate loads of partitions, disable intervals, load, then re-enable the intervals.
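
In outline the workaround looks like this – just a sketch, with the partition pre-creation done by the locking trick shown in the footnote code below:

-- pre-create t2 empty with the interval clause, then create
-- the partitions with the locking loop from the footnote, then:

alter table t2 set interval();

insert /*+ append */ into t2 select * from t1;
commit;

alter table t2 set interval(numtodsinterval(7, 'day'));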

Footnote:

Here’s the “quick and dirty” code I used to generate the t2 table with precreated partitions:


create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start values less than (to_date('&m_tomorrow','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
monitoring
as
select
        *
from
        t1
where
        rownum <= 0
;


<<expand>>
declare
        m_max_date      date;
begin
        select  max(created)
        into    expand.m_max_date
        from    t1
        ;

        
        for i in 1..expand.m_max_date - trunc(sysdate) loop
                dbms_output.put(
                        to_char(trunc(sysdate) + i,'dd-mon-yyyy') || chr(9)
                );
                execute immediate
                        'lock table t2 partition for ('''  ||
                        to_char(trunc(sysdate) + i,'dd-mon-yyyy') ||
                        ''') in exclusive mode'
                ;
        end loop;
        dbms_output.new_line();
end;
/

prompt  ========================
prompt  How to disable intervals
prompt  ========================

alter table t2 set interval();

The code causes partitions to be created by locking the relevant partition for each date between the minimum and maximum dates in the t1 table; locking the partition is enough to create it if it doesn’t already exist. The code is a little wasteful since it locks each partition 7 times as we walk through the dates – but it’s only a quick demo for a model, and for copying a very large table the wastage would probably be very small compared to the work of doing the actual data copy. Obviously one could be more sophisticated and limit the code to locking and creating only the partitions needed, and only locking them once each – see the sketch below.
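
For example – a sketch (not from the original test) that steps through the dates in 7-day jumps so that each partition is locked just once:

declare
        m_max_date      date;
begin
        select max(created) into m_max_date from t1;

        -- one date per 7-day partition, starting from tomorrow
        for i in 0 .. trunc((m_max_date - trunc(sysdate)) / 7) loop
                execute immediate
                        'lock table t2 partition for ('''  ||
                        to_char(trunc(sysdate) + 1 + 7 * i,'dd-mon-yyyy') ||
                        ''') in exclusive mode'
                ;
        end loop;
end;
/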

 

Taking Notes – 2

Tue, 2018-02-20 05:08

[Originally written August 2015, but not previously published]

If I’m taking notes in a presentation that you’re giving there are essentially four possible reasons:

  • You’ve said something interesting that I didn’t know and I’m going to check it and think about the consequences
  • You’ve said something that I knew but you’ve said it in a way that made me think of some possible consequences that I need to check
  • You’ve said something that I think is wrong or out of date and I need to check it
  • You’ve said something that has given me a brilliant idea for solving a problem I’ve had to work around in the past and I need to work out the details

Any which way, if I’m taking notes it means I’ve probably just added a few more hours of work to my todo list.

Footnote

“Checking” can include:

  • having a chat
  • reading the manuals
  • finding a recent Oracle white-paper
  • searching MoS
  • building some models

Philosophy

Tue, 2018-02-20 05:03

Here’s a note I’ve just re-discovered – at the time I was probably planning to extend it into a longer article but I’ve decided to publish the condensed form straight away.

In a question to the Oak Table a couple of years ago (May 2015) Cary Millsap asked the following:

If you had an opportunity to tell a wide audience of system owners, users, managers, project leaders, system architects, DBAs, and developers “The most important things you should know about Oracle” what would you tell them?

I imagine that since then Cary has probably discussed the pros and cons of some of the resulting thoughts in one of his excellent presentations on how to do the right things, but this was my quick response:

If I had to address them all at once it would be time to go more philosophical than technical.

The single most important point: Oracle is a very large, complex, and flexible product. It doesn’t matter where you are approaching it from you will not have enough information on your own to make best use of it. You have to talk to your peer group to get alternative ideas, and you have to talk to the people at least one step either side of you on the technology chain (dev to dba, dba to sysadmin, Architect to dev, dba to auditor etc.) to understand options and consequences. Create 4 or 5 scenarios of how your system should behave and then get other people – and not just your peer group – to identify their advantages and threats.

Assumptions

Tue, 2018-02-20 02:57

As the years roll on I’ve found it harder and harder to supply quick answers to “simple” questions on the Oracle-L list server and OTN/ODC forum because things are constantly changing and an answer that may have been right the last time I checked could now be wrong. A simple example of the consequences of change showed up recently on the OTN/ODC forum where one reply to a question started:

Just why do you need distinct in a subquery??? That’s the first thing that appears really shocking to me. If it’s a simple in (select …) adding a distinct to the subquery would just impose a sort unique(as you can see in the explain plan), which may be quite costly.

Three question-marks is already tip-toeing its way to the Pratchett limit – but “really shocking” ? It’s bad enough that the comment goes operatic, but going operatic in order to introduce an error pushes the thing into tragedy (or possibly comedy – or maybe both). To make the self-inflicted injury worse, there were two execution plans supplied in the original post anyway of which only one showed any attempt to achieve uniqueness.

Bottom line – when you’re about to correct someone for doing something that is “obviously” wrong, be a little bit kind about it and then be kind to yourself and do a quick sanity check that your attempt at correction is itself correct. A good guideline would be to ask yourself: “How do I know what I know – and am I about to make myself look like an idiot?”

Check It

Question: Does a  “distinct” in a subquery impose a sort (or hash) unique ?

Answer: No – a uniqueness operation may appear, but it’s not guaranteed to appear.

Here’s a quick example which does not result in any attempt at imposing uniqueness (running 11.2.0.4):


drop table t2 purge;
drop table t1 purge;
create table t1 as select * from all_objects where rownum  <= 100;
create table t2 as select * from all_objects where rownum <= 100;

create index t1_i1 on t1(owner);
create index t2_i2 on t2(object_type);

set autotrace traceonly explain

select  * 
from    t1 
where   owner = 'OUTLN' 
and     object_name in (
                select distinct object_name 
                from   t2 
                where  object_type = 'TABLE'
        )
;


Execution Plan
----------------------------------------------------------
Plan hash value: 3169044451

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     3 |   558 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN SEMI              |       |     3 |   558 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2    |    12 |   336 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   3 - access("OWNER"='OUTLN')
   5 - access("OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)


There’s no sign of a sort unique or hash unique. The optimizer has decided that the IN subquery can be transformed into an EXISTS subquery, which can then be transformed into a semi-join.
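
In effect the optimizer has treated the query as if it had been written like this (my reconstruction of the intermediate form, not optimizer output):

select  *
from    t1
where   owner = 'OUTLN'
and     exists (
                select  null
                from    t2
                where   t2.object_type = 'TABLE'
                and     t2.object_name = t1.object_name
        )
;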

I can think of three other execution plan strategies that might have appeared depending on the data, indexing, and statistics:

a) Transform the IN subquery to an EXISTS subquery then operate as a filter subquery (with no uniqueness imposed):


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |   158 |     5   (0)| 00:00:01 |
|*  1 |  FILTER                      |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |    28 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

b) Simple unnest with sort/hash unique and join


---------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |   175 |     9 |
|*  1 |  HASH JOIN                     |          |     1 |   175 |     9 |
|   2 |   TABLE ACCESS BY INDEX ROWID  | T1       |     4 |   632 |     1 |
|*  3 |    INDEX RANGE SCAN            | T1_I1    |     2 |       |     1 |
|   4 |   VIEW                         | VW_NSO_1 |     4 |    68 |     7 |
|   5 |    SORT UNIQUE                 |          |     4 |   112 |     7 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2       |     4 |   112 |     1 |
|*  7 |      INDEX RANGE SCAN          | T2_I2    |     2 |       |     1 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   3 - access("OWNER"='OUTLN')
   7 - access("OBJECT_TYPE"='TABLE')

For this data set I actually had to take the optimizer_features_enable back to ‘8.1.7’ to get this plan – but you can see that there’s a SORT UNIQUE at operation 5, but that would have been there whether or not the DISTINCT keyword had appeared in the SQL. Effectively the query has been transformed to:

select  t1.*
from    (
                select  distinct t2.object_name object_name
                from    t2
                where   t2.object_type='TABLE'
        )
        vw_nso_1,
        t1
where   t1.owner = 'OUTLN'
and     t1.object_name = vw_nso_1.object_name
/

c) Unnest, then “place group by” so that the distinct is applied after the join

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |     3 |   474 |     5  (20)| 00:00:01 |
|   1 |  VIEW                          | VM_NWVW_1 |     3 |   474 |     5  (20)| 00:00:01 |
|   2 |   HASH UNIQUE                  |           |     3 |   594 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN                   |           |     3 |   594 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1        |     3 |   510 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | T1_I1     |     3 |       |     1   (0)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2        |    12 |   336 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | T2_I2     |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."OBJECT_NAME"="T2"."OBJECT_NAME")
   5 - access("T1"."OWNER"='OUTLN')
   7 - access("T2"."OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)
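
By analogy with the vw_nso_1 example above, the effective shape of the query for this plan is roughly the following (again my reconstruction, not optimizer output):

select  *
from    (
                select  distinct t1.*
                from    t1, t2
                where   t1.owner = 'OUTLN'
                and     t2.object_type = 'TABLE'
                and     t2.object_name = t1.object_name
        )       vm_nwvw_1
;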

Again, the plan would be the same whether or not the original subquery had a redundant DISTINCT. (Again, with this data set, I had to be a bit devious to get this plan).

The things you think you know may have been true 10 years ago – but maybe they’re not true any longer, or maybe they’re still true on your version of the database but not every  version of the database. So I often end up looking at a question, thinking the poster’s claim can’t be right, and then working out and modelling the circumstances that might make the poster’s observations appear (and learning something new).

Remember: “I’ve never seen it before” doesn’t mean “It doesn’t happen”.

 

Join Factorization

Wed, 2018-02-14 09:38

This item is, by a roundabout route, a follow-up to yesterday’s note on a critical difference in cardinality estimates that appeared if you used the coalesce() function in its simplest form as a substitute for the nvl() function. Connor McDonald wrote a followup note about how using the nvl() function in a suitable predicate could lead to Oracle splitting a query into a UNION ALL (in version 12.2), which led me to go back to a note I’d written on the same topic about 10 years earlier where the precursor of this feature already existed but used CONCATENATION instead of OR-EXPANSION. The script I’d used for my earlier article was actually one I’d written in February 2003 and tested fairly regularly since – which brings me to this article, because I finally tested my script against 12.2.0.1 to discover a very cute bit of optimisation.

The business of splitting a query into two parts can be used even when the queries are more complex and include joins (this doesn’t always happen automatically and sometimes has to be hinted, but that can be a cost/statistics thing). For example, from 12.1.0.2 – a query and its execution plan:


select
        *
from
        t1, t2
where
        t1.v1 = nvl(:v1,t1.v1)
and     t2.n1 = t1.n1
;

---------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |         |  1001 |   228K|    11   (0)| 00:00:01 |
|   1 |  CONCATENATION                          |         |       |       |            |          |
|*  2 |   FILTER                                |         |       |       |            |          |
|*  3 |    HASH JOIN                            |         |  1000 |   228K|     8   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                   | T2      |  1000 |   106K|     4   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL                   | T1      |  1000 |   122K|     4   (0)| 00:00:01 |
|*  6 |   FILTER                                |         |       |       |            |          |
|   7 |    NESTED LOOPS                         |         |     1 |   234 |     3   (0)| 00:00:01 |
|   8 |     NESTED LOOPS                        |         |     1 |   234 |     3   (0)| 00:00:01 |
|   9 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1      |     1 |   125 |     2   (0)| 00:00:01 |
|* 10 |       INDEX RANGE SCAN                  | T1_IDX1 |     1 |       |     1   (0)| 00:00:01 |
|* 11 |      INDEX UNIQUE SCAN                  | T2_PK   |     1 |       |     0   (0)| 00:00:01 |
|  12 |     TABLE ACCESS BY INDEX ROWID         | T2      |     1 |   109 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(:V1 IS NULL)
   3 - access("T2"."N1"="T1"."N1")
   5 - filter("T1"."V1" IS NOT NULL)
   6 - filter(:V1 IS NOT NULL)
  10 - access("T1"."V1"=:V1)
  11 - access("T2"."N1"="T1"."N1")

You can see in this plan how Oracle has split the query into two queries combined through concatenation with FILTER operations at lines 2 (:v1 is null) and 6 (:v1 is not null) to allow the runtime engine to execute only the appropriate branch. You’ll also note that each branch can be optimised separately and in this case the two branches get dramatically different paths because of the enormous difference in the estimated volumes of data.

So let’s move up to 12.2.0.1 and see what happens to this query – but first I’m going to execute a naughty “alter session…”:


------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                 |  1001 |   180K|    11   (0)| 00:00:01 |
|   1 |  VIEW                                    | VW_ORE_F79C84EE |  1001 |   180K|    11   (0)| 00:00:01 |
|   2 |   UNION-ALL                              |                 |       |       |            |          |
|*  3 |    FILTER                                |                 |       |       |            |          |
|   4 |     NESTED LOOPS                         |                 |     1 |   234 |     3   (0)| 00:00:01 |
|   5 |      NESTED LOOPS                        |                 |     1 |   234 |     3   (0)| 00:00:01 |
|   6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1              |     1 |   125 |     2   (0)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN                  | T1_IDX1         |     1 |       |     1   (0)| 00:00:01 |
|*  8 |       INDEX UNIQUE SCAN                  | T2_PK           |     1 |       |     0   (0)| 00:00:01 |
|   9 |      TABLE ACCESS BY INDEX ROWID         | T2              |     1 |   109 |     1   (0)| 00:00:01 |
|* 10 |    FILTER                                |                 |       |       |            |          |
|* 11 |     HASH JOIN                            |                 |  1000 |   228K|     8   (0)| 00:00:01 |
|  12 |      TABLE ACCESS FULL                   | T2              |  1000 |   106K|     4   (0)| 00:00:01 |
|* 13 |      TABLE ACCESS FULL                   | T1              |  1000 |   122K|     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(:V1 IS NOT NULL)
   7 - access("T1"."V1"=:V1)
   8 - access("T2"."N1"="T1"."N1")
  10 - filter(:V1 IS NULL)
  11 - access("T2"."N1"="T1"."N1")
  13 - filter("T1"."V1" IS NOT NULL)

There’s nothing terribly exciting about the change – except for the disappearance of the CONCATENATION operator and the appearance of the VIEW and UNION ALL operators to replace it (plus you’ll see that the two branches appear in the opposite order in the plan). But let’s try again, without doing that “alter session…”:


--------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                    |  1001 |   229K|    10   (0)| 00:00:01 |
|*  1 |  HASH JOIN                              |                    |  1001 |   229K|    10   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                     | T2                 |  1000 |   106K|     4   (0)| 00:00:01 |
|   3 |   VIEW                                  | VW_JF_SET$A2355C8B |  1001 |   123K|     6   (0)| 00:00:01 |
|   4 |    UNION-ALL                            |                    |       |       |            |          |
|*  5 |     FILTER                              |                    |       |       |            |          |
|*  6 |      TABLE ACCESS FULL                  | T1                 |  1000 |   122K|     4   (0)| 00:00:01 |
|*  7 |     FILTER                              |                    |       |       |            |          |
|   8 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1                 |     1 |   125 |     2   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN                  | T1_IDX1            |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="ITEM_1")
   5 - filter(:V1 IS NULL)
   6 - filter("T1"."V1" IS NOT NULL)
   7 - filter(:V1 IS NOT NULL)
   9 - access("T1"."V1"=:V1)

The plan now shows a VIEW which is a UNION ALL involving only table t1 in both its branches. The result set from the view is then used as the probe table of a hash join with t2. You’ll note that the name of the view is now VW_JF_SET$A2355C8B – that’s JF for “Join Factorization”, and the alter session I executed to get the first plan was to disable the feature: ‘alter session set “_optimizer_join_factorization” = false;’.

Join factorization can occur when the optimizer sees a union all view with some tables that are common to both (all) branches of the query, and finds that it can move those tables outside the query while getting the same end result at a lower cost. In this case it happens to be a nice example of how the optimizer can transform and transform again to get to the lowest cost plan.
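
Written out as SQL, the factorized query is roughly equivalent to the following sketch (my reconstruction – item_1 in the predicate section above corresponds to the n1 column projected from the union all view):

select
        vw_jf.*, t2.*
from    (
                select  t1.*            -- branch executed when :v1 is null
                from    t1
                where   :v1 is null
                and     t1.v1 is not null
                union all
                select  t1.*            -- branch executed when :v1 is not null
                from    t1
                where   :v1 is not null
                and     t1.v1 = :v1
        )       vw_jf,
        t2
where
        t2.n1 = vw_jf.n1
;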

It’s worth noting that Join Factorization has been around since 11.2.x.x, and Or Expansion has been around for even longer – but it’s not until 12.2 that nvl() transforms through Or Expansion, which allows it to transform through Join Factorization.

You’ll note, by the way, that with this plan we always do a full tablescan of t2, whereas with just Or-Expansion it’s a potential threat that may never (or hardly ever) be realised.  That’s a point to check if you find that the transformation starts to appear inappropriately on an upgrade. There is a hint to disable the feature for a query, but it’s not trivial to get it right so if you do need to block the feature the smart hint (or SQL Patch) would be “opt_param(‘_optimizer_join_factorization’ ‘false’)”.

Footnote:

If you want to run the experiments yourself, here’s the script I used to generate the data. It’s more complicated than it needs to be because I use the same tables in several different tests:

rem
rem     Script:         null_plan_122.sql
rem     Author:         Jonathan Lewis
rem     Dated:          February 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1        Join Factorization
rem             12.1.0.2        Concatenation
rem
rem

drop table t2;
drop table t1;

-- @@setup  -- various set commands etc.

create table t1 (
        n1              number(5),
        n2              number(5),
        v1              varchar2(10),
        v2              varchar2(10),
        v3              varchar2(10),
        v4              varchar2(10),
        v5              varchar2(10),
        padding         varchar2(100),
        constraint t1_pk primary key(n1)
);

insert into t1
select
        rownum,
        rownum,
        rownum,
        trunc(100 * dbms_random.value),
        trunc(100 * dbms_random.value),
        trunc(100 * dbms_random.value),
        trunc(100 * dbms_random.value),
        rpad('x',100)
from all_objects
where
        rownum <= 1000 -- > comment to avoid WordPress format mess
;

create unique index t1_n2 on t1(n2);

create index t1_idx1 on t1(v1);
create index t1_idx2 on t1(v2,v1);
create index t1_idx3 on t1(v3,v2,v1);

create table t2 (
        n1              number(5),
        v1              varchar2(10),
        padding         varchar2(100),
        constraint t2_pk primary key(n1)
);

insert into t2
select
        rownum,
        rownum,
        rpad('x',100)
from all_objects
where
        rownum <= 1000     -- > comment to avoid WordPress format mess
;

create index t2_idx on t2(v1);

begin dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

variable n1 number
variable n2 number
variable v1 varchar2(10)
variable v2 varchar2(10)
variable v3 varchar2(10)

exec :n1 := null
exec :n2 := null
exec :v1 := null
exec :v2 := null
exec :v3 := null

spool null_plan_122

set autotrace traceonly explain

prompt  ============================================
prompt  One colx = nvl(:b1,colx) predicate with join
prompt  ============================================

select
        *
from
        t1, t2
where
        t1.v1 = nvl(:v1,t1.v1)
and     t2.n1 = t1.n1
;

alter session set "_optimizer_join_factorization" = false;

select
        *
from
        t1, t2
where
        t1.v1 = nvl(:v1,t1.v1)
and     t2.n1 = t1.n1
;

alter session set "_optimizer_join_factorization" = true;

set autotrace off

spool off

Coalesce v. NVL

Tue, 2018-02-13 05:23

“Modern” SQL should use the coalesce() function rather than the nvl() function – or so the story goes – but do you always want to do that to an Oracle database ? The answer is “maybe not”. Although the coalesce() function can emulate the nvl() function (in many cases) there are significant differences in behaviour, some that suggest it’s a good idea to use the substitution and others that suggest otherwise. Different decisions may be appropriate for different circumstances, and this note highlights one case against the substitution. We’ll start with a simple data set:

rem
rem     Script:         nvl_coalesce_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4  -- > comment to avoid wordpress format mess
)
select
        rownum                          id,
        case mod(rownum,4)
                when 0  then 'Y'
                        else 'N'
        end                             yes_no,
        case mod(rownum,5)
                when 0  then 'Y'
                when 1  then null
                        else 'N'
        end                             yes_null_no,
        lpad('x',100,'x')               padding
from
        generator
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns size 5 yes_no yes_null_no'
        );
end;
/

I’ve created a table with 10,000 rows and two columns with a highly skewed data distribution. Because I know that the skew is supposed to have a significant effect I’ve used a non-standard method_opt when gathering stats – in a production system I would have used the packaged procedure dbms_stats.set_table_prefs() to associate this with the table.

The difference between the yes_no and the yes_null_no columns is that the latter is null for a significant fraction of the rows.

  • yes_no has: 7,500 N, 2,500 Y
  • yes_null_no has: 6,000 N, 2,000 null, 2,000 Y (verified by the quick queries below)
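
A quick pair of group-by queries (not part of the original script) confirms the distributions:

select yes_no,      count(*) from t1 group by yes_no      order by yes_no;
select yes_null_no, count(*) from t1 group by yes_null_no order by yes_null_no;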

Let’s now try to count the “N or null” rows using two different functions and see what estimates the optimizer produces for the counts. First counting the yes_no column – using nvl() then coalesce()


set autotrace traceonly explain

select * from t1 where nvl(yes_no,'N') = 'N';
select * from t1 where coalesce(yes_no,'N') = 'N';

set autotrace off

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  7500 |   798K|    24   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  7500 |   798K|    24   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("YES_NO",'N')='N')

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 10900 |    25   (8)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 10900 |    25   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(COALESCE("YES_NO",'N')='N')

The estimate for the nvl() is accurate; the estimate for the coalesce() query is 100 rows.

Let’s repeat the test using the yes_null_no column, again starting with nvl() followed by coalesce():


set autotrace traceonly explain

select * from t1 where nvl(yes_null_no,'N') = 'N';
select * from t1 where coalesce(yes_null_no,'N') = 'N';

set autotrace off

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  8000 |   851K|    24   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  8000 |   851K|    24   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("YES_NULL_NO",'N')='N')

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 10900 |    25   (8)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 10900 |    25   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(COALESCE("YES_NULL_NO",'N')='N')

Again we get the right result for the nvl() estimate (8,000 = 6,000 N + 2,000 null) and 100 for the coalesce() estimate.

By now you’ve probably realised that the coalesce() estimate is simply the “1% guess for equality” that applies to most cases of function(column). So, as we saw in the previous post, coalesce() gives us the benefits of “short-circuiting” but now we see it also threatens us with damaged cardinality estimates. The latter is probably less important than the former in many cases (especially since we might be able to address the problem very efficiently using virtual columns), but it’s probably worth remembering.
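
As a sketch of the virtual column idea – the column name is my invention, and you’d need to gather stats (with a histogram) on the new column before the optimizer could use it:

alter table t1 add (
        yes_no_fixed    generated always as (coalesce(yes_no,'N')) virtual
);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for columns size 5 yes_no_fixed'
        );
end;
/

-- a predicate written exactly as coalesce(yes_no,'N') = 'N' can now
-- be matched to the virtual column and pick up its column statistics
select * from t1 where coalesce(yes_no,'N') = 'N';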

 

 

Histogram Threat

Tue, 2018-01-30 02:07

Have you ever seen a result like this:


SQL> select sql_id, count(*) from V$sql group by sql_id having count(*) > 1000;

SQL_ID		COUNT(*)
------------- ----------
1dbzmt8gpg8x7	   30516

A client of mine, who had recently upgraded to 12.2.0.1 RAC and was using DRCP (database resident connection pooling) for a PHP application, was seeing exactly this type of behaviour for a small number of very simple SQL statements and wanted to find out what was going on because they were also seeing an undesirable level of contention in the library cache when the system load increased.

In this note I just want to highlight a particular detail of their problem – with an example – showing how easily histograms can introduce problems if you don’t keep an eye out for the dangers.

One of their queries really was as simple as this:

select count(*), sum(skew2) from t1 where skew = :b1;

And I’m going to use this query to model the problem. All I have to do is arrange for a data set that results in a hybrid (or height-balanced) histogram being created on the skew column, and then run the query lots of times with different input bind values. In the case of the client there were around 18,000 possible values for the column, and the number of rows per value varied from 1 to about 20,000 – but whatever the number of rows selected the optimum execution plan was always going to be an indexed access.


rem
rem     Script:         acs_child_cursors.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem

create table t1 (
        id, skew, skew2, padding
)
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 3e3
)
select
        rownum  id,
        g1.id   id1,
        g2.id   id2,
        rpad('x',100)
from
        generator       g1,
        generator       g2
where
        g2.id <= g1.id     -- > comment to avoid WordPress format issue
order by
        g2.id, g1.id
;

alter table t1 modify skew not null;
alter table t1 modify skew2 not null;

create index t1_skew on t1(skew);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns skew size 254'
--              method_opt       => 'for all columns size 1'
        );
end;
/

variable b1 number
exec :b1 := 0;

set termout off
@start_1000
@start_1000
@start_1000
set termout on

set linesize 120
set trimspool on

column sql_text format a55

select
        child_number, plan_hash_value, executions,
        is_bind_sensitive,
        is_bind_aware,
        is_shareable,
        sql_text
from
        v$sql
where   sql_id = 'b82my582cnvut'
;

The data set contains 3,000 distinct values for skew and the way I’ve generated the rows means that the value N will appear N times – so there’s one row with the value 1 and 3,000 rows with the value 3,000, and so on for a total of 4,501,500 rows. If you want to run the test, the code is likely to take a couple of minutes to complete, requiring roughly 700 MB of disk space.

The mechanism of the script start_1000 is something I published a few years ago, and essentially it executes a script called start_1.sql 1,000 times which, for this test, contains the following two lines:


exec :b1 := :b1 + 1

select count(*), sum(skew2) from t1 where skew = :b1;

The net effect of the 3 calls to start_1000.sql is that my simple SQL statement is called once in turn for each value of skew from 1 to 3,000. The SQL_ID of the statement is ‘b82my582cnvut’ which I’ve used to query v$sql when the run is complete, with the following result:


CHILD_NUMBER PLAN_HASH_VALUE EXECUTIONS I I I SQL_TEXT
------------ --------------- ---------- - - - -------------------------------------------------------
	   0	  1041516234	    498 Y N N select count(*), sum(skew2) from t1 where skew = :b1
	   1	  1041516234	     25 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   2	  1041516234	    104 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   3	  1041516234	    308 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   4	  1041516234	    429 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   5	  1041516234	    640 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   6	  1041516234	     31 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   7	  1041516234	    305 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   8	  1041516234	    660 Y Y Y select count(*), sum(skew2) from t1 where skew = :b1

9 rows selected.

I’ve got 9 child cursors, all with the same execution plan, all except the last labelled as not shareable (you may find that you don’t get exactly the same result; on repeated tests I got between 5 and 9 cursors). Given enough time all these cursors except the last (shareable) one would be aged out of the library cache. In the case of the client, who had a shared pool that was probably quite a bit larger than needed, the number of non-shareable cursors could get very large and they were hanging around for ages. Typically most of the cursors would report an execution count less than 30, with many showing just one or two executions and a handful showing execution counts in the region of several hundred (and that handful were the ones that were still marked as shareable).

After eliminating the critical histogram (using dbms_stats.delete_column_stats()) and eliminating the redundant child cursors (using dbms_shared_pool.purge()) the massive proliferation stopped happening and the performance threat disappeared. The only issue then was to change the table preferences for stats collection on this table to add the clause “for columns size 1 skew” so that the histogram would not be recreated on the next gather.
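
Changing the preference is a one-off call – something like this sketch (using the table name from my model; the exact method_opt is a choice you’d make to suit the rest of the columns):

begin
        dbms_stats.set_table_prefs(
                ownname  => user,
                tabname  => 'T1',
                pname    => 'method_opt',
                pvalue   => 'for all columns size auto for columns size 1 skew'
        );
end;
/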

Further Observations.

I suspect that part of the client’s problem – something that exaggerated the count rather than causing it – could be attributed to using DRCP (database resident connection pool) which probably still has some quirky bits of behaviour. It was also true that the client’s connection pool was probably much bigger than it needed to be so if there were any funny little bits of emergent behaviour at scale the client would probably have hit them.

The problem of escalating child cursors is one that Oracle has been working on for quite a long time, and there’s a (hidden) parameter that was introduced late in 11gR2 (though I think that the 11g mechanism first appeared through a fix control) to allow Oracle to mark a parent cursor obsolete if it acquired too many child cursors.  There’s a note on MoS that the client had read on this topic: Doc ID 2298504.1: “Cursor Mutex X Wait Events: After Upgrading To 12.2” which looked as if it was describing their symptoms, so they had set this parameter (_cursor_obsolete_threshold) from 8192 (the 12.2 default) down to 1024 (the default for 12.1 and earlier versions). This had been of some help with the library cache problem.  When the sql_id at the top of this article reported 30,516 child cursors that would be 29 “obsolete” parent cursors with 1,024 child cursors each and one “live” parent cursor with 820 child cursors.

You can appreciate that if Oracle has to find a parent cursor and pin it while walking a chain of 30,516 child cursors that’s likely to take a lot more time than walking a chain of 30 parent cursors (while holding a library cache mutex, perhaps) to find the one non-obsolete parent, then acquiring the parent mutex to walk a chain of 820 child cursors.

I don’t know the exact details of the mechanisms involved with manipulating the various chains – but there are likely to be times when one process needs exclusive mutexes/latches to change a chain while other processes are holding shared mutexes/latches to search the chain. When you’ve got 30,000 child cursors in total the important questions become: “where’s the trade-off between making the child chains shorter and making the parent search longer ?” (I’d expect – or hope – that the Oracle developers had actually found a way to code the obsolescence so that the new parent was first in the search, and the rest were never visited, of course.)

One of the suggestions I made to try to alleviate the problem – which I had assumed was due in part to the degree of concurrent execution of the statement – was to mark the cursor as “hot”. This resulted in 36 different sql_ids for the same statement (the client machine had 72 CPUs). This had some effect but ultimately meant that there were 36 chains of parents that would eventually end up with lots of child cursors – the only difference was the rate at which the total cursor count was growing (a lot slower), balanced against the threat that they might all manage to get to 30,000+ child cursors! Instead, as a quick and dirty workaround, I supplied the client with a script that could be run at quieter moments during the day to call dbms_shared_pool.purge() for the sql_id to flush all its cursors from the library cache.
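
For reference, dbms_shared_pool.purge() identifies a cursor by address and hash value rather than sql_id, so the script looked something like this sketch (error handling omitted; the sql_id is the one from my model):

declare
        m_name  varchar2(64);
begin
        select  address || ',' || hash_value
        into    m_name
        from    v$sqlarea
        where   sql_id = 'b82my582cnvut'
        ;
        dbms_shared_pool.purge(m_name, 'C');
end;
/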

One final oddity – which isn’t from the client site. When I changed my test above to avoid generating the histogram (using the commented out method_opt “for all columns size 1”) I found that I still got two child cursors; the first ended up marked as bind-aware but insensitive and non-shareable, the second appeared (time and time again) when my bind value got to 499 and was marked bind-sensitive, bind-aware and shareable.  I still have no idea why this happened.

Footnote:

When repeating the test I started with a “flush shared_pool” – but I’ve removed this line from the code above in case anyone ran it without considering the possible side effects. You do need to clear previous copies of the key statement from the library cache, though, if you want to avoid confusing the issue on repeated runs of the test.

 

Case Study – 1

Mon, 2018-01-29 04:45

It has been some time since I wrote an article walking through the analysis of information on an AWR report, but a nice example appeared a few weeks ago on Twitter that broke a big AWR picture into a sequence of bite-sized chunks that made a little story, so here it is, replayed in sync with my ongoing thoughts. The problem started with the (highly paraphrased) question – “How could I get these headline figures when all the ‘SQL ordered by’ sections of the report show captured SQL account for 0.0% of Total?”. The report was a 30 minute snapshot from 11.2.0.3, and here’s the Load Profile:

[Load Profile image]

As you can see, the database was doing quite a lot of work as measured by the physical reads and writes, the number of block changes and size of redo, and the fact that the average CPU usage by the foreground sessions in this instance accounts for 9.5 CPUs. Generally speaking the 49 seconds per second of DB time is also a fairly good hint, when combined with the other numbers, that the instance is busy but, in the absence of any other information, that could be one session holding a blocking lock with 48 other sessions waiting for the lock to be released.

There are a couple of unusual (though not impossible) features to this profile. Given the amount of work the figures for user calls and executes look very small – again not technically impossible, just unlikely in “normal” database processing given the apparent workload – and bear in mind that the “SQL ordered by ..” figures seem to account for none of the workload. Similarly the figures for logical reads and block changes are a little unusual (for “normal” processing) – on average this instance seems to have modified every block it visited (without any work showing up in the captured SQL).

Next in the set of images was the Time Model:

[Time Model image]

As you can see, the foreground time (DB time) is 85,944 seconds, of which foreground CPU usage (DB CPU) is 16,735 seconds – with about 69,000 seconds unaccounted ! There’s virtually no time spent on PL/SQL or SQL, and RMAN doesn’t even make an appearance (I mention RMAN specifically because there was at least one version of Oracle where the RMAN time was accidentally lost from this summary).

So what do the Top Timed Events look like:

[Top Timed Events image]

It’s nice to see that this is consistent with the Load Profile: the DB CPU matches, and there’s a lot of physical reads (and a quick check says that 6,560,642/1,800 = 3,644 … which is pretty close to the 3,746 physical reads per second in the Load Profile).  There’s one major anomaly here, though: the huge number of (and time spent on) latch: row cache objects waits, and even though it’s not the single largest component of time it’s the most obvious one to pursue, so the next bit of the report to check is the Dictionary Cache Statistics, with the Tablespace IO Stats and Segments by Physical Reads to follow. I don’t have an image for the dictionary cache stats, but the answer to the question “What were all the rowcache object gets for?” was: “dc_tablespaces (214,796,434)” – which (probably) told me everything I needed to know.

I could show you the specific Instance Activity statistic that I wanted to see next, but I’ll postpone that for a moment and jump to the Tablespace IO Stats – which we were planning to do and might have done anyway if we hadn’t known the rocket science behind the massive number of gets on dc_tablespaces.

[Tablespace IO Stats image]

That’s a horrendous number of (single block) reads of the undo tablespace – and why would they be happening ? The instance was engaged in some massive rollback activity (and the transactions being rolled back are on objects in the GIRO tablespace – which is why it is also suffering a lot of single block reads) and this is the point where we jump to the relevant Instance Activity statistic to confirm the claim:

[Instance Activity statistics image]

There are two other “undo records applied” statistics, but we don’t need to think about them – the match between the count of records applied and the gets on the dc_tablespaces latch is nearly perfect. Almost everything that this instance is doing is rolling back – there must have been some enormous data changes (possibly only one, possibly parallel-enabled) that failed in the previous half hour and now the entire mess is being cleaned up.

One little detail to note – the “undo records applied” per second is 122,355, but the Load Profile reported 247,885 “Block changes” per second. The two figures are consistent with each other. Each application of an undo record is two block changes – the first when you modify the original data block, the second when you update the undo record itself to flag it as “applied”:  122,355 * 2  = 244,710, which is a good match for 247,885.

Final Thoughts

There is a second reason why you could see lots of gets on dc_tablespaces – but I think it was a bug in 9i relating to temporary tablespace groups. The phenomenon of dc_tablespaces gets as a side effect of rolling back was something I discovered in the 8i timeline and I’ve not actually checked what an AWR report really would look like if I forced a massive rollback to take place as the only workload across the interval – so here’s a quick test I constructed and ran to finish the thread off:


set timing on

create table t1 as
with g as (select rownum id from dual connect by level <= 1000)
select rownum id, rownum n1, rpad('x',150) padding from g,g
;

create index t1_i1 on t1(id);

begin
        for i in 1..1000000 loop
                update t1 set n1 = n1 where id = i;
        end loop;
end;
/

alter system flush buffer_cache;

pause take AWR snapshot from another session

rollback;

prompt take AWR snapshot from another session
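
For completeness – the snapshots themselves (assuming you’re licensed to use AWR) are just the obvious call from the other session:

execute dbms_workload_repository.create_snapshot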

On the small VM I used to run the test it took a little over 60 seconds to run the update, and the same again to complete the rollback. The “DB time” shown in the Time Model section of the AWR report was 62.0 seconds, while the “sql execute elapsed time” was 3.9 seconds (which was the SQL executed while capturing the AWR data).

Conclusion

This was a problem where the database seemed to have done a lot of work that couldn’t be attributed to any SQL. While I had a little rocket science up my sleeve that may have allowed me to identify the source more quickly and with more confidence than the average DBA, all I’ve done in this note is highlight a couple of oddities and big numbers that anyone could have spotted, and follow a couple of simple guesses:

a) DB time was large, but sql (and pl/sql) time was non-existent

b) Top timed events were I/O and latches – so identify the latch

c) The specific latch pointed towards tablespaces – so check the Tablespace I/O and note the UNDO tablespace

d) Look at any Instance Activity statistics about “undo”.

e) Job done – but a quick test case is the icing on the cake.

 

 

gc buffer busy

Thu, 2018-01-25 08:12

I had to write this post because I can never remember which way round Oracle named the two versions of gc buffer busy when it split them. There are two scenarios to cover when my session wants my instance to acquire a global cache lock on a block and some other session is already trying to acquire that lock (or is holding it in an incompatible fashion):

  • The other session is in my instance
  • The other session is in a remote instance

One of these cases is reported as “gc buffer busy acquire”, the other as a “gc buffer busy release” – and I always have to check which is which. I think I usually get it right first time when I see it, but I always manage to convince myself that I might have got it wrong and end up searching the internet for Riyaj Shamsudeen’s blog posting about it.

The “release” is waiting for another instance to surrender the lock to my instance; the “acquire” is waiting for another session in my instance to finish acquiring the lock from the other instance.

I decided to jot down this note so I didn’t have to keep searching for Riyaj’s post, and also because a little problem on OTN at the moment showed a couple of AWR reports with an unlikely combination of acquire (180,000,000) and release (2,000) waits.

If you’re wondering why this looks odd: if I’m waiting for an acquire, someone else in my instance must be waiting for a release. Obviously many sessions could be waiting for one release, and if acquirers time out very rapidly (though they’re not reported as doing so) then the ratio could get very high – but 90,000 acquires per release doesn’t look right.
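
If you want to check the ratio on your own system the two events are easy to compare directly – a sketch of the query (total_waits is the interesting column; time_waited is in centiseconds):

select
        event, total_waits, time_waited
from
        v$system_event
where
        event in (
                'gc buffer busy acquire',
                'gc buffer busy release'
        )
;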

 

Histograms

Tue, 2018-01-23 01:58

I had a sudden urge to go a bit meta – so here’s a relative frequency histogram of my observations of the general use of histograms in Oracle:

[Image: histogram_histogram – a relative frequency histogram of my observations of histogram usage]

 

Nested MVs

Fri, 2018-01-19 08:43

A recent client was seeing a very large redo penalty from refreshing materialized views. Unfortunately they had to be refreshed very frequently, and were being handled with a complete refresh in atomic mode – which means delete every row from every MV then re-insert every row.  The total redo was running at about 5GB per hour, which wasn’t a problem for throughput, but the space for handling backup and recovery was getting a bit extreme.

The requirement consisted of two MVs which extracted and aggregated row and column subsets in two different ways from a single table; then two MVs that aggregated one of the first MVs in two different ways; then two MVs which each joined one of the first level MVs to one of the second level MVs.

No problem – join MVs are legal, aggregate MVs are legal, “nested” MVs are legal: all you have to do is create the right MV logs and pick the right refresh command. Since the client was also running Standard Edition (SE2) there was no need to worry about how to ensure that query rewrite would work (the feature is not implemented in SE).

So here, simplified and camouflaged, is a minimum subset of just the first few stages of the construction: a base table with MV log, one first-level aggregate MV with its own MV log, and two aggregate MVs based on the first MV.

drop materialized view log on req_line;
drop materialized view log on jpl_req_group_numlines;

drop materialized view jpl_req_group_numlines;
drop materialized view jpl_req_numsel;
drop materialized view jpl_req_basis;

drop table req_line;

-- ----------
-- Base Table
-- ----------

create table req_line(
        eventid         number(10,0),
        selected        number(10,0),
        req             number(10,0),
        basis           number(10,0),
        lnid            number(10,0),
        area            varchar2(10),
        excess          number(10,0),
        available       number(10,0),
        kk_id           number(10,0),
        eventdate       number(10,0),
        rs_id           number(10,0)
)
;

-- --------------------
-- MV log on base table
-- --------------------

create materialized view log 
on
req_line
with rowid(
        req, basis, lnid, eventid, selected, area,
        excess, available, kk_id, eventdate, rs_id
)
including new values
;

-- --------------------
-- Level 1 aggregate MV
-- --------------------

create materialized view jpl_req_group_numlines(
        eventid, selected, 
        row_ct, req_ct, basis_ct, req, basis, 
        maxlnid, excess, numsel, area, available, kk_id, 
        rs_id, eventdate
)
segment creation immediate
build immediate
refresh fast on demand 
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(lnid)       maxlnid,
        excess,
        count(selected) numsel,
        area,
        available,
        kk_id,
        rs_id,
        eventdate
from 
        req_line
group by 
        eventid, selected, area, excess,
        available, kk_id, eventdate, rs_id
;

-- ------------------------
-- MV log on first level MV
-- ------------------------

create materialized view log 
on
jpl_req_group_numlines
with rowid 
(
        eventid, area, selected, available,
        basis, req, maxlnid, numsel
)
including new values
;


-- ----------------------------
-- First "level 2" aggregate MV
-- ----------------------------

create materialized view jpl_req_numsel(
        eventid, selected, 
        row_ct, totalreq_ct, totalbasis_ct, totalreq, totalbasis, 
        maxlnid, numsel_ct, numsel, area
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel,
        area
from 
        jpl_req_group_numlines
group by 
        eventid, selected, area
;


-- -----------------------------
-- Second "level 2" aggregate MV
-- -----------------------------

create materialized view jpl_req_basis(
        eventid, 
        row_ct, totalbasis_ct, totalreq_ct, totalbasis, totalreq, 
        area, selected, available, maxlnid ,
        numsel_ct, numsel
)
segment creation immediate
build immediate
refresh fast on demand
as 
select 
        eventid,
        count(*)        row_ct,
        count(basis)    totalbasis_ct,
        count(req)      totalreq_ct,
        sum(basis)      totalbasis,
        sum(req)        totalreq,
        area,
        selected,
        available,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel
from
        jpl_req_group_numlines
group by 
        eventid, area, available, selected
;

Once the table, MV logs and MVs exist we can insert some data into the base table, then try refreshing the views. I have tried three different refresh calls – dbms_mview.refresh_all_mviews(), dbms_mview.refresh_dependent(), and dbms_mview.refresh() – specifying the ‘F’ (fast) refresh method, atomic refresh, and (where available) nested refresh. All three fail in the same way on 12.2.0.1. The code below shows only the refresh_dependent() call.

I’ve included a query to report the current state of the materialized views before and after the calls, and set a two second sleep before the refresh so that changes in “last refresh” time will appear. The final queries are just to check that the expected volume of data has been transferred to the materialized views.


-- ------------------------------------
-- Insert some data into the base table
-- ------------------------------------

begin
        for i in 1..100 loop
                execute immediate 'insert into req_line values( :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx, :xxx)' 
                using i,i,i,i,i,i,i,i,i,i,i;
                commit;
        end loop;
end;
/

set linesize 144
column mview_name format a40

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
ORDER by
        last_refresh_date, mview_name
;

prompt  Waiting for 2 seconds to allow refresh time to change

execute dbms_lock.sleep(2)

declare
        m_fail_ct       number(6,0);
begin
        dbms_mview.refresh_dependent(
                number_of_failures      => m_fail_ct,
                list                    => 'req_line',
                method                  => 'F',
                nested                  => true,
                atomic_refresh          => true
        );

        dbms_output.put_line('Failures: ' || m_fail_ct);
end;
/

select
        mview_name, staleness, compile_state, last_refresh_type, 
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
order by
        last_refresh_date, mview_name
;

-- --------------------------------
-- Should be 100 rows in each table
-- --------------------------------

select count(*) from jpl_req_basis;
select count(*) from jpl_req_group_numlines;
select count(*) from jpl_req_numsel;

Both the earlier versions of Oracle are happy with this code and refresh all three materialized views without fail. Oracle 12.2.0.1 crashes the procedure call with a deadlock error which, when traced, shows itself to be a self-deadlock while attempting to select a data dictionary row for update:


MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_BASIS                            FRESH               VALID               COMPLETE 19-jan 14:03:01
JPL_REQ_GROUP_NUMLINES                   NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_NUMSEL                           FRESH               VALID               COMPLETE 19-jan 14:03:01

3 rows selected.

Waiting for 2 seconds to allow refresh time to change

PL/SQL procedure successfully completed.

declare
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2952
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 85
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 245
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 1243
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2414
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2908
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3699
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3723
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 75
ORA-06512: at line 4


MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_NUMSEL                           NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_BASIS                            FRESH               VALID               FAST     19-jan 14:03:04
JPL_REQ_GROUP_NUMLINES                   FRESH               VALID               FAST     19-jan 14:03:04

The deadlock graph from the trace file, with a little extra surrounding information, looks like this:


Deadlock graph:
                                          ------------Blocker(s)-----------  ------------Waiter(s)------------
Resource Name                             process session holds waits serial  process session holds waits serial
TX-00020009-00000C78-A9B090F8-00000000         26      14     X        40306      26      14           X  40306


*** 2018-01-19T14:18:03.925859+00:00 (ORCL(3))
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=1, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=2vnzfjzg6px33) -----
select log, oldest, oldest_pk, oldest_oid, oldest_new, youngest+1/86400,  flag, yscn, oldest_seq, oscn, oscn_pk, oscn_oid, oscn_new, oscn_seq  from sys.mlog$ where mowner = :1 and master = :2 for update
----- PL/SQL Stack -----

So far I haven’t been able to spot whether or not I’m doing something wrong, or prohibited, and I haven’t been able to find a matching problem on MoS. Since the code works on 11gR2 and 12cR1 I’m inclined to believe it’s a bug introduced in the 12cR2 timeline – which is a nuisance for my client, but if it is a bug then perhaps a fix will appear fairly promptly.
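
In the meantime the obvious workaround to test – and I haven’t yet confirmed that it avoids the deadlock on 12.2.0.1 – is to abandon the nested option and fast refresh each MV explicitly in dependency order, along the lines of:

begin
        dbms_mview.refresh('JPL_REQ_GROUP_NUMLINES', method => 'F', atomic_refresh => true);
        dbms_mview.refresh('JPL_REQ_NUMSEL',         method => 'F', atomic_refresh => true);
        dbms_mview.refresh('JPL_REQ_BASIS',          method => 'F', atomic_refresh => true);
end;
/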

Column Stats

Thu, 2018-01-18 08:22

I’ve made several comments in the past about the need to be selective when gathering object statistics, with particular reference to the trade-offs when creating histograms. With Oracle 12c it’s now reasonably safe (as far as I’m concerned) to set a method_opt as a table preference that identifies columns where you expect to see Frequency or (pace the buggy behaviour described in a recent post) Top-N histograms. The biggest problem I have is that I keep forgetting the exact syntax I need – so I’ve written this note more as a reminder to myself than anything else.

Typically I might expect to use the standard 254 buckets when gathering histograms, with an occasional variation to the bucket count; but for the purposes of this note I’m going to demonstrate with a much lower value. So here’s a table creation statement (running 12.1.0.2 – so it will gather basic stats on the create) and two variations of a call to gather stats with a specific method_opt – followed by a question:

create table t1
as
select
        object_type o1,
        object_type o2,
        object_type o3,
        object_id,
        object_name
from
        all_objects
where
        rownum <= 50000 -- > comment to bypass wordpress format problem
;

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1';


The big question is this: which columns will have histograms after each of the gather_table_stats() calls:

method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'

The problem I have is simple – to me both options look as if they will create histograms on all three named columns but the first option is the one that I type in “intuitively” if I don’t stop to think about it carefully. The first option, alas, will only gather a histogram on column o3 – the second option is the one that creates three histograms.

The manuals are a little unclear and ambiguous about how to construct a slightly complicated method_opt; there’s a fragment of text with the usual mix of square brackets, italics and ellipses to indicate optional and repeated clauses (interestingly the only clue about multiple columns is that comma separation seems to be required – despite one of the examples above working without commas) but there’s no explanation of when a “size” clause should go before a column name and when it should go after.

So here are a few more method_opt clauses – can you work out in advance which columns would have histograms if you used them, and how many buckets each histogram would have? There are a couple that may surprise you:


for columns o1 size 12, o2 size 13, o3 size 14

for columns o1 size 15 o2 size 16 o3 size 17

for columns size 18 o1 size 19 o2 size 20 o3

for columns size 21 o1 o2 size 22 o3

for columns o1 size 12, o2 size 12, o3 size 13, object_id size 13 object_name size 14

for columns size 22 o1 o2 for columns size 23 o3 object_id for columns size 24  object_name

Bottom line – to me – is to check very carefully that the method_opt is going to do what I want it to do; and for production systems I tend to use the final form that repeats the “for columns {size clause} {column list}”.
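
And since the whole point of the exercise is not to have to remember the syntax at the moment of gathering, the natural next step is to store the checked method_opt as a table preference – a sketch using the table from this note:

begin
        dbms_stats.set_table_prefs(
                ownname => user,
                tabname => 'T1',
                pname   => 'METHOD_OPT',
                pvalue  => 'for all columns size 1 for columns size 15 o1 o2 o3'
        );
end;
/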

Histogram Hassle

Mon, 2018-01-15 07:01

I came across a simple performance problem recently that ended up highlighting a problem with the 12c hybrid histogram algorithm. It was a problem that I had mentioned in passing a few years ago, but only in the context of Top-N histograms and without paying attention to the consequences. In fact I should have noticed the same threat in a recent article by Maria Colgan that mentioned the problems introduced in 12c by the option “for all columns size repeat”.

So here’s the context (note – all numbers used in this example are approximations to make the arithmetic obvious). The client had a query with a predicate like the following:

    t4.columnA = :b1
and t6.columnB = :b2

The optimizer was choosing to drive the query through an indexed access path into t6, which returned ca. 1,000,000 rows before joining (two tables later) to t4 at which point all but a couple of rows remained – typical execution time was in the order of tens of minutes. A /*+ leading(t4) */ hint to start on t4 with an index that returned two rows reduced the response time to the classic “sub-second”.

The problem had arisen because the optimizer had estimated a cardinality of 2 rows for the index on t6, and the reason for this was that, on average, that was the correct number. There were 2,000,000 rows in the table with 1,000,000 distinct values. It was just very unlucky that one of the values appeared 1,000,000 times and that was the value the users always wanted to query – and there was no histogram on the column to tell the optimizer that there was a massive skew in the data distribution.

Problem solved – all I had to do was set a table preference for this table to add a histogram to this column and gather stats. Since there were so many distinct values and so much “non-popular” data in the table the optimizer should end up with a hybrid histogram that would highlight this value. I left instructions for the required test and waited for the email telling me that my suggestion was brilliant and the results were fantastic… I got an email telling me it hadn’t worked.

Here’s a model of the situation – I’ve created a table with 2 million rows and a column where every other row contains the same value but otherwise contains the rownum. Because the client code was using a varchar2() column I’ve done the same here, converting the numbers to character strings left-padded with zeros. There are a few rows (about 20) where the column value is higher than the very popular value.


rem
rem     Script:         histogram_problem_12c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 2e4
)
select
        rownum  as id,
        case
                when mod(rownum,2) = 0
                        then '999960'
                        else lpad(rownum,6,'0')
        end     as bad_col
from
        generator       v1,
        generator       v2
where
        rownum <= 2e6
;

Having created the data I’m going to create a histogram on the bad_col – specifying 254 buckets – then query user_tab_histograms for the resulting histogram (from which I’ll delete a huge chunk of boring rows in the middle):


begin

        dbms_stats.gather_table_stats(
                ownname         => 'TEST_USER',
                tabname         => 'T1',
                method_opt      => 'for columns bad_col size 254'
        );

end;
/

select
        column_name, histogram, sample_size
from
        user_tab_columns
where
        table_name = 'T1'
;

column end_av format a12

select
        endpoint_number         end_pt,
        to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx') end_val,
        endpoint_actual_value   end_av,
        endpoint_repeat_count   end_rpt
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'BAD_COL'
order by
        endpoint_number
;


COLUMN_NAME          HISTOGRAM             Sample
-------------------- --------------- ------------
BAD_COL              HYBRID                 5,513
ID                   NONE               2,000,000

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
         1  303030303031001f0fe211e0800000 000001                1
        12  3030383938311550648a5e3d200000 008981                1
        23  303135323034f8f5cbccd2b4a00000 015205                1
        33  3032333035311c91ae91eb54000000 023051                1
        44  303239373236f60586ef3a0ae00000 029727                1
...
      2685  3938343731391ba0f38234fde00000 984719                1
      2695  39393235303309023378c0a1400000 992503                1
      2704  3939373537370c2db4ae83e2000000 997577                1
      5513  393939393938f86f9b35437a800000 999999                1

254 rows selected.

So we have a hybrid histogram, we’ve sampled 5,513 rows to build the histogram, we have 254 buckets in the histogram report, and the final row in the histogram is end point 5513 (matching the sample size). The first row of the histogram shows us the (real) low value in the column and the last row of the histogram reports the (real) high value. But there’s something very odd about the histogram – we know that ‘999960’ is the one popular value, occurring 50% of the time in the data, but it doesn’t appear in the histogram at all.

Looking more closely we see that every bucket covers a range of about 11 (sometimes 9 or 10) rows from the sample, and the highest value in each bucket appears just once; but the last bucket covers 2,809 rows from the sample with the highest value in the bucket appearing just once. We expect a hybrid histogram to have buckets which (at least initially) are all roughly the same size – i.e. “sample size”/”number of buckets” – with some buckets being larger by something like the amount that appears in their repeat count, so it doesn’t seem right that we have an enormous bucket with a repeat count of just 1. Something is broken.
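
One way to make the anomaly jump out – a sketch against the same t1 and bad_col used above – is to report the number of sampled rows each bucket covers by differencing consecutive endpoint numbers:

select
        endpoint_number,
        endpoint_number -
                lag(endpoint_number,1,0) over (order by endpoint_number)  bucket_rows,
        endpoint_repeat_count
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'BAD_COL'
order by
        endpoint_number
;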

The problem is that the sample didn’t find the low and high values for the column – although the initial full tablescan did, of course – so Oracle has “injected” the low and high values into the histogram by fiddling with the contents of the first and last buckets. At the bottom end of the histogram this hasn’t really caused any problems (in our case), but at the top end it has taken the big bucket for our very popular ‘999960’ and apparently simply replaced the value with the high value of ‘999999’ and a repeat count of 1.

As an indication of the truth of this claim, here are the last few rows of the histogram if I repeat the experiment but, before gathering the histogram, delete the rows where bad_col is greater than ‘999960’. (Oracle’s sample is random, of course, and has changed slightly for this run.)

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2641  3938373731371650183cf7a0a00000 987717                1
      2652  3939353032310e65c1acf984a00000 995021                1
      2661  393938393433125319cc9f5ba00000 998943                1
      5426  393939393630078c23b063cf600000 999960             2764

Similarly, if I inserted a few hundred rows with a higher value than my popular value (in this case I thought 500 rows would be a fairly safe bet as the sample was about one in 360 rows) I got a histogram which still ended with a full-sized bucket for the popular value, so the problem of that bucket being hacked to the high value became much less significant:


    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2718  393736313130fe68d8cfd6e4000000 976111                1
      2729  393836373630ebfe9c2b7b94c00000 986761                1
      2740  39393330323515efa3c99771600000 993025                1
      5495  393939393630078c23b063cf600000 999960             2747
      5497  393939393938f86f9b35437a800000 999999                1

Bottom line, then: if you have an important popular value in a column and there aren’t very many rows with a higher value, you may find that Oracle loses sight of the popular value as it fudges the column’s high value into the final bucket.

Workaround

I did consider writing a bit of PL/SQL for the client to fake a realistic frequency histogram, but decided that that wouldn’t be particularly friendly to future DBAs who might have to cope with changes. Luckily the site doesn’t gather stats using the automatic scheduler job and only rarely updates stats anyway, so I suggested we create a histogram on the column using an estimate_percent of 100. This took about 8 minutes to run – for reasons that I will go into in a moment – after which I suggested we lock stats on the table and document the fact that when stats are collected on this table it’s got to be a two-pass job – the normal gather with its auto_sample_size to start with, then a 100% sample for this column to gather the histogram:


begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt       => 'for columns bad_col size 254',
                estimate_percent => 100,
                cascade          => false
        );
end;
/

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
       125  39363839393911e01d15b75c600000 968999                0
       126  393834373530e98510b6f19a000000 984751                0
       253  393939393630078c23b063cf600000 999960                0
       254  393939393938f86f9b35437a800000 999999                0

129 rows selected.

This took a lot longer, of course, and produced an old-style height-balanced histogram. Part of the time came from the increased volume of data that had to be processed, part of it came from a surprise (which also appeared, in a different guise, in the code that created the original hybrid histogram).

I had specifically chosen the method_opt to gather for nothing but the single column. In fact, whether I forced the “legacy” (height-balanced) code or the modern (hybrid) code, I got a full tablescan that did some processing of EVERY column in the table and then threw most of the results away. Here are fragments of the SQL – old version first:


select /*+  
            no_parallel(t) no_parallel_index(t) dbms_stats
            cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
            xmlindex_sel_idx_tbl no_substrb_pad  
       */
       count(*), 
       count("ID"), sum(sys_op_opnsize("ID")),      
       count("BAD_COL"), sum(sys_op_opnsize("BAD_COL"))    
       ...
from
       "TEST_USER"."T1" t


select /*+
           full(t)    no_parallel(t) no_parallel_index(t) dbms_stats
           cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
           xmlindex_sel_idx_tbl no_substrb_pad
       */
       to_char(count("ID")),
       to_char(count("BAD_COL")),
       substrb(dump(min("BAD_COL"),16,0,64),1,240),
       substrb(dump(max("BAD_COL"),16,0,64),1,240),
       ...
       count(rowidtochar(rowid)) 
from
       "TEST_USER"."T1" t  /* ACL,TOPN,NIL,NIL,RWID,U,U254U*/

  • The new code only applied the substrb() functions to bad_col, but all other columns in the table were still subject to the to_char(count()).
  • The old code applied the count() and sys_op_opnsize() to every column in the table.

This initial scan was a bit expensive – and disappointing – for the client, since their table had 290 columns (which means intra-block chaining as a minimum) and had been updated so much that 45% of the rows in the table had to be “continued fetches”. I can’t think why every column had to be processed like this, but if they hadn’t been then a lot of CPU and I/O would have been saved, since the client’s critical column was very near the start of the table.

Finally

This problem with the popular value going missing is a known issue, for which there is a bug number, but there is further work going on in the same area which means this particular detail is being rolled into another bug fix. More news when it becomes available.

 

 

ASSM tangle

Thu, 2018-01-11 11:35

Here’s a follow-on from Tuesday’s (serious) note about a bug in 12.1.0.2 that introduces random slowdown on large-scale inserts. The threat in this note, while real and potentially a nuisance, is much less likely to become visible because it depends on you doing something that you probably shouldn’t be doing.

There have always been problems with ASSM and large-scale deletes, starting with the question of when Oracle should mark a block as having free space after a delete. If your session does it immediately then other sessions will start trying to use free space that isn’t really there until you commit; if your session doesn’t do it immediately then when can it happen – you won’t want it done on commit, but that means the segment could “lose” a lot of free space if nothing comes along in a timely fashion to tidy up.

But here’s a quirky problem that takes things one step further. What happens if you try to delete a load of data and fail and your session rolls back? If we start with yesterday’s script (running on 11.2.0.4 or 12.2.0.1) we can create a table with 1M rows in it and the following space usage:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

You will recall that each “Full” block actually had the basic 10% free space, plus a couple of hundred extra bytes which Oracle had to “forget about” because the incoming rows were always 290 bytes long. Let’s take this table and delete the first 100,000 rows, then emulate a session error and roll back, and then check the space usage:


delete from t1 where rownum <= 100000;
rollback;

-- generate space usage report

Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :        4,167 /       34,136,064
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       37,499 /      307,191,808

We have 4,167 blocks which were full, and we know they are effectively full for the purposes of our data, but they’re now declared as having some free space. When Oracle rolled back the delete it wasn’t running code that would attempt to discover that the block was going to go over the limit: it simply calculated the byte change from re-inserting the row, added it to the total free space (tosp), and produced a number that hadn’t reached the limit set by pctfree – so it flagged the block accordingly. (Remember my comment in the earlier article that Oracle doesn’t generate undo for the state changes on the Level 1 bitmap blocks – this is, at least in part, a consequence of that strategy.)

Guideline

If you’re going to do large-scale deletes in an ASSM environment, make sure they don’t fail or subsequent inserts may take a long time.

 


ASSM Argh 2

Wed, 2018-01-10 07:24

After yesterday’s post one of the obvious follow-up questions was whether the problem I demonstrated was a side effect of my use of PL/SQL arrays and loops to load the data. What would happen with a pure “insert select” statement? It’s easy enough to check:


rem
rem     Script:         assm_argh2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

drop table t2 purge;
drop table t1 purge;

create table t2
segment creation immediate
tablespace test_8k_assm
as
select * from all_objects where rownum <= 50000 -- >comment to avoid WordPress anomaly
;

create table t1
segment creation immediate
tablespace test_8k_assm
as
select * from all_objects where rownum = 0
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

insert into t1
select t2.*
from
        (
         select /*+ cardinality(40) */ rownum id
         from dual connect by level <= 40 -- > comment to avoid WordPress anomaly
        ) d,
        t2
;

commit;

declare
        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;
        m_fs2_bytes             number;

        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number;
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;

begin
        dbms_space.SPACE_USAGE(
                segment_owner           => 'TEST_USER',
                segment_name            => 'T1',
                segment_type            => 'TABLE',
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes,
                fs1_blocks              => m_fs1_blocks ,
                fs1_bytes               => m_fs1_bytes,
                fs2_blocks              => m_fs2_blocks,
                fs2_bytes               => m_fs2_bytes,
                fs3_blocks              => m_fs3_blocks,
                fs3_bytes               => m_fs3_bytes,
                fs4_blocks              => m_fs4_blocks,
                fs4_bytes               => m_fs4_bytes,
                full_blocks             => m_full_blocks,
                full_bytes              => m_full_bytes
        );

        dbms_output.put_line('Unformatted                   : ' || to_char(m_unformatted_blocks,'999,999,990') || ' / ' || to_char(m_unformatted_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || to_char(m_fs1_blocks,'999,999,990') || ' / ' || to_char(m_fs1_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || to_char(m_fs2_blocks,'999,999,990') || ' / ' || to_char(m_fs2_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || to_char(m_fs3_blocks,'999,999,990') || ' / ' || to_char(m_fs3_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || to_char(m_fs4_blocks,'999,999,990') || ' / ' || to_char(m_fs4_bytes,'999,999,999,990'));
        dbms_output.put_line('Full                          : ' || to_char(m_full_blocks,'999,999,990') || ' / ' || to_char(m_full_bytes,'999,999,999,990'));
end;
/

I’ve copied the first 50,000 rows from all_objects as a way of generating data, then cloned it 40 times into the main table to give me a total of 2,000,000 rows.

A comment on yesterday’s blog reported that the behaviour I described has been fixed in the October bundle patch for 12.1.0.2, but I haven’t patched my copy yet. So here are the results (with a little cosmetic editing) from running the insert and reporting on space usage from 11.2.0.4, 12.1.0.2, and 12.2.0.1 in order:


11.2.0.4
========
2000000 rows created.

Unformatted                   :          764 /        6,258,688
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          133 /        1,089,536
Full                          :       28,579 /      234,119,168

12.1.0.2
========
2000000 rows created.

Unformatted                   :          256 /        2,097,152
Freespace 1 (  0 -  25% free) :       32,810 /      268,779,520
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            1 /            8,192
Freespace 4 ( 75 - 100% free) :           47 /          385,024
Full                          :          443 /        3,629,056

12.2.0.1
========
2000000 rows created.

Unformatted                   :          764 /        6,258,688
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          226 /        1,851,392
Full                          :       39,706 /      325,271,552

The total number of blocks involved changes from version to version, of course, thanks to the huge differences in the contents of all_objects, but the headline message is clear – 12.1.0.2 is broken for this basic requirement. On the plus side, though, this is what you get from 12.1.0.2 if you change that insert to include the /*+ append */ hint:


2000000 rows created.

Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :            0 /                0
Full                          :       33,380 /      273,448,960

Unsurprisingly, 11.2.0.4 and 12.2.0.1 also behave and report 100% Full. This is a slightly special case, of course since there was no previous data in the table, but even when I started the big insert after inserting and committing a few rows all three versions behaved.


ASSM argh!

Tue, 2018-01-09 11:53

Here’s a problem with ASSM that used to exist in older versions of Oracle, had disappeared by 11.2.0.4, and then re-appeared in 12.1.0.2 – disappearing again by 12.2.0.1. It showed up on MoS a few days ago under the heading: “Insert is running long with more waits on db file sequential read”.

The obvious response to this heading is to question the number of indexes on the table – because big tables with lots of indexes tend to give you lots of random I/O as Oracle maintains the indexes – but this table had no indexes. The owner of the problem supplied several bits of information in the initial post, with further material in response to follow-up questions, including the tkprof summary of the 10046/level 12 trace of the insert and two extracts from the trace file showing some of the “db file sequential read” waits – the first extract made me wonder if there might be some issue involving 16KB blocks, but the second one dispelled that illusion.

There are several buggy things that can appear with ASSM and large-scale DML operations, and sometimes the problems can appear long after the original session has done the dirty deed, so I thought I’d create a simple model based on the information supplied to date – and discovered what the problem (probably) was. Here’s how it starts: I’ve created a tablespace using ASSM, and in this tablespace I’ve created a table which has 48 columns with a row length of 290 bytes (roughly matching the OP’s table), and I’ve hacked out a simple PL/SQL block that loops around inserting arrays of 100 rows at a time into the table for a total of 1M rows before committing.


rem
rem     Script:         assm_cleanout.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2        Lots of blocks left "not full"
rem             11.2.0.4
rem

rem
rem     using OMF, so no file-name needed
rem     Ran this bit as SYS, and altered test user to have unlimited quota
rem

/*
create
        tablespace test_8k_assm
        datafile size 1G
        extent management local
        autoallocate
        segment space management auto
;
*/

rem
rem     Create the table, pre-allocate some space.
rem     This means we should get consistent 8M extents and not initial little ones
rem

create table t1 (
        v001 varchar2(5), v002 varchar2(5), v003 varchar2(5), v004 varchar2(5), v005 varchar2(5),
        v006 varchar2(5), v007 varchar2(5), v008 varchar2(5), v009 varchar2(5), v010 varchar2(5),
        v011 varchar2(5), v012 varchar2(5), v013 varchar2(5), v014 varchar2(5), v015 varchar2(5),
        v016 varchar2(5), v017 varchar2(5), v018 varchar2(5), v019 varchar2(5), v020 varchar2(5),
        v021 varchar2(5), v022 varchar2(5), v023 varchar2(5), v024 varchar2(5), v025 varchar2(5),
        v026 varchar2(5), v027 varchar2(5), v028 varchar2(5), v029 varchar2(5), v030 varchar2(5),
        v031 varchar2(5), v032 varchar2(5), v033 varchar2(5), v034 varchar2(5), v035 varchar2(5),
        v036 varchar2(5), v037 varchar2(5), v038 varchar2(5), v039 varchar2(5), v040 varchar2(5),
        v041 varchar2(5), v042 varchar2(5), v043 varchar2(5), v044 varchar2(5), v045 varchar2(5),
        v046 varchar2(5), v047 varchar2(5), v048 varchar2(5)
)
segment creation immediate
tablespace test_8k_assm
storage(initial 8M)
;

alter table t1 allocate extent (size 8M);
alter table t1 allocate extent (size 8M);

rem
rem     Simple anonymous pl/sql block
rem     Large insert, handled with array inserts
rem     Can modify loop count and array size very easily
rem

declare
        type tab_array is table of t1%rowtype;
        junk_array tab_array;
begin

        select
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx'
        bulk collect into
                junk_array
        from
                all_objects
        where
                rownum  <= 100 -- > comment to avoid WordPress format issue
        ;

        for i in 1..10000 loop
                forall j in 1..junk_array.count
                        insert into t1 values junk_array(j) ;
        end loop;

end;
/

commit;

The number of rows per block after this insert is 24, with 1038 bytes free space left (808 due to the pctfree = 10, then the bit that was too small to take a 25th row before breaching the pctfree barrier). This means we should report 1M/24 = 41,666 full blocks and one block with some free space. So we query the table using the dbms_space package:


declare
        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;
        m_fs2_bytes             number;

        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number;
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;

begin
        dbms_space.SPACE_USAGE(
                segment_owner           => 'TEST_USER',
                segment_name            => 'T1',
                segment_type            => 'TABLE',
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes,
                fs1_blocks              => m_fs1_blocks ,
                fs1_bytes               => m_fs1_bytes,
                fs2_blocks              => m_fs2_blocks,
                fs2_bytes               => m_fs2_bytes,
                fs3_blocks              => m_fs3_blocks,
                fs3_bytes               => m_fs3_bytes,
                fs4_blocks              => m_fs4_blocks,
                fs4_bytes               => m_fs4_bytes,
                full_blocks             => m_full_blocks,
                full_bytes              => m_full_bytes
        );

        dbms_output.new_line;
        dbms_output.put_line('Unformatted                   : ' || to_char(m_unformatted_blocks,'999,999,990') || ' / ' || to_char(m_unformatted_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || to_char(m_fs1_blocks,'999,999,990') || ' / ' || to_char(m_fs1_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || to_char(m_fs2_blocks,'999,999,990') || ' / ' || to_char(m_fs2_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || to_char(m_fs3_blocks,'999,999,990') || ' / ' || to_char(m_fs3_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || to_char(m_fs4_blocks,'999,999,990') || ' / ' || to_char(m_fs4_bytes,'999,999,999,990'));
        dbms_output.put_line('Full                          : ' || to_char(m_full_blocks,'999,999,990') || ' / ' || to_char(m_full_bytes,'999,999,999,990'));

end;
/

The results aren’t what we expect:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :       35,001 /      286,728,192
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :        6,665 /       54,599,680

We have one block marked as 25 – 50% free (that’s the one block with 16 rows in it, which means about 40% space currently free) but our 41,666 full blocks are actually reported as 6,665 full blocks and 35,001 blocks with some space available. That’s going to hurt eventually if some process wants to insert more rows and finds that it has to fail its way through 35,001 blocks before finding a block which has enough free space.

So what happens when I repeat the PL/SQL block (and commit)? Here are the results from calls to dbms_space after the next two cycles:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :       70,002 /      573,456,384
Freespace 2 ( 25 -  50% free) :            2 /           16,384
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          220 /        1,802,240
Full                          :       13,330 /      109,199,360

Unformatted                   :          256 /        2,097,152
Freespace 1 (  0 -  25% free) :      105,003 /      860,184,576
Freespace 2 ( 25 -  50% free) :            3 /           24,576
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          117 /          958,464
Full                          :       19,995 /      163,799,040

Every time we execute the PL/SQL block we leave a trail of 35,001 more blocks which are flagged as “not quite full”.

Looking at the session stats while running the insert loop I can tell that Oracle isn’t checking to see whether or not it should be using those blocks. (A quick way of proving this is to flush the buffer cache before each execution of the PL/SQL block and note that Oracle doesn’t read back the 105,000 blocks before inserting any data.)
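
Here’s a sketch of that check – the flush needs a suitably privileged account, and the statistics query assumes it runs from the session doing the insert:

alter system flush buffer_cache;

rem     re-run the insert block, then see how much came back from disk

select
        name, value
from
        v$mystat
join    v$statname using (statistic#)
where
        name in ('physical reads', 'physical reads cache')
;

So somehow, sometime, someone might get a nasty surprise – and here’s one way that it might happen: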

Since I know my data fits 24 rows per block I’m going to modify my PL/SQL block to select one row into the array then loop round the insert 25 times – so I know I’m inserting a little bit more than one block’s worth of data. Starting from the state with 105,003 blocks marked as “Freespace 1”, this is what I saw – first, the free space report after inserting 25 rows:


Unformatted                   :          240 /        1,966,080
Freespace 1 (  0 -  25% free) :        1,074 /        8,798,208
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          133 /        1,089,536
Full                          :      123,927 /    1,015,209,984

Then a few wait events and session statistics for the insert:


---------------------------------------------------------
SID:    39:TEST_USER - jonathan
Session Events - 09-Jan 16:57:18
Interval:-      6 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs        Csec    Avg Csec    Max Csec
-----                                             -----   ---------        ----    --------    --------
db file sequential read                          15,308           0         128        .008           3
db file scattered read                           20,086           0         271        .014           4

---------------------------------
Session stats - 09-Jan 16:57:18
Interval:-  6 seconds
---------------------------------
Name                                                                     Value
----                                                                     -----
session logical reads                                                  269,537
physical read total IO requests                                         35,401
db block gets                                                          229,522
consistent gets                                                         40,015
physical reads                                                         124,687
physical reads cache                                                   124,687
db block changes                                                       208,489
physical reads cache prefetch                                           89,293
redo entries                                                           207,892
redo size                                                           16,262,724
undo change vector size                                                  1,720
deferred (CURRENT) block cleanout applications                         103,932
table scan blocks gotten                                                20,797
HSC Heap Segment Block Changes                                              25

The session has read and updated almost all of the level 1 bitmap blocks. I don’t know exactly what triggered this re-read, but it seems to be related to the number of rows inserted (or, perhaps, the amount of space used rather than the row count) as an insert crosses the pctfree boundary and fails over to the next block. I’ve only done a couple of little tests to try to get a better idea of why an insert sometimes sweeps through the bitmap blocks – so I know that inserting 2 or 3 rows at a time will also trigger the cleanout – but there are probably several little details involved that need to be identified.

You might note a couple of details in the stats:

  • Because I had flushed the buffer cache before the insert Oracle did its “cache warmup” tablescanning trick – if this had not happened I would probably have done a single block read for every single bitmap block I touched.
  • There are 103,932 block cleanout applications – but 208,000 db block changes and redo entries. Roughly half the latter are for data block cleanouts (OP code 4.1) and half are the state changes on the level 1 bitmap blocks (OP code 13.22). You’ll notice that neither change produces any undo.
  • I’ve also included the HSC Heap Segment Block Changes statistic to show you that not all changes to Heap Segment Blocks show up where you might expect them.
And finally:

If you re-run the tests on 11.2.0.4 and 12.2.0.1 you get the following results after the initial script run – the problem doesn’t exist:


11.2.0.4
========
Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

12.2.0.1
========
Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

So how does this help the OP?
  • First, there may be a huge mess still waiting to be cleaned up in the table – at 34M blocks I’m not too keen to suggest running the dbms_space routine to find out what it looks like, but maybe that’s necessary.
  • Secondly – an upgrade to 12.2 will probably avoid the problem in future.
  • Thirdly – if the number of rows per block is very close to uniform, write a little code to do a loop that inserts (say) 2 * the expected number of rows per block as single-row inserts and rolls back (there’s a sketch of the idea after this list); the inserts will probably trigger a massive, though perhaps not complete, cleanout, so rinse and repeat until the cleanout is complete. Try to find a time when you don’t mind the extra load to get this job done.
  • Finally – on the big job that does the bulk insert – repeat the dummy insert/rollback at the end of the job to clean up the mess made by the job.
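
Here’s a minimal sketch of the insert/rollback idea from the third point – it assumes the table already holds at least one row of typical size and, as in my test, roughly 24 rows per block:

declare
        m_row   t1%rowtype;
begin
        select * into m_row from t1 where rownum = 1;   -- borrow a row of typical size

        for i in 1..48 loop                             -- 2 * the expected rows per block
                insert into t1 values m_row;            -- single-row inserts drive the bitmap sweep
        end loop;

        rollback;                                       -- discard the dummy rows
end;
/
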
Addenda

Prompted by comment #2 below, I should add that if the problem has been fixed in 12.2 then possibly there’s a bug report and patch for it already. If there isn’t then the OP could raise an SR (referencing this blog note), and request a bug fix or back-port from 12.2.

And with 24 hours of publication, comment #4 (from Yury Pudovchenko) tells us that the bug is fixed by the Oct 2017 Bundle Patch.

 

 


Defaults

Tue, 2018-01-02 02:43

Following on from a Twitter reference and an update to an old posting about a side effect of constraints on the work done inserting data, I decided to have a closer look at the more general picture of default values and inserts. Here’s a script that I’ve tested against 11.2.0.4, 12.1.0.2, and 12.2.0.1 (original install, no patches applied, in all cases):


rem
rem     Script:         defaults_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2017
rem

create table t1 (
        column1  varchar2(10),
        column2  varchar2(10),
        column3  varchar2(10),
        column4  varchar2(10),
        column32 varchar2(32)   default 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
        column33 varchar2(33)   default 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
        virtual1      generated always as (
                column4 || column3 || column2 || column1
        ) virtual
)
segment creation immediate
;

execute dbms_output.put_line(dbms_stats.create_extended_stats(user,'t1','(column1 || column2 || column3 || column4)'))
execute dbms_output.put_line(dbms_stats.create_extended_stats(user,'t1','(column1,column2,column3)'))

create or replace function plsqlfunction_with_30char_name(
        i_in varchar2
)
return varchar
deterministic
is
begin
        return initcap(i_in);
end;
/

create index t1_i1 on t1(substr(plsqlfunction_with_30char_name(column1),1,10));

When you create a function-based index you get a hidden virtual column supporting the index expression; when you create extended stats (of either type) you get a hidden virtual column holding the extension definition; and when you create any type of virtual column, including a “real” virtual column, you get a data dictionary entry holding the column name and the expression definition. All these options use the “data_default” column from user_tab_cols to display the defining information – as we can see when we run the following query:


select  column_name, data_default
from    user_tab_cols
where   table_name = 'T1'
order by
         column_id
;

COLUMN_NAME                      DATA_DEFAULT
-------------------------------- --------------------------------------------------------------------------------
COLUMN1
COLUMN2
COLUMN3
COLUMN4
COLUMN32                         'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
COLUMN33                         'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
VIRTUAL1                         "COLUMN4"||"COLUMN3"||"COLUMN2"||"COLUMN1"
SYS_STUDAN97VB_XDKRTR_NPFAB80P   "COLUMN1"||"COLUMN2"||"COLUMN3"||"COLUMN4"
SYS_STUTCBJ8OFFSY1D9ZXRYZ0W3W#   SYS_OP_COMBINED_HASH("COLUMN1","COLUMN2","COLUMN3")
SYS_NC00010$                     SUBSTR("TEST_USER"."PLSQLFUNCTION_WITH_30CHAR_NAME"("COLUMN1"),1,10)

Apart from the special cases I’ve just listed you’ll also see the “default values” I specified for column32 and column33. Notice that I’ve supplied a 30 character string as the default for column32 and a 31 character string as the default for column33 – a convenience that means the space used in data_default (which is a long column) matches the number embedded in each column’s name once you include the two single quotes in the character count (32 and 33 characters respectively).

Having set my data up I’m going to emulate a bad application that uses lots of literal string SQL and leaves Oracle to fill in the default values (and, of course, derive the various virtual values it might need).


alter session set events '10046 trace name context forever, level 4';

begin
        for i in 1..10 loop
                execute immediate '
                        insert into t1 (column1, column2, column3, column4)
                        values( ' || i || ', ' || i || ', ' || i || ', ' || i || ')'
                ;
                commit;
        end loop;
end;
/

alter session set events '10046 trace name context off';

This code generates 10 strings that populate column1 through to column4 only. But you’ll notice the call to enable SQL tracing – and here’s the interesting bit of the output from applying tkprof to the trace file:


  SQL ID: 47r1y8yn34jmj Plan Hash: 2191121161

select default$
from
 col$ where rowid=:1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       50      0.00       0.00          0          0          0           0
Execute     50      0.00       0.00          0          0          0           0
Fetch       50      0.00       0.00          0        100          0          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      150      0.00       0.00          0        100          0          50

The summary is the same for all three versions of Oracle I tested*** – we’ve queried for a default value from col$ five times for each row we insert! (Technically that’s five times for each insert statement we’ve had to (hard-)parse; this anomaly wouldn’t appear if we had been using a bind-variable method and reusing the insert statement.) *** There is one difference in 12.2: the number of parse calls reported for this statement was 1 rather than 50 but, judging by the various cursor cache hit stats, that may be due to a change in accounting rather than a change in workload.

Check the table definition: there are two “real” defaults and four expressions due to the various virtual columns – so why five calls per insert and not six ? The answer lies in the length of the actual value involved: if the text that appears in the (long) data_default column is 32 characters or shorter it will be stored in the dictionary cache (rowcache), but only one of our six values is that short, so Oracle looks up the other five on each insert (hard parse).
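
If you want to check which definitions in your own tables fall foul of that limit, user_tab_cols reports the length of the stored default text, so a simple sketch (assuming the 32-character cut-off observed above, and that default_length counts the same characters you see in data_default) would be:


select  column_name, default_length
from    user_tab_cols
where   table_name = 'T1'
and     default_length > 32
order by
        column_id
;
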

This is a little strange on two counts: first, given the enormous memories that have been available for years and the extreme usefulness of virtual columns and extended stats it seems something of an oversight to limit the memory available to the cache that holds such critical definitions; secondly, we don’t need to evaluate the virtual columns (unless they are indexed) or extended stats on inserts so why is Oracle querying the definitions anyway ? [Possible answer: perhaps Oracle is using generic code that allows for column check constraints – which might exist on users’ virtual columns – and hasn’t catered for bypassing system-generated virtual columns.]

A key point to remember, before you start worrying too much about the execution count of this query against col$, is that it’s only relevant to “hard” parses – so you’re only likely to notice it if you have a large number of different “literal string” inserts that should be using bind variables; and that means you’ve probably got an important coding defect to address before you worry about the extra impact caused by this particular call (see the sketch below). Nevertheless there are a couple of bug reports on MoS that have been raised against this query and, after writing this note, I did a quick Google search for the critical SQL_ID and found (among others) this production example from Franck Pachot.
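
For comparison, here’s a sketch of the bind-variable version of the original loop – a single hard parse means the col$ lookups happen once for the whole run rather than five times per insert:


begin
        for i in 1..10 loop
                execute immediate
                        'insert into t1 (column1, column2, column3, column4)
                         values(:b1, :b2, :b3, :b4)'
                using i, i, i, i;
                commit;
        end loop;
end;
/
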



Been There

Sun, 2017-12-31 04:51

It’s the end of the year and time for a retrospective of some sort so I was thinking of listing the top 10 most popular pages on my blog, but Connor McDonald beat me to it, so I decided to see if I could remember all the countries I’d visited since starting to work with the Oracle software, and here’s the list in alphabetical order:

Antigua
Australia
Austria
Belgium
Bosnia
Brunei
Bulgaria
Canada
China
Croatia
Czech Republic
Denmark
Dubai
Egypt
Estonia
Finland
France
Germany
Greece
Hungary
Iceland
India
Indonesia
Ireland
Israel
Italy
Japan
Latvia
Lithuania
Malaysia

A few of these were holidays rather than work, and I may have forgotten a couple, so if you’ve seen me in your country and it’s not on the list let me know.

The list can be a bit of a nuisance: I had to list “all the countries you’ve visited in the last 10 years” for both the US and Russian visas; the US form only allowed for 5 countries, the Russian one for 40; and the US expected me to list EVERY visit, with dates and city!



nvarchar2

Sat, 2017-12-30 06:08

Here’s an odd little quirk that appeared when I was playing around with default values just recently. I think it’s one I’ve seen before – I may even have written about it many years ago – but I can’t find any reference to it at present. Let’s start with a script that I’ll run on 12.2.0.1 (the effect also appears on earlier versions):


rem
rem     Script:         nvarchar2_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2017
rem

create table t1 (
        column1  varchar2(10),
        column2  nvarchar2(10)
);

create table t2 (
        column1  varchar2(10)
);

alter table t2 add column2 nvarchar2(10);

create table t3 (
        column1  varchar2(10),
        column2  nvarchar2(10) default 'xxxxxxxx'
);

create table t4 (
        column1  varchar2(10)
);

alter table t4 add column2 nvarchar2(10) default 'xxxxxxxx';

insert into t1(column1) values('a');
insert into t2(column1) values('a');
insert into t3(column1) values('a');
insert into t4(column1) values('a');

All I’ve done is create four tables which, when described, will all look the same:


 Name                    Null?    Type
 ----------------------- -------- ----------------
 COLUMN1                          VARCHAR2(10)
 COLUMN2                          NVARCHAR2(10)

There is a significant difference between the first two tables and the last two, of course, thanks to the specification of a default value, which means the inserts will produce two different results: the first two tables will have nulls in column2; the last two will have the nvarchar2 equivalent of ‘xxxxxxxx’ which, in my instance, will be a string of 16 bytes: “0,78,0,78,0,78,0,78,0,78,0,78,0,78,0,78”.
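
If you want to see the stored representation for yourself, a quick dump() does the job (a sketch – the byte values quoted above are hexadecimal, hence the 16):


select
        column1, column2,
        dump(column2, 16)       column2_dump
from    t3
;
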

Surprisingly, though, there is a dramatic oddity between t3 and t4 which shows up when I query user_tab_cols:

select
        table_name, column_id, column_name,  segment_column_id, data_default
from    user_tab_cols
where   table_name like 'T_'
order by
        table_name, column_id
;

TABLE_NAME            COLUMN_ID COLUMN_NAME          SEGMENT_COLUMN_ID DATA_DEFAULT
-------------------- ---------- -------------------- ----------------- --------------------
T1                            1 COLUMN1                              1
                              2 COLUMN2                              2

T2                            1 COLUMN1                              1
                              2 COLUMN2                              2

T3                            1 COLUMN1                              1
                              2 COLUMN2                              2 'xxxxxxxx'

T4                            1 COLUMN1                              1
                              2 COLUMN2                              3 'xxxxxxxx'
                                SYS_NC00002$                         2

Table t4 has acquired two columns – a hidden column (which physically exists as the second column in the stored data and is declared as raw(126)) and the column which I had specified. You’ll note that the test shows two differences that may be significant: comparing t3 with t4 we see that adding, rather than initially defining, the nvarchar2() column introduces the extra column; comparing t2 with t4 we see that adding a varchar2() rather than an nvarchar2() doesn’t produce the same effect. The tentative assumption, therefore, is that there is something special about adding nvarchar2() columns.
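
As a quick sanity check (a sketch, nothing more), user_tab_cols can confirm the hidden column and its declared type directly:


select
        column_name, segment_column_id,
        data_type, data_length, hidden_column
from    user_tab_cols
where   table_name = 'T4'
order by
        segment_column_id
;
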

Casting my mind back to various customers who have gone through multiple upgrades of 3rd party applications that invariably seem to add columns to tables, I wondered whether this extra column appeared every time you added an nvarchar2(). I’d not noticed anything in the past that suggested this might be the case, but it’s obviously worth checking: and in my simple tests it looked as if Oracle created just one extra column and used it to capture a value that seemed to be determined by the number and location of the columns that had been added.

It’s a curiosity, and leaves room for further investigation – so if anyone has links to related articles please feel free to add them in the comments.



Tech 2017 Agenda

Tue, 2017-11-28 08:29

As usual it’s hard to pick a personal agenda from the wealth of content available for the UKOUG annual conference, but this is my starting list:

Sunday

13:40 – 14:25 Roger MacNicol: “My query plan says Table Access Full: what happens next ?”

14:40 – 15:30 Gert Poel: “Smart Database Development with PL/SQL and Oracle REST Data Services”

16:10 – 17:00 Kellyn Pot’Vin: “Oracle vs. SQL Server – the War of the Indexes”

17:10 – 18:00 Luiza Koziel: “How to improve your presentations AKA the Tool is Just a Tool… Learn how to use it for a Good Cause”

Monday

9:00 – 9:50  ME! I launch (one stream of) the conference with “Index Statistics and Column Groups”

11:35 – 13:25 Ivica Arsov: “Parallel Execution with Oracle 12c” … there may be some overlap with my 2nd presentation

14:25 – 15:15 ME again! At the CBO round table – we may answer a few of the more specific questions that came in for the CBO panel.

15:25 – 16:25 Community Keynote – how could one possibly miss Maria Colgan, Connor McDonald and Chris Saxon … (maybe with Tim-Tams ?)

16:55 – 17:45 Michael Salt – “An in-memory paradox – increased I/O”. I think I know the answer, but this may reveal another way to do it wrong. (Pity to miss Richard Foote on AWRs.)

17:55 – 18:45 Franck Pachot – “From Transportable Tablespace to Pluggable Databases”


Tuesday

9:00 – 9:50 ME yet again, starting the day with Maria Colgan, Nigel Bayliss, Chris Antognini and Richard Foote on the CBO Panel, with Martin Widlake and Neil Chandler doing the MC bit and making sarky comments.

10:00 – 10:50 Marcin Przepierowski: “Rman – from Beginner to Advanced”, because you’ve always got to review what you thought you knew about recovery (and backup). It means I have to miss Bryn Llewellyn and Kamil Stawiarski arguing about how to address performance in PL/SQL.

11:25 – 12:15 Lucas Jellema: “Intro to Docker Containers & the Oracle Platform – Database, Weblogic and Cloud”.

12:25 – 13:15 Mark Rittman: “How Analytics is changing the World (again)”.

14:20 – 15:05 Stew Ashton: “Meet your Match: Advanced Row Pattern Matching”

15:40 – 16:25 Martin Berger: “Escape from Exadata”.

16:40 – 17:30 Roger MacNicol: “Using Oracle Columnar Technologies across the Information Lifecycle”

Wednesday

9:00 – 9:45 Simon Pane: “Modernizing your DBA Scripts and Backups with the Oracle Scheduler”

10:00 – 10:45 Allan McAleavy: “Moving to an All Flash Array – Dude Where’s my Bottleneck”

11:25 – 12:15 Jason Arneil: “An introduction to Sharding”

12:25 – 13:15 My final session: “Parallel Execution” – if you come to this, it’s probably a good idea to see Ivica Arsov on Monday morning as well.

14:15 – 15:05 Jaromir D.B. Nemec: “Anomaly Detection in Database Workload”.


Of course I may change my mind between now and the start of the event, so if you’re feeling deprived (or relieved) that I’m not going to be in your audience – or pressured because I am – never mind: my agenda isn’t cast in stone and I’ll probably end up at the wrong sessions anyway, having been too busy talking to someone to keep an eye on the time.

