Oracle Programming Concepts

Wednesday, September 27, 2006

 

Materialized Views

Materialized views, previously known as snapshots, are a grossly underutilized feature of the Oracle database. These "summary tables" can reduce data access times by several orders of magnitude. With materialized views, Oracle opens up the possibility of querying terabytes of data in a matter of seconds. They accomplish this by transparently using pre-computed summarizations and joins of the data.

I guess I have already answered the serious questions associated with the use of materialized views, namely: what do they buy us, and what do they cost us?

Use of materialized views results in huge gains in performance. By calculating the answers to the really hard questions up front, we greatly reduce the load on our server. There are limitations as well, and we shall run into them as we go along.
We shall look at the use of this splendid tool by going through some examples. Readers are encouraged to try out these examples: change them to suit your requirements, find out whether there is a performance gain, and use the technique if you are satisfied. As on all other occasions, I would remind you not to accept any stated fact without proof.

To run the examples, your user account needs a few specific privileges. The first three may be granted to one of your roles, while the fourth (QUERY REWRITE) should be granted to you directly. You will also need around 200 MB of free space (that is, your quota should be at least 200 MB). Your Oracle database is assumed to be using the Cost Based Optimizer (CBO), so that you can make use of the query-rewrite capability.
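The list of privileges itself did not survive formatting; judging from the grants used later in this article, a typical set would look like the following (this is an assumption on my part — substitute your own user and role names):

```sql
-- The first three privileges may come through a role;
-- QUERY REWRITE must be granted to the user directly.
GRANT CREATE SESSION TO cmlconv;
GRANT CREATE TABLE TO cmlconv;
GRANT CREATE MATERIALIZED VIEW TO cmlconv;
GRANT QUERY REWRITE TO cmlconv;
```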

First Example: What a Snapshot is Capable of

We shall try to see if we can reduce the execution time of a long-running query by using materialized views. Since we need a pretty large table to demonstrate the dramatic impact of materialized views, we shall create one ourselves.


We'll start with a query against our own data dictionary to get a list of all the objects and their owners.
First, we shall turn on timed statistics:

SQL> set timing on

Now we shall create our table. From the elapsed time, we can see that it is a table of considerable size.

SQL> CREATE TABLE mv_demotable NOLOGGING
2 AS SELECT * FROM all_objects
3 UNION ALL SELECT * FROM all_objects
4 UNION ALL SELECT * FROM all_objects
5 UNION ALL SELECT * FROM all_objects
6 UNION ALL SELECT * FROM all_objects
7 UNION ALL SELECT * FROM all_objects
8 UNION ALL SELECT * FROM all_objects
9 UNION ALL SELECT * FROM all_objects
10 UNION ALL SELECT * FROM all_objects
11 UNION ALL SELECT * FROM all_objects
12 /

Table created.

Elapsed: 00:00:04.08

And we get the count of records as

SQL> SELECT COUNT(*) FROM mv_demotable;

COUNT(*)
----------
64480

Elapsed: 00:00:00.03

With the processing speed of today's computers, we need significantly more records than this to observe the impact of materialized views on our query. So we double the table three times over by appending its own records to it, committing after each insert:

SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;

64480 rows created.

Elapsed: 00:00:00.03
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00

SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;

128960 rows created.

Elapsed: 00:00:00.07
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;

257920 rows created.

Elapsed: 00:00:03.06
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL> SELECT COUNT(*) FROM mv_demotable;

COUNT(*)
----------
515840

Elapsed: 00:00:01.07


This is a significantly large table, and we have enough records. Note that even a simple count now takes over a second; we're in business! (A simple question to readers: why does COMMIT take 0.00 seconds?)

We will execute a query against this demo table that shows the number of objects owned by each user. Such a query, as readers will observe, results in a full table scan and uses up considerable resources on the server. Here is our first query against the demo table:

SQL> SELECT owner, COUNT(*) FROM mv_demotable
GROUP BY owner;

OWNER COUNT(*)
------------------------------ ----------
CMLCONV 15520
EXPERT 55840
KOTTAYAM 3280
OUTLN 560
PUBLIC 111680
SCIM 6320
SCOTT 560
SSANET 5040
SYS 273680
SYSTEM 28000
TEST 3280
TKYTE 3200
TOAD 5920
TRA 2960

14 rows selected.

Elapsed: 00:00:02.00

Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'

Statistics
-----------------------------------------------------
0 recursive calls
27 db block gets
6367 consistent gets
4862 physical reads
60 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed

We have not computed statistics on our table yet. If we do, the optimizer will give us cost-based figures for our query. We issue the following:

SQL> ANALYZE TABLE mv_demotable COMPUTE STATISTICS;

Table analyzed.

Elapsed: 00:00:14.08
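As an aside, on later Oracle releases the supported way to gather optimizer statistics is the DBMS_STATS package rather than ANALYZE; an equivalent call (shown here only as a pointer, not something timed in this article) would be:

```sql
-- Gather table statistics for the demo table in the current schema.
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'MV_DEMOTABLE');
```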

And we set autotrace to traceonly, since we are not interested in the data itself:

SQL> set autotrace traceonly

And now we look at the statistics for the full table scan:

SQL> SELECT owner, COUNT(*) FROM mv_demotable
GROUP BY owner;


14 rows selected.

Elapsed: 00:00:01.06

Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=1581 Card=14 Bytes=70)


1 0 SORT (GROUP BY)
(Cost=1581 Card=14 Bytes=70)

2 1 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'
(Cost=965 Card=515840 Bytes=2579200)


Statistics
-----------------------------------------------------
0 recursive calls
27 db block gets
6366 consistent gets
4200 physical reads
0 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed

Just look at the number of physical disk reads needed to get a simple aggregate! 4200 physical reads just to send one kB of data to the client - the evidence points heavily in the direction of some SQL tuning.

And now we create our materialized view. First, we need to grant a privilege to our user and enable the proper use of query rewrite in our session.

SQL> GRANT QUERY REWRITE TO cmlconv;

Grant succeeded.

SQL> ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;

Session altered.

SQL> ALTER SESSION SET QUERY_REWRITE_INTEGRITY=ENFORCED;

Session altered.
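If you want to confirm that the session is now set up for rewriting, you can query V$PARAMETER (assuming your account has been granted access to it):

```sql
-- Both parameters should reflect the ALTER SESSION commands above.
SELECT name, value
  FROM v$parameter
 WHERE name IN ('query_rewrite_enabled', 'query_rewrite_integrity');
```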

And now we create our materialized view.

SQL> CREATE MATERIALIZED VIEW mv_demo
2 BUILD IMMEDIATE
3 REFRESH ON COMMIT
4 ENABLE QUERY REWRITE
5 AS
6 SELECT owner, count(*) FROM mv_demotable
7 GROUP BY owner

8 /

Materialized view created.

Elapsed: 00:00:04.09

And we analyse our table again:

SQL> ANALYZE TABLE mv_demotable COMPUTE STATISTICS;

Table analyzed.

Elapsed: 00:00:14.07

As before, the analysis takes much longer than the query itself. This is the penalty for generating the summary table: the inserts on the base table themselves are not of much concern, but refreshing the summary table takes a lot of time - the database performs a full scan, so to speak. This constraint forces the developer to exercise caution in using materialized views. If the materialized view has to be refreshed too frequently, it might prove marginally effective at best and counter-productive at worst.
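One way to soften this penalty - sketched here as an assumption, not something we have timed in this article - is a fast-refreshable materialized view. With a materialized view log on the base table, Oracle can apply only the changed rows at commit time instead of recomputing the whole summary:

```sql
-- The log records row-level changes; WITH ROWID and INCLUDING NEW VALUES
-- are required for fast refresh of an aggregate materialized view.
CREATE MATERIALIZED VIEW LOG ON mv_demotable
  WITH ROWID (owner)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW mv_demo_fast
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT
  ENABLE QUERY REWRITE
  AS SELECT owner, COUNT(*) cnt
       FROM mv_demotable
      GROUP BY owner;
```

Whether fast refresh actually beats a complete refresh depends on how much of the base table changes between commits; as always, measure before you decide.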

What we have done is create a summary table that Oracle keeps up to date on every commit and can transparently substitute for the base table.
Now let us see the view in action. We issue the same query against the same table:

SQL> SELECT owner, count(*) FROM mv_demotable
GROUP BY owner;


OWNER COUNT(*)
------------------------------ ----------
CMLCONV 15520
EXPERT 55840
KOTTAYAM 3280
OUTLN 560
PUBLIC 111680
SCIM 6320
SCOTT 560
SSANET 5040
SYS 273680
SYSTEM 28000
TEST 3280
TKYTE 3200
TOAD 5920
TRA 2960

14 rows selected.

Elapsed: 00:00:00.01

Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=1 Card=82 Bytes=2460)


1 0 TABLE ACCESS (FULL) OF 'MV_DEMO'
(Cost=1 Card=82 Bytes=2460)


Statistics
-----------------------------------------------------
0 recursive calls
6 db block gets
4 consistent gets
0 physical reads
0 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
14 rows processed

And there you have it! From 4200 physical reads to 0! Here you have a comparison of the parameters.



Item                        Before creating MV     After creating MV
--------------------------  ---------------------  ---------------------
Elapsed Time                1.06s                  0.01s

Statistics
  recursive calls           0                      11
  db block gets             27                     10
  consistent gets           6366                   15
  physical reads            4200                   9
  redo size                 0                      0
  sorts (memory)            1                      4
  sorts (disk)              0                      0

Execution Plan              Cost   Card    Bytes   Cost   Card    Bytes
  SELECT STATEMENT          1581   14      70      1      82      2460
  SORT (GROUP BY)           1581   14      70      -      -       -
  TABLE ACCESS (FULL)
    OF 'MV_DEMOTABLE'       965    515840  2579200 1      82      2460

We can see an improvement of over a factor of 100. What's even more interesting is that such a query is effectively cached, as subsequent runs return within 0.00s. We can also see that since we created the summary table on the basis of the GROUP BY clause, the execution plan no longer has an entry for SORT (GROUP BY). Table access has been almost completely eliminated, with Oracle reading only a handful of blocks. The SQL engine no longer has to process some 2 MB of data to send out the 14 rows of summary information to the SQL*Plus client; it merely sends the 2460 bytes that are already available in the summary table. Instead of a complex aggregation, all it needs is a simple table lookup, which it can do with fewer than ten physical reads.

I hope I have clearly demonstrated the use of materialized views - in particular, the performance advantage of using one in this test situation. How far your practical requirements resemble this test scenario is for you to find out; but in Oracle programming, you can always find a way to make something work better and faster than it currently does... so, best of luck.

This blog is already quite big for a single read, so we'll discuss Oracle's query rewriting mechanism and other aspects of materialized views (such as the REFRESH ON COMMIT option) in a subsequent post.

Until then, have a good time experimenting...

Zeinab
