Materialized views, previously known as snapshots, are a grossly underutilized feature of the Oracle database. These "summary tables" can reduce data access times by several orders of magnitude. With materialized views, Oracle opens up the possibility of querying terabytes of data in a matter of seconds. They accomplish this by transparently using pre-computed summarizations and joins of the data.
I guess I have already answered the "serious questions" associated with the use of materialized views, namely:
- Can I make a materialized view of a complex join/subquery?
- Does a materialized view take up space?
- What is the major difference (or similarity) between a materialized view and a view?
The answers to the above questions are:
- Yes. We create views and snapshots precisely to this end: querying more than one table is the primary reason we would use a view or a snapshot.
- Yes, of course. A view also takes up some space, as do an index, a constraint, and so on. A data structure takes up memory, disk space, or both; in the case of an index, snapshot, or constraint, it is evident that these take up space.
- A view is formally defined as "an alternative way to look at data." Typically, we cobble together a custom "view" that includes several fields from one table, a few from another, and so on, all of which may or may not be related through a matching condition (such as a primary key relationship). A snapshot, however, is an aggregate table. A view refers directly to the most recent possible state of the table; a snapshot could be quite old. For this simple reason, taking data from a snapshot for an OLTP application is not recommended (more on this later). A materialized view is similar to a regular view in the way it is defined (we use a similar command to create the snapshot), but we store the results of the defining query. In other words, the results of the defining query are materialized as a persistent database object (in simple terms, stored in a physical segment, just like a table). Caveat: use this feature mainly for data mining applications, not for OLTP-oriented applications. If your database tables are frequently updated, then it is better not to use it.
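To make the distinction concrete, here is a minimal sketch contrasting the two CREATE statements (the emp table and the object names are hypothetical, used only for illustration):

```sql
-- A regular view stores only its defining query; every reference
-- re-executes that query against the live base table.
CREATE VIEW emp_summary_v AS
  SELECT deptno, COUNT(*) AS emp_count
  FROM   emp
  GROUP  BY deptno;

-- A materialized view stores the query RESULTS as a physical segment;
-- queries can read the pre-computed rows instead of the base table.
CREATE MATERIALIZED VIEW emp_summary_mv
  BUILD IMMEDIATE        -- populate now, not on first use
  REFRESH ON DEMAND      -- data is as of the last refresh, not live
  AS
  SELECT deptno, COUNT(*) AS emp_count
  FROM   emp
  GROUP  BY deptno;
```

The view always reflects the current state of emp; the materialized view reflects its state as of the last build or refresh.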
Use of materialized views results in huge gains in performance. By calculating the answers to the really hard questions up front, we greatly reduce the load on our server. The positive effects are the following:
- Fewer physical reads - there is less data to scan through.
- Fewer writes - we will not be sorting and aggregating as frequently.
- Decreased CPU consumption - we will not be calculating aggregates and functions on the data, as we will have already done that.
- Markedly faster response times - our queries return incredibly quickly when a summary table is used instead of the detail rows. In some cases this reduces the work by many orders of magnitude.
However, these are the limitations:
- Extra disk space is needed - the summary tables must be stored.
- Materialized views work well only in a read-intensive or read-only environment.
- Concurrency issues are associated with the REFRESH ON COMMIT option: if there are six rows in the summary table, we can have at most six sessions committing at the same time (assuming they each affect a different owner). So there has to be a workaround. We can run full refreshes on a recurring basis (during off-peak time). This adds no run-time overhead, and there are no concurrency issues.
- The REFRESH ON COMMIT option introduces contention.
- Another solution is to maintain the summary data separately and report on yesterday's activities, rather than query the live OLTP data for reports.
- An alternative to the REFRESH ON COMMIT option is the REFRESH ON DEMAND option with a full refresh. This can be done on a timed basis; the refresh may be scheduled for a time when the load is light. In this case, you should use QUERY_REWRITE_INTEGRITY = STALE_TOLERATED, which forces Oracle to use the materialized view even when it knows the underlying data has changed. Note that the results may then be slightly stale, but by an acceptable degree. You can then use:
SQL> exec DBMS_MVIEW.REFRESH('mv_demo');
to refresh the view. Please note that the materialized view in this case was created with REFRESH ON COMMIT, and it could still be refreshed manually by issuing this command. Normally we would use this statement to refresh a materialized view created with the REFRESH ON DEMAND option.
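For completeness, here is a sketch of the ON DEMAND approach described above (the names are hypothetical, and the DBMS_JOB call is just one way to schedule the off-peak refresh):

```sql
-- Tolerate stale data so query rewrite still happens between refreshes.
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = STALE_TOLERATED;

CREATE MATERIALIZED VIEW mv_nightly
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND   -- full refresh, only when asked
  ENABLE QUERY REWRITE
  AS
  SELECT owner, COUNT(*) FROM mv_demotable GROUP BY owner;

-- Schedule a complete refresh ('C') every night at 2 a.m. (off-peak):
VARIABLE jobno NUMBER
BEGIN
  DBMS_JOB.SUBMIT(:jobno,
                  'DBMS_MVIEW.REFRESH(''MV_NIGHTLY'', ''C'');',
                  TRUNC(SYSDATE) + 1 + 2/24,
                  'TRUNC(SYSDATE) + 1 + 2/24');
  COMMIT;
END;
/
```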
We shall look at the use of this splendid tool by going through some examples. Readers are encouraged to try out these examples: change them to suit your requirements, find out if there is a performance gain, and use the feature if you are satisfied. As on all other occasions, I would remind you not to accept any stated fact without proof.
To run the examples, your user account needs to have these specific privileges:
- GRANT CREATE SESSION
- GRANT CREATE TABLE
- GRANT CREATE MATERIALIZED VIEW
- GRANT QUERY REWRITE
The first three privileges may be granted to one of your roles, while the fourth must be granted to you directly. You will also need around 200 MB of free space (that is, your quota should be at least 200 MB). Your Oracle database is assumed to be using the Cost Based Optimizer (CBO), so that you can make use of the query-rewrite capability.
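As a sketch, the grants might look like this when run by a DBA (the account name scott and the tablespace users are placeholders):

```sql
GRANT CREATE SESSION           TO scott;
GRANT CREATE TABLE             TO scott;
GRANT CREATE MATERIALIZED VIEW TO scott;
-- QUERY REWRITE must go to the user directly, not via a role:
GRANT QUERY REWRITE            TO scott;
-- Ensure a quota of at least 200 MB on the user's default tablespace:
ALTER USER scott QUOTA 200M ON users;
```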
First Example: What a Snapshot is Capable of
We shall try to see if we can reduce the execution time of a long-running query by using materialized views. Since we need a pretty large table to demonstrate the dramatic impact of materialized views, we shall create one. The following points may be noted:
- You will issue a query against a large table. Understandably, this will take an appreciable amount of time (which you will verify by turning timing on).
- By creating a materialized view, we shall rewrite the query against the large table into a query against a small table, without any loss in accuracy.
- We shall see if there has been any savings in execution time.
We'll start with a query to our own data dictionary, and get a list of all the objects and their owners.
First, we shall turn on SQL*Plus timing:
SQL> set timing on
Now we shall create our table. From the elapsed time, we can see that this is a considerable table.
SQL> CREATE TABLE mv_demotable NOLOGGING
2 AS SELECT * FROM all_objects
3 UNION ALL SELECT * FROM all_objects
4 UNION ALL SELECT * FROM all_objects
5 UNION ALL SELECT * FROM all_objects
6 UNION ALL SELECT * FROM all_objects
7 UNION ALL SELECT * FROM all_objects
8 UNION ALL SELECT * FROM all_objects
9 UNION ALL SELECT * FROM all_objects
10 UNION ALL SELECT * FROM all_objects
11 UNION ALL SELECT * FROM all_objects
12 /
Table created.
Elapsed: 00:00:04.08
And we get the count of records:
SQL> SELECT COUNT(*) FROM mv_demotable;
COUNT(*)
----------
64480
Elapsed: 00:00:00.03
With the processing speed of today's computers, we need significantly more records than this to observe the impact of materialized views on our query. So we insert the table into itself three times, committing each time (each APPEND insert doubles the row count):
SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;
64480 rows created.
Elapsed: 00:00:00.03
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;
128960 rows created.
Elapsed: 00:00:00.07
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;
257920 rows created.
Elapsed: 00:00:03.06
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> SELECT COUNT(*) FROM mv_demotable;
COUNT(*)
----------
515840
Elapsed: 00:00:01.07
This is a significantly big table, and we have enough records. Note that even a simple count now takes over a second; we're in business! (A simple question for readers: why does COMMIT take 0.00 seconds?)
We will execute a query against this demo table that shows the number of objects owned by each user. Such a query, as readers will observe, results in a full table scan and uses a good deal of server resources. And now we issue our first query against the demo table:
SQL> SELECT owner, COUNT(*) FROM mv_demotable
GROUP BY owner;
OWNER COUNT(*)
------------------------------ ----------
CMLCONV 15520
EXPERT 55840
KOTTAYAM 3280
OUTLN 560
PUBLIC 111680
SCIM 6320
SCOTT 560
SSANET 5040
SYS 273680
SYSTEM 28000
TEST 3280
TKYTE 3200
TOAD 5920
TRA 2960
14 rows selected.
Elapsed: 00:00:02.00
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'
Statistics
-----------------------------------------------------
0 recursive calls
27 db block gets
6367 consistent gets
4862 physical reads
60 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed
We have not yet computed statistics on our table. If we do, we will get the cost-based optimizer's estimates for our query. We issue the following:
SQL> ANALYZE TABLE mv_demotable COMPUTE STATISTICS;
Table analyzed.
Elapsed: 00:00:14.08
And we set the autotrace level to traceonly, since we're not interested in the data:
SQL> set autotrace traceonly
And now we look at the statistics for the full table scan:
SQL> SELECT owner, COUNT(*) FROM mv_demotable
GROUP BY owner;
14 rows selected.
Elapsed: 00:00:01.06
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=1581 Card=14 Bytes=70)
1 0 SORT (GROUP BY)
(Cost=1581 Card=14 Bytes=70)
2 1 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'
(Cost=965 Card=515840 Bytes=2579200)
Statistics
-----------------------------------------------------
0 recursive calls
27 db block gets
6366 consistent gets
4200 physical reads
0 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed
Just look at the number of physical disk accesses to get a simple aggregated result! 4200 physical reads just to send about one kB of data to the client - the evidence points heavily in the direction of some SQL tuning.
And now we prepare to create our materialized view. First, we need to grant some privileges to our user and enable the proper use of materialized views.
SQL> GRANT QUERY REWRITE TO cmlconv;
Grant succeeded.
SQL> ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;
Session altered.
SQL> ALTER SESSION SET QUERY_REWRITE_INTEGRITY=ENFORCED;
Session altered.
And now we create our materialized view.
SQL> CREATE MATERIALIZED VIEW mv_demo
2 BUILD IMMEDIATE
3 REFRESH ON COMMIT
4 ENABLE QUERY REWRITE
5 AS
6 SELECT owner, count(*) FROM mv_demotable
7 GROUP BY owner
8 /
Materialized view created.
Elapsed: 00:00:04.09
And we analyse our table:
SQL> ANALYZE TABLE mv_demotable COMPUTE STATISTICS;
Table analyzed.
Elapsed: 00:00:14.07
As before, the analysis takes much longer. This is the penalty for generating the summary table: the inserts on the table themselves are not of much concern, but refreshing our summary table takes a lot of time - the database performs a full scan, so to speak. This constraint forces the developer to exercise caution in using materialized views. If the materialized view must be refreshed too frequently, it might prove marginally effective at best and counter-productive at worst.
What we have done is the following:
- We calculated the object count
- We defined this summary report as a materialized view
- We asked this view to be immediately built and populated
- Then we analyzed the TABLE. A materialized view creates a real table underneath; this table may be indexed and analyzed like a regular table.
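As a quick illustration of that last point (the index name here is made up), the materialized view's container table accepts ordinary DDL:

```sql
-- Index the materialized view's container table like any other table:
CREATE INDEX mv_demo_owner_idx ON mv_demo (owner);

-- And gather statistics on it, so the optimizer can cost it properly:
ANALYZE TABLE mv_demo COMPUTE STATISTICS;
```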
Now let us see the view in action. We use the same query against the same table:
SQL> SELECT owner, count(*) FROM mv_demotable
GROUP BY owner;
OWNER COUNT(*)
------------------------------ ----------
CMLCONV 15520
EXPERT 55840
KOTTAYAM 3280
OUTLN 560
PUBLIC 111680
SCIM 6320
SCOTT 560
SSANET 5040
SYS 273680
SYSTEM 28000
TEST 3280
TKYTE 3200
TOAD 5920
TRA 2960
14 rows selected.
Elapsed: 00:00:00.01
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=1 Card=82 Bytes=2460)
1 0 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'
(Cost=1 Card=82 Bytes=2460)
Statistics
-----------------------------------------------------
0 recursive calls
6 db block gets
4 consistent gets
0 physical reads
0 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
14 rows processed
And there you have it! From 4200 physical reads to 0! Here is a comparison of the parameters.
                            Before MV                 After MV
 ---------------------------------------------------------------------
 Elapsed Time               1.06s                     0.01s

 Statistics
   recursive calls          0                         11
   db block gets            27                        10
   consistent gets          6366                      15
   physical reads           4200                      9
   redo size                0                         0
   sorts (memory)           1                         4
   sorts (disk)             0                         0

 Execution Plan             Cost   Card    Bytes      Cost  Card  Bytes
   SELECT STATEMENT         1581   14      70         1     82    2460
   SORT (GROUP BY)          1581   14      70         -     -     -
   TABLE ACCESS (FULL)
     OF 'MV_DEMOTABLE'      965    515840  2579200    1     82    2460
We can see that the improvement is by over a factor of 100. What's even more interesting is that such a query is effectively cached: subsequent runs return within 0.00s. We can also see that, since we created the summary table on the basis of the GROUP BY clause, the execution plan no longer contains the SORT (GROUP BY) step. Finally, the table access has been almost completely eliminated. The SQL engine does not have to process the 2.5 MB odd of data to send out the 14 rows of summary information to the SQL*Plus client; it merely sends the 2460 bytes already available in the summary table. Instead of a complex aggregation, all it needs is a simple table lookup, which it can do with fewer than ten physical reads.
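If you want to confirm that it really was query rewrite at work, one option (sketched here; it assumes the REWRITE_TABLE has been created from the utlxrw.sql script shipped with the database) is the DBMS_MVIEW.EXPLAIN_REWRITE procedure:

```sql
BEGIN
  DBMS_MVIEW.EXPLAIN_REWRITE(
    query => 'SELECT owner, COUNT(*) FROM mv_demotable GROUP BY owner',
    mv    => 'MV_DEMO');
END;
/
-- The findings (rewritten or not, and why) land in REWRITE_TABLE:
SELECT message FROM rewrite_table;
```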
I hope I have clearly demonstrated the use of materialized views, and in particular the performance advantage of using a materialized view in this test situation. How far your practical requirement resembles this test scenario is for you to find out; but in Oracle programming, you can always find a way to make something work better and faster than it currently does... so, best of luck.
This post is already quite long for a single read, so we'll discuss Oracle's query-rewriting mechanism and other aspects of materialized views (such as the REFRESH ON COMMIT option) in a subsequent post.
Until then, have a good time experimenting...
Zeinab