Materialized views, previously known as snapshots, are a grossly underutilized feature of the Oracle database. These "summary tables" can significantly reduce data access times by several orders of magnitude. With the use of materialized views, Oracle throws open the possibility of querying terabytes of data in a matter of seconds. they accomplish this by transparently using pre-computed summarizations and joins of data.
I guess I have already answered the "serious questions" associated with the use of materialized views, namely,
- Can I make a materialized view of a complex join/subquery?
- Does a materialized view take up space?
- What is the major difference (or similarity) between a materializd view and a view?
The answers to the above questions would be
- Yes, we make views/snapshots entirely to this end: when we want to query more than one table. It is the primary reason why we would use a view or a snapshot.
- Yes, of course. A view also takes up some space. In a similar manner, so does an index, constraint, etc. A data structure either takes up memory or takes up space or both. In the case of index, snapshot,l constraint etc, it evident that these take up space.
- A view is formally defined as "an alternative way to look at data." Typically, we cobble together a custom "view" which includes several fields from one table, a few from another and so on, all of which may or may not be related with a matching condition (such as primary key constraint). A snapshot, however, is an aggregate table. A view is essentially something that refers directly to the most recent possible state of the table. A snapshot could be quite old. For this simple reason, taking data from a snapshot for an OLTP application is not recommended (more on this later). A materialized view is similar to a regular view in the way it is defined (we use a similar command to create the snapshot), but we store the results of the defining query. In other words, the results of the defining query are materialized as a persistent database object (in simple terms, stored in some physical file, just like a table). Caveat: Use this feature mainly for data mining applications, not for OLTP-oriented applications. If your database tables are frequently updated, then it's better not to use it.
Use of materialzed views results in huge gains in performance. By calculating the answers to the really hard questions up front, we will greatly reduce the load on our server. The positive effects are the following: ~
- Less physical reads - There is less data to scan through.
- Less writes - We will not be sorting/aggregating as frequently.
- Decreased CPU consumption - We will not be calculating the aggregates and functions on the data, as we wil have already done that.
- Markedly faster response times - Our queries will return incredibly quickly when a summary table is used, as opposed to the details. In some cases this may result in a reduction in work of many orders of magnitude.
However, these are the limitations: ~
- There is a small amount of disk space - the summary tables need disk space.
- Materialized views work well in a read-intensive or read-only environment.
- Concurrency are associated with REFRESH ON COMMIT option: there are six rows in the summary tables, and we can have at most six people committing at the same time (assuming they all affecta a different owner). So there has to be a workaround. We can use full refreshes on a recurring basis (during off-peak time). This will add no overhead, and there will be no concurrency issues.
- The REFRESH ON COMMIT option introduces contention.
- Another solution is to maintain the summary data separate, and report on yesterday's activities, and not query the live OLTP data for reports.
- An alternative to the REFRESH ON COMMIT option is to use the REFRESH ON DEMAND option, with a full refresh. This can be done on a timed basis. The refresh may be scheduled for a time when the load is light. In this case, you should use QUERY_REWRITE_INTEGRITY = STALE_TOLERATED. This will force Oracle into using the materialized view even if it knows that the data it is derived from has changed. Note, however, that this will result in data that is always incorrect, but by an acceptable degree. And then you can use:
SQL> exec DBMS_MVIEW.REFRESH('mv_demo');
to refresh the view. Please note that the materialized view in this case was using REFRESH_ON_COMMIT, and it still could be refreshed manually by issuing this command. Normally we would be using this statement to refresh a mateiralized view with the REFRESH ON DEMAND option.
We shall look at the use of this splendid tool by going through some examples. Readers are encouraged o try out these examples, specifically change these to suit your requirement, find out if there is a performance gain, and use it if they are satisfied. Like on all other occasions, I would remind you not to accept any stated fact without proof.
To run the examples, your user account needs to have these specific privileges:
- GRANT CREATE SESSION
- GRANT CREATE TABLE
- GRANT CREATE MATERIALIZED VIEW
- GRANT QUERY REWRITE
The first three privileges may be granted to one of your roles, while the fourth should be granted directly to you. You will also need around 200 MB of free space (that is, your quota shall be at least 200 MB). Your version of Oracle database is assumed to be using the Cost Based Optimizer (CBO), so that you can make use of the query-rewrite capability.
First Example: What a Snapshot is Capable of
We shall try to see if we can reduce the execution time of a long running query by using materialized views. Since we need a pretty large table to demonstrate the dramatic impact of mateirlized views, we shall be creating a large table. The following points may be noted: ~
- You will issue a query against a large table. Understandably, this will take an appreciable amount of time (which you will verify by setting timed statistics on)
- By creating a materialized view, we shall rewrite the query against the large table into a query against a small table, without any loss in accuracy.
- We shall see if there has been any savings in execution time.
We'll start with a query to our own data dictionary, and get a list of all the objects and their owners.
First, we shall setup timed statistics:
SQL> set timing on
Now we shall create our table. From the time, we can see that it is a considerable table.
SQL> CREATE TABLE mv_demotable NOLOGGING
2 AS SELECT * FROM all_objects
3 UNION ALL SELECT * FROM all_objects
4 UNION ALL SELECT * FROM all_objects
5 UNION ALL SELECT * FROM all_objects
6 UNION ALL SELECT * FROM all_objects
7 UNION ALL SELECT * FROM all_objects
8 UNION ALL SELECT * FROM all_objects
9 UNION ALL SELECT * FROM all_objects
10 UNION ALL SELECT * FROM all_objects
11 UNION ALL SELECT * FROM all_objects
12 /
Table created.
Elapsed: 00:00:04.08
And we get the count of records as
SQL> SELECT COUNT(*) FROM mv_demotable;
COUNT(*)
----------
64480
Elapsed: 00:00:00.03
With the processing speed of today's computers, we need significantly more records than this to observe the impact of matrialized views on our query. So we append the same records to our demo table twice, and commit each time:
SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;
64480 rows created.
Elapsed: 00:00:00.03
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;
128960 rows created.
Elapsed: 00:00:00.07
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;
257920 rows created.
Elapsed: 00:00:03.06
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> SELECT COUNT(*) FROM mv_demotable;
COUNT(*)
----------
515840
Elapsed: 00:00:01.07
This is a significantly big table and we have enough records. Note that even getting a simple count is taking 60 milliseconds; we're in business! (A simple question to readers: why is COMMIT taking 0.00 seconds?)
We will execute a query against this demo table that shows the number of objects owned by each user. Doing such a query, as readers will observe, results in a full table scan. It uses up all possible resources on the server. And now we issue our first query to the demo table. And here we have it: ~
SQL> SELECT owner, COUNT(*) FROM mv_demotable
GROUP BY owner;
OWNER COUNT(*)
------------------------------ ----------
CMLCONV 15520
EXPERT 55840
KOTTAYAM 3280
OUTLN 560
PUBLIC 111680
SCIM 6320
SCOTT 560
SSANET 5040
SYS 273680
SYSTEM 28000
TEST 3280
TKYTE 3200
TOAD 5920
TRA 2960
14 rows selected.
Elapsed: 00:00:02.00
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'
Statistics
-----------------------------------------------------
0 recursive calls
27 db block gets
6367 consistent gets
4862 physical reads
60 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed
We have not computed the statistics on our table. If we do that, we will get the cost-based statistics on our query. We issue the following: ~
SQL> ANALYZE TABLE mv_demotable COMPUTE STATISTICS;
Table analyzed.
Elapsed: 00:00:14.08
And we set the autotrace level to traceonly. since we're not interested in the data: ~
SQL> set autotrace traceonly
And now we look at the statistics for the full table scan: ~
SQL> SELECT owner, COUNT(*) FROM mv_demotable
GROUP BY owner;
14 rows selected.
Elapsed: 00:00:01.06
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=1581 Card=14 Bytes=70)
1 0 SORT (GROUP BY)
(Cost=1581 Card=14 Bytes=70)
2 1 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'
(Cost=965 Card=515840 Bytes=2579200)
Statistics
-----------------------------------------------------
0 recursive calls
27 db block gets
6366 consistent gets
4200 physical reads
0 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed
Just look at the number of physical disk accesses to get a simple agrregated data! 4200 Physical reads just to send one kB of data to the client - the evidence seems to point heavily in the direction of some SQL tuning.
And now we create our materialized view. First, we need to grant some privileges, etc to our user, and enable the proper usage of materialized views.
SQL> GRANT QUERY REWRITE TO cmlconv;
Grant succeeded.
SQL> ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;
Session altered.
SQL> ALTER SESSION SET QUERY_REWRITE_INTEGRITY=ENFORCED;
Session altered.
And now we create our materialized view.
SQL> CREATE MATERIALIZED VIEW mv_demo
2 BUILD IMMEDIATE
3 REFRESH ON COMMIT
4 ENABLE QUERY REWRITE
5 AS
6 SELECT owner, count(*) FROM mv_demotable
7 GROUP BY owner
8 /
Materialized view created.
Elapsed: 00:00:04.09
And we analyse our table: ~
SQL> ANALYZE TABLE mv_demotable COMPUTE STATISTICS;
Table analyzed.
Elapsed: 00:00:14.07
As before, the analysis part takes much longer. this is the penalty for genating the summary table; the inserts on the table themselves are not of much concern, but in order to refresh our summary table, it takes a lot of time - the database performs a full scan, so to say. This constraint forces the developer to exercise caution in using materialized views. If the matrialized view should be refreshed too frequently, then it might prove marginally effective at best and counter-productive at worst.
What we have done is the following: ~
- We calculated the object count
- We defined this summary report as a materialized view
- We asked this view to be immediately built and populated
- Then we analyzed the TABLE. A materialized view creates a real table. This table may be indexed and analysed like a regular table.
Now let us see the view in action. We use the same query, and we query the same table: ~
SQL> SELECT owner, count(*) FROM mv_demotable
GROUP BY owner;
OWNER COUNT(*)
------------------------------ ----------
CMLCONV 15520
EXPERT 55840
KOTTAYAM 3280
OUTLN 560
PUBLIC 111680
SCIM 6320
SCOTT 560
SSANET 5040
SYS 273680
SYSTEM 28000
TEST 3280
TKYTE 3200
TOAD 5920
TRA 2960
14 rows selected.
Elapsed: 00:00:00.01
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=1 Card=82 Bytes=2460)
1 0 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'
(Cost=1 Card=82 Bytes=2460)
Statistics
-----------------------------------------------------
0 recursive calls
6 db block gets
4 consistent gets
0 physical reads
0 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
14 rows processed
And there you have it! From 4200 physical reads to 0! Here you have a comparison of the parameters.
|
| Before creating materialized view
| After creating materialized view
|
| Item
| Value
|
|
|
| Value
|
|
|
|
| Elapsed Time
| 1.06s
|
|
|
| 0.01s |
|
|
|
Stat- istics |
|
|
|
|
|
|
|
|
|
| recursive calls
| 0
|
|
|
| 11
|
|
|
|
| db block gets
| 27
|
|
|
| 10
|
|
|
|
| consistent gets
| 6366
|
|
|
| 15
|
|
|
|
| physical reads
| 4200
|
|
|
| 9
|
|
|
|
| redo size
| 0
|
|
|
| 0
|
|
|
|
| sorts (memory)
| 1
|
|
|
| 4
|
|
|
|
| sorts (disk)
| 0
|
|
|
| 0
|
|
|
|
Exec- ution Plan
|
|
| Cost
| Card
| Bytes
|
| Cost | Card
| Bytes
|
| SELECT STATEMENT
|
| 1581
| 14
| 70
|
| 1
| 82
| 2460
|
| SORT (GROUP BY)
|
| 1581
| 14
| 70
|
| -
| -
| -
|
| TABLE ACCESS (FULL) OF 'MV_ DEMOTABLE'
|
| 965
| 515840
| 2579200
|
| 1
| 82
| 2460
|
|
|
|
|
|
|
|
|
|
|
We can see that the improvement is by over a factor of 100. What's even more interesting is that such a query will be effectively cached, as subsequent runs of the query will return within 0.00s. We can also see that since have created the summary table on the basis og the GROUP BY clause, the execution plan does not have the entry for the SORT (GROUP BY) clause. Finally, we can see that the table access has been almost completely eliminated, with Oracle having to look up the tables only for a handful of data that it somehow failed to find. We can also see that the SQL engine does not have to process the 2MB odd bytes to send out the 14-rows of summary information to the SQL*Plus client; it merely sends the 2460 bytes that's already available in the summary table. Instead of a complex aggregation, all it needs to do is a simole table lookup, which it can do with less than ten physical reads.
I hope I have clearly demonstrated the use of materialized views. Especially, the performance advantage is using materialized view in a particular test situation has been demonstrated. How far your practical requirement resembles this test scenario is for you to find out; but in Oracle programming, you can always find a way to make something work better and faster than it currently does... so, best of luck.
This blog is already quite big for a single read, so we'll discuss Oracle's query rewriting mechanism and other aspects of materialized views (such as the REFRESH ON COMMIT option) in a subsequent post.
Until then, have a good time experimenting...
Zeinab
SQL and PL/SQL
With
An Introduction to Database Programming Using PL/SQL Blocks
We are all acquainted with SQL as the language understood by relational databases like Oracle. We are all pretty happy that we have such a language at our disposal to manipulate databases. SQL is a powerful, easy-to-use language that uses the same daily-use type of language.
Before the widespread acceptance of SQL, database programming didn't exactly look like the simple English-language style queries and logical operations that we are accustomed yo using. C programmers would write entire database applications with their own C routines, employing their own code to provide structures like tables, views, and indexes, as functionality such as insert and delete. All these features were available in their databases also, but it was all done the hard way, with C datatypes like long and char, and the user was given a front-end for data entry and other types of interaction. He was not able to interact directly with the database engine (which was often a large multi-threaded C-program). Besides providing the basic features, the developers had to write custom code to handle errors and such.
Oracle was the first to formalize the process of implementing a database in the C programming language. They finally came out with a relational database in the 1970s, written entirely in C. Over the years they have added feature after feature to their exceedingly complex database, and nowadays Oracle is implementing most of the fancy stuff in Java. Still, the core of Oracle is build in C, which is why all Oracle installations require a C compiler (if you have installed Oracle on Linux or Unix, then you would recall the requirements of gcc/glibc etc). As it has been written in C, the Oracle code is exceedingly fast. The SQL engine of Oracle is comparable, or faster than, C programs (written using Pro*C) accessing the Oracle database.
One of the key features of Oracle — a crucial decision made way back in the '80s by the designers of the evolving Oracle database — was to implement the new Structured Query Language (SQL) standards. This has become a key feature of the Oracle database. Oracle was one of the first databases to support SQL. However, SQL was found to have some limitations when compared to existing third generation computer languages such as C. The most glaring of its limitations is that it does not allow procedural constructs such as the looping constructs supported by third-generation languages such as C or Fortran. (You might recall that an SQL statement that you use in sqlplus, such as a "select * from emp where ename='SMITH';" is just that - a one-off statement. For retrieving another record, you have to issue another statement. For processing record-wise, this is a pain). This is a great drawback, and, considering that SQL is a fourth generation language, this was a big handicap.
Naturally, Oracle decided to circumvent this deficiency. It wrote a procedural language extension to SQL, which is named, understandably, PL/SQL (Procedural Language Extensions in PL/SQL). Now you had something very close to a real programming language, and it was operating "very close" to the Oracle database itself. (The SQL engine would directly access the Oracle memory structures). Specifically, you could now use procedures and packages in the database. The concept of code reuse and attendant performance optimization was thus introduced to an already respectable database.
With this much of background into the evolution of PL/SQL and why it was developed, we can now proceed to the heart of the matter: using PL/SQL in our day-to-day work.
To interact with the Oracle database (database = memory structures + data files), Oracle provides one standard utility which is available in all platforms. It is the SQL*Plus, invoked from the command line (as oracle user in UNIXes and UNIX-like systems) by typing sqlplus. In the Windows environment too this is available, as sqlplus located in
To have a serious interaction with the Oracle database from your client PC, we recommend using command-line sqlplus. It is perhaps the best command-line program available on Windows.
C:\Documents and Settings\Owner>sqlplus
The command line is the only legitimate way to enter the Oracle database and be serious about the business. The Windows command line is probably the best way to use sqlplus, with its command history, flexible cut and paste options, etc. We hope that this point will be taken seriously, as the "GUI" version (sqlplusw) is a lame wrapper for the sqlplus executable. Furthermore, it has been desupported since September 2005.
The basic PL/SQL unit of execution, one that you can use straight away from the sqlplus prompt, is the PL/SQL block. It has the following structure.
declare
-- you declare cursors and program variables here
begin
-- statements go here
exception
-- exception handling statements go here
end;
Of these four key statements, begin and end; are mandatory, while declare and exception are optional. To execute a PL/SQL block, you must terminate it with a forward slash (/). SQL*Plus then sends the block to the database for execution.
Now we shall look at a sample PL/SQL code block. This is the code that you type into the sqlplus prompt.
declare
phno number(7);
begin
phno := 2454325;
insert into address_book (name, phone_no, location)
values ('HARI KAIMAL', phno, 'IT CELL ANNEXE');
commit;
end;
The actual sqlplus screendump is given below.
SQL> declare
2 phno number(7);
3 begin
4 phno := 2454325;
5
6 insert into address_book (name, phone_no, location)
7 values ('HARI KAIMAL', phno, 'IT CELL ANNEXE');
8
9 commit;
10 end;
11 /
PL/SQL procedure successfully completed.
SQL> select count(*) from address_book;
COUNT(*)
----------
1
Okay. Now we have produced our first block of PL/SQL on the sqlplus prompt. The semicolon after the end signals the sqlplus engine to execute the code, and we have our dml statement insert the data into the address_book table. This might seem like a lot of overhead, especially when you can simply write:
SQL> insert into address_book values ('HARI KAIMAL', 2464724, 'BSNL STR QTRS');
1 row created.
Still why do we go to all the lengths, declaring the variable, formally beginning the transaction, assigning a value to our variable(s), and then issuing the insert statement?
It is obvious that there must be some advantages to this procedural approach. If simple insert, update and delete were all the things that were possible with Oracle, then it would have been OK... but for that we don't need Oracle, or for that matter, any expensive database! A few thousand lines of C code would be enough to implement such functionality, and flat files would serve as the database tables.
The first thing we realise is that database programming is not about issuing random queries. In a well-implemented database application, such occasions to issue one-off queries should be rare. It reveals some nefarious activity - namely, the back-end laundering of form data. Needless to say, it is equivalent to bypassing the safety checks and whatever logic you have built up in your database application. It should be avoided at all costs. (It is habitual for us at the IT Cell to consider all our applications as "web applications" whereas they are all, in the real sense, database applications. Considering them as web applications - considering that 99% of all network applications in the world are web-applications - insulates from a grave sin: the fact that we are not using the database to do things which it is best equipped to do. Instead we, devise our own ways of manipulating SQL before sending it to the database. And still pay $40,000 per processor for the Oracle licence).
If we want record-level data processing, for example, we can't do it with a one-off query (unless it is a straight update - in which case also a PL/SQL procedure is more desirable). An evident advantage is that we are executing the insert as a procedure, and not as a straight transaction. Such execution gives us a lot of latitude in what we do with the DML. To use a simple example, if we issue a bad DML in a one-off insert, such as
SQL> insert into address_book values (
2 'HARI KAIMAL', 2464724, 'BSNL STR QRTRS', 'TVM');
insert into address_book values (
*
ERROR at line 1:
ORA-00913: too many values
The database correctly points out the error, but the data is lost, and the entire DML is discarded. In a procedure, we can devise ways to deal with such a possibility.
Just as in a program written in a procedural language program, PL/SQL gives us a chance to handle exceptions. Exceptions are abnormal conditions that might occur in a program, but can be handled by the program. Here is one such candidate, where we have tried to insert a value into a nonexistent field. We can catch exceptions and handle the exceptions in a more friendly, and useful way. For this, we have to
- Raise an error
- Handle the error with our own (application) code
- Give appropriate error message, or log the error in an error table.
Thus, we have a pretty useful record of the statement that caused the error. Otherwise, Oracle will simply move on after rejecting the DML. This is not a graceful way of dealing with errors that could be easily avoided.
Now that we have seen a little PL/SQL block, let us now observe what actually happens behind the scenes.
- SQL*Plus switches you into PL/SQL mode when you begin entering a PL/SQL block, by looking out for the DECLARE and/or BEGIN keywords.
- What you type in is buffered by SQL*Plus
- When you hit a forward slash, the entire block is sent to the database server for processing
Note the last point: whatever SQL or PL/SQL that you have typed in is processed in the database. And that is a most interesting point. Parsing and syntax checking are done by the server and not by the sqlplus utility. In relation to our own case, it means the data processing load is shifted to the database side, where it should be done. There can be no other way.
And with such solid evidence, we are now in a position to appreciate why it is essential in a database application to go in for real database-side programming using stored procedures, functions, triggers, integrity constraints, and other advanced features provided by the horribly expensive Oracle. We can also appreciate why, even as developers, it is suicidal to modify data at the back-end, that is, the dangers of side-stepping the front-end, which is specifically built for that purpose. Starting PL/SQL blocks, and progressing successively on to stored procedures, functions and triggers, we shall climb the stairs to database programming heaven one step at a time.
But for all that we have to start somewhere. And here we have made start. With that comforting thought, we begin our ascent.