Oracle Programming Concepts

Wednesday, September 27, 2006

 

Materialized Views

Materialized views, previously known as snapshots, are a grossly underutilized feature of the Oracle database. These "summary tables" can significantly reduce data access times by several orders of magnitude. With the use of materialized views, Oracle throws open the possibility of querying terabytes of data in a matter of seconds. they accomplish this by transparently using pre-computed summarizations and joins of data.

I guess I have already answered the "serious questions" associated with the use of materialized views, namely,
The answers to the above questions would be

Use of materialzed views results in huge gains in performance. By calculating the answers to the really hard questions up front, we will greatly reduce the load on our server. The positive effects are the following: ~

However, these are the limitations: ~
We shall look at the use of this splendid tool by going through some examples. Readers are encouraged o try out these examples, specifically change these to suit your requirement, find out if there is a performance gain, and use it if they are satisfied. Like on all other occasions, I would remind you not to accept any stated fact without proof.

To run the examples, your user account needs to have these specific privileges:
The first three privileges may be granted to one of your roles, while the fourth should be granted directly to you. You will also need around 200 MB of free space (that is, your quota shall be at least 200 MB). Your version of Oracle database is assumed to be using the Cost Based Optimizer (CBO), so that you can make use of the query-rewrite capability.

First Example: What a Snapshot is Capable of

We shall try to see if we can reduce the execution time of a long running query by using materialized views. Since we need a pretty large table to demonstrate the dramatic impact of mateirlized views, we shall be creating a large table. The following points may be noted: ~


We'll start with a query to our own data dictionary, and get a list of all the objects and their owners.
First, we shall setup timed statistics:

SQL> set timing on

Now we shall create our table. From the time, we can see that it is a considerable table.

SQL> CREATE TABLE mv_demotable NOLOGGING
2 AS SELECT * FROM all_objects
3 UNION ALL SELECT * FROM all_objects
4 UNION ALL SELECT * FROM all_objects
5 UNION ALL SELECT * FROM all_objects
6 UNION ALL SELECT * FROM all_objects
7 UNION ALL SELECT * FROM all_objects
8 UNION ALL SELECT * FROM all_objects
9 UNION ALL SELECT * FROM all_objects
10 UNION ALL SELECT * FROM all_objects
11 UNION ALL SELECT * FROM all_objects
12 /

Table created.

Elapsed: 00:00:04.08

And we get the count of records as

SQL> SELECT COUNT(*) FROM mv_demotable;

COUNT(*)
----------
64480

Elapsed: 00:00:00.03

With the processing speed of today's computers, we need significantly more records than this to observe the impact of matrialized views on our query. So we append the same records to our demo table twice, and commit each time:

SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;

64480 rows created.

Elapsed: 00:00:00.03
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00

SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;

128960 rows created.

Elapsed: 00:00:00.07
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL> INSERT /*+ APPEND */ INTO mv_demotable
2 SELECT * FROM mv_demotable;

257920 rows created.

Elapsed: 00:00:03.06
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL> SELECT COUNT(*) FROM mv_demotable;

COUNT(*)
----------
515840

Elapsed: 00:00:01.07


This is a significantly big table and we have enough records. Note that even getting a simple count is taking 60 milliseconds; we're in business! (A simple question to readers: why is COMMIT taking 0.00 seconds?)

We will execute a query against this demo table that shows the number of objects owned by each user. Doing such a query, as readers will observe, results in a full table scan. It uses up all possible resources on the server. And now we issue our first query to the demo table. And here we have it: ~

SQL> SELECT owner, COUNT(*) FROM mv_demotable
GROUP BY owner;

OWNER COUNT(*)
------------------------------ ----------
CMLCONV 15520
EXPERT 55840
KOTTAYAM 3280
OUTLN 560
PUBLIC 111680
SCIM 6320
SCOTT 560
SSANET 5040
SYS 273680
SYSTEM 28000
TEST 3280
TKYTE 3200
TOAD 5920
TRA 2960

14 rows selected.

Elapsed: 00:00:02.00

Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'

Statistics
-----------------------------------------------------
0 recursive calls
27 db block gets
6367 consistent gets
4862 physical reads
60 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed

We have not computed the statistics on our table. If we do that, we will get the cost-based statistics on our query. We issue the following: ~

SQL> ANALYZE TABLE mv_demotable COMPUTE STATISTICS;

Table analyzed.

Elapsed: 00:00:14.08

And we set the autotrace level to traceonly. since we're not interested in the data: ~

SQL> set autotrace traceonly

And now we look at the statistics for the full table scan: ~

SQL> SELECT owner, COUNT(*) FROM mv_demotable
GROUP BY owner;


14 rows selected.

Elapsed: 00:00:01.06

Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=1581 Card=14 Bytes=70)


1 0 SORT (GROUP BY)
(Cost=1581 Card=14 Bytes=70)

2 1 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'
(Cost=965 Card=515840 Bytes=2579200)


Statistics
-----------------------------------------------------
0 recursive calls
27 db block gets
6366 consistent gets
4200 physical reads
0 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed

Just look at the number of physical disk accesses to get a simple agrregated data! 4200 Physical reads just to send one kB of data to the client - the evidence seems to point heavily in the direction of some SQL tuning.

And now we create our materialized view. First, we need to grant some privileges, etc to our user, and enable the proper usage of materialized views.

SQL> GRANT QUERY REWRITE TO cmlconv;

Grant succeeded.

SQL> ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;

Session altered.

SQL> ALTER SESSION SET QUERY_REWRITE_INTEGRITY=ENFORCED;

Session altered.

And now we create our materialized view.

SQL> CREATE MATERIALIZED VIEW mv_demo
2 BUILD IMMEDIATE
3 REFRESH ON COMMIT
4 ENABLE QUERY REWRITE
5 AS
6 SELECT owner, count(*) FROM mv_demotable
7 GROUP BY owner

8 /

Materialized view created.

Elapsed: 00:00:04.09

And we analyse our table: ~

SQL> ANALYZE TABLE mv_demotable COMPUTE STATISTICS;

Table analyzed.

Elapsed: 00:00:14.07

As before, the analysis part takes much longer. this is the penalty for genating the summary table; the inserts on the table themselves are not of much concern, but in order to refresh our summary table, it takes a lot of time - the database performs a full scan, so to say. This constraint forces the developer to exercise caution in using materialized views. If the matrialized view should be refreshed too frequently, then it might prove marginally effective at best and counter-productive at worst.

What we have done is the following: ~
Now let us see the view in action. We use the same query, and we query the same table: ~

SQL> SELECT owner, count(*) FROM mv_demotable
GROUP BY owner;


OWNER COUNT(*)
------------------------------ ----------
CMLCONV 15520
EXPERT 55840
KOTTAYAM 3280
OUTLN 560
PUBLIC 111680
SCIM 6320
SCOTT 560
SSANET 5040
SYS 273680
SYSTEM 28000
TEST 3280
TKYTE 3200
TOAD 5920
TRA 2960

14 rows selected.

Elapsed: 00:00:00.01

Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=1 Card=82 Bytes=2460)


1 0 TABLE ACCESS (FULL) OF 'MV_DEMOTABLE'
(Cost=1 Card=82 Bytes=2460)


Statistics
-----------------------------------------------------
0 recursive calls
6 db block gets
4 consistent gets
0 physical reads
0 redo size
1102 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
14 rows processed

And there you have it! From 4200 physical reads to 0! Here you have a comparison of the parameters.



Before creating
materialized view
After creating
materialized view

Item
Value



Value




Elapsed
Time
1.06s



0.01s


Stat-
istics










recursive
calls
0



11




db block
gets
27



10




consistent
gets
6366



15




physical
reads
4200



9




redo size
0



0




sorts
(memory)
1



4




sorts
(disk)
0



0



Exec-
ution
Plan


Cost
Card
Bytes

Cost Card
Bytes

SELECT
STATEMENT

1581
14
70

1
82
2460

SORT
(GROUP BY)

1581
14
70

-
-
-

TABLE
ACCESS (FULL)
OF 'MV_
DEMOTABLE'

965
515840
2579200

1
82
2460











We can see that the improvement is by over a factor of 100. What's even more interesting is that such a query will be effectively cached, as subsequent runs of the query will return within 0.00s. We can also see that since have created the summary table on the basis og the GROUP BY clause, the execution plan does not have the entry for the SORT (GROUP BY) clause. Finally, we can see that the table access has been almost completely eliminated, with Oracle having to look up the tables only for a handful of data that it somehow failed to find. We can also see that the SQL engine does not have to process the 2MB odd bytes to send out the 14-rows of summary information to the SQL*Plus client; it merely sends the 2460 bytes that's already available in the summary table. Instead of a complex aggregation, all it needs to do is a simole table lookup, which it can do with less than ten physical reads.

I hope I have clearly demonstrated the use of materialized views. Especially, the performance advantage is using materialized view in a particular test situation has been demonstrated. How far your practical requirement resembles this test scenario is for you to find out; but in Oracle programming, you can always find a way to make something work better and faster than it currently does... so, best of luck.

This blog is already quite big for a single read, so we'll discuss Oracle's query rewriting mechanism and other aspects of materialized views (such as the REFRESH ON COMMIT option) in a subsequent post.

Until then, have a good time experimenting...

Zeinab

Friday, September 01, 2006

 
SQL and PL/SQL
With
An Introduction to Database Programming Using PL/SQL Blocks


We are all acquainted with SQL as the language understood by relational databases like Oracle. We are all pretty happy that we have such a language at our disposal to manipulate databases. SQL is a powerful, easy-to-use language that uses the same daily-use type of language.

Before the widespread acceptance of SQL, database programming didn't exactly look like the simple English-language style queries and logical operations that we are accustomed yo using. C programmers would write entire database applications with their own C routines, employing their own code to provide structures like tables, views, and indexes, as functionality such as insert and delete. All these features were available in their databases also, but it was all done the hard way, with C datatypes like long and char, and the user was given a front-end for data entry and other types of interaction. He was not able to interact directly with the database engine (which was often a large multi-threaded C-program). Besides providing the basic features, the developers had to write custom code to handle errors and such.

Oracle was the first to formalize the process of implementing a database in the C programming language. They finally came out with a relational database in the 1970s, written entirely in C. Over the years they have added feature after feature to their exceedingly complex database, and nowadays Oracle is implementing most of the fancy stuff in Java. Still, the core of Oracle is build in C, which is why all Oracle installations require a C compiler (if you have installed Oracle on Linux or Unix, then you would recall the requirements of gcc/glibc etc). As it has been written in C, the Oracle code is exceedingly fast. The SQL engine of Oracle is comparable, or faster than, C programs (written using Pro*C) accessing the Oracle database.

One of the key features of Oracle — a crucial decision made way back in the '80s by the designers of the evolving Oracle database — was to implement the new Structured Query Language (SQL) standards. This has become a key feature of the Oracle database. Oracle was one of the first databases to support SQL. However, SQL was found to have some limitations when compared to existing third generation computer languages such as C. The most glaring of its limitations is that it does not allow procedural constructs such as the looping constructs supported by third-generation languages such as C or Fortran. (You might recall that an SQL statement that you use in sqlplus, such as a "select * from emp where ename='SMITH';" is just that - a one-off statement. For retrieving another record, you have to issue another statement. For processing record-wise, this is a pain). This is a great drawback, and, considering that SQL is a fourth generation language, this was a big handicap.

Naturally, Oracle decided to circumvent this deficiency. It wrote a procedural language extension to SQL, which is named, understandably, PL/SQL (Procedural Language Extensions in PL/SQL). Now you had something very close to a real programming language, and it was operating "very close" to the Oracle database itself. (The SQL engine would directly access the Oracle memory structures). Specifically, you could now use procedures and packages in the database. The concept of code reuse and attendant performance optimization was thus introduced to an already respectable database.

With this much of background into the evolution of PL/SQL and why it was developed, we can now proceed to the heart of the matter: using PL/SQL in our day-to-day work.

To interact with the Oracle database (database = memory structures + data files), Oracle provides one standard utility which is available in all platforms. It is the SQL*Plus, invoked from the command line (as oracle user in UNIXes and UNIX-like systems) by typing sqlplus. In the Windows environment too this is available, as sqlplus located in

To have a serious interaction with the Oracle database from your client PC, we recommend using command-line sqlplus. It is perhaps the best command-line program available on Windows.

C:\Documents and Settings\Owner>sqlplus


The command line is the only legitimate way to enter the Oracle database and be serious about the business. The Windows command line is probably the best way to use sqlplus, with its command history, flexible cut and paste options, etc. We hope that this point will be taken seriously, as the "GUI" version (sqlplusw) is a lame wrapper for the sqlplus executable. Furthermore, it has been desupported since September 2005.

The basic PL/SQL unit of execution, one that you can use straight away from the sqlplus prompt, is the PL/SQL block. It has the following structure.

declare
-- you declare cursors and program variables here
begin
-- statements go here
exception
-- exception handling statements go here
end;


Of these four key statements, begin and end; are mandatory, while declare and exception are optional. To execute a PL/SQL block, you must terminate it with a forward slash (/). SQL*Plus then sends the block to the database for execution.

Now we shall look at a sample PL/SQL code block. This is the code that you type into the sqlplus prompt.

declare
phno number(7);
begin
phno := 2454325;

insert into address_book (name, phone_no, location)
values ('HARI KAIMAL', phno, 'IT CELL ANNEXE');

commit;
end;

The actual sqlplus screendump is given below.

SQL> declare
2 phno number(7);
3 begin
4 phno := 2454325;
5
6 insert into address_book (name, phone_no, location)
7 values ('HARI KAIMAL', phno, 'IT CELL ANNEXE');
8
9 commit;
10 end;
11 /

PL/SQL procedure successfully completed.

SQL> select count(*) from address_book;

COUNT(*)
----------
1


Okay. Now we have produced our first block of PL/SQL on the sqlplus prompt. The semicolon after the end signals the sqlplus engine to execute the code, and we have our dml statement insert the data into the address_book table. This might seem like a lot of overhead, especially when you can simply write:

SQL> insert into address_book values ('HARI KAIMAL', 2464724, 'BSNL STR QTRS');

1 row created.
Still why do we go to all the lengths, declaring the variable, formally beginning the transaction, assigning a value to our variable(s), and then issuing the insert statement?

It is obvious that there must be some advantages to this procedural approach. If simple insert, update and delete were all the things that were possible with Oracle, then it would have been OK... but for that we don't need Oracle, or for that matter, any expensive database! A few thousand lines of C code would be enough to implement such functionality, and flat files would serve as the database tables.

The first thing we realise is that database programming is not about issuing random queries. In a well-implemented database application, such occasions to issue one-off queries should be rare. It reveals some nefarious activity - namely, the back-end laundering of form data. Needless to say, it is equivalent to bypassing the safety checks and whatever logic you have built up in your database application. It should be avoided at all costs. (It is habitual for us at the IT Cell to consider all our applications as "web applications" whereas they are all, in the real sense, database applications. Considering them as web applications - considering that 99% of all network applications in the world are web-applications - insulates from a grave sin: the fact that we are not using the database to do things which it is best equipped to do. Instead we, devise our own ways of manipulating SQL before sending it to the database. And still pay $40,000 per processor for the Oracle licence).

If we want record-level data processing, for example, we can't do it with a one-off query (unless it is a straight update - in which case also a PL/SQL procedure is more desirable). An evident advantage is that we are executing the insert as a procedure, and not as a straight transaction. Such execution gives us a lot of latitude in what we do with the DML. To use a simple example, if we issue a bad DML in a one-off insert, such as

SQL> insert into address_book values (
2 'HARI KAIMAL', 2464724, 'BSNL STR QRTRS', 'TVM');
insert into address_book values (
*
ERROR at line 1:
ORA-00913: too many values

The database correctly points out the error, but the data is lost, and the entire DML is discarded. In a procedure, we can devise ways to deal with such a possibility.

Just as in a program written in a procedural language program, PL/SQL gives us a chance to handle exceptions. Exceptions are abnormal conditions that might occur in a program, but can be handled by the program. Here is one such candidate, where we have tried to insert a value into a nonexistent field. We can catch exceptions and handle the exceptions in a more friendly, and useful way. For this, we have to
  • Raise an error
  • Handle the error with our own (application) code
  • Give appropriate error message, or log the error in an error table.

Thus, we have a pretty useful record of the statement that caused the error. Otherwise, Oracle will simply move on after rejecting the DML. This is not a graceful way of dealing with errors that could be easily avoided.

Now that we have seen a little PL/SQL block, let us now observe what actually happens behind the scenes.
Note the last point: whatever SQL or PL/SQL that you have typed in is processed in the database. And that is a most interesting point. Parsing and syntax checking are done by the server and not by the sqlplus utility. In relation to our own case, it means the data processing load is shifted to the database side, where it should be done. There can be no other way.

And with such solid evidence, we are now in a position to appreciate why it is essential in a database application to go in for real database-side programming using stored procedures, functions, triggers, integrity constraints, and other advanced features provided by the horribly expensive Oracle. We can also appreciate why, even as developers, it is suicidal to modify data at the back-end, that is, the dangers of side-stepping the front-end, which is specifically built for that purpose. Starting PL/SQL blocks, and progressing successively on to stored procedures, functions and triggers, we shall climb the stairs to database programming heaven one step at a time.

But for all that we have to start somewhere. And here we have made start. With that comforting thought, we begin our ascent.

Archives

September 2006   January 2008  

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]