Thursday, February 14, 2008

Using Parallel Pipelined Functions in PL/SQL

Do you have a serial PL/SQL code reading data from table, doing complex computations and putting it back to the database?

DECLARE
BEGIN
FOR x IN (SELECT * FROM t) LOOP
-- Do complex calculations with x
INSERT INTO t2 (col1, col2) VALUES (x.calc1, x.calc2);
END LOOP;
END;

Want to speed it up? You can turn it to running in parallel using a pipelined function.
This feature allows you to read from a function as if it was a regular table and the best part of it is, that you can do it in parallel.
So, first you have to "change" the procedure logic a little, you will have to write a function doing the complex calculation part and returning the result as a cursor. Then you can issue one regular parallel DML command to insert the data to a new table.

Let's first create the tables.

CREATE TABLE t1 AS
SELECT object_id id, owner FROM all_objects
UNION ALL
SELECT object_id id, owner FROM all_objects;

CREATE TABLE t2 AS
SELECT id, owner, 111222333 random_number FROM t1 WHERE 1=0;

exec dbms_stats.gather_table_stats(ownname => user, tabname => 't1');

Then create the object types, that the pipelined function will be returning.

CREATE TYPE t2_type AS OBJECT (
id NUMBER,
owner VARCHAR2(30),
random_number NUMBER
)
/

CREATE TYPE t2_table_type AS TABLE OF t2_type
/

Now the pipelined function part.

CREATE OR REPLACE FUNCTION complex_processing(l_cursor IN sys_refcursor)
RETURN t2_table_type
PIPELINED PARALLEL_ENABLE(PARTITION l_cursor BY ANY)
IS
l_rec t1%ROWTYPE;
p_random_number NUMBER;
BEGIN
LOOP
FETCH l_cursor INTO l_rec;
EXIT WHEN l_cursor%NOTFOUND;
-- Complex calculation goes here
p_random_number:= DBMS_RANDOM.NORMAL;
--
PIPE ROW(t2_type(l_rec.id, l_rec.owner, p_random_number));
END LOOP;
CLOSE l_cursor;
RETURN;
END;
/

Here the PARALLEL_ENABLE clause tells Oracle that input can be processed in parallel and let Oracle to determine the best way to split up the data between parallel slaves. If you need some specific splitting of data, for example need that the slices are not overlapping, then modify the PARTITION clause.
First lets see what happens if we just issue a SELECT statement on this function:

SQL> set autot traceonly explain
SQL> SELECT * FROM TABLE(complex_processing(CURSOR(SELECT /*+PARALLEL(t1)*/ * FROM t1)));

----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 1148K| 20 (0)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 8168 | 1148K| 20 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) |
| 3 | VIEW | | 8168 | 1148K| 20 (0)| 00:00:01 | Q1,00 | PCWP | |
| 4 | COLLECTION ITERATOR PICKLER FETCH| COMPLEX_PROCESSING | | | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 83912 | 819K| 11 (10)| 00:00:01 | Q1,00 | PCWC | |
| 6 | TABLE ACCESS FULL | T1 | 83912 | 819K| 11 (10)| 00:00:01 | Q1,00 | PCWP | |
----------------------------------------------------------------------------------------------------------------------------------------

Looks nicely parallel to me!
Now, let's insert the data using parallel DML.

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+APPEND*/ INTO t2(id, owner, random_number)
SELECT * FROM TABLE(complex_processing(CURSOR(SELECT /*+PARALLEL(t1)*/ * FROM t1)));

COMMIT;

Done!

I first saw this technique in Tom Kyte's book Expert Oracle Database Architecture. Get this book!

No comments: