Generating data: union all vs insert all

To test out different physical table lay-outs, I had to repeatedly generate some test-data. After a few runs, I wanted to speed it up a bit. This is how I did it.

Snail with rocket

I needed to fill a table structure in a master-detail manner with some generated test-data. Basically, I could just use the master tables as a source and generate rows for the detail tables. In some cases, I needed several rows in the child table pr row in the master-table. My first take on this made me use a union-all statement structure like this:

insert into detail_table (id, text)
select id, text
from master_table
    union all
select id + 1000000, text || text
from master_table
... several union all's

As I was doing this with a million rows, and with several tables this took a bit of time, so I started wondering about a faster way to do this. That’s when I remembered that I could use an insert all-statement. The insert-all statement lets you insert into several tables in the same statement, but you can also insert into the same table, like this:

insert all
into detail_table (id, text)
    values (id, text)
into detail_table (id, text)
    values (id + 1000000, text || text)
select id, text
from master_table;

I tested it, and in my scenario I could cut my data generation time almost i half, just by that relatively simple change.

So – what’s the difference between the two methods? The union-all version does one table scan for each union-all-part:

-------------------------------------------------------------------------------------
| Id| Operation               | Name         | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT        |              |    2 |    40 |     4   (0)| 00:00:01 |
| 1 |  LOAD TABLE CONVENTIONAL| DETAIL_TABLE |      |       |            |          |
| 2 |   UNION-ALL             |              |      |       |            |          |
| 3 |    TABLE ACCESS FULL    | MASTER_TABLE |    1 |    20 |     2   (0)| 00:00:01 |
| 4 |    TABLE ACCESS FULL    | MASTER_TABLE |    1 |    20 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

But the insert-all version does just one table-scan and inserts multiple times:

------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT    |              |     1 |    20 |     2   (0)| 00:00:01 |
|   1 |  MULTI-TABLE INSERT |              |       |       |            |          |
|   2 |   TABLE ACCESS FULL | MASTER_TABLE |     1 |    20 |     2   (0)| 00:00:01 |
|   3 |   INTO              | DETAIL_TABLE |       |       |            |          |
|   4 |   INTO              | DETAIL_TABLE |       |       |            |          |
------------------------------------------------------------------------------------

Of course – if the source is a more complicated join, and/or you have several more child rows, the benefits may increase a lot more. Also, I’m doing this detail-table by detail-table, but could also have inserted into several different detail-tables from the same select statement.

Other things to keep in mind and check out are things like parallel dml, direct path, disabling foreign key constraints, indexes and more. But obviously, not reading the same data multiple times, may help a lot.

And why did I use union all, not just union? Because a union removes duplicates and to do that, the database has to do a costly sort-operation. Union all does not do this.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s