Getting in control with PL/Scope

I’m currently working on a project that involves a system that is basically unknown to me. We also have to make changes (additions) to the system as part of our project. The system is a PL/SQL hard-shell application with all business logic and queries in the Oracle database and has about 1 million lines of code. Being a PL/SQL application, it has the great advantage that the source code is easily available for quick searches through views like all_source. We also know that all queries are in the database.

My requirement was to intercept and change the logic around the update of a specific column in a specific table. This means that I have to find and examine all statements that insert into or update this table.

A quick search through all_source using the table name (where upper(text) like '%MYTABLE%') gave way too many hits to handle. Searching for the column name itself is not sufficient either, as there may or may not be code doing inserts and updates through a variable declared as MyTable%rowtype, as illustrated below:

declare
  lRowVariable MyTable%rowtype;
begin
  select *
  into lRowVariable
  from MyTable
  where id = 1;
  -- give the copied row a new key before re-inserting it := MyTable_seq.nextval;
  insert into MyTable
  values lRowVariable;
end;

These (and other) patterns may or may not exist in the code, but they have to be found and ruled out.

So this gave me an opportunity to try out PL/Scope, which as of 12c also collects information about SQL statements.

To get going, all my source had to be recompiled to collect the PL/Scope-info:

alter <object_type> <object_name> compile PLSCOPE_SETTINGS='IDENTIFIERS:ALL';

I wrote a PL/SQL loop, but there are other ways, like the UTL_RECOMP package, the utlrp.sql script and so forth. You may also use the dependency views to recompile only the referring objects if your scope is limited.
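
If you prefer not to hand-roll a loop, a minimal sketch using DBMS_UTILITY.COMPILE_SCHEMA could look like this (assuming you are connected as the schema owner and can afford a full recompile):

-- Enable PL/Scope collection for the session, then recompile the schema
alter session set plscope_settings = 'IDENTIFIERS:ALL';

begin
  dbms_utility.compile_schema(schema         => user,
                              compile_all    => true,
                              reuse_settings => false); -- false: pick up the session's PL/Scope setting
end;
/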

After my initial testing I decided to just turn it on by default in our development database, since it will obviously come in handy later:

alter system set plscope_settings = 'IDENTIFIERS:ALL' scope=both;

I would not recommend doing this on the production system, as the compile time increases a little bit.
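
You can verify which objects actually carry PL/Scope metadata through the *_PLSQL_OBJECT_SETTINGS views:

-- Which objects were compiled with which PL/Scope settings?
select plscope_settings, count(*)
from user_plsql_object_settings
group by plscope_settings;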

I then started playing with the all_statements and all_identifiers views. Getting a quick overview of the statements in the system is really easy and might surface information you were not aware of:

-- Overview of statement types
select type, count(*)
from user_statements
group by type
order by 2 desc;

You might, for instance, have a look at execute immediates and so forth.
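
For example, a quick sketch listing where the dynamic SQL lives ('EXECUTE IMMEDIATE' is one of the statement types recorded in the *_STATEMENTS views):

-- Locate the dynamic SQL in the code base
select object_type, object_name, line, full_text
from user_statements
where type = 'EXECUTE IMMEDIATE'
order by object_type, object_name, line;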

The nice thing is that the SQL_ID is the same as the one you find in v$sql, and you can use it to find duplicate statements (as an indicator of duplicate code):

-- Listing duplicate SQL statements in the code:
select sql_id, count(*) occurrences, min(text) sql_text
from user_statements
where sql_id is not null
group by sql_id
having count(*) > 1
order by occurrences desc;

This also means that even though production runs on 11g, we can use 12c on development and test, and still match statements between the environments.
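
So if a statement shows up as a top consumer in v$sql on production, a lookup along these lines (the substitution variable name is just an example) points straight to its location in the code:

-- Map a SQL_ID captured from v$sql to its location in the source
select object_type, object_name, line, text
from user_statements
where sql_id = '&sql_id_from_prod';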

I also tested a quick way of getting the context where a particular query is located, displaying 5 lines before and after the line where the statement is (I used the materialize hint to speed up the query):

with s as (
 select /*+ materialize */ object_name, object_type, line
 from user_statements s
 where s.sql_id = '&1'
select decode(src.line, s.line, '*****', ' . ') tag,, src.type, src.line, src.text
from user_source src
join s on = s.object_name
  and src.type = s.object_type
  and src.line between s.line - 5 and s.line + 5
order by src.type,, src.line;

Now – back to my case – I want to identify insert and update statements related to a certain table. By combining the all_statements and all_identifiers views and utilizing the hierarchical structure, I came up with this query to list all statements inserting into and updating my table:

WITH allReferences AS (
 SELECT ai.owner, ai.object_name, ai.object_type, ai.line,
        ai.usage_id, ai.usage_context_id, ai.signature,
        'ID' plscope_type,
        ai.usage usage,
        ai.TYPE || ' ' || name,
        NULL as sql_id
 FROM all_identifiers ai
 UNION ALL
 SELECT st.owner, st.object_name, st.object_type, st.line,
        st.usage_id, st.usage_context_id, st.signature,
        'ST' plscope_type,
        st.TYPE usage,
        'STATEMENT' name,
        st.sql_id
 FROM all_statements st
)
SELECT myStatements.*, st.full_text
FROM (
 SELECT DISTINCT -- quick-fix for duplicates
        r.owner, r.object_name, r.object_type, r.line,
        r.plscope_type, r.usage,, r.signature, r.sql_id,
        LEVEL reference_context_level
 FROM allReferences r
 CONNECT BY PRIOR object_name = object_name 
  AND PRIOR object_type = object_type 
  AND PRIOR usage_context_id = usage_id
 START WITH r.signature = (SELECT DISTINCT i.signature
                           FROM all_identifiers i
                           WHERE i.type = 'TABLE' AND = 'MYTABLE')
) myStatements
LEFT JOIN all_statements st on (myStatements.signature = st.signature)
WHERE myStatements.reference_context_level = 2 -- We want the context where this is used
 AND myStatements.plscope_type = 'ST' -- We want statements
 AND myStatements.usage IN ('UPDATE', 'INSERT');

The list of queries from the above query was considerably shorter and quite manageable: well suited for further manual inspection of the code. Notice that it will also list occurrences where MyTable is used as a select source, as in an insert into MyOtherTable select … from MyTable.

Of course – after doing all this work I realized that I could also just have searched the all_statements view alone:

SELECT st.owner, st.object_name, st.object_type, st.line, st.text
FROM all_statements st
WHERE st.type IN ('INSERT', 'UPDATE')
 AND st.full_text LIKE '%MYTABLE%';

This runs a lot faster than the large query above and basically gives the same result.

I have not checked PL/Scope's behaviour when you use synonyms, since I don't have any in my code base.

We also have very few and controllable occurrences of dynamic SQL (EXECUTE IMMEDIATE and OPEN … FOR <string>). The contents of these cannot be traced, but the location is recorded, so you know where to look. It's yet another reason to avoid dynamic SQL.

In this case, I'm lucky: all statements are in the database, so I can be in control. I would not have gotten far with this method if statements were coming from the client or web server. In a system of the same size with SQL in the middle/end tier, this work would have taken days or maybe even weeks.

I’ll follow up with more posts where I use PL/Scope to identify dead and possibly dead code.

For more on PL/Scope, check out the fine blogs of Steven Feuerstein, Philipp Salvisberg (who also has a great git-repo and additions to SQL Developer), Tim Hall and of course the documentation.

ODC Appreciation Day : Live SQL

During a session at Oracle OpenWorld I had a pleasant time watching Keith Laker use Live SQL extensively to walk through a tutorial on SQL pattern matching. Very powerful indeed! Both the pattern matching and the way the tutorial was built up.

So, having used it a little bit before, I thought I’d share an example of the in-database archiving functionality in 12c with my team, using Live SQL.
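
For context, a minimal in-database archiving example goes roughly like this (the table name is mine, just for illustration):

-- ROW ARCHIVAL adds the hidden column ORA_ARCHIVE_STATE (default '0' = active)
create table t1 (id number) row archival;

insert into t1 values (1);
insert into t1 values (2);

-- Archive a row by setting ORA_ARCHIVE_STATE to a non-zero value
update t1 set ora_archive_state = '1' where id = 2;

-- By default the session sees only active rows
select count(*) from t1; -- 1

-- Make archived rows visible again for this session
alter session set row archival visibility = all;
select count(*) from t1; -- 2

As it turns out, it is presumably that last ALTER SESSION that does not fly on Live SQL.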

I first jotted down an example on my own, since I like to work through stuff on my own to explore caveats which sometimes are not obvious in the examples. I was surprised when my example didn’t work and spent a bit of time going through documentation.

I also tested an example from the fabulous library of Tim Hall‘s blog posts – and that didn’t work on Live SQL either.

However – both examples worked flawlessly in my VirtualBox database.

Going through the Help pages of Live SQL I realized that this functionality probably isn't supported due to session privileges. In fact, I found an example added by Oracle a couple of weeks ago demonstrating in-database archiving, where I found the following comment:

Note that the WHERE clause has been added for demonstration purposes in Live SQL. Outside the Live SQL environment the WHERE clause would not be needed because the ROW ARCHIVAL VISIBILITY session setting can be determined.
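
In other words, on Live SQL you filter explicitly on the hidden column instead of changing the session setting:

-- Explicit filter replacing ROW ARCHIVAL VISIBILITY = ACTIVE
select * from t1 where ora_archive_state = '0';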

So – bottom lines:

  • In-database archiving works, but not on Live SQL.
  • There are already examples lying around on Live SQL if you want to save time.
  • Live SQL is a great platform for sharing knowledge through examples! Try it out!

Selected to present on OUG Scotland 2017

I'm honoured to have been selected to present twice at OUG Scotland 2017 in Glasgow in June:

Happily, I get to deliver my favourite topic, “Scalability, Maintainability, Correctness? Easier with #ThickDB“. In this session I'll look at the Thick DB paradigm and explain the major negative effects that not adhering to it will have in a data-lifecycle perspective. We'll take a red pill, look into The Matrix and see what really happens.

Sadly I’ll be doing that at the same time as Jonathan Lewis talks about “Reading Parallel Execution Plans”. I would really have liked to get a refresh on that topic.

In the afternoon, I'll talk about a proof of concept (PoC) that we did last year: “Consolidating large critical medical databases with minimum downtime – a story from Norway“. The goal was to find a way to migrate 9 medical databases into one of 100+ TB, with a downtime of less than 4 hours for each migration. That includes replacing billions of keys in the process. This was a fun project where we could really test and evaluate different techniques for moving data from several sources into one large target. By applying better techniques and automation we reduced the data-loading downtime window from 15+ hours to 1-2 hours.

I presented both of these at OUGN's spring seminar this year, and this time I assume the latter session will be unobstructed by (the ferry's) safety announcements. 🙂