I’m currently working on a project that involves a system that is basically unknown to me. We also have to make changes (additions) to the system as part of our project. The system is a PL/SQL hard-shell application with all business logic and queries in the Oracle database and has about 1 million lines of code. Being a PL/SQL application, it has the great advantage that the source code is easily available for quick searches through views like all_source. We also know that all queries are in the database.
My requirement was to intercept and change the logic around the update of a specific column in a specific table. This means that I will have to find and examine all statements that inserts into or updates this table.
A quick search through all_source using the table name (where upper(text) like ‘%MYTABLE%’) gave way too many hits to handle. Searching for the column name itself is not sufficient as there may or may not be code doing insert and updates using a variable declared as a MyTable%rowtype as illustrated below:
declare lRowVariable MyTable%rowtype; begin select * into lRowVariable from MyTable where id = 1; lRowVariable.id := MyTable_seq.nextval; insert into MyTable values lRowVariable; end; /
These (and other) examples may not exist, but they will have to be checked out of the case.
So this gave me an opportunity to try out PL/Scope, which of 12c also collects information about SQL Statements.
To get going, all my source had to be recompiled to collect the PL/Scope-info:
alter <object_type> <object_name> compile PLSCOPE_SETTINGS='IDENTIFIERS:ALL';
I wrote a PL/SQL loop, but there are other ways like the UTL_RECOMP-package, utlrp.sql-script and so forth. You may also use the dependency-views to just recompile referring objects if your scope is limited.
After my initial testing I decided to just turn it on as default in our development-database, since it will obviously come in handy later:
alter system set plscope_settings = 'IDENTIFIERS:ALL' scope=both;
I would not recommend doing this on the production system, as the compile time increases a little bit.
I then started playing with the all_statements and all_identifiers-views. Getting a quick overview over the statements in the system is real easy and might give you some information that you’re not aware of:
-- Overview of statement types select type, count(*) from user_statements group by type order by 2 desc;
You might, for instance, have a look at execute immediates and so forth.
The nice thing is that the SQL_ID is the same that you can find in v$sql, and you can use it to find duplicate statements (as an indicator of duplicate code):
-- Listing duplicate SQL statements in the code: select distinct sql_id, count(*) occurrences, min(text) sql_text from user_statements where sql_id is not null group by sql_id having count(*) > 1 order by occurrences desc;
This also means that even though production runs on 11g, we can use 12c on development and test and still identify statements back and forth.
I also tested a quick way of getting the context where a particular query is located, displaying 5 lines before and after the line where the statement is (i used the materialize-hint to speed up the query):
with s as ( select /*+materialize*/ object_name, object_type, line from user_statements s where s.sql_id = '&1' ) select decode(src.line, s.line, '*****', ' . ') tag, src.name, src.type, src.line, src.text from user_source src join s on src.name = S.Object_Name and src.type = S.Object_Type and src.line between s.line -5 and s.line + 5 order by src.type, src.name, src.line;
Now – back to my case – I want to identify insert and update-statements related to a certain table. By combining the all_statements and all_identifiers-views and utilizing the hieriarcical structure, I came up with this query to list all statements inserting and updating my table:
WITH allReferences AS ( SELECT 'ID' plscope_type, ai.object_name, ai.object_type, ai.usage usage, ai.usage_id, ai.usage_context_id, ai.TYPE || ' ' || ai.name name, ai.line, ai.signature, NULL as sql_id FROM all_identifiers ai UNION ALL SELECT 'ST' plscope_type, st.object_name, st.object_type, st.TYPE usage, st.usage_id, st.usage_context_id, 'STATEMENT' name, st.line, st.signature, st.sql_id FROM all_statements st ) SELECT myStatements.object_name, myStatements.object_type, myStatements.usage, myStatements.line, myStatements.sql_id, st.FULL_TEXT FROM ( SELECT DISTINCT -- quick-fix for duplicates LEVEL reference_context_level, r.plscope_type, r.object_name, r.object_type, r.usage, r.usage_id, r.usage_context_id, r.line, r.name, r.signature, r.sql_id FROM allReferences r CONNECT BY PRIOR object_name = object_name AND PRIOR object_type = object_type AND PRIOR usage_context_id = usage_id AND LEVEL <= 2 START WITH r.signature = ( SELECT DISTINCT signature FROM all_identifiers i WHERE type = 'TABLE' AND name = 'MYTABLE' ) ) myStatements LEFT JOIN all_statements st on (myStatements.signature = st.signature) WHERE myStatements.reference_context_level = 2 -- We want the context where this is used AND myStatements.plscope_type = 'ST' -- We want statements AND myStatements.usage IN ('UPDATE', 'INSERT') ORDER BY myStatements.object_type, myStatements.object_name, myStatements.usage_id, myStatements.line
The list of queries from the above query was considerably shorter and well manageable: Well suited for a further manual inspection of the code. Notice that it will also list occurrences where MyTable is used as a select source in a insert into MyOtherTable select … from MyTable.
Of course – after doing all this work I realized that I could also just have searched the all_statements-view alone:
SELECT * FROM all_statements st WHERE st.type IN ('UPDATE', 'INSERT') AND st.full_text LIKE '%MYTABLE%';
This runs a lot faster than the large query above and basically gives the same result.
I have not checked out the PL/Scope-behaviour when you use synonyms, since I don’t have that in my code base.
We also have very few and controllable occurrences of dynamic SQL (EXECUTE IMMEDIATE and OPEN … FOR <string>). These are impossible to trace the contents of, but the location is recorded, so you know where to look. But it’s yet another reason to avoid dynamic SQL.
In this case, I’m lucky: All statements are in the database, so I can be in control. I would not have gotten far with this method if statements were coming from the client/web-server. In a same size system with SQL in the middle/end-tier this work would have taken days or maybe even weeks.
I’ll follow up with more posts where I use PL/Scope to identify dead and possibly dead code.