CDB catalog
We were ocassionally hitting “ORA-12269: client uses weak encryption/crypto-checksumming version” when creating CDB catalog with catcdb.sql script on Oracle database releases 19.13, 21.3 and 21.4 running on Oracle Linux 8.4. It wasn’t clear which operation was failing, because catcdb.sql had already deleted all the useful information. In order to obtain diagnostic information, we first analyzed how Oracle creates CDB catalog.
catcdb.sql
The following call to catcdb.sql creates a CDB catalog:
@?/rdbms/admin/catcdb.sql '/u00/oracle/orabase/admin/DB/create' 'catcdb.log'
catcdb.sql basically just calls the Perl program catcdb.pl:
...
column rdbms_admin_catcdb new_value rdbms_admin_catcdb noprint
select '&&rdbms_admin'||'&&slash'||'catcdb.pl' as rdbms_admin_catcdb from dual;
host perl -I &&rdbms_admin &&rdbms_admin_catcdb --logDirectory &&1 --logFilename &&2
catcdb.pl
The Perl program catcdb.pl creates the catalog by calling a series of SQL scripts. SQL Scripts are stored in an array of hash pointers:
my @START_SCRIPTS = (
{
LOG_FILENAME_OPT => 'catalog',
SCRIPT_COMMAND_OPT => sub { return 'catalog.sql' },
USERNAME_OPT => SYS_USER,
SCRIPT_DIRECTORY_OPT => $rdbms_admin,
},
{
LOG_FILENAME_OPT => 'catproc',
SCRIPT_COMMAND_OPT => sub { return 'catproc.sql' },
USERNAME_OPT => SYS_USER,
SCRIPT_DIRECTORY_OPT => $rdbms_admin,
},
{
LOG_FILENAME_OPT => 'catoctk',
SCRIPT_COMMAND_OPT => sub { return 'catoctk.sql '},
USERNAME_OPT => SYS_USER,
SCRIPT_DIRECTORY_OPT => $rdbms_admin,
},
{
LOG_FILENAME_OPT => 'owminst',
SCRIPT_COMMAND_OPT => sub { return 'owminst.plb' },
USERNAME_OPT => SYS_USER,
SCRIPT_DIRECTORY_OPT => $rdbms_admin,
},
...
catcdb.pl iterates through the array START_SCRIPTS and executes the configured scripts. Unfortunately, it neither parses the log files nor does it stop processing on error. Consequently, it deletes all the diagnostic information. So I was looking for a way to interrupt processing on error.
catcdb.pl calls catcon.pl which, in turn, imports functions from the Perl module catcon.pm. One of those functions calls sqlplus to execute catalog scripts.
catcon.pm
The function exec_DB_script is the central function that executes generated SQL scripts.
exec_DB_script arguments
There is an anomaly in this function, though, not related to our issue. I decided to include the information anyway.
The last parameter is the path to the sqlplus and is stored in the variable $sqlplus.
sub exec_DB_script (\@$) {
my ($statements, $marker, $DoneCmd, $DoneFilePathBase, $sqlplus) = @_;
But the $sqlplus variable is never used. Below is the call to sqlplus:
open my $Reader, "sqlplus /nolog \@$scriptFile | ";
“$” is missing in front of sqlplus. In Perl $sqlplus is the value of the variable called $sqlplus and sqlplus is just a string:
my $sqlplus = "/u00/oracle/orabase/product/21.3.0.0.0_a" ;
print "variable: $sqlplus \n" ;
print "string: sqlplus \n" ;
perl demo.pl
variable: /u00/oracle/orabase/product/21.3.0.0.0_a
string: sqlplus
The sqlplus path passed through the function argument was ignored and the sqlplus was resolved through the PATH variable. Since PATH was set correctly, the correct sqlplus was picked up, so it doesn’t make a difference in this case.
Interrupting exec_DB_script
Our goal was to check the environment and find out the command in the script where the problem happened. Therefore, I introduced the following code in the procedure exec_DB_script in the while loop that parses the output (I did those changes in the lab environment):
if ( $_ =~ m{ORA-12269} ) {
open(FH, '>', '/tmp/failed_' . time );
print FH "$scriptFile \n" ;
print FH "$ENV{PATH}\n" ;
print FH "$ENV{LD_LIBRARY_PATH}\n" ;
print FH join "\n", @Spool ;
close FH ;
die "Killed" ;
}
The code above spools the environment and stops the execution. After the execution stopped we examined the scripts and outputs in the working directory. The error was raised while establishing connection.
connect sys/"manager" as sysdba
Notice that this connection string produces a local connection with OS authentication. As long as you specify “as sysdba” and don’t specify the TNS connect string, you can provide any username and password:
sqlplus non-existing-user/invalid_password as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Jan 12 13:27:10 2022
Version 19.13.0.0.0
Copyright (c) 1982, 2021, Oracle. All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.13.0.0.0
SQL>
This is expected behavior, documented in the MOS note “Why Can I Login AS SYSDBA With Any Username and Password? (Doc ID 242258.1)”:
connect sys/”manager” as sysdba worked fine. Just when we ran it in a loop for a while, the error was raised. In contrast, we couldn’t reproduce the error with the usual connect string “/ as sysdba”.
Call Stack
We configured the event to gather the call stack on the error, which should provide a context where the error was generated:
event='12269 TRACE NAME ERRORSTACK LEVEL 3'
kgerelv()+137 call kgeade() 7F1FE0CB59C0 ? 7F1FE0CB5C08 ?
7F1FE0B50050 ? 000002FED ?
000000000 000000000
kserecl0()+189 call kgerelv() 7F1FE0CB59C0 ? 7F1FE0B50050 ?
7F1FE0B50050 ? 0134B62F0 ?
000000000 ? 000000000 ?
ksulbem()+454 call kserecl0() 7F1FE0CB59C0 ? 7F1FE0B50050 ?
7F1FE0B50050 ? 0000007FF
000000000 006E9AD80
opitsk()+3292 call ksulbem() 000000001 ? 000002FED ?
7FFFBA5E3C20 ? 0000007FF ?
000000000 ? 006E9AD80 ?
opiino()+936 call opitsk() 000000000 000000000
7FFFBA5E3C20 ? 0000007FF ?
000000000 ? 006E9AD80 ?
The lower functions on the stack beginning with “kge” and “kse” handle errors. Prima vista, the error seems to originate from ksulbem, which according to its naming convention shouldn’t have anything to do with error handling. ksulbem was called by opitsk, a generic function for handling local connections.
Since nobody outside Oracle seems to know about the purpose of this function, Stefan Koehler suggested tracing with Intel pin debugtrace.
By running debugtrace on sqlplus in a loop, we managed to capture the trace of a failed connection.
./pin -follow_execv -t source/tools/DebugTrace/obj-intel64/debugtrace.so -- sqlplus sys/"manager" as sysdba
Call /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:opitsk+0x00000000170f -> 0x00000000149e7a80 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ksulbem(0x1, 0x2fed, ...)
| Call 0x00000000149e7c41 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ksulbem+0x0000000001c1 -> 0x0000000000ea35a0 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:kserecl0(0x15a279c0, 0x15a279d4, ...)
Pay attention to the second argument of ksulbem: 0x2fed. 0x2fed is 12269 in decimal notation, which is our error code! Since the error code was passed to ksulbem, the error was not generated inthere. In other words, ksulbem is just an error handling function (“em” might stand for “error management”).
The error was generated before, in the function naedacc, when reading incoming packets:
Call 0x0000000014419a71 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:nsfull_pkt_rcv+0x0000000046f1 -> 0x00000000144d8870 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:naedacc(0x7ff1d79e4058, 0x7ff1d79e12e0, ...)
Return 0x00000000144d91d8 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:naedacc+0x000000000968 returns: 0x2fed
Call 0x0000000014419afc /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:nsfull_pkt_rcv+0x00000000477c -> 0x0000000006e32d40 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:nserrbc(0x7ff1d79df928, 0, ...)
nsfull_pkt_rcv called naedacc. naedacc is one of the functions handling encryption/decryption. It seems to be short-circuited – it exited quickly, without calling any other functions. Presumably, it found something in the byte stream that misled it to the wrong conclusion that weak ciphers are being used. As a result, it returned the error code 0x2fed.
This is how the naedacc stack looks like for a good execution (the indentation removed to improve the readability):
./pin -follow_execv -t source/tools/DebugTrace/obj-intel64/debugtrace.so -- sqlplus sys/"manager" as sysdba
Call 0x00000000144d8a4e /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:naedacc+0x0000000001de -> 0x00000000143420a0 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:naeaesd(0x7fcd39987c38, 0x7fcd399812e0, ...)
Call 0x00000000143420ee /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:naeaesd+0x00000000004e -> 0x0000000014faa9c0 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ssMemMalloc(0x31, 0x7fcd399812e0, ...)
Tailcall 0x0000000014faa9d5 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ssMemMalloc+0x000000000015 -> 0x00000000149ffca0 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ksmem_malloc(0x31, 0x7fcd399812e0, ...)
Return 0x00000000149ffe62 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ksmem_malloc+0x0000000001c2 returns: 0x7fcd3997f730
Call 0x000000001434211d /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:naeaesd+0x00000000007d -> 0x0000000006fa8480 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ztcedec(0x7008001, 0x7fcd3998cd30, ...)
Call 0x0000000006fa84a7 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ztcedec+0x000000000027 -> 0x0000000006fa84d0 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ztcedec2(0x7008001, 0x7fcd3998cd30, ...)
Tailcall 0x0000000006fa84da /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ztcedec2+0x00000000000a -> 0x000000001540f0b0 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ztcecrypto_2(0x7008001, 0x7fcd3998cd30, ...)
Call 0x000000001540f135 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ztcecrypto_2+0x000000000085 -> 0x000000001540f1f0 /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle:ztcei2(0x7ffede9f0b70, 0x7008001, ...)
...
We can see a long list of encryption handling functions (all beginning with “zt“, see Frits Hogland’s orafun.info“).
How does the program flow look like without encryption?
I’m setting TNS_ADMIN to a directory without sqlnet.ora (no encryption by default) and starting sqlplus with pin debugtrace:
export TNS_ADMIN=/tmp
./pin -follow_execv -t source/tools/DebugTrace/obj-intel64/debugtrace.so -- sqlplus sys/"manager" as sysdba
Expectedly, no naedacc call was made:
grep naedacc debugtrace.out
We indeed couldn’t reproduce the problem without encryption.
Summary
If you specify username and password despite using OS authentication, you might get “ORA-12269: client uses weak encryption/crypto-checksumming version”. The issue isn’t reproducible at will. I assume there’s a boundary condition where a byte in the encrypted stream is wrongly recognized as a flag that marks weak encryption algorithms. Comparing the condition in the function naedacc with the byte stream in SQLNet trace should provide a clue for a bug fix.
Specifying username and password with OS authentication is unnecessary. Therefore, the best way to avoid it to use the connect string without username and password, for example “/ as sysdba”.
CDB catalog creation, however, specifies username and password and may occasionally fail. The workaround consists of pointing TNS_ADMIN to a configuration without encryption before starting sqlplus that will run the catalog creation. Alternatively, you can point it to any directory without sqlnet.ora as encryption is switched off by default.
Update on January 20, 2022 – Internals
Meanwhile, I figured out the flag which triggers the error. I did the analysis on 21.3. The data structure that stores the flag might change with each release.
After logging in with sqlplus, I’m attaching with gdb to the dedicated server process:
attach 1506475
Attaching to process 1506475
Reading symbols from /u00/oracle/orabase/product/21.3.0.0.0_a/bin/oracle...
I’m setting the breakpoint at naedacc:
(gdb) b naedacc
Breakpoint 1 at 0x144d8870
(gdb) c
Continuing.
In the SQLPlus I’m running a query and hitting the breakpoint:
Breakpoint 1, 0x00000000144d8870 in naedacc ()
The CPU register contains the pointer to the structure that contains the flag that is tested for raising the error:
(gdb) p/x $rdi
$1 = 0x7fceede60058
The flag is stored in the byte 0x74 of the structure and is set to zero:
(gdb) x/u $rdi+0x74
0x7fceede600cc: 0
How can we verify that we’re looking at the correct location?
We can set it to a non-zero value and continue with the execution:
(gdb) set *0x7fceede600cc = 1
(gdb) x/u $rdi+0x74
0x7fceede600cc: 1
(gdb) c
Continuing.
Setting the flag to a non-zero value indeed causes the error:
SQL> select * from dba_users ;
select * from dba_users
*
ERROR at line 1:
ORA-12269: client uses weak encryption/crypto-checksumming version
First, we have to find out the flag address in the debugtrace output (the address remains constant for the session, but changes for each connection, when a new dedicated server process is forked.)
Below is the first call to naedacc during connection:
21.3.0.0.0_a/bin/oracle:nsfull_pkt_rcv+0x0000000046f1 -> 0x00000000144d8870 21.3.0.0.0_a/
bin/oracle:naedacc(0x7fb24c76c058, 0x7fb24c7692e0, ...)
Write *(UINT64*)0x00007ffec34c87a8 = 0x14419a76
Write *(UINT64*)0x00007ffec34c87a0 = 0x7ffec34c8c40
Write *(UINT64*)0x00007ffec34c8798 = 0x7fb24c756060
Write *(UINT64*)0x00007ffec34c8790 = 0x7fb24cef62e8
Write *(UINT64*)0x00007ffec34c8788 = 0
Write *(UINT64*)0x00007ffec34c8780 = 0x7fb24c759858
Write *(UINT64*)0x00007ffec34c8778 = 0x7fb24c767928
Read 0x7fb24f0f07c8 = *(UINT64*)0x00007fb24c76c070
Read 0x7fb24cef62e8 = *(UINT64*)0x00007fb24f0f0820
Read 0x20 = *(UINT8*)0x00007fb24cef62f1
Write *(UINT64*)0x00007ffec34c8770 = 0
Read 0x1 = *(UINT32*)0x00007fb24c76c0c8
Read 0x7fb24c76e6f0 = *(UINT64*)0x00007fb24c76c230
Read 0x7fb24c76fcf8 = *(UINT64*)0x00007fb24c76c238
Read 0x7fb24c76fc38 = *(UINT64*)0x00007fb24c76c240
Read 0 = *(UINT8*)0x00007fb24c769311
Read 0x3c = *(UINT64*)0x00007fb24c7562e8
Write *(UINT64*)0x00007fb24c7562e8 = 0x3b
Read 0x7fb24c76c058 = *(UINT64*)0x00007fb24c76e720
Read 0x1 = *(UINT32*)0x00007fb24c76c060
Read 0 = *(UINT8*)0x00007fb24c76eb34
Read 0 = *(UINT32*)0x00007fb24c76c0cc
Read 0x11 = *(UINT8*)0x00007fb24c76fc40
Read 0 = *(UINT32*)0x00007fb24c76fc70
Read 0 = *(UINT8*)0x00007fb24c76fc50
Write *(UINT64*)0x00007ffec34c8748 = 0x7fb24c76fcf8
The pointer to the structure 0x7fb24c76c058 is passed through the first parameter. We know this because the first argument is passed through the RDI CPU register according to the x86 calling convention, and we previously verified that the pointer is stored in the RDI register when entering naedacc. The flag is stored on the memory location 0x00007fb24c76c0cc (0x7fb24c76c058 + 0x74). We can see that a read from this location returned 0.
Next, we have to look in the debugtrace output where this location was set before naedacc call:
Tailcall 0x0000000007c9b024 21.3.0.0.0_a/bin/oracle:_intel_fast_memset.V+0x000000000004 -> 0x0000000007ca1fd0 21.3.0.0.0_a/bin/oracle:__intel_avx_rep_memset(0x7fb24c76c058, 0, ...)
Read 0x1c0 = *(UINT64*)0x0000000007ca2330
Write *(UINT64*)0x00007fb24c76c058 = 0
Read 0x8000 = *(UINT32*)0x000000001a46f528
Write *(UINT256)0x00007fb24c76c060 = 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
Write *(UINT256)0x00007fb24c76c080 = 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
Write *(UINT256)0x00007fb24c76c0a0 = 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
Write *(UINT256)0x00007fb24c76c0c0 = 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
The function __intel_fast_memset is too generic – we know that it initializes the memory locations, but it doesn’t tell us anything about the Oracle context. So we have to scroll further up until the next Oracle function:
Call 0x0000000006ede8b7 21.3.0.0.0_a/bin/oracle:nainit+0x000000000107 -> 0x0000000003e98f70 21.3.0.0.0_a/bin/oracle:ssMemCalloc(0x248, 0x1, ...)
The Oracle function nainit initialized the whole structure. Since after the initialization nobody has changed the flag, it has remained zero when naedacc was called. This means that in case of error, the flag is probably overwritten between nainit and naedacc calls. We need a trace of failed connection. Therefore, I’m currently running the following command in a loop:
pin -follow_execv -t source/tools/DebugTrace/obj-intel64/debugtrace.so -- sqlplus sys/manager as sysdba
As pin debugtrace with the memory option is extremely slow, the expected time to error is a couple of weeks.
Any news? 😀
The issue hasn’t appeared with pin debugtrace. Since the debugtrace has massively slowed down the process, there might be a timing component involved in reproducing the problem. As the error was thrown only when creating CDBs, we implemented the workaround of temporarily setting the TNS_ADMIN to a directory without the encryption configuration.