Date: 09-01-2011
Subject: RELEASE 9.5A DATA Manager Files

These release notes pertain to the following programs or files:

   SUNDM.EXE     9.5A    01 Sep 2011    9,5,1,500
   SUNDM64.EXE   9.5A    01 Sep 2011    9,5,1,500

*==============================================================================
The following files have been changed as noted:
-------------------------------------------------------------------------------
SUNDM

 - Created a 64-bit Data Manager that supports a very large number of
   concurrent users logged on to the Data Manager. The 64-bit Data Manager
   is limited only by the resources available on the system where it is
   executed.

 - Corrected a GPF error that occurred when a DBFETCH was executed against
   a SQLite database and a column with no data (i.e. empty) was
   encountered.

 - Modified the GPF crash handler for a child thread to close all SQL
   database connections. This change is required to avoid a situation where
   a SQL database could become inaccessible when a child thread crashed.

 - Corrected a problem where OS handles could be lost when a SCHEMA
   instruction was executed to import a schema at the Data Manager.
   Repeated execution of the SCHEMA instruction at the Data Manager could
   eventually cause an unexpected I85 or I86 error with a subcode of 203.
   This problem occurs ONLY when importing a schema from a SQLIO xml schema
   file.
-------------------------------------------------------------------------------
SUNDM (Unix)

 - Modified the Unix Data Manager to provide support for the SQLite
   database engine.

 - Corrected a SEGV problem when executing a DBCONNECT that was redirected
   to execute at a Data Manager.
-------------------------------------------------------------------------------
SUNDM REPLICATION SUPPORT

 - Modified the Data Manager Replication support to allow multiple channels
   between the Primary and Secondary/Backup servers.
   In addition, the Primary server has been modified to write all IO and
   command transactions into a WAL (write-ahead log) file. A specialized
   channel is used to transfer the WAL data from the Primary to the
   Secondary/Backup servers. Replication support that uses multiple
   channels and WAL file data is identified as the version 2 style of
   replication. Operation of a replication server with a single channel, as
   originally implemented, is identified as the version 1 style of
   replication.

   Please note the following regarding the operation of the 2 replication
   styles.

   1. The version 2 style can co-exist with the version 1 style of
      replication. This means that a Primary configured to execute using
      the version 2 style of replication can support a Secondary/Backup
      server that is configured for either the version 1 or version 2 style
      of replication. However, the performance of the Primary in this case
      can be diminished by the interaction with the version 1 style
      Secondary/Backup servers.

   2. The implementation of the version 2 style of replication uses
      multiple threads and message queues to provide faster transaction
      processing. This implementation removes the bottlenecks that existed
      in the version 1 style of replication.

   3. The version 2 style of replication provides the following benefits:

      - Although the startup synchronization is single-threaded, the use of
        the WAL file eliminates application halts during the
        synchronization processing.

      - The file open ID requests for PLB OPEN operations no longer wait on
        the message queue with other transactions. This eliminates the
        possibility of an application halt during a PLB OPEN on a heavily
        IO oriented system.

      - IO transactions no longer halt when the replication transaction
        queue and the message queue are full. This prevents a PLB
        application from hanging on an IO instruction when the queues are
        full.

      - The managed and unmanaged replication operations are processed
        separately.
        This change eliminates all interactions where unmanaged replication
        transactions could affect the performance of the managed PLB IO
        operations. The version 2 unmanaged operations no longer affect the
        performance of the PLB program IO operations.

      - The unmanaged directory scanning no longer suspends the transaction
        processing.

      - A slow Backup or Secondary server no longer dictates the speed of
        replication. The replication services for any one Backup/Secondary
        server do not affect the replication services for another
        Backup/Secondary server.

   4. The Primary server processing threads are described as follows:

      - Main Thread
        - Handles actions for the Sundm command message queue.
        - Creates data channel threads based on demand.
        - Handles file open ID requests when a PLB OPEN/PREP instruction
          is executed.
        - Reads the commands from the command message queue.
        - Handles and processes the general commands like terminate
          ( -t, -f, ... etc ).
        - Handles and processes all transactions for all version 1 style
          servers.

      - Transaction Processing Thread
        - Reads the transactions from the Sundm child transaction message
          queue.
        - Writes the transactions to the WAL files.

      - Unmanaged Scanner Thread
        - Scans the unmanaged file directories.
        - Writes unmanaged transactions to the WAL files.

      - Data Channel Thread
        - Handles the message requests from a Backup/Secondary server.
        - Each data channel thread allocates a file cache memory block to
          avoid interactions with other data channels.

   5. The Backup/Secondary server processing threads are described as
      follows:

      - Main Thread
        - Handles message commands that are received from the Primary
          replication server and Sundm.

      - WAL Transfer Thread
        - Transfers the current transactions from the Primary server queue
          to a WAL file on disk.
        - This thread uses a data channel that is connected to the Primary
          server.
      - WAL Managed Processing Thread
        - This thread processes all managed transactions that have been
          received from the Primary server and stored into the WAL files.
          Managed transactions are created as PLB program IO operations are
          executed.
        - This thread uses a data channel that is connected to the Primary
          server.

      - WAL Unmanaged Processing Thread
        - This thread processes all unmanaged transactions that have been
          received from the Primary server and stored into the WAL files.
          The unmanaged transactions are created by the unmanaged thread
          that scans files on the Primary server.
        - This thread processes any Secondary recovery scanning that can be
          configured to verify that no files are unexpectedly deleted at
          the Secondary.
        - This thread uses a data channel that is connected to the Primary
          server.

      - Idle File Scanner Thread (Error List)
        - This thread processes all error and file recovery actions to
          resolve errors that have been detected during transaction
          processing.
        - This thread closes any files that are open and have had no I/O
          for longer than the secondary open idle time. See the IDLE_CLOSE
          keyword.
        - This thread uses a data channel that is connected to the Primary
          server.

   6. The WAL file support has been implemented for the version 2 style of
      replication in the Data Manager. The WAL file support improves the
      overall performance for both managed and unmanaged processes for the
      replication servers. It eliminates all bottlenecks, thread conflicts,
      and scenarios where the PLB applications could be slowed by the
      excessive IO transaction processing required for the replication
      servers.

      The following points of interest provide basic information about the
      WAL files:

      - A WAL file has a 1KB header.

      - Data in a WAL file following the header is composed of 60KB data
        segments. The data segments include all transaction command
        messages required to replicate data files from the Primary.

      - By default, the number of segments in a single WAL file is limited
        to 1000 segments.
        However, the WAL_SEGMENT_MAX keyword can be used to set the segment
        count from 100 to 100000 segments in the WAL file.

      - The old physical WAL files are deleted after all of the WAL file
        messages have been processed.

      - New application file and new directory information is embedded in
        WAL messages to improve the managed transaction performance.

   7. New keywords have been implemented for the Sundm.cfg that can be
      placed in the [replication] section to support the version 2
      replication operations. The new keywords are defined as follows:

      V2_REPLICATION={ON|OFF}

         This keyword is used to turn the version 2 replication ON or OFF.
         By default, the version 2 replication is turned OFF. This keyword
         must be added to the [replication] section of the replication
         servers to use the version 2 replication.

      WAL_DIR={path}

         This keyword is used to define where the WAL files are placed. If
         this keyword is not specified, the WAL files are placed in the
         current working directory for the Data Manager.

      WAL_SEGMENT_MAX={max}

         This keyword is used to specify the maximum number of segments
         that are written into a single WAL file. If this keyword is not
         specified, the default maximum segment count is 1000 segments.
         When this keyword is specified, the {max} value must be from 100
         to 100000 segments. Otherwise, it is set to the default of 1000
         segments. If the maximum segment value is set to a lower value,
         the WAL files are removed at a higher frequency than when the
         maximum segment value is set to a higher value.

      WAL_SEGMENT_TIMEOUT={timeout}

         This keyword is used to define the maximum number of seconds that
         a dirty WAL log segment can remain in memory before it is written
         to a WAL file. If this keyword is not specified, the {timeout}
         defaults to 5 seconds. When this keyword is specified, the
         {timeout} value can be from 1 to 3600 seconds. Otherwise, the
         {timeout} value is set to 5 seconds.
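      The value-range and fallback rules above, together with the stated
      WAL file layout (1KB header plus 60KB segments), can be sketched as
      follows. This is an illustrative helper only, not part of Sundm; the
      function names are hypothetical.

      ```python
      # Illustrative sketch only -- not Sundm code. Models the documented
      # fallback rules for the version 2 replication keywords and the
      # approximate maximum WAL file size implied by the segment layout.

      def wal_segment_max(value):
          """WAL_SEGMENT_MAX must be 100..100000; otherwise 1000 is used."""
          return value if 100 <= value <= 100000 else 1000

      def wal_segment_timeout(value):
          """WAL_SEGMENT_TIMEOUT must be 1..3600 seconds; otherwise 5."""
          return value if 1 <= value <= 3600 else 5

      def max_wal_file_kb(segments):
          """Approximate maximum WAL file size: 1KB header + 60KB/segment."""
          return 1 + 60 * segments

      print(wal_segment_max(50))      # below range -> 1000 (default)
      print(wal_segment_timeout(10))  # in range -> 10
      print(max_wal_file_kb(1000))    # default count -> 60001 KB
      ```

      A smaller WAL_SEGMENT_MAX therefore produces smaller WAL files that
      are deleted more frequently, as noted above.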
      Examples of the new keywords:

         [Replication]
         V2_REPLICATION=ON
         WAL_DIR=c:\temp\wal
         WAL_SEGMENT_MAX=100
         WAL_SEGMENT_TIMEOUT=10

   8. New ADMIN data items have been added for the Data Manager. These
      keywords can be used in the AdmGetInfo instruction. The new keywords
      are described as follows:

      AdmItemSrvWalMain (136)

         Returns the current write position of the WAL file.

      AdmItemSrvWalMan (137)

         Returns the current processed position of the WAL file by the
         managed transaction handler on a Secondary/Backup server.

      AdmItemSrvWalUnMan (138)

         Returns the current processed position of the WAL file by the
         unmanaged transaction handler on a Secondary/Backup server.

 - Modified to display more information when checking for a rollover master
   by a primary replication server.

 - Corrected a deadlock condition that could occur at a Primary replication
   server if an end-user application closed a managed file at exactly the
   same time as the Primary replication server.
-------------------------------------------------------------------------------
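The three WAL position items above lend themselves to replication-lag
monitoring: comparing the Primary's write position against the positions
already processed on a Secondary/Backup server. The sketch below is
illustrative only (not Sundm or PLB code), and treating the three positions
as directly comparable offsets is an assumption; the sample values are
hypothetical.

```python
# Illustrative sketch only -- not Sundm or PLB code. Models how a monitor
# might interpret the WAL position items returned by AdmGetInfo:
# AdmItemSrvWalMain (136), AdmItemSrvWalMan (137), AdmItemSrvWalUnMan (138).

def replication_lag(wal_main, wal_man, wal_unman):
    """Return (managed lag, unmanaged lag) relative to the write position."""
    return wal_main - wal_man, wal_main - wal_unman

managed_lag, unmanaged_lag = replication_lag(
    wal_main=500000,   # item 136 -- Primary write position (sample value)
    wal_man=480000,    # item 137 -- managed handler position (sample value)
    wal_unman=450000,  # item 138 -- unmanaged handler position (sample value)
)
print(managed_lag, unmanaged_lag)  # -> 20000 50000
```

A growing gap for either handler would suggest the corresponding WAL
processing thread on the Secondary/Backup server is falling behind.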