MOLT Replicator

On this page Carat arrow pointing down

MOLT Replicator continuously replicates changes from source databases to CockroachDB as part of a database migration. It supports live ongoing migrations to CockroachDB from a source database, and enables backfill from CockroachDB to your source database for failback scenarios to preserve a rollback option during a migration window.

MOLT Replicator consumes change data from CockroachDB changefeeds, PostgreSQL logical replication streams, MySQL GTID-based replication, and Oracle LogMiner. It applies changes to target databases while maintaining configurable consistency , and features an embedded TypeScript/JavaScript environment for configuration and live data transforms.

Supported databases

The following source databases are supported:

  • CockroachDB
  • PostgreSQL 11-16
  • MySQL 5.7, 8.0 and later
  • Oracle Database 19c (Enterprise Edition) and 21c (Express Edition)

The following target databases are supported:

  • CockroachDB
  • PostgreSQL
  • MySQL
  • Oracle

Installation

To install MOLT, download the binary that matches your architecture and source database:

Operating System Architecture PostgreSQL/MySQL Oracle (Preview)
Windows AMD 64-bit Download N/A
ARM 64-bit Download N/A
Linux AMD 64-bit Download Download
ARM 64-bit Download N/A
Mac AMD 64-bit Download N/A
ARM 64-bit Download N/A

The following binaries are included:

  • molt
  • replicator

Both molt and replicator must be in your current working directory.

To display the current version of each binary, run molt --version and replicator --version.

For previous binaries, refer to the MOLT version manifest. For release details, refer to the MOLT changelog.

Docker images

MOLT Replicator

Docker images for MOLT Replicator are also available as a standalone binary:

icon/buttons/copy
docker pull cockroachdb/replicator

To pull a specific version (for example, v1.1.1):

icon/buttons/copy
docker pull cockroachdb/replicator:v1.1.1

For details on running in Docker, refer to Docker usage.

Setup

Complete the following items before using MOLT Replicator:

  • Follow the recommendations in Best practices.

  • Configure the source database for replication based on your source type. Refer to the migration workflow tutorials for PostgreSQL, MySQL, Oracle, and CockroachDB sources.

  • Ensure that the SQL user running MOLT Replicator has appropriate privileges on the source and target databases, as described in the migration workflow tutorials.

Docker usage

For details on pulling Docker images, refer to Docker images.

Performance

MOLT Fetch, Verify, and Replicator are likely to run more slowly in a Docker container than on a local machine. To improve performance, increase the memory or compute resources, or both, on your Docker container.

Local connection strings

When testing locally, specify the host as follows:

  • For macOS, use host.docker.internal. For example:

    --source 'postgres://postgres:postgres@host.docker.internal:5432/molt?sslmode=disable'
    --target "postgres://root@host.docker.internal:26257/molt?sslmode=disable"
    
  • For Linux and Windows, use 172.17.0.1. For example:

    --source 'postgres://postgres:postgres@172.17.0.1:5432/molt?sslmode=disable'
    --target "postgres://root@172.17.0.1:26257/molt?sslmode=disable"
    

How it works

Failback from CockroachDB (start): MOLT Replicator acts as an HTTP webhook sink for a CockroachDB changefeed. To avoid mixing streams of incoming data, only one changefeed should point to Replicator at a time. Replicator receives mutations from source cluster nodes, can optionally buffer them in a CockroachDB staging cluster, and then applies time-ordered transactional batches to the target database. Mutations are applied as UPSERT or DELETE statements while respecting foreign-key and table dependencies.

PostgreSQL source (pglogical): MOLT Replicator uses PostgreSQL logical replication, which is based on publications and replication slots. You create a publication for the target tables, and a slot marks consistent replication points. MOLT Replicator consumes this logical feed directly and applies the data in sorted batches to the target.

MySQL source (mylogical): MOLT Replicator relies on MySQL GTID-based replication to read change data from MySQL binlogs. It works with MySQL versions that support GTID-based replication and applies transactionally consistent feeds to the target. Binlog features that do not use GTIDs are not supported.

Oracle source (oraclelogminer): MOLT Replicator uses Oracle LogMiner to capture change data from Oracle redo logs, supporting replication for a single database user/schema. Both multitenant (CDB) and non-CDB Oracle architectures are supported. Replicator periodically queries LogMiner-populated views and processes transactional data in ascending SCN windows for reliable throughput while maintaining consistency.

Consistency modes

MOLT Replicator offers three consistency modes for balancing throughput and transactional guarantees:

  1. Consistent (default for CockroachDB sources): Preserves per-row order and source transaction atomicity. Concurrent transactions are controlled by --parallelism.

  2. BestEffort: Relaxes atomicity across tables that do not have foreign key constraints between them (maintains coherence within FK-connected groups). Enable with --bestEffortOnly or allow auto-entry via --bestEffortWindow set to a positive duration (e.g., 1s).

  3. Immediate (default for PostgreSQL, MySQL, and Oracle sources): Applies updates as they arrive to Replicator with no buffering or waiting for resolved timestamps. Provides highest throughput but requires no foreign keys on the target schema.

Commands

Command Usage
start Start the replicator in CockroachDB-to-target (failback) mode.
pglogical Start replication from a PostgreSQL source using logical replication.
mylogical Start replication from a MySQL source using GTID-based replication.
oraclelogminer Start replication from an Oracle source using LogMiner.

Flags

Replication flags

Flag Type Description
--applyTimeout DURATION The maximum amount of time to wait for an update to be applied.

Default: 30s
--dlqTableName IDENT The name of a table in the target schema for storing dead-letter entries.

Default: replicator_dlq
--enableParallelApplies BOOL Enable parallel application of independent table groups during replication. By default, applies are synchronous. When enabled, this increases throughput at the cost of higher target pool usage and memory usage.

Default: false
--flushPeriod DURATION Flush queued mutations after this duration.

Default: 1s
--flushSize INT Ideal batch size to determine when to flush mutations.

Default: 1000
--gracePeriod DURATION Allow background processes to exit.

Default: 30s
--logDestination STRING Write logs to a file. If not specified, write logs to stdout.
--logFormat STRING Choose log output format: "fluent", "text".

Default: "text"
--maxRetries INT Maximum number of times to retry a failed mutation on the target (for example, due to contention or a temporary unique constraint violation) before treating it as a hard failure.

Default: 10
--metricsAddr STRING A host:port on which to serve metrics and diagnostics. The metrics endpoint is http://{host}:{port}/_/varz.
--parallelism INT The number of concurrent database transactions to use.

Default: 16
--quiescentPeriod DURATION How often to retry deferred mutations.

Default: 10s
--retireOffset DURATION How long to delay removal of applied mutations.

Default: 24h0m0s
--retryInitialBackoff DURATION Initial delay before the first retry attempt when applying a mutation to the target database fails due to a retryable error, such as contention or a temporary unique constraint violation.

Default: 25ms
--retryMaxBackoff DURATION Maximum delay between retry attempts when applying mutations to the target database fails due to retryable errors.

Default: 2s
--retryMultiplier INT Multiplier that controls how quickly the backoff interval increases between successive retries of failed applies to the target database.

Default: 2
--scanSize INT The number of rows to retrieve from the staging database used to store metadata for replication.

Default: 10000
--schemaRefresh DURATION How often a watcher will refresh its schema. If this value is zero or negative, refresh behavior will be disabled.

Default: 1m0s
--sourceConn STRING The source database's connection string. When replicating from Oracle, this is the connection string of the Oracle container database (CDB). Refer to Oracle replication flags.
--stageDisableCreateTableReaderIndex BOOL Disable the creation of partial covering indexes to improve read performance on staging tables. Set to true if creating indexes on existing tables would cause a significant operational impact.

Default: false
--stageMarkAppliedLimit INT Limit the number of mutations to be marked applied in a single statement.

Default: 100000
--stageSanityCheckPeriod DURATION How often to validate staging table apply order (-1 to disable).

Default: 10m0s
--stageSanityCheckWindow DURATION How far back to look when validating staging table apply order.

Default: 1h0m0s
--stageUnappliedPeriod DURATION How often to report the number of unapplied mutations in staging tables (-1 to disable).

Default: 1m0s
--stagingConn STRING The staging database's connection string.
--stagingCreateSchema Automatically create the staging schema if it does not exist.
--stagingIdleTime DURATION Maximum lifetime of an idle connection.

Default: 1m0s
--stagingJitterTime DURATION The time over which to jitter database pool disconnections.

Default: 15s
--stagingMaxLifetime DURATION The maximum lifetime of a database connection.

Default: 5m0s
--stagingMaxPoolSize INT The maximum number of staging database connections.

Default: 128
--stagingSchema STRING Name of the CockroachDB schema that stores replication metadata. Required each time replicator is rerun after being interrupted, as the schema contains a checkpoint table that enables replication to resume from the correct transaction.

Default: _replicator.public
--targetApplyQueueSize INT Size of the apply queue that buffers mutations before they are written to the target database. Larger values can improve throughput, but increase memory usage. This flag applies only to CockroachDB and PostgreSQL (pglogical) sources, and replaces the deprecated --copierChannel and --stageCopierChannelSize flags.
--targetConn STRING The target database's connection string.
--targetIdleTime DURATION Maximum lifetime of an idle connection.

Default: 1m0s
--targetJitterTime DURATION The time over which to jitter database pool disconnections.

Default: 15s
--targetMaxLifetime DURATION The maximum lifetime of a database connection.

Default: 5m0s
--targetMaxPoolSize INT The maximum number of target database connections.

Default: 128
--targetSchema STRING The SQL database schema in the target cluster to update.
--targetStatementCacheSize INT The maximum number of prepared statements to retain.

Default: 128
--taskGracePeriod DURATION How long to allow for task cleanup when recovering from errors.

Default: 1m0s
--timestampLimit INT The maximum number of source timestamps to coalesce into a target transaction.

Default: 1000
--userscript STRING The path to a TypeScript configuration script. For example, --userscript 'script.ts'.
-v, --verbose COUNT Increase logging verbosity to debug; repeat for trace.
PostgreSQL replication flags

The following flags are used when replicating from a PostgreSQL source database.

Flag Type Description
--publicationName STRING The publication within the source database to replicate.
--slotName STRING The replication slot in the source database.

Default: "replicator"
--standbyTimeout DURATION How often to report WAL progress to the source server.

Default: 5s
MySQL replication flags

The following flags are used when replicating from a MySQL source database.

Flag Type Description
--defaultGTIDSet STRING Default GTID set, in the format source_uuid:min(interval_start)-max(interval_end). Required the first time replicator is run, as the GTID set provides a replication marker for streaming changes.
--fetchMetadata Fetch column metadata explicitly, for older versions of MySQL that do not support binlog_row_metadata.
--replicationProcessID UINT32 The replication process ID to report to the source database.

Default: 10
Oracle replication flags

The following flags are used when replicating from an Oracle source database.

Flag Type Description
--sourceSchema STRING Required. Source schema name on Oracle where tables will be replicated from.
--scn INT The snapshot System Change Number (SCN) queried by MOLT Fetch for the initial data load.
--backfillFromSCN INT The SCN of the earliest active transaction at the time of the initial snapshot. Ensures no transactions are skipped when starting replication from Oracle.
--sourcePDBConn STRING Connection string for the Oracle pluggable database (PDB). Only required when using an Oracle multitenant configuration. --sourceConn must be included.
--schema-filter STRING Restricts replication to the specified Oracle PDB schema (user). Set to the PDB user that owns the tables you want to replicate. Without this flag, replication will be attempted on tables from other users.
--oracle-application-users STRING List of Oracle usernames responsible for DML transactions in the PDB schema. Enables replication from the latest-possible starting point. Usernames are case-sensitive and must match the internal Oracle usernames (e.g., PDB_USER).
Failback replication flags

The following flags are used for failback from CockroachDB.

Flag Type Description
--assumeIdempotent Disable the extra staging table queries that debounce non-idempotent redelivery in changefeeds.
--bestEffortOnly Eventually-consistent mode; useful for high-throughput, skew-tolerant schemas with foreign keys.
--bestEffortWindow DURATION Use an eventually-consistent mode for initial backfill or when replication is behind; 0 to disable.

Default: 1h0m0s
--bindAddr STRING The network address to bind to.

Default: ":26258"
--disableAuthentication Disable authentication of incoming Replicator requests; not recommended for production.
--enableCheckpointStream Enable checkpoint streaming (use an internal changefeed from the staging table for real-time updates), rather than checkpoint polling (query the staging table for periodic updates), for failback replication.

Default: false (use checkpoint polling)
--discard Dangerous: Discard all incoming HTTP requests; useful for changefeed throughput testing. Not intended for production.
--discardDelay DURATION Adds additional delay in discard mode; useful for gauging the impact of changefeed round-trip time (RTT).
--healthCheckTimeout DURATION The timeout for the health check endpoint.

Default: 5s
--httpResponseTimeout DURATION The maximum amount of time to allow an HTTP handler to execute.

Default: 2m0s
--immediate Bypass staging tables and write directly to target; recommended only for KV-style workloads with no foreign keys.
--limitLookahead INT Limit number of checkpoints to be considered when computing the resolving range; may cause replication to stall completely if older mutations cannot be applied.
--ndjsonBufferSize INT The maximum amount of data to buffer while reading a single line of ndjson input; increase when source cluster has large blob values.

Default: 65536
--tlsCertificate STRING A path to a PEM-encoded TLS certificate chain.
--tlsPrivateKey STRING A path to a PEM-encoded TLS private key.
--tlsSelfSigned If true, generate a self-signed TLS certificate valid for localhost.

Usage

The following sections describe how to use MOLT Replicator with different source database types.

Source and target databases

Tip:

Follow the recommendations in Connection strings.

--sourceConn

--sourceConn specifies the connection string of the source database.

CockroachDB:

icon/buttons/copy
--sourceConn 'postgresql://{username}:{password}@{host}:{port}/{database}'

PostgreSQL:

icon/buttons/copy
--sourceConn 'postgresql://{username}:{password}@{host}:{port}/{database}'

MySQL:

icon/buttons/copy
--sourceConn 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}'

Oracle:

icon/buttons/copy
--sourceConn 'oracle://{username}:{password}@{host}:{port}/{service_name}'

In Oracle migrations, the --sourceConn connection string specifies a CDB and the --sourcePDBConn connection string specifies a PDB (in Oracle Multitenant databases). The {username} corresponds to the owner of the tables you will migrate.

--targetConn

--targetConn specifies the target database connection string:

CockroachDB and PostgreSQL:

icon/buttons/copy
--targetConn 'postgresql://{username}:{password}@{host}:{port}/{database}'

MySQL:

icon/buttons/copy
--targetConn 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}'

Oracle:

icon/buttons/copy
--targetConn 'oracle://{username}:{password}@{host}:{port}/{service_name}'

Data transformations

Use --userscript to specify the path to a TypeScript/JavaScript file for data transformations and filtering. The userscript provides an embedded environment for configuration and live data transforms during replication.

Note:

When filtering out tables in a schema with a userscript, Oracle replication performance may decrease because filtered tables are still included in LogMiner queries and processed before being discarded.

Resume replication

MOLT Replicator can resume replication if it is interrupted. Specify the staging schema with the --stagingSchema flag. MOLT Replicator uses the schema as a replication marker for streaming changes.

icon/buttons/copy
--stagingSchema {schema_name}

You must include the --stagingSchema flag when resuming replication, as the schema provides checkpointing data that enables replication to resume from the correct point.

For detailed steps, refer to Resume replication.

Staging schema

The staging schema stores replication metadata, checkpoint information, and buffered mutations during replication. MOLT Replicator uses this schema to:

  • Track replication progress: Store checkpoints that enable resuming from the correct point after interruptions
  • Buffer mutations: Temporarily store data changes before applying them to the target in transaction order
  • Maintain consistency: Ensure MVCC time-ordered transactional batches while respecting table dependencies
  • Enable recovery: Provide restart capabilities with robust failure handling

For fresh replication runs, clean out the staging database memo table to avoid conflicts with previous checkpoints. Use --stagingCreateSchema to automatically create the staging schema if it doesn't exist.

Best practices

Test and validate

To verify that your connections and configuration work properly, run MOLT Replicator in a staging environment before replicating any data in production. Use a test or development environment that closely resembles production.

Connection strings

URL-encode the connection strings for the source and target databases. This ensures that the MOLT tools can parse special characters in your password.

To keep your database credentials out of shell history and logs, follow these best practices when specifying your source and target connection strings:

  • Avoid plaintext connection strings.

  • URL-encode connection strings for the source database and CockroachDB so special characters in passwords are handled correctly.

    • Given a password a$52&, pass it to the molt escape-password command with single quotes:

      icon/buttons/copy
      molt escape-password --password 'a$52&'
      

      Use the encoded password in your --source connection string. For example:

      --source 'postgres://migration_user:a%2452%26@localhost:5432/replicationload'
      
  • Provide your connection strings as environment variables. For example:

    export SOURCE="postgres://migration_user:a%2452%26@localhost:5432/molt?sslmode=verify-full"
    export TARGET="postgres://root@localhost:26257/molt?sslmode=verify-full"
    

    Afterward, reference the environment variables as follows:

    --source $SOURCE
    --target $TARGET
    
  • If possible, use an external secrets manager to load the environment variables from stored secrets.

Optimize performance

Configure the following replicator flags to optimize replication throughput and resource usage. Test different combinations in a staging environment to find the optimal balance of stability and performance for your workload.

Note:

The following parameters apply to PostgreSQL, Oracle, and CockroachDB (failback) sources.

Flag Description
--parallelism Control the maximum number of concurrent target transactions. Higher values increase throughput but require more target connections. Start with a conservative value and increase based on target database capacity.
--flushSize Balance throughput and latency. Controls how many mutations are batched into each query to the target. Increase for higher throughput at the cost of higher latency.
--targetApplyQueueSize Control memory usage during operation. Increase to allow higher throughput at the expense of memory; decrease to apply backpressure and limit memory consumption.
--targetMaxPoolSize Set larger than --parallelism by a safety factor to avoid exhausting target pool connections. Cockroach Labs recommends setting parallelism to 80% of this value.
--collapseMutations Reduce the number of queries to the target by combining multiple mutations on the same primary key within each batch. Disable only if exact mutation order matters more than end state.
--enableParallelApplies Improve apply throughput for independent tables and table groups that share foreign key dependencies. Increases memory and target connection usage, so ensure you increase --targetMaxPoolSize or reduce --parallelism.
--flushPeriod Set to the maximum allowable time between flushes (for example, 10s if data must be applied within 10 seconds). Works with --flushSize to control when buffered mutations are committed to the target.
--quiescentPeriod Lower this value if constraint violations resolve quickly on your workload to make retries more frequent and reduce latency. Do not lower if constraint violations take time to resolve.
--scanSize Applies to failback (replicator start) scenarios only. Balance memory usage and throughput. Increase to read more rows at once for higher throughput, at the cost of memory pressure. Decrease to reduce memory pressure and increase stability.

Monitoring and observability

  • Monitor lag metrics (source_lag_seconds, target_lag_seconds) and apply/stage duration/error counters to track replication performance. Set up Prometheus and Grafana to poll the metrics and view them in a dashboard. Use the published Grafana dashboard to visualize these metrics.

  • Pay close attention to warning and error level logging, as it indicates when Replicator is misbehaving. Enable trace logging with -vv for additional visibility when troubleshooting.

By default, MOLT Replicator exports Prometheus metrics at the address specified by --metricsAddr (default :30005) at the path /_/varz. For example: http://localhost:30005/_/varz.

Cockroach Labs recommends monitoring the following metrics during replication:

Metric Name Description
source_lag_seconds CockroachDB sources only: Time between when an incoming resolved MVCC timestamp originated on the source CockroachDB cluster and when it was received by Replicator.
target_lag_seconds CockroachDB sources only: End-to-end lag from when an incoming resolved MVCC timestamp originated on the source CockroachDB to when all data changes up to that timestamp were written to the target database.
source_lag_seconds_histogram CockroachDB sources: Same as source_lag_seconds but stored as a histogram for analyzing distributions over time.
Non-CockroachDB sources: Time between when a source transaction is committed and when its COMMIT transaction log arrives at Replicator.
target_lag_seconds_histogram CockroachDB sources: Same as target_lag_seconds but stored as a histogram for analyzing distributions over time.
Non-CockroachDB sources: End-to-end lag from when a source transaction is committed to when its changes are fully written to the target CockroachDB.
replicator_applier_mutations_staged Number of mutations that have been staged for application to the target database.
replicator_applier_mutations_applied Number of mutations that have been successfully applied to the target database.

You can use the Replicator Grafana dashboard to visualize these metrics. For Oracle-specific metrics, import this Oracle Grafana dashboard.

To check MOLT Replicator health, run curl http://localhost:30005/_/healthz. This returns a status code of 200 if Replicator is running.

Logging

By default, MOLT Replicator writes two streams of logs: operational logs to stdout (including warning, info, trace, and some errors) and final errors to stderr.

Redirect both streams properly to ensure all logs are captured for troubleshooting:

icon/buttons/copy
# Merge both streams to console
./replicator ... 2>&1

# Redirect both streams to a file
./replicator ... > output.log 2>&1

# Merge streams to console while saving to file
./replicator > >(tee replicator.log) 2>&1

# Use logDestination flag to write all logs to a file
./replicator --logDestination replicator.log ...

Security

Cockroach Labs strongly recommends the following:

Connection security and credentials

  • Use TLS-enabled connection strings for --sourceConn, --targetConn, and --stagingConn parameters. Remove sslmode=disable from production connection strings.
  • For webhook endpoints (failback scenarios), configure server certificates using --tlsCertificate and --tlsPrivateKey to specify the certificate and private key file paths.
  • Configure proper TLS certificates in CockroachDB changefeed webhook URLs instead of using insecure_tls_skip_verify=true.
  • If relevant, enable JWT authentication for additional security.

CockroachDB changefeeds

For failback scenarios, generate TLS certificates using self-signed certificates, certificate authorities like Let's Encrypt, or your organization's certificate management system.

Encode certificates for changefeed webhook URLs:

  • Webhook URLs: Use both URL encoding and base64 encoding: base64 -i ./client.crt | jq -R -r '@uri'
  • Non-webhook contexts: Use base64 encoding only: base64 -w 0 ca.cert

Client certificates in changefeed webhook URLs must correspond to server certificates in MOLT Replicator configuration to ensure proper TLS handshake.

Production considerations

  • Avoid --disableAuthentication and --tlsSelfSigned flags in production environments. These flags should only be used for testing or development purposes.

Examples

For detailed examples of using MOLT Replicator usage, refer to the migration workflow tutorials:

See also

×