Of course! Working with MySQL Binlog in Java is a powerful way to build real-time data synchronization, change data capture (CDC), and event-driven applications. Here’s a comprehensive guide covering the concepts, popular libraries, a practical code example, and best practices.

What is a MySQL Binlog?
The Binary Log (binlog) is a sequence of log files that records all changes to a database's data (e.g., INSERT, UPDATE, DELETE), as well as changes to database structure (e.g., CREATE TABLE, ALTER TABLE). Events are appended to it as transactions commit (the write-ahead role belongs to InnoDB's redo log, not the binlog), and it is essential for:
- Point-in-Time Recovery: Recovering data to a specific moment in time.
- Replication: Synchronizing data from a master server to one or more slave servers.
- Change Data Capture (CDC): Analyzing data changes for auditing, reporting, or triggering other services.
For Java applications, we read this log to react to data changes as they happen.
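You can inspect the binlog directly from a MySQL client to see what it records (the file name below is illustrative; your server's files will differ):

```sql
-- List the binlog files the server currently has
SHOW BINARY LOGS;

-- Show the first events in one file
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10;
```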
Prerequisites
Before you start, you need to configure your MySQL server to enable and expose the binlog.
1. Enable Binlog on MySQL Server
Edit your MySQL configuration file (usually my.cnf on Linux or my.ini on Windows).

[mysqld]
# Enable the binary log
log-bin=mysql-bin
# The server's unique ID (required for replication)
server-id=1
# (Optional) Limit binlog to specific databases for better performance
binlog-do-db=my_database
# (Optional) Exclude specific databases
# binlog-ignore-db=mysql
# Use row-based logging, which is easiest for CDC (Canal requires ROW)
binlog-format=ROW
Restart your MySQL server for the changes to take effect.
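After the restart, you can confirm the settings took effect:

```sql
SHOW VARIABLES LIKE 'log_bin';        -- should be ON
SHOW VARIABLES LIKE 'binlog_format';  -- should be ROW
SHOW VARIABLES LIKE 'server_id';      -- should be non-zero
```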
2. Create a Replication User
You need a MySQL user with the REPLICATION SLAVE privilege so it can request the binlog stream. Canal's own setup guide additionally grants SELECT and REPLICATION CLIENT so the client can read table metadata and server status.
CREATE USER 'repl_user'@'%' IDENTIFIED BY 'your_strong_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';
FLUSH PRIVILEGES;
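You can verify the result with:

```sql
SHOW GRANTS FOR 'repl_user'@'%';
```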
Java Libraries for Reading Binlog
You don't parse the raw binlog files yourself. Use a well-maintained library that handles the complex protocol. Here are the most popular options:
| Library | Language | Key Features | Pros | Cons |
|---|---|---|---|---|
| Canal | Java | Developed by Alibaba. High-performance, supports MySQL & MariaDB. | Very stable, widely used in large companies, good documentation. | Primarily designed for data replication to Elasticsearch/MaxCompute. |
| Debezium | Java (Kafka Connect) | Part of the Kafka Connect ecosystem. | Excellent integration with Kafka, schema management, at-least-once delivery guarantees. | Requires a Kafka cluster, which adds complexity. |
| Otter | Java | Developed by Alibaba (like Canal). A data synchronization platform. | Provides a complete web UI for managing synchronization tasks. | Can be heavier than Canal if you only need the client. |
| mysql-binlog-connector-java | Java | A lightweight, embeddable binlog client (originally by shyiko, now maintained as osheroff/mysql-binlog-connector-java). | Easy to set up; no extra server component needed. | Low-level: you manage positions and schema mapping yourself. |
Recommendation: For most new projects, Canal is an excellent choice. It's robust, performant, and directly provides a Java client. Debezium is the go-to if you're already in the Kafka ecosystem.

Practical Example: Using Canal
We'll use Canal to connect to MySQL, read the binlog, and print the data changes.
1. Project Setup (Maven)
Add the Canal dependency to your pom.xml.
<dependency>
<groupId>com.alibaba.otter</groupId>
<artifactId>canal.client</artifactId>
<version>1.1.7</version> <!-- Use a recent stable version -->
</dependency>
2. Java Code
This code connects to the MySQL server, subscribes to the binlog for a specific database, and prints every change it receives.
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import java.net.InetSocketAddress;
import java.util.List;
public class BinlogConsumer {

    public static void main(String[] args) {
        // 1. Create a connection to the Canal server.
        // Canal acts as a proxy to the MySQL binlog; use
        // CanalConnectors.newClusterConnector for a ZooKeeper-managed cluster.
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), // Canal server host and port
                "example", // destination -- must match the Canal instance name (conf/example)
                "canal",   // username
                "canal"    // password
        );
        int batchSize = 1000;
        try {
            // 2. Connect to the server
            connector.connect();

            // 3. Subscribe to the binlog of a specific database.
            // You can subscribe to multiple databases or all databases using ".*\\..*"
            connector.subscribe("my_database\\..*"); // all tables in 'my_database'

            // Re-deliver anything fetched but not acked by a previous run
            connector.rollback();

            // 4. Start processing messages
            while (true) {
                // Fetch a batch of messages from the binlog
                Message message = connector.getWithoutAck(batchSize);
                long batchId = message.getId();
                if (batchId == -1 || message.getEntries().isEmpty()) {
                    // No new data, wait for a bit
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                    continue;
                }

                // 5. Process each entry in the message
                List<CanalEntry.Entry> entries = message.getEntries();
                for (CanalEntry.Entry entry : entries) {
                    if (entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN
                            || entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND) {
                        continue;
                    }

                    CanalEntry.RowChange rowChange;
                    try {
                        rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                    } catch (Exception e) {
                        throw new RuntimeException("ERROR ## parsing the entry failed!", e);
                    }

                    CanalEntry.EventType eventType = rowChange.getEventType();
                    System.out.println("======================================================");
                    System.out.println("Binlog File: " + entry.getHeader().getLogfileName());
                    System.out.println("Binlog Offset: " + entry.getHeader().getLogfileOffset());
                    System.out.println("Schema: " + entry.getHeader().getSchemaName());
                    System.out.println("Table: " + entry.getHeader().getTableName());
                    System.out.println("Event Type: " + eventType);

                    // 6. Process each row data change
                    for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                        if (eventType == CanalEntry.EventType.DELETE) {
                            printColumn(rowData.getBeforeColumnsList(), "BEFORE");
                        } else if (eventType == CanalEntry.EventType.INSERT) {
                            printColumn(rowData.getAfterColumnsList(), "AFTER");
                        } else if (eventType == CanalEntry.EventType.UPDATE) {
                            System.out.println("--- UPDATE ---");
                            printColumn(rowData.getBeforeColumnsList(), "BEFORE");
                            printColumn(rowData.getAfterColumnsList(), "AFTER");
                        }
                    }
                }

                // 7. Acknowledge the batch so it is not re-processed
                connector.ack(batchId);
            }
        } finally {
            // 8. Disconnect
            connector.disconnect();
        }
    }

    private static void printColumn(List<CanalEntry.Column> columns, String prefix) {
        System.out.println("--- " + prefix + " COLUMNS ---");
        for (CanalEntry.Column column : columns) {
            System.out.printf("%s: %s (updated: %b)%n",
                    column.getName(),
                    column.getValue(),
                    column.getUpdated());
        }
    }
}
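When handling UPDATE events, you usually only care about which columns actually changed. A small helper sketch in plain Java (no Canal types; you would build the maps from getBeforeColumnsList() and getAfterColumnsList(); the class and method names here are my own):

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: given the before- and after-images of a row,
// return the names of the columns whose values differ.
public class ColumnDiff {

    public static Set<String> changedColumns(Map<String, String> before, Map<String, String> after) {
        Set<String> changed = new TreeSet<>();
        Set<String> allNames = new TreeSet<>(before.keySet());
        allNames.addAll(after.keySet());
        for (String name : allNames) {
            String oldValue = before.get(name);
            String newValue = after.get(name);
            // A column counts as changed if it appeared, disappeared, or its value differs
            if (oldValue == null ? newValue != null : !oldValue.equals(newValue)) {
                changed.add(name);
            }
        }
        return changed;
    }
}
```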
3. Running the Example
- Start a Canal Server: You need to run a standalone Canal server. You can download it from the Alibaba Canal GitHub. The server connects to your MySQL instance and exposes a port (default 11111) for clients like our Java app.
- Configure the Canal Instance: Edit canal.instance.mysql.slaveId, canal.instance.dbUsername, and canal.instance.dbPassword in the instance.properties file inside the conf/example/ directory to match your MySQL setup (canal.instance.master.address sets the MySQL host and port).
- Run the Java Code: Execute the BinlogConsumer class.
Now, perform any INSERT, UPDATE, or DELETE operations on tables in the my_database in MySQL. You will see the changes printed in your Java application's console in near real-time.
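For example, assuming a users table exists in my_database (the table and columns are illustrative):

```sql
INSERT INTO my_database.users (id, name) VALUES (1, 'alice');
UPDATE my_database.users SET name = 'alice2' WHERE id = 1;
DELETE FROM my_database.users WHERE id = 1;
```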
Important Considerations & Best Practices
- Error Handling and Resilience: Your consumer must be robust. What happens if it crashes? It needs to be able to resume from the last known position.
  - Acknowledge Messages: The connector.ack(batchId) call is crucial. It tells Canal that the batch has been successfully processed. If your app crashes before acknowledging, calling connector.rollback() after reconnecting makes Canal re-deliver the un-acked messages.
  - Save Position: Canal tracks the consumed position on the server side (for a standalone instance, in conf/example/meta.dat by default), so an acknowledging client resumes automatically after a restart. If you need your own checkpointing, persist the logfileName and logfileOffset from entry.getHeader() to a durable store and use them to verify or recover your place.
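If you do manage positions in your application (Canal's server-side meta store is the default), a minimal checkpoint sketch using only the JDK could look like this; the class name and file format are my own choice:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical checkpoint store: persists the last acked binlog position
// (logfileName + logfileOffset from entry.getHeader()) to a properties file.
public class PositionStore {

    private final Path file;

    public PositionStore(Path file) {
        this.file = file;
    }

    public void save(String logfileName, long offset) {
        Properties p = new Properties();
        p.setProperty("logfileName", logfileName);
        p.setProperty("offset", Long.toString(offset));
        try (OutputStream out = Files.newOutputStream(file)) {
            p.store(out, "last processed binlog position");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns {logfileName, offset} or null if no checkpoint exists yet.
    public String[] load() {
        if (!Files.exists(file)) {
            return null;
        }
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            p.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String[] { p.getProperty("logfileName"), p.getProperty("offset") };
    }
}
```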
- Performance:
  - Batching: Always fetch messages in batches (connector.getWithoutAck(batchSize)) rather than one by one. This significantly reduces network overhead.
  - Asynchronous Processing: Don't perform slow, blocking operations (like calling another API or writing to a slow database) directly in the consumer loop. Hand the work to a message queue (like Kafka or RabbitMQ) or a thread pool, and ack only after the batch has been fully processed.
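The "process the whole batch, then ack" pattern can be sketched with a plain ExecutorService (no Canal types; handle() stands in for your real per-event logic):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: fan a batch of change events out to a thread pool,
// returning only once every event has been handled -- the point at which
// it would be safe to call connector.ack(batchId) in the consumer loop.
public class BatchProcessor {

    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final AtomicInteger handled = new AtomicInteger();

    // Stand-in for real work: index into Elasticsearch, publish to Kafka, etc.
    private void handle(String event) {
        handled.incrementAndGet();
    }

    public int processBatch(List<String> events) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String event : events) {
            tasks.add(() -> {
                handle(event);
                return null;
            });
        }
        try {
            pool.invokeAll(tasks); // blocks until every task has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```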
- Schema Evolution: If the schema of a table changes (e.g., a column is added or removed), your consumer application must be able to handle it gracefully. Debezium is particularly strong here, as it emits explicit schema change events. With Canal, you need to design your application to be resilient to missing or new columns.
- Security: Never hardcode credentials in your code. Use environment variables, a configuration file with restricted permissions, or a secret management system (like HashiCorp Vault or AWS Secrets Manager).
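For example, connection settings can come from the environment rather than source code (the variable names here are my own):

```java
// Hypothetical sketch: read connection settings from environment variables,
// falling back to safe defaults for everything except the password.
public class CanalConfig {

    public static String get(String name, String fallback) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static String requirePassword() {
        String password = System.getenv("CANAL_PASSWORD");
        if (password == null || password.isEmpty()) {
            throw new IllegalStateException("CANAL_PASSWORD is not set");
        }
        return password;
    }
}
```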
