Replication

From WikiContent

Revision as of 01:48, 25 January 2010 by Chuck.bell (Talk | contribs)
(diff) ←Older revision | Current revision (diff) | Newer revision→ (diff)
Jump to: navigation, search
  • What is replication?
  • Getting started with replication
  • Monitoring replication
  • Advanced replication topics
    • The relay log
    • The replication threads
    • Securing replication with SSL

8<----

Introduction

A good maintenance plan for any database system includes a means to preserve the data. The main objective is to archive the data in case of failure or accidental loss. There are many ways you can backup and restore data in MySQL. The two most prominent methods are the MySQL Backup feature and the mysqldump utility. The following sections describe these methods as well as a few of the popular third party solutions.

The MySQL Replication Feature

Starting in versions


Replication is the distribution of a database over two or more servers. The purpose is to generate two or more viable copies of the databases to increase reliability and availability. Following that paradigm, MySQL Replication (available in version 3.23.15 and higher) is designed to provide high availability, load balancing, scale-out and offsite duplication of data. Indeed, MySQL Replication in its simplest form allows databases to be replicated from one machine (the master) to another (the slave). A prerequisite for replication is the master must have binary logging turned on. That is, the master must be recording all changes to data and metadata in a log that can then be transmitted to the slave and replayed. In this way, replication allows a slave to mimic the data on the master by reproducing the steps that occurred on the master via the binary log. For more information about how to use MySQL Replication, please see the Replication chapter of the MySQL 5.1 Reference Manual.


MySQL Replication is currently an integral part of the server code base and therefore is compiled with the server (mysqld). It connects to the server proper in a variety of ways including several roles as performed by the server during replication. The combination of these roles results in the successful replication of data. Terms Before embarking on an exploration of how replication works, let us consider a few terms. These terms are used throughout most of the replication documentation. Understanding these terms are essential for a concise architecture explanation.

Master – a server that is designated as the basis for replication. All data originates at the master and is propagated to the slaves. Slave – a server that is designated as the recipient for replicated data. Topology – the logical layout of how a series of servers are connected in a replication scenario. A simply topology would be a single master with a single slave. A more common topology is a single master and many slaves – also known as a ‘star’ topology because it resembles the light rays of a star when drawn. Log/Logging – the process of writing data and metadata changes to a log (file) for later processing. A log allows one to record the historical changes of data that occur on a server. Binary log/Binlog – a form of logging that generates a log containing binary data rather than pure textual data. Binary logs in MySQL can be read with the mysqlbinlog utility. Relay log – a type of binary log used in MySQL Replication for transferring binary log information from the master to the slave. The relay log resides on the slaves.

Replication Roles The replication code provides several modes or roles that a server can play during replication. Some of these roles may be performed at the same time. These roles include master, slave, observer, and event handler. In the following sections, we will examine what the roles are, how they contribute to replication, and learn how the source code for replication is organized. Master Role The master role is one of the most important roles in replication. The master role is responsible for ensuring all of the changes to data and metadata are written to the binary log. The following summarizes these aspects. Binary logging. This component is responsible for define binlog transactions, logging events and ensure that events are handled by binlog observers. Binary log file. This component is responsible for storing logging events. This includes, but is not limited to, durability and serialization. Binlog Dump. Upon request, sends a sequence of events that are stored in a file to the client. The source code components that make up the master role are located in the log.h/.cc file as the MYSQL_BINARY_LOG class. This class is used throughout the storage engine and server core areas to write all data and metadata changes to the binary log. Binlogger is the manager class of binary logging mechanism, the Binlogger can be attached one or more Binlog_handlers, which processes the events generated and propagated by Binlogger. This is used to implement a more universal replication interface that is suitable for many replication protocols, not necessary for semi-synchronous replication. This can separate the binary event logging (generating) and handling(storing, processing), so that more than one binlog event process mechanisam can be attached. Slave Role The slave role is another of the required roles in replication. The slave role is responsible for reading the log rows from the master, storing them in the relay log, and executing them. The following summarizes these aspects. Binlog relay. Request events from master, store events to relay log file and Execute the events to apply the modification on slave. The main source code components that comprise this role are the slave.cc, rpl_mi.h/.cc, and rpl_rli.h/.cc. Other code components are used and these are described briefly below. Observer Role The observer role is a subtask of the master and slave roles. It has many functions but is primarily concerned with the binary and relay log. The following summarizes these aspects. Binlog server observer. Observe binlog service start/stop. Binlog transaction observer. Observe binlog transactional activities. Binlog logging observer. Observe and handling events from logger. Binlog dump observer. Observe master dumping events. Binlog relay IO/SQL observer. Observe slave relay log IO and SQL execution. The observe role is actually a number of observer classes in the replication code. Most of the observer declarations can be found in the replication.h source code file. Definitions of the observer classes can be found in the rpl_handler.h/.cc source code files. Event Handler The event handler is an auxiliary role for the master. It is designed as an event handler to handle the events related to data and metadata changes that occur on the master. It is implemented in the log_event.h/.cc code component with methods for detecting the type of event and utilities for retrieving information about the associated table(s) and database(s). A Typical Replication Topology Now that we understand what roles the server can play in replication, let us examine a drawing of the replication architecture in a single master – slave topology.


It’s all in the Threads The underlining mechanism that enables replication is controlled by three threads. Each thread performs a specific role in replication. On the master, there is a single thread per connected slave called the ‘Binlog Dump thread’. This thread is responsible for sending the binary log events to the connected slave. On the slave, there are two threads, the ‘Slave I/O thread’ (or simply ‘I/O thread’) and the ‘Slave SQL thread’ (or simply ‘SQL thread’). The I/O thread is responsible for reading the incoming binary log events from the master and writing them to the slave’s relay log. The SQL thread is responsible for reading the events in the relay log and executing them.

You can see these threads in action along with their current states by issuing the SHOW PROCESSLIST command. For example, on the master one could see the following:

mysql> SHOW PROCESSLIST \G

                                                      • 1. row ***************************
    Id: 1
  User: rpl
  Host: localhost:54197
    db: NULL

Command: Binlog Dump

  Time: 25
 State: Master has sent all binlog to slave; waiting for binlog to be updated
  Info: NULL

Similarly, on the slave one could see the following:

mysql> SHOW PROCESSLIST \G

                                                      • 1. row ***************************
    Id: 2
  User: system user
  Host:
    db: NULL

Command: Connect

  Time: 127
 State: Waiting for master to send event
  Info: NULL
                                                      • 2. row ***************************
    Id: 3
  User: system user
  Host:
    db: NULL

Command: Connect

  Time: 10
 State: Slave has read all relay log; waiting for the slave I/O thread to 
        update it
  Info: NULL

It is always a good idea to use the SHOW PROCESSLIST command to check the status of the replication when troubleshooting.

Now that we know there are three threads and what they generally do, let us now follow the path of a typical SQL command. Please refer to the drawing above for details as you follow along. Note: The following assumes replication has been started and there is one master and one slave in the topology.

When a SQL command is issued on the master, the master’s normal processing takes place. For example, if a CREATE DATABASE command is issued on the master, the database is created on the master. At the end of that process before the command is terminated, a binary log event is written to the master’s binary log. It is at this point that the event is captured by the master’s Binlog Dump thread and sent to the slave. When the slave’s I/O thread receives the event, the row is written to the slave’s relay log. At this point, the slave’s SQL thread wakes up and reads the event from the relay log and executes it. It is at this point that the slave processes the CREATE DATABASE command. Indeed, except for the initiation mechanism, the CREATE DATABASE command is processed in the same manner as if the command were issued by a client. In this case, the CREATE DATABASE command is processed and the database is created on the slave.

For more information about MySQL replication features, see the online MySQL Reference Manual. [refman]

Personal tools