In distributed database systems, replication is the process of creating and maintaining multiple copies of a database, or of sharing database objects between different databases. Enterprises rely on it to improve the availability and performance of their database systems, for example by copying data from the primary database to a secondary system that handles analytic workloads, and to support distributed data processing across multiple locations. Replication serves many purposes, including migrating data from a legacy system to new IT infrastructure, creating a sandbox environment for testing, consolidating data from multiple sites, increasing data availability across remote offices, and strengthening resilience with an additional copy of the data to fall back on in case of a failover.

 

Companies use database replication to:


  • improve performance and increase the availability of company data
  • give satellite offices easier access to data
  • enhance the scalability of data-centered server-side applications
  • ensure data redundancy for failover purposes

 

The market currently offers a large number of database systems, each designed to improve particular operational aspects and to meet different storage needs depending on a company’s business model. Whichever system is used, database replication is a balancing act between data consistency and system performance. Out of this trade-off, three general types of replication have become common in the industry (this is not an exhaustive list, as there are numerous other replication methods):


  • Snapshot replication – as the name implies, this type of replication takes a snapshot of a database (the publisher) and copies it to a different server or database system (the subscriber). After the initial snapshot, the data is refreshed on a set schedule. This method is usually used in systems where data changes are infrequent. Although it is the easiest type of replication to maintain, snapshot replication is considerably slower than other methods because it copies all the data on every refresh, not just the rows that changed.
  • Transactional replication – in this method, the subscriber initially receives a snapshot of the publisher. After that initial copy, the subscriber is updated in near real time as changes occur in the publisher. The advantage of transactional replication over snapshot replication is that it guarantees transactional consistency, because each change made in the publisher is replicated to the subscriber in the order in which it occurred.
  • Merge replication – in this method, data from multiple databases is combined into a central database. As in transactional replication, an initial snapshot is synchronized to the subscriber databases. The difference between the two methods is that in merge replication the subscribers and the publisher can each make changes to the data independently, even while the subscribers aren’t connected to the network. When the subscribers reconnect, the system combines all the changes made to the data and replicates them on the publisher. Usually employed in server-to-client environments, merge replication is considered one of the most complex database replication types.
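
The three replication types above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular database product implements replication. The `Publisher` class, its change log, `snapshot_refresh`, `TransactionalSubscriber`, and `merge_changes` are all hypothetical names introduced here, and the last-writer-wins rule used to resolve merge conflicts is an assumption chosen for simplicity.

```python
import copy


class Publisher:
    """Hypothetical in-memory publisher: rows keyed by id plus an ordered change log."""

    def __init__(self):
        self.rows = {}
        self.log = []  # ordered (operation, key, value) entries

    def put(self, key, value):
        self.rows[key] = value
        self.log.append(("put", key, value))

    def delete(self, key):
        self.rows.pop(key, None)
        self.log.append(("delete", key, None))


def snapshot_refresh(publisher):
    """Snapshot replication: copy the entire data set on every refresh."""
    return copy.deepcopy(publisher.rows)


class TransactionalSubscriber:
    """Transactional replication: start from a snapshot, then replay changes in order."""

    def __init__(self, publisher):
        self.rows = copy.deepcopy(publisher.rows)  # initial snapshot
        self.position = len(publisher.log)         # log entries already applied

    def catch_up(self, publisher):
        # Apply only the changes made since the last sync, in publisher order.
        for operation, key, value in publisher.log[self.position:]:
            if operation == "put":
                self.rows[key] = value
            else:
                self.rows.pop(key, None)
        self.position = len(publisher.log)


def merge_changes(central, subscriber_changes):
    """Merge replication: fold offline changes from several subscribers into one data set.

    Each row is stored as (timestamp, value). Conflicts are resolved with
    last-writer-wins, which is an assumption of this sketch.
    """
    merged = dict(central)
    for changes in subscriber_changes:
        for key, (timestamp, value) in changes.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return merged
```

In this sketch, `snapshot_refresh` does work proportional to the size of the whole data set on each call, while `TransactionalSubscriber.catch_up` only processes the log entries added since the previous sync. `merge_changes` is the only function that has to decide between competing versions of the same row, which hints at why merge replication is the hardest of the three to operate.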